gogo2/REAL_MARKET_DATA_POLICY.md
2025-05-24 02:42:11 +03:00

REAL MARKET DATA POLICY

CRITICAL REQUIREMENT: ONLY REAL MARKET DATA

This trading system is designed to work EXCLUSIVELY with real market data from cryptocurrency exchanges. NO SYNTHETIC, GENERATED, OR SIMULATED DATA IS ALLOWED for training, testing, or inference.

Policy Statement

ALLOWED DATA SOURCES

  • Binance API: Real-time and historical OHLCV data
  • Other Exchange APIs: Real market data from legitimate exchanges
  • Cached Real Data: Previously fetched real market data stored locally
  • TimescaleDB: Real market data stored in time-series database

PROHIBITED DATA SOURCES

  • Synthetic data generation
  • Random data generation
  • Simulated market conditions
  • Artificial price movements
  • Generated technical indicators
  • Mock data for testing

Implementation Guidelines

1. Data Provider (core/data_provider.py)

  • Fetches data only from real exchange APIs
  • Caches real data for performance
  • Never generates or synthesizes data
  • Validates data authenticity
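
The policy does not spell out how authenticity is validated. As a minimal sketch, a provider could at least reject candles that are internally inconsistent or carry an unknown provenance tag. The `Candle` fields, the `source` tag, and `find_candle_problems` below are illustrative assumptions, not the actual schema or API of core/data_provider.py. Passing these checks does not prove data is real; it only catches obviously malformed or untagged input.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Candle:
    """One OHLCV candle. Field names are illustrative, not the project's schema."""
    timestamp_ms: int
    open: float
    high: float
    low: float
    close: float
    volume: float
    source: str  # hypothetical provenance tag, e.g. "binance"


def find_candle_problems(candles: Iterable[Candle],
                         allowed_sources: FrozenSet[str]) -> List[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    previous_ts = None
    for i, c in enumerate(candles):
        if c.source not in allowed_sources:
            problems.append(f"candle {i}: unknown source {c.source!r}")
        if min(c.open, c.high, c.low, c.close) <= 0:
            problems.append(f"candle {i}: non-positive price")
        # High must bound open/close from above, low from below.
        if c.high < max(c.open, c.close) or c.low > min(c.open, c.close):
            problems.append(f"candle {i}: high/low do not bound open/close")
        if c.volume < 0:
            problems.append(f"candle {i}: negative volume")
        if previous_ts is not None and c.timestamp_ms <= previous_ts:
            problems.append(f"candle {i}: timestamp not strictly increasing")
        previous_ts = c.timestamp_ms
    return problems
```

A provider following this sketch would refuse to cache or return a batch for which the list is non-empty.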

2. CNN Training (models/cnn/scalping_cnn.py)

  • ScalpingDataGenerator uses only real market data
  • Dynamic feature detection from actual market data
  • Training samples derived from real price movements
  • Labels based on actual future price changes
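
To illustrate labels based on actual future price changes, the sketch below labels each candle by the relative change in the recorded close a fixed number of candles later. The function name, the three-class scheme, and the `horizon` and `threshold` parameters are assumptions for illustration; the labeling actually used in models/cnn/scalping_cnn.py may differ.

```python
from typing import List, Sequence


def label_future_moves(closes: Sequence[float], horizon: int,
                       threshold: float) -> List[int]:
    """Label each candle by the relative close change `horizon` candles later.

    Returns 1 (up), -1 (down) or 0 (flat) per candle. The last `horizon`
    candles get no label because their future is not in the recorded data,
    so the result is `horizon` items shorter than the input.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels = []
    for i in range(len(closes) - horizon):
        change = (closes[i + horizon] - closes[i]) / closes[i]
        if change > threshold:
            labels.append(1)
        elif change < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels
```

Dropping the unlabeled tail, rather than padding or extrapolating it, keeps every label tied to a price that was actually observed.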

3. RL Training (models/rl/scalping_agent.py)

  • Environment uses real historical data for backtesting
  • State representations from real market conditions
  • Reward functions based on actual trading outcomes
  • No simulated market scenarios
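
One way to read these points is a replay-style environment that walks through recorded closes in their original order and never samples or perturbs prices. The class below is a hypothetical sketch, not the environment in models/rl/scalping_agent.py; the class name, the position encoding, and the fee model are all assumptions.

```python
from typing import Sequence, Tuple


class ReplayEnvironment:
    """Replays recorded closes in order; it never samples or perturbs prices."""

    def __init__(self, closes: Sequence[float], fee_rate: float = 0.0):
        if len(closes) < 2:
            raise ValueError("need at least two recorded closes")
        self._closes = list(closes)
        self._fee_rate = fee_rate
        self._index = 0
        self._position = 0  # -1 short, 0 flat, 1 long

    def step(self, target_position: int) -> Tuple[float, bool]:
        """Hold `target_position` over the next recorded candle.

        Returns (reward, done). The reward is the realized relative price
        change times the position, minus a fee proportional to the change
        in position.
        """
        if self._index >= len(self._closes) - 1:
            raise RuntimeError("episode finished; recorded data is exhausted")
        if target_position not in (-1, 0, 1):
            raise ValueError("target_position must be -1, 0 or 1")
        now = self._closes[self._index]
        nxt = self._closes[self._index + 1]
        fee = self._fee_rate * abs(target_position - self._position)
        reward = target_position * (nxt - now) / now - fee
        self._position = target_position
        self._index += 1
        done = self._index >= len(self._closes) - 1
        return reward, done
```

When the recorded data runs out the episode ends; the sketch deliberately offers no way to extend it with generated prices.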

4. Configuration (config.yaml)

training:
  use_only_real_data: true  # CRITICAL: Never use synthetic/generated data
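
A flag is only useful if something enforces it. The sketch below assumes config.yaml has already been parsed into a dictionary (for example by a YAML loader) and fails fast unless the flag is exactly `true`. The names `require_real_data` and `SyntheticDataError` are hypothetical and do not come from the codebase.

```python
from collections.abc import Mapping


class SyntheticDataError(RuntimeError):
    """Raised when the configuration does not guarantee real-data-only operation."""


def require_real_data(config: Mapping) -> None:
    """Raise unless config['training']['use_only_real_data'] is exactly True.

    A missing section, a missing key, False, or a non-boolean value such as
    the string "true" are all treated as violations.
    """
    training = config.get("training")
    flag = training.get("use_only_real_data") if isinstance(training, Mapping) else None
    if flag is not True:
        raise SyntheticDataError(
            "training.use_only_real_data must be true; refusing to continue"
        )
```

Calling such a guard at the start of every training or testing entry point would turn the configuration line into an enforced precondition rather than a comment.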

Verification Checklist

Before any training or testing session, verify:

  • Data source is a legitimate exchange API
  • No data generation functions are called
  • All training samples come from real market history
  • Cache contains only real market data
  • No synthetic indicators or features
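
The check that no data generation functions are called can be partly automated with a static scan. The sketch below uses Python's standard `ast` module to flag calls whose names look like data generation. The deny-list is a hypothetical heuristic based on the names in this document's INCORRECT example; it will produce false positives (for example, `random.shuffle` used to shuffle real samples) and cannot catch every violation, so it supplements manual review rather than replacing it.

```python
import ast
from typing import List, Tuple

# Hypothetical deny-list; tune it to the codebase being audited.
SUSPECT_NAME_PARTS = ("synthetic", "fake", "mock", "simulat")
SUSPECT_MODULE_PREFIXES = ("np.random.", "numpy.random.", "random.")


def _dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name like 'np.random.normal' from a call target."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""  # call on a subscript, call result, etc.; not checked here


def find_suspect_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, called name) for each call matching the deny-list."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _dotted_name(node.func)
        if not name:
            continue
        lowered = name.lower()
        if lowered.startswith(SUSPECT_MODULE_PREFIXES) or any(
            part in lowered for part in SUSPECT_NAME_PARTS
        ):
            hits.append((node.lineno, name))
    return sorted(hits)
```

Run against the source text of each training module, a non-empty result is a prompt for a human to inspect those lines before the session starts.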

Code Examples

CORRECT: Using Real Data

# Fetch real market data
df = self.data_provider.get_historical_data(symbol, timeframe, limit=1000, refresh=False)

# Generate training cases from real data
features, labels = self.data_generator.generate_training_cases(
    symbol, timeframes, num_samples=10000
)

INCORRECT: Generating Data

# NEVER DO THIS
synthetic_data = generate_synthetic_market_data()
random_prices = np.random.normal(100, 10, 1000)
simulated_candles = create_fake_ohlcv_data()

Logging and Monitoring

All data operations must log their source:

2025-05-24 02:36:16,674 - models.cnn.scalping_cnn - INFO - Generating 10000 training cases for ETH/USDT from REAL market data
2025-05-24 02:36:17,366 - models.cnn.scalping_cnn - INFO - Loaded 1000 real candles for ETH/USDT 1s
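
The log lines above match the layout produced by Python's standard `logging` module with the format string shown below. As a sketch, a small helper can make the source explicit in every message. The helper name and the trailing `(source: ...)` suffix are assumptions added for illustration; the original messages do not include that suffix.

```python
import logging

# Produces lines like: "<timestamp> - <logger name> - <LEVEL> - <message>"
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def log_candles_loaded(logger: logging.Logger, count: int, symbol: str,
                       timeframe: str, source: str) -> None:
    """Log how many real candles were loaded and where they came from."""
    logger.info(
        "Loaded %d real candles for %s %s (source: %s)",
        count, symbol, timeframe, source,
    )
```

To get the same layout on the console, the application would call `logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)` once at startup.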

Testing Guidelines

Unit Tests

  • Test with small samples of real data
  • Use cached real data for reproducibility
  • Never create mock market data

Integration Tests

  • Use real API endpoints (with rate limiting)
  • Validate data authenticity
  • Test with multiple timeframes and symbols

Performance Tests

  • Benchmark with real market data volumes
  • Test memory usage with actual feature counts
  • Validate processing speed with real data complexity

Emergency Procedures

If synthetic data is accidentally introduced:

  1. STOP all training immediately
  2. PURGE any models trained with synthetic data
  3. VERIFY data sources and pipelines
  4. RETRAIN from scratch with verified real data
  5. DOCUMENT the incident and prevention measures

Compliance Verification

Regular audits must verify:

  • Data source authenticity
  • Training pipeline integrity
  • Model performance on real data
  • Cache content validation
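
For cache content validation, one possible approach is a manifest written at fetch time that records each cache file's SHA-256 digest and source, which an audit can later compare against what is on disk. The manifest layout and all function names below are hypothetical; the policy does not prescribe a mechanism.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large cache files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _load_manifest(manifest_path: Path) -> dict:
    return json.loads(manifest_path.read_text()) if manifest_path.exists() else {}


def record_in_manifest(manifest_path: Path, cache_file: Path, source: str) -> None:
    """Call right after fetching from an exchange: store the digest and source."""
    manifest = _load_manifest(manifest_path)
    manifest[cache_file.name] = {"sha256": sha256_of(cache_file), "source": source}
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def audit_cache(cache_dir: Path, manifest_path: Path) -> List[str]:
    """Report cache files that are unrecorded or have changed since fetch."""
    manifest = _load_manifest(manifest_path)
    findings = []
    for cache_file in sorted(cache_dir.iterdir()):
        if not cache_file.is_file() or cache_file.resolve() == manifest_path.resolve():
            continue
        entry = manifest.get(cache_file.name)
        if entry is None:
            findings.append(f"{cache_file.name}: not in manifest (unknown origin)")
        elif entry["sha256"] != sha256_of(cache_file):
            findings.append(f"{cache_file.name}: contents changed since fetch")
    return findings
```

An empty result shows only that the cache matches what was recorded at fetch time; the manifest is trustworthy only if nothing but the real-data fetch path writes to it.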

Contact and Escalation

Any questions about data authenticity should be escalated immediately. When in doubt, ALWAYS choose real market data over convenience.


Remember: The integrity of our trading system depends on using only real market data. No exceptions.