# REAL MARKET DATA POLICY

## CRITICAL REQUIREMENT: ONLY REAL MARKET DATA

This trading system is designed to work EXCLUSIVELY with real market data from cryptocurrency exchanges. **NO SYNTHETIC, GENERATED, OR SIMULATED DATA IS ALLOWED** for training, testing, or inference.

## Policy Statement

### ✅ ALLOWED DATA SOURCES

- **Binance API**: Real-time and historical OHLCV data
- **Other Exchange APIs**: Real market data from legitimate exchanges
- **Cached Real Data**: Previously fetched real market data stored locally
- **TimescaleDB**: Real market data stored in time-series database

### ❌ PROHIBITED DATA SOURCES

- Synthetic data generation
- Random data generation
- Simulated market conditions
- Artificial price movements
- Generated technical indicators
- Mock data for testing

## Implementation Guidelines

### 1. Data Provider (`core/data_provider.py`)

- Only fetches data from real exchange APIs
- Caches real data for performance
- Never generates or synthesizes data
- Validates data authenticity

### 2. CNN Training (`models/cnn/scalping_cnn.py`)

- `ScalpingDataGenerator` only uses real market data
- Dynamic feature detection from actual market data
- Training samples generated from real price movements
- Labels based on actual future price changes

### 3. RL Training (`models/rl/scalping_agent.py`)

- Environment uses real historical data for backtesting
- State representations from real market conditions
- Reward functions based on actual trading outcomes
- No simulated market scenarios

### 4. Configuration (`config.yaml`)

```yaml
training:
  use_only_real_data: true  # CRITICAL: Never use synthetic/generated data
```

## Verification Checklist

Before any training or testing session, verify:

- [ ] Data source is a legitimate exchange API
- [ ] No data generation functions are called
- [ ] All training samples come from real market history
- [ ] Cache contains only real market data
- [ ] No synthetic indicators or features

## Code Examples

### ✅ CORRECT: Using Real Data

```python
# Fetch real market data
df = self.data_provider.get_historical_data(symbol, timeframe, limit=1000, refresh=False)

# Generate training cases from real data
features, labels = self.data_generator.generate_training_cases(
    symbol, timeframes, num_samples=10000
)
```

## Logging and Monitoring

All data operations must log their source:

```
2025-05-24 02:36:16,674 - models.cnn.scalping_cnn - INFO - Generating 10000 training cases for ETH/USDT from REAL market data
2025-05-24 02:36:17,366 - models.cnn.scalping_cnn - INFO - Loaded 1000 real candles for ETH/USDT 1s
```

## Testing Guidelines

### Unit Tests

- Test with small samples of real data
- Use cached real data for reproducibility
- Never create mock market data

### Integration Tests

- Use real API endpoints (with rate limiting)
- Validate data authenticity
- Test with multiple timeframes and symbols

### Performance Tests

- Benchmark with real market data volumes
- Test memory usage with actual feature counts
- Validate processing speed with real data complexity

## Emergency Procedures

If synthetic data is accidentally introduced:

1. **STOP** all training immediately
2. **PURGE** any models trained with synthetic data
3. **VERIFY** data sources and pipelines
4. **RETRAIN** from scratch with verified real data
5. **DOCUMENT** the incident and prevention measures

## Compliance Verification

Regular audits must verify:

- Data source authenticity
- Training pipeline integrity
- Model performance on real data
- Cache content validation

## Contact and Escalation

Any questions about data authenticity should be escalated immediately. When in doubt, **ALWAYS** choose real market data over convenience.

---

**Remember: The integrity of our trading system depends on using only real market data. No exceptions.**

## ❌ **EXAMPLES OF FORBIDDEN OPERATIONS**

### **Code Patterns to NEVER Use:**

```python
# ❌ FORBIDDEN EXAMPLES - DO NOT IMPLEMENT
# These patterns are STRICTLY FORBIDDEN:
# - Any random data generation
# - Any synthetic price creation
# - Any mock trading data
# - Any simulated market scenarios

# ✅ ONLY ALLOWED: Real market data from exchanges
real_data = binance_client.get_historical_klines(symbol, interval, limit)
live_price = binance_client.get_ticker_price(symbol)
```
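
### **Sketch: Guarding a Training Run**

The verification checklist, the `use_only_real_data` flag, and the requirement that every data operation logs its source could be enforced by a small guard that runs before training starts. The following is a minimal sketch, not the project's actual implementation. The names `ALLOWED_SOURCES`, `DataBatch`, `SyntheticDataError`, and `assert_real_data` are hypothetical and were introduced for illustration, as are the source labels. It assumes the contents of `config.yaml` have already been loaded into a plain dictionary.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("data_policy")

# Hypothetical source labels mirroring the "Allowed Data Sources" list.
ALLOWED_SOURCES = frozenset(
    {"binance_api", "exchange_api", "real_cache", "timescaledb"}
)


class SyntheticDataError(RuntimeError):
    """Raised when a batch cannot be tied to an allowed data source."""


@dataclass(frozen=True)
class DataBatch:
    """Hypothetical container: candles plus a label recording their origin."""

    symbol: str
    timeframe: str
    source: str
    candles: tuple  # rows of (timestamp, open, high, low, close, volume)


def assert_real_data(config: dict, batch: DataBatch) -> None:
    """Stop the run unless the config flag is set and the source is allowed."""
    # The flag must be explicitly true; a missing key counts as a violation.
    if config.get("training", {}).get("use_only_real_data") is not True:
        raise SyntheticDataError("training.use_only_real_data must be true")
    if batch.source not in ALLOWED_SOURCES:
        raise SyntheticDataError(
            f"source {batch.source!r} is not an allowed data source"
        )
    if not batch.candles:
        raise SyntheticDataError("batch contains no candles")
    # Log the source of every accepted batch, as the policy requires.
    logger.info(
        "Loaded %d real candles for %s %s from %s",
        len(batch.candles), batch.symbol, batch.timeframe, batch.source,
    )
```

Calling `assert_real_data(config, batch)` at the top of a training entry point would make a missing flag or an unrecognized source fail loudly instead of passing silently. A label check like this only verifies what the pipeline claims about its data; it does not prove authenticity, so it complements the audits described under Compliance Verification and does not replace them.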