REAL MARKET DATA POLICY
CRITICAL REQUIREMENT: ONLY REAL MARKET DATA
This trading system is designed to work EXCLUSIVELY with real market data from cryptocurrency exchanges. NO SYNTHETIC, GENERATED, OR SIMULATED DATA IS ALLOWED for training, testing, or inference.
Policy Statement
✅ ALLOWED DATA SOURCES
- Binance API: Real-time and historical OHLCV data
- Other Exchange APIs: Real market data from legitimate exchanges
- Cached Real Data: Previously fetched real market data stored locally
- TimescaleDB: Real market data stored in time-series database
❌ PROHIBITED DATA SOURCES
- Synthetic data generation
- Random data generation
- Simulated market conditions
- Artificial price movements
- Generated technical indicators
- Mock data for testing
Implementation Guidelines
1. Data Provider (core/data_provider.py)
- Only fetches data from real exchange APIs
- Caches real data for performance
- Never generates or synthesizes data
- Validates data authenticity
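The "validates data authenticity" step can be approximated with structural checks on each candle. The following is a minimal sketch, not the actual logic in core/data_provider.py: the `Candle` type and `find_invalid_candles` function are hypothetical names introduced for illustration. Checks like these only catch malformed or out-of-order rows; they cannot prove that data came from an exchange.

```python
from typing import List, NamedTuple, Sequence


class Candle(NamedTuple):
    """One OHLCV row (hypothetical layout, assumed for this sketch)."""
    timestamp_ms: int
    open: float
    high: float
    low: float
    close: float
    volume: float


def find_invalid_candles(candles: Sequence[Candle]) -> List[int]:
    """Return the indices of candles that fail basic consistency checks."""
    bad: List[int] = []
    prev_ts = None
    for i, c in enumerate(candles):
        ok = (
            c.low > 0                               # prices must be positive
            and c.low <= min(c.open, c.close)       # low bounds the candle body
            and c.high >= max(c.open, c.close)      # high bounds the candle body
            and c.volume >= 0                       # volume cannot be negative
            and (prev_ts is None or c.timestamp_ms > prev_ts)  # strictly increasing time
        )
        if not ok:
            bad.append(i)
        prev_ts = c.timestamp_ms
    return bad
```

A non-empty result is a reason to refetch the affected range from the exchange rather than to repair or interpolate the rows, since interpolation would itself be generated data.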
2. CNN Training (models/cnn/scalping_cnn.py)
- ScalpingDataGenerator only uses real market data
- Dynamic feature detection from actual market data
- Training samples generated from real price movements
- Labels based on actual future price changes
3. RL Training (models/rl/scalping_agent.py)
- Environment uses real historical data for backtesting
- State representations from real market conditions
- Reward functions based on actual trading outcomes
- No simulated market scenarios
4. Configuration (config.yaml)
training:
  use_only_real_data: true  # CRITICAL: Never use synthetic/generated data
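One way to make this flag binding is to check it before any training starts. The sketch below assumes config.yaml has already been parsed into a plain dict by whatever YAML loader the project uses; `require_real_data` is a hypothetical helper, not an existing function in this codebase.

```python
def require_real_data(config: dict) -> None:
    """Raise if the parsed config does not explicitly enable real-data-only mode."""
    training = config.get("training") or {}
    # Require the literal boolean True so a missing key, None, or a string
    # such as "false" is never treated as compliant.
    if training.get("use_only_real_data") is not True:
        raise RuntimeError(
            "training.use_only_real_data must be true; refusing to start training"
        )
```

Failing closed on a missing key is deliberate in this sketch: a config that omits the flag is treated the same as one that disables it.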
Verification Checklist
Before any training or testing session, verify:
- Data source is a legitimate exchange API
- No data generation functions are called
- All training samples come from real market history
- Cache contains only real market data
- No synthetic indicators or features
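Part of this checklist can be automated if every dataset carries a provenance tag. The sketch below is an assumption about how that could look: the `DatasetInfo` type, the tag names in `ALLOWED_SOURCES`, and `find_unverified_datasets` are all hypothetical, with the tags chosen to mirror the allowed data sources listed in this policy. A tag is only as trustworthy as the code that writes it, so this supplements a manual review and does not replace it.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical provenance tags mirroring the allowed data sources in this policy.
ALLOWED_SOURCES = frozenset(
    {"binance_api", "exchange_api", "real_cache", "timescaledb"}
)


@dataclass(frozen=True)
class DatasetInfo:
    """Name of a dataset plus the tag recorded when it was fetched."""
    name: str
    source: str


def find_unverified_datasets(datasets: Iterable[DatasetInfo]) -> List[str]:
    """Return names of datasets whose source tag is not on the allow list."""
    return [d.name for d in datasets if d.source not in ALLOWED_SOURCES]
```

A session would proceed only when the returned list is empty.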
Code Examples
✅ CORRECT: Using Real Data
# Fetch real market data
df = self.data_provider.get_historical_data(symbol, timeframe, limit=1000, refresh=False)
# Generate training cases from real data
features, labels = self.data_generator.generate_training_cases(
    symbol, timeframes, num_samples=10000
)
Logging and Monitoring
All data operations must log their source:
2025-05-24 02:36:16,674 - models.cnn.scalping_cnn - INFO - Generating 10000 training cases for ETH/USDT from REAL market data
2025-05-24 02:36:17,366 - models.cnn.scalping_cnn - INFO - Loaded 1000 real candles for ETH/USDT 1s
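A log format matching the lines above can be produced with Python's standard logging module. This is a sketch only: `log_loaded_candles` and the `source` label are hypothetical, and the explicit "(source: ...)" suffix is an addition for illustration that the original log lines do not contain.

```python
import logging

# Same field layout as the sample log lines: timestamp - logger name - level - message.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def log_loaded_candles(
    logger: logging.Logger, count: int, symbol: str, timeframe: str, source: str
) -> str:
    """Log how many real candles were loaded and where they came from."""
    message = (
        f"Loaded {count} real candles for {symbol} {timeframe} (source: {source})"
    )
    logger.info(message)
    return message


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    log = logging.getLogger("models.cnn.scalping_cnn")
    log_loaded_candles(log, 1000, "ETH/USDT", "1s", "binance_api")
```

Recording the source in the message itself makes later audits a matter of searching the logs.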
Testing Guidelines
Unit Tests
- Test with small samples of real data
- Use cached real data for reproducibility
- Never create mock market data
Integration Tests
- Use real API endpoints (with rate limiting)
- Validate data authenticity
- Test with multiple timeframes and symbols
Performance Tests
- Benchmark with real market data volumes
- Test memory usage with actual feature counts
- Validate processing speed with real data complexity
Emergency Procedures
If synthetic data is accidentally introduced:
- STOP all training immediately
- PURGE any models trained with synthetic data
- VERIFY data sources and pipelines
- RETRAIN from scratch with verified real data
- DOCUMENT the incident and prevention measures
Compliance Verification
Regular audits must verify:
- Data source authenticity
- Training pipeline integrity
- Model performance on real data
- Cache content validation
Contact and Escalation
Any questions about data authenticity should be escalated immediately. When in doubt, ALWAYS choose real market data over convenience.
Remember: The integrity of our trading system depends on using only real market data. No exceptions.
❌ EXAMPLES OF FORBIDDEN OPERATIONS
Code Patterns to NEVER Use:
# ❌ FORBIDDEN EXAMPLES - DO NOT IMPLEMENT
# These patterns are STRICTLY FORBIDDEN:
# - Any random data generation
# - Any synthetic price creation
# - Any mock trading data
# - Any simulated market scenarios
# ✅ ONLY ALLOWED: Real market data from exchanges
real_data = binance_client.get_historical_klines(symbol, interval, limit)
live_price = binance_client.get_ticker_price(symbol)