# REAL MARKET DATA POLICY

## CRITICAL REQUIREMENT: ONLY REAL MARKET DATA

This trading system is designed to work EXCLUSIVELY with real market data from cryptocurrency exchanges. **NO SYNTHETIC, GENERATED, OR SIMULATED DATA IS ALLOWED** for training, testing, or inference.

## Policy Statement

### ✅ ALLOWED DATA SOURCES

- **Binance API**: Real-time and historical OHLCV data
- **Other Exchange APIs**: Real market data from legitimate exchanges
- **Cached Real Data**: Previously fetched real market data stored locally
- **TimescaleDB**: Real market data stored in time-series database

### ❌ PROHIBITED DATA SOURCES

- Synthetic data generation
- Random data generation
- Simulated market conditions
- Artificial price movements
- Generated technical indicators
- Mock data for testing

## Implementation Guidelines

### 1. Data Provider (`core/data_provider.py`)

- Only fetches data from real exchange APIs
- Caches real data for performance
- Never generates or synthesizes data
- Validates data authenticity

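This policy does not spell out what "validates data authenticity" involves. As a minimal sketch, a provider could run structural checks on fetched candles before caching them. The function name `validate_candles`, the dictionary keys, and the specific checks are assumptions made for illustration, not the actual `core/data_provider.py` implementation. Checks like these cannot prove that data came from an exchange; they only catch malformed or out-of-order records.

```python
from typing import Dict, List, Sequence

# Assumed field names for one OHLCV candle (hypothetical schema).
REQUIRED_KEYS = ("timestamp", "open", "high", "low", "close", "volume")


def validate_candles(candles: Sequence[Dict[str, float]]) -> List[str]:
    """Return a list of problems found in OHLCV candles (empty list = passed)."""
    problems: List[str] = []
    previous_ts = None
    for i, candle in enumerate(candles):
        missing = [key for key in REQUIRED_KEYS if key not in candle]
        if missing:
            problems.append(f"candle {i}: missing fields {missing}")
            continue
        o, h, l, c = candle["open"], candle["high"], candle["low"], candle["close"]
        if min(o, h, l, c) <= 0:
            problems.append(f"candle {i}: non-positive price")
        # The high must bound open/close from above, the low from below.
        if h < max(o, c) or l > min(o, c):
            problems.append(f"candle {i}: high/low inconsistent with open/close")
        if candle["volume"] < 0:
            problems.append(f"candle {i}: negative volume")
        if previous_ts is not None and candle["timestamp"] <= previous_ts:
            problems.append(f"candle {i}: timestamp not strictly increasing")
        previous_ts = candle["timestamp"]
    return problems
```

A provider could refuse to cache a batch whenever the returned list is non-empty and log the reported problems instead.
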
### 2. CNN Training (`models/cnn/scalping_cnn.py`)

- `ScalpingDataGenerator` only uses real market data
- Dynamic feature detection from actual market data
- Training samples generated from real price movements
- Labels based on actual future price changes

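To illustrate what "labels based on actual future price changes" can mean in practice, the sketch below labels each candle by its realized return a fixed number of candles later. The function name `label_future_moves`, the `horizon` and `threshold` parameters, and the three-class scheme are assumptions for illustration; the actual `ScalpingDataGenerator` logic may differ.

```python
from typing import List, Sequence


def label_future_moves(closes: Sequence[float], horizon: int = 5,
                       threshold: float = 0.001) -> List[int]:
    """Label each candle by the realized return `horizon` candles later.

    Returns 1 (up), -1 (down) or 0 (flat). The last `horizon` candles get no
    label because their future price is not yet known.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels: List[int] = []
    for i in range(len(closes) - horizon):
        future_return = (closes[i + horizon] - closes[i]) / closes[i]
        if future_return > threshold:
            labels.append(1)
        elif future_return < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels
```

Here `closes` would be the close prices of real candles returned by the data provider. Because every label is derived from a price that actually occurred later in the series, no label is invented.
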
### 3. RL Training (`models/rl/scalping_agent.py`)

- Environment uses real historical data for backtesting
- State representations from real market conditions
- Reward functions based on actual trading outcomes
- No simulated market scenarios

### 4. Configuration (`config.yaml`)

```yaml
training:
  use_only_real_data: true  # CRITICAL: Never use synthetic/generated data
```

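A configuration flag only helps if something checks it. The sketch below shows one way a training entry point might fail fast when the flag is missing or false. It assumes the YAML file has already been parsed into a dictionary; `require_real_data` and `SyntheticDataError` are hypothetical names, not part of the existing codebase.

```python
from typing import Any, Mapping


class SyntheticDataError(RuntimeError):
    """Raised when the configuration does not enforce real-data-only training."""


def require_real_data(config: Mapping[str, Any]) -> None:
    """Fail fast unless training.use_only_real_data is explicitly true."""
    training = config.get("training") or {}
    # Compare with `is not True` so that strings such as "true" are rejected.
    if training.get("use_only_real_data") is not True:
        raise SyntheticDataError(
            "training.use_only_real_data must be set to true; refusing to start"
        )
```

Calling this once at startup, before any data is loaded, turns a silent misconfiguration into an immediate error.
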
## Verification Checklist

Before any training or testing session, verify:

- [ ] Data source is a legitimate exchange API
- [ ] No data generation functions are called
- [ ] All training samples come from real market history
- [ ] Cache contains only real market data
- [ ] No synthetic indicators or features

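The "no data generation functions are called" item can be partly automated. The sketch below uses Python's standard `ast` module to flag imports of random-data modules and `.random` attribute access in a source file. The `FORBIDDEN_MODULES` list and the function name `find_generation_calls` are assumptions for illustration. A scan like this will produce false positives (for example, randomness used only to shuffle real samples) and can miss indirect usage, so it supplements manual review rather than replacing it.

```python
import ast
from typing import List, Tuple

# Assumed list of modules commonly used to fabricate data (hypothetical).
FORBIDDEN_MODULES = {"random", "numpy.random", "faker"}


def find_generation_calls(source: str, filename: str = "<string>") -> List[str]:
    """Return 'file:line: reason' entries for suspicious imports or attribute use."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            imported = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            imported = [module] + [f"{module}.{alias.name}" for alias in node.names]
        elif isinstance(node, ast.Attribute) and node.attr == "random":
            # Catches access such as np.random.<function>.
            hits.append((node.lineno, "use of '.random' attribute"))
            continue
        else:
            continue
        hits.extend((node.lineno, f"import of '{name}'")
                    for name in imported if name in FORBIDDEN_MODULES)
    # ast.walk is breadth-first, so sort the findings by line number.
    return [f"{filename}:{line}: {reason}" for line, reason in sorted(hits)]
```

Running the function over each file in the training pipeline and failing the session when the result is non-empty gives a quick pre-flight check.
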
## Code Examples

### ✅ CORRECT: Using Real Data

```python
# Fetch real market data
df = self.data_provider.get_historical_data(symbol, timeframe, limit=1000, refresh=False)

# Generate training cases from real data
features, labels = self.data_generator.generate_training_cases(
    symbol, timeframes, num_samples=10000
)
```

## Logging and Monitoring

All data operations must log their source:

```
2025-05-24 02:36:16,674 - models.cnn.scalping_cnn - INFO - Generating 10000 training cases for ETH/USDT from REAL market data
2025-05-24 02:36:17,366 - models.cnn.scalping_cnn - INFO - Loaded 1000 real candles for ETH/USDT 1s
```

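Log lines in the layout shown above can be produced with Python's standard `logging` module. The sketch below is one possible setup: the format string reproduces the timestamp, logger name, level, and message layout, while the helper `log_loaded_candles` and its `source` parameter are hypothetical additions that make the data source explicit in every message.

```python
import logging

# Layout: timestamp - logger name - level - message
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger with the timestamp/name/level/message layout."""
    logging.basicConfig(level=level, format=LOG_FORMAT)


def log_loaded_candles(logger: logging.Logger, count: int, symbol: str,
                       timeframe: str, source: str) -> None:
    """Record how many real candles were loaded and where they came from."""
    logger.info("Loaded %d real candles for %s %s (source: %s)",
                count, symbol, timeframe, source)
```

A module would typically obtain its logger with `logging.getLogger(__name__)`, which is what yields names such as `models.cnn.scalping_cnn` in the output.
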
## Testing Guidelines

### Unit Tests

- Test with small samples of real data
- Use cached real data for reproducibility
- Never create mock market data

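One way to keep cached real data reproducible across test runs is to record a checksum of each cached file when it is first fetched and verify it before the tests use it. The sketch below assumes fixtures are stored as files on disk; the function names `file_sha256` and `assert_fixture_unchanged` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a cached data file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large cache files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_fixture_unchanged(path: Path, expected_sha256: str) -> None:
    """Raise if a cached real-data fixture differs from the recorded checksum."""
    actual = file_sha256(path)
    if actual != expected_sha256:
        raise ValueError(f"{path} changed: expected {expected_sha256}, got {actual}")
```

A test would call `assert_fixture_unchanged` on its fixture before loading it, so any edit to the cached file, accidental or otherwise, fails the test immediately.
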
### Integration Tests

- Use real API endpoints (with rate limiting)
- Validate data authenticity
- Test with multiple timeframes and symbols

### Performance Tests

- Benchmark with real market data volumes
- Test memory usage with actual feature counts
- Validate processing speed with real data complexity

## Emergency Procedures

If synthetic data is accidentally introduced:

1. **STOP** all training immediately
2. **PURGE** any models trained with synthetic data
3. **VERIFY** data sources and pipelines
4. **RETRAIN** from scratch with verified real data
5. **DOCUMENT** the incident and prevention measures

## Compliance Verification

Regular audits must verify:

- Data source authenticity
- Training pipeline integrity
- Model performance on real data
- Cache content validation

## Contact and Escalation

Any questions about data authenticity should be escalated immediately. When in doubt, **ALWAYS** choose real market data over convenience.

---

**Remember: The integrity of our trading system depends on using only real market data. No exceptions.**

## ❌ **EXAMPLES OF FORBIDDEN OPERATIONS**

### **Code Patterns to NEVER Use:**

```python
# ❌ FORBIDDEN EXAMPLES - DO NOT IMPLEMENT

# These patterns are STRICTLY FORBIDDEN:
# - Any random data generation
# - Any synthetic price creation
# - Any mock trading data
# - Any simulated market scenarios

# ✅ ONLY ALLOWED: Real market data from exchanges
real_data = binance_client.get_historical_klines(symbol, interval, limit)
live_price = binance_client.get_ticker_price(symbol)
```