gogo2/REAL_MARKET_DATA_POLICY.md
2025-05-25 00:28:52 +03:00

# REAL MARKET DATA POLICY
## CRITICAL REQUIREMENT: ONLY REAL MARKET DATA
This trading system is designed to work EXCLUSIVELY with real market data from cryptocurrency exchanges. **NO SYNTHETIC, GENERATED, OR SIMULATED DATA IS ALLOWED** for training, testing, or inference.
## Policy Statement
### ✅ ALLOWED DATA SOURCES
- **Binance API**: Real-time and historical OHLCV data
- **Other Exchange APIs**: Real market data from legitimate exchanges
- **Cached Real Data**: Previously fetched real market data stored locally
- **TimescaleDB**: Real market data stored in time-series database
### ❌ PROHIBITED DATA SOURCES
- Synthetic data generation
- Random data generation
- Simulated market conditions
- Artificial price movements
- Technical indicators generated from anything other than real market data
- Mock data for testing
## Implementation Guidelines
### 1. Data Provider (`core/data_provider.py`)
- Only fetches data from real exchange APIs
- Caches real data for performance
- Never generates or synthesizes data
- Validates data authenticity
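
The policy does not spell out how authenticity is validated. One minimal sketch, assuming candles arrive as mappings with `timestamp`, `open`, `high`, `low`, `close`, and `volume` keys (a hypothetical shape, not necessarily what `core/data_provider.py` returns), is to reject batches that fail basic structural checks. The function name `find_candle_problems` is also hypothetical. Checks like these catch malformed or corrupted batches; they cannot prove on their own that data came from an exchange.

```python
from typing import Iterable, List, Mapping

# Field names are an assumption made for this sketch.
REQUIRED_KEYS = ("timestamp", "open", "high", "low", "close", "volume")


def find_candle_problems(candles: Iterable[Mapping[str, float]]) -> List[str]:
    """Return human-readable problems found in a batch of OHLCV candles.

    An empty list means every candle passed every check.
    """
    problems: List[str] = []
    previous_ts = None
    for i, candle in enumerate(candles):
        missing = [key for key in REQUIRED_KEYS if key not in candle]
        if missing:
            problems.append(f"candle {i}: missing fields {missing}")
            continue
        o, h, l, c = (candle[key] for key in ("open", "high", "low", "close"))
        if min(o, h, l, c) <= 0:
            problems.append(f"candle {i}: non-positive price")
        # High must be the largest and low the smallest of the four prices.
        if h < max(o, c, l) or l > min(o, c, h):
            problems.append(f"candle {i}: high/low do not bound open/close")
        if candle["volume"] < 0:
            problems.append(f"candle {i}: negative volume")
        ts = candle["timestamp"]
        if previous_ts is not None and ts <= previous_ts:
            problems.append(f"candle {i}: timestamp not strictly increasing")
        previous_ts = ts
    return problems
```

A data provider could call this right after a fetch and refuse to cache any batch for which the returned list is non-empty.
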
### 2. CNN Training (`models/cnn/scalping_cnn.py`)
- `ScalpingDataGenerator` only uses real market data
- Dynamic feature detection from actual market data
- Training samples derived from real price movements
- Labels based on actual future price changes
### 3. RL Training (`models/rl/scalping_agent.py`)
- Environment uses real historical data for backtesting
- State representations from real market conditions
- Reward functions based on actual trading outcomes
- No simulated market scenarios
### 4. Configuration (`config.yaml`)
```yaml
training:
  use_only_real_data: true  # CRITICAL: Never use synthetic/generated data
```
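
A flag in `config.yaml` only helps if something checks it. The following is a minimal sketch of a startup guard, assuming the YAML has already been parsed into a dictionary. `require_real_data_only` and `SyntheticDataPolicyError` are hypothetical names for illustration, not existing parts of this codebase.

```python
from typing import Any, Mapping


class SyntheticDataPolicyError(RuntimeError):
    """Raised when the configuration does not enforce the real-data policy."""


def require_real_data_only(config: Mapping[str, Any]) -> None:
    """Raise unless training.use_only_real_data is explicitly the boolean true.

    A missing section or key counts as a violation, so the policy cannot be
    bypassed by deleting the setting. (Hypothetical helper, for illustration.)
    """
    training = config.get("training")
    if not isinstance(training, Mapping) or training.get("use_only_real_data") is not True:
        raise SyntheticDataPolicyError(
            "training.use_only_real_data must be set to true in config.yaml"
        )
```

Calling the guard once at the start of every training or testing entry point keeps the check in one place.
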
## Verification Checklist
Before any training or testing session, verify:
- [ ] Data source is a legitimate exchange API
- [ ] No data generation functions are called
- [ ] All training samples come from real market history
- [ ] Cache contains only real market data
- [ ] No synthetic indicators or features
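
The "no data generation functions are called" item can be partly automated. The sketch below is a heuristic static scan, assuming that imports of `random` or `numpy.random` and attribute access such as `np.random` are worth a manual look. The module list and the function name `find_suspect_usage` are assumptions made for illustration. Legitimate uses, such as shuffling real samples or initializing model weights, would also be reported, so the output is a review list and not a verdict.

```python
import ast
from typing import List, Tuple

# Modules treated as suspect in this sketch; the list is an assumption.
SUSPECT_MODULES = {"random", "numpy.random"}
NUMPY_ALIASES = {"np", "numpy"}


def find_suspect_usage(source: str, filename: str = "<string>") -> List[str]:
    """Report imports of suspect modules and np.random attribute access."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in SUSPECT_MODULES:
                    hits.append((node.lineno, f"imports {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            for alias in node.names:
                full_name = f"{module}.{alias.name}"
                if module in SUSPECT_MODULES or full_name in SUSPECT_MODULES:
                    hits.append((node.lineno, f"imports from {module}"))
        elif isinstance(node, ast.Attribute):
            base = node.value
            if (
                node.attr == "random"
                and isinstance(base, ast.Name)
                and base.id in NUMPY_ALIASES
            ):
                hits.append((node.lineno, f"uses {base.id}.random"))
    # Sort by line number so the report reads top to bottom.
    return [f"{filename}:{line}: {what}" for line, what in sorted(hits)]
```

Running this over the data and training modules before a session gives a short list of lines to inspect by hand.
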
## Code Examples
### ✅ CORRECT: Using Real Data
```python
# Fetch real market data
df = self.data_provider.get_historical_data(symbol, timeframe, limit=1000, refresh=False)
# Generate training cases from real data
features, labels = self.data_generator.generate_training_cases(
    symbol, timeframes, num_samples=10000
)
```
## Logging and Monitoring
All data operations must log their source:
```
2025-05-24 02:36:16,674 - models.cnn.scalping_cnn - INFO - Generating 10000 training cases for ETH/USDT from REAL market data
2025-05-24 02:36:17,366 - models.cnn.scalping_cnn - INFO - Loaded 1000 real candles for ETH/USDT 1s
```
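
The sample lines above match the layout produced by Python's standard `logging` module with the format string `%(asctime)s - %(name)s - %(levelname)s - %(message)s`. As a minimal sketch, a small helper can make the data source a required argument so that no load is logged without it. `log_data_load` and its explicit `source` field are hypothetical additions, not the project's existing logging code.

```python
import logging

# Format string that reproduces the layout of the sample log lines.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger once, at program start."""
    logging.basicConfig(level=level, format=LOG_FORMAT)


def log_data_load(
    logger: logging.Logger, source: str, symbol: str, timeframe: str, count: int
) -> None:
    """Log how many candles were loaded and where they came from.

    `source` has no default, so a caller cannot omit it by accident.
    """
    logger.info(
        "Loaded %d real candles for %s %s (source: %s)",
        count, symbol, timeframe, source,
    )
```

A module would typically create its logger with `logging.getLogger(__name__)` and pass it to `log_data_load` after each fetch or cache read.
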
## Testing Guidelines
### Unit Tests
- Test with small samples of real data
- Use cached real data for reproducibility
- Never create mock market data
### Integration Tests
- Use real API endpoints (with rate limiting)
- Validate data authenticity
- Test with multiple timeframes and symbols
### Performance Tests
- Benchmark with real market data volumes
- Test memory usage with actual feature counts
- Validate processing speed with real data complexity
## Emergency Procedures
If synthetic data is accidentally introduced:
1. **STOP** all training immediately
2. **PURGE** any models trained with synthetic data
3. **VERIFY** data sources and pipelines
4. **RETRAIN** from scratch with verified real data
5. **DOCUMENT** the incident and prevention measures
## Compliance Verification
Regular audits must verify:
- Data source authenticity
- Training pipeline integrity
- Model performance on real data
- Cache content validation
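
One way to approach cache content validation, sketched here under stated assumptions, is to record a SHA-256 digest and the source of each cached file at the moment it is written after an exchange fetch, then compare digests during an audit. The on-disk cache layout, the JSON manifest, and the function names are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_in_manifest(manifest: Path, cached_file: Path, source: str) -> None:
    """Store the digest and source of a freshly fetched cache file."""
    entries = json.loads(manifest.read_text()) if manifest.exists() else {}
    entries[cached_file.name] = {"sha256": sha256_of(cached_file), "source": source}
    manifest.write_text(json.dumps(entries, indent=2, sort_keys=True))


def audit_cache(manifest: Path, cache_dir: Path) -> List[str]:
    """List cache files that are unrecorded or changed since they were fetched."""
    entries = json.loads(manifest.read_text()) if manifest.exists() else {}
    problems: List[str] = []
    for path in sorted(cache_dir.iterdir()):
        if not path.is_file() or path.resolve() == manifest.resolve():
            continue
        entry = entries.get(path.name)
        if entry is None:
            problems.append(f"{path.name}: not in manifest (unknown origin)")
        elif entry["sha256"] != sha256_of(path):
            problems.append(f"{path.name}: contents changed since fetch")
    return problems
```

An empty list from `audit_cache` means every file in the cache directory was recorded at fetch time and has not been modified since.
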
## Contact and Escalation
Any questions about data authenticity should be escalated immediately. When in doubt, **ALWAYS** choose real market data over convenience.
---
**Remember: The integrity of our trading system depends on using only real market data. No exceptions.**
## ❌ EXAMPLES OF FORBIDDEN OPERATIONS
### Code Patterns to NEVER Use
```python
# ❌ FORBIDDEN EXAMPLES - DO NOT IMPLEMENT
# These patterns are STRICTLY FORBIDDEN:
# - Any random data generation
# - Any synthetic price creation
# - Any mock trading data
# - Any simulated market scenarios

# ✅ ONLY ALLOWED: Real market data from exchanges
real_data = binance_client.get_historical_klines(symbol, interval, limit)
live_price = binance_client.get_ticker_price(symbol)
```