gogo2/CNN_TESTING_GUIDE.md
Dobromir Popov 310f3c5bf9 wip
2025-05-24 09:59:11 +03:00


# CNN Testing & Backtest Guide
## 📊 **CNN Test Cases and Training Data Location**
### **1. Test Scripts**
#### **Quick CNN Test (`test_cnn_only.py`)**
- **Purpose**: Fast CNN validation with real market data
- **Location**: `/test_cnn_only.py`
- **Test Configuration**:
  - Symbols: `['ETH/USDT']`
  - Timeframes: `['1m', '5m', '1h']`
  - Samples: `500` (for quick testing)
  - Epochs: `2`
  - Batch size: `16`
- **Data Source**: **Real Binance API data only**
- **Output**: `test_models/quick_cnn.pt`
#### **Comprehensive Training Test (`test_training.py`)**
- **Purpose**: Full training pipeline validation
- **Location**: `/test_training.py`
- **Functions**:
  - `test_cnn_training()` - Complete CNN training test
  - `test_rl_training()` - RL training validation
- **Output**: `test_models/test_cnn.pt`
### **2. Test Model Storage**
#### **Directory**: `/test_models/`
- **quick_cnn.pt** (586KB) - Latest quick test model
- **quick_cnn_best.pt** (587KB) - Best performing quick test model
- **regular_save.pt** (384MB) - Full-size training model
- **robust_save.pt** (17KB) - Optimized lightweight model
- **backup models** - Automatic backups with `.backup` extension
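The backup behaviour above can be sketched as a save helper that copies the existing checkpoint aside before overwriting it. This is a hypothetical illustration, not the project's code: the function name and the `<name>.backup` naming (e.g. `quick_cnn.pt.backup`) are assumptions, since the guide only says backups use a `.backup` extension.

```python
import shutil
from pathlib import Path
from typing import Optional, Union


def save_with_backup(path: Union[str, Path], payload: bytes) -> Optional[Path]:
    """Hypothetical helper: write `payload` to `path`, first copying any existing
    file there to '<name>.backup'. Returns the backup path, or None if none was made."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    backup: Optional[Path] = None
    if target.exists():
        # Assumed naming: test_models/quick_cnn.pt -> test_models/quick_cnn.pt.backup
        backup = target.with_name(target.name + ".backup")
        shutil.copy2(target, backup)
    # Write to a temporary file first and then rename it over the target, so an
    # interrupted write never leaves a truncated checkpoint at `target`.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_bytes(payload)
    tmp.replace(target)
    return backup
```

With PyTorch, the payload could come from an in-memory buffer: `buf = io.BytesIO(); torch.save(model.state_dict(), buf); save_with_backup("test_models/quick_cnn.pt", buf.getvalue())`.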
### **3. Training Data Sources**
#### **Real Market Data (Primary)**
- **Exchange**: Binance API
- **Symbols**: ETH/USDT, BTC/USDT, etc.
- **Timeframes**: 1s, 1m, 5m, 15m, 1h, 4h, 1d
- **Features**: 48 features per candle: the raw OHLCV values plus technical indicators calculated from them
- **Storage**: Cached in `/cache/` directory
- **Format**: JSON files with tick-by-tick and aggregated candle data
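A JSON cache like the one described can be sketched as one file per symbol and timeframe, checked before calling the exchange API. The file naming (`ETHUSDT_1m.json`), the candle record layout and the function names are assumptions for illustration; the guide does not document the project's actual cache format.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

Candle = Dict[str, float]  # assumed keys: timestamp, open, high, low, close, volume


def cache_file(cache_dir: Path, symbol: str, timeframe: str) -> Path:
    # '/' cannot appear in a file name, so 'ETH/USDT' + '1m' -> 'ETHUSDT_1m.json'
    return Path(cache_dir) / f"{symbol.replace('/', '')}_{timeframe}.json"


def save_candles(cache_dir: Path, symbol: str, timeframe: str,
                 candles: List[Candle]) -> Path:
    path = cache_file(cache_dir, symbol, timeframe)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(candles))
    return path


def load_candles(cache_dir: Path, symbol: str, timeframe: str) -> Optional[List[Candle]]:
    """Return cached candles, or None on a cache miss (the caller then fetches
    fresh data from the exchange)."""
    path = cache_file(cache_dir, symbol, timeframe)
    if not path.exists():
        return None
    return json.loads(path.read_text())
```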
#### **Feature Matrix Structure**
```python
# Multi-timeframe feature matrix layout: (timeframes, window_size, features),
# e.g. 4 timeframes x 20 time steps x 48 features per step,
# i.e. feature_matrix.shape == FEATURE_MATRIX_SHAPE
FEATURE_MATRIX_SHAPE = (4, 20, 48)

# The 48 features per time step (raw OHLCV columns plus technical indicators):
features = [
    'ad_line', 'adx', 'adx_neg', 'adx_pos', 'atr',
    'bb_lower', 'bb_middle', 'bb_percent', 'bb_upper', 'bb_width',
    'close', 'ema_12', 'ema_26', 'ema_50', 'high',
    'keltner_lower', 'keltner_middle', 'keltner_upper', 'low',
    'macd', 'macd_histogram', 'macd_signal', 'mfi', 'momentum_composite',
    'obv', 'open', 'price_position', 'psar', 'roc',
    'rsi_14', 'rsi_21', 'rsi_7', 'sma_10', 'sma_20', 'sma_50',
    'stoch_d', 'stoch_k', 'trend_strength', 'true_range', 'ultimate_osc',
    'volatility_regime', 'volume', 'volume_sma_10', 'volume_sma_20',
    'volume_sma_50', 'vpt', 'vwap', 'williams_r'
]
assert len(features) == FEATURE_MATRIX_SHAPE[2]  # 48
```
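To make the `(timeframes, window_size, features)` layout concrete, here is a hypothetical, dependency-free sketch that stacks the most recent `window_size` candles of each timeframe. It is not the project's `DataProvider.get_feature_matrix` (used in the debug commands further down), whose internals are not shown in this guide; the row format (oldest-first dicts mapping feature name to value) is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]  # one candle: feature name -> value


def build_feature_matrix(
    frames: Dict[str, List[Row]],
    timeframes: Sequence[str],
    feature_names: Sequence[str],
    window_size: int = 20,
) -> List[List[List[float]]]:
    """Stack the most recent `window_size` rows of each timeframe into a nested
    list shaped (len(timeframes), window_size, len(feature_names))."""
    matrix = []
    for tf in timeframes:
        rows = frames[tf]  # assumed ordered oldest -> newest
        if len(rows) < window_size:
            raise ValueError(f"{tf}: need {window_size} rows, got {len(rows)}")
        window = rows[-window_size:]
        # row[name] raises KeyError for a missing feature instead of silently filling it
        matrix.append([[row[name] for name in feature_names] for row in window])
    return matrix


def shape(matrix: List[List[List[float]]]) -> Tuple[int, int, int]:
    return (len(matrix), len(matrix[0]), len(matrix[0][0]))
```

With NumPy available, `np.asarray(matrix, dtype=np.float32)` turns the nested list into an array; with four timeframes and the full 48-name list its `.shape` is `(4, 20, 48)`.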
### **4. Test Case Categories**
#### **Unit Tests**
- **Quick validation**: 500 samples, 2 epochs
- **Performance benchmarks**: Speed and accuracy metrics
- **Memory usage**: Resource consumption monitoring
#### **Integration Tests**
- **Full pipeline**: Data loading → Feature engineering → Training → Evaluation
- **Multi-symbol**: Testing across different cryptocurrency pairs
- **Multi-timeframe**: Validation across various time horizons
#### **Backtesting**
- **Historical performance**: Using past market data for validation
- **Walk-forward testing**: Train on an expanding window of past data, test on the period immediately after it, then repeat with that period added to the training set
- **Out-of-sample validation**: Testing on unseen data periods
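Walk-forward and out-of-sample validation can be combined by generating chronological splits in which each test block starts exactly where its training data ends. A minimal sketch, assuming samples are indexed in time order; the function name and parameters are illustrative, not part of the project:

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) for expanding-window walk-forward
    validation. Each test block follows its training data and is unseen in
    that fold, so every evaluation is out-of-sample."""
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    train_end = initial_train
    while train_end + test_size <= n_samples:
        yield range(0, train_end), range(train_end, train_end + test_size)
        train_end += test_size  # the block just tested joins the next training set
```

For example, `walk_forward_splits(500, 300, 50)` produces four folds over the 500-sample quick-test size. Samples must not be shuffled before splitting; otherwise future candles leak into the training folds.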
### **5. VSCode Launch Configurations**
#### **Quick CNN Test**
```json
{
    "name": "Quick CNN Test (Real Data + TensorBoard)",
    "type": "python",
    "request": "launch",
    "program": "test_cnn_only.py",
    "env": {"PYTHONUNBUFFERED": "1"}
}
```
#### **Realtime RL Training with Monitoring**
```json
{
    "name": "Realtime RL Training + TensorBoard + Web UI",
    "type": "python",
    "request": "launch",
    "program": "train_realtime_with_tensorboard.py",
    "args": ["--episodes", "50", "--symbol", "ETH/USDT", "--web-port", "8051"]
}
```
### **6. Test Execution Commands**
#### **Quick CNN Test**
```bash
# Run quick CNN validation
python test_cnn_only.py
# Monitor training progress
tensorboard --logdir=runs
# Example output (accuracy, timing and the log directory name vary per run):
# ✅ CNN Training completed!
# Best accuracy: 0.4600
# Total epochs: 2
# Training time: 0.61s
# TensorBoard logs: runs/cnn_training_1748043814
```
#### **Comprehensive Training Test**
```bash
# Run full training pipeline test
python test_training.py
# Monitor multiple training modes
tensorboard --logdir=runs
```
### **7. Test Data Validation**
#### **Real Market Data Policy**
- **No Synthetic Data**: All training uses authentic exchange data
- **Live API**: Direct connection to Binance for real-time prices
- **Multi-timeframe**: Consistent data across all time horizons
- **Technical Indicators**: Calculated from real OHLCV values
#### **Data Quality Checks**
- **Completeness**: Verifying all required timeframes have data
- **Consistency**: Cross-timeframe data alignment validation
- **Freshness**: Ensuring recent market data availability
- **Feature integrity**: Validating all 48 features (OHLCV values and technical indicators)
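The four checks above could be implemented roughly as follows. This is a hypothetical sketch: the row format (oldest-first dicts with a `timestamp` key holding the candle open time in UTC epoch seconds), the `TIMEFRAME_SECONDS` table and the two-candle staleness threshold are assumptions. It checks spacing within each timeframe; because every timeframe is measured against the same `now`, the freshness check also flags timeframes that lag behind the others.

```python
import math
import time
from typing import Dict, List, Optional, Sequence

Row = Dict[str, float]

# Assumed candle durations (seconds) for the timeframe labels used in this guide.
TIMEFRAME_SECONDS = {"1s": 1, "1m": 60, "5m": 300, "15m": 900,
                     "1h": 3600, "4h": 14400, "1d": 86400}


def _is_valid(value) -> bool:
    return isinstance(value, (int, float)) and math.isfinite(value)


def check_data_quality(
    frames: Dict[str, List[Row]],
    timeframes: Sequence[str],
    feature_names: Sequence[str],
    now: Optional[float] = None,
    max_lag_candles: int = 2,
) -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    now = time.time() if now is None else now
    problems: List[str] = []
    for tf in timeframes:
        rows = frames.get(tf) or []
        if not rows:  # completeness
            problems.append(f"{tf}: no data")
            continue
        step = TIMEFRAME_SECONDS[tf]
        stamps = [row["timestamp"] for row in rows]
        # Consistency: consecutive candles exactly one interval apart
        # (catches gaps, duplicates and out-of-order rows).
        irregular = sum(1 for a, b in zip(stamps, stamps[1:]) if b - a != step)
        if irregular:
            problems.append(f"{tf}: {irregular} irregular candle intervals")
        # Freshness: newest candle opened at most max_lag_candles intervals ago.
        lag = now - stamps[-1]
        if lag > max_lag_candles * step:
            problems.append(f"{tf}: stale, newest candle opened {lag:.0f}s ago")
        # Feature integrity: each required feature present and finite in every row.
        for name in feature_names:
            bad = sum(1 for row in rows if not _is_valid(row.get(name)))
            if bad:
                problems.append(f"{tf}: feature '{name}' missing or non-finite in {bad} rows")
    return problems
```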
### **8. TensorBoard Monitoring**
#### **CNN Training Metrics**
- `Training/Loss` - Neural network training loss
- `Training/Accuracy` - Model prediction accuracy
- `Validation/Loss` - Validation dataset loss
- `Validation/Accuracy` - Out-of-sample accuracy
- `Best/ValidationAccuracy` - Best model performance
- `Data/InputShape` - Feature matrix dimensions
- `Model/TotalParams` - Neural network parameters
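The `Best/ValidationAccuracy` tag and the separate `quick_cnn_best.pt` file suggest that training keeps the checkpoint with the highest validation accuracy. A minimal, hypothetical tracker for that bookkeeping (the class name and the `min_delta` parameter are assumptions, not the project's code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestModelTracker:
    """Remember the best validation accuracy seen so far and report when a
    new best checkpoint should be saved."""
    best_accuracy: float = float("-inf")
    best_epoch: Optional[int] = None
    min_delta: float = 0.0  # improvement required to count as a new best

    def update(self, epoch: int, val_accuracy: float) -> bool:
        """Return True if this epoch's validation accuracy is a new best."""
        if val_accuracy > self.best_accuracy + self.min_delta:
            self.best_accuracy = val_accuracy
            self.best_epoch = epoch
            return True
        return False
```

In a PyTorch loop, `update()` returning `True` is the point to save the best checkpoint and log `writer.add_scalar("Best/ValidationAccuracy", tracker.best_accuracy, epoch)` through a `torch.utils.tensorboard.SummaryWriter`.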
#### **Access URLs**
- **TensorBoard**: http://localhost:6006
- **Web Dashboard**: http://localhost:8051
- **Training Logs**: `/runs/` directory
### **9. Best Practices**
#### **Quick Testing**
1. **Start small**: Use `test_cnn_only.py` for fast validation
2. **Monitor metrics**: Keep TensorBoard open during training
3. **Check outputs**: Verify model files are created in `test_models/`
4. **Validate accuracy**: Judge the model by `Validation/Accuracy` (out-of-sample), not training accuracy alone, and compare it with earlier runs
#### **Production Training**
1. **Use full datasets**: Scale up sample sizes for production models
2. **Multi-symbol training**: Train on multiple cryptocurrency pairs
3. **Extended timeframes**: Include longer-term patterns
4. **Comprehensive validation**: Use walk-forward and out-of-sample testing
### **10. Troubleshooting**
#### **Common Issues**
- **Memory errors**: Reduce batch size or sample count
- **Data loading failures**: Check internet connection and API access
- **Feature mismatches**: Verify all timeframes have consistent data
- **TensorBoard not updating**: Restart TensorBoard after training starts
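For memory errors, the "reduce batch size" advice can be automated by retrying with a halved batch size. A hypothetical sketch: `train_fn` stands in for whatever starts a training run with a given batch size. Python's built-in `MemoryError` covers CPU allocation failures; recent PyTorch versions raise `torch.cuda.OutOfMemoryError` for GPU memory, which can be passed via `oom_errors` (calling `torch.cuda.empty_cache()` before retrying may also help).

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_batch_backoff(
    train_fn: Callable[[int], T],
    batch_size: int = 16,
    min_batch_size: int = 1,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> T:
    """Call train_fn(batch_size); on an out-of-memory error, halve the batch
    size and retry. Re-raises once min_batch_size itself runs out of memory."""
    while True:
        try:
            return train_fn(batch_size)
        except oom_errors:
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```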
#### **Debug Commands**
```bash
# Check training status
python monitor_training.py
# Validate data availability
python -c "from core.data_provider import DataProvider; dp = DataProvider(['ETH/USDT']); print(dp.get_historical_data('ETH/USDT', '1m').shape)"
# Test feature generation
python -c "from core.data_provider import DataProvider; dp = DataProvider(['ETH/USDT']); print(dp.get_feature_matrix('ETH/USDT', ['1m', '5m', '1h'], 20).shape)"
```
---
**🔥 All CNN training and testing uses REAL market data from cryptocurrency exchanges. No synthetic or simulated data is used anywhere in the system.**