# CNN Testing & Backtest Guide

## 📊 **CNN Test Cases and Training Data Location**

### **1. Test Scripts**

#### **Quick CNN Test (`test_cnn_only.py`)**
- **Purpose**: Fast CNN validation with real market data
- **Location**: `test_cnn_only.py` (repository root)
- **Test Configuration**:
  - Symbols: `['ETH/USDT']`
  - Timeframes: `['1m', '5m', '1h']`
  - Samples: `500` (for quick testing)
  - Epochs: `2`
  - Batch size: `16`
- **Data Source**: **Real Binance API data only**
- **Output**: `test_models/quick_cnn.pt`
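
For orientation, the settings above can be written down as a small dataclass. This is a minimal sketch and not the script's actual code: `QuickTestConfig` and `batches_per_epoch` are hypothetical names, and the batch count assumes all 500 samples are used for training with no validation hold-out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QuickTestConfig:
    """Hypothetical container mirroring the quick-test settings listed above."""
    symbols: tuple = ("ETH/USDT",)
    timeframes: tuple = ("1m", "5m", "1h")
    samples: int = 500
    epochs: int = 2
    batch_size: int = 16
    output_path: str = "test_models/quick_cnn.pt"

    def batches_per_epoch(self) -> int:
        # Assumes every sample is used for training; the last batch may be partial.
        return math.ceil(self.samples / self.batch_size)


config = QuickTestConfig()
print(config.batches_per_epoch())  # 32, because 500 / 16 = 31.25 rounds up
```

If the script holds out part of the 500 samples for validation, the number of training batches per epoch will be correspondingly smaller.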

#### **Comprehensive Training Test (`test_training.py`)**
- **Purpose**: Full training pipeline validation
- **Location**: `test_training.py` (repository root)
- **Functions**:
  - `test_cnn_training()` - Complete CNN training test
  - `test_rl_training()` - RL training validation
- **Output**: `test_models/test_cnn.pt`

### **2. Test Model Storage**

#### **Directory**: `test_models/`
- **quick_cnn.pt** (586KB) - Latest quick test model
- **quick_cnn_best.pt** (587KB) - Best performing quick test model
- **regular_save.pt** (384MB) - Full-size training model
- **robust_save.pt** (17KB) - Optimized lightweight model
- **backup models** - Automatic backups with `.backup` extension
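
To confirm which checkpoints exist after a test run, a short helper like the one below may be useful. It is a sketch that relies only on the standard library; `list_checkpoints` is a hypothetical name and not part of the repository.

```python
from pathlib import Path


def list_checkpoints(model_dir="test_models"):
    """Return (file name, size in KB) pairs for every .pt file, largest first."""
    directory = Path(model_dir)
    if not directory.is_dir():
        return []
    # The "*.pt" pattern skips files whose names end in ".backup".
    sizes = [(p.name, p.stat().st_size / 1024) for p in directory.glob("*.pt")]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, size_kb in list_checkpoints():
        print(f"{name:<24} {size_kb:>12.1f} KB")
```

Run it from the repository root so that the relative `test_models` path resolves.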

### **3. Training Data Sources**

#### **Real Market Data (Primary)**
- **Exchange**: Binance API
- **Symbols**: ETH/USDT, BTC/USDT, etc.
- **Timeframes**: 1s, 1m, 5m, 15m, 1h, 4h, 1d
- **Features**: 48 per time step, comprising the raw OHLCV columns and technical indicators calculated from that real OHLCV data
- **Storage**: Cached in the `cache/` directory
- **Format**: JSON files with tick-by-tick and aggregated candle data

#### **Feature Matrix Structure**
```python
# Multi-timeframe feature matrix: (timeframes, window_size, features)
# Example shape with 4 timeframes, 20 time steps, and 48 features per step:
#   feature_matrix.shape == (4, 20, 48)

# The 48 features (raw OHLCV columns plus derived indicators):
features = [
    'ad_line', 'adx', 'adx_neg', 'adx_pos', 'atr',
    'bb_lower', 'bb_middle', 'bb_percent', 'bb_upper', 'bb_width',
    'close', 'ema_12', 'ema_26', 'ema_50', 'high',
    'keltner_lower', 'keltner_middle', 'keltner_upper', 'low',
    'macd', 'macd_histogram', 'macd_signal', 'mfi', 'momentum_composite',
    'obv', 'open', 'price_position', 'psar', 'roc',
    'rsi_14', 'rsi_21', 'rsi_7', 'sma_10', 'sma_20', 'sma_50',
    'stoch_d', 'stoch_k', 'trend_strength', 'true_range', 'ultimate_osc',
    'volatility_regime', 'volume', 'volume_sma_10', 'volume_sma_20',
    'volume_sma_50', 'vpt', 'vwap', 'williams_r'
]
```
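
The feature names above are listed in alphabetical order, which makes it easy to look up a column by name. The sketch below assumes, without confirmation from the code, that the last axis of the matrix follows this same order; `FEATURE_INDEX` and `validate_shape` are hypothetical helpers.

```python
FEATURES = """
ad_line adx adx_neg adx_pos atr
bb_lower bb_middle bb_percent bb_upper bb_width
close ema_12 ema_26 ema_50 high
keltner_lower keltner_middle keltner_upper low
macd macd_histogram macd_signal mfi momentum_composite
obv open price_position psar roc
rsi_14 rsi_21 rsi_7 sma_10 sma_20 sma_50
stoch_d stoch_k trend_strength true_range ultimate_osc
volatility_regime volume volume_sma_10 volume_sma_20
volume_sma_50 vpt vwap williams_r
""".split()

# Assumption: column i of the feature axis corresponds to FEATURES[i].
FEATURE_INDEX = {name: i for i, name in enumerate(FEATURES)}


def validate_shape(shape, n_timeframes, window_size=20):
    """Raise ValueError unless shape == (n_timeframes, window_size, 48)."""
    expected = (n_timeframes, window_size, len(FEATURES))
    if tuple(shape) != expected:
        raise ValueError(f"expected {expected}, got {tuple(shape)}")


assert len(FEATURES) == 48
assert FEATURES == sorted(FEATURES)  # the list is alphabetical
print(FEATURE_INDEX["close"])        # 10
validate_shape((4, 20, 48), n_timeframes=4)
```

With a real array, `feature_matrix[:, :, FEATURE_INDEX["close"]]` would then select the close price across all timeframes and time steps, provided the column-order assumption holds.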

### **4. Test Case Categories**

#### **Unit Tests**
- **Quick validation**: 500 samples, 2 epochs
- **Performance benchmarks**: Speed and accuracy metrics
- **Memory usage**: Resource consumption monitoring

#### **Integration Tests**
- **Full pipeline**: Data loading → Feature engineering → Training → Evaluation
- **Multi-symbol**: Testing across different cryptocurrency pairs
- **Multi-timeframe**: Validation across various time horizons

#### **Backtesting**
- **Historical performance**: Using past market data for validation
- **Walk-forward testing**: Retraining on an expanding window of history and evaluating on the period that immediately follows
- **Out-of-sample validation**: Testing on unseen data periods
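
Walk-forward testing with an expanding window can be sketched as an index generator. This illustrates the general technique and is not the repository's backtest code; `walk_forward_splits` and its parameters are hypothetical.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) with an expanding training window.

    Each fold trains on everything before its test block, so no future
    data leaks into training.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    train_end = initial_train
    while train_end + test_size <= n_samples:
        yield range(0, train_end), range(train_end, train_end + test_size)
        train_end += test_size


for train, test in walk_forward_splits(n_samples=500, initial_train=300, test_size=100):
    print(f"train [0, {train.stop})  test [{test.start}, {test.stop})")
# train [0, 300)  test [300, 400)
# train [0, 400)  test [400, 500)
```

Samples must be in chronological order for the splits to be meaningful. Any trailing samples that do not fill a complete test block are left unused.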

### **5. VSCode Launch Configurations**

#### **Quick CNN Test**
```json
{
    "name": "Quick CNN Test (Real Data + TensorBoard)",
    "program": "test_cnn_only.py",
    "env": {"PYTHONUNBUFFERED": "1"}
}
```

#### **Realtime RL Training with Monitoring**
```json
{
    "name": "Realtime RL Training + TensorBoard + Web UI",
    "program": "train_realtime_with_tensorboard.py",
    "args": ["--episodes", "50", "--symbol", "ETH/USDT", "--web-port", "8051"]
}
```

### **6. Test Execution Commands**

#### **Quick CNN Test**
```bash
# Run quick CNN validation
python test_cnn_only.py

# Monitor training progress
tensorboard --logdir=runs

# Example output (accuracy, timing, and log directory vary per run):
# ✅ CNN Training completed!
#   Best accuracy: 0.4600
#   Total epochs: 2
#   Training time: 0.61s
#   TensorBoard logs: runs/cnn_training_1748043814
```

#### **Comprehensive Training Test**
```bash
# Run full training pipeline test
python test_training.py

# Monitor multiple training modes
tensorboard --logdir=runs
```

### **7. Test Data Validation**

#### **Real Market Data Policy**
- ✅ **No Synthetic Data**: All training uses authentic exchange data
- ✅ **Live API**: Direct connection to Binance for real-time prices
- ✅ **Multi-timeframe**: Consistent data across all time horizons
- ✅ **Technical Indicators**: Calculated from real OHLCV values

#### **Data Quality Checks**
- **Completeness**: Verifying all required timeframes have data
- **Consistency**: Cross-timeframe data alignment validation
- **Freshness**: Ensuring recent market data availability
- **Feature integrity**: Validating that all 48 features are present
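
The first three checks could look roughly like the sketch below. The input layout (a dict mapping each timeframe to ascending candle timestamps in epoch seconds), the `quality_report` name, and the staleness threshold are all assumptions for illustration. Bar-boundary alignment is used here as one simple reading of "consistency".

```python
import time

REQUIRED_TIMEFRAMES = ("1m", "5m", "1h")
TIMEFRAME_SECONDS = {"1m": 60, "5m": 300, "1h": 3600}


def quality_report(candles_by_timeframe, now=None, max_missed_bars=3):
    """Return a list of problems; an empty list means every check passed."""
    now = time.time() if now is None else now
    problems = []
    for tf in REQUIRED_TIMEFRAMES:
        timestamps = candles_by_timeframe.get(tf, [])
        if not timestamps:
            problems.append(f"{tf}: no data")  # completeness
            continue
        step = TIMEFRAME_SECONDS[tf]
        if any(ts % step for ts in timestamps):
            problems.append(f"{tf}: timestamps not aligned to bar boundaries")  # consistency
        if now - timestamps[-1] > max_missed_bars * step:
            problems.append(f"{tf}: latest candle is stale")  # freshness
    return problems


sample = {"1m": [7080, 7140], "5m": [6900], "1h": [3600]}
print(quality_report(sample, now=7200))  # []
```

Passing `now` explicitly keeps the check deterministic in tests. In normal use it defaults to the current time.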

### **8. TensorBoard Monitoring**

#### **CNN Training Metrics**
- `Training/Loss` - Loss on the training set
- `Training/Accuracy` - Prediction accuracy on the training set
- `Validation/Loss` - Loss on the validation set
- `Validation/Accuracy` - Prediction accuracy on the held-out validation set
- `Best/ValidationAccuracy` - Highest validation accuracy reached so far
- `Data/InputShape` - Feature matrix dimensions
- `Model/TotalParams` - Total number of model parameters

#### **Access URLs**
- **TensorBoard**: http://localhost:6006
- **Web Dashboard**: http://localhost:8051
- **Training Logs**: `runs/` directory
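
Each training run appears to write to its own subdirectory, such as `runs/cnn_training_1748043814` in the sample output. The sketch below picks the most recent one, assuming the numeric suffix is a Unix timestamp. `latest_run` is a hypothetical helper.

```python
from pathlib import Path


def latest_run(log_root="runs", prefix="cnn_training_"):
    """Return the run directory with the largest numeric suffix, or None."""
    root = Path(log_root)
    if not root.is_dir():
        return None
    runs = [
        p for p in root.iterdir()
        if p.is_dir() and p.name.startswith(prefix) and p.name[len(prefix):].isdigit()
    ]
    # Compare suffixes as integers so "100" sorts after "99".
    return max(runs, key=lambda p: int(p.name[len(prefix):]), default=None)


if __name__ == "__main__":
    print(latest_run())
```

The returned path can be passed to `tensorboard --logdir` to view a single run instead of the whole `runs` directory.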

### **9. Best Practices**

#### **Quick Testing**
1. **Start small**: Use `test_cnn_only.py` for fast validation
2. **Monitor metrics**: Keep TensorBoard open during training
3. **Check outputs**: Verify model files are created in `test_models/`
4. **Validate accuracy**: Compare the reported best accuracy against a simple baseline before trusting the model
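
One way to make the accuracy check concrete is to compare against a majority-class baseline, that is, the accuracy obtained by always predicting the most frequent label. The sketch below uses made-up labels. The actual label set and class balance depend on how the CNN targets are defined, which this guide does not specify.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical validation labels, for illustration only.
labels = ["BUY"] * 30 + ["HOLD"] * 45 + ["SELL"] * 25
baseline = majority_baseline(labels)
print(f"baseline accuracy: {baseline:.2f}")  # baseline accuracy: 0.45
```

A model whose validation accuracy does not clearly exceed this baseline has probably not learned anything useful yet, which is common after only two epochs on 500 samples.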

#### **Production Training**
1. **Use full datasets**: Scale up sample sizes for production models
2. **Multi-symbol training**: Train on multiple cryptocurrency pairs
3. **Extended timeframes**: Include longer-term patterns
4. **Comprehensive validation**: Use walk-forward and out-of-sample testing

### **10. Troubleshooting**

#### **Common Issues**
- **Memory errors**: Reduce batch size or sample count
- **Data loading failures**: Check internet connection and API access
- **Feature mismatches**: Verify all timeframes have consistent data
- **TensorBoard not updating**: Confirm that `--logdir` points at `runs`, then restart TensorBoard after training starts

#### **Debug Commands**
```bash
# Check training status
python monitor_training.py

# Validate data availability
python -c "from core.data_provider import DataProvider; dp = DataProvider(['ETH/USDT']); print(dp.get_historical_data('ETH/USDT', '1m').shape)"

# Test feature generation
python -c "from core.data_provider import DataProvider; dp = DataProvider(['ETH/USDT']); print(dp.get_feature_matrix('ETH/USDT', ['1m', '5m', '1h'], 20).shape)"
```

---

**🔥 All CNN training and testing uses REAL market data from cryptocurrency exchanges. No synthetic or simulated data is used anywhere in the system.**