gogo2/CNN_TESTING_GUIDE.md
Dobromir Popov 310f3c5bf9 wip
2025-05-24 09:59:11 +03:00

CNN Testing & Backtest Guide

📊 CNN Test Cases and Training Data Location

1. Test Scripts

Quick CNN Test (test_cnn_only.py)

  • Purpose: Fast CNN validation with real market data
  • Location: /test_cnn_only.py
  • Test Configuration:
    • Symbols: ['ETH/USDT']
    • Timeframes: ['1m', '5m', '1h']
    • Samples: 500 (for quick testing)
    • Epochs: 2
    • Batch size: 16
  • Data Source: Real Binance API data only
  • Output: test_models/quick_cnn.pt

Comprehensive Training Test (test_training.py)

  • Purpose: Full training pipeline validation
  • Location: /test_training.py
  • Functions:
    • test_cnn_training() - Complete CNN training test
    • test_rl_training() - RL training validation
  • Output: test_models/test_cnn.pt

2. Test Model Storage

Directory: /test_models/

  • quick_cnn.pt (586KB) - Latest quick test model
  • quick_cnn_best.pt (587KB) - Best performing quick test model
  • regular_save.pt (384MB) - Full-size training model
  • robust_save.pt (17KB) - Optimized lightweight model
  • Backup models - Automatic backups, saved with a .backup extension
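
To confirm that a test run actually produced these files, a short check along the following lines can help. This is a minimal sketch using only the standard library. The helper name list_model_files is hypothetical and not part of the repository, and the default directory assumes the script runs from the repository root.

```python
from pathlib import Path


def list_model_files(model_dir: str = "test_models") -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for every .pt file in model_dir.

    Hypothetical helper for illustration; not part of the repository.
    """
    directory = Path(model_dir)
    if not directory.is_dir():
        return []  # Directory missing: no test has written a model yet.
    return sorted((p.name, p.stat().st_size) for p in directory.glob("*.pt"))


if __name__ == "__main__":
    for name, size in list_model_files():
        print(f"{name}: {size / 1024:.0f} KiB")
```

An empty result after running test_cnn_only.py would suggest the model was saved elsewhere or the run failed before saving.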

3. Training Data Sources

Real Market Data (Primary)

  • Exchange: Binance API
  • Symbols: ETH/USDT, BTC/USDT, etc.
  • Timeframes: 1s, 1m, 5m, 15m, 1h, 4h, 1d
  • Features: 48 per candle (raw OHLCV values plus technical indicators calculated from them)
  • Storage: Cached in /cache/ directory
  • Format: JSON files with tick-by-tick and aggregated candle data

Feature Matrix Structure

# Multi-timeframe feature matrix: (timeframes, window_size, features)
# Example: feature_matrix.shape == (4, 20, 48)  -> 4 timeframes, 20 steps, 48 features

# The 48 features (raw OHLCV columns plus derived indicators):
features = [
    'ad_line', 'adx', 'adx_neg', 'adx_pos', 'atr',
    'bb_lower', 'bb_middle', 'bb_percent', 'bb_upper', 'bb_width',
    'close', 'ema_12', 'ema_26', 'ema_50', 'high',
    'keltner_lower', 'keltner_middle', 'keltner_upper', 'low',
    'macd', 'macd_histogram', 'macd_signal', 'mfi', 'momentum_composite',
    'obv', 'open', 'price_position', 'psar', 'roc',
    'rsi_14', 'rsi_21', 'rsi_7', 'sma_10', 'sma_20', 'sma_50',
    'stoch_d', 'stoch_k', 'trend_strength', 'true_range', 'ultimate_osc',
    'volatility_regime', 'volume', 'volume_sma_10', 'volume_sma_20',
    'volume_sma_50', 'vpt', 'vwap', 'williams_r'
]
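
The shape above can be illustrated with a small stacking routine. The following is a sketch in plain Python, not the project's get_feature_matrix implementation: the function name stack_feature_windows and its input layout (a dict mapping each timeframe to a list of feature rows, oldest first) are assumptions made for the example.

```python
def stack_feature_windows(rows_by_timeframe, timeframes, window_size):
    """Stack the last window_size rows of each timeframe.

    Returns a nested list shaped (len(timeframes), window_size, n_features).
    Hypothetical helper; the input layout is an assumption for this sketch.
    """
    matrix = []
    n_features = None
    for tf in timeframes:
        rows = rows_by_timeframe.get(tf, [])
        if len(rows) < window_size:
            raise ValueError(f"{tf}: need {window_size} rows, got {len(rows)}")
        window = [list(row) for row in rows[-window_size:]]
        for row in window:
            if n_features is None:
                n_features = len(row)  # First row fixes the feature count.
            elif len(row) != n_features:
                raise ValueError(f"{tf}: inconsistent feature count")
        matrix.append(window)
    return matrix


def shape(matrix):
    """Shape of a rectangular three-level nested list."""
    return (len(matrix), len(matrix[0]), len(matrix[0][0]))


if __name__ == "__main__":
    # Placeholder values only: 30 rows of 48 zeros per timeframe.
    timeframes = ["1m", "5m", "1h", "1d"]
    data = {tf: [[0.0] * 48 for _ in range(30)] for tf in timeframes}
    print(shape(stack_feature_windows(data, timeframes, 20)))
```

Run as a script, this prints (4, 20, 48). Passing three timeframes instead of four, as the quick test does, gives a first dimension of 3.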

4. Test Case Categories

Unit Tests

  • Quick validation: 500 samples, 2 epochs
  • Performance benchmarks: Speed and accuracy metrics
  • Memory usage: Resource consumption monitoring

Integration Tests

  • Full pipeline: Data loading → Feature engineering → Training → Evaluation
  • Multi-symbol: Testing across different cryptocurrency pairs
  • Multi-timeframe: Validation across various time horizons

Backtesting

  • Historical performance: Using past market data for validation
  • Walk-forward testing: Retraining on a progressively expanding window of past data and testing on the period that immediately follows
  • Out-of-sample validation: Testing on unseen data periods
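
Walk-forward testing with an expanding training window can be sketched as a simple index generator. This is an illustrative sketch, not code from the repository: the function name walk_forward_splits and its parameters are hypothetical, and samples are assumed to be ordered oldest first.

```python
from typing import Iterator


def walk_forward_splits(n_samples: int, initial_train: int,
                        test_size: int) -> Iterator[tuple[range, range]]:
    """Yield (train_indices, test_indices) for expanding-window walk-forward.

    Each fold trains on all samples before the test block, so the test
    block is always out-of-sample relative to its training data.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    train_end = initial_train
    while train_end + test_size <= n_samples:
        yield range(0, train_end), range(train_end, train_end + test_size)
        train_end += test_size  # Expand the training window by one test block.


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(500, 300, 50):
        print(f"train [0, {train_idx.stop}) -> test [{test_idx.start}, {test_idx.stop})")
```

With 500 samples, an initial window of 300, and 50-sample test blocks, this yields four folds, the last one testing on samples 450 to 499.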

5. VSCode Launch Configurations

Quick CNN Test

{
    "name": "Quick CNN Test (Real Data + TensorBoard)",
    "program": "test_cnn_only.py",
    "env": {"PYTHONUNBUFFERED": "1"}
}

Realtime RL Training with Monitoring

{
    "name": "Realtime RL Training + TensorBoard + Web UI",
    "program": "train_realtime_with_tensorboard.py",
    "args": ["--episodes", "50", "--symbol", "ETH/USDT", "--web-port", "8051"]
}

6. Test Execution Commands

Quick CNN Test

# Run quick CNN validation
python test_cnn_only.py

# Monitor training progress
tensorboard --logdir=runs

# Example output (values vary from run to run):
# ✅ CNN Training completed!
#   Best accuracy: 0.4600
#   Total epochs: 2
#   Training time: 0.61s
#   TensorBoard logs: runs/cnn_training_1748043814

Comprehensive Training Test

# Run full training pipeline test
python test_training.py

# Monitor multiple training modes
tensorboard --logdir=runs

7. Test Data Validation

Real Market Data Policy

  • No Synthetic Data: All training uses authentic exchange data
  • Live API: Direct connection to Binance for real-time prices
  • Multi-timeframe: Consistent data across all time horizons
  • Technical Indicators: Calculated from real OHLCV values

Data Quality Checks

  • Completeness: Verifying all required timeframes have data
  • Consistency: Cross-timeframe data alignment validation
  • Freshness: Ensuring recent market data availability
  • Feature integrity: Validating that all 48 features are present
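
Some of these checks can be expressed in a few lines. The sketch below covers completeness, freshness, and feature integrity, and leaves out cross-timeframe alignment. It is not the project's validation code: the function name check_data_quality and the row layout (dicts with a 'timestamp' in Unix seconds plus feature values) are assumptions made for the example.

```python
import math
import time


def check_data_quality(rows_by_timeframe, required_timeframes,
                       expected_features, max_age_seconds, now=None):
    """Return a list of problems; an empty list means all checks passed.

    rows_by_timeframe: dict of timeframe -> list of dict rows, oldest first.
    Hypothetical helper; the row layout is an assumption for this sketch.
    """
    now = time.time() if now is None else now
    problems = []
    for tf in required_timeframes:
        rows = rows_by_timeframe.get(tf)
        if not rows:
            problems.append(f"{tf}: no data")  # Completeness
            continue
        latest = rows[-1]
        age = now - latest["timestamp"]
        if age > max_age_seconds:
            problems.append(f"{tf}: latest row is {age:.0f}s old")  # Freshness
        for name in expected_features:  # Feature integrity
            value = latest.get(name)
            if value is None:
                problems.append(f"{tf}: missing feature {name}")
            elif isinstance(value, float) and math.isnan(value):
                problems.append(f"{tf}: feature {name} is NaN")
    return problems
```

A single max_age_seconds is a simplification: in practice the acceptable age would likely scale with the timeframe, since the latest 1h candle is naturally older than the latest 1m candle.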

8. TensorBoard Monitoring

CNN Training Metrics

  • Training/Loss - Neural network training loss
  • Training/Accuracy - Model prediction accuracy
  • Validation/Loss - Validation dataset loss
  • Validation/Accuracy - Out-of-sample accuracy
  • Best/ValidationAccuracy - Best model performance
  • Data/InputShape - Feature matrix dimensions
  • Model/TotalParams - Neural network parameters

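Access URLs

  • TensorBoard: http://localhost:6006 (TensorBoard's default port when run locally)
  • Web UI: http://localhost:8051 (when run locally with the --web-port 8051 argument shown in section 5)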

9. Best Practices

Quick Testing

  1. Start small: Use test_cnn_only.py for fast validation
  2. Monitor metrics: Keep TensorBoard open during training
  3. Check outputs: Verify model files are created in test_models/
  4. Validate accuracy: Compare the reported best accuracy against earlier runs; a 2-epoch run on 500 samples mainly confirms that the pipeline works end to end

Production Training

  1. Use full datasets: Scale up sample sizes for production models
  2. Multi-symbol training: Train on multiple cryptocurrency pairs
  3. Extended timeframes: Include longer-term patterns
  4. Comprehensive validation: Use walk-forward and out-of-sample testing

10. Troubleshooting

Common Issues

  • Memory errors: Reduce batch size or sample count
  • Data loading failures: Check internet connection and API access
  • Feature mismatches: Verify all timeframes have consistent data
  • TensorBoard not updating: Restart TensorBoard once training has started writing logs to runs/

Debug Commands

# Check training status
python monitor_training.py

# Validate data availability
python -c "from core.data_provider import DataProvider; dp = DataProvider(['ETH/USDT']); print(dp.get_historical_data('ETH/USDT', '1m').shape)"

# Test feature generation
python -c "from core.data_provider import DataProvider; dp = DataProvider(['ETH/USDT']); print(dp.get_feature_matrix('ETH/USDT', ['1m', '5m', '1h'], 20).shape)"

🔥 All CNN training and testing use REAL market data from cryptocurrency exchanges. No synthetic or simulated data is used anywhere in the system.