gogo2/MULTI_HORIZON_TRAINING_SYSTEM.md
Dobromir Popov 608da8233f main cleanup
2025-09-30 23:56:36 +03:00

Multi-Horizon Training System Documentation

Overview

The Multi-Horizon Training System addresses the core issues with your current training approach:

Problems with Current System

  1. Immediate Training: Training runs within a couple of seconds of a trade closing, often before any meaningful price movement has occurred
  2. No Profit Potential: Small timeframes don't provide enough movement for profitable trades
  3. Reactive Training: Models learn from very short-term outcomes rather than longer-term patterns
  4. Limited Prediction Horizons: Only predicts short timeframes that may not capture meaningful market moves

New System Benefits

  1. Multi-Timeframe Predictions: Predicts 1m, 5m, 15m, and 60m horizons every minute
  2. Deferred Training: Stores predictions and trains models when outcomes are actually known
  3. Min/Max Price Prediction: Focuses on predicting price ranges over longer periods for better profit potential
  4. Backtesting Capability: Can validate system performance on historical data
  5. Scalable Storage: Efficiently stores model inputs for future training

System Components

1. MultiHorizonPredictionManager (core/multi_horizon_prediction_manager.py)

  • Generates predictions for 1, 5, 15, and 60-minute horizons every minute
  • Uses ensemble approach combining CNN, RL, and technical analysis
  • Stores prediction snapshots with full model inputs for future training

Key Features:

  • Real-time prediction generation
  • Confidence-based filtering
  • Automatic validation when target times are reached

2. PredictionSnapshotStorage (core/prediction_snapshot_storage.py)

  • Efficiently stores prediction snapshots to disk
  • SQLite metadata database alongside compressed snapshot files
  • Batch retrieval for training
  • Automatic cleanup of old data

Storage Structure:

  • Compressed pickle files for snapshot data
  • SQLite database for fast metadata queries
  • Organized by symbol and prediction horizon
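
The layout above can be sketched roughly as follows. This is a minimal illustration, not the actual PredictionSnapshotStorage implementation: the class name SnapshotStore, the table schema, the gzip choice, and the directory naming are all assumptions made for the example.

```python
import gzip
import pickle
import sqlite3
from pathlib import Path


class SnapshotStore:
    """Hypothetical store: compressed pickles on disk, metadata in SQLite."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "metadata.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS snapshots ("
            "id TEXT PRIMARY KEY, symbol TEXT, horizon_min INTEGER, "
            "target_ts REAL, path TEXT)"
        )

    def save(self, snap_id, symbol, horizon_min, target_ts, payload):
        # Organize files by symbol and horizon, e.g. ETH_USDT/60m/<id>.pkl.gz
        folder = self.root / symbol.replace("/", "_") / f"{horizon_min}m"
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{snap_id}.pkl.gz"
        with gzip.open(path, "wb") as fh:
            pickle.dump(payload, fh)
        self.db.execute(
            "INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?, ?, ?)",
            (snap_id, symbol, horizon_min, target_ts, str(path)),
        )
        self.db.commit()

    def load_due(self, now_ts, horizon_min):
        """Return payloads for one horizon whose target time has passed."""
        rows = self.db.execute(
            "SELECT path FROM snapshots "
            "WHERE target_ts <= ? AND horizon_min = ? ORDER BY target_ts",
            (now_ts, horizon_min),
        ).fetchall()
        payloads = []
        for (path,) in rows:
            with gzip.open(path, "rb") as fh:
                payloads.append(pickle.load(fh))
        return payloads
```

Keeping only metadata in SQLite means the "which snapshots are ready" query never has to decompress a file; the pickles are opened only for rows that match.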

3. MultiHorizonTrainer (core/multi_horizon_trainer.py)

  • Trains models when prediction outcomes are known
  • Handles both CNN and RL model training
  • Uses stored snapshots to recreate training scenarios

Training Process:

  • Validates pending predictions against actual price data
  • Trains models using historical prediction accuracy
  • Supports batch training for efficiency
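
A rough sketch of this deferred flow is shown below. The function name collect_training_batches, the dictionary keys, and the lookup_range callback are hypothetical; the real MultiHorizonTrainer may organize this differently.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed keys: created_ts, target_ts, predicted_min, predicted_max
Prediction = Dict[str, float]
RangeLookup = Callable[[float, float], Optional[Tuple[float, float]]]


def collect_training_batches(
    pending: List[Prediction],
    now_ts: float,
    lookup_range: RangeLookup,
    batch_size: int = 32,
) -> Tuple[List[List[dict]], List[Prediction]]:
    """Split pending predictions into resolved training batches and the rest."""
    samples: List[dict] = []
    still_pending: List[Prediction] = []
    for pred in pending:
        if pred["target_ts"] > now_ts:
            still_pending.append(pred)  # outcome not known yet
            continue
        actual = lookup_range(pred["created_ts"], pred["target_ts"])
        if actual is None:
            still_pending.append(pred)  # price data not available yet
            continue
        # Attach the observed range so a trainer can compare it to the prediction
        samples.append({**pred, "actual_min": actual[0], "actual_max": actual[1]})
    batches = [
        samples[i : i + batch_size] for i in range(0, len(samples), batch_size)
    ]
    return batches, still_pending
```

Predictions whose outcome cannot be resolved yet are returned unchanged, so the caller can retry them on the next pass.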

4. MultiHorizonBacktester (core/multi_horizon_backtester.py)

  • Backtests prediction accuracy on historical data
  • Validates system performance before deployment
  • Provides detailed accuracy and profitability analysis

Backtesting Features:

  • Historical data simulation
  • Accuracy metrics by prediction horizon
  • Profitability analysis
  • Performance reporting

5. Enhanced DataProvider (core/data_provider.py)

  • Added get_price_range_over_period() method
  • Supports min/max price queries over specific time ranges
  • Better integration with backtesting framework
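
The signature of get_price_range_over_period() is not shown here, so the following is only a sketch of what a min/max range query involves. The Candle type and the standalone price_range_over_period function are hypothetical stand-ins.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence, Tuple


class Candle(NamedTuple):
    ts: datetime
    high: float
    low: float


def price_range_over_period(
    candles: Sequence[Candle], start: datetime, end: datetime
) -> Optional[Tuple[float, float]]:
    """Return (min low, max high) over candles with start <= ts <= end."""
    window = [c for c in candles if start <= c.ts <= end]
    if not window:
        return None  # no data for the requested period
    return min(c.low for c in window), max(c.high for c in window)


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 12, 0)
    bars = [Candle(t0 + timedelta(minutes=i), 100.0 + i, 99.0 + i) for i in range(5)]
    print(price_range_over_period(bars, t0, t0 + timedelta(minutes=4)))
```

Returning None for an empty window lets callers distinguish "no data yet" from a real range, which matters when validating predictions whose target time has only just passed.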

Usage Examples

Running the System

# Run demonstration
python run_multi_horizon_training.py --mode demo

# Run backtest on 7 days of data
python run_multi_horizon_training.py --mode backtest --symbol ETH/USDT --days 7

# Force training session
python run_multi_horizon_training.py --mode train --horizon 60

# Run system for 5 minutes
python run_multi_horizon_training.py --mode run --runtime 300

Integration with Existing Code

from core.multi_horizon_prediction_manager import MultiHorizonPredictionManager
from core.prediction_snapshot_storage import PredictionSnapshotStorage
from core.multi_horizon_trainer import MultiHorizonTrainer

# Initialize components (your_orchestrator is a placeholder for your existing orchestrator instance)
prediction_manager = MultiHorizonPredictionManager(orchestrator=your_orchestrator)
snapshot_storage = PredictionSnapshotStorage()
trainer = MultiHorizonTrainer(orchestrator=your_orchestrator, snapshot_storage=snapshot_storage)

# Start the system
prediction_manager.start()
trainer.start()

# Get system status
status = prediction_manager.get_prediction_stats()
training_stats = trainer.get_training_stats()

Prediction Horizons

The system generates predictions for four horizons:

  • 1 minute: Very short-term predictions for scalping
  • 5 minutes: Short-term momentum predictions
  • 15 minutes: Medium-term trend predictions
  • 60 minutes: Long-term range predictions (focus area for meaningful moves)

Each prediction includes:

  • Predicted minimum price
  • Predicted maximum price
  • Confidence score
  • Model inputs for training
  • Market state snapshot
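
As an illustration, the fields listed above could be grouped in a record like the one below. The class name PredictionSnapshot and all field names are assumptions for this sketch; the actual snapshot type may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict


@dataclass
class PredictionSnapshot:
    """Hypothetical record for one prediction at one horizon."""

    symbol: str
    horizon_minutes: int
    created_at: datetime
    predicted_min: float
    predicted_max: float
    confidence: float
    model_inputs: Dict[str, Any] = field(default_factory=dict)
    market_state: Dict[str, Any] = field(default_factory=dict)

    @property
    def target_time(self) -> datetime:
        # The moment at which the outcome of this prediction becomes known
        return self.created_at + timedelta(minutes=self.horizon_minutes)

    def is_due(self, now: datetime) -> bool:
        """True once the target time has been reached."""
        return now >= self.target_time
```

Storing the model inputs with the prediction is what allows training to be deferred: the trainer can rebuild the exact scenario once the target time has passed.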

Training Strategy

When Training Occurs

  • Predictions are generated every minute
  • Models are trained when each prediction's target time is reached (1, 5, 15, or 60 minutes later, depending on the horizon)
  • Training uses the full context available at prediction time
  • Rewards are based on how closely the predicted min/max range matches the actual price range over the horizon
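
The exact reward formula is not specified here. One plausible choice, shown purely as an assumption, is the overlap between the predicted and actual price ranges divided by their combined span (an interval intersection-over-union), which yields a score between 0 and 1.

```python
def range_overlap_reward(
    pred_min: float, pred_max: float, actual_min: float, actual_max: float
) -> float:
    """Interval overlap score in [0, 1]; an assumed reward, not the system's own."""
    if pred_min > pred_max or actual_min > actual_max:
        raise ValueError("each range must satisfy min <= max")
    overlap = max(0.0, min(pred_max, actual_max) - max(pred_min, actual_min))
    span = max(pred_max, actual_max) - min(pred_min, actual_min)
    if span == 0:
        return 1.0  # both ranges collapse to the same single price
    return overlap / span


if __name__ == "__main__":
    # Predicted 100-110, actual 105-115: overlap 5 over a combined span of 15
    print(range_overlap_reward(100.0, 110.0, 105.0, 115.0))
```

A score of this shape penalizes ranges that are too wide as well as ranges that miss, so a model cannot earn a high reward by always predicting a very broad band.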

Model Types Supported

  1. CNN Models: Trained on feature sequences to predict price ranges
  2. RL Models: Trained with reinforcement learning on prediction outcomes
  3. Ensemble: Combines multiple model predictions for better accuracy

Backtesting and Validation

Backtesting Process

  1. Load historical 1-minute data
  2. Simulate predictions at regular intervals
  3. Advance to each target time in the historical data and check actual outcomes
  4. Calculate accuracy and profitability metrics
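
The four steps above can be sketched as a simple walk-forward loop. The function backtest_horizon, the (high, low, close) bar layout, and the "contained" hit criterion are illustrative assumptions, not the MultiHorizonBacktester API.

```python
from typing import Callable, List, Sequence, Tuple

Bar = Tuple[float, float, float]  # (high, low, close) for one 1-minute bar
Predictor = Callable[[Sequence[Bar]], Tuple[float, float]]


def backtest_horizon(
    bars: Sequence[Bar], horizon: int, predict: Predictor, step: int = 1
) -> List[dict]:
    """Simulate predictions every `step` bars and score them `horizon` bars later."""
    results = []
    for i in range(0, len(bars) - horizon, step):
        # The predictor only sees bars up to and including index i (no lookahead)
        pred_min, pred_max = predict(bars[: i + 1])
        future = bars[i + 1 : i + 1 + horizon]
        actual_min = min(b[1] for b in future)
        actual_max = max(b[0] for b in future)
        results.append(
            {
                "index": i,
                "pred_min": pred_min,
                "pred_max": pred_max,
                "actual_min": actual_min,
                "actual_max": actual_max,
                "contained": pred_min <= actual_min and actual_max <= pred_max,
            }
        )
    return results
```

Passing only the bars up to index i into the predictor is the key design point: it keeps future data out of the simulated prediction, which is the most common source of inflated backtest results.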

Key Metrics

  • Range Accuracy: How well predicted min/max ranges match actual ranges
  • Confidence Correlation: How confidence scores relate to prediction accuracy
  • Profitability: Simulated trading performance based on predictions
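
As a sketch of how the first two metrics might be computed, the helpers below average an accuracy score per horizon and measure the Pearson correlation between confidence and accuracy. Both function names are hypothetical, and Pearson correlation is one assumed way to express "confidence correlation".

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def accuracy_by_horizon(results: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average accuracy per horizon from (horizon_minutes, accuracy) pairs."""
    grouped = defaultdict(list)
    for horizon, accuracy in results:
        grouped[horizon].append(accuracy)
    return {h: sum(v) / len(v) for h, v in sorted(grouped.items())}


def confidence_correlation(
    confidences: Sequence[float], accuracies: Sequence[float]
) -> float:
    """Pearson correlation between confidence scores and realized accuracy."""
    n = len(confidences)
    if n != len(accuracies) or n < 2:
        raise ValueError("need two equally sized sequences with at least 2 items")
    mean_c = sum(confidences) / n
    mean_a = sum(accuracies) / n
    cov = sum((c - mean_c) * (a - mean_a) for c, a in zip(confidences, accuracies))
    var_c = sum((c - mean_c) ** 2 for c in confidences)
    var_a = sum((a - mean_a) ** 2 for a in accuracies)
    if var_c == 0 or var_a == 0:
        return 0.0  # correlation is undefined for constant input; report 0
    return cov / math.sqrt(var_c * var_a)
```

A correlation near zero would suggest the confidence score carries little information, in which case filtering on it adds little value.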

Performance Analysis

Expected Improvements

  1. Better Profit Potential: 60-minute predictions allow for meaningful price moves
  2. More Stable Training: Training occurs on known outcomes, not immediate reactions
  3. Reduced Overfitting: Training across multiple horizons is expected to reduce overfitting to short-term noise
  4. Backtesting Validation: Historical testing helps verify robustness before deployment

Monitoring

The system provides comprehensive monitoring:

  • Prediction generation rates
  • Training session statistics
  • Model accuracy by horizon
  • Storage utilization
  • System health metrics

Configuration

Key Parameters

# Prediction horizons (minutes)
horizons = [1, 5, 15, 60]

# Prediction frequency
prediction_interval_seconds = 60

# Minimum confidence for storage
min_confidence_threshold = 0.3

# Training batch size
batch_size = 32

# Storage retention
max_age_days = 30
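
One way to keep these parameters together and validated is a small frozen dataclass, sketched below. The class MultiHorizonConfig and the helper should_store are hypothetical, and the sketch assumes confidence scores lie in [0, 1] and that the threshold is inclusive.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MultiHorizonConfig:
    """Hypothetical container for the parameters listed above."""

    horizons: Tuple[int, ...] = (1, 5, 15, 60)  # minutes
    prediction_interval_seconds: int = 60
    min_confidence_threshold: float = 0.3
    batch_size: int = 32
    max_age_days: int = 30

    def __post_init__(self) -> None:
        if not self.horizons or any(h <= 0 for h in self.horizons):
            raise ValueError("horizons must be a non-empty set of positive minutes")
        if not 0.0 <= self.min_confidence_threshold <= 1.0:
            raise ValueError("min_confidence_threshold must be within [0, 1]")
        if self.prediction_interval_seconds <= 0 or self.batch_size <= 0:
            raise ValueError("interval and batch size must be positive")
        if self.max_age_days <= 0:
            raise ValueError("max_age_days must be positive")


def should_store(confidence: float, config: MultiHorizonConfig) -> bool:
    """Keep a prediction only if it meets the minimum confidence (assumed inclusive)."""
    return confidence >= config.min_confidence_threshold
```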

File Locations

  • Prediction snapshots: data/prediction_snapshots/
  • Backtest results: reports/
  • Cache data: cache/

Integration with Existing Dashboard

The system is designed to integrate with your existing dashboard:

  1. Real-time Monitoring: Dashboard can display prediction generation stats
  2. Training Progress: Show training session results
  3. Backtest Reports: Display historical performance analysis
  4. Model Comparison: Compare old vs new training approaches

Migration Path

Gradual Adoption

  1. Run in Parallel: Run new system alongside existing training
  2. Compare Performance: Use backtesting to compare approaches
  3. Gradual Transition: Move models to new training system incrementally
  4. Fallback Support: Keep old system as backup during transition

Data Compatibility

  • New system stores snapshots independently
  • Existing model weights can be used as starting points
  • Training data format is compatible with existing models

Troubleshooting

Common Issues

  1. Low Prediction Accuracy: Check confidence thresholds and feature quality
  2. Storage Issues: Monitor disk space and clean up old snapshots
  3. Training Performance: Adjust batch sizes and learning rates
  4. Memory Usage: Use appropriate cache sizes for your hardware

Logging

All components use structured logging with consistent log levels:

  • INFO: Normal operations and results
  • WARNING: Potential issues that don't stop operation
  • ERROR: Serious problems requiring attention
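
A minimal setup along these lines, using Python's standard logging module, might look like the sketch below. The logger name "multi_horizon" and the format string are assumptions; the components may configure logging differently.

```python
import logging


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Return a shared logger with one stream handler and a consistent format."""
    logger = logging.getLogger("multi_horizon")  # assumed logger name
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("Generated predictions for 4 horizons")
    log.warning("Price data missing for one validation window")
    log.error("Snapshot storage is not writable")
```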

Future Enhancements

Planned Features

  1. Advanced Ensemble Methods: More sophisticated model combination
  2. Adaptive Horizons: Dynamic horizon selection based on market conditions
  3. Cross-Symbol Training: Train models using data from multiple symbols
  4. Real-time Validation: Immediate feedback on prediction quality
  5. Performance Optimization: GPU acceleration and distributed training

Research Directions

  1. Optimal Horizon Selection: Which horizons provide best risk-adjusted returns
  2. Market Regime Detection: Adjust predictions based on market conditions
  3. Feature Engineering: Better input features for price range prediction
  4. Uncertainty Quantification: Better confidence score calibration

Conclusion

The Multi-Horizon Training System addresses your core concerns by:

  1. Extending Prediction Horizons: From seconds to 60 minutes for meaningful profit potential
  2. Deferred Training: Models learn from actual outcomes, not immediate reactions
  3. Comprehensive Storage: Full model inputs preserved for future training
  4. Backtesting Validation: Historical testing checks system effectiveness before deployment
  5. Scalable Architecture: Efficient storage and training for long-term operation

This system is intended to improve your trading performance by focusing on longer-term price movements with greater profit potential, while maintaining rigorous training and validation processes.