Multi-Horizon Training System Documentation
Overview
The Multi-Horizon Training System addresses the core issues with your current training approach:
Problems with Current System
- Immediate Training: Training happens within a couple of seconds of a trade closing, often before any meaningful price movement
- No Profit Potential: Very short timeframes rarely provide enough price movement for profitable trades
- Reactive Training: Models learn from very short-term outcomes rather than longer-term patterns
- Limited Prediction Horizons: Only predicts short timeframes that may not capture meaningful market moves
New System Benefits
- Multi-Timeframe Predictions: Predicts 1m, 5m, 15m, and 60m horizons every minute
- Deferred Training: Stores predictions and trains models when outcomes are actually known
- Min/Max Price Prediction: Focuses on predicting price ranges over longer periods for better profit potential
- Backtesting Capability: Can validate system performance on historical data
- Scalable Storage: Efficiently stores model inputs for future training
System Components
1. MultiHorizonPredictionManager (core/multi_horizon_prediction_manager.py)
- Generates predictions for 1, 5, 15, and 60-minute horizons every minute
- Uses ensemble approach combining CNN, RL, and technical analysis
- Stores prediction snapshots with full model inputs for future training
Key Features:
- Real-time prediction generation
- Confidence-based filtering
- Automatic validation when target times are reached
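The automatic-validation step can be pictured as a queue of pending predictions ordered by target time. The following is a minimal sketch of that idea, not the manager's actual implementation; `PendingQueue`, `add`, and `pop_due` are hypothetical names introduced only for illustration.

```python
import heapq
from datetime import datetime, timedelta


class PendingQueue:
    """Min-heap of (target_time, sequence, prediction_id). Hypothetical helper."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so entries with equal target times stay ordered

    def add(self, prediction_id, created_at, horizon_minutes):
        target = created_at + timedelta(minutes=horizon_minutes)
        heapq.heappush(self._heap, (target, self._seq, prediction_id))
        self._seq += 1

    def pop_due(self, now):
        """Return ids of all predictions whose target time has been reached."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


t0 = datetime(2024, 1, 1, 12, 0)
queue = PendingQueue()
for horizon in (1, 5, 15, 60):
    queue.add(f"ETH-{horizon}m", t0, horizon)

# Five minutes later, only the 1m and 5m predictions can be validated.
due_after_5m = queue.pop_due(t0 + timedelta(minutes=5))
```

A heap keeps the earliest target time at the front, so each check only touches predictions that are actually due.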
2. PredictionSnapshotStorage (core/prediction_snapshot_storage.py)
- Efficiently stores prediction snapshots to disk
- SQLite metadata database alongside compressed snapshot files
- Batch retrieval for training
- Automatic cleanup of old data
Storage Structure:
- Compressed pickle files for snapshot data
- SQLite database for fast metadata queries
- Organized by symbol and prediction horizon
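As a rough illustration of this layout, the sketch below writes a snapshot as a gzip-compressed pickle and indexes it in SQLite. The table schema, directory naming, and the functions `save_snapshot` and `load_batch` are assumptions made for the example; the real PredictionSnapshotStorage may differ.

```python
import gzip
import pickle
import sqlite3
import tempfile
from pathlib import Path


def save_snapshot(root, db, snapshot_id, symbol, horizon, payload):
    """Write the payload as a compressed pickle and record its metadata."""
    # Assumed layout: <root>/<symbol>/<horizon>m/<id>.pkl.gz
    path = Path(root) / symbol.replace("/", "_") / f"{horizon}m" / f"{snapshot_id}.pkl.gz"
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wb") as fh:
        pickle.dump(payload, fh)
    db.execute(
        "INSERT INTO snapshots (id, symbol, horizon, path) VALUES (?, ?, ?, ?)",
        (snapshot_id, symbol, horizon, str(path)),
    )
    db.commit()


def load_batch(db, symbol, horizon, limit=32):
    """Look up paths through the metadata table, then load the payloads."""
    rows = db.execute(
        "SELECT path FROM snapshots WHERE symbol = ? AND horizon = ? ORDER BY id LIMIT ?",
        (symbol, horizon, limit),
    ).fetchall()
    batch = []
    for (path,) in rows:
        with gzip.open(path, "rb") as fh:
            batch.append(pickle.load(fh))
    return batch


root = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE snapshots (id TEXT PRIMARY KEY, symbol TEXT, horizon INTEGER, path TEXT)"
)
save_snapshot(root, db, "s1", "ETH/USDT", 60, {"predicted_min": 1990.0, "predicted_max": 2020.0})
batch = load_batch(db, "ETH/USDT", 60)
```

Keeping metadata in SQLite means batch queries by symbol and horizon never have to open the snapshot files themselves.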
3. MultiHorizonTrainer (core/multi_horizon_trainer.py)
- Trains models when prediction outcomes are known
- Handles both CNN and RL model training
- Uses stored snapshots to recreate training scenarios
Training Process:
- Validates pending predictions against actual price data
- Trains models using historical prediction accuracy
- Supports batch training for efficiency
4. MultiHorizonBacktester (core/multi_horizon_backtester.py)
- Backtests prediction accuracy on historical data
- Validates system performance before deployment
- Provides detailed accuracy and profitability analysis
Backtesting Features:
- Historical data simulation
- Accuracy metrics by prediction horizon
- Profitability analysis
- Performance reporting
5. Enhanced DataProvider (core/data_provider.py)
- Added get_price_range_over_period() method
- Supports min/max price queries over specific time ranges
- Better integration with backtesting framework
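The kind of query this method answers can be sketched as follows. This is a standalone illustration and not the DataProvider method itself: the function name `price_range_over_period`, its arguments, and the candle dictionary format are all assumptions.

```python
from datetime import datetime, timedelta


def price_range_over_period(candles, start, end):
    """Return (min_low, max_high) for candles with start <= timestamp <= end.

    Returns None when no candle falls inside the window.
    """
    window = [c for c in candles if start <= c["timestamp"] <= end]
    if not window:
        return None
    return min(c["low"] for c in window), max(c["high"] for c in window)


t0 = datetime(2024, 1, 1, 12, 0)
candles = [
    {"timestamp": t0 + timedelta(minutes=i), "low": 2000.0 - i, "high": 2010.0 + 2 * i}
    for i in range(5)
]
price_range = price_range_over_period(
    candles, t0 + timedelta(minutes=1), t0 + timedelta(minutes=3)
)
```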
Usage Examples
Running the System
# Run demonstration
python run_multi_horizon_training.py --mode demo
# Run backtest on 7 days of data
python run_multi_horizon_training.py --mode backtest --symbol ETH/USDT --days 7
# Force training session
python run_multi_horizon_training.py --mode train --horizon 60
# Run system for 5 minutes
python run_multi_horizon_training.py --mode run --runtime 300
Integration with Existing Code
from core.multi_horizon_prediction_manager import MultiHorizonPredictionManager
from core.prediction_snapshot_storage import PredictionSnapshotStorage
from core.multi_horizon_trainer import MultiHorizonTrainer
# Initialize components
prediction_manager = MultiHorizonPredictionManager(orchestrator=your_orchestrator)
snapshot_storage = PredictionSnapshotStorage()
trainer = MultiHorizonTrainer(orchestrator=your_orchestrator, snapshot_storage=snapshot_storage)
# Start the system
prediction_manager.start()
trainer.start()
# Get system status
status = prediction_manager.get_prediction_stats()
training_stats = trainer.get_training_stats()
Prediction Horizons
The system generates predictions for four horizons:
- 1 minute: Very short-term predictions for scalping
- 5 minutes: Short-term momentum predictions
- 15 minutes: Medium-term trend predictions
- 60 minutes: Long-term range predictions (focus area for meaningful moves)
Each prediction includes:
- Predicted minimum price
- Predicted maximum price
- Confidence score
- Model inputs for training
- Market state snapshot
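One way to picture a single prediction record is as a small dataclass. The field names below mirror the list above but are assumptions; the actual snapshot class may be structured differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict


@dataclass
class PredictionSnapshot:
    """Hypothetical record for one prediction at one horizon."""

    symbol: str
    horizon_minutes: int
    created_at: datetime
    predicted_min: float
    predicted_max: float
    confidence: float
    model_inputs: Dict[str, Any] = field(default_factory=dict)
    market_state: Dict[str, Any] = field(default_factory=dict)

    @property
    def target_time(self) -> datetime:
        """Time at which the outcome is known and training can happen."""
        return self.created_at + timedelta(minutes=self.horizon_minutes)


snapshot = PredictionSnapshot(
    symbol="ETH/USDT",
    horizon_minutes=60,
    created_at=datetime(2024, 1, 1, 12, 0),
    predicted_min=1990.0,
    predicted_max=2020.0,
    confidence=0.72,
    model_inputs={"features": [0.1, 0.2]},
    market_state={"last_price": 2004.5},
)
```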
Training Strategy
When Training Occurs
- Predictions are generated every minute
- Models are trained when prediction target times are reached (1-60 minutes later)
- Training uses the full context available at prediction time
- Rewards are based on how closely the actual price range matches the predicted price range
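The document does not specify the reward formula. As one possible scheme, assumed here purely for illustration, the reward could be the overlap between the predicted and actual price intervals (intersection over union); `range_reward` is a hypothetical name.

```python
def range_reward(pred_min, pred_max, actual_min, actual_max):
    """Intersection-over-union of predicted and actual price intervals, in [0, 1]."""
    if pred_max < pred_min or actual_max < actual_min:
        raise ValueError("min must not exceed max")
    intersection = max(0.0, min(pred_max, actual_max) - max(pred_min, actual_min))
    union = max(pred_max, actual_max) - min(pred_min, actual_min)
    if union == 0.0:
        # Both intervals collapse to the same single price.
        return 1.0
    return intersection / union


# Predicted 1990-2020, actual 2000-2030: 20 of 40 price units overlap.
partial = range_reward(1990.0, 2020.0, 2000.0, 2030.0)
# Disjoint intervals earn no reward.
miss = range_reward(100.0, 110.0, 120.0, 130.0)
```

An overlap-based reward penalizes ranges that are too wide as well as ranges that miss, which a simple "was the price inside the range" check would not.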
Model Types Supported
- CNN Models: Trained on feature sequences to predict price ranges
- RL Models: Trained with reinforcement learning on prediction outcomes
- Ensemble: Combines multiple model predictions for better accuracy
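How the ensemble combines its members is not described here. A confidence-weighted average is one simple option, sketched below under that assumption; `combine_predictions` is a hypothetical helper.

```python
def combine_predictions(predictions):
    """Confidence-weighted average of (pred_min, pred_max, confidence) triples."""
    total = sum(conf for _, _, conf in predictions)
    if total <= 0:
        raise ValueError("at least one prediction needs positive confidence")
    pred_min = sum(lo * conf for lo, _, conf in predictions) / total
    pred_max = sum(hi * conf for _, hi, conf in predictions) / total
    return pred_min, pred_max


combined = combine_predictions([
    (1990.0, 2020.0, 0.5),   # e.g. CNN output
    (2000.0, 2040.0, 0.25),  # e.g. RL output
    (1980.0, 2000.0, 0.25),  # e.g. technical analysis output
])
```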
Backtesting and Validation
Backtesting Process
- Load historical 1-minute data
- Simulate predictions at regular intervals
- Advance to each prediction's target time and check actual outcomes
- Calculate accuracy and profitability metrics
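The steps above can be sketched as a walk-forward loop. This is a simplified illustration rather than the MultiHorizonBacktester itself: `backtest`, `naive_predictor`, and the candle format are assumptions, and the predictor is deliberately trivial.

```python
def backtest(candles, horizon, predict, step=1):
    """Predict at each step and score against the next `horizon` candles.

    `predict` is any callable mapping the candle history to (pred_min, pred_max).
    """
    results = []
    for i in range(1, len(candles) - horizon + 1, step):
        history = candles[:i]            # only data available at prediction time
        future = candles[i:i + horizon]  # outcome window, never shown to the predictor
        pred_min, pred_max = predict(history)
        actual_min = min(c["low"] for c in future)
        actual_max = max(c["high"] for c in future)
        results.append({
            "index": i,
            "contained": pred_min <= actual_min and actual_max <= pred_max,
        })
    return results


def naive_predictor(history):
    """Assumes the last close +/- 1% bounds the next window (illustrative only)."""
    close = history[-1]["close"]
    return close * 0.99, close * 1.01


# Ten synthetic 1-minute candles in a flat market.
candles = [{"low": 99.5, "high": 100.5, "close": 100.0} for _ in range(10)]
results = backtest(candles, horizon=5, predict=naive_predictor)
hit_rate = sum(r["contained"] for r in results) / len(results)
```

Slicing the history before calling the predictor is what keeps future prices out of the simulated prediction.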
Key Metrics
- Range Accuracy: How well predicted min/max ranges match actual ranges
- Confidence Correlation: How confidence scores relate to prediction accuracy
- Profitability: Simulated trading performance based on predictions
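A sketch of how per-horizon accuracy and confidence correlation might be aggregated is shown below. It assumes each validated prediction already carries an `accuracy` score in [0, 1]; the record format and the functions `pearson` and `summarize` are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def summarize(records):
    """Group validated predictions by horizon and compute summary metrics."""
    by_horizon = defaultdict(list)
    for record in records:
        by_horizon[record["horizon"]].append(record)
    summary = {}
    for horizon, rows in sorted(by_horizon.items()):
        accuracy = [r["accuracy"] for r in rows]
        confidence = [r["confidence"] for r in rows]
        summary[horizon] = {
            "mean_accuracy": sum(accuracy) / len(accuracy),
            "confidence_correlation": pearson(confidence, accuracy),
        }
    return summary


records = [
    {"horizon": 60, "confidence": 0.2, "accuracy": 0.1},
    {"horizon": 60, "confidence": 0.5, "accuracy": 0.4},
    {"horizon": 60, "confidence": 0.8, "accuracy": 0.7},
]
summary = summarize(records)
```

A correlation near 1 would suggest that confidence scores are informative; a value near 0 would suggest they need recalibration.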
Performance Analysis
Expected Improvements
- Better Profit Potential: 60-minute predictions allow for meaningful price moves
- More Stable Training: Training occurs on known outcomes, not immediate reactions
- Reduced Overfitting: Multi-horizon approach should reduce overfitting to short-term noise
- Backtesting Validation: Historical testing helps verify system robustness before deployment
Monitoring
The system provides comprehensive monitoring:
- Prediction generation rates
- Training session statistics
- Model accuracy by horizon
- Storage utilization
- System health metrics
Configuration
Key Parameters
# Prediction horizons (minutes)
horizons = [1, 5, 15, 60]
# Prediction frequency
prediction_interval_seconds = 60
# Minimum confidence for storage
min_confidence_threshold = 0.3
# Training batch size
batch_size = 32
# Storage retention
max_age_days = 30
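To show how these parameters might be used together, the sketch below bundles them in a dataclass. The class name `MultiHorizonConfig`, the helper methods, and the inclusive confidence comparison are assumptions, not the system's actual configuration API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class MultiHorizonConfig:
    """Hypothetical container for the key parameters listed above."""

    horizons: Tuple[int, ...] = (1, 5, 15, 60)
    prediction_interval_seconds: int = 60
    min_confidence_threshold: float = 0.3
    batch_size: int = 32
    max_age_days: int = 30

    def should_store(self, confidence: float) -> bool:
        # Assumes the threshold is inclusive.
        return confidence >= self.min_confidence_threshold

    def cleanup_cutoff(self, now: datetime) -> datetime:
        """Snapshots created before this time are eligible for cleanup."""
        return now - timedelta(days=self.max_age_days)


config = MultiHorizonConfig()
cutoff = config.cleanup_cutoff(datetime(2024, 1, 31))
```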
File Locations
- Prediction snapshots: data/prediction_snapshots/
- Backtest results: reports/
- Cache data: cache/
Integration with Existing Dashboard
The system is designed to integrate with your existing dashboard:
- Real-time Monitoring: Dashboard can display prediction generation stats
- Training Progress: Show training session results
- Backtest Reports: Display historical performance analysis
- Model Comparison: Compare old vs new training approaches
Migration Path
Gradual Adoption
- Run in Parallel: Run new system alongside existing training
- Compare Performance: Use backtesting to compare approaches
- Gradual Transition: Move models to new training system incrementally
- Fallback Support: Keep old system as backup during transition
Data Compatibility
- New system stores snapshots independently
- Existing model weights can be used as starting points
- Training data format is compatible with existing models
Troubleshooting
Common Issues
- Low Prediction Accuracy: Check confidence thresholds and feature quality
- Storage Issues: Monitor disk space and clean up old snapshots
- Training Performance: Adjust batch sizes and learning rates
- Memory Usage: Use appropriate cache sizes for your hardware
Logging
All components use structured logging with consistent log levels:
- INFO: Normal operations and results
- WARNING: Potential issues that don't stop operation
- ERROR: Serious problems requiring attention
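A minimal sketch of these levels with Python's standard logging module follows. The logger name and the message contents are assumptions chosen for illustration.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("multi_horizon.trainer")  # hypothetical logger name
logger.setLevel(logging.INFO)

# INFO: normal operations and results
logger.info("Training session finished: horizon=%dm samples=%d", 60, 32)
# WARNING: potential issues that don't stop operation
logger.warning("Skipped %d snapshots below confidence threshold", 4)
# ERROR: serious problems requiring attention
logger.error("Failed to load snapshot %s", "s1")
```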
Future Enhancements
Planned Features
- Advanced Ensemble Methods: More sophisticated model combination
- Adaptive Horizons: Dynamic horizon selection based on market conditions
- Cross-Symbol Training: Train models using data from multiple symbols
- Real-time Validation: Immediate feedback on prediction quality
- Performance Optimization: GPU acceleration and distributed training
Research Directions
- Optimal Horizon Selection: Which horizons provide best risk-adjusted returns
- Market Regime Detection: Adjust predictions based on market conditions
- Feature Engineering: Better input features for price range prediction
- Uncertainty Quantification: Better confidence score calibration
Conclusion
The Multi-Horizon Training System addresses your core concerns by:
- Extending Prediction Horizons: From seconds to 60 minutes for meaningful profit potential
- Deferred Training: Models learn from actual outcomes, not immediate reactions
- Comprehensive Storage: Full model inputs preserved for future training
- Backtesting Validation: Historical testing helps verify system effectiveness
- Scalable Architecture: Efficient storage and training for long-term operation
This system is intended to improve your trading performance by focusing on longer-term price movements with more profit potential, while maintaining rigorous training and validation processes.