# Multi-Horizon Training System Documentation

## Overview

The Multi-Horizon Training System addresses the core issues with your current training approach:

### Problems with Current System
1. **Immediate Training**: Training runs within a couple of seconds of a trade closing, often before any meaningful price movement
2. **No Profit Potential**: Very short timeframes rarely provide enough price movement for profitable trades
3. **Reactive Training**: Models learn from very short-term outcomes rather than longer-term patterns
4. **Limited Prediction Horizons**: The system only predicts short timeframes, which may not capture meaningful market moves

### New System Benefits
1. **Multi-Timeframe Predictions**: Predicts 1m, 5m, 15m, and 60m horizons every minute
2. **Deferred Training**: Stores predictions and trains models when outcomes are actually known
3. **Min/Max Price Prediction**: Focuses on predicting price ranges over longer periods for better profit potential
4. **Backtesting Capability**: Can validate system performance on historical data
5. **Scalable Storage**: Efficiently stores model inputs for future training

## System Components

### 1. MultiHorizonPredictionManager (`core/multi_horizon_prediction_manager.py`)
- Generates predictions for 1-, 5-, 15-, and 60-minute horizons every minute
- Uses an ensemble approach combining CNN, RL, and technical analysis
- Stores prediction snapshots with full model inputs for future training

**Key Features:**
- Real-time prediction generation
- Confidence-based filtering
- Automatic validation when target times are reached

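As a rough illustration of the ensemble and confidence-filtering ideas above, the sketch below combines per-model range predictions with a confidence-weighted average. The `ModelPrediction` type, the `combine_predictions` function, and the weighting rule are assumptions made for this example; the actual combination logic lives in `MultiHorizonPredictionManager` and may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelPrediction:
    """Hypothetical per-model output; not the project's actual type."""
    source: str           # e.g. "cnn", "rl", "technical"
    predicted_min: float
    predicted_max: float
    confidence: float     # assumed to lie in [0, 1]


def combine_predictions(
    predictions: List[ModelPrediction],
    min_confidence: float = 0.3,
) -> Optional[ModelPrediction]:
    """Confidence-weighted average of the predictions that pass the filter."""
    kept = [p for p in predictions if p.confidence >= min_confidence]
    total = sum(p.confidence for p in kept)
    if not kept or total <= 0:
        return None  # nothing confident enough to combine or store
    return ModelPrediction(
        source="ensemble",
        predicted_min=sum(p.predicted_min * p.confidence for p in kept) / total,
        predicted_max=sum(p.predicted_max * p.confidence for p in kept) / total,
        confidence=total / len(kept),  # mean confidence of the kept models
    )
```

Returning `None` when no model clears the threshold keeps low-confidence predictions out of storage, which matches the role of `min_confidence_threshold` in the configuration section.
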
### 2. PredictionSnapshotStorage (`core/prediction_snapshot_storage.py`)
- Efficiently stores prediction snapshots to disk
- Keeps snapshot metadata in a SQLite database and compresses the snapshot data
- Batch retrieval for training
- Automatic cleanup of old data

**Storage Structure:**
- Compressed pickle files for snapshot data
- SQLite database for fast metadata queries
- Organized by symbol and prediction horizon

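The storage layout described above could be sketched roughly as follows. This is a minimal stand-in built on the standard library (`gzip`, `pickle`, `sqlite3`); the class name `SnapshotStore`, its method names, the table schema, and the directory layout are all assumptions for illustration and are not the interface of `PredictionSnapshotStorage`.

```python
import gzip
import pickle
import sqlite3
import time
from pathlib import Path


class SnapshotStore:
    """Illustrative stand-in for a snapshot store (all names are assumptions)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "metadata.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS snapshots ("
            "id TEXT PRIMARY KEY, symbol TEXT, horizon INTEGER, "
            "created REAL, path TEXT)"
        )

    def save(self, snap_id, symbol, horizon, payload, created=None):
        # Files are grouped by symbol and horizon, e.g. ETH_USDT/60/<id>.pkl.gz
        folder = self.root / symbol.replace("/", "_") / str(horizon)
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{snap_id}.pkl.gz"
        with gzip.open(path, "wb") as fh:
            pickle.dump(payload, fh)
        created = time.time() if created is None else created
        self.db.execute(
            "INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?, ?, ?)",
            (snap_id, symbol, horizon, created, str(path)),
        )
        self.db.commit()

    def load_batch(self, symbol, horizon, limit=32):
        # Metadata query first, then decompress only the matching files
        rows = self.db.execute(
            "SELECT path FROM snapshots WHERE symbol = ? AND horizon = ? "
            "ORDER BY created LIMIT ?",
            (symbol, horizon, limit),
        ).fetchall()
        batch = []
        for (path,) in rows:
            with gzip.open(path, "rb") as fh:
                batch.append(pickle.load(fh))
        return batch

    def cleanup(self, max_age_days=30, now=None):
        # Remove files and metadata rows older than the retention window
        now = time.time() if now is None else now
        cutoff = now - max_age_days * 86400
        rows = self.db.execute(
            "SELECT id, path FROM snapshots WHERE created < ?", (cutoff,)
        ).fetchall()
        for snap_id, path in rows:
            Path(path).unlink(missing_ok=True)
            self.db.execute("DELETE FROM snapshots WHERE id = ?", (snap_id,))
        self.db.commit()
        return len(rows)
```

Keeping metadata in SQLite means batch selection never has to open the compressed files. Note that `pickle` should only be used to load files the system itself wrote, since unpickling untrusted data can execute arbitrary code.
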
### 3. MultiHorizonTrainer (`core/multi_horizon_trainer.py`)
- Trains models when prediction outcomes are known
- Handles both CNN and RL model training
- Uses stored snapshots to recreate training scenarios

**Training Process:**
- Validates pending predictions against actual price data
- Trains models using historical prediction accuracy
- Supports batch training for efficiency

### 4. MultiHorizonBacktester (`core/multi_horizon_backtester.py`)
- Backtests prediction accuracy on historical data
- Validates system performance before deployment
- Provides detailed accuracy and profitability analysis

**Backtesting Features:**
- Historical data simulation
- Accuracy metrics by prediction horizon
- Profitability analysis
- Performance reporting

### 5. Enhanced DataProvider (`core/data_provider.py`)
- Adds a `get_price_range_over_period()` method
- Supports min/max price queries over specific time ranges
- Integrates more closely with the backtesting framework

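The signature of `get_price_range_over_period()` is not shown in this document, so the sketch below only illustrates the underlying idea of a min/max query over a time window. The `Candle` type and the `price_range_over_period` function are hypothetical names chosen for the example.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Tuple


class Candle(NamedTuple):
    """Hypothetical 1-minute candle; only the fields needed here."""
    timestamp: datetime
    high: float
    low: float


def price_range_over_period(
    candles: List[Candle], start: datetime, end: datetime
) -> Optional[Tuple[float, float]]:
    """Return (min_low, max_high) over candles with start <= timestamp <= end."""
    window = [c for c in candles if start <= c.timestamp <= end]
    if not window:
        return None  # no data in the requested period
    return min(c.low for c in window), max(c.high for c in window)


# Example: five candles, query the middle three
t0 = datetime(2024, 1, 1, 12, 0)
candles = [
    Candle(t0 + timedelta(minutes=i), high=100.0 + i, low=99.0 - i)
    for i in range(5)
]
observed = price_range_over_period(
    candles, t0 + timedelta(minutes=1), t0 + timedelta(minutes=3)
)
```

The result of a query like this is what a stored prediction's min/max would be compared against once its target time has passed.
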
## Usage Examples

### Running the System

```bash
# Run demonstration
python run_multi_horizon_training.py --mode demo

# Run backtest on 7 days of data
python run_multi_horizon_training.py --mode backtest --symbol ETH/USDT --days 7

# Force training session
python run_multi_horizon_training.py --mode train --horizon 60

# Run system for 5 minutes
python run_multi_horizon_training.py --mode run --runtime 300
```

### Integration with Existing Code

```python
from core.multi_horizon_prediction_manager import MultiHorizonPredictionManager
from core.prediction_snapshot_storage import PredictionSnapshotStorage
from core.multi_horizon_trainer import MultiHorizonTrainer

# Initialize components
# `your_orchestrator` is a placeholder for your existing orchestrator instance
prediction_manager = MultiHorizonPredictionManager(orchestrator=your_orchestrator)
snapshot_storage = PredictionSnapshotStorage()
trainer = MultiHorizonTrainer(orchestrator=your_orchestrator, snapshot_storage=snapshot_storage)

# Start the system
prediction_manager.start()
trainer.start()

# Get system status
status = prediction_manager.get_prediction_stats()
training_stats = trainer.get_training_stats()
```

## Prediction Horizons

The system generates predictions for four horizons:

- **1 minute**: Very short-term predictions for scalping
- **5 minutes**: Short-term momentum predictions
- **15 minutes**: Medium-term trend predictions
- **60 minutes**: Long-term range predictions (focus area for meaningful moves)

Each prediction includes:
- Predicted minimum price
- Predicted maximum price
- Confidence score
- Model inputs for training
- Market state snapshot

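One way to picture the fields listed above is as a small record type. The dataclass below is a sketch only: the class name, field names, and the `target_time` and `is_due` helpers are assumptions for illustration, not the snapshot format the system actually stores.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict


@dataclass
class PredictionSnapshot:
    """Hypothetical record for one prediction at one horizon."""
    symbol: str
    horizon_minutes: int
    prediction_time: datetime
    predicted_min: float
    predicted_max: float
    confidence: float
    model_inputs: Dict[str, Any] = field(default_factory=dict)
    market_state: Dict[str, Any] = field(default_factory=dict)

    @property
    def target_time(self) -> datetime:
        # The moment at which the outcome of this prediction becomes known
        return self.prediction_time + timedelta(minutes=self.horizon_minutes)

    def is_due(self, now: datetime) -> bool:
        # True once the horizon has elapsed and the prediction can be validated
        return now >= self.target_time


snap = PredictionSnapshot(
    symbol="ETH/USDT",
    horizon_minutes=60,
    prediction_time=datetime(2024, 1, 1, 12, 0),
    predicted_min=2290.0,
    predicted_max=2335.0,
    confidence=0.55,
)
```

Storing the model inputs alongside the prediction is what lets training later reuse the exact context that was available at prediction time.
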
## Training Strategy

### When Training Occurs
- Predictions are generated every minute
- Models are trained when prediction target times are reached (1 to 60 minutes later, depending on the horizon)
- Training uses the full context available at prediction time
- Rewards are based on how well the actual prices fit the predicted price range

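The deferred-training rule above can be sketched as a small resolution step: predictions whose target time has not arrived stay pending, and the rest are scored against the actual range. The record keys, the `resolve_due` function, and in particular the `range_reward` formula are assumptions for this example; the document does not specify the exact reward the trainer uses.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pending-prediction record with keys:
# "target_ts", "predicted_min", "predicted_max"
Pending = Dict[str, float]


def range_reward(pred_min: float, pred_max: float,
                 actual_min: float, actual_max: float) -> float:
    """1.0 for a perfect range, falling linearly with the error in both bounds.

    Errors are normalised by the actual range width (an illustrative choice).
    """
    width = max(actual_max - actual_min, 1e-9)  # avoid division by zero
    error = (abs(pred_min - actual_min) + abs(pred_max - actual_max)) / width
    return max(0.0, 1.0 - error)


def resolve_due(
    pending: List[Pending],
    now_ts: float,
    actual_range: Callable[[float], Tuple[float, float]],
) -> Tuple[List[Tuple[Pending, float]], List[Pending]]:
    """Split predictions into (resolved with reward, still waiting)."""
    resolved, waiting = [], []
    for p in pending:
        if p["target_ts"] > now_ts:
            waiting.append(p)  # outcome not known yet, so no training signal
            continue
        actual_min, actual_max = actual_range(p["target_ts"])
        reward = range_reward(
            p["predicted_min"], p["predicted_max"], actual_min, actual_max
        )
        resolved.append((p, reward))
    return resolved, waiting
```

Here `actual_range` stands in for whatever lookup supplies the observed min/max for a resolved prediction, for example a query against the data provider.
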
### Model Types Supported
1. **CNN Models**: Trained on feature sequences to predict price ranges
2. **RL Models**: Trained with reinforcement learning on prediction outcomes
3. **Ensemble**: Combines multiple model predictions for better accuracy

## Backtesting and Validation

### Backtesting Process
1. Load historical 1-minute data
2. Simulate predictions at regular intervals
3. Look up the actual outcome at each prediction's target time
4. Calculate accuracy and profitability metrics

### Key Metrics
- **Range Accuracy**: How well predicted min/max ranges match actual ranges
- **Confidence Correlation**: How confidence scores relate to prediction accuracy
- **Profitability**: Simulated trading performance based on predictions

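The backtesting steps and the range-accuracy metric can be illustrated with a compact walk-forward loop. Everything here is a sketch under assumptions: bars are plain `(low, high)` tuples, range accuracy is measured as interval intersection-over-union (one reasonable definition, not necessarily the one `MultiHorizonBacktester` uses), and `naive_predict` is a placeholder model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Bar = Tuple[float, float]  # (low, high) of one 1-minute candle, with low <= high


def range_iou(pred: Bar, actual: Bar) -> float:
    """Intersection-over-union of two price intervals, in [0, 1]."""
    inter = min(pred[1], actual[1]) - max(pred[0], actual[0])
    union = max(pred[1], actual[1]) - min(pred[0], actual[0])
    if union <= 0:
        return 1.0  # both intervals collapse to the same single price
    return max(0.0, inter) / union


def backtest(
    bars: Sequence[Bar],
    predict: Callable[[Sequence[Bar], int], Bar],
    horizons: Sequence[int] = (1, 5, 15, 60),
    lookback: int = 60,
) -> Dict[int, float]:
    """Mean range IoU per horizon over a walk-forward pass."""
    scores: Dict[int, List[float]] = {h: [] for h in horizons}
    for t in range(lookback, len(bars)):
        history = bars[t - lookback:t]  # only data before the prediction time
        for h in horizons:
            future = bars[t:t + h]
            if len(future) < h:
                continue  # outcome lies beyond the end of the data
            actual = (min(b[0] for b in future), max(b[1] for b in future))
            scores[h].append(range_iou(predict(history, h), actual))
    return {h: sum(v) / len(v) for h, v in scores.items() if v}


def naive_predict(history: Sequence[Bar], horizon: int) -> Bar:
    """Placeholder model: assume the range of the last `horizon` bars repeats."""
    recent = history[-horizon:]
    return (min(b[0] for b in recent), max(b[1] for b in recent))
```

Slicing `history` strictly before index `t` is the important detail: it keeps future prices out of the model inputs, which is the main way a backtest can silently overstate accuracy.
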
## Performance Analysis

### Expected Improvements
1. **Better Profit Potential**: 60-minute predictions allow for meaningful price moves
2. **More Stable Training**: Training occurs on known outcomes, not immediate reactions
3. **Reduced Overfitting**: Training across multiple horizons is expected to reduce overfitting to short-term noise
4. **Backtesting Validation**: Historical testing helps verify system robustness before deployment

### Monitoring
The system provides comprehensive monitoring:
- Prediction generation rates
- Training session statistics
- Model accuracy by horizon
- Storage utilization
- System health metrics

## Configuration

### Key Parameters
```python
# Prediction horizons (minutes)
horizons = [1, 5, 15, 60]

# Prediction frequency
prediction_interval_seconds = 60

# Minimum confidence for storage
min_confidence_threshold = 0.3

# Training batch size
batch_size = 32

# Storage retention
max_age_days = 30
```

### File Locations
- Prediction snapshots: `data/prediction_snapshots/`
- Backtest results: `reports/`
- Cache data: `cache/`

## Integration with Existing Dashboard

The system is designed to integrate with your existing dashboard:

1. **Real-time Monitoring**: Dashboard can display prediction generation stats
2. **Training Progress**: Show training session results
3. **Backtest Reports**: Display historical performance analysis
4. **Model Comparison**: Compare old vs new training approaches

## Migration Path

### Gradual Adoption
1. **Run in Parallel**: Run new system alongside existing training
2. **Compare Performance**: Use backtesting to compare approaches
3. **Gradual Transition**: Move models to new training system incrementally
4. **Fallback Support**: Keep old system as backup during transition

### Data Compatibility
- New system stores snapshots independently
- Existing model weights can be used as starting points
- Training data format is compatible with existing models

## Troubleshooting

### Common Issues
1. **Low Prediction Accuracy**: Check confidence thresholds and feature quality
2. **Storage Issues**: Monitor disk space and clean up old snapshots
3. **Training Performance**: Adjust batch sizes and learning rates
4. **Memory Usage**: Use appropriate cache sizes for your hardware

### Logging
All components use structured logging with consistent log levels:
- `INFO`: Normal operations and results
- `WARNING`: Potential issues that don't stop operation
- `ERROR`: Serious problems requiring attention

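For reference, a per-component logger following the level conventions above might be set up as in the sketch below. It uses only the standard `logging` module; the helper name `get_component_logger`, the logger name, and the message format are assumptions rather than the project's actual logging configuration.

```python
import logging


def get_component_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger with one stream handler and a consistent format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


log = get_component_logger("multi_horizon.trainer")  # hypothetical logger name
log.info("Training session finished")           # normal operations and results
log.warning("Snapshot missing model inputs")    # issue that does not stop operation
log.error("Could not open metadata database")   # serious problem needing attention
```
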
## Future Enhancements

### Planned Features
1. **Advanced Ensemble Methods**: More sophisticated model combination
2. **Adaptive Horizons**: Dynamic horizon selection based on market conditions
3. **Cross-Symbol Training**: Train models using data from multiple symbols
4. **Real-time Validation**: Immediate feedback on prediction quality
5. **Performance Optimization**: GPU acceleration and distributed training

### Research Directions
1. **Optimal Horizon Selection**: Which horizons provide best risk-adjusted returns
2. **Market Regime Detection**: Adjust predictions based on market conditions
3. **Feature Engineering**: Better input features for price range prediction
4. **Uncertainty Quantification**: Better confidence score calibration

## Conclusion

The Multi-Horizon Training System addresses your core concerns by:

1. **Extending Prediction Horizons**: From seconds to 60 minutes for meaningful profit potential
2. **Deferred Training**: Models learn from actual outcomes, not immediate reactions
3. **Comprehensive Storage**: Full model inputs preserved for future training
4. **Backtesting Validation**: Historical testing helps verify system effectiveness
5. **Scalable Architecture**: Efficient storage and training for long-term operation

This system is intended to improve your trading performance by focusing on longer-term price movements with more profit potential, while maintaining rigorous training and validation processes.