gogo2/MULTI_HORIZON_TRAINING_SYSTEM.md

# Multi-Horizon Training System Documentation

## Overview

The Multi-Horizon Training System addresses the core issues with your current training approach:

### Problems with Current System
1. **Immediate Training**: Training runs within a couple of seconds of a trade closing, often before any meaningful price movement has occurred
2. **Low Profit Potential**: Very short timeframes rarely provide enough price movement for profitable trades
3. **Reactive Training**: Models learn from very short-term outcomes rather than longer-term patterns
4. **Limited Prediction Horizons**: The system only predicts short timeframes, which may not capture meaningful market moves

### New System Benefits
1. **Multi-Timeframe Predictions**: Predicts 1m, 5m, 15m, and 60m horizons every minute
2. **Deferred Training**: Stores predictions and trains models when outcomes are actually known
3. **Min/Max Price Prediction**: Focuses on predicting price ranges over longer periods for better profit potential
4. **Backtesting Capability**: Can validate system performance on historical data
5. **Scalable Storage**: Efficiently stores model inputs for future training

## System Components

### 1. MultiHorizonPredictionManager (`core/multi_horizon_prediction_manager.py`)
- Generates predictions for 1-, 5-, 15-, and 60-minute horizons every minute
- Uses ensemble approach combining CNN, RL, and technical analysis
- Stores prediction snapshots with full model inputs for future training

**Key Features:**
- Real-time prediction generation
- Confidence-based filtering
- Automatic validation when target times are reached

### 2. PredictionSnapshotStorage (`core/prediction_snapshot_storage.py`)
- Efficiently stores prediction snapshots to disk
- Keeps metadata in a SQLite database and snapshot payloads in compressed files
- Batch retrieval for training
- Automatic cleanup of old data

**Storage Structure:**
- Compressed pickle files for snapshot data
- SQLite database for fast metadata queries
- Organized by symbol and prediction horizon
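The layout above could be sketched roughly as follows. This is a minimal illustration, not the actual `PredictionSnapshotStorage` API: the class name `SnapshotStoreSketch`, the `snapshots` table schema, the use of gzip, and the file naming are all assumptions made for the example.

```python
import gzip
import pickle
import sqlite3
from pathlib import Path


class SnapshotStoreSketch:
    """Hypothetical store: gzip-pickled payloads on disk, metadata in SQLite."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "metadata.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS snapshots ("
            "id TEXT PRIMARY KEY, symbol TEXT, horizon_minutes INTEGER, "
            "target_ts REAL, path TEXT)"
        )

    def save(self, snapshot_id, symbol, horizon_minutes, target_ts, payload):
        # Files are grouped by symbol and horizon, e.g. ETH_USDT/60/<id>.pkl.gz
        folder = self.root / symbol.replace("/", "_") / str(horizon_minutes)
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{snapshot_id}.pkl.gz"
        with gzip.open(path, "wb") as fh:
            pickle.dump(payload, fh)
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?, ?, ?)",
                (snapshot_id, symbol, horizon_minutes, target_ts, str(path)),
            )

    def load_batch(self, symbol, horizon_minutes, limit=32):
        # The metadata query picks the files; only those payloads are read.
        rows = self.db.execute(
            "SELECT path FROM snapshots "
            "WHERE symbol = ? AND horizon_minutes = ? "
            "ORDER BY target_ts LIMIT ?",
            (symbol, horizon_minutes, limit),
        ).fetchall()
        batch = []
        for (path,) in rows:
            with gzip.open(path, "rb") as fh:
                batch.append(pickle.load(fh))
        return batch
```

Keeping the metadata in SQLite means a training batch can be selected by symbol and horizon without opening any snapshot file. Pickle files should only be loaded from locations you trust, since unpickling can execute arbitrary code.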

### 3. MultiHorizonTrainer (`core/multi_horizon_trainer.py`)
- Trains models when prediction outcomes are known
- Handles both CNN and RL model training
- Uses stored snapshots to recreate training scenarios

**Training Process:**
- Validates pending predictions against actual price data
- Trains models on the realized outcomes of past predictions
- Supports batch training for efficiency
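One way to picture the deferred step is a queue of pending predictions ordered by target time, from which due items are released for validation and training. The sketch below is only an assumption about how this could work; `PendingPredictionQueue` and its methods are hypothetical names, not part of `MultiHorizonTrainer`.

```python
import heapq
from datetime import datetime, timedelta
from typing import Any, List


class PendingPredictionQueue:
    """Hypothetical min-heap of predictions ordered by their target time."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = 0  # tie-breaker so payloads are never compared

    def add(self, created_at: datetime, horizon_minutes: int, payload: Any) -> None:
        target_time = created_at + timedelta(minutes=horizon_minutes)
        heapq.heappush(self._heap, (target_time, self._counter, payload))
        self._counter += 1

    def pop_due(self, now: datetime) -> List[Any]:
        """Remove and return every payload whose target time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, payload = heapq.heappop(self._heap)
            due.append(payload)
        return due

    def __len__(self) -> int:
        return len(self._heap)


# Usage: three predictions made at the same moment for different horizons.
t0 = datetime(2024, 1, 1, 12, 0)
queue = PendingPredictionQueue()
for horizon in (60, 1, 5):
    queue.add(t0, horizon, f"{horizon}m")
ready = queue.pop_due(t0 + timedelta(minutes=5))  # 1m and 5m are due, 60m is not
```

A periodic loop could call `pop_due()` with the current time, look up the actual price range for each released prediction, and pass the results to training in batches.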

### 4. MultiHorizonBacktester (`core/multi_horizon_backtester.py`)
- Backtests prediction accuracy on historical data
- Validates system performance before deployment
- Provides detailed accuracy and profitability analysis

**Backtesting Features:**
- Historical data simulation
- Accuracy metrics by prediction horizon
- Profitability analysis
- Performance reporting

### 5. Enhanced DataProvider (`core/data_provider.py`)
- Added `get_price_range_over_period()` method
- Supports min/max price queries over specific time ranges
- Better integration with backtesting framework
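The document names `get_price_range_over_period()` but does not give its signature. As a rough illustration of the underlying query, here is a standalone function over in-memory candles; the function name, the parameters, and the `(timestamp, high, low)` candle layout are assumptions for this sketch.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical candle layout: (timestamp, high, low)
Candle = Tuple[datetime, float, float]


def price_range_over_period(
    candles: Iterable[Candle], start: datetime, end: datetime
) -> Optional[Tuple[float, float]]:
    """Return (min_low, max_high) for candles with start <= timestamp <= end."""
    window = [(high, low) for ts, high, low in candles if start <= ts <= end]
    if not window:
        return None  # no data in the requested range
    return min(low for _, low in window), max(high for high, _ in window)


# Usage: five synthetic 1-minute candles, queried over minutes 1 to 3.
t0 = datetime(2024, 1, 1)
sample = [(t0 + timedelta(minutes=i), 100.0 + i, 99.0 - i) for i in range(5)]
low, high = price_range_over_period(
    sample, t0 + timedelta(minutes=1), t0 + timedelta(minutes=3)
)
```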

## Usage Examples

### Running the System

```bash
# Run demonstration
python run_multi_horizon_training.py --mode demo

# Run backtest on 7 days of data
python run_multi_horizon_training.py --mode backtest --symbol ETH/USDT --days 7

# Force training session
python run_multi_horizon_training.py --mode train --horizon 60

# Run system for 5 minutes
python run_multi_horizon_training.py --mode run --runtime 300
```

### Integration with Existing Code

```python
from core.multi_horizon_prediction_manager import MultiHorizonPredictionManager
from core.prediction_snapshot_storage import PredictionSnapshotStorage
from core.multi_horizon_trainer import MultiHorizonTrainer

# Initialize components
prediction_manager = MultiHorizonPredictionManager(orchestrator=your_orchestrator)
snapshot_storage = PredictionSnapshotStorage()
trainer = MultiHorizonTrainer(orchestrator=your_orchestrator, snapshot_storage=snapshot_storage)

# Start the system
prediction_manager.start()
trainer.start()

# Get system status
status = prediction_manager.get_prediction_stats()
training_stats = trainer.get_training_stats()
```

## Prediction Horizons

The system generates predictions for four horizons:

- **1 minute**: Very short-term predictions for scalping
- **5 minutes**: Short-term momentum predictions
- **15 minutes**: Medium-term trend predictions
- **60 minutes**: Long-term range predictions (focus area for meaningful moves)

Each prediction includes:
- Predicted minimum price
- Predicted maximum price
- Confidence score
- Model inputs for training
- Market state snapshot
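The fields listed above could be grouped into a single record along these lines. The class name, field names, and the assumption that confidence lies in `[0, 1]` are illustrative only; the real snapshot type in `core/multi_horizon_prediction_manager.py` may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict


@dataclass
class PredictionSnapshotSketch:
    """Hypothetical container for one prediction at one horizon."""

    symbol: str
    created_at: datetime
    horizon_minutes: int
    predicted_min: float
    predicted_max: float
    confidence: float  # assumed to lie in [0, 1]
    model_inputs: Dict[str, Any] = field(default_factory=dict)
    market_state: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.predicted_min > self.predicted_max:
            raise ValueError("predicted_min must not exceed predicted_max")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    @property
    def target_time(self) -> datetime:
        """Moment at which the outcome of this prediction becomes known."""
        return self.created_at + timedelta(minutes=self.horizon_minutes)


# Usage: a 15-minute prediction made at 12:00 matures at 12:15.
snapshot = PredictionSnapshotSketch(
    symbol="ETH/USDT",
    created_at=datetime(2024, 1, 1, 12, 0),
    horizon_minutes=15,
    predicted_min=2300.0,
    predicted_max=2320.0,
    confidence=0.6,
)
```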

## Training Strategy

### When Training Occurs
- Predictions are generated every minute
- Models are trained when prediction target times are reached (1-60 minutes later)
- Training uses the full context available at prediction time
- Rewards are based on how closely the actual price range matches the predicted range
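The exact reward formula is not specified here. One plausible choice, shown purely as an assumption, is the intersection-over-union of the predicted and actual price intervals, which yields 1.0 for a perfect match and 0.0 when the ranges do not overlap. The function name `range_overlap_reward` is hypothetical.

```python
def range_overlap_reward(
    pred_min: float, pred_max: float, actual_min: float, actual_max: float
) -> float:
    """Intersection-over-union of two price intervals, in [0, 1]."""
    if pred_min > pred_max or actual_min > actual_max:
        raise ValueError("each range must satisfy min <= max")
    intersection = max(0.0, min(pred_max, actual_max) - max(pred_min, actual_min))
    union = max(pred_max, actual_max) - min(pred_min, actual_min)
    if union == 0.0:
        # Both ranges collapse to the same single price: a perfect match.
        return 1.0
    return intersection / union


# Usage: predicted 100 to 110, actual 105 to 115.
# The overlap is 5 wide and the combined span is 15 wide.
reward = range_overlap_reward(100.0, 110.0, 105.0, 115.0)
```

A measure like this penalizes ranges that are too wide as well as ranges that miss, so a model cannot score well by always predicting a very broad band.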

### Model Types Supported
1. **CNN Models**: Trained on feature sequences to predict price ranges
2. **RL Models**: Trained with reinforcement learning on prediction outcomes
3. **Ensemble**: Combines multiple model predictions for better accuracy

## Backtesting and Validation

### Backtesting Process
1. Load historical 1-minute data
2. Simulate predictions at regular intervals
3. Look ahead to each prediction's target time in the historical data to determine the actual outcome
4. Calculate accuracy and profitability metrics
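The four steps above can be sketched as a walk-forward loop over a list of 1-minute closes. Everything here is an assumption for illustration: the function names, the use of closes instead of full candles, and the `naive_band` predictor, which simply places a fixed-width band around the last price and stands in for the real models.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], Tuple[float, float]]
Result = Tuple[float, float, float, float]  # pred_min, pred_max, actual_min, actual_max


def backtest_range_predictor(
    closes: Sequence[float], predictor: Predictor, horizon: int, step: int = 1
) -> List[Result]:
    """Predict at regular intervals and compare with the next `horizon` bars."""
    results = []
    # i is the index of the last bar the predictor is allowed to see.
    for i in range(0, len(closes) - horizon, step):
        pred_min, pred_max = predictor(closes[: i + 1])
        future = closes[i + 1 : i + 1 + horizon]  # always exactly `horizon` bars
        results.append((pred_min, pred_max, min(future), max(future)))
    return results


def containment_rate(results: Sequence[Result]) -> float:
    """Fraction of predictions whose band fully contained the actual range."""
    if not results:
        return 0.0
    hits = sum(1 for pmin, pmax, amin, amax in results if pmin <= amin and amax <= pmax)
    return hits / len(results)


def naive_band(history: Sequence[float], width: float = 0.01) -> Tuple[float, float]:
    """Placeholder predictor: last price plus or minus a fixed fraction."""
    last = history[-1]
    return last * (1 - width), last * (1 + width)


closes = [100.0, 100.5, 100.2, 103.0, 100.0, 100.1]
results = backtest_range_predictor(closes, naive_band, horizon=2)
rate = containment_rate(results)
```

Because the predictor only ever receives `closes[: i + 1]`, it cannot see the bars used to score it, which is the property a backtest needs to avoid look-ahead bias.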

### Key Metrics
- **Range Accuracy**: How well predicted min/max ranges match actual ranges
- **Confidence Correlation**: How confidence scores relate to prediction accuracy
- **Profitability**: Simulated trading performance based on predictions
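For the confidence correlation metric, one simple option is the Pearson correlation between each prediction's confidence score and its measured accuracy. This is a sketch under that assumption; the document does not state which correlation measure the backtester uses, and `pearson_correlation` is a hypothetical helper.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Usage: accuracy rises in step with confidence, so the correlation is close to 1.
confidences = [0.2, 0.4, 0.6, 0.8]
accuracies = [0.1, 0.3, 0.5, 0.7]
corr = pearson_correlation(confidences, accuracies)
```

A value near 1 suggests the confidence scores are informative; a value near 0 suggests that filtering on confidence would not improve accuracy.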

## Performance Analysis

### Expected Improvements
1. **Better Profit Potential**: 60-minute predictions allow for meaningful price moves
2. **More Stable Training**: Training occurs on known outcomes, not immediate reactions
3. **Reduced Overfitting**: Training across multiple horizons should reduce overfitting to short-term noise
4. **Backtesting Validation**: Historical testing helps verify system robustness before deployment

### Monitoring
The system provides comprehensive monitoring:
- Prediction generation rates
- Training session statistics
- Model accuracy by horizon
- Storage utilization
- System health metrics

## Configuration

### Key Parameters
```python
# Prediction horizons (minutes)
horizons = [1, 5, 15, 60]

# Prediction frequency
prediction_interval_seconds = 60

# Minimum confidence for storage
min_confidence_threshold = 0.3

# Training batch size
batch_size = 32

# Storage retention
max_age_days = 30
```

### File Locations
- Prediction snapshots: `data/prediction_snapshots/`
- Backtest results: `reports/`
- Cache data: `cache/`

## Integration with Existing Dashboard

The system is designed to integrate with your existing dashboard:

1. **Real-time Monitoring**: Dashboard can display prediction generation stats
2. **Training Progress**: Show training session results
3. **Backtest Reports**: Display historical performance analysis
4. **Model Comparison**: Compare old vs new training approaches

## Migration Path

### Gradual Adoption
1. **Run in Parallel**: Run new system alongside existing training
2. **Compare Performance**: Use backtesting to compare approaches
3. **Gradual Transition**: Move models to new training system incrementally
4. **Fallback Support**: Keep old system as backup during transition

### Data Compatibility
- New system stores snapshots independently
- Existing model weights can be used as starting points
- Training data format is compatible with existing models

## Troubleshooting

### Common Issues
1. **Low Prediction Accuracy**: Check confidence thresholds and feature quality
2. **Storage Issues**: Monitor disk space and clean up old snapshots
3. **Training Performance**: Adjust batch sizes and learning rates
4. **Memory Usage**: Use appropriate cache sizes for your hardware

### Logging
All components use structured logging with consistent log levels:
- `INFO`: Normal operations and results
- `WARNING`: Potential issues that don't stop operation
- `ERROR`: Serious problems requiring attention

## Future Enhancements

### Planned Features
1. **Advanced Ensemble Methods**: More sophisticated model combination
2. **Adaptive Horizons**: Dynamic horizon selection based on market conditions
3. **Cross-Symbol Training**: Train models using data from multiple symbols
4. **Real-time Validation**: Immediate feedback on prediction quality
5. **Performance Optimization**: GPU acceleration and distributed training

### Research Directions
1. **Optimal Horizon Selection**: Which horizons provide best risk-adjusted returns
2. **Market Regime Detection**: Adjust predictions based on market conditions
3. **Feature Engineering**: Better input features for price range prediction
4. **Uncertainty Quantification**: Better confidence score calibration

## Conclusion

The Multi-Horizon Training System addresses your core concerns by:

1. **Extending Prediction Horizons**: From seconds to 60 minutes for meaningful profit potential
2. **Deferred Training**: Models learn from actual outcomes, not immediate reactions
3. **Comprehensive Storage**: Full model inputs preserved for future training
4. **Backtesting Validation**: Historical testing helps verify system effectiveness
5. **Scalable Architecture**: Efficient storage and training for long-term operation

This system is intended to improve your trading performance by focusing on longer-term price movements with more profit potential, while maintaining rigorous training and validation processes. Use the backtester to confirm the actual improvement on your own data.