docs: Add comprehensive training fix implementation plan
- Document critical issues and fixes applied
- Detail proper training loop architecture
- Outline signal-position linking system
- Define comprehensive reward calculation
- List implementation phases and next steps
TRAINING_FIX_IMPLEMENTATION.md (new file, 286 lines added)
@@ -0,0 +1,286 @@
# Trading System Training Fix Implementation

**Date**: September 30, 2025
**Status**: In Progress

---

## Critical Issues Identified

### 1. Division by Zero ✅ FIXED
**Problem**: Trading executor crashed when price was 0 or invalid
**Solution**: Added price validation before division in `core/trading_executor.py`
```python
if current_price <= 0:
    logger.error(f"Invalid price {current_price} for {symbol}")
    return False
```

### 2. Mock Predictions ✅ FIXED
**Problem**: System fell back to "mock predictions" when the training system was unavailable (POLICY VIOLATION!)
**Solution**: Removed the mock fallback; the system now fails gracefully
```python
logger.error("CRITICAL: Enhanced training system not available - predictions disabled. NEVER use mock data.")
```

### 3. Torch Import ✅ ALREADY FIXED
**Problem**: "cannot access local variable 'torch'" error
**Status**: Already handled: `torch` falls back to a `None` placeholder when the import fails

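For reference, the fix already in place amounts to a guarded import; a minimal sketch (the surrounding module layout is an assumption):

```python
# Guarded import: keep a None placeholder so callers can check availability
try:
    import torch
except ImportError:
    torch = None  # downstream code must check "torch is not None" before use
```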
---
## Training Loop Issues

### Current State (BROKEN):
1. **Immediate Training on Next Tick**
   - Training happens on `next_price - current_price` (≈0.00)
   - No actual position tracking
   - Rewards are meaningless noise (see the sketch after this list)

2. **No Position Close Training**
   - Positions open/close but NO training is triggered
   - Real PnL is calculated but never used for training
   - Models never learn from actual trade outcomes

3. **Manual Trades Only**
   - Only manual trades trigger model training
   - Automated trades don't train models

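To see why next-tick rewards are meaningless, compare a one-tick delta with the realized PnL of a closed position (all values below are hypothetical, for illustration only):

```python
# Illustrative values only - not from live data
current_price = 2650.25   # hypothetical price at signal time
next_price    = 2650.27   # one tick later
tick_reward   = next_price - current_price              # 0.02 -> effectively noise

entry_price   = 2650.25
exit_price    = 2663.50   # position closed minutes later
quantity      = 0.1
realized_pnl  = (exit_price - entry_price) * quantity   # 1.325 -> a real training signal
```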
---
## Proper Training Loop Implementation

### Required Components:

#### 1. Signal-Position Linking
```python
from datetime import datetime  # needed for entry_time timestamps


class SignalPositionTracker:
    """Links trading signals to positions for outcome-based training"""

    def __init__(self):
        self.active_trades = {}  # position_id -> signal_data

    def register_signal(self, signal_id, signal_data, position_id):
        """Store signal context when position opens"""
        self.active_trades[position_id] = {
            'signal_id': signal_id,
            'signal': signal_data,
            'entry_time': datetime.now(),
            'market_state': signal_data.get('market_state'),
            'models_used': {
                'cnn': signal_data.get('cnn_contribution', 0),
                'dqn': signal_data.get('dqn_contribution', 0),
                'cob_rl': signal_data.get('cob_contribution', 0)
            }
        }

    def get_signal_for_position(self, position_id):
        """Retrieve signal when position closes"""
        return self.active_trades.pop(position_id, None)
```
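The missing half is the registration call at execution time. A possible wiring, assuming a hook named `_on_signal_executed` and a `position.id` attribute (both illustrative, not existing APIs):

```python
# Hypothetical registration hook - method and attribute names are assumptions
def _on_signal_executed(self, signal_data, position):
    """Register the executed signal so the position-close hook can find it later"""
    self.position_tracker.register_signal(
        signal_id=signal_data.get('signal_id'),
        signal_data=signal_data,
        position_id=position.id
    )
```

With this registered, the position close hook in component 2 can recover the full signal context when the trade closes.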
#### 2. Position Close Hook
```python
# In core/trading_executor.py after trade_record is created:

def _on_position_close(self, trade_record, position):
    """Called when position closes - trigger training"""

    # Get original signal
    signal_data = self.position_tracker.get_signal_for_position(position.id)
    if not signal_data:
        logger.warning(f"No signal found for position {position.id}")
        return

    # Calculate comprehensive reward
    reward = self._calculate_training_reward(trade_record, signal_data)

    # Train all models that contributed
    if self.orchestrator:
        self.orchestrator.train_on_trade_outcome(
            signal_data=signal_data,
            trade_record=trade_record,
            reward=reward
        )
```
#### 3. Comprehensive Reward Function
```python
import numpy as np  # at the module top of core/trading_executor.py

def _calculate_training_reward(self, trade_record, signal_data):
    """Calculate sophisticated reward from closed trade"""

    # Base PnL (already includes fees)
    pnl = trade_record.pnl

    # Time penalty (encourage faster trades)
    hold_time_minutes = trade_record.hold_time_seconds / 60
    time_penalty = -0.001 * max(0, hold_time_minutes - 5)  # Penalty after 5min

    # Risk-adjusted reward
    position_risk = trade_record.quantity * trade_record.entry_price / self.balance
    risk_adjusted = pnl / (position_risk + 0.01)

    # Consistency bonus/penalty
    recent_pnls = [t.pnl for t in self.trade_history[-20:]]
    if len(recent_pnls) > 1:
        pnl_std = np.std(recent_pnls)
        consistency = pnl / (pnl_std + 0.001)
    else:
        consistency = 0

    # Win/loss streak adjustment
    if pnl > 0:
        streak_bonus = min(0.1, self.winning_streak * 0.02)
    else:
        streak_bonus = -min(0.2, self.losing_streak * 0.05)

    # Final reward (scaled for model learning)
    final_reward = (
        pnl * 10.0 +            # Base PnL (scaled)
        time_penalty +          # Efficiency
        risk_adjusted * 2.0 +   # Risk management
        consistency * 0.5 +     # Volatility
        streak_bonus            # Consistency
    )

    logger.info(f"REWARD CALC: PnL={pnl:.4f}, Time={time_penalty:.4f}, "
                f"Risk={risk_adjusted:.4f}, Final={final_reward:.4f}")

    return final_reward
```
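A worked example with hypothetical inputs (a $1.00 win, 15-minute hold, $100 notional on a $1,000 balance, recent PnL std of 0.50, 3-trade winning streak) shows the scale of the final reward:

```
pnl           = 1.00
time_penalty  = -0.001 * max(0, 15 - 5)      = -0.01
risk_adjusted = 1.00 / (100/1000 + 0.01)     ≈  9.09
consistency   = 1.00 / (0.50 + 0.001)        ≈  2.00
streak_bonus  = min(0.1, 3 * 0.02)           =  0.06
final_reward  = 10.0 - 0.01 + 18.18 + 1.00 + 0.06 ≈ 29.23
```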
#### 4. Multi-Model Training
```python
# In core/orchestrator.py

def train_on_trade_outcome(self, signal_data, trade_record, reward):
    """Train all models that contributed to the signal"""

    market_state = signal_data.get('market_state')
    action = self._action_to_index(trade_record.side)  # BUY=0, SELL=1

    # Train CNN
    if self.cnn_model and signal_data['models_used']['cnn'] > 0:
        weight = signal_data['models_used']['cnn']
        self._train_cnn_on_outcome(market_state, action, reward, weight)
        logger.info(f"CNN trained with weight {weight:.2f}")

    # Train DQN
    if self.dqn_agent and signal_data['models_used']['dqn'] > 0:
        weight = signal_data['models_used']['dqn']
        next_state = self._extract_current_state()
        self.dqn_agent.remember(market_state, action, reward * weight, next_state, done=True)

        if len(self.dqn_agent.memory) > 32:
            loss = self.dqn_agent.replay(batch_size=32)
            logger.info(f"DQN trained with weight {weight:.2f}, loss={loss:.4f}")

    # Train COB RL
    if self.cob_rl_model and signal_data['models_used']['cob_rl'] > 0:
        weight = signal_data['models_used']['cob_rl']
        cob_data = signal_data.get('cob_data', {})
        self._train_cob_on_outcome(cob_data, action, reward, weight)
        logger.info(f"COB RL trained with weight {weight:.2f}")

    logger.info(f"✅ TRAINED ALL MODELS: PnL=${trade_record.pnl:.2f}, Reward={reward:.4f}")
```
---

## Implementation Steps

### Phase 1: Core Infrastructure (Priority 1) ✅
- [x] Fix division by zero
- [x] Remove mock predictions
- [x] Fix torch imports

### Phase 2: Training Loop (Priority 2) - IN PROGRESS
- [ ] Create SignalPositionTracker class
- [ ] Add position close hook in trading_executor
- [ ] Implement comprehensive reward function
- [ ] Add train_on_trade_outcome to orchestrator
- [ ] Remove immediate training on next-tick

### Phase 3: Reward Improvements (Priority 3)
- [ ] Multi-timeframe rewards (1m, 5m, 15m outcomes)
- [ ] Selective training (skip tiny movements; see the sketch after this list)
- [ ] Better feature engineering
- [ ] Prioritized experience replay

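One way the selective-training item could look: skip training on trades whose move stays inside the noise band. A minimal sketch, assuming the `trade_record` fields used above and a threshold that would need tuning:

```python
MIN_MOVE_PCT = 0.05  # assumed threshold: ignore trades that moved less than 0.05%

def should_train_on_trade(trade_record) -> bool:
    """Skip tiny movements so near-zero outcomes don't dominate training"""
    notional = trade_record.quantity * trade_record.entry_price
    if notional <= 0:
        return False
    move_pct = abs(trade_record.pnl) / notional * 100
    return move_pct >= MIN_MOVE_PCT
```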
### Phase 4: Testing & Validation
- [ ] Test with paper trading
- [ ] Validate rewards are non-zero
- [ ] Confirm models are training
- [ ] Monitor training metrics

---

## Expected Improvements

### Before:
- Rewards: ~0.00 (next-tick noise)
- Training: Only on next-tick price
- Learning: Models see no real outcomes
- Effectiveness: 1/10

### After:
- Rewards: Real PnL-based (-$5 to +$10 range)
- Training: On actual position close
- Learning: Models see real trade results
- Effectiveness: 9/10

---
## Files to Modify

1. **core/trading_executor.py**
   - Add position close hook
   - Create SignalPositionTracker
   - Implement reward calculation

2. **core/orchestrator.py**
   - Add train_on_trade_outcome method
   - Implement multi-model training

3. **web/clean_dashboard.py**
   - Remove immediate training
   - Add signal registration on execution
   - Link signals to positions

4. **core/training_integration.py** (optional)
   - May need updates for consistency

---

## Monitoring & Validation

### Log Messages to Watch:
```
✅ TRAINED ALL MODELS: PnL=$2.35, Reward=25.40
REWARD CALC: PnL=0.0235, Time=-0.002, Risk=1.15, Final=25.40
CNN trained with weight 0.35
DQN trained with weight 0.45, loss=0.0123
COB RL trained with weight 0.20
```

### Metrics to Track:
- Average reward per trade (should be >> 0.01)
- Training frequency (should match trade close frequency)
- Model convergence (loss decreasing over time)
- Win rate improvement (should increase with training)

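These metrics can also be collected in-process rather than scraped from logs; a minimal sketch (not an existing module), assuming it is fed once per closed trade that triggered training:

```python
# Minimal sketch of an in-process metrics collector for training validation
from typing import Optional

class TrainingMetrics:
    def __init__(self):
        self.rewards = []   # one entry per trained trade
        self.losses = []    # model losses, when reported
        self.wins = 0
        self.trades = 0

    def record(self, reward: float, pnl: float, loss: Optional[float] = None):
        """Call once per closed trade that triggered training"""
        self.rewards.append(reward)
        self.trades += 1
        if pnl > 0:
            self.wins += 1
        if loss is not None:
            self.losses.append(loss)

    def summary(self) -> dict:
        avg_reward = sum(self.rewards) / len(self.rewards) if self.rewards else 0.0
        win_rate = self.wins / self.trades if self.trades else 0.0
        return {'avg_reward': avg_reward, 'win_rate': win_rate, 'training_events': len(self.losses)}
```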
---
## Next Steps

1. Implement SignalPositionTracker
2. Add position close hook
3. Create reward calculation
4. Test with 10 manual trades
5. Validate rewards are meaningful
6. Deploy to automated trading

---

**Status**: Phase 1 Complete, Phase 2 In Progress