# Trading System Training Fix Implementation

**Date**: September 30, 2025
**Status**: In Progress

---

## Critical Issues Identified

### 1. Division by Zero (FIXED)

**Problem**: The trading executor crashed when the price was 0 or otherwise invalid.
**Solution**: Added price validation before division in `core/trading_executor.py`:

```python
if current_price <= 0:
    logger.error(f"Invalid price {current_price} for {symbol}")
    return False
```

### 2. Mock Predictions (FIXED)

**Problem**: The system fell back to "mock predictions" when the training system was unavailable (POLICY VIOLATION!).
**Solution**: Removed the mock fallback; the system now fails gracefully:

```python
logger.error("CRITICAL: Enhanced training system not available - predictions disabled. NEVER use mock data.")
```

### 3. Torch Import (ALREADY FIXED)

**Problem**: "cannot access local variable 'torch'" error.
**Status**: The module already falls back to a `None` placeholder when the import fails.

---

## Training Loop Issues

### Current State (BROKEN):

1. **Immediate Training on Next Tick**
   - Training happens on `next_price - current_price` (≈ 0.00)
   - No actual position tracking
   - Rewards are meaningless noise

2. **No Position Close Training**
   - Positions open and close, but no training is triggered
   - Real PnL is calculated but never used for training
   - Models never learn from actual trade outcomes

3. **Manual Trades Only**
   - Only manual trades trigger model training
   - Automated trades don't train models

---

## Proper Training Loop Implementation

### Required Components:

#### 1. Signal-Position Linking

```python
from datetime import datetime

class SignalPositionTracker:
    """Links trading signals to positions for outcome-based training"""

    def __init__(self):
        self.active_trades = {}  # position_id -> signal_data

    def register_signal(self, signal_id, signal_data, position_id):
        """Store signal context when a position opens"""
        self.active_trades[position_id] = {
            'signal_id': signal_id,
            'signal': signal_data,
            'entry_time': datetime.now(),
            'market_state': signal_data.get('market_state'),
            'models_used': {
                'cnn': signal_data.get('cnn_contribution', 0),
                'dqn': signal_data.get('dqn_contribution', 0),
                'cob_rl': signal_data.get('cob_contribution', 0)
            }
        }

    def get_signal_for_position(self, position_id):
        """Retrieve (and remove) the signal when the position closes"""
        return self.active_trades.pop(position_id, None)
```

#### 2. Position Close Hook

```python
# In core/trading_executor.py, after trade_record is created:

def _on_position_close(self, trade_record, position):
    """Called when a position closes - triggers training"""
    # Get the original signal that opened this position
    signal_data = self.position_tracker.get_signal_for_position(position.id)
    if not signal_data:
        logger.warning(f"No signal found for position {position.id}")
        return

    # Calculate comprehensive reward
    reward = self._calculate_training_reward(trade_record, signal_data)

    # Train all models that contributed to the signal
    if self.orchestrator:
        self.orchestrator.train_on_trade_outcome(
            signal_data=signal_data,
            trade_record=trade_record,
            reward=reward
        )
```
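For the close hook to find a signal, the executor also has to register the signal at the moment a position opens. Below is a minimal sketch of that open-side wiring; the `_on_position_open` hook name, the `position.id` attribute, and the `signal_id` field are assumptions about the executor's internals, not existing APIs.

```python
# Hypothetical open-side wiring (names are assumptions, not existing APIs).
# Call this from core/trading_executor.py wherever a position is opened
# from a signal, mirroring the close hook above.

def _on_position_open(self, signal, position):
    """Register the originating signal so the close hook can retrieve it later."""
    self.position_tracker.register_signal(
        signal_id=signal.get('signal_id'),  # assumed field on the signal dict
        signal_data=signal,
        position_id=position.id             # assumed attribute on the position object
    )
    logger.info(f"Signal {signal.get('signal_id')} linked to position {position.id}")
```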
#### 3. Comprehensive Reward Function

```python
# In core/trading_executor.py (requires: import numpy as np at module level)

def _calculate_training_reward(self, trade_record, signal_data):
    """Calculate a sophisticated reward from a closed trade"""
    # Base PnL (already includes fees)
    pnl = trade_record.pnl

    # Time penalty (encourage faster trades)
    hold_time_minutes = trade_record.hold_time_seconds / 60
    time_penalty = -0.001 * max(0, hold_time_minutes - 5)  # penalty after 5 min

    # Risk-adjusted reward
    position_risk = trade_record.quantity * trade_record.entry_price / self.balance
    risk_adjusted = pnl / (position_risk + 0.01)

    # Consistency bonus/penalty (PnL normalized by recent PnL volatility)
    recent_pnls = [t.pnl for t in self.trade_history[-20:]]
    if len(recent_pnls) > 1:
        pnl_std = np.std(recent_pnls)
        consistency = pnl / (pnl_std + 0.001)
    else:
        consistency = 0

    # Win/loss streak adjustment
    if pnl > 0:
        streak_bonus = min(0.1, self.winning_streak * 0.02)
    else:
        streak_bonus = -min(0.2, self.losing_streak * 0.05)

    # Final reward (scaled for model learning)
    final_reward = (
        pnl * 10.0 +           # Base PnL (scaled)
        time_penalty +         # Efficiency
        risk_adjusted * 2.0 +  # Risk management
        consistency * 0.5 +    # Volatility-normalized consistency
        streak_bonus           # Win/loss streak
    )

    logger.info(f"REWARD CALC: PnL={pnl:.4f}, Time={time_penalty:.4f}, "
                f"Risk={risk_adjusted:.4f}, Final={final_reward:.4f}")

    return final_reward
```

#### 4. Multi-Model Training

```python
# In core/orchestrator.py

def train_on_trade_outcome(self, signal_data, trade_record, reward):
    """Train all models that contributed to the signal"""
    market_state = signal_data.get('market_state')
    action = self._action_to_index(trade_record.side)  # BUY=0, SELL=1

    # Train CNN
    if self.cnn_model and signal_data['models_used']['cnn'] > 0:
        weight = signal_data['models_used']['cnn']
        self._train_cnn_on_outcome(market_state, action, reward, weight)
        logger.info(f"CNN trained with weight {weight:.2f}")

    # Train DQN
    if self.dqn_agent and signal_data['models_used']['dqn'] > 0:
        weight = signal_data['models_used']['dqn']
        next_state = self._extract_current_state()
        self.dqn_agent.remember(market_state, action, reward * weight, next_state, done=True)
        if len(self.dqn_agent.memory) > 32:
            loss = self.dqn_agent.replay(batch_size=32)
            logger.info(f"DQN trained with weight {weight:.2f}, loss={loss:.4f}")

    # Train COB RL
    if self.cob_rl_model and signal_data['models_used']['cob_rl'] > 0:
        weight = signal_data['models_used']['cob_rl']
        cob_data = signal_data.get('cob_data', {})
        self._train_cob_on_outcome(cob_data, action, reward, weight)
        logger.info(f"COB RL trained with weight {weight:.2f}")

    logger.info(f"TRAINED ALL MODELS: PnL=${trade_record.pnl:.2f}, Reward={reward:.4f}")
```

---

## Implementation Steps

### Phase 1: Core Infrastructure (Priority 1)
- [x] Fix division by zero
- [x] Remove mock predictions
- [x] Fix torch imports

### Phase 2: Training Loop (Priority 2) - IN PROGRESS
- [ ] Create SignalPositionTracker class
- [ ] Add position close hook in trading_executor
- [ ] Implement comprehensive reward function
- [ ] Add train_on_trade_outcome to orchestrator
- [ ] Remove immediate training on next-tick

### Phase 3: Reward Improvements (Priority 3)
- [ ] Multi-timeframe rewards (1m, 5m, 15m outcomes)
- [ ] Selective training (skip tiny movements; a sketch follows after these phase lists)
- [ ] Better feature engineering
- [ ] Prioritized experience replay

### Phase 4: Testing & Validation
- [ ] Test with paper trading
- [ ] Validate rewards are non-zero
- [ ] Confirm models are training
- [ ] Monitor training metrics
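As a preview of the Phase 3 "selective training" item, the close hook could skip outcomes whose absolute PnL is too small to carry signal. This is a minimal sketch under assumed names: the `MIN_TRAINING_PNL` threshold, its value, and the `_should_train_on_trade` helper are illustrative choices, not part of the current code.

```python
# Hypothetical gate for Phase 3 selective training (names and threshold are assumptions)
MIN_TRAINING_PNL = 0.50  # assumed threshold in account currency; tune per symbol

def _should_train_on_trade(self, trade_record) -> bool:
    """Skip training on trades whose PnL is indistinguishable from noise."""
    return abs(trade_record.pnl) >= MIN_TRAINING_PNL

# Usage inside _on_position_close, before the reward calculation:
#     if not self._should_train_on_trade(trade_record):
#         logger.info(f"Skipping training: |PnL| {abs(trade_record.pnl):.4f} below threshold")
#         return
```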
---

## Expected Improvements

### Before:
- Rewards: ~0.00 (next-tick noise)
- Training: Only on next-tick price
- Learning: Models see no real outcomes
- Effectiveness: 1/10

### After:
- Rewards: Real PnL-based (-$5 to +$10 range)
- Training: On actual position close
- Learning: Models see real trade results
- Effectiveness: 9/10

---

## Files to Modify

1. **core/trading_executor.py**
   - Add position close hook
   - Create SignalPositionTracker
   - Implement reward calculation

2. **core/orchestrator.py**
   - Add train_on_trade_outcome method
   - Implement multi-model training

3. **web/clean_dashboard.py**
   - Remove immediate training
   - Add signal registration on execution
   - Link signals to positions

4. **core/training_integration.py** (optional)
   - May need updates for consistency

---

## Monitoring & Validation

### Log Messages to Watch:
```
TRAINED ALL MODELS: PnL=$2.35, Reward=25.40
REWARD CALC: PnL=0.0235, Time=-0.002, Risk=1.15, Final=25.40
CNN trained with weight 0.35
DQN trained with weight 0.45, loss=0.0123
COB RL trained with weight 0.20
```

### Metrics to Track:
- Average reward per trade (should be >> 0.01)
- Training frequency (should match trade close frequency)
- Model convergence (loss decreasing over time)
- Win rate improvement (should increase with training)

---

## Next Steps

1. Implement SignalPositionTracker
2. Add position close hook
3. Create reward calculation
4. Test with 10 manual trades
5. Validate rewards are meaningful (see the validation sketch at the end of this document)
6. Deploy to automated trading

---

**Status**: Phase 1 Complete, Phase 2 In Progress
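---

## Validation Sketch (Phase 4)

To support the Phase 4 checks and the metrics listed above, here is a minimal sketch of a post-run validation helper. The shape of `records` (dicts with `'reward'` and `'trained'` keys collected per closed trade) is a hypothetical structure, not an existing API.

```python
# Hypothetical Phase 4 validation helper; the `records` shape is an assumption.
from statistics import mean

def validate_training_run(records):
    """Check that closed trades produced meaningful rewards and triggered training."""
    if not records:
        print("No closed trades recorded - nothing to validate")
        return

    rewards = [r['reward'] for r in records]
    trained = sum(1 for r in records if r['trained'])

    avg_abs_reward = mean(abs(x) for x in rewards)
    train_ratio = trained / len(records)

    print(f"Trades closed: {len(records)}")
    print(f"Average |reward|: {avg_abs_reward:.4f} (expect >> 0.01)")
    print(f"Training ratio: {train_ratio:.0%} (expect ~100% of position closes)")
```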