- Document critical issues and fixes applied
- Detail proper training loop architecture
- Outline signal-position linking system
- Define comprehensive reward calculation
- List implementation phases and next steps
# Trading System Training Fix Implementation

**Date:** September 30, 2025
**Status:** In Progress

## Critical Issues Identified

### 1. Division by Zero ✅ FIXED

**Problem:** Trading executor crashed when price was 0 or invalid.
**Solution:** Added price validation before division in `core/trading_executor.py`:
```python
if current_price <= 0:
    logger.error(f"Invalid price {current_price} for {symbol}")
    return False
```
### 2. Mock Predictions ✅ FIXED

**Problem:** System fell back to "mock predictions" when training was unavailable (POLICY VIOLATION!).
**Solution:** Removed the mock fallback; the system now fails gracefully:
logger.error("CRITICAL: Enhanced training system not available - predictions disabled. NEVER use mock data.")
### 3. Torch Import ✅ ALREADY FIXED

**Problem:** "cannot access local variable 'torch'" error.
**Status:** Already handled; `torch` falls back to a `None` placeholder when the import fails.
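A sketch of that existing pattern (the exact code and function names differ, but the shape is the same):

```python
# Module-level guard: keep a placeholder so later code can check availability
try:
    import torch
except ImportError:
    torch = None  # downstream code checks "if torch is None" before any torch call

def tensor_from_state(state):
    """Convert a market state to a tensor, or skip quietly if torch is unavailable."""
    if torch is None:
        return None
    return torch.tensor(state, dtype=torch.float32)
```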
## Training Loop Issues
### Current State (BROKEN)

- **Immediate Training on Next Tick** (see the example after this list)
  - Training happens on `next_price - current_price` (≈ 0.00)
  - No actual position tracking
  - Rewards are meaningless noise
- **No Position Close Training**
  - Positions open/close but NO training is triggered
  - Real PnL is calculated but unused for training
  - Models never learn from actual trade outcomes
- **Manual Trades Only**
  - Only manual trades trigger model training
  - Automated trades don't train models
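To make the first problem concrete, a rough illustration with made-up numbers shows why next-tick deltas carry no learning signal:

```python
# Illustrative numbers only - not taken from live data
current_price = 2500.00   # price when the signal fires
next_price = 2500.12      # price one tick later
reward = next_price - current_price      # 0.12 absolute
reward_pct = reward / current_price      # ~0.00005, i.e. about 0.005%
# At this scale the "reward" is dominated by spread, fees, and noise,
# so gradient updates push the models in essentially random directions.
```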
## Proper Training Loop Implementation

### Required Components

#### 1. Signal-Position Linking
```python
from datetime import datetime

class SignalPositionTracker:
    """Links trading signals to positions for outcome-based training"""

    def __init__(self):
        self.active_trades = {}  # position_id -> signal_data

    def register_signal(self, signal_id, signal_data, position_id):
        """Store signal context when position opens"""
        self.active_trades[position_id] = {
            'signal_id': signal_id,
            'signal': signal_data,
            'entry_time': datetime.now(),
            'market_state': signal_data.get('market_state'),
            'models_used': {
                'cnn': signal_data.get('cnn_contribution', 0),
                'dqn': signal_data.get('dqn_contribution', 0),
                'cob_rl': signal_data.get('cob_contribution', 0)
            }
        }

    def get_signal_for_position(self, position_id):
        """Retrieve and remove signal when position closes"""
        return self.active_trades.pop(position_id, None)
```
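A usage sketch for the tracker; the IDs and signal fields below are placeholders, not real values:

```python
tracker = SignalPositionTracker()

# Example signal payload - field names follow register_signal() above
signal = {
    'market_state': {'price': 2500.0},
    'cnn_contribution': 0.35,
    'dqn_contribution': 0.45,
    'cob_contribution': 0.20,
}

# On position open: remember which signal produced this position
tracker.register_signal(signal_id='sig-123', signal_data=signal, position_id='pos-1')

# On position close: retrieve the signal context exactly once and train on the real outcome
signal_data = tracker.get_signal_for_position('pos-1')
print(signal_data['models_used'])  # {'cnn': 0.35, 'dqn': 0.45, 'cob_rl': 0.2}
```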
#### 2. Position Close Hook
```python
# In core/trading_executor.py after trade_record is created:
def _on_position_close(self, trade_record, position):
    """Called when position closes - trigger training"""
    # Get original signal
    signal_data = self.position_tracker.get_signal_for_position(position.id)
    if not signal_data:
        logger.warning(f"No signal found for position {position.id}")
        return

    # Calculate comprehensive reward
    reward = self._calculate_training_reward(trade_record, signal_data)

    # Train all models that contributed
    if self.orchestrator:
        self.orchestrator.train_on_trade_outcome(
            signal_data=signal_data,
            trade_record=trade_record,
            reward=reward
        )
```
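For context, the hook would be invoked from the executor's close path right after the trade record exists. A sketch with an assumed `_build_trade_record` helper; the actual method names in `core/trading_executor.py` may differ:

```python
def _close_position(self, position, exit_price):
    """Close a position, record the trade, then trigger outcome-based training."""
    trade_record = self._build_trade_record(position, exit_price)  # assumed helper
    self.trade_history.append(trade_record)

    # NEW: train on the real outcome while the signal context is still available
    self._on_position_close(trade_record, position)
    return trade_record
```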
#### 3. Comprehensive Reward Function
```python
import numpy as np

def _calculate_training_reward(self, trade_record, signal_data):
    """Calculate sophisticated reward from closed trade"""
    # Base PnL (already includes fees)
    pnl = trade_record.pnl

    # Time penalty (encourage faster trades)
    hold_time_minutes = trade_record.hold_time_seconds / 60
    time_penalty = -0.001 * max(0, hold_time_minutes - 5)  # Penalty after 5 min

    # Risk-adjusted reward
    position_risk = trade_record.quantity * trade_record.entry_price / self.balance
    risk_adjusted = pnl / (position_risk + 0.01)

    # Consistency bonus/penalty
    recent_pnls = [t.pnl for t in self.trade_history[-20:]]
    if len(recent_pnls) > 1:
        pnl_std = np.std(recent_pnls)
        consistency = pnl / (pnl_std + 0.001)
    else:
        consistency = 0

    # Win/loss streak adjustment
    if pnl > 0:
        streak_bonus = min(0.1, self.winning_streak * 0.02)
    else:
        streak_bonus = -min(0.2, self.losing_streak * 0.05)

    # Final reward (scaled for model learning)
    final_reward = (
        pnl * 10.0 +            # Base PnL (scaled)
        time_penalty +          # Efficiency
        risk_adjusted * 2.0 +   # Risk management
        consistency * 0.5 +     # Volatility
        streak_bonus            # Consistency
    )

    logger.info(f"REWARD CALC: PnL={pnl:.4f}, Time={time_penalty:.4f}, "
                f"Risk={risk_adjusted:.4f}, Final={final_reward:.4f}")
    return final_reward
```
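A worked example with illustrative numbers helps sanity-check the scaling (none of these values come from a real trade):

```python
# Illustrative values only: $0.50 net PnL, 12-minute hold, 0.01 ETH at $2500 on a $100 balance
pnl = 0.50
hold_time_minutes = 12.0
time_penalty = -0.001 * max(0.0, hold_time_minutes - 5)     # -0.007
position_risk = (0.01 * 2500.0) / 100.0                     # 0.25 of balance
risk_adjusted = pnl / (position_risk + 0.01)                # ~1.92
consistency = pnl / (0.30 + 0.001)                          # ~1.66 (assumed recent PnL std = 0.30)
streak_bonus = min(0.1, 2 * 0.02)                           # 0.04 (assumed 2-trade win streak)

final_reward = pnl * 10.0 + time_penalty + risk_adjusted * 2.0 + consistency * 0.5 + streak_bonus
print(round(final_reward, 2))  # ~9.71 - well above the next-tick noise floor
```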
#### 4. Multi-Model Training
```python
# In core/orchestrator.py
def train_on_trade_outcome(self, signal_data, trade_record, reward):
    """Train all models that contributed to the signal"""
    market_state = signal_data.get('market_state')
    action = self._action_to_index(trade_record.side)  # BUY=0, SELL=1

    # Train CNN
    if self.cnn_model and signal_data['models_used']['cnn'] > 0:
        weight = signal_data['models_used']['cnn']
        self._train_cnn_on_outcome(market_state, action, reward, weight)
        logger.info(f"CNN trained with weight {weight:.2f}")

    # Train DQN
    if self.dqn_agent and signal_data['models_used']['dqn'] > 0:
        weight = signal_data['models_used']['dqn']
        next_state = self._extract_current_state()
        self.dqn_agent.remember(market_state, action, reward * weight, next_state, done=True)
        if len(self.dqn_agent.memory) > 32:
            loss = self.dqn_agent.replay(batch_size=32)
            logger.info(f"DQN trained with weight {weight:.2f}, loss={loss:.4f}")

    # Train COB RL
    if self.cob_rl_model and signal_data['models_used']['cob_rl'] > 0:
        weight = signal_data['models_used']['cob_rl']
        cob_data = signal_data.get('cob_data', {})
        self._train_cob_on_outcome(cob_data, action, reward, weight)
        logger.info(f"COB RL trained with weight {weight:.2f}")

    logger.info(f"✅ TRAINED ALL MODELS: PnL=${trade_record.pnl:.2f}, Reward={reward:.4f}")
```
## Implementation Steps

### Phase 1: Core Infrastructure (Priority 1) ✅

- Fix division by zero
- Remove mock predictions
- Fix torch imports

### Phase 2: Training Loop (Priority 2) - IN PROGRESS

- Create SignalPositionTracker class
- Add position close hook in trading_executor
- Implement comprehensive reward function
- Add train_on_trade_outcome to orchestrator
- Remove immediate training on next-tick
### Phase 3: Reward Improvements (Priority 3)

- Multi-timeframe rewards (1m, 5m, 15m outcomes; see the sketch after this list)
- Selective training (skip tiny movements)
- Better feature engineering
- Prioritized experience replay
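For the multi-timeframe item, one possible shape, sketched with assumed horizon weights rather than a committed design:

```python
def multi_timeframe_reward(pnl_1m: float, pnl_5m: float, pnl_15m: float,
                           weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Blend PnL observed 1, 5, and 15 minutes after the signal.

    The weights are assumptions: shorter horizons dominate, longer ones still count.
    """
    return weights[0] * pnl_1m + weights[1] * pnl_5m + weights[2] * pnl_15m
```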
### Phase 4: Testing & Validation

- Test with paper trading
- Validate rewards are non-zero
- Confirm models are training
- Monitor training metrics
## Expected Improvements

**Before:**
- Rewards: ~0.00 (next-tick noise)
- Training: only on next-tick price
- Learning: models see no real outcomes
- Effectiveness: 1/10

**After:**
- Rewards: real PnL-based (roughly -$5 to +$10)
- Training: on actual position close
- Learning: models see real trade results
- Effectiveness: 9/10
## Files to Modify

- `core/trading_executor.py`
  - Add position close hook
  - Create SignalPositionTracker
  - Implement reward calculation
- `core/orchestrator.py`
  - Add train_on_trade_outcome method
  - Implement multi-model training
- `web/clean_dashboard.py`
  - Remove immediate training
  - Add signal registration on execution
  - Link signals to positions
- `core/training_integration.py` (optional)
  - May need updates for consistency
Monitoring & Validation
Log Messages to Watch:
```
✅ TRAINED ALL MODELS: PnL=$2.35, Reward=25.40
REWARD CALC: PnL=0.0235, Time=-0.002, Risk=1.15, Final=25.40
CNN trained with weight 0.35
DQN trained with weight 0.45, loss=0.0123
COB RL trained with weight 0.20
```
### Metrics to Track
- Average reward per trade (should be >> 0.01)
- Training frequency (should match trade close frequency)
- Model convergence (loss decreasing over time)
- Win rate improvement (should increase with training)
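A quick validation helper could cover the first two metrics. This is a sketch only; the record fields (`reward`, `trained`) are assumptions to adjust to the real trade schema:

```python
def summarize_training_health(trade_records):
    """Rough health check: average reward per closed trade and training coverage."""
    if not trade_records:
        return {'avg_reward': 0.0, 'training_coverage': 0.0}
    avg_reward = sum(t['reward'] for t in trade_records) / len(trade_records)
    trained = sum(1 for t in trade_records if t.get('trained'))
    return {
        'avg_reward': avg_reward,                            # should be well above 0.01
        'training_coverage': trained / len(trade_records),   # should be ~1.0 (one training per close)
    }
```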
## Next Steps
- Implement SignalPositionTracker
- Add position close hook
- Create reward calculation
- Test with 10 manual trades
- Validate rewards are meaningful
- Deploy to automated trading
**Status:** Phase 1 Complete, Phase 2 In Progress