Enhanced Multi-Modal Trading Architecture Guide
Overview
This document describes the enhanced multi-modal trading system that implements sophisticated decision-making through coordinated CNN and RL modules. The system is designed to handle multi-timeframe analysis across multiple symbols (ETH, BTC) with continuous learning capabilities.
Architecture Components
1. Enhanced Trading Orchestrator (core/enhanced_orchestrator.py)
The heart of the system that coordinates all components:
Key Features:
- Multi-Symbol Coordination: Makes decisions across ETH and BTC considering correlations
- Timeframe Integration: Combines predictions from multiple timeframes (1m, 5m, 15m, 1h, 4h, 1d)
- Perfect Move Marking: Identifies and marks optimal trading decisions for CNN training
- RL Evaluation Loop: Evaluates trading outcomes to train RL agents
Data Structures:
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

@dataclass
class TimeframePrediction:
    timeframe: str
    action: str                        # 'BUY', 'SELL', 'HOLD'
    confidence: float                  # 0.0 to 1.0
    probabilities: Dict[str, float]
    timestamp: datetime
    market_features: Dict[str, float]

@dataclass
class TradingAction:
    symbol: str
    action: str
    quantity: float
    confidence: float
    price: float
    timestamp: datetime
    reasoning: Dict[str, Any]
    timeframe_analysis: List[TimeframePrediction]
```
Decision Making Process:
1. Gather market states for all symbols and timeframes
2. Get CNN predictions for each timeframe with confidence scores
3. Combine timeframe predictions using weighted averaging
4. Consider symbol correlations (ETH-BTC correlation ~0.85; see the sketch after this list)
5. Apply confidence thresholds and risk management
6. Generate coordinated trading decisions
7. Queue actions for RL evaluation
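The correlation step can be made concrete with a small sketch. The helper below is hypothetical (the real `coordinate_symbols()` in `core/enhanced_orchestrator.py` takes different arguments and may behave differently); it simply damps confidence on both symbols when their signals disagree, assuming the fixed ~0.85 ETH-BTC correlation noted above.

```python
# Hypothetical illustration of correlation-aware coordination; the real
# coordinate_symbols() in core/enhanced_orchestrator.py may differ.
from typing import Dict, Tuple

ETH_BTC_CORRELATION = 0.85  # assumed fixed correlation, per the text above

def dampen_on_disagreement(
    decisions: Dict[str, Tuple[str, float]],  # {symbol: (action, confidence)}
) -> Dict[str, Tuple[str, float]]:
    """Reduce confidence on both symbols when strongly correlated signals conflict."""
    eth_action, eth_conf = decisions['ETH']
    btc_action, btc_conf = decisions['BTC']
    if eth_action != btc_action:
        # The stronger the correlation, the less plausible a genuine divergence is.
        penalty = 1.0 - 0.5 * ETH_BTC_CORRELATION
        decisions['ETH'] = (eth_action, eth_conf * penalty)
        decisions['BTC'] = (btc_action, btc_conf * penalty)
    return decisions
```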
2. Enhanced CNN Trainer (training/enhanced_cnn_trainer.py)
Implements supervised learning on marked perfect moves:
Key Features:
- Perfect Move Dataset: Trains on historically optimal decisions
- Timeframe-Specific Heads: Separate prediction heads for each timeframe
- Confidence Prediction: Predicts both action and confidence simultaneously
- Multi-Loss Training: Combines action classification and confidence regression
Network Architecture:
```
# Convolutional feature extraction (shared trunk)
Conv1D(in_channels=5, filters=64, kernel=3)  -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=128, kernel=3)                -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=256, kernel=3)                -> BatchNorm -> ReLU -> Dropout
AdaptiveAvgPool1d(1)   # global average pooling

# Timeframe-specific heads
for each timeframe:
    Linear(256 -> 128) -> ReLU -> Dropout
    Linear(128 -> 64)  -> ReLU -> Dropout

    # Action prediction
    Linear(64 -> 3)    # BUY, HOLD, SELL

    # Confidence prediction
    Linear(64 -> 32) -> ReLU -> Linear(32 -> 1) -> Sigmoid
```
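Read as PyTorch, the pseudocode above might look like the following sketch. The layer sizes come from the pseudocode; the class name, padding, and dropout rate are assumptions rather than the production model in training/enhanced_cnn_trainer.py.

```python
import torch
import torch.nn as nn

class TimeframeCNN(nn.Module):
    """Sketch of the shared-trunk, per-timeframe-head CNN described above."""

    def __init__(self, timeframes, n_features=5, dropout=0.2):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout),
            nn.AdaptiveAvgPool1d(1),  # global average pooling
        )
        self.heads = nn.ModuleDict()
        for tf in timeframes:
            self.heads[tf] = nn.ModuleDict({
                'shared': nn.Sequential(
                    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
                    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(dropout),
                ),
                'action': nn.Linear(64, 3),  # BUY, HOLD, SELL logits
                'confidence': nn.Sequential(
                    nn.Linear(64, 32), nn.ReLU(),
                    nn.Linear(32, 1), nn.Sigmoid(),
                ),
            })

    def forward(self, x, timeframe):
        # x: (batch, n_features, sequence_length)
        z = self.trunk(x).squeeze(-1)            # (batch, 256)
        h = self.heads[timeframe]['shared'](z)   # (batch, 64)
        return (self.heads[timeframe]['action'](h),
                self.heads[timeframe]['confidence'](h))
```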
Training Process:
- Collect perfect moves from the orchestrator together with their known outcomes
- Build a dataset of features, optimal actions, and target confidences
- Train with the combined loss `action_loss + 0.5 * confidence_loss` (see the sketch after this list)
- Use early stopping and model checkpointing
- Generate comprehensive training reports and visualizations
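The combined loss is straightforward to express. A minimal sketch, assuming the `TimeframeCNN` above, cross-entropy on the action logits, and MSE on the confidence output (action indices 0..2 and the '1h' timeframe are illustrative):

```python
import torch.nn.functional as F

action_logits, conf_pred = model(features, timeframe='1h')
action_loss = F.cross_entropy(action_logits, target_actions)            # class indices 0..2
confidence_loss = F.mse_loss(conf_pred.squeeze(-1), target_confidence)  # targets in [0, 1]
loss = action_loss + 0.5 * confidence_loss
loss.backward()
```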
3. Enhanced RL Trainer (training/enhanced_rl_trainer.py)
Implements continuous learning from trading evaluations:
Key Features:
- Prioritized Experience Replay: Learns from important experiences first
- Market Regime Adaptation: Adjusts confidence based on market conditions
- Multi-Symbol Agents: Separate RL agents for each trading symbol
- Double DQN Architecture: Reduces overestimation bias
Agent Architecture:
```
# Main network
Linear(state_size -> 256) -> ReLU -> Dropout
Linear(256 -> 256)        -> ReLU -> Dropout
Linear(256 -> 128)        -> ReLU -> Dropout

# Dueling heads
value_head     = Linear(128 -> 1)
advantage_head = Linear(128 -> action_space)

# Q-values = V(s) + A(s,a) - mean(A(s,a))
```
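In PyTorch, the dueling architecture above can be sketched as follows. Layer sizes follow the pseudocode; the class name and dropout rate are assumptions, not the production agent.

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Sketch of the dueling network described above (names are illustrative)."""

    def __init__(self, state_size, action_space=3, dropout=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
        )
        self.value_head = nn.Linear(128, 1)
        self.advantage_head = nn.Linear(128, action_space)

    def forward(self, state):
        h = self.body(state)
        value = self.value_head(h)          # V(s)
        advantage = self.advantage_head(h)  # A(s, a)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return value + advantage - advantage.mean(dim=-1, keepdim=True)
```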
Learning Process:
- Store trading experiences with TD-error priorities
- Sample batches using prioritized replay (see the sampling sketch after this list)
- Train with Double DQN to reduce overestimation
- Update target networks periodically
- Adapt exploration (epsilon) based on market regime stability
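Proportional prioritized sampling (priority proportional to |TD error|) can be sketched as below. This illustrative version uses a flat priority array rather than the sum-tree of the original PER paper, and the names are not from the codebase.

```python
import numpy as np

def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4):
    """Proportional prioritized sampling with importance-sampling weights."""
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = scaled / scaled.sum()
    indices = np.random.choice(len(probs), size=batch_size, p=probs)
    # Importance-sampling weights correct the bias of non-uniform sampling.
    weights = (len(probs) * probs[indices]) ** (-beta)
    weights /= weights.max()
    return indices, weights
```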
4. Market State and Feature Engineering
Market State Components:
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

import numpy as np

@dataclass
class MarketState:
    symbol: str
    timestamp: datetime
    prices: Dict[str, float]          # {timeframe: price}
    features: Dict[str, np.ndarray]   # {timeframe: feature_matrix}
    volatility: float
    volume: float
    trend_strength: float
    market_regime: str                # 'trending', 'ranging', or 'volatile'
```
Feature Engineering:
- OHLCV Data: Open, High, Low, Close, Volume for each timeframe
- Technical Indicators: RSI, MACD, Bollinger Bands, etc.
- Market Regime Detection: Automatic classification of market conditions
- Volatility Analysis: Real-time volatility calculations
- Volume Analysis: Volume ratio compared to historical averages
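As an illustration of the kind of features listed above (not the production pipeline), a pandas sketch computing RSI, realized volatility, and a volume ratio might look like this; the function name and window lengths are assumptions.

```python
import pandas as pd

def basic_features(ohlcv: pd.DataFrame, rsi_period=14, window=20) -> pd.DataFrame:
    """Illustrative versions of the features listed above (not the real pipeline)."""
    out = ohlcv.copy()
    # RSI via Wilder-style smoothed average gains and losses
    delta = out['close'].diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / rsi_period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / rsi_period, adjust=False).mean()
    out['rsi'] = 100 - 100 / (1 + gain / loss)
    # Realized volatility of simple returns
    out['volatility'] = out['close'].pct_change().rolling(window).std()
    # Volume relative to its recent average
    out['volume_ratio'] = out['volume'] / out['volume'].rolling(window).mean()
    return out
```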
System Workflow
1. Initialization Phase
```python
# Load configuration
config = get_config('config.yaml')

# Initialize components
data_provider = DataProvider(config)
orchestrator = EnhancedTradingOrchestrator(data_provider)
cnn_trainer = EnhancedCNNTrainer(config, orchestrator)
rl_trainer = EnhancedRLTrainer(config, orchestrator)

# Load existing models or create new ones
models = initialize_models(load_existing=True)
register_models_with_orchestrator(models)
```
2. Trading Loop
```python
while running:
    # 1. Gather market data for all symbols and timeframes
    market_states = await get_all_market_states()

    # 2. Generate CNN predictions for each timeframe
    for symbol in symbols:
        for timeframe in timeframes:
            prediction = cnn_model.predict_timeframe(features, timeframe)

    # 3. Combine timeframe predictions with weights
    combined_prediction = combine_timeframe_predictions(predictions)

    # 4. Consider symbol correlations
    coordinated_decision = coordinate_symbols(predictions, correlations)

    # 5. Apply confidence thresholds and risk management
    final_decision = apply_risk_management(coordinated_decision)

    # 6. Execute trades (or log decisions)
    execute_trading_decision(final_decision)

    # 7. Queue for RL evaluation
    queue_for_rl_evaluation(final_decision, market_state)
```
3. Continuous Learning Loop
```python
# RL learning (every hour)
async def rl_learning_loop():
    while running:
        # Evaluate past trading actions
        await evaluate_trading_outcomes()

        # Train RL agents on new experiences
        for symbol, agent in rl_agents.items():
            agent.replay()  # learn from prioritized experiences

        # Adapt to market regime changes
        adapt_to_market_conditions()

        await asyncio.sleep(3600)  # wait 1 hour

# CNN learning (every 6 hours)
async def cnn_learning_loop():
    while running:
        # Check for sufficient perfect moves
        perfect_moves = get_perfect_moves_for_training()
        if len(perfect_moves) >= 200:
            # Train the CNN on perfect moves
            training_report = train_cnn_on_perfect_moves(perfect_moves)

            # Update the registered model
            update_model_registry(trained_model)

        await asyncio.sleep(6 * 3600)  # wait 6 hours
```
Key Algorithms
1. Timeframe Prediction Combination
```python
def combine_timeframe_predictions(timeframe_predictions, symbol):
    action_scores = {'BUY': 0.0, 'SELL': 0.0, 'HOLD': 0.0}
    total_weight = 0.0

    timeframe_weights = {
        '1m': 0.05, '5m': 0.10, '15m': 0.15,
        '1h': 0.25, '4h': 0.25, '1d': 0.20
    }

    for pred in timeframe_predictions:
        # Weight each prediction by its timeframe importance and its confidence
        weight = timeframe_weights.get(pred.timeframe, 0.0) * pred.confidence
        action_scores[pred.action] += weight
        total_weight += weight

    if total_weight == 0.0:
        return 'HOLD', 0.0  # no usable predictions

    # Select the highest-scoring action and normalize its score into a confidence
    best_action = max(action_scores, key=action_scores.get)
    confidence = action_scores[best_action] / total_weight
    return best_action, confidence
```
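For example, a '1h' BUY at confidence 0.8 contributes 0.25 × 0.8 = 0.20 and a '4h' BUY at confidence 0.6 contributes 0.25 × 0.6 = 0.15, so BUY wins with combined confidence (0.20 + 0.15) / 0.35 = 1.0; adding a disagreeing '1d' SELL at confidence 0.5 (0.20 × 0.5 = 0.10) leaves BUY at 0.35 / 0.45 ≈ 0.78.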
2. Perfect Move Marking
```python
def mark_perfect_move(action, initial_state, final_state, reward, timeframe):
    # Determine the optimal action based on the realized outcome
    if reward > 0.02:    # significant positive outcome
        optimal_action = action.action                    # the action was correct
        optimal_confidence = min(0.95, abs(reward) * 10)
    elif reward < -0.02: # significant negative outcome
        optimal_action = opposite_action(action.action)   # should have done the opposite
        optimal_confidence = min(0.95, abs(reward) * 10)
    else:                # neutral outcome
        optimal_action = 'HOLD'                           # should have held
        optimal_confidence = 0.3

    # Create a perfect-move record for CNN training
    perfect_move = PerfectMove(
        symbol=action.symbol,
        timeframe=timeframe,
        timestamp=action.timestamp,
        optimal_action=optimal_action,
        confidence_should_have_been=optimal_confidence,
        market_state_before=initial_state,
        market_state_after=final_state,
        actual_outcome=reward
    )
    return perfect_move
```
3. RL Reward Calculation
```python
def calculate_reward(action, price_change, confidence):
    base_reward = 0.0

    # Reward based on action correctness
    if action == 'BUY' and price_change > 0:
        base_reward = price_change * 10        # reward proportional to the gain
    elif action == 'SELL' and price_change < 0:
        base_reward = abs(price_change) * 10   # reward for avoiding the loss
    elif action == 'HOLD':
        if abs(price_change) < 0.005:          # correct hold
            base_reward = 0.01
        else:                                  # missed opportunity
            base_reward = -0.01
    else:
        base_reward = -abs(price_change) * 5   # penalty for wrong actions

    # Scale by confidence
    confidence_multiplier = 0.5 + confidence   # maps confidence in [0, 1] to [0.5, 1.5]
    return base_reward * confidence_multiplier
```
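For example, a BUY followed by a +1% move at confidence 0.8 earns 0.01 × 10 × (0.5 + 0.8) = 0.13, while a SELL before the same move is penalized 0.01 × 5 × 1.3 = 0.065.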
Configuration and Deployment
1. Running the System
```bash
# Basic trading mode
python enhanced_trading_main.py --mode trade

# Training-only mode
python enhanced_trading_main.py --mode train

# Fresh start without loading existing models
python enhanced_trading_main.py --mode trade --no-load-models

# Custom configuration
python enhanced_trading_main.py --config custom_config.yaml
```
2. Key Configuration Parameters
```yaml
# Enhanced orchestrator settings
orchestrator:
  confidence_threshold: 0.6   # higher threshold for the enhanced system
  decision_frequency: 30      # faster decisions (every 30 seconds)

# CNN configuration
cnn:
  timeframes: ["1m", "5m", "15m", "1h", "4h", "1d"]
  confidence_threshold: 0.6
  model_dir: "models/enhanced_cnn"

# RL configuration
rl:
  hidden_size: 256
  buffer_size: 10000
  model_dir: "models/enhanced_rl"
  market_regime_weights:
    trending: 1.2
    ranging: 0.8
    volatile: 0.6
```
3. Memory Management
The system is designed to work within 8GB memory constraints:
- Total system limit: 8GB
- Per-model limit: 2GB
- Automatic memory cleanup every 30 minutes
- GPU memory management with dynamic allocation
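A cleanup pass of the kind described might look like the sketch below. The use of psutil for process memory, the CUDA cache call, and the function name are assumptions about the environment, not the system's actual implementation.

```python
import gc
import psutil
import torch

def periodic_cleanup(limit_gb: float = 8.0) -> None:
    """Illustrative memory-cleanup pass (assumes psutil and optional CUDA)."""
    gc.collect()                      # reclaim unreferenced Python objects
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # release cached GPU blocks back to the driver
    used_gb = psutil.Process().memory_info().rss / 1024 ** 3
    if used_gb > limit_gb:
        raise MemoryError(f"Process using {used_gb:.1f} GiB, above the {limit_gb} GiB budget")
```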
4. Monitoring and Logging
- Comprehensive logging with component-specific levels
- TensorBoard integration for training visualization
- Performance metrics tracking
- Memory usage monitoring
- Real-time decision logging with full reasoning
Performance Characteristics
Expected Behavior:
- Decision Frequency: 30-second intervals between decisions
- CNN Training: Every 6 hours when sufficient perfect moves available
- RL Training: Continuous learning every hour
- Memory Usage: <8GB total system usage
- Confidence Thresholds: 0.6+ for trading actions
Key Metrics:
- Decision Accuracy: Tracked via RL reward system
- Confidence Calibration: CNN confidence vs actual outcomes
- Symbol Correlation: ETH-BTC coordination effectiveness
- Training Progress: Loss curves and validation accuracy
- Market Adaptation: Performance across different regimes
Future Enhancements
- Additional Symbols: Easy extension to support more trading pairs
- Advanced Features: Sentiment analysis, news integration
- Risk Management: Portfolio-level risk optimization
- Backtesting: Historical performance evaluation
- Live Trading: Real exchange integration
- Model Ensembles: Multiple CNN/RL model combinations
This architecture provides a robust foundation for sophisticated algorithmic trading with continuous learning and adaptation capabilities.