
Enhanced Multi-Modal Trading Architecture Guide

Overview

This document describes the enhanced multi-modal trading system that implements sophisticated decision-making through coordinated CNN and RL modules. The system is designed to handle multi-timeframe analysis across multiple symbols (ETH, BTC) with continuous learning capabilities.

Architecture Components

1. Enhanced Trading Orchestrator (core/enhanced_orchestrator.py)

The heart of the system that coordinates all components:

Key Features:

  • Multi-Symbol Coordination: Makes decisions across ETH and BTC considering correlations
  • Timeframe Integration: Combines predictions from multiple timeframes (1m, 5m, 15m, 1h, 4h, 1d)
  • Perfect Move Marking: Identifies and marks optimal trading decisions for CNN training
  • RL Evaluation Loop: Evaluates trading outcomes to train RL agents

Data Structures:

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

@dataclass
class TimeframePrediction:
    timeframe: str
    action: str  # 'BUY', 'SELL', 'HOLD'
    confidence: float  # 0.0 to 1.0
    probabilities: Dict[str, float]
    timestamp: datetime
    market_features: Dict[str, float]

@dataclass
class TradingAction:
    symbol: str
    action: str
    quantity: float
    confidence: float
    price: float
    timestamp: datetime
    reasoning: Dict[str, Any]
    timeframe_analysis: List[TimeframePrediction]

Decision Making Process:

  1. Gather market states for all symbols and timeframes
  2. Get CNN predictions for each timeframe with confidence scores
  3. Combine timeframe predictions using weighted averaging
  4. Consider symbol correlations (ETH-BTC correlation ~0.85; see the sketch after this list)
  5. Apply confidence thresholds and risk management
  6. Generate coordinated trading decisions
  7. Queue actions for RL evaluation
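
A minimal sketch of step 4, assuming USDT-quoted symbol keys and a 50% damping factor when correlated symbols disagree (a simplified signature relative to the coordinate_symbols pseudocode later in this guide):

# Hypothetical illustration of correlation-aware coordination
ETH_BTC_CORRELATION = 0.85

def coordinate_symbols(combined_predictions):
    """combined_predictions maps symbol -> (action, confidence)."""
    eth_action, eth_conf = combined_predictions['ETH/USDT']
    btc_action, btc_conf = combined_predictions['BTC/USDT']
    if eth_action != btc_action:
        # Disagreement between highly correlated symbols lowers conviction
        damping = 1.0 - 0.5 * ETH_BTC_CORRELATION
        combined_predictions['ETH/USDT'] = (eth_action, eth_conf * damping)
        combined_predictions['BTC/USDT'] = (btc_action, btc_conf * damping)
    return combined_predictions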

2. Enhanced CNN Trainer (training/enhanced_cnn_trainer.py)

Implements supervised learning on marked perfect moves:

Key Features:

  • Perfect Move Dataset: Trains on historically optimal decisions
  • Timeframe-Specific Heads: Separate prediction heads for each timeframe
  • Confidence Prediction: Predicts both action and confidence simultaneously
  • Multi-Loss Training: Combines action classification and confidence regression

Network Architecture:

# Convolutional feature extraction
Conv1D(features=5, filters=64, kernel=3) -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=128, kernel=3) -> BatchNorm -> ReLU -> Dropout  
Conv1D(filters=256, kernel=3) -> BatchNorm -> ReLU -> Dropout
AdaptiveAvgPool1d(1)  # Global average pooling

# Timeframe-specific heads
for each timeframe:
    Linear(256 -> 128) -> ReLU -> Dropout
    Linear(128 -> 64) -> ReLU -> Dropout
    
    # Action prediction
    Linear(64 -> 3)  # BUY, HOLD, SELL
    
    # Confidence prediction  
    Linear(64 -> 32) -> ReLU -> Linear(32 -> 1) -> Sigmoid
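
A runnable PyTorch sketch of this architecture; layer sizes mirror the summary above, while the padding, dropout rates, and the MultiTimeframeCNN name are assumptions:

import torch.nn as nn

class MultiTimeframeCNN(nn.Module):
    def __init__(self, n_features=5, timeframes=('1m', '5m', '15m', '1h', '4h', '1d')):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.2),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
            nn.AdaptiveAvgPool1d(1),  # global average pooling -> (batch, 256, 1)
        )
        def make_head():
            return nn.ModuleDict({
                'trunk': nn.Sequential(
                    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.2)),
                'action': nn.Linear(64, 3),  # BUY, HOLD, SELL logits
                'confidence': nn.Sequential(
                    nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid()),
            })
        # One prediction head per timeframe
        self.heads = nn.ModuleDict({tf: make_head() for tf in timeframes})

    def forward(self, x, timeframe):
        # x: (batch, n_features, sequence_length)
        z = self.backbone(x).squeeze(-1)       # (batch, 256)
        h = self.heads[timeframe]['trunk'](z)  # (batch, 64)
        return self.heads[timeframe]['action'](h), self.heads[timeframe]['confidence'](h)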

Training Process:

  1. Collect perfect moves from orchestrator with known outcomes
  2. Create dataset with features, optimal actions, and target confidence
  3. Train with combined loss: action_loss + 0.5 * confidence_loss (sketched after this list)
  4. Use early stopping and model checkpointing
  5. Generate comprehensive training reports and visualizations
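
The combined loss in step 3 might look like this in PyTorch; cross-entropy for the action head and MSE for the confidence head are assumptions consistent with the classification/regression split described above:

import torch.nn.functional as F

def combined_loss(action_logits, action_targets, conf_pred, conf_targets):
    # Action classification (BUY/HOLD/SELL) plus confidence regression,
    # weighted as described: action_loss + 0.5 * confidence_loss
    action_loss = F.cross_entropy(action_logits, action_targets)
    confidence_loss = F.mse_loss(conf_pred.squeeze(-1), conf_targets)
    return action_loss + 0.5 * confidence_loss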

3. Enhanced RL Trainer (training/enhanced_rl_trainer.py)

Implements continuous learning from trading evaluations:

Key Features:

  • Prioritized Experience Replay: Learns from important experiences first
  • Market Regime Adaptation: Adjusts confidence based on market conditions
  • Multi-Symbol Agents: Separate RL agents for each trading symbol
  • Double DQN Architecture: Reduces overestimation bias

Agent Architecture:

# Main Network
Linear(state_size -> 256) -> ReLU -> Dropout
Linear(256 -> 256) -> ReLU -> Dropout
Linear(256 -> 128) -> ReLU -> Dropout

# Dueling heads
value_head = Linear(128 -> 1)
advantage_head = Linear(128 -> action_space)

# Q-values = V(s) + A(s,a) - mean(A(s,a))
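
A runnable PyTorch sketch of this dueling network (the DuelingDQN name and 0.2 dropout rate are assumptions):

import torch.nn as nn

class DuelingDQN(nn.Module):
    def __init__(self, state_size, action_space=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.2),
        )
        self.value_head = nn.Linear(128, 1)
        self.advantage_head = nn.Linear(128, action_space)

    def forward(self, state):
        h = self.body(state)
        value = self.value_head(h)          # V(s)
        advantage = self.advantage_head(h)  # A(s, a)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return value + advantage - advantage.mean(dim=-1, keepdim=True)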

Learning Process:

  1. Store trading experiences with TD-error priorities
  2. Sample batches using prioritized replay (see the buffer sketch after this list)
  3. Train with Double DQN to reduce overestimation
  4. Update target networks periodically
  5. Adapt exploration (epsilon) based on market regime stability
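
A minimal sketch of the prioritized buffer behind steps 1-2, using the standard proportional scheme with an alpha exponent (the class below is an illustration; the trainer's implementation may differ):

import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def add(self, experience, td_error):
        if len(self.buffer) >= self.capacity:  # drop oldest when full
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(experience)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        # Sample proportionally to priority, so high-TD-error
        # experiences are replayed more often
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx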

4. Market State and Feature Engineering

Market State Components:

from dataclasses import dataclass
from datetime import datetime
from typing import Dict

import numpy as np

@dataclass
class MarketState:
    symbol: str
    timestamp: datetime
    prices: Dict[str, float]  # {timeframe: price}
    features: Dict[str, np.ndarray]  # {timeframe: feature_matrix}
    volatility: float
    volume: float
    trend_strength: float
    market_regime: str  # 'trending', 'ranging', 'volatile'

Feature Engineering:

  • OHLCV Data: Open, High, Low, Close, Volume for each timeframe
  • Technical Indicators: RSI, MACD, Bollinger Bands, etc.
  • Market Regime Detection: Automatic classification of market conditions
  • Volatility Analysis: Real-time volatility calculations
  • Volume Analysis: Volume ratio compared to historical averages (see the example after this list)
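
For example, the volatility, volume-ratio, and RSI features might be computed along these lines (the 20- and 14-bar windows are assumptions):

import pandas as pd

def basic_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    # ohlcv columns: open, high, low, close, volume
    out = pd.DataFrame(index=ohlcv.index)
    returns = ohlcv['close'].pct_change()
    out['volatility'] = returns.rolling(20).std()    # realized volatility
    out['volume_ratio'] = ohlcv['volume'] / ohlcv['volume'].rolling(20).mean()
    delta = ohlcv['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out['rsi'] = 100 - 100 / (1 + gain / loss)       # simple RSI
    return out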

System Workflow

1. Initialization Phase

# Load configuration
config = get_config('config.yaml')

# Initialize components
data_provider = DataProvider(config)
orchestrator = EnhancedTradingOrchestrator(data_provider)
cnn_trainer = EnhancedCNNTrainer(config, orchestrator)
rl_trainer = EnhancedRLTrainer(config, orchestrator)

# Load existing models or create new ones
models = initialize_models(load_existing=True)
register_models_with_orchestrator(models)

2. Trading Loop

while running:
    # 1. Gather market data for all symbols and timeframes
    market_states = await get_all_market_states()
    
    # 2. Generate CNN predictions for each symbol and timeframe
    predictions = []
    for symbol in symbols:
        for timeframe in timeframes:
            prediction = cnn_model.predict_timeframe(features, timeframe)
            predictions.append(prediction)

    # 3. Combine timeframe predictions with weights
    combined_prediction = combine_timeframe_predictions(predictions)
    
    # 4. Consider symbol correlations
    coordinated_decision = coordinate_symbols(predictions, correlations)
    
    # 5. Apply confidence thresholds and risk management
    final_decision = apply_risk_management(coordinated_decision)
    
    # 6. Execute trades (or log decisions)
    execute_trading_decision(final_decision)
    
    # 7. Queue for RL evaluation
    queue_for_rl_evaluation(final_decision, market_state)

3. Continuous Learning Loop

# RL Learning (every hour)
async def rl_learning_loop():
    while running:
        # Evaluate past trading actions
        await evaluate_trading_outcomes()
        
        # Train RL agents on new experiences
        for symbol, agent in rl_agents.items():
            agent.replay()  # Learn from prioritized experiences
            
        # Adapt to market regime changes
        adapt_to_market_conditions()
        
        await asyncio.sleep(3600)  # Wait 1 hour

# CNN Learning (every 6 hours)  
async def cnn_learning_loop():
    while running:
        # Check for sufficient perfect moves
        perfect_moves = get_perfect_moves_for_training()
        
        if len(perfect_moves) >= 200:
            # Train CNN on perfect moves
            training_report = train_cnn_on_perfect_moves(perfect_moves)
            
            # Update registered model
            update_model_registry(trained_model)
            
        await asyncio.sleep(6 * 3600)  # Wait 6 hours

Key Algorithms

1. Timeframe Prediction Combination

def combine_timeframe_predictions(timeframe_predictions, symbol):
    action_scores = {'BUY': 0.0, 'SELL': 0.0, 'HOLD': 0.0}
    total_weight = 0.0

    timeframe_weights = {
        '1m': 0.05, '5m': 0.10, '15m': 0.15,
        '1h': 0.25, '4h': 0.25, '1d': 0.20
    }

    for pred in timeframe_predictions:
        weight = timeframe_weights[pred.timeframe] * pred.confidence
        action_scores[pred.action] += weight
        total_weight += weight

    if total_weight == 0.0:
        return 'HOLD', 0.0  # no confident predictions to act on

    # Normalize and select best action
    best_action = max(action_scores, key=action_scores.get)
    confidence = action_scores[best_action] / total_weight

    return best_action, confidence
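
For example, given a 1h BUY prediction at confidence 0.8 (weight 0.25 × 0.8 = 0.20) and a 1d SELL prediction at confidence 0.5 (weight 0.20 × 0.5 = 0.10), the combined decision is BUY with confidence 0.20 / 0.30 ≈ 0.67.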

2. Perfect Move Marking

def opposite_action(action):
    # BUY <-> SELL; anything else maps to HOLD
    return {'BUY': 'SELL', 'SELL': 'BUY'}.get(action, 'HOLD')

def mark_perfect_move(action, initial_state, final_state, reward, timeframe):
    # Determine optimal action based on outcome
    if reward > 0.02:  # Significant positive outcome
        optimal_action = action.action  # Action was correct
        optimal_confidence = min(0.95, abs(reward) * 10)
    elif reward < -0.02:  # Significant negative outcome
        optimal_action = opposite_action(action.action)  # Should have done the opposite
        optimal_confidence = min(0.95, abs(reward) * 10)
    else:  # Neutral outcome
        optimal_action = 'HOLD'  # Should have held
        optimal_confidence = 0.3

    # Create perfect move for CNN training
    perfect_move = PerfectMove(
        symbol=action.symbol,
        timeframe=timeframe,
        timestamp=action.timestamp,
        optimal_action=optimal_action,
        confidence_should_have_been=optimal_confidence,
        market_state_before=initial_state,
        market_state_after=final_state,
        actual_outcome=reward
    )

    return perfect_move

3. RL Reward Calculation

def calculate_reward(action, price_change, confidence):
    base_reward = 0.0
    
    # Reward based on action correctness
    if action == 'BUY' and price_change > 0:
        base_reward = price_change * 10  # Reward proportional to gain
    elif action == 'SELL' and price_change < 0:
        base_reward = abs(price_change) * 10  # Reward for avoiding loss
    elif action == 'HOLD':
        if abs(price_change) < 0.005:  # Correct hold
            base_reward = 0.01
        else:  # Missed opportunity
            base_reward = -0.01
    else:
        base_reward = -abs(price_change) * 5  # Penalty for wrong actions
    
    # Scale by confidence
    confidence_multiplier = 0.5 + confidence  # 0.5 to 1.5 range
    return base_reward * confidence_multiplier
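
For example, a BUY followed by a +1% move (price_change = 0.01) at confidence 0.8 earns 0.01 × 10 × (0.5 + 0.8) = 0.13, while a SELL into the same move falls through to the penalty branch: -0.01 × 5 × 1.3 = -0.065.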

Configuration and Deployment

1. Running the System

# Basic trading mode
python enhanced_trading_main.py --mode trade

# Training only mode  
python enhanced_trading_main.py --mode train

# Fresh start without loading existing models
python enhanced_trading_main.py --mode trade --no-load-models

# Custom configuration
python enhanced_trading_main.py --config custom_config.yaml

2. Key Configuration Parameters

# Enhanced Orchestrator Settings
orchestrator:
  confidence_threshold: 0.6    # Higher threshold for enhanced system
  decision_frequency: 30       # Faster decisions (30 seconds)
  
# CNN Configuration
cnn:
  timeframes: ["1m", "5m", "15m", "1h", "4h", "1d"]
  confidence_threshold: 0.6
  model_dir: "models/enhanced_cnn"
  
# RL Configuration
rl:
  hidden_size: 256
  buffer_size: 10000
  model_dir: "models/enhanced_rl"
  market_regime_weights:
    trending: 1.2
    ranging: 0.8
    volatile: 0.6
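
A minimal sketch of reading these values, assuming a plain YAML load rather than the project's get_config helper:

import yaml

with open('config.yaml') as f:
    config = yaml.safe_load(f)

confidence_threshold = config['orchestrator']['confidence_threshold']  # 0.6
timeframes = config['cnn']['timeframes']                               # ['1m', ..., '1d']
regime_weights = config['rl']['market_regime_weights']                 # {'trending': 1.2, ...}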

3. Memory Management

The system is designed to work within 8GB memory constraints:

  • Total system limit: 8GB
  • Per-model limit: 2GB
  • Automatic memory cleanup every 30 minutes (sketched after this list)
  • GPU memory management with dynamic allocation
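
A minimal sketch of what the periodic cleanup might look like; the gc/torch calls here are an assumption, not the system's actual routine:

import asyncio
import gc

import torch

async def memory_cleanup_loop():
    while running:
        gc.collect()                   # reclaim unreferenced Python objects
        if torch.cuda.is_available():
            torch.cuda.empty_cache()   # release cached GPU memory
        await asyncio.sleep(30 * 60)   # every 30 minutes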

4. Monitoring and Logging

  • Comprehensive logging with component-specific levels
  • TensorBoard integration for training visualization
  • Performance metrics tracking
  • Memory usage monitoring
  • Real-time decision logging with full reasoning

Performance Characteristics

Expected Behavior:

  1. Decision Frequency: 30-second intervals between decisions
  2. CNN Training: Every 6 hours when sufficient perfect moves available
  3. RL Training: Continuous learning every hour
  4. Memory Usage: <8GB total system usage
  5. Confidence Thresholds: 0.6+ for trading actions

Key Metrics:

  • Decision Accuracy: Tracked via RL reward system
  • Confidence Calibration: CNN confidence vs actual outcomes
  • Symbol Correlation: ETH-BTC coordination effectiveness
  • Training Progress: Loss curves and validation accuracy
  • Market Adaptation: Performance across different regimes

Future Enhancements

  1. Additional Symbols: Easy extension to support more trading pairs
  2. Advanced Features: Sentiment analysis, news integration
  3. Risk Management: Portfolio-level risk optimization
  4. Backtesting: Historical performance evaluation
  5. Live Trading: Real exchange integration
  6. Model Ensembles: Multiple CNN/RL model combinations

This architecture provides a robust foundation for sophisticated algorithmic trading with continuous learning and adaptation capabilities.