
Enhanced Multi-Modal Trading Architecture Guide

Overview

This document describes the enhanced multi-modal trading system that implements sophisticated decision-making through coordinated CNN and RL modules. The system is designed to handle multi-timeframe analysis across multiple symbols (ETH, BTC) with continuous learning capabilities.

Architecture Components

1. Enhanced Trading Orchestrator (core/enhanced_orchestrator.py)

The heart of the system that coordinates all components:

Key Features:

  • Multi-Symbol Coordination: Makes decisions across ETH and BTC considering correlations
  • Timeframe Integration: Combines predictions from multiple timeframes (1m, 5m, 15m, 1h, 4h, 1d)
  • Perfect Move Marking: Identifies and marks optimal trading decisions for CNN training
  • RL Evaluation Loop: Evaluates trading outcomes to train RL agents

Data Structures:

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

@dataclass
class TimeframePrediction:
    timeframe: str
    action: str  # 'BUY', 'SELL', 'HOLD'
    confidence: float  # 0.0 to 1.0
    probabilities: Dict[str, float]
    timestamp: datetime
    market_features: Dict[str, float]

@dataclass
class TradingAction:
    symbol: str
    action: str
    quantity: float
    confidence: float
    price: float
    timestamp: datetime
    reasoning: Dict[str, Any]
    timeframe_analysis: List[TimeframePrediction]

Decision Making Process:

  1. Gather market states for all symbols and timeframes
  2. Get CNN predictions for each timeframe with confidence scores
  3. Combine timeframe predictions using weighted averaging
  4. Consider symbol correlations (ETH-BTC correlation ~0.85; see the sketch after this list)
  5. Apply confidence thresholds and risk management
  6. Generate coordinated trading decisions
  7. Queue actions for RL evaluation
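
A minimal sketch of step 4, assuming USDT-quoted symbol keys and a 50% damping factor when correlated symbols disagree (a simplified signature relative to the coordinate_symbols pseudocode later in this guide):

# Hypothetical illustration of correlation-aware coordination
ETH_BTC_CORRELATION = 0.85

def coordinate_symbols(combined_predictions):
    """combined_predictions maps symbol -> (action, confidence)."""
    eth_action, eth_conf = combined_predictions['ETH/USDT']
    btc_action, btc_conf = combined_predictions['BTC/USDT']
    if eth_action != btc_action:
        # Disagreement between highly correlated symbols lowers conviction
        damping = 1.0 - 0.5 * ETH_BTC_CORRELATION
        combined_predictions['ETH/USDT'] = (eth_action, eth_conf * damping)
        combined_predictions['BTC/USDT'] = (btc_action, btc_conf * damping)
    return combined_predictions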

2. Enhanced CNN Trainer (training/enhanced_cnn_trainer.py)

Implements supervised learning on marked perfect moves:

Key Features:

  • Perfect Move Dataset: Trains on historically optimal decisions
  • Timeframe-Specific Heads: Separate prediction heads for each timeframe
  • Confidence Prediction: Predicts both action and confidence simultaneously
  • Multi-Loss Training: Combines action classification and confidence regression

Network Architecture:

# Convolutional feature extraction
Conv1D(features=5, filters=64, kernel=3) -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=128, kernel=3) -> BatchNorm -> ReLU -> Dropout  
Conv1D(filters=256, kernel=3) -> BatchNorm -> ReLU -> Dropout
AdaptiveAvgPool1d(1)  # Global average pooling

# Timeframe-specific heads
for each timeframe:
    Linear(256 -> 128) -> ReLU -> Dropout
    Linear(128 -> 64) -> ReLU -> Dropout
    
    # Action prediction
    Linear(64 -> 3)  # BUY, HOLD, SELL
    
    # Confidence prediction  
    Linear(64 -> 32) -> ReLU -> Linear(32 -> 1) -> Sigmoid
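
A runnable PyTorch sketch of this architecture; layer sizes mirror the summary above, while the padding, dropout rates, and the MultiTimeframeCNN name are assumptions:

import torch.nn as nn

class MultiTimeframeCNN(nn.Module):
    def __init__(self, n_features=5, timeframes=('1m', '5m', '15m', '1h', '4h', '1d')):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.2),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
            nn.AdaptiveAvgPool1d(1),  # global average pooling -> (batch, 256, 1)
        )
        def make_head():
            return nn.ModuleDict({
                'trunk': nn.Sequential(
                    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.2)),
                'action': nn.Linear(64, 3),  # BUY, HOLD, SELL logits
                'confidence': nn.Sequential(
                    nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid()),
            })
        # One prediction head per timeframe
        self.heads = nn.ModuleDict({tf: make_head() for tf in timeframes})

    def forward(self, x, timeframe):
        # x: (batch, n_features, sequence_length)
        z = self.backbone(x).squeeze(-1)       # (batch, 256)
        h = self.heads[timeframe]['trunk'](z)  # (batch, 64)
        return self.heads[timeframe]['action'](h), self.heads[timeframe]['confidence'](h)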

Training Process:

  1. Collect perfect moves from orchestrator with known outcomes
  2. Create dataset with features, optimal actions, and target confidence
  3. Train with combined loss: action_loss + 0.5 * confidence_loss (sketched after this list)
  4. Use early stopping and model checkpointing
  5. Generate comprehensive training reports and visualizations
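
The combined loss in step 3 might look like this in PyTorch; cross-entropy for the action head and MSE for the confidence head are assumptions consistent with the classification/regression split described above:

import torch.nn.functional as F

def combined_loss(action_logits, action_targets, conf_pred, conf_targets):
    # Action classification (BUY/HOLD/SELL) plus confidence regression,
    # weighted as described: action_loss + 0.5 * confidence_loss
    action_loss = F.cross_entropy(action_logits, action_targets)
    confidence_loss = F.mse_loss(conf_pred.squeeze(-1), conf_targets)
    return action_loss + 0.5 * confidence_loss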

3. Enhanced RL Trainer (training/enhanced_rl_trainer.py)

Implements continuous learning from trading evaluations:

Key Features:

  • Prioritized Experience Replay: Learns from important experiences first
  • Market Regime Adaptation: Adjusts confidence based on market conditions
  • Multi-Symbol Agents: Separate RL agents for each trading symbol
  • Double DQN Architecture: Reduces overestimation bias

Agent Architecture:

# Main Network
Linear(state_size -> 256) -> ReLU -> Dropout
Linear(256 -> 256) -> ReLU -> Dropout
Linear(256 -> 128) -> ReLU -> Dropout

# Dueling heads
value_head = Linear(128 -> 1)
advantage_head = Linear(128 -> action_space)

# Q-values = V(s) + A(s,a) - mean(A(s,a))
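
A runnable PyTorch sketch of this dueling network (the DuelingDQN name and 0.2 dropout rate are assumptions):

import torch.nn as nn

class DuelingDQN(nn.Module):
    def __init__(self, state_size, action_space=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.2),
        )
        self.value_head = nn.Linear(128, 1)
        self.advantage_head = nn.Linear(128, action_space)

    def forward(self, state):
        h = self.body(state)
        value = self.value_head(h)          # V(s)
        advantage = self.advantage_head(h)  # A(s, a)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return value + advantage - advantage.mean(dim=-1, keepdim=True)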

Learning Process:

  1. Store trading experiences with TD-error priorities
  2. Sample batches using prioritized replay (see the buffer sketch after this list)
  3. Train with Double DQN to reduce overestimation
  4. Update target networks periodically
  5. Adapt exploration (epsilon) based on market regime stability
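
A minimal sketch of the prioritized buffer behind steps 1-2, using the standard proportional scheme with an alpha exponent (the class below is an illustration; the trainer's implementation may differ):

import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def add(self, experience, td_error):
        if len(self.buffer) >= self.capacity:  # drop oldest when full
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(experience)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        # Sample proportionally to priority, so high-TD-error
        # experiences are replayed more often
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx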

4. Market State and Feature Engineering

Market State Components:

from dataclasses import dataclass
from datetime import datetime
from typing import Dict

import numpy as np

@dataclass
class MarketState:
    symbol: str
    timestamp: datetime
    prices: Dict[str, float]  # {timeframe: price}
    features: Dict[str, np.ndarray]  # {timeframe: feature_matrix}
    volatility: float
    volume: float
    trend_strength: float
    market_regime: str  # 'trending', 'ranging', 'volatile'

Feature Engineering:

  • OHLCV Data: Open, High, Low, Close, Volume for each timeframe
  • Technical Indicators: RSI, MACD, Bollinger Bands, etc.
  • Market Regime Detection: Automatic classification of market conditions
  • Volatility Analysis: Real-time volatility calculations
  • Volume Analysis: Volume ratio compared to historical averages (see the example after this list)
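
For example, the volatility, volume-ratio, and RSI features might be computed along these lines (the 20- and 14-bar windows are assumptions):

import pandas as pd

def basic_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    # ohlcv columns: open, high, low, close, volume
    out = pd.DataFrame(index=ohlcv.index)
    returns = ohlcv['close'].pct_change()
    out['volatility'] = returns.rolling(20).std()    # realized volatility
    out['volume_ratio'] = ohlcv['volume'] / ohlcv['volume'].rolling(20).mean()
    delta = ohlcv['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out['rsi'] = 100 - 100 / (1 + gain / loss)       # simple RSI
    return out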

System Workflow

1. Initialization Phase

# Load configuration
config = get_config('config.yaml')

# Initialize components
data_provider = DataProvider(config)
orchestrator = EnhancedTradingOrchestrator(data_provider)
cnn_trainer = EnhancedCNNTrainer(config, orchestrator)
rl_trainer = EnhancedRLTrainer(config, orchestrator)

# Load existing models or create new ones
models = initialize_models(load_existing=True)
register_models_with_orchestrator(models)

2. Trading Loop

while running:
    # 1. Gather market data for all symbols and timeframes
    market_states = await get_all_market_states()
    
    # 2. Generate CNN predictions for each symbol and timeframe
    predictions = []
    for symbol in symbols:
        for timeframe in timeframes:
            prediction = cnn_model.predict_timeframe(features, timeframe)
            predictions.append(prediction)

    # 3. Combine timeframe predictions with weights
    combined_prediction = combine_timeframe_predictions(predictions)
    
    # 4. Consider symbol correlations
    coordinated_decision = coordinate_symbols(predictions, correlations)
    
    # 5. Apply confidence thresholds and risk management
    final_decision = apply_risk_management(coordinated_decision)
    
    # 6. Execute trades (or log decisions)
    execute_trading_decision(final_decision)
    
    # 7. Queue for RL evaluation
    queue_for_rl_evaluation(final_decision, market_state)

3. Continuous Learning Loop

# RL Learning (every hour)
async def rl_learning_loop():
    while running:
        # Evaluate past trading actions
        await evaluate_trading_outcomes()
        
        # Train RL agents on new experiences
        for symbol, agent in rl_agents.items():
            agent.replay()  # Learn from prioritized experiences
            
        # Adapt to market regime changes
        adapt_to_market_conditions()
        
        await asyncio.sleep(3600)  # Wait 1 hour

# CNN Learning (every 6 hours)  
async def cnn_learning_loop():
    while running:
        # Check for sufficient perfect moves
        perfect_moves = get_perfect_moves_for_training()
        
        if len(perfect_moves) >= 200:
            # Train CNN on perfect moves
            training_report = train_cnn_on_perfect_moves(perfect_moves)
            
            # Update registered model
            update_model_registry(trained_model)
            
        await asyncio.sleep(6 * 3600)  # Wait 6 hours

Key Algorithms

1. Timeframe Prediction Combination

def combine_timeframe_predictions(timeframe_predictions, symbol):
    action_scores = {'BUY': 0.0, 'SELL': 0.0, 'HOLD': 0.0}
    total_weight = 0.0

    timeframe_weights = {
        '1m': 0.05, '5m': 0.10, '15m': 0.15,
        '1h': 0.25, '4h': 0.25, '1d': 0.20
    }

    for pred in timeframe_predictions:
        weight = timeframe_weights[pred.timeframe] * pred.confidence
        action_scores[pred.action] += weight
        total_weight += weight

    if total_weight == 0.0:
        return 'HOLD', 0.0  # no confident predictions to act on

    # Normalize and select best action
    best_action = max(action_scores, key=action_scores.get)
    confidence = action_scores[best_action] / total_weight

    return best_action, confidence
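
For example, given a 1h BUY prediction at confidence 0.8 (weight 0.25 × 0.8 = 0.20) and a 1d SELL prediction at confidence 0.5 (weight 0.20 × 0.5 = 0.10), the combined decision is BUY with confidence 0.20 / 0.30 ≈ 0.67.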

2. Perfect Move Marking

def opposite_action(action):
    # BUY <-> SELL; anything else maps to HOLD
    return {'BUY': 'SELL', 'SELL': 'BUY'}.get(action, 'HOLD')

def mark_perfect_move(action, initial_state, final_state, reward, timeframe):
    # Determine optimal action based on outcome
    if reward > 0.02:  # Significant positive outcome
        optimal_action = action.action  # Action was correct
        optimal_confidence = min(0.95, abs(reward) * 10)
    elif reward < -0.02:  # Significant negative outcome
        optimal_action = opposite_action(action.action)  # Should have done the opposite
        optimal_confidence = min(0.95, abs(reward) * 10)
    else:  # Neutral outcome
        optimal_action = 'HOLD'  # Should have held
        optimal_confidence = 0.3

    # Create perfect move for CNN training
    perfect_move = PerfectMove(
        symbol=action.symbol,
        timeframe=timeframe,
        timestamp=action.timestamp,
        optimal_action=optimal_action,
        confidence_should_have_been=optimal_confidence,
        market_state_before=initial_state,
        market_state_after=final_state,
        actual_outcome=reward
    )

    return perfect_move

3. RL Reward Calculation

def calculate_reward(action, price_change, confidence):
    base_reward = 0.0
    
    # Reward based on action correctness
    if action == 'BUY' and price_change > 0:
        base_reward = price_change * 10  # Reward proportional to gain
    elif action == 'SELL' and price_change < 0:
        base_reward = abs(price_change) * 10  # Reward for avoiding loss
    elif action == 'HOLD':
        if abs(price_change) < 0.005:  # Correct hold
            base_reward = 0.01
        else:  # Missed opportunity
            base_reward = -0.01
    else:
        base_reward = -abs(price_change) * 5  # Penalty for wrong actions
    
    # Scale by confidence
    confidence_multiplier = 0.5 + confidence  # 0.5 to 1.5 range
    return base_reward * confidence_multiplier
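
For example, a BUY followed by a +1% move (price_change = 0.01) at confidence 0.8 earns 0.01 × 10 × (0.5 + 0.8) = 0.13, while a SELL into the same move falls through to the penalty branch: -0.01 × 5 × 1.3 = -0.065.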

Configuration and Deployment

1. Running the System

# Basic trading mode
python enhanced_trading_main.py --mode trade

# Training only mode  
python enhanced_trading_main.py --mode train

# Fresh start without loading existing models
python enhanced_trading_main.py --mode trade --no-load-models

# Custom configuration
python enhanced_trading_main.py --config custom_config.yaml

2. Key Configuration Parameters

# Enhanced Orchestrator Settings
orchestrator:
  confidence_threshold: 0.6    # Higher threshold for enhanced system
  decision_frequency: 30       # Faster decisions (30 seconds)
  
# CNN Configuration
cnn:
  timeframes: ["1m", "5m", "15m", "1h", "4h", "1d"]
  confidence_threshold: 0.6
  model_dir: "models/enhanced_cnn"
  
# RL Configuration
rl:
  hidden_size: 256
  buffer_size: 10000
  model_dir: "models/enhanced_rl"
  market_regime_weights:
    trending: 1.2
    ranging: 0.8
    volatile: 0.6
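
A minimal sketch of reading these values, assuming a plain YAML load rather than the project's get_config helper:

import yaml

with open('config.yaml') as f:
    config = yaml.safe_load(f)

confidence_threshold = config['orchestrator']['confidence_threshold']  # 0.6
timeframes = config['cnn']['timeframes']                               # ['1m', ..., '1d']
regime_weights = config['rl']['market_regime_weights']                 # {'trending': 1.2, ...}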

3. Memory Management

The system is designed to work within 8GB memory constraints:

  • Total system limit: 8GB
  • Per-model limit: 2GB
  • Automatic memory cleanup every 30 minutes (sketched after this list)
  • GPU memory management with dynamic allocation
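
A minimal sketch of what the periodic cleanup might look like; the gc/torch calls here are an assumption, not the system's actual routine:

import asyncio
import gc

import torch

async def memory_cleanup_loop():
    while running:
        gc.collect()                   # reclaim unreferenced Python objects
        if torch.cuda.is_available():
            torch.cuda.empty_cache()   # release cached GPU memory
        await asyncio.sleep(30 * 60)   # every 30 minutes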

4. Monitoring and Logging

  • Comprehensive logging with component-specific levels
  • TensorBoard integration for training visualization
  • Performance metrics tracking
  • Memory usage monitoring
  • Real-time decision logging with full reasoning

Performance Characteristics

Expected Behavior:

  1. Decision Frequency: 30-second intervals between decisions
  2. CNN Training: Every 6 hours when sufficient perfect moves available
  3. RL Training: Continuous learning every hour
  4. Memory Usage: <8GB total system usage
  5. Confidence Thresholds: 0.6+ for trading actions

Key Metrics:

  • Decision Accuracy: Tracked via RL reward system
  • Confidence Calibration: CNN confidence vs actual outcomes
  • Symbol Correlation: ETH-BTC coordination effectiveness
  • Training Progress: Loss curves and validation accuracy
  • Market Adaptation: Performance across different regimes

Future Enhancements

  1. Additional Symbols: Easy extension to support more trading pairs
  2. Advanced Features: Sentiment analysis, news integration
  3. Risk Management: Portfolio-level risk optimization
  4. Backtesting: Historical performance evaluation
  5. Live Trading: Real exchange integration
  6. Model Ensembles: Multiple CNN/RL model combinations

This architecture provides a robust foundation for sophisticated algorithmic trading with continuous learning and adaptation capabilities.