# Enhanced Multi-Modal Trading Architecture Guide

## Overview

This document describes the enhanced multi-modal trading system, which implements sophisticated decision-making through coordinated CNN and RL modules. The system handles multi-timeframe analysis across multiple symbols (ETH, BTC) with continuous learning capabilities.

## Architecture Components

### 1. Enhanced Trading Orchestrator (`core/enhanced_orchestrator.py`)

The heart of the system, coordinating all other components:

**Key Features:**
- **Multi-Symbol Coordination**: Makes decisions across ETH and BTC, taking their correlation into account
- **Timeframe Integration**: Combines predictions from multiple timeframes (1m, 5m, 15m, 1h, 4h, 1d)
- **Perfect Move Marking**: Identifies and marks optimal trading decisions for CNN training
- **RL Evaluation Loop**: Evaluates trading outcomes to train the RL agents

**Data Structures:**
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

@dataclass
class TimeframePrediction:
    timeframe: str
    action: str                        # 'BUY', 'SELL', 'HOLD'
    confidence: float                  # 0.0 to 1.0
    probabilities: Dict[str, float]    # per-action probabilities
    timestamp: datetime
    market_features: Dict[str, float]

@dataclass
class TradingAction:
    symbol: str
    action: str
    quantity: float
    confidence: float
    price: float
    timestamp: datetime
    reasoning: Dict[str, Any]
    timeframe_analysis: List[TimeframePrediction]
```

**Decision Making Process:**
1. Gather market states for all symbols and timeframes
2. Get CNN predictions for each timeframe, with confidence scores
3. Combine timeframe predictions using weighted averaging
4. Consider symbol correlations (ETH-BTC correlation ~0.85; see the sketch below)
5. Apply confidence thresholds and risk management
6. Generate coordinated trading decisions
7. Queue actions for RL evaluation
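
Step 4 is worth making concrete. Below is a minimal sketch of correlation-aware coordination; the damping rule and the `coordinate_symbols` signature are illustrative assumptions, not the exact logic in `core/enhanced_orchestrator.py`.

```python
# Assumed sketch: damp confidence when highly correlated symbols disagree.
CORRELATIONS = {('ETH', 'BTC'): 0.85}

def coordinate_symbols(decisions: dict) -> dict:
    """decisions maps symbol -> (action, confidence)."""
    coordinated = dict(decisions)
    for (sym_a, sym_b), corr in CORRELATIONS.items():
        if sym_a in decisions and sym_b in decisions:
            act_a, conf_a = decisions[sym_a]
            act_b, conf_b = decisions[sym_b]
            # Opposite directional calls on strongly correlated symbols
            # are suspect, so scale both confidences down.
            if {act_a, act_b} == {'BUY', 'SELL'}:
                coordinated[sym_a] = (act_a, conf_a * (1 - corr))
                coordinated[sym_b] = (act_b, conf_b * (1 - corr))
    return coordinated
```
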
### 2. Enhanced CNN Trainer (`training/enhanced_cnn_trainer.py`)

Implements supervised learning on marked perfect moves:

**Key Features:**
- **Perfect Move Dataset**: Trains on historically optimal decisions
- **Timeframe-Specific Heads**: Separate prediction heads for each timeframe
- **Confidence Prediction**: Predicts both action and confidence simultaneously
- **Multi-Loss Training**: Combines action classification and confidence regression

**Network Architecture:**
```python
# Convolutional feature extraction
Conv1D(features=5, filters=64, kernel=3) -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=128, kernel=3)            -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=256, kernel=3)            -> BatchNorm -> ReLU -> Dropout
AdaptiveAvgPool1d(1)  # global average pooling

# Timeframe-specific heads
for each timeframe:
    Linear(256 -> 128) -> ReLU -> Dropout
    Linear(128 -> 64)  -> ReLU -> Dropout

    # Action prediction
    Linear(64 -> 3)  # BUY, HOLD, SELL

    # Confidence prediction
    Linear(64 -> 32) -> ReLU -> Linear(32 -> 1) -> Sigmoid
```
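
For something executable, here is a minimal PyTorch sketch of a single timeframe head matching the layer sizes above; the class name `TimeframeHead` and the dropout rate are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TimeframeHead(nn.Module):
    """Maps the shared 256-d CNN features to action logits and a confidence."""
    def __init__(self, feature_dim: int = 256, dropout: float = 0.3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(dropout),
        )
        self.action = nn.Linear(64, 3)  # BUY, HOLD, SELL logits
        self.confidence = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.action(h), self.confidence(h).squeeze(-1)
```
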

**Training Process:**
1. Collect perfect moves, with known outcomes, from the orchestrator
2. Build a dataset of features, optimal actions, and target confidences
3. Train with the combined loss `action_loss + 0.5 * confidence_loss` (see the sketch below)
4. Use early stopping and model checkpointing
5. Generate comprehensive training reports and visualizations
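
The combined loss in step 3 is easy to write down; this sketch assumes cross-entropy for the action and mean-squared error against the `confidence_should_have_been` target.

```python
import torch.nn.functional as F

def combined_loss(action_logits, action_targets, conf_pred, conf_target):
    # Classification term for BUY/HOLD/SELL plus a regression term pulling
    # predicted confidence toward the confidence it 'should have been'.
    action_loss = F.cross_entropy(action_logits, action_targets)
    confidence_loss = F.mse_loss(conf_pred, conf_target)
    return action_loss + 0.5 * confidence_loss
```
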
### 3. Enhanced RL Trainer (`training/enhanced_rl_trainer.py`)

Implements continuous learning from trading evaluations:

**Key Features:**
- **Prioritized Experience Replay**: Learns from the most important experiences first
- **Market Regime Adaptation**: Adjusts confidence based on market conditions
- **Multi-Symbol Agents**: A separate RL agent for each trading symbol
- **Double DQN Architecture**: Reduces Q-value overestimation bias

**Agent Architecture:**
```python
# Main network
Linear(state_size -> 256) -> ReLU -> Dropout
Linear(256 -> 256)        -> ReLU -> Dropout
Linear(256 -> 128)        -> ReLU -> Dropout

# Dueling heads
value_head     = Linear(128 -> 1)
advantage_head = Linear(128 -> action_space)

# Q-values: Q(s, a) = V(s) + A(s, a) - mean(A(s, ·))
```
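
A minimal PyTorch sketch of the dueling combination (the module name `DuelingDQN` is illustrative):

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    def __init__(self, state_size: int, action_space: int, dropout: float = 0.2):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
        )
        self.value_head = nn.Linear(128, 1)
        self.advantage_head = nn.Linear(128, action_space)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        value = self.value_head(h)          # V(s)
        advantage = self.advantage_head(h)  # A(s, a)
        # Subtract the mean advantage so V and A are identifiable.
        return value + advantage - advantage.mean(dim=-1, keepdim=True)
```
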

**Learning Process:**
1. Store trading experiences with TD-error priorities
2. Sample batches using prioritized replay
3. Train with Double DQN to reduce overestimation (see the target computation below)
4. Update the target network periodically
5. Adapt exploration (epsilon) to market regime stability
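
Step 3's Double DQN update selects the next action with the online network but evaluates it with the target network; a minimal sketch (the function name is an assumption):

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones,
                      gamma: float = 0.99) -> torch.Tensor:
    # The online network chooses the greedy next action...
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # ...while the target network evaluates it, curbing overestimation bias.
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * next_q * (1.0 - dones)
```
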
### 4. Market State and Feature Engineering

**Market State Components:**
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

import numpy as np

@dataclass
class MarketState:
    symbol: str
    timestamp: datetime
    prices: Dict[str, float]         # {timeframe: price}
    features: Dict[str, np.ndarray]  # {timeframe: feature_matrix}
    volatility: float
    volume: float
    trend_strength: float
    market_regime: str               # 'trending', 'ranging', 'volatile'
```

**Feature Engineering:**
- **OHLCV Data**: Open, high, low, close, and volume for each timeframe
- **Technical Indicators**: RSI, MACD, Bollinger Bands, etc.
- **Market Regime Detection**: Automatic classification of market conditions
- **Volatility Analysis**: Real-time volatility calculations
- **Volume Analysis**: Volume ratio relative to historical averages (a sketch of these computations follows)
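
A hedged sketch of how a few of these feature families could be computed; the helper name `basic_features` and the lookback windows are assumptions, not the code in the repository.

```python
import numpy as np
import pandas as pd

def basic_features(ohlcv: pd.DataFrame, lookback: int = 20) -> pd.DataFrame:
    """Illustrative volatility, volume-ratio, and RSI features."""
    out = pd.DataFrame(index=ohlcv.index)
    # Volatility: rolling std of log returns
    log_ret = np.log(ohlcv['close']).diff()
    out['volatility'] = log_ret.rolling(lookback).std()
    # Volume ratio vs. the rolling historical average
    out['volume_ratio'] = ohlcv['volume'] / ohlcv['volume'].rolling(lookback).mean()
    # RSI, simple moving-average variant
    delta = ohlcv['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out['rsi'] = 100 - 100 / (1 + gain / loss)
    return out
```
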

## System Workflow

### 1. Initialization Phase
```python
# Load configuration
config = get_config('config.yaml')

# Initialize components
data_provider = DataProvider(config)
orchestrator = EnhancedTradingOrchestrator(data_provider)
cnn_trainer = EnhancedCNNTrainer(config, orchestrator)
rl_trainer = EnhancedRLTrainer(config, orchestrator)

# Load existing models or create new ones
models = initialize_models(load_existing=True)
register_models_with_orchestrator(models)
```

### 2. Trading Loop
```python
while running:
    # 1. Gather market data for all symbols and timeframes
    market_states = await get_all_market_states()

    # 2. Generate CNN predictions for each timeframe
    predictions = []
    for symbol in symbols:
        for timeframe in timeframes:
            predictions.append(cnn_model.predict_timeframe(features, timeframe))

    # 3. Combine timeframe predictions with weights
    combined_prediction = combine_timeframe_predictions(predictions)

    # 4. Consider symbol correlations
    coordinated_decision = coordinate_symbols(predictions, correlations)

    # 5. Apply confidence thresholds and risk management
    final_decision = apply_risk_management(coordinated_decision)

    # 6. Execute trades (or log decisions)
    execute_trading_decision(final_decision)

    # 7. Queue for RL evaluation
    queue_for_rl_evaluation(final_decision, market_state)
```

### 3. Continuous Learning Loop
```python
import asyncio

# RL learning (every hour)
async def rl_learning_loop():
    while running:
        # Evaluate past trading actions
        await evaluate_trading_outcomes()

        # Train RL agents on new experiences
        for symbol, agent in rl_agents.items():
            agent.replay()  # learn from prioritized experiences

        # Adapt to market regime changes
        adapt_to_market_conditions()

        await asyncio.sleep(3600)  # wait 1 hour

# CNN learning (every 6 hours)
async def cnn_learning_loop():
    while running:
        # Check for sufficient perfect moves
        perfect_moves = get_perfect_moves_for_training()

        if len(perfect_moves) >= 200:
            # Train the CNN on perfect moves
            training_report = train_cnn_on_perfect_moves(perfect_moves)

            # Update the registered model
            update_model_registry(trained_model)

        await asyncio.sleep(6 * 3600)  # wait 6 hours
```
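
The `agent.replay()` call above depends on prioritized experience replay; here is a minimal proportional-prioritization sketch (the class name, the α exponent, and the list-backed storage are simplifying assumptions).

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Sample probability proportional to |TD error| ** alpha."""
    def __init__(self, capacity: int = 10000, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.experiences, self.priorities = [], []

    def add(self, experience, td_error: float):
        if len(self.experiences) >= self.capacity:  # evict the oldest
            self.experiences.pop(0)
            self.priorities.pop(0)
        self.experiences.append(experience)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size: int):
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = np.random.choice(len(self.experiences), batch_size, p=probs)
        return [self.experiences[i] for i in idx]
```
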

## Key Algorithms

### 1. Timeframe Prediction Combination
```python
def combine_timeframe_predictions(timeframe_predictions, symbol):
    action_scores = {'BUY': 0.0, 'SELL': 0.0, 'HOLD': 0.0}
    total_weight = 0.0

    timeframe_weights = {
        '1m': 0.05, '5m': 0.10, '15m': 0.15,
        '1h': 0.25, '4h': 0.25, '1d': 0.20
    }

    for pred in timeframe_predictions:
        weight = timeframe_weights.get(pred.timeframe, 0.0) * pred.confidence
        action_scores[pred.action] += weight
        total_weight += weight

    # Normalize and select the best action (guard against an empty prediction set)
    best_action = max(action_scores, key=action_scores.get)
    confidence = action_scores[best_action] / total_weight if total_weight > 0 else 0.0

    return best_action, confidence
```
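
As a worked example: a 1h BUY at confidence 0.8 contributes 0.25 × 0.8 = 0.20, a 4h BUY at 0.6 contributes 0.25 × 0.6 = 0.15, and a 1m SELL at 0.9 contributes only 0.05 × 0.9 = 0.045. BUY wins, with combined confidence (0.20 + 0.15) / 0.395 ≈ 0.89.
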

### 2. Perfect Move Marking
```python
def mark_perfect_move(action, initial_state, final_state, reward):
    # Determine the optimal action in hindsight
    if reward > 0.02:     # significant positive outcome
        optimal_action = action.action  # the action taken was correct
        optimal_confidence = min(0.95, abs(reward) * 10)
    elif reward < -0.02:  # significant negative outcome
        optimal_action = opposite_action(action.action)  # should have done the opposite
        optimal_confidence = min(0.95, abs(reward) * 10)
    else:                 # neutral outcome
        optimal_action = 'HOLD'  # should have held
        optimal_confidence = 0.3

    # Create a perfect move for CNN training
    perfect_move = PerfectMove(
        symbol=action.symbol,
        timeframe=timeframe,
        timestamp=action.timestamp,
        optimal_action=optimal_action,
        confidence_should_have_been=optimal_confidence,
        market_state_before=initial_state,
        market_state_after=final_state,
        actual_outcome=reward
    )

    return perfect_move
```

### 3. RL Reward Calculation
```python
def calculate_reward(action, price_change, confidence):
    base_reward = 0.0

    # Reward based on action correctness
    if action == 'BUY' and price_change > 0:
        base_reward = price_change * 10        # reward proportional to the gain
    elif action == 'SELL' and price_change < 0:
        base_reward = abs(price_change) * 10   # reward for avoiding the loss
    elif action == 'HOLD':
        if abs(price_change) < 0.005:          # correct hold
            base_reward = 0.01
        else:                                  # missed opportunity
            base_reward = -0.01
    else:
        base_reward = -abs(price_change) * 5   # penalty for wrong actions

    # Scale by confidence: high-conviction calls win or lose more
    confidence_multiplier = 0.5 + confidence   # 0.5 to 1.5 range
    return base_reward * confidence_multiplier
```
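
For example, a BUY at confidence 0.8 ahead of a +1% move earns 0.01 × 10 × (0.5 + 0.8) = 0.13, while the same call ahead of a −1% move costs 0.01 × 5 × 1.3 = 0.065.
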

## Configuration and Deployment

### 1. Running the System
```bash
# Basic trading mode
python enhanced_trading_main.py --mode trade

# Training-only mode
python enhanced_trading_main.py --mode train

# Fresh start, without loading existing models
python enhanced_trading_main.py --mode trade --no-load-models

# Custom configuration
python enhanced_trading_main.py --config custom_config.yaml
```

### 2. Key Configuration Parameters
```yaml
# Enhanced orchestrator settings
orchestrator:
  confidence_threshold: 0.6  # higher threshold for the enhanced system
  decision_frequency: 30     # faster decisions (every 30 seconds)

# CNN configuration
cnn:
  timeframes: ["1m", "5m", "15m", "1h", "4h", "1d"]
  confidence_threshold: 0.6
  model_dir: "models/enhanced_cnn"

# RL configuration
rl:
  hidden_size: 256
  buffer_size: 10000
  model_dir: "models/enhanced_rl"
  market_regime_weights:
    trending: 1.2
    ranging: 0.8
    volatile: 0.6
```

### 3. Memory Management
The system is designed to work within an 8GB memory constraint:
- Total system limit: 8GB
- Per-model limit: 2GB
- Automatic memory cleanup every 30 minutes
- GPU memory management with dynamic allocation
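
A minimal sketch of the kind of periodic cleanup this implies, assuming PyTorch for GPU memory (the helper name `cleanup_memory` is illustrative):

```python
import gc
import torch

def cleanup_memory():
    """Release Python garbage and cached GPU blocks; schedule ~every 30 min."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```
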

### 4. Monitoring and Logging
- Comprehensive logging with component-specific levels
- TensorBoard integration for training visualization
- Performance metrics tracking
- Memory usage monitoring
- Real-time decision logging with full reasoning

## Performance Characteristics

### Expected Behavior:
1. **Decision Frequency**: 30-second intervals between decisions
2. **CNN Training**: Every 6 hours, once enough perfect moves are available
3. **RL Training**: Continuous learning, with replay every hour
4. **Memory Usage**: Under 8GB total
5. **Confidence Thresholds**: 0.6+ required for trading actions

### Key Metrics:
- **Decision Accuracy**: Tracked via the RL reward system
- **Confidence Calibration**: CNN confidence vs. actual outcomes
- **Symbol Correlation**: ETH-BTC coordination effectiveness
- **Training Progress**: Loss curves and validation accuracy
- **Market Adaptation**: Performance across different regimes

## Future Enhancements

1. **Additional Symbols**: Straightforward extension to more trading pairs
2. **Advanced Features**: Sentiment analysis and news integration
3. **Risk Management**: Portfolio-level risk optimization
4. **Backtesting**: Historical performance evaluation
5. **Live Trading**: Real exchange integration
6. **Model Ensembles**: Combinations of multiple CNN and RL models

This architecture provides a robust foundation for sophisticated algorithmic trading with continuous learning and adaptation.