gogo2/ENHANCED_ARCHITECTURE_GUIDE.md
Dobromir Popov 2f50ed920f new overhaul
2025-05-24 11:00:40 +03:00

# Enhanced Multi-Modal Trading Architecture Guide
## Overview
This document describes the enhanced multi-modal trading system, which makes trading decisions by coordinating a CNN module and an RL module. The system analyzes multiple timeframes across multiple symbols (ETH, BTC) and learns continuously from the outcomes of its own decisions.
## Architecture Components
### 1. Enhanced Trading Orchestrator (`core/enhanced_orchestrator.py`)
The orchestrator is the heart of the system; it coordinates all other components.

**Key Features:**
- **Multi-Symbol Coordination**: Makes decisions across ETH and BTC considering correlations
- **Timeframe Integration**: Combines predictions from multiple timeframes (1m, 5m, 15m, 1h, 4h, 1d)
- **Perfect Move Marking**: Identifies and marks optimal trading decisions for CNN training
- **RL Evaluation Loop**: Evaluates trading outcomes to train RL agents

**Data Structures:**
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class TimeframePrediction:
    timeframe: str
    action: str  # 'BUY', 'SELL', 'HOLD'
    confidence: float  # 0.0 to 1.0
    probabilities: Dict[str, float]
    timestamp: datetime
    market_features: Dict[str, float]


@dataclass
class TradingAction:
    symbol: str
    action: str
    quantity: float
    confidence: float
    price: float
    timestamp: datetime
    reasoning: Dict[str, Any]
    timeframe_analysis: List[TimeframePrediction]
```
**Decision Making Process:**
1. Gather market states for all symbols and timeframes
2. Get CNN predictions for each timeframe with confidence scores
3. Combine timeframe predictions using weighted averaging
4. Consider symbol correlations (ETH-BTC correlation ~0.85)
5. Apply confidence thresholds and risk management
6. Generate coordinated trading decisions
7. Queue actions for RL evaluation
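
Step 5 can be sketched as a simple gate: a combined decision whose confidence falls below the configured threshold is downgraded to HOLD. This is an illustrative sketch only; `apply_confidence_threshold` is a hypothetical helper, and the 0.6 default is taken from the `orchestrator.confidence_threshold` setting listed in the configuration section.

```python
from typing import Tuple


def apply_confidence_threshold(
    action: str, confidence: float, threshold: float = 0.6
) -> Tuple[str, float]:
    """Downgrade low-confidence BUY/SELL decisions to HOLD (hypothetical helper)."""
    if action != "HOLD" and confidence < threshold:
        return "HOLD", confidence
    return action, confidence


print(apply_confidence_threshold("BUY", 0.72))   # ('BUY', 0.72)
print(apply_confidence_threshold("SELL", 0.41))  # ('HOLD', 0.41)
```

The original confidence is returned unchanged so that downstream logging can still show how close the decision was to the threshold.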
### 2. Enhanced CNN Trainer (`training/enhanced_cnn_trainer.py`)
Implements supervised learning on marked perfect moves.

**Key Features:**
- **Perfect Move Dataset**: Trains on historically optimal decisions
- **Timeframe-Specific Heads**: Separate prediction heads for each timeframe
- **Confidence Prediction**: Predicts both action and confidence simultaneously
- **Multi-Loss Training**: Combines action classification and confidence regression

**Network Architecture:**
```python
# Convolutional feature extraction
Conv1D(features=5, filters=64, kernel=3) -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=128, kernel=3) -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=256, kernel=3) -> BatchNorm -> ReLU -> Dropout
AdaptiveAvgPool1d(1)  # Global average pooling

# Timeframe-specific heads
for each timeframe:
    Linear(256 -> 128) -> ReLU -> Dropout
    Linear(128 -> 64) -> ReLU -> Dropout
    # Action prediction
    Linear(64 -> 3)  # BUY, HOLD, SELL
    # Confidence prediction
    Linear(64 -> 32) -> ReLU -> Linear(32 -> 1) -> Sigmoid
```
**Training Process:**
1. Collect perfect moves from orchestrator with known outcomes
2. Create dataset with features, optimal actions, and target confidence
3. Train with combined loss: `action_loss + 0.5 * confidence_loss`
4. Use early stopping and model checkpointing
5. Generate comprehensive training reports and visualizations
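
The combined loss in step 3 can be illustrated for a single sample in plain Python. The guide does not name the individual loss functions, so this sketch assumes softmax cross-entropy for the action head and squared error for the confidence head; `combined_loss` and the `ACTIONS` ordering are hypothetical names chosen for the example.

```python
import math
from typing import Sequence

ACTIONS = ("BUY", "HOLD", "SELL")  # assumed order of the 3-way action head


def combined_loss(
    logits: Sequence[float],
    target_action: str,
    predicted_confidence: float,
    target_confidence: float,
    confidence_weight: float = 0.5,
) -> float:
    """Cross-entropy on the action plus weighted squared error on confidence."""
    # Numerically stable log-sum-exp over the three action logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    action_loss = log_sum_exp - logits[ACTIONS.index(target_action)]
    confidence_loss = (predicted_confidence - target_confidence) ** 2
    return action_loss + confidence_weight * confidence_loss


# Uniform logits and a confidence that is off by 0.4
print(combined_loss([0.0, 0.0, 0.0], "BUY", 0.9, 0.5))
```

With uniform logits the action term equals ln(3), and the confidence term contributes 0.5 * 0.4^2 = 0.08, so the weight of 0.5 keeps the regression term from dominating the classification term.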
### 3. Enhanced RL Trainer (`training/enhanced_rl_trainer.py`)
Implements continuous learning from trading evaluations.

**Key Features:**
- **Prioritized Experience Replay**: Learns from important experiences first
- **Market Regime Adaptation**: Adjusts confidence based on market conditions
- **Multi-Symbol Agents**: Separate RL agents for each trading symbol
- **Double DQN with Dueling Heads**: Double DQN targets reduce Q-value overestimation bias; the network splits into separate value and advantage heads

**Agent Architecture:**
```python
# Main Network
Linear(state_size -> 256) -> ReLU -> Dropout
Linear(256 -> 256) -> ReLU -> Dropout
Linear(256 -> 128) -> ReLU -> Dropout
# Dueling heads
value_head = Linear(128 -> 1)
advantage_head = Linear(128 -> action_space)
# Q-values = V(s) + A(s,a) - mean(A(s,a))
```
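The aggregation in the last comment line can be checked with plain numbers. This is a minimal sketch: `dueling_q_values` is a hypothetical helper name, and the three advantages stand for three illustrative actions.

```python
from typing import List, Sequence


def dueling_q_values(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q-values."""
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


# V(s) = 1.0 with advantages for three actions
print(dueling_q_values(1.0, [0.5, 0.0, -0.5]))  # [1.5, 1.0, 0.5]
```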
**Learning Process:**
1. Store trading experiences with TD-error priorities
2. Sample batches using prioritized replay
3. Train with Double DQN to reduce overestimation
4. Update target networks periodically
5. Adapt exploration (epsilon) based on market regime stability
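
Steps 2 and 3 can be sketched without a deep learning framework. The functions below are illustrative assumptions rather than the trainer's actual code: `double_dqn_target` and `sample_prioritized` are hypothetical names, and `gamma`, `alpha`, and `eps` are common defaults, not values taken from this guide.

```python
import random
from typing import List, Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    done: bool,
    gamma: float = 0.99,
) -> float:
    """Online network selects the next action; target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def sample_prioritized(
    td_errors: Sequence[float], batch_size: int, alpha: float = 0.6, eps: float = 1e-6
) -> List[int]:
    """Sample experience indices with probability proportional to |TD error|^alpha."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    return random.choices(range(len(td_errors)), weights=priorities, k=batch_size)


# The online network prefers action 1, so the target network's value for action 1 is used.
print(double_dqn_target(1.0, [0.2, 0.9], [0.5, 0.3], done=False, gamma=0.5))
print(sample_prioritized([0.01, 0.5, 2.0], batch_size=4))
```

Selecting the action with one network and scoring it with the other is what separates Double DQN from a plain DQN target, which would take the maximum of `next_q_target` directly.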
### 4. Market State and Feature Engineering
**Market State Components:**
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

import numpy as np


@dataclass
class MarketState:
    symbol: str
    timestamp: datetime
    prices: Dict[str, float]  # {timeframe: price}
    features: Dict[str, np.ndarray]  # {timeframe: feature_matrix}
    volatility: float
    volume: float
    trend_strength: float
    market_regime: str  # 'trending', 'ranging', 'volatile'
```
**Feature Engineering:**
- **OHLCV Data**: Open, High, Low, Close, Volume for each timeframe
- **Technical Indicators**: RSI, MACD, Bollinger Bands, etc.
- **Market Regime Detection**: Automatic classification of market conditions
- **Volatility Analysis**: Real-time volatility calculations
- **Volume Analysis**: Volume ratio compared to historical averages
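
A few of these features can be sketched with the standard library alone. The guide does not specify formulas or thresholds, so everything below is an assumption made for illustration: the helper names, the 20-bar lookback, and the `vol_threshold` and `trend_threshold` cut-offs are all hypothetical.

```python
import statistics
from typing import Sequence


def volume_ratio(volumes: Sequence[float], lookback: int = 20) -> float:
    """Latest volume divided by the mean of the preceding `lookback` volumes."""
    history = volumes[-(lookback + 1):-1]
    return volumes[-1] / statistics.fmean(history)


def realized_volatility(closes: Sequence[float]) -> float:
    """Population standard deviation of simple period-over-period returns."""
    returns = [b / a - 1.0 for a, b in zip(closes, closes[1:])]
    return statistics.pstdev(returns)


def classify_regime(
    volatility: float,
    trend_strength: float,
    vol_threshold: float = 0.03,
    trend_threshold: float = 0.5,
) -> str:
    """Map volatility and trend strength to one of the three regime labels."""
    if volatility > vol_threshold:
        return "volatile"
    if abs(trend_strength) > trend_threshold:
        return "trending"
    return "ranging"


print(volume_ratio([10, 10, 10, 10, 20], lookback=4))  # 2.0
print(classify_regime(volatility=0.01, trend_strength=0.8))  # trending
```

Checking volatility first means a strongly trending but erratic market is still labeled `volatile`, which matches the lower regime weight that label receives in the configuration.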
## System Workflow
### 1. Initialization Phase
```python
# Load configuration
config = get_config('config.yaml')
# Initialize components
data_provider = DataProvider(config)
orchestrator = EnhancedTradingOrchestrator(data_provider)
cnn_trainer = EnhancedCNNTrainer(config, orchestrator)
rl_trainer = EnhancedRLTrainer(config, orchestrator)
# Load existing models or create new ones
models = initialize_models(load_existing=True)
register_models_with_orchestrator(models)
```
### 2. Trading Loop
```python
async def trading_loop():
    while running:
        # 1. Gather market data for all symbols and timeframes
        #    (assumed here to be a dict of {symbol: MarketState})
        market_states = await get_all_market_states()
        # 2. Generate CNN predictions for each timeframe
        predictions = {}
        for symbol in symbols:
            features = market_states[symbol].features  # {timeframe: feature_matrix}
            predictions[symbol] = [
                cnn_model.predict_timeframe(features[timeframe], timeframe)
                for timeframe in timeframes
            ]
        # 3. Combine timeframe predictions with weights
        combined = {
            symbol: combine_timeframe_predictions(preds, symbol)
            for symbol, preds in predictions.items()
        }
        # 4. Consider symbol correlations
        coordinated_decision = coordinate_symbols(combined, correlations)
        # 5. Apply confidence thresholds and risk management
        final_decision = apply_risk_management(coordinated_decision)
        # 6. Execute trades (or log decisions)
        execute_trading_decision(final_decision)
        # 7. Queue for RL evaluation
        queue_for_rl_evaluation(final_decision, market_states)
```
### 3. Continuous Learning Loop
```python
import asyncio


# RL learning (every hour)
async def rl_learning_loop():
    while running:
        # Evaluate past trading actions
        await evaluate_trading_outcomes()
        # Train RL agents on new experiences
        for symbol, agent in rl_agents.items():
            agent.replay()  # Learn from prioritized experiences
        # Adapt to market regime changes
        adapt_to_market_conditions()
        await asyncio.sleep(3600)  # Wait 1 hour


# CNN learning (every 6 hours)
async def cnn_learning_loop():
    while running:
        # Check for sufficient perfect moves
        perfect_moves = get_perfect_moves_for_training()
        if len(perfect_moves) >= 200:
            # Train CNN on perfect moves
            training_report = train_cnn_on_perfect_moves(perfect_moves)
            # Update registered model (trained_model is the freshly trained CNN;
            # how the trainer exposes it is not shown here)
            update_model_registry(trained_model)
        await asyncio.sleep(6 * 3600)  # Wait 6 hours
```
## Key Algorithms
### 1. Timeframe Prediction Combination
```python
def combine_timeframe_predictions(timeframe_predictions, symbol):
    action_scores = {'BUY': 0.0, 'SELL': 0.0, 'HOLD': 0.0}
    total_weight = 0.0
    timeframe_weights = {
        '1m': 0.05, '5m': 0.10, '15m': 0.15,
        '1h': 0.25, '4h': 0.25, '1d': 0.20
    }
    for pred in timeframe_predictions:
        # Each vote counts by its timeframe weight times its own confidence
        weight = timeframe_weights[pred.timeframe] * pred.confidence
        action_scores[pred.action] += weight
        total_weight += weight
    # No predictions, or all with zero confidence: fall back to HOLD
    if total_weight == 0.0:
        return 'HOLD', 0.0
    # Normalize and select best action
    best_action = max(action_scores, key=action_scores.get)
    confidence = action_scores[best_action] / total_weight
    return best_action, confidence
```
### 2. Perfect Move Marking
```python
def mark_perfect_move(action, initial_state, final_state, reward, timeframe):
    # Determine optimal action based on outcome
    if reward > 0.02:  # Significant positive outcome
        optimal_action = action.action  # Action was correct
        optimal_confidence = min(0.95, abs(reward) * 10)
    elif reward < -0.02:  # Significant negative outcome
        optimal_action = opposite_action(action.action)  # Should have done opposite
        optimal_confidence = min(0.95, abs(reward) * 10)
    else:  # Neutral outcome
        optimal_action = 'HOLD'  # Should have held
        optimal_confidence = 0.3
    # Create perfect move for CNN training
    perfect_move = PerfectMove(
        symbol=action.symbol,
        timeframe=timeframe,
        timestamp=action.timestamp,
        optimal_action=optimal_action,
        confidence_should_have_been=optimal_confidence,
        market_state_before=initial_state,
        market_state_after=final_state,
        actual_outcome=reward
    )
    return perfect_move
```
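The function above relies on a `PerfectMove` container and an `opposite_action` helper that this guide does not define. A minimal sketch of both follows; the field names mirror the keyword arguments used above, while the field types and the handling of `HOLD` are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass
class PerfectMove:
    """Training sample for the CNN; fields mirror the keyword arguments above."""
    symbol: str
    timeframe: str
    timestamp: datetime
    optimal_action: str
    confidence_should_have_been: float
    market_state_before: Any  # assumed to be a MarketState
    market_state_after: Any   # assumed to be a MarketState
    actual_outcome: float


def opposite_action(action: str) -> str:
    """Swap BUY and SELL; any other action (e.g. HOLD) is returned unchanged."""
    return {"BUY": "SELL", "SELL": "BUY"}.get(action, action)


print(opposite_action("BUY"))   # SELL
print(opposite_action("HOLD"))  # HOLD
```

Returning `HOLD` unchanged is one possible choice; a losing `HOLD` has no single obvious opposite, since either `BUY` or `SELL` could have been the better move depending on the price direction.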
### 3. RL Reward Calculation
```python
def calculate_reward(action, price_change, confidence):
    base_reward = 0.0
    # Reward based on action correctness
    if action == 'BUY' and price_change > 0:
        base_reward = price_change * 10  # Reward proportional to gain
    elif action == 'SELL' and price_change < 0:
        base_reward = abs(price_change) * 10  # Reward for avoiding loss
    elif action == 'HOLD':
        if abs(price_change) < 0.005:  # Correct hold
            base_reward = 0.01
        else:  # Missed opportunity
            base_reward = -0.01
    else:
        base_reward = -abs(price_change) * 5  # Penalty for wrong actions
    # Scale by confidence
    confidence_multiplier = 0.5 + confidence  # 0.5 to 1.5 range
    return base_reward * confidence_multiplier
```
## Configuration and Deployment
### 1. Running the System
```bash
# Basic trading mode
python enhanced_trading_main.py --mode trade
# Training only mode
python enhanced_trading_main.py --mode train
# Fresh start without loading existing models
python enhanced_trading_main.py --mode trade --no-load-models
# Custom configuration
python enhanced_trading_main.py --config custom_config.yaml
```
### 2. Key Configuration Parameters
```yaml
# Enhanced Orchestrator Settings
orchestrator:
confidence_threshold: 0.6 # Higher threshold for enhanced system
decision_frequency: 30 # Faster decisions (30 seconds)
# CNN Configuration
cnn:
timeframes: ["1m", "5m", "15m", "1h", "4h", "1d"]
confidence_threshold: 0.6
model_dir: "models/enhanced_cnn"
# RL Configuration
rl:
hidden_size: 256
buffer_size: 10000
model_dir: "models/enhanced_rl"
market_regime_weights:
trending: 1.2
ranging: 0.8
volatile: 0.6
```
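The guide does not spell out how `market_regime_weights` is applied. One plausible reading, shown here purely as an assumption, is that the weight scales a decision's confidence before the threshold check; `adjust_confidence` is a hypothetical helper, and clamping to the 0.0 to 1.0 range is also an assumption.

```python
from typing import Mapping

# Values copied from the rl.market_regime_weights block above
MARKET_REGIME_WEIGHTS = {"trending": 1.2, "ranging": 0.8, "volatile": 0.6}


def adjust_confidence(
    confidence: float,
    market_regime: str,
    weights: Mapping[str, float] = MARKET_REGIME_WEIGHTS,
) -> float:
    """Scale confidence by the regime weight and clamp the result to [0.0, 1.0]."""
    scaled = confidence * weights.get(market_regime, 1.0)  # unknown regime: no scaling
    return max(0.0, min(1.0, scaled))


print(adjust_confidence(0.9, "trending"))  # 1.0 (1.08 clamped)
print(adjust_confidence(0.7, "volatile"))
```

Under this reading, a 0.7-confidence signal in a volatile market drops to roughly 0.42 and would no longer pass the 0.6 confidence threshold.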
### 3. Memory Management
The system is designed to work within 8GB memory constraints:
- Total system limit: 8GB
- Per-model limit: 2GB
- Automatic memory cleanup every 30 minutes
- GPU memory management with dynamic allocation
### 4. Monitoring and Logging
- Comprehensive logging with component-specific levels
- TensorBoard integration for training visualization
- Performance metrics tracking
- Memory usage monitoring
- Real-time decision logging with full reasoning
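
Component-specific log levels can be set up with the standard `logging` module. The sketch below is an assumption about how that might look: the logger names and the level chosen for each component are hypothetical and should be adapted to the real module layout.

```python
import logging

# Hypothetical component names and levels
COMPONENT_LOG_LEVELS = {
    "orchestrator": logging.INFO,
    "cnn_trainer": logging.DEBUG,
    "rl_trainer": logging.DEBUG,
    "data_provider": logging.WARNING,
}


def configure_logging(levels=COMPONENT_LOG_LEVELS) -> None:
    """Install one root handler and give each component logger its own level."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    for name, level in levels.items():
        logging.getLogger(name).setLevel(level)


configure_logging()
logging.getLogger("orchestrator").info("decision: BUY ETH, confidence=0.72")
```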
## Performance Characteristics
### Expected Behavior:
1. **Decision Frequency**: 30-second intervals between decisions
2. **CNN Training**: Every 6 hours when sufficient perfect moves available
3. **RL Training**: Continuous learning every hour
4. **Memory Usage**: <8GB total system usage
5. **Confidence Thresholds**: 0.6+ for trading actions
### Key Metrics:
- **Decision Accuracy**: Tracked via RL reward system
- **Confidence Calibration**: CNN confidence vs actual outcomes
- **Symbol Correlation**: ETH-BTC coordination effectiveness
- **Training Progress**: Loss curves and validation accuracy
- **Market Adaptation**: Performance across different regimes
## Future Enhancements
1. **Additional Symbols**: Easy extension to support more trading pairs
2. **Advanced Features**: Sentiment analysis, news integration
3. **Risk Management**: Portfolio-level risk optimization
4. **Backtesting**: Historical performance evaluation
5. **Live Trading**: Real exchange integration
6. **Model Ensembles**: Multiple CNN/RL model combinations

This architecture provides a robust foundation for sophisticated algorithmic trading with continuous learning and adaptation capabilities.