# Enhanced Multi-Modal Trading Architecture Guide

## Overview

This document describes the enhanced multi-modal trading system, which makes trading decisions by coordinating CNN and RL modules. The system performs multi-timeframe analysis across multiple symbols (ETH, BTC) and learns continuously from its own trading outcomes.

## Architecture Components

### 1. Enhanced Trading Orchestrator (`core/enhanced_orchestrator.py`)

The orchestrator is the heart of the system and coordinates all components.

**Key Features:**
- **Multi-Symbol Coordination**: Makes decisions across ETH and BTC considering correlations
- **Timeframe Integration**: Combines predictions from multiple timeframes (1m, 5m, 15m, 1h, 4h, 1d)
- **Perfect Move Marking**: Identifies and marks optimal trading decisions for CNN training
- **RL Evaluation Loop**: Evaluates trading outcomes to train RL agents

**Data Structures:**
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class TimeframePrediction:
    timeframe: str
    action: str  # 'BUY', 'SELL', 'HOLD'
    confidence: float  # 0.0 to 1.0
    probabilities: Dict[str, float]
    timestamp: datetime
    market_features: Dict[str, float]


@dataclass
class TradingAction:
    symbol: str
    action: str
    quantity: float
    confidence: float
    price: float
    timestamp: datetime
    reasoning: Dict[str, Any]
    timeframe_analysis: List[TimeframePrediction]
```

**Decision Making Process:**
1. Gather market states for all symbols and timeframes
2. Get CNN predictions for each timeframe with confidence scores
3. Combine timeframe predictions using weighted averaging
4. Consider symbol correlations (ETH-BTC correlation ~0.85)
5. Apply confidence thresholds and risk management
6. Generate coordinated trading decisions
7. Queue actions for RL evaluation

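Step 4 relies on a correlation estimate between the two symbols. The guide does not say how the ~0.85 figure is obtained, so the following is only a minimal sketch, assuming it is a Pearson correlation of period-over-period returns. The helper names (`pct_returns`, `pearson`) and the price series are made up for illustration.

```python
import math


def pct_returns(prices):
    """Simple period-over-period returns from a price series."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # Undefined for a flat series; treated as "no correlation" here
    return cov / math.sqrt(var_x * var_y)


# Made-up closing prices, for illustration only
eth_closes = [100.0, 102.0, 101.0, 104.0, 103.0]
btc_closes = [1000.0, 1015.0, 1010.0, 1030.0, 1025.0]
correlation = pearson(pct_returns(eth_closes), pct_returns(btc_closes))
```

A high value suggests that simultaneous BUY signals on both symbols are largely the same bet, which is one reason a coordinator might size the two positions together rather than independently.
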
### 2. Enhanced CNN Trainer (`training/enhanced_cnn_trainer.py`)

The CNN trainer implements supervised learning on marked perfect moves.

**Key Features:**
- **Perfect Move Dataset**: Trains on historically optimal decisions
- **Timeframe-Specific Heads**: Separate prediction heads for each timeframe
- **Confidence Prediction**: Predicts both action and confidence simultaneously
- **Multi-Loss Training**: Combines action classification and confidence regression

**Network Architecture:**
```python
# Convolutional feature extraction
Conv1D(features=5, filters=64, kernel=3) -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=128, kernel=3) -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=256, kernel=3) -> BatchNorm -> ReLU -> Dropout
AdaptiveAvgPool1d(1)  # Global average pooling

# Timeframe-specific heads
for each timeframe:
    Linear(256 -> 128) -> ReLU -> Dropout
    Linear(128 -> 64) -> ReLU -> Dropout

    # Action prediction
    Linear(64 -> 3)  # BUY, HOLD, SELL

    # Confidence prediction
    Linear(64 -> 32) -> ReLU -> Linear(32 -> 1) -> Sigmoid
```

**Training Process:**
1. Collect perfect moves from orchestrator with known outcomes
2. Create dataset with features, optimal actions, and target confidence
3. Train with combined loss: `action_loss + 0.5 * confidence_loss`
4. Use early stopping and model checkpointing
5. Generate comprehensive training reports and visualizations

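The combined loss in step 3 can be illustrated for a single sample. The guide gives only the weighting (`action_loss + 0.5 * confidence_loss`). This sketch assumes cross-entropy for the action term and squared error for the confidence term, and assumes the action order BUY, HOLD, SELL from the architecture listing. It uses plain Python rather than a deep learning framework so the arithmetic is easy to follow.

```python
import math

ACTIONS = ("BUY", "HOLD", "SELL")  # Order assumed to match the 3-way action head
CONFIDENCE_LOSS_WEIGHT = 0.5       # From `action_loss + 0.5 * confidence_loss`


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_loss(action_logits, target_action, predicted_conf, target_conf):
    """Cross-entropy on the action plus weighted squared error on confidence."""
    probs = softmax(action_logits)
    action_loss = -math.log(probs[ACTIONS.index(target_action)])
    confidence_loss = (predicted_conf - target_conf) ** 2
    return action_loss + CONFIDENCE_LOSS_WEIGHT * confidence_loss


# Made-up head outputs for one perfect move whose optimal action was BUY
loss = combined_loss([2.0, 0.5, -1.0], "BUY", predicted_conf=0.7, target_conf=0.9)
```

With the 0.5 weight, a confidence error of 0.2 adds only 0.02 to the loss, so the action classification term dominates unless the confidence head is badly miscalibrated.
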
### 3. Enhanced RL Trainer (`training/enhanced_rl_trainer.py`)

The RL trainer implements continuous learning from trading evaluations.

**Key Features:**
- **Prioritized Experience Replay**: Learns from important experiences first
- **Market Regime Adaptation**: Adjusts confidence based on market conditions
- **Multi-Symbol Agents**: Separate RL agents for each trading symbol
- **Double DQN Architecture**: Reduces overestimation bias

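Prioritized experience replay samples stored experiences in proportion to how surprising they were. As a rough sketch, assuming the standard proportional scheme where priority is `|TD error| + eps` raised to a power `alpha`: the values of `alpha` and `eps`, the function names, and the TD errors below are illustrative assumptions, not settings from the trainer.

```python
import random


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """P(i) = p_i^alpha / sum_k p_k^alpha, with priority p_i = |TD error| + eps."""
    scaled = [(abs(err) + eps) ** alpha for err in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(td_errors, batch_size, rng=random):
    """Draw experience indices (with replacement) in proportion to priority."""
    probs = sampling_probabilities(td_errors)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


td_errors = [0.05, 1.20, 0.30, 0.01]  # Made-up TD errors for four stored experiences
probs = sampling_probabilities(td_errors)
batch = sample_indices(td_errors, batch_size=2)
```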
**Agent Architecture:**
```python
# Main Network
Linear(state_size -> 256) -> ReLU -> Dropout
Linear(256 -> 256) -> ReLU -> Dropout
Linear(256 -> 128) -> ReLU -> Dropout

# Dueling heads
value_head = Linear(128 -> 1)
advantage_head = Linear(128 -> action_space)

# Q-values = V(s) + A(s,a) - mean(A(s,a))
```

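The dueling aggregation `Q = V(s) + A(s,a) - mean(A(s,a))` can be checked with a few numbers. This is a small sketch with made-up head outputs. The function name and the BUY, HOLD, SELL action order are assumptions for illustration.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


# Made-up head outputs for one state; action order assumed to be BUY, HOLD, SELL
q_values = dueling_q_values(state_value=1.0, advantages=[0.5, 0.0, -0.5])
# Subtracting the mean advantage keeps the average Q-value equal to V(s)
```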
**Learning Process:**
1. Store trading experiences with TD-error priorities
2. Sample batches using prioritized replay
3. Train with Double DQN to reduce overestimation
4. Update target networks periodically
5. Adapt exploration (epsilon) based on market regime stability

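Step 3 can be made concrete with the Double DQN target for one transition. In Double DQN the online network chooses the next action and the target network evaluates it. The sketch below follows that rule. The function name, the discount factor `gamma=0.99`, and the Q-values are illustrative assumptions.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)); y = r at episode end."""
    if done:
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...while its value is read from the target network
    return reward + gamma * next_q_target[best_action]


# Made-up Q-values for the next state from the two networks
target = double_dqn_target(
    reward=0.1,
    next_q_online=[0.2, 0.9, 0.4],
    next_q_target=[0.3, 0.5, 0.8],
)
```

Here the online network picks action index 1, so the target uses 0.5 from the target network. A plain DQN target would instead take the target network's own maximum (0.8), which is the overestimation that Double DQN is meant to dampen.
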
### 4. Market State and Feature Engineering

**Market State Components:**
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

import numpy as np


@dataclass
class MarketState:
    symbol: str
    timestamp: datetime
    prices: Dict[str, float]  # {timeframe: price}
    features: Dict[str, np.ndarray]  # {timeframe: feature_matrix}
    volatility: float
    volume: float
    trend_strength: float
    market_regime: str  # 'trending', 'ranging', 'volatile'
```

**Feature Engineering:**
- **OHLCV Data**: Open, High, Low, Close, Volume for each timeframe
- **Technical Indicators**: RSI, MACD, Bollinger Bands, etc.
- **Market Regime Detection**: Automatic classification of market conditions
- **Volatility Analysis**: Real-time volatility calculations
- **Volume Analysis**: Volume ratio compared to historical averages

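The guide does not specify how market regime detection works. As one possible illustration, the sketch below derives `volatility`, `trend_strength`, and a `market_regime` label from a window of closing prices. The formula for trend strength and both thresholds are assumptions chosen for the example, not values taken from the system.

```python
import statistics


def classify_regime(closes, volatility_limit=0.03, trend_limit=1.0):
    """Label a price window 'volatile', 'trending', or 'ranging'.

    Both thresholds are illustrative assumptions, not values from the system.
    """
    returns = [(b - a) / a for a, b in zip(closes, closes[1:])]
    volatility = statistics.pstdev(returns)
    # Trend strength: net move over the window relative to its typical step size
    net_move = (closes[-1] - closes[0]) / closes[0]
    trend_strength = abs(net_move) / (volatility * len(returns) ** 0.5 + 1e-12)
    if volatility > volatility_limit:
        regime = "volatile"
    elif trend_strength > trend_limit:
        regime = "trending"
    else:
        regime = "ranging"
    return regime, volatility, trend_strength


# Made-up price windows, one per regime
steady_climb = classify_regime([100, 101, 102, 103, 104, 105])
sideways = classify_regime([100, 101, 100, 101, 100, 101])
choppy = classify_regime([100, 110, 95, 108, 92, 105])
```

Checking volatility first means a window with large swings is labelled `volatile` even if it also ends far from where it started. A real detector could reasonably order these checks differently.
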
## System Workflow

### 1. Initialization Phase
```python
# Load configuration
config = get_config('config.yaml')

# Initialize components
data_provider = DataProvider(config)
orchestrator = EnhancedTradingOrchestrator(data_provider)
cnn_trainer = EnhancedCNNTrainer(config, orchestrator)
rl_trainer = EnhancedRLTrainer(config, orchestrator)

# Load existing models or create new ones
models = initialize_models(load_existing=True)
register_models_with_orchestrator(models)
```

### 2. Trading Loop
```python
async def trading_loop():
    while running:
        # 1. Gather market data for all symbols and timeframes
        #    (assumed here to be a dict of MarketState objects keyed by symbol)
        market_states = await get_all_market_states()

        symbol_decisions = {}
        for symbol in symbols:
            # 2. Generate CNN predictions for each timeframe
            predictions = []
            for timeframe in timeframes:
                features = market_states[symbol].features[timeframe]
                predictions.append(cnn_model.predict_timeframe(features, timeframe))

            # 3. Combine timeframe predictions with weights
            symbol_decisions[symbol] = combine_timeframe_predictions(predictions, symbol)

        # 4. Consider symbol correlations
        coordinated_decision = coordinate_symbols(symbol_decisions, correlations)

        # 5. Apply confidence thresholds and risk management
        final_decision = apply_risk_management(coordinated_decision)

        # 6. Execute trades (or log decisions)
        execute_trading_decision(final_decision)

        # 7. Queue for RL evaluation
        queue_for_rl_evaluation(final_decision, market_states)
```

### 3. Continuous Learning Loop
```python
import asyncio


# RL Learning (every hour)
async def rl_learning_loop():
    while running:
        # Evaluate past trading actions
        await evaluate_trading_outcomes()

        # Train RL agents on new experiences
        for symbol, agent in rl_agents.items():
            agent.replay()  # Learn from prioritized experiences

        # Adapt to market regime changes
        adapt_to_market_conditions()

        await asyncio.sleep(3600)  # Wait 1 hour


# CNN Learning (every 6 hours)
async def cnn_learning_loop():
    while running:
        # Check for sufficient perfect moves
        perfect_moves = get_perfect_moves_for_training()

        if len(perfect_moves) >= 200:
            # Train CNN on perfect moves
            training_report = train_cnn_on_perfect_moves(perfect_moves)

            # Update registered model
            # (`trained_model` is the model produced by the training step above)
            update_model_registry(trained_model)

        await asyncio.sleep(6 * 3600)  # Wait 6 hours
```

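The guide does not show how the two learning loops are started. Because both are coroutines, one plausible arrangement is to run them concurrently on one event loop with `asyncio.gather`. The sketch below uses a stand-in coroutine (`periodic`, a hypothetical name) with the intervals shrunk from hours to milliseconds so that it finishes quickly.

```python
import asyncio


async def periodic(name, interval_seconds, rounds, log):
    """Stand-in for a learning loop: do one unit of work, then sleep."""
    for _ in range(rounds):
        log.append(name)  # Placeholder for evaluation and training work
        await asyncio.sleep(interval_seconds)


async def main():
    log = []
    # Intervals shrunk from 3600 s and 6 * 3600 s so the sketch finishes quickly
    await asyncio.gather(
        periodic("rl", 0.01, rounds=6, log=log),
        periodic("cnn", 0.06, rounds=1, log=log),
    )
    return log


log = asyncio.run(main())
```

While one loop is sleeping, the event loop is free to run the other, so the hourly RL loop keeps cycling during the CNN loop's six-hour wait. A long blocking training call inside either coroutine would stall both, which is worth keeping in mind if training is not itself asynchronous.
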
## Key Algorithms

### 1. Timeframe Prediction Combination
```python
def combine_timeframe_predictions(timeframe_predictions, symbol):
    # `symbol` is not used in this simplified version
    action_scores = {'BUY': 0.0, 'SELL': 0.0, 'HOLD': 0.0}
    total_weight = 0.0

    timeframe_weights = {
        '1m': 0.05, '5m': 0.10, '15m': 0.15,
        '1h': 0.25, '4h': 0.25, '1d': 0.20
    }

    for pred in timeframe_predictions:
        weight = timeframe_weights[pred.timeframe] * pred.confidence
        action_scores[pred.action] += weight
        total_weight += weight

    # No usable predictions (empty input or all-zero confidence): avoid
    # dividing by zero and fall back to HOLD
    if total_weight == 0.0:
        return 'HOLD', 0.0

    # Normalize and select best action
    best_action = max(action_scores, key=action_scores.get)
    confidence = action_scores[best_action] / total_weight

    return best_action, confidence
```

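A worked example helps show how the weights interact with confidence. The sketch below restates the same weighting in a compact, self-contained form that takes `(timeframe, action, confidence)` tuples instead of `TimeframePrediction` objects. The function name `combine` and the input values are made up for illustration.

```python
TIMEFRAME_WEIGHTS = {"1m": 0.05, "5m": 0.10, "15m": 0.15, "1h": 0.25, "4h": 0.25, "1d": 0.20}


def combine(predictions):
    """predictions: iterable of (timeframe, action, confidence) tuples."""
    scores = {"BUY": 0.0, "SELL": 0.0, "HOLD": 0.0}
    for timeframe, action, confidence in predictions:
        scores[action] += TIMEFRAME_WEIGHTS[timeframe] * confidence
    total = sum(scores.values())
    if total == 0.0:
        return "HOLD", 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


action, confidence = combine([
    ("1m", "SELL", 0.9),  # 0.05 * 0.9 = 0.045
    ("1h", "BUY", 0.8),   # 0.25 * 0.8 = 0.200
    ("4h", "BUY", 0.6),   # 0.25 * 0.6 = 0.150
    ("1d", "HOLD", 0.5),  # 0.20 * 0.5 = 0.100
])
```

BUY accumulates 0.35 out of a total weight of 0.495, giving a combined confidence of about 0.71. The very confident 1m SELL signal contributes only 0.045, which shows how the weighting favours the longer timeframes.
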
### 2. Perfect Move Marking
```python
def mark_perfect_move(action, initial_state, final_state, reward, timeframe):
    # timeframe: the timeframe this move is attributed to
    #            (assumed to be supplied by the caller)
    # opposite_action: helper assumed to map 'BUY' <-> 'SELL'

    # Determine optimal action based on outcome
    if reward > 0.02:  # Significant positive outcome
        optimal_action = action.action  # Action was correct
        optimal_confidence = min(0.95, abs(reward) * 10)
    elif reward < -0.02:  # Significant negative outcome
        optimal_action = opposite_action(action.action)  # Should have done opposite
        optimal_confidence = min(0.95, abs(reward) * 10)
    else:  # Neutral outcome (-0.02 <= reward <= 0.02)
        optimal_action = 'HOLD'  # Should have held
        optimal_confidence = 0.3

    # Create perfect move for CNN training
    perfect_move = PerfectMove(
        symbol=action.symbol,
        timeframe=timeframe,
        timestamp=action.timestamp,
        optimal_action=optimal_action,
        confidence_should_have_been=optimal_confidence,
        market_state_before=initial_state,
        market_state_after=final_state,
        actual_outcome=reward
    )

    return perfect_move
```

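The listing above calls `opposite_action` without defining it. A minimal sketch of what such a helper might look like, assuming it simply swaps BUY and SELL. The treatment of HOLD is a guess, since the guide does not say what the opposite of holding should be.

```python
OPPOSITES = {"BUY": "SELL", "SELL": "BUY"}


def opposite_action(action: str) -> str:
    """Return the reverse of a directional action; HOLD has no opposite."""
    # Falling back to HOLD for anything non-directional is an assumption
    return OPPOSITES.get(action, "HOLD")
```

If rewards come from the reward calculation in the next section, the HOLD case should not reach this helper: a HOLD earns a base reward of ±0.01 scaled by a multiplier between 0.5 and 1.5, so its magnitude never exceeds 0.015 and always lands in the neutral band.
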
### 3. RL Reward Calculation
```python
def calculate_reward(action, price_change, confidence):
    base_reward = 0.0

    # Reward based on action correctness
    if action == 'BUY' and price_change > 0:
        base_reward = price_change * 10  # Reward proportional to gain
    elif action == 'SELL' and price_change < 0:
        base_reward = abs(price_change) * 10  # Reward for avoiding loss
    elif action == 'HOLD':
        if abs(price_change) < 0.005:  # Correct hold
            base_reward = 0.01
        else:  # Missed opportunity
            base_reward = -0.01
    else:
        base_reward = -abs(price_change) * 5  # Penalty for wrong actions

    # Scale by confidence
    confidence_multiplier = 0.5 + confidence  # 0.5 to 1.5 range
    return base_reward * confidence_multiplier
```

## Configuration and Deployment

### 1. Running the System
```bash
# Basic trading mode
python enhanced_trading_main.py --mode trade

# Training only mode
python enhanced_trading_main.py --mode train

# Fresh start without loading existing models
python enhanced_trading_main.py --mode trade --no-load-models

# Custom configuration
python enhanced_trading_main.py --config custom_config.yaml
```

### 2. Key Configuration Parameters
```yaml
# Enhanced Orchestrator Settings
orchestrator:
  confidence_threshold: 0.6  # Higher threshold for enhanced system
  decision_frequency: 30     # Faster decisions (30 seconds)

# CNN Configuration
cnn:
  timeframes: ["1m", "5m", "15m", "1h", "4h", "1d"]
  confidence_threshold: 0.6
  model_dir: "models/enhanced_cnn"

# RL Configuration
rl:
  hidden_size: 256
  buffer_size: 10000
  model_dir: "models/enhanced_rl"
  market_regime_weights:
    trending: 1.2
    ranging: 0.8
    volatile: 0.6
```

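The guide lists `market_regime_weights` and a `confidence_threshold` but does not say exactly how they are applied. One plausible reading, sketched here as an assumption, is that the regime weight scales the confidence before the threshold check. The function name `gate_action` and the cap at 1.0 are illustrative choices.

```python
REGIME_WEIGHTS = {"trending": 1.2, "ranging": 0.8, "volatile": 0.6}  # rl.market_regime_weights
CONFIDENCE_THRESHOLD = 0.6  # orchestrator.confidence_threshold


def gate_action(action, confidence, regime):
    """Scale confidence by the regime weight, then apply the trading threshold."""
    adjusted = min(1.0, confidence * REGIME_WEIGHTS.get(regime, 1.0))
    if action != "HOLD" and adjusted < CONFIDENCE_THRESHOLD:
        return "HOLD", adjusted  # Not confident enough to trade in this regime
    return action, adjusted


# The same raw signal passes in a trending market and is blocked in a volatile one
in_trend = gate_action("BUY", 0.7, "trending")
in_chop = gate_action("BUY", 0.7, "volatile")
```

Under this reading a raw confidence of 0.7 becomes about 0.84 in a trending market and about 0.42 in a volatile one, so only the former clears the 0.6 threshold.
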
### 3. Memory Management
The system is designed to work within 8GB memory constraints:
- Total system limit: 8GB
- Per-model limit: 2GB
- Automatic memory cleanup every 30 minutes
- GPU memory management with dynamic allocation

### 4. Monitoring and Logging
- Comprehensive logging with component-specific levels
- TensorBoard integration for training visualization
- Performance metrics tracking
- Memory usage monitoring
- Real-time decision logging with full reasoning

## Performance Characteristics

### Expected Behavior:
1. **Decision Frequency**: 30-second intervals between decisions
2. **CNN Training**: Every 6 hours when sufficient perfect moves are available
3. **RL Training**: Continuous learning every hour
4. **Memory Usage**: <8GB total system usage
5. **Confidence Thresholds**: 0.6+ for trading actions

### Key Metrics:
- **Decision Accuracy**: Tracked via RL reward system
- **Confidence Calibration**: CNN confidence vs actual outcomes
- **Symbol Correlation**: ETH-BTC coordination effectiveness
- **Training Progress**: Loss curves and validation accuracy
- **Market Adaptation**: Performance across different regimes

## Future Enhancements

1. **Additional Symbols**: Easy extension to support more trading pairs
2. **Advanced Features**: Sentiment analysis, news integration
3. **Risk Management**: Portfolio-level risk optimization
4. **Backtesting**: Historical performance evaluation
5. **Live Trading**: Real exchange integration
6. **Model Ensembles**: Multiple CNN/RL model combinations

This architecture provides a robust foundation for sophisticated algorithmic trading with continuous learning and adaptation capabilities.