# Enhanced Multi-Modal Trading Architecture Guide

## Overview

This document describes the enhanced multi-modal trading system, which implements sophisticated decision-making through coordinated CNN and RL modules. The system handles multi-timeframe analysis across multiple symbols (ETH, BTC) with continuous learning capabilities.

## Architecture Components

### 1. Enhanced Trading Orchestrator (`core/enhanced_orchestrator.py`)

The heart of the system, coordinating all other components:

**Key Features:**
- **Multi-Symbol Coordination**: Makes decisions across ETH and BTC, taking their correlation into account
- **Timeframe Integration**: Combines predictions from multiple timeframes (1m, 5m, 15m, 1h, 4h, 1d)
- **Perfect Move Marking**: Identifies and marks optimal trading decisions for CNN training
- **RL Evaluation Loop**: Evaluates trading outcomes to train the RL agents

**Data Structures:**
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

@dataclass
class TimeframePrediction:
    timeframe: str
    action: str                        # 'BUY', 'SELL', 'HOLD'
    confidence: float                  # 0.0 to 1.0
    probabilities: Dict[str, float]    # per-action probabilities
    timestamp: datetime
    market_features: Dict[str, float]

@dataclass
class TradingAction:
    symbol: str
    action: str
    quantity: float
    confidence: float
    price: float
    timestamp: datetime
    reasoning: Dict[str, Any]
    timeframe_analysis: List[TimeframePrediction]
```

**Decision Making Process:**
1. Gather market states for all symbols and timeframes
2. Get CNN predictions for each timeframe, with confidence scores
3. Combine timeframe predictions using weighted averaging
4. Consider symbol correlations (ETH-BTC correlation ~0.85; see the sketch below)
5. Apply confidence thresholds and risk management
6. Generate coordinated trading decisions
7. Queue actions for RL evaluation
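
Step 4 is worth making concrete. Below is a minimal sketch of correlation-aware coordination; the damping rule and the `coordinate_symbols` signature are illustrative assumptions, not the exact logic in `core/enhanced_orchestrator.py`.

```python
# Assumed sketch: damp confidence when highly correlated symbols disagree.
CORRELATIONS = {('ETH', 'BTC'): 0.85}

def coordinate_symbols(decisions: dict) -> dict:
    """decisions maps symbol -> (action, confidence)."""
    coordinated = dict(decisions)
    for (sym_a, sym_b), corr in CORRELATIONS.items():
        if sym_a in decisions and sym_b in decisions:
            act_a, conf_a = decisions[sym_a]
            act_b, conf_b = decisions[sym_b]
            # Opposite directional calls on strongly correlated symbols
            # are suspect, so scale both confidences down.
            if {act_a, act_b} == {'BUY', 'SELL'}:
                coordinated[sym_a] = (act_a, conf_a * (1 - corr))
                coordinated[sym_b] = (act_b, conf_b * (1 - corr))
    return coordinated
```
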
### 2. Enhanced CNN Trainer (`training/enhanced_cnn_trainer.py`)

Implements supervised learning on marked perfect moves:

**Key Features:**
- **Perfect Move Dataset**: Trains on historically optimal decisions
- **Timeframe-Specific Heads**: Separate prediction heads for each timeframe
- **Confidence Prediction**: Predicts both action and confidence simultaneously
- **Multi-Loss Training**: Combines action classification and confidence regression

**Network Architecture:**
```python
# Convolutional feature extraction
Conv1D(features=5, filters=64, kernel=3) -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=128, kernel=3)            -> BatchNorm -> ReLU -> Dropout
Conv1D(filters=256, kernel=3)            -> BatchNorm -> ReLU -> Dropout
AdaptiveAvgPool1d(1)  # global average pooling

# Timeframe-specific heads
for each timeframe:
    Linear(256 -> 128) -> ReLU -> Dropout
    Linear(128 -> 64)  -> ReLU -> Dropout

    # Action prediction
    Linear(64 -> 3)  # BUY, HOLD, SELL

    # Confidence prediction
    Linear(64 -> 32) -> ReLU -> Linear(32 -> 1) -> Sigmoid
```
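
For something executable, here is a minimal PyTorch sketch of a single timeframe head matching the layer sizes above; the class name `TimeframeHead` and the dropout rate are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TimeframeHead(nn.Module):
    """Maps the shared 256-d CNN features to action logits and a confidence."""
    def __init__(self, feature_dim: int = 256, dropout: float = 0.3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(dropout),
        )
        self.action = nn.Linear(64, 3)  # BUY, HOLD, SELL logits
        self.confidence = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.action(h), self.confidence(h).squeeze(-1)
```
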

**Training Process:**
1. Collect perfect moves, with known outcomes, from the orchestrator
2. Build a dataset of features, optimal actions, and target confidences
3. Train with the combined loss `action_loss + 0.5 * confidence_loss` (see the sketch below)
4. Use early stopping and model checkpointing
5. Generate comprehensive training reports and visualizations
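
The combined loss in step 3 is easy to write down; this sketch assumes cross-entropy for the action and mean-squared error against the `confidence_should_have_been` target.

```python
import torch.nn.functional as F

def combined_loss(action_logits, action_targets, conf_pred, conf_target):
    # Classification term for BUY/HOLD/SELL plus a regression term pulling
    # predicted confidence toward the confidence it 'should have been'.
    action_loss = F.cross_entropy(action_logits, action_targets)
    confidence_loss = F.mse_loss(conf_pred, conf_target)
    return action_loss + 0.5 * confidence_loss
```
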
### 3. Enhanced RL Trainer (`training/enhanced_rl_trainer.py`)

Implements continuous learning from trading evaluations:

**Key Features:**
- **Prioritized Experience Replay**: Learns from the most important experiences first
- **Market Regime Adaptation**: Adjusts confidence based on market conditions
- **Multi-Symbol Agents**: A separate RL agent for each trading symbol
- **Double DQN Architecture**: Reduces Q-value overestimation bias

**Agent Architecture:**
```python
# Main network
Linear(state_size -> 256) -> ReLU -> Dropout
Linear(256 -> 256)        -> ReLU -> Dropout
Linear(256 -> 128)        -> ReLU -> Dropout

# Dueling heads
value_head     = Linear(128 -> 1)
advantage_head = Linear(128 -> action_space)

# Q-values: Q(s, a) = V(s) + A(s, a) - mean(A(s, ·))
```
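
A minimal PyTorch sketch of the dueling combination (the module name `DuelingDQN` is illustrative):

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    def __init__(self, state_size: int, action_space: int, dropout: float = 0.2):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
        )
        self.value_head = nn.Linear(128, 1)
        self.advantage_head = nn.Linear(128, action_space)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        value = self.value_head(h)          # V(s)
        advantage = self.advantage_head(h)  # A(s, a)
        # Subtract the mean advantage so V and A are identifiable.
        return value + advantage - advantage.mean(dim=-1, keepdim=True)
```
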

**Learning Process:**
1. Store trading experiences with TD-error priorities
2. Sample batches using prioritized replay
3. Train with Double DQN to reduce overestimation (see the target computation below)
4. Update the target network periodically
5. Adapt exploration (epsilon) to market regime stability
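
Step 3's Double DQN update selects the next action with the online network but evaluates it with the target network; a minimal sketch (the function name is an assumption):

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones,
                      gamma: float = 0.99) -> torch.Tensor:
    # The online network chooses the greedy next action...
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # ...while the target network evaluates it, curbing overestimation bias.
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * next_q * (1.0 - dones)
```
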
### 4. Market State and Feature Engineering

**Market State Components:**
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

import numpy as np

@dataclass
class MarketState:
    symbol: str
    timestamp: datetime
    prices: Dict[str, float]         # {timeframe: price}
    features: Dict[str, np.ndarray]  # {timeframe: feature_matrix}
    volatility: float
    volume: float
    trend_strength: float
    market_regime: str               # 'trending', 'ranging', 'volatile'
```

**Feature Engineering:**
- **OHLCV Data**: Open, high, low, close, and volume for each timeframe
- **Technical Indicators**: RSI, MACD, Bollinger Bands, etc.
- **Market Regime Detection**: Automatic classification of market conditions
- **Volatility Analysis**: Real-time volatility calculations
- **Volume Analysis**: Volume ratio relative to historical averages (a sketch of these computations follows)
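
A hedged sketch of how a few of these feature families could be computed; the helper name `basic_features` and the lookback windows are assumptions, not the code in the repository.

```python
import numpy as np
import pandas as pd

def basic_features(ohlcv: pd.DataFrame, lookback: int = 20) -> pd.DataFrame:
    """Illustrative volatility, volume-ratio, and RSI features."""
    out = pd.DataFrame(index=ohlcv.index)
    # Volatility: rolling std of log returns
    log_ret = np.log(ohlcv['close']).diff()
    out['volatility'] = log_ret.rolling(lookback).std()
    # Volume ratio vs. the rolling historical average
    out['volume_ratio'] = ohlcv['volume'] / ohlcv['volume'].rolling(lookback).mean()
    # RSI, simple moving-average variant
    delta = ohlcv['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out['rsi'] = 100 - 100 / (1 + gain / loss)
    return out
```
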

## System Workflow

### 1. Initialization Phase
```python
# Load configuration
config = get_config('config.yaml')

# Initialize components
data_provider = DataProvider(config)
orchestrator = EnhancedTradingOrchestrator(data_provider)
cnn_trainer = EnhancedCNNTrainer(config, orchestrator)
rl_trainer = EnhancedRLTrainer(config, orchestrator)

# Load existing models or create new ones
models = initialize_models(load_existing=True)
register_models_with_orchestrator(models)
```

### 2. Trading Loop
```python
while running:
    # 1. Gather market data for all symbols and timeframes
    market_states = await get_all_market_states()

    # 2. Generate CNN predictions for each timeframe
    predictions = []
    for symbol in symbols:
        for timeframe in timeframes:
            predictions.append(cnn_model.predict_timeframe(features, timeframe))

    # 3. Combine timeframe predictions with weights
    combined_prediction = combine_timeframe_predictions(predictions)

    # 4. Consider symbol correlations
    coordinated_decision = coordinate_symbols(predictions, correlations)

    # 5. Apply confidence thresholds and risk management
    final_decision = apply_risk_management(coordinated_decision)

    # 6. Execute trades (or log decisions)
    execute_trading_decision(final_decision)

    # 7. Queue for RL evaluation
    queue_for_rl_evaluation(final_decision, market_state)
```

### 3. Continuous Learning Loop
```python
import asyncio

# RL learning (every hour)
async def rl_learning_loop():
    while running:
        # Evaluate past trading actions
        await evaluate_trading_outcomes()

        # Train RL agents on new experiences
        for symbol, agent in rl_agents.items():
            agent.replay()  # learn from prioritized experiences

        # Adapt to market regime changes
        adapt_to_market_conditions()

        await asyncio.sleep(3600)  # wait 1 hour

# CNN learning (every 6 hours)
async def cnn_learning_loop():
    while running:
        # Check for sufficient perfect moves
        perfect_moves = get_perfect_moves_for_training()

        if len(perfect_moves) >= 200:
            # Train the CNN on perfect moves
            training_report = train_cnn_on_perfect_moves(perfect_moves)

            # Update the registered model
            update_model_registry(trained_model)

        await asyncio.sleep(6 * 3600)  # wait 6 hours
```
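
The `agent.replay()` call above depends on prioritized experience replay; here is a minimal proportional-prioritization sketch (the class name, the α exponent, and the list-backed storage are simplifying assumptions).

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Sample probability proportional to |TD error| ** alpha."""
    def __init__(self, capacity: int = 10000, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.experiences, self.priorities = [], []

    def add(self, experience, td_error: float):
        if len(self.experiences) >= self.capacity:  # evict the oldest
            self.experiences.pop(0)
            self.priorities.pop(0)
        self.experiences.append(experience)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size: int):
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = np.random.choice(len(self.experiences), batch_size, p=probs)
        return [self.experiences[i] for i in idx]
```
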

## Key Algorithms

### 1. Timeframe Prediction Combination
```python
def combine_timeframe_predictions(timeframe_predictions, symbol):
    action_scores = {'BUY': 0.0, 'SELL': 0.0, 'HOLD': 0.0}
    total_weight = 0.0

    timeframe_weights = {
        '1m': 0.05, '5m': 0.10, '15m': 0.15,
        '1h': 0.25, '4h': 0.25, '1d': 0.20
    }

    for pred in timeframe_predictions:
        weight = timeframe_weights.get(pred.timeframe, 0.0) * pred.confidence
        action_scores[pred.action] += weight
        total_weight += weight

    # Normalize and select the best action (guard against an empty prediction set)
    best_action = max(action_scores, key=action_scores.get)
    confidence = action_scores[best_action] / total_weight if total_weight > 0 else 0.0

    return best_action, confidence
```
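
As a worked example: a 1h BUY at confidence 0.8 contributes 0.25 × 0.8 = 0.20, a 4h BUY at 0.6 contributes 0.25 × 0.6 = 0.15, and a 1m SELL at 0.9 contributes only 0.05 × 0.9 = 0.045. BUY wins, with combined confidence (0.20 + 0.15) / 0.395 ≈ 0.89.
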

### 2. Perfect Move Marking
```python
def mark_perfect_move(action, initial_state, final_state, reward):
    # Determine the optimal action in hindsight
    if reward > 0.02:     # significant positive outcome
        optimal_action = action.action  # the action taken was correct
        optimal_confidence = min(0.95, abs(reward) * 10)
    elif reward < -0.02:  # significant negative outcome
        optimal_action = opposite_action(action.action)  # should have done the opposite
        optimal_confidence = min(0.95, abs(reward) * 10)
    else:                 # neutral outcome
        optimal_action = 'HOLD'  # should have held
        optimal_confidence = 0.3

    # Create a perfect move for CNN training
    perfect_move = PerfectMove(
        symbol=action.symbol,
        timeframe=timeframe,
        timestamp=action.timestamp,
        optimal_action=optimal_action,
        confidence_should_have_been=optimal_confidence,
        market_state_before=initial_state,
        market_state_after=final_state,
        actual_outcome=reward
    )

    return perfect_move
```

### 3. RL Reward Calculation
```python
def calculate_reward(action, price_change, confidence):
    base_reward = 0.0

    # Reward based on action correctness
    if action == 'BUY' and price_change > 0:
        base_reward = price_change * 10        # reward proportional to the gain
    elif action == 'SELL' and price_change < 0:
        base_reward = abs(price_change) * 10   # reward for avoiding the loss
    elif action == 'HOLD':
        if abs(price_change) < 0.005:          # correct hold
            base_reward = 0.01
        else:                                  # missed opportunity
            base_reward = -0.01
    else:
        base_reward = -abs(price_change) * 5   # penalty for wrong actions

    # Scale by confidence: high-conviction calls win or lose more
    confidence_multiplier = 0.5 + confidence   # 0.5 to 1.5 range
    return base_reward * confidence_multiplier
```
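
For example, a BUY at confidence 0.8 ahead of a +1% move earns 0.01 × 10 × (0.5 + 0.8) = 0.13, while the same call ahead of a −1% move costs 0.01 × 5 × 1.3 = 0.065.
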

## Configuration and Deployment

### 1. Running the System
```bash
# Basic trading mode
python enhanced_trading_main.py --mode trade

# Training-only mode
python enhanced_trading_main.py --mode train

# Fresh start, without loading existing models
python enhanced_trading_main.py --mode trade --no-load-models

# Custom configuration
python enhanced_trading_main.py --config custom_config.yaml
```

### 2. Key Configuration Parameters
```yaml
# Enhanced orchestrator settings
orchestrator:
  confidence_threshold: 0.6  # higher threshold for the enhanced system
  decision_frequency: 30     # faster decisions (every 30 seconds)

# CNN configuration
cnn:
  timeframes: ["1m", "5m", "15m", "1h", "4h", "1d"]
  confidence_threshold: 0.6
  model_dir: "models/enhanced_cnn"

# RL configuration
rl:
  hidden_size: 256
  buffer_size: 10000
  model_dir: "models/enhanced_rl"
  market_regime_weights:
    trending: 1.2
    ranging: 0.8
    volatile: 0.6
```

### 3. Memory Management
The system is designed to work within an 8GB memory constraint:
- Total system limit: 8GB
- Per-model limit: 2GB
- Automatic memory cleanup every 30 minutes
- GPU memory management with dynamic allocation
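
A minimal sketch of the kind of periodic cleanup this implies, assuming PyTorch for GPU memory (the helper name `cleanup_memory` is illustrative):

```python
import gc
import torch

def cleanup_memory():
    """Release Python garbage and cached GPU blocks; schedule ~every 30 min."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```
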

### 4. Monitoring and Logging
- Comprehensive logging with component-specific levels
- TensorBoard integration for training visualization
- Performance metrics tracking
- Memory usage monitoring
- Real-time decision logging with full reasoning

## Performance Characteristics

### Expected Behavior:
1. **Decision Frequency**: 30-second intervals between decisions
2. **CNN Training**: Every 6 hours, once enough perfect moves are available
3. **RL Training**: Continuous learning, with replay every hour
4. **Memory Usage**: Under 8GB total
5. **Confidence Thresholds**: 0.6+ required for trading actions

### Key Metrics:
- **Decision Accuracy**: Tracked via the RL reward system
- **Confidence Calibration**: CNN confidence vs. actual outcomes
- **Symbol Correlation**: ETH-BTC coordination effectiveness
- **Training Progress**: Loss curves and validation accuracy
- **Market Adaptation**: Performance across different regimes

## Future Enhancements

1. **Additional Symbols**: Straightforward extension to more trading pairs
2. **Advanced Features**: Sentiment analysis and news integration
3. **Risk Management**: Portfolio-level risk optimization
4. **Backtesting**: Historical performance evaluation
5. **Live Trading**: Real exchange integration
6. **Model Ensembles**: Combinations of multiple CNN and RL models

This architecture provides a robust foundation for sophisticated algorithmic trading with continuous learning and adaptation.