initial model changes to fix performance
TODO_IMPROVEMENTS.md (new file, 224 lines)

# Cryptocurrency Trading System Improvements

## Overview

This document outlines necessary improvements to our cryptocurrency trading system to enhance performance, profitability, and monitoring capabilities.

## High Priority Tasks

### 1. GPU Utilization for Training

- [x] Fix GPU detection and utilization during training
  - [x] Debug why CUDA is detected but not utilized (logs showed "Starting training on device: cpu")
  - [x] Ensure PyTorch correctly detects and uses available CUDA devices
  - [x] Add GPU memory monitoring during training
  - [x] Optimize batch sizes for GPU training

Implementation status:

- Added a `setup_gpu()` function in `train_rl_with_realtime.py` to properly detect and configure GPU usage
- Added a device parameter to `DQNAgent` to ensure models are created on the correct device
- Implemented mixed precision training for faster GPU-based training
- Added GPU memory monitoring and logging to TensorBoard

### 2. Trade Signal Rate Display

- [x] Add metrics to track and display trading frequency
  - [x] Implement counters for actions per second/minute/hour
  - [x] Add a visualization to the chart showing trading frequency over time
  - [x] Create a moving average of trade signals to show trends
  - [x] Add a dashboard section showing current and average trading rates

Implementation status:

- Added trade time tracking in the `_add_trade_compat` function
- Added a `calculate_trade_rate` method to the `RealTimeChart` class
- Updated the dashboard layout to display trade rates
- Added a visualization of trade frequency in the chart's bottom panel
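The moving average of trade signals mentioned above can be sketched as a small timestamp tracker. `TradeRateTracker` and its window size are illustrative names for this sketch, not the actual `RealTimeChart` implementation:

```python
from collections import deque
from datetime import datetime, timedelta

class TradeRateTracker:
    """Hypothetical helper: tracks trade timestamps and derives a smoothed rate."""

    def __init__(self, window_minutes=5):
        self.trade_times = deque()  # timestamps, appended in chronological order
        self.window = timedelta(minutes=window_minutes)

    def record_trade(self, when=None):
        self.trade_times.append(when or datetime.now())

    def rate_per_minute(self, now=None):
        """Average trades per minute over the moving window."""
        now = now or datetime.now()
        cutoff = now - self.window
        # Drop timestamps that have fallen out of the window
        while self.trade_times and self.trade_times[0] < cutoff:
            self.trade_times.popleft()
        minutes = self.window.total_seconds() / 60
        return len(self.trade_times) / minutes
```

Averaging over a multi-minute window smooths out bursts that a raw per-second counter would exaggerate.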

### 3. Reward Function Optimization

- [x] Revise the reward function to better balance profit and risk
  - [x] Increase the transaction fee penalty for more realistic simulation
  - [x] Implement progressive rewards based on holding time
  - [x] Add a penalty for frequent trading (to reduce noise)
  - [x] Scale rewards based on market volatility
  - [x] Implement risk-adjusted returns (Sharpe ratio) in the reward calculation

Implementation status:

- Created `improved_reward_function.py` with the `ImprovedRewardCalculator` class
- Implemented the Sharpe ratio for risk-adjusted rewards
- Added a frequency penalty for excessive trading
- Added holding-time rewards for profitable positions
- Integrated with the `EnhancedRLTradingEnvironment` class

### 4. Multi-timeframe Price Direction Prediction

- [ ] Extend the CNN model to predict price direction for multiple timeframes
  - [ ] Modify the CNN output to predict short-, mid-, and long-term price directions
  - [ ] Create a data generation method for backpropagation using historical data
  - [ ] Implement real-time example generation for training
  - [ ] Feed direction predictions to the RL agent as additional state information
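One simple way to feed direction predictions to the RL agent (an assumption for this sketch, not the committed design) is to append the CNN's per-timeframe up-probabilities to the environment's observation vector:

```python
import numpy as np

def augment_state_with_directions(state, direction_probs):
    """Append per-timeframe up-probability predictions to the RL state vector.

    state: 1-D observation vector from the trading environment
    direction_probs: e.g. [p_up_short, p_up_mid, p_up_long] from the CNN
    """
    return np.concatenate([np.asarray(state, dtype=np.float32),
                           np.asarray(direction_probs, dtype=np.float32)])
```

The agent's input layer would need to grow by the number of appended probabilities, but no other environment changes are required.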

## Medium Priority Tasks

### 5. Position Sizing Optimization

- [ ] Implement dynamic position sizing based on confidence and volatility
  - [ ] Add a confidence score to model outputs
  - [ ] Scale position size based on prediction confidence
  - [ ] Implement the Kelly criterion for optimal position sizing
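The Kelly criterion mentioned above has a closed form: for win probability `p` and win/loss payoff ratio `b`, the optimal fraction of capital to risk is `f* = p - (1 - p) / b`. A minimal sketch (clipping negative-edge cases to zero, a common practical choice):

```python
def kelly_fraction(win_prob, win_loss_ratio):
    """Kelly criterion: f* = p - (1 - p) / b, clipped at 0 when there is no edge."""
    f = win_prob - (1.0 - win_prob) / win_loss_ratio
    return max(0.0, f)

# Example: 60% win rate with symmetric payoffs suggests risking 20% of capital
# (in practice, fractional Kelly such as f*/2 is usually preferred).
```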

### 6. Training Data Augmentation

- [ ] Implement data augmentation for more robust training
  - [ ] Simulate different market conditions
  - [ ] Add noise to training data
  - [ ] Generate synthetic data for rare market events
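The noise-augmentation subtask could be as simple as generating noisy copies of each price series, with noise proportional to the price level. This is a sketch under that assumption; the function name and scale are illustrative:

```python
import numpy as np

def augment_with_noise(prices, noise_scale=0.001, n_copies=4, seed=0):
    """Create noisy copies of a price series; noise is proportional to price level."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        # Gaussian noise with standard deviation noise_scale * price
        noise = rng.normal(0.0, noise_scale, size=prices.shape) * prices
        copies.append(prices + noise)
    return np.stack(copies)
```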

### 7. Model Interpretability

- [ ] Add visualizations of model decision making
  - [ ] Implement feature importance analysis
  - [ ] Add attention visualization for key price patterns
  - [ ] Create explainable AI components
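For the feature importance subtask, a model-agnostic option is permutation importance: shuffle one feature column at a time and measure how much the model's score drops. A minimal sketch, assuming `score_fn` wraps whatever scoring the trading models expose:

```python
import numpy as np

def permutation_importance(score_fn, X, y, seed=0):
    """Feature importance = drop in score when a feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = score_fn(X, y)
    importances = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])  # destroy the feature/target relationship
        importances.append(base - score_fn(Xp, y))
    return np.array(importances)
```

Features whose shuffling barely changes the score contribute little to the model's decisions.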

## Implementation Details

### Completed: Displaying Trade Rate

The trade rate display has been implemented in the `RealTimeChart` class:

```python
from datetime import datetime, timedelta

def calculate_trade_rate(self):
    """Calculate and return trading rate statistics based on recent trades"""
    if not hasattr(self, 'trade_times') or not self.trade_times:
        return {"per_second": 0, "per_minute": 0, "per_hour": 0}

    # Get the current time
    now = datetime.now()

    # Compute the time-window boundaries
    one_second_ago = now - timedelta(seconds=1)
    one_minute_ago = now - timedelta(minutes=1)
    one_hour_ago = now - timedelta(hours=1)

    # Count trades in each window
    trades_last_second = sum(1 for t in self.trade_times if t > one_second_ago)
    trades_last_minute = sum(1 for t in self.trade_times if t > one_minute_ago)
    trades_last_hour = sum(1 for t in self.trade_times if t > one_hour_ago)

    # Return the rates (trade counts per window)
    return {
        "per_second": trades_last_second,
        "per_minute": trades_last_minute,
        "per_hour": trades_last_hour
    }
```

### Completed: Improved Reward Function

The improved reward function has been implemented in `improved_reward_function.py`:

```python
def calculate_reward(self, action, price_change, position_held_time=0,
                     volatility=None, is_profitable=False):
    """
    Calculate the improved reward with risk adjustment
    """
    # Base trading fee
    fee = self.base_fee_rate

    # Penalty that grows with recent trading frequency
    frequency_penalty = self._calculate_frequency_penalty()

    # Base reward calculation
    if action == 0:  # BUY
        # Small penalty for the transaction plus the frequency penalty
        reward = -fee - frequency_penalty

    elif action == 1:  # SELL
        # Profit percentage minus fees (both entry and exit)
        profit_pct = price_change
        net_profit = profit_pct - (fee * 2)

        # Scale the reward and apply the frequency penalty
        reward = net_profit * 10  # Scale reward
        reward -= frequency_penalty

        # Record PnL for risk adjustment
        self.record_pnl(net_profit)

    else:  # HOLD
        # Small reward for holding a profitable position, small cost otherwise
        if is_profitable:
            reward = self._calculate_holding_reward(position_held_time, price_change)
        else:
            reward = -0.0001  # Very small negative reward

    # Apply risk adjustment if enabled
    if self.risk_adjusted:
        reward = self._calculate_risk_adjustment(reward)

    # Record this action for future frequency calculations
    self.record_trade(action=action)

    return reward
```
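`calculate_reward` relies on helpers (`_calculate_frequency_penalty`, `_calculate_risk_adjustment`) that are not shown here. A hedged sketch of how such helpers might look, with all window sizes, caps, and scaling factors being assumptions rather than the values used in `improved_reward_function.py`:

```python
import numpy as np
from collections import deque
from datetime import datetime, timedelta

class RewardHelpers:
    """Sketch of the helper methods referenced by calculate_reward (assumed shapes)."""

    def __init__(self, freq_window_seconds=60, max_frequency_penalty=0.005):
        self.trade_times = deque()
        self.returns = []
        self.freq_window = timedelta(seconds=freq_window_seconds)
        self.max_frequency_penalty = max_frequency_penalty

    def record_trade(self, when=None):
        self.trade_times.append(when or datetime.now())

    def record_pnl(self, pnl):
        self.returns.append(pnl)

    def _calculate_frequency_penalty(self, now=None):
        """Penalty grows with the number of trades in the recent window, capped."""
        now = now or datetime.now()
        recent = sum(1 for t in self.trade_times if t > now - self.freq_window)
        return min(self.max_frequency_penalty, 0.0005 * recent)

    def _calculate_risk_adjustment(self, reward):
        """Scale the reward by a rolling Sharpe-like factor of recent PnL."""
        if len(self.returns) < 10:
            return reward
        r = np.array(self.returns[-100:])
        std = r.std()
        if std == 0:
            return reward
        sharpe = r.mean() / std
        # Dampen rewards when recent risk-adjusted performance is poor
        return reward * float(np.clip(1.0 + 0.5 * sharpe, 0.5, 1.5))
```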

### Completed: GPU Optimization

GPU setup was added in `train_rl_with_realtime.py`:

```python
def setup_gpu():
    """
    Configure GPU usage for PyTorch training

    Returns:
        tuple: (success, device, message)
    """
    try:
        if torch.cuda.is_available():
            gpu_count = torch.cuda.device_count()
            device_info = [torch.cuda.get_device_name(i) for i in range(gpu_count)]
            logger.info(f"Found {gpu_count} GPU(s): {', '.join(device_info)}")

            device = torch.device("cuda:0")

            # Verify CUDA works by allocating a small tensor on the device
            test_tensor = torch.tensor([1.0, 2.0, 3.0], device=device)

            # Enable mixed precision if supported
            if hasattr(torch.cuda, 'amp') and torch.cuda.is_bf16_supported():
                logger.info("BFloat16 is supported - enabling for faster training")

            return True, device, f"GPU enabled: {device_info}"
        else:
            return False, torch.device("cpu"), "GPU not available, using CPU"
    except Exception as e:
        return False, torch.device("cpu"), f"GPU setup failed: {str(e)}"
```

### CNN Price Direction Prediction (To be implemented)

```python
import numpy as np

def generate_direction_examples(self, historical_data, timeframes=['1m', '1h', '1d']):
    """Generate price direction examples from historical data"""
    examples = []
    labels = []

    for tf in timeframes:
        df = historical_data[tf]
        for i in range(20, len(df) - 10):
            # Use a window of 20 candles as input
            window = df.iloc[i-20:i]

            # Label future price direction (next 5, 10, and 20 candles)
            future_5 = df.iloc[i].close < df.iloc[i+5].close  # True if price goes up
            future_10 = df.iloc[i].close < df.iloc[i+10].close
            future_20 = df.iloc[i].close < df.iloc[min(i+20, len(df)-1)].close

            examples.append(window.values)
            labels.append([future_5, future_10, future_20])

    return np.array(examples), np.array(labels)
```

## Validation Plan

After implementing these improvements, we should validate the system with:

1. Backtesting on historical data
2. Forward testing with small position sizes
3. A/B testing of different reward functions
4. Measuring the improvement in profitability and Sharpe ratio
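For step 4, the comparison metric can be computed directly from per-period returns. A minimal sketch, assuming a zero risk-free rate and daily periods:

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=365):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed 0)."""
    r = np.asarray(returns, dtype=float)
    std = r.std()
    if std == 0:
        return 0.0
    return float(r.mean() / std * np.sqrt(periods_per_year))
```

Comparing this value before and after each change (on the same backtest window) gives the A/B signal the plan calls for.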

## Progress Tracking

- Implementation started: June 2023
- GPU utilization fixed: July 2023
- Trade signal rate display implemented: July 2023
- Reward function optimized: July 2023
- CNN direction prediction added: to be completed
- Full system tested: to be completed