initial model changes to fix performance
TODO_IMPROVEMENTS.md (new file, 224 lines)

# Cryptocurrency Trading System Improvements

## Overview

This document outlines necessary improvements to our cryptocurrency trading system to enhance performance, profitability, and monitoring capabilities.

## High Priority Tasks

### 1. GPU Utilization for Training

- [x] Fix GPU detection and utilization during training
  - [x] Debug why CUDA is detected but not utilized (logs showed "Starting training on device: cpu")
  - [x] Ensure PyTorch correctly detects and uses available CUDA devices
  - [x] Add GPU memory monitoring during training
  - [x] Optimize batch sizes for GPU training

Implementation status:

- Added a `setup_gpu()` function in `train_rl_with_realtime.py` to properly detect and configure GPU usage
- Added a device parameter to `DQNAgent` to ensure models are created on the correct device
- Implemented mixed precision training for faster GPU-based training
- Added GPU memory monitoring and logging to TensorBoard

### 2. Trade Signal Rate Display

- [x] Add metrics to track and display trading frequency
  - [x] Implement counters for actions per second/minute/hour
  - [x] Add a visualization to the chart showing trading frequency over time
  - [x] Create a moving average of trade signals to show trends
  - [x] Add a dashboard section showing current and average trading rates

Implementation status:

- Added trade time tracking in the `_add_trade_compat` function
- Added a `calculate_trade_rate` method to the `RealTimeChart` class
- Updated the dashboard layout to display trade rates
- Added a visualization of trade frequency in the chart's bottom panel
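The moving average of trade signals mentioned above can be sketched as a small timestamp tracker. `TradeRateTracker` and its window size are illustrative names for this sketch, not the actual `RealTimeChart` implementation:

```python
from collections import deque
from datetime import datetime, timedelta

class TradeRateTracker:
    """Hypothetical helper: tracks trade timestamps and derives a smoothed rate."""

    def __init__(self, window_minutes=5):
        self.trade_times = deque()  # timestamps, appended in chronological order
        self.window = timedelta(minutes=window_minutes)

    def record_trade(self, when=None):
        self.trade_times.append(when or datetime.now())

    def rate_per_minute(self, now=None):
        """Average trades per minute over the moving window."""
        now = now or datetime.now()
        cutoff = now - self.window
        # Drop timestamps that have fallen out of the window
        while self.trade_times and self.trade_times[0] < cutoff:
            self.trade_times.popleft()
        minutes = self.window.total_seconds() / 60
        return len(self.trade_times) / minutes
```

Averaging over a multi-minute window smooths out bursts that a raw per-second counter would exaggerate.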

### 3. Reward Function Optimization

- [x] Revise the reward function to better balance profit and risk
  - [x] Increase the transaction fee penalty for more realistic simulation
  - [x] Implement progressive rewards based on holding time
  - [x] Add a penalty for frequent trading (to reduce noise)
  - [x] Scale rewards based on market volatility
  - [x] Implement risk-adjusted returns (Sharpe ratio) in the reward calculation

Implementation status:

- Created `improved_reward_function.py` with the `ImprovedRewardCalculator` class
- Implemented the Sharpe ratio for risk-adjusted rewards
- Added a frequency penalty for excessive trading
- Added holding-time rewards for profitable positions
- Integrated with the `EnhancedRLTradingEnvironment` class

### 4. Multi-timeframe Price Direction Prediction

- [ ] Extend the CNN model to predict price direction for multiple timeframes
  - [ ] Modify the CNN output to predict short-, mid-, and long-term price directions
  - [ ] Create a data generation method for backpropagation using historical data
  - [ ] Implement real-time example generation for training
  - [ ] Feed direction predictions to the RL agent as additional state information
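One simple way to feed direction predictions to the RL agent (an assumption for this sketch, not the committed design) is to append the CNN's per-timeframe up-probabilities to the environment's observation vector:

```python
import numpy as np

def augment_state_with_directions(state, direction_probs):
    """Append per-timeframe up-probability predictions to the RL state vector.

    state: 1-D observation vector from the trading environment
    direction_probs: e.g. [p_up_short, p_up_mid, p_up_long] from the CNN
    """
    return np.concatenate([np.asarray(state, dtype=np.float32),
                           np.asarray(direction_probs, dtype=np.float32)])
```

The agent's input layer would need to grow by the number of appended probabilities, but no other environment changes are required.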

## Medium Priority Tasks

### 5. Position Sizing Optimization

- [ ] Implement dynamic position sizing based on confidence and volatility
  - [ ] Add a confidence score to model outputs
  - [ ] Scale position size based on prediction confidence
  - [ ] Implement the Kelly criterion for optimal position sizing
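The Kelly criterion mentioned above has a closed form: for win probability `p` and win/loss payoff ratio `b`, the optimal fraction of capital to risk is `f* = p - (1 - p) / b`. A minimal sketch (clipping negative-edge cases to zero, a common practical choice):

```python
def kelly_fraction(win_prob, win_loss_ratio):
    """Kelly criterion: f* = p - (1 - p) / b, clipped at 0 when there is no edge."""
    f = win_prob - (1.0 - win_prob) / win_loss_ratio
    return max(0.0, f)

# Example: 60% win rate with symmetric payoffs suggests risking 20% of capital
# (in practice, fractional Kelly such as f*/2 is usually preferred).
```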

### 6. Training Data Augmentation

- [ ] Implement data augmentation for more robust training
  - [ ] Simulate different market conditions
  - [ ] Add noise to training data
  - [ ] Generate synthetic data for rare market events
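The noise-augmentation subtask could be as simple as generating noisy copies of each price series, with noise proportional to the price level. This is a sketch under that assumption; the function name and scale are illustrative:

```python
import numpy as np

def augment_with_noise(prices, noise_scale=0.001, n_copies=4, seed=0):
    """Create noisy copies of a price series; noise is proportional to price level."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        # Gaussian noise with standard deviation noise_scale * price
        noise = rng.normal(0.0, noise_scale, size=prices.shape) * prices
        copies.append(prices + noise)
    return np.stack(copies)
```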

### 7. Model Interpretability

- [ ] Add visualizations of model decision making
  - [ ] Implement feature importance analysis
  - [ ] Add attention visualization for key price patterns
  - [ ] Create explainable AI components
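For the feature importance subtask, a model-agnostic option is permutation importance: shuffle one feature column at a time and measure how much the model's score drops. A minimal sketch, assuming `score_fn` wraps whatever scoring the trading models expose:

```python
import numpy as np

def permutation_importance(score_fn, X, y, seed=0):
    """Feature importance = drop in score when a feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = score_fn(X, y)
    importances = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])  # destroy the feature/target relationship
        importances.append(base - score_fn(Xp, y))
    return np.array(importances)
```

Features whose shuffling barely changes the score contribute little to the model's decisions.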

## Implementation Details

### Completed: Displaying Trade Rate

The trade rate display has been implemented in the `RealTimeChart` class:

```python
from datetime import datetime, timedelta

def calculate_trade_rate(self):
    """Calculate and return trading rate statistics based on recent trades"""
    if not hasattr(self, 'trade_times') or not self.trade_times:
        return {"per_second": 0, "per_minute": 0, "per_hour": 0}

    # Get the current time
    now = datetime.now()

    # Compute the time-window boundaries
    one_second_ago = now - timedelta(seconds=1)
    one_minute_ago = now - timedelta(minutes=1)
    one_hour_ago = now - timedelta(hours=1)

    # Count trades in each window
    trades_last_second = sum(1 for t in self.trade_times if t > one_second_ago)
    trades_last_minute = sum(1 for t in self.trade_times if t > one_minute_ago)
    trades_last_hour = sum(1 for t in self.trade_times if t > one_hour_ago)

    # Return the rates (trade counts per window)
    return {
        "per_second": trades_last_second,
        "per_minute": trades_last_minute,
        "per_hour": trades_last_hour
    }
```

### Completed: Improved Reward Function

The improved reward function has been implemented in `improved_reward_function.py`:

```python
def calculate_reward(self, action, price_change, position_held_time=0,
                     volatility=None, is_profitable=False):
    """
    Calculate the improved reward with risk adjustment
    """
    # Base trading fee
    fee = self.base_fee_rate

    # Penalty that grows with recent trading frequency
    frequency_penalty = self._calculate_frequency_penalty()

    # Base reward calculation
    if action == 0:  # BUY
        # Small penalty for the transaction plus the frequency penalty
        reward = -fee - frequency_penalty

    elif action == 1:  # SELL
        # Profit percentage minus fees (both entry and exit)
        profit_pct = price_change
        net_profit = profit_pct - (fee * 2)

        # Scale the reward and apply the frequency penalty
        reward = net_profit * 10  # Scale reward
        reward -= frequency_penalty

        # Record PnL for risk adjustment
        self.record_pnl(net_profit)

    else:  # HOLD
        # Small reward for holding a profitable position, small cost otherwise
        if is_profitable:
            reward = self._calculate_holding_reward(position_held_time, price_change)
        else:
            reward = -0.0001  # Very small negative reward

    # Apply risk adjustment if enabled
    if self.risk_adjusted:
        reward = self._calculate_risk_adjustment(reward)

    # Record this action for future frequency calculations
    self.record_trade(action=action)

    return reward
```
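`calculate_reward` relies on helpers (`_calculate_frequency_penalty`, `_calculate_risk_adjustment`) that are not shown here. A hedged sketch of how such helpers might look, with all window sizes, caps, and scaling factors being assumptions rather than the values used in `improved_reward_function.py`:

```python
import numpy as np
from collections import deque
from datetime import datetime, timedelta

class RewardHelpers:
    """Sketch of the helper methods referenced by calculate_reward (assumed shapes)."""

    def __init__(self, freq_window_seconds=60, max_frequency_penalty=0.005):
        self.trade_times = deque()
        self.returns = []
        self.freq_window = timedelta(seconds=freq_window_seconds)
        self.max_frequency_penalty = max_frequency_penalty

    def record_trade(self, when=None):
        self.trade_times.append(when or datetime.now())

    def record_pnl(self, pnl):
        self.returns.append(pnl)

    def _calculate_frequency_penalty(self, now=None):
        """Penalty grows with the number of trades in the recent window, capped."""
        now = now or datetime.now()
        recent = sum(1 for t in self.trade_times if t > now - self.freq_window)
        return min(self.max_frequency_penalty, 0.0005 * recent)

    def _calculate_risk_adjustment(self, reward):
        """Scale the reward by a rolling Sharpe-like factor of recent PnL."""
        if len(self.returns) < 10:
            return reward
        r = np.array(self.returns[-100:])
        std = r.std()
        if std == 0:
            return reward
        sharpe = r.mean() / std
        # Dampen rewards when recent risk-adjusted performance is poor
        return reward * float(np.clip(1.0 + 0.5 * sharpe, 0.5, 1.5))
```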

### Completed: GPU Optimization

GPU setup was added in `train_rl_with_realtime.py`:

```python
def setup_gpu():
    """
    Configure GPU usage for PyTorch training

    Returns:
        tuple: (success, device, message)
    """
    try:
        if torch.cuda.is_available():
            gpu_count = torch.cuda.device_count()
            device_info = [torch.cuda.get_device_name(i) for i in range(gpu_count)]
            logger.info(f"Found {gpu_count} GPU(s): {', '.join(device_info)}")

            device = torch.device("cuda:0")

            # Verify CUDA works by allocating a small tensor on the device
            test_tensor = torch.tensor([1.0, 2.0, 3.0], device=device)

            # Enable mixed precision if supported
            if hasattr(torch.cuda, 'amp') and torch.cuda.is_bf16_supported():
                logger.info("BFloat16 is supported - enabling for faster training")

            return True, device, f"GPU enabled: {device_info}"
        else:
            return False, torch.device("cpu"), "GPU not available, using CPU"
    except Exception as e:
        return False, torch.device("cpu"), f"GPU setup failed: {str(e)}"
```

### CNN Price Direction Prediction (To be implemented)

```python
import numpy as np

def generate_direction_examples(self, historical_data, timeframes=['1m', '1h', '1d']):
    """Generate price direction examples from historical data"""
    examples = []
    labels = []

    for tf in timeframes:
        df = historical_data[tf]
        for i in range(20, len(df) - 10):
            # Use a window of 20 candles as input
            window = df.iloc[i-20:i]

            # Label future price direction (next 5, 10, and 20 candles)
            future_5 = df.iloc[i].close < df.iloc[i+5].close  # True if price goes up
            future_10 = df.iloc[i].close < df.iloc[i+10].close
            future_20 = df.iloc[i].close < df.iloc[min(i+20, len(df)-1)].close

            examples.append(window.values)
            labels.append([future_5, future_10, future_20])

    return np.array(examples), np.array(labels)
```

## Validation Plan

After implementing these improvements, we should validate the system with:

1. Backtesting on historical data
2. Forward testing with small position sizes
3. A/B testing of different reward functions
4. Measuring the improvement in profitability and Sharpe ratio
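For step 4, the comparison metric can be computed directly from per-period returns. A minimal sketch, assuming a zero risk-free rate and daily periods:

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=365):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed 0)."""
    r = np.asarray(returns, dtype=float)
    std = r.std()
    if std == 0:
        return 0.0
    return float(r.mean() / std * np.sqrt(periods_per_year))
```

Comparing this value before and after each change (on the same backtest window) gives the A/B signal the plan calls for.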

## Progress Tracking

- Implementation started: June 2023
- GPU utilization fixed: July 2023
- Trade signal rate display implemented: July 2023
- Reward function optimized: July 2023
- CNN direction prediction added: to be completed
- Full system tested: to be completed