
Cryptocurrency Trading System Improvements

Overview

This document outlines necessary improvements to our cryptocurrency trading system to enhance performance, profitability, and monitoring capabilities.

High Priority Tasks

1. GPU Utilization for Training

  • Fix GPU detection and utilization during training
    • Debug why CUDA is detected but not utilized (check logs showing "Starting training on device: cpu")
    • Ensure PyTorch correctly detects and uses available CUDA devices
    • Add GPU memory monitoring during training
    • Optimize batch sizes for GPU training

Implementation status:

  • Added setup_gpu() function in train_rl_with_realtime.py to properly detect and configure GPU usage
  • Added device parameter to DQNAgent to ensure models are created on the correct device
  • Implemented mixed precision training for faster GPU-based training (see the sketch below)
  • Added GPU memory monitoring and logging to TensorBoard
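
The mixed precision training mentioned above follows the standard torch.cuda.amp recipe. Below is a minimal sketch of the pattern, assuming generic model, optimizer, and loss_fn objects rather than the exact training code:

import torch

# Minimal mixed-precision training step (model/optimizer/loss_fn are hypothetical)
scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, loss_fn, states, targets, device):
    optimizer.zero_grad()
    # Run the forward pass in reduced precision where it is numerically safe
    with torch.cuda.amp.autocast():
        predictions = model(states.to(device))
        loss = loss_fn(predictions, targets.to(device))
    # Scale the loss to avoid gradient underflow in float16
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    # torch.cuda.memory_allocated() can be logged here for the memory monitoring
    return loss.item()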

2. Trade Signal Rate Display

  • Add metrics to track and display trading frequency
    • Implement counter for actions per second/minute/hour
    • Add visualization to the chart showing trading frequency over time
    • Create a moving average of trade signals to show trends
    • Add dashboard section showing current and average trading rates

Implementation status:

  • Added trade time tracking in _add_trade_compat function (see the sketch below)
  • Added calculate_trade_rate method to RealTimeChart class
  • Updated dashboard layout to display trade rates
  • Added visualization of trade frequency in chart's bottom panel
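
The recording side of this is what makes calculate_trade_rate (shown under Implementation Details) work. Below is a minimal sketch of the bookkeeping assumed to happen in _add_trade_compat; the helper name and pruning window are illustrative:

from datetime import datetime, timedelta

def _record_trade_time(self):
    """Append the current trade timestamp and prune stale entries
    (illustrative helper; the real tracking lives in _add_trade_compat)."""
    if not hasattr(self, 'trade_times'):
        self.trade_times = []
    now = datetime.now()
    self.trade_times.append(now)
    # Drop timestamps older than the longest rate window (one hour)
    cutoff = now - timedelta(hours=1)
    self.trade_times = [t for t in self.trade_times if t > cutoff]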

3. Reward Function Optimization

  • Revise reward function to better balance profit and risk
    • Increase transaction fee penalty for more realistic simulation
    • Implement progressive rewards based on holding time
    • Add penalty for frequent trading (to reduce noise)
    • Scale rewards based on market volatility
    • Implement risk-adjusted returns (Sharpe ratio) in reward calculation

Implementation status:

  • Created improved_reward_function.py with ImprovedRewardCalculator class
  • Implemented Sharpe ratio for risk-adjusted rewards (see the sketch below)
  • Added frequency penalty for excessive trading
  • Added holding time rewards for profitable positions
  • Integrated with EnhancedRLTradingEnvironment class
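
The Sharpe-based risk adjustment is applied inside ImprovedRewardCalculator. A minimal sketch of the idea, assuming a recent_pnls buffer populated by record_pnl (the buffer name and scaling constants are illustrative):

import numpy as np

def _calculate_risk_adjustment(self, reward):
    """Scale the reward by a rolling Sharpe ratio of recent PnL (sketch)."""
    if len(self.recent_pnls) < 2:
        return reward  # not enough history to estimate risk
    pnls = np.array(self.recent_pnls)
    std = pnls.std()
    if std == 0:
        return reward
    sharpe = pnls.mean() / std
    # Dampen rewards when risk-adjusted performance is poor and boost them
    # slightly when it is good; clip to keep the scaling bounded
    factor = float(np.clip(1.0 + 0.5 * sharpe, 0.5, 1.5))
    return reward * factor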

4. Multi-timeframe Price Direction Prediction

  • Extend CNN model to predict price direction for multiple timeframes
    • Modify CNN output to predict short, mid, and long-term price directions (see the sketch below)
    • Create data generation method for back-propagation using historical data
    • Implement real-time example generation for training
    • Feed direction predictions to RL agent as additional state information
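
A minimal sketch of what the multi-horizon output head could look like in PyTorch; the layer sizes and names are illustrative, not the actual CNN:

import torch
import torch.nn as nn

class DirectionHead(nn.Module):
    """Maps a shared CNN feature vector to up/down probabilities for the
    short, mid, and long horizons (illustrative sketch)."""
    def __init__(self, feature_dim=128):
        super().__init__()
        # One logit per horizon: next 5, 10, and 20 candles
        self.fc = nn.Linear(feature_dim, 3)

    def forward(self, features):
        # Sigmoid turns each logit into an "up" probability; these three
        # values can be appended to the RL agent's state vector
        return torch.sigmoid(self.fc(features))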

Medium Priority Tasks

5. Position Sizing Optimization

  • Implement dynamic position sizing based on confidence and volatility
    • Add confidence score to model outputs
    • Scale position size based on prediction confidence
    • Implement Kelly criterion for optimal position sizing
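
As a starting point for the Kelly criterion item, the classic formula is f* = p - (1 - p) / b, where p is the win probability and b the payoff ratio. A minimal sketch with a fractional-Kelly cap, since full Kelly is aggressive in practice (the cap value is an assumption to tune):

def kelly_position_fraction(win_prob, payoff_ratio, cap=0.25):
    """Kelly criterion: f* = p - (1 - p) / b.
    Returns the fraction of capital to risk, floored at 0 and capped."""
    if payoff_ratio <= 0:
        return 0.0
    kelly = win_prob - (1.0 - win_prob) / payoff_ratio
    return max(0.0, min(kelly, cap))

Scaling the result by the model's confidence score would cover the first two bullets as well.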

6. Training Data Augmentation

  • Implement data augmentation for more robust training
    • Simulate different market conditions
    • Add noise to training data (see the sketch below)
    • Generate synthetic data for rare market events
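
For the noise item above, a minimal sketch that jitters candle windows with proportional Gaussian noise; the noise scale and copy count are assumptions to tune:

import numpy as np

def augment_with_noise(windows, noise_scale=0.001, copies=2, seed=None):
    """Return the original candle windows plus noisy copies.
    noise_scale is the noise standard deviation relative to each value."""
    rng = np.random.default_rng(seed)
    augmented = [windows]
    for _ in range(copies):
        noise = rng.normal(0.0, noise_scale, size=windows.shape)
        augmented.append(windows * (1.0 + noise))
    return np.concatenate(augmented, axis=0)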

7. Model Interpretability

  • Add visualization for model decision making
    • Implement feature importance analysis (see the sketch below)
    • Add attention visualization for key price patterns
    • Create explainable AI components
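
Permutation importance is one model-agnostic way to get the feature importance analysis above. A minimal sketch, assuming a 2D feature matrix and a scoring function score_fn(model, X, y), both hypothetical:

import numpy as np

def permutation_importance(model, X, y, score_fn, seed=None):
    """Importance of each feature = score drop when that feature's
    column is shuffled (model-agnostic sketch)."""
    rng = np.random.default_rng(seed)
    baseline = score_fn(model, X, y)
    importances = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        rng.shuffle(X_perm[:, j])  # break the feature's relationship to y
        importances.append(baseline - score_fn(model, X_perm, y))
    return np.array(importances)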

Implementation Details

Completed: Displaying Trade Rate

The trade rate display implementation has been completed in the RealTimeChart class:

from datetime import datetime, timedelta  # module-level imports used below

def calculate_trade_rate(self):
    """Calculate and return trading rate statistics based on recent trades"""
    if not hasattr(self, 'trade_times') or not self.trade_times:
        return {"per_second": 0, "per_minute": 0, "per_hour": 0}
    
    # Get current time
    now = datetime.now()
    
    # Define the look-back windows
    one_second_ago = now - timedelta(seconds=1)
    one_minute_ago = now - timedelta(minutes=1)
    one_hour_ago = now - timedelta(hours=1)
    
    # Count trades in different time windows
    trades_last_second = sum(1 for t in self.trade_times if t > one_second_ago)
    trades_last_minute = sum(1 for t in self.trade_times if t > one_minute_ago)
    trades_last_hour = sum(1 for t in self.trade_times if t > one_hour_ago)
    
    # Each count doubles as the rate for its window
    return {
        "per_second": trades_last_second,
        "per_minute": trades_last_minute,
        "per_hour": trades_last_hour
    }
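
A dashboard component can format these counts directly; a minimal usage sketch (the surrounding dashboard wiring is omitted):

rates = chart.calculate_trade_rate()
rate_text = (f"Trades/sec: {rates['per_second']}  |  "
             f"Trades/min: {rates['per_minute']}  |  "
             f"Trades/hr: {rates['per_hour']}")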

Completed: Improved Reward Function

The improved reward function has been implemented in improved_reward_function.py:

def calculate_reward(self, action, price_change, position_held_time=0, 
                     volatility=None, is_profitable=False):
    """
    Calculate the improved reward with risk adjustment
    """
    # Calculate trading fee
    fee = self.base_fee_rate
    
    # Calculate frequency penalty
    frequency_penalty = self._calculate_frequency_penalty()
    
    # Base reward calculation
    if action == 0:  # BUY
        # Small penalty for transaction plus frequency penalty
        reward = -fee - frequency_penalty
        
    elif action == 1:  # SELL
        # Calculate profit percentage minus fees (both entry and exit)
        profit_pct = price_change
        net_profit = profit_pct - (fee * 2)
        
        # Scale reward and apply frequency penalty
        reward = net_profit * 10  # Scale reward
        reward -= frequency_penalty
        
        # Record PnL for risk adjustment
        self.record_pnl(net_profit)
        
    else:  # HOLD
        # Small reward for holding a profitable position, small cost otherwise
        if is_profitable:
            reward = self._calculate_holding_reward(position_held_time, price_change)
        else:
            reward = -0.0001  # Very small negative reward
    
    # Apply risk adjustment if enabled
    if self.risk_adjusted:
        reward = self._calculate_risk_adjustment(reward)
        
    # Record this action for future frequency calculations
    self.record_trade(action=action)
    
    return reward
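
The _calculate_frequency_penalty and _calculate_holding_reward helpers referenced above live in ImprovedRewardCalculator. A minimal sketch of the intended behavior; the attribute names and constants here are illustrative:

import math

def _calculate_frequency_penalty(self):
    """Penalize trading more often than a target rate (illustrative sketch)."""
    recent = len(self.recent_trade_times)  # trades inside the look-back window
    excess = max(0, recent - self.target_trades_per_window)
    return excess * 0.0005  # small linear penalty per excess trade

def _calculate_holding_reward(self, position_held_time, price_change):
    """Reward holding profitable positions, with diminishing returns over time."""
    time_factor = math.log1p(position_held_time / 60.0)  # damped minutes held
    return max(0.0, price_change) * 0.1 * time_factor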

Completed: GPU Optimization

Added GPU optimization in train_rl_with_realtime.py:

def setup_gpu():
    """
    Configure GPU usage for PyTorch training
    
    Returns:
        tuple: (success, device, message)
    """
    try:
        if torch.cuda.is_available():
            gpu_count = torch.cuda.device_count()
            device_info = [torch.cuda.get_device_name(i) for i in range(gpu_count)]
            logger.info(f"Found {gpu_count} GPU(s): {', '.join(device_info)}")
            
            device = torch.device("cuda:0")
            
            # Verify CUDA works by allocating a small tensor on the device
            test_tensor = torch.tensor([1.0, 2.0, 3.0], device=device)
            _ = test_tensor * 2  # run a trivial op to confirm kernels launch
            
            # Report BFloat16 support; the training loop's autocast decides the dtype
            if hasattr(torch.cuda, 'amp') and torch.cuda.is_bf16_supported():
                logger.info("BFloat16 is supported - usable for mixed precision training")
            
            return True, device, f"GPU enabled: {', '.join(device_info)}"
        else:
            return False, torch.device("cpu"), "GPU not available, using CPU"
    except Exception as e:
        return False, torch.device("cpu"), f"GPU setup failed: {str(e)}"

CNN Price Direction Prediction (To be implemented)

import numpy as np  # module-level import used below

def generate_direction_examples(self, historical_data, timeframes=['1m', '1h', '1d']):
    """Generate price direction examples from historical data"""
    examples = []
    labels = []
    
    for tf in timeframes:
        df = historical_data[tf]
        # Stop 20 candles before the end so every label horizon is valid
        for i in range(20, len(df) - 20):
            # Use a window of 20 candles as model input
            window = df.iloc[i-20:i]
            
            # Label future price direction (True if price rises over 5/10/20 candles)
            future_5 = df.iloc[i].close < df.iloc[i+5].close
            future_10 = df.iloc[i].close < df.iloc[i+10].close
            future_20 = df.iloc[i].close < df.iloc[i+20].close
            
            examples.append(window.values)
            labels.append([future_5, future_10, future_20])
    
    return np.array(examples), np.array(labels)
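
Once implemented, these examples would drive a supervised pre-training pass before the predictions are fed to the RL agent. A minimal usage sketch, assuming historical_data is a dict of OHLCV DataFrames keyed by timeframe and the method lives on the CNN model wrapper:

X, y = model.generate_direction_examples(historical_data, timeframes=['1m', '1h', '1d'])
# y has shape (n_examples, 3): direction labels for the 5/10/20-candle horizons
print(f"Generated {len(X)} examples with label shape {y.shape}")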

Validation Plan

After implementing these improvements, we should validate the system with:

  1. Backtesting on historical data
  2. Forward testing with small position sizes
  3. A/B testing of different reward functions
  4. Measuring the improvement in profitability and Sharpe ratio

Progress Tracking

  • Implementation started: June 2023
  • GPU utilization fixed: July 2023
  • Trade signal rate display implemented: July 2023
  • Reward function optimized: July 2023
  • CNN direction prediction added: To be completed
  • Full system tested: To be completed