# Cryptocurrency Trading System Improvements

## Overview
This document outlines necessary improvements to our cryptocurrency trading system to enhance performance, profitability, and monitoring capabilities.

## High Priority Tasks

### 1. GPU Utilization for Training
- [x] Fix GPU detection and utilization during training
  - [x] Debug why CUDA is detected but not utilized (check logs showing "Starting training on device: cpu")
  - [x] Ensure PyTorch correctly detects and uses available CUDA devices
  - [x] Add GPU memory monitoring during training
  - [x] Optimize batch sizes for GPU training

Implementation status:
- Added `setup_gpu()` function in `train_rl_with_realtime.py` to properly detect and configure GPU usage
- Added device parameter to DQNAgent to ensure models are created on the correct device
- Implemented mixed precision training for faster GPU-based training
- Added GPU memory monitoring and logging to TensorBoard

### 2. Trade Signal Rate Display
- [x] Add metrics to track and display trading frequency
  - [x] Implement counter for actions per second/minute/hour
  - [x] Add visualization to the chart showing trading frequency over time
  - [x] Create a moving average of trade signals to show trends
  - [x] Add dashboard section showing current and average trading rates

Implementation status:
- Added trade time tracking in `_add_trade_compat` function
- Added `calculate_trade_rate` method to `RealTimeChart` class
- Updated dashboard layout to display trade rates
- Added visualization of trade frequency in chart's bottom panel

### 3. Reward Function Optimization
- [x] Revise reward function to better balance profit and risk
  - [x] Increase transaction fee penalty for more realistic simulation
  - [x] Implement progressive rewards based on holding time
  - [x] Add penalty for frequent trading (to reduce noise)
  - [x] Scale rewards based on market volatility
  - [x] Implement risk-adjusted returns (Sharpe ratio) in reward calculation

Implementation status:
- Created `improved_reward_function.py` with `ImprovedRewardCalculator` class
- Implemented Sharpe ratio for risk-adjusted rewards
- Added frequency penalty for excessive trading
- Added holding time rewards for profitable positions
- Integrated with `EnhancedRLTradingEnvironment` class

### 4. Multi-timeframe Price Direction Prediction
- [ ] Extend CNN model to predict price direction for multiple timeframes
  - [ ] Modify CNN output to predict short, mid, and long-term price directions
  - [ ] Create data generation method for back-propagation using historical data
  - [ ] Implement real-time example generation for training
  - [ ] Feed direction predictions to RL agent as additional state information
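The last item, feeding direction predictions to the RL agent, could be sketched as below. This is only an illustration: `build_agent_state` is a hypothetical helper, and it assumes the CNN emits one up-move probability per horizon (short, mid, long) and that the agent's state is a flat list of floats.

```python
def build_agent_state(base_state, direction_probs):
    """Hypothetical helper: append direction predictions to the RL state.

    direction_probs holds one up-move probability per horizon in [0, 1].
    Each is rescaled to [-1, 1] so that 0.0 means "no directional view".
    """
    for p in direction_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("direction probabilities must be in [0, 1]")
    return list(base_state) + [2.0 * p - 1.0 for p in direction_probs]


# Two base features plus short/mid/long-term predictions
state = build_agent_state([0.2, -0.1], [0.5, 0.75, 1.0])
```

The agent's input layer would need to grow by the number of horizons, which is why this change is tied to the CNN output format.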

## Medium Priority Tasks

### 5. Position Sizing Optimization
- [ ] Implement dynamic position sizing based on confidence and volatility
  - [ ] Add confidence score to model outputs
  - [ ] Scale position size based on prediction confidence
  - [ ] Implement Kelly criterion for optimal position sizing
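As a starting point for the Kelly item, a minimal sketch of the standard binary-outcome Kelly formula, f* = p - (1 - p) / b. The function name and the `cap` parameter are assumptions for illustration, and using the model's confidence score as the win probability `p` is itself an assumption that would need calibration first.

```python
def kelly_fraction(win_probability, win_loss_ratio, cap=1.0):
    """Fraction of capital to risk: f* = p - (1 - p) / b, clamped to [0, cap].

    win_probability (p): estimated probability that the trade is profitable.
    win_loss_ratio (b): average win divided by average loss, both positive.
    """
    if not 0.0 <= win_probability <= 1.0:
        raise ValueError("win_probability must be in [0, 1]")
    if win_loss_ratio <= 0:
        raise ValueError("win_loss_ratio must be positive")
    raw = win_probability - (1.0 - win_probability) / win_loss_ratio
    # A negative Kelly fraction means there is no edge, so take no position
    return min(max(raw, 0.0), cap)


full_kelly = kelly_fraction(0.55, 1.5)   # 0.55 - 0.45 / 1.5
half_kelly = 0.5 * full_kelly            # a common, more conservative choice
no_edge = kelly_fraction(0.40, 1.0)      # raw value is negative, clamped to 0
```

Full Kelly is sensitive to estimation error in `p` and `b`, so a fractional multiplier or a hard cap is usually worth keeping.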

### 6. Training Data Augmentation
- [ ] Implement data augmentation for more robust training
  - [ ] Simulate different market conditions
  - [ ] Add noise to training data
  - [ ] Generate synthetic data for rare market events
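For the noise item, one simple option is multiplicative Gaussian noise on prices. The sketch below assumes close prices are available as a plain list of floats; `add_price_noise` and its parameters are hypothetical names, not existing code.

```python
import random


def add_price_noise(closes, noise_std=0.001, seed=None):
    """Return a noisy copy of closes; the input list is left unchanged.

    noise_std is the standard deviation of the relative perturbation,
    so 0.001 perturbs each price by roughly 0.1%.
    """
    rng = random.Random(seed)  # a seed makes the augmentation reproducible
    return [price * (1.0 + rng.gauss(0.0, noise_std)) for price in closes]


closes = [100.0, 101.5, 99.8, 102.3]
augmented = add_price_noise(closes, noise_std=0.001, seed=42)
```

Relative noise keeps the perturbation proportional to the price level, which matters when the same routine is applied to assets with very different prices.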

### 7. Model Interpretability
- [ ] Add visualization for model decision making
  - [ ] Implement feature importance analysis
  - [ ] Add attention visualization for key price patterns
  - [ ] Create explainable AI components
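Feature importance analysis could start with permutation importance, which needs no access to model internals. The sketch below is model-agnostic and uses a toy predictor; `permutation_importance`, `predict`, and the row layout are all illustrative assumptions.

```python
import random


def permutation_importance(predict, rows, targets, feature_index, seed=0):
    """Increase in mean absolute error after shuffling one feature column.

    A value near zero suggests the model barely relies on that feature.
    """
    def mean_abs_error(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    baseline = mean_abs_error(rows)
    column = [r[feature_index] for r in rows]
    random.Random(seed).shuffle(column)
    shuffled_rows = [
        r[:feature_index] + [v] + r[feature_index + 1:]
        for r, v in zip(rows, column)
    ]
    return mean_abs_error(shuffled_rows) - baseline


def toy_model(row):
    # Toy predictor that only looks at feature 0
    return 2.0 * row[0]


rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
targets = [2.0, 4.0, 6.0, 8.0]
importance_used = permutation_importance(toy_model, rows, targets, feature_index=0)
importance_unused = permutation_importance(toy_model, rows, targets, feature_index=1)
```

In practice the score should be averaged over several shuffles, since a single permutation can be noisy on small datasets.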

## Implementation Details

### Completed: Displaying Trade Rate
The trade rate display implementation has been completed in the `RealTimeChart` class:
```python
from datetime import datetime, timedelta


def calculate_trade_rate(self):
    """Return the number of trades in the last second, minute, and hour"""
    if not hasattr(self, 'trade_times') or not self.trade_times:
        return {"per_second": 0, "per_minute": 0, "per_hour": 0}

    # Get current time
    now = datetime.now()

    # Calculate different time windows
    one_second_ago = now - timedelta(seconds=1)
    one_minute_ago = now - timedelta(minutes=1)
    one_hour_ago = now - timedelta(hours=1)

    # Count trades in different time windows
    trades_last_second = sum(1 for t in self.trade_times if t > one_second_ago)
    trades_last_minute = sum(1 for t in self.trade_times if t > one_minute_ago)
    trades_last_hour = sum(1 for t in self.trade_times if t > one_hour_ago)

    # The count within each window is the rate for that unit of time
    return {
        "per_second": trades_last_second,
        "per_minute": trades_last_minute,
        "per_hour": trades_last_hour
    }
```
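The checklist also mentions a moving average of trade signals. A minimal sketch of one way to smooth the per-minute counts, assuming a hypothetical `TradeRateSmoother` helper (not part of `RealTimeChart`) that receives one per-minute count on each dashboard refresh:

```python
from collections import deque


class TradeRateSmoother:
    """Hypothetical helper: simple moving average over the last N rate samples."""

    def __init__(self, window=5):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest sample automatically
        self.samples = deque(maxlen=window)

    def update(self, trades_per_minute):
        """Add a new sample and return the current moving average."""
        self.samples.append(trades_per_minute)
        return sum(self.samples) / len(self.samples)


smoother = TradeRateSmoother(window=3)
averages = [smoother.update(count) for count in [3, 6, 9, 0]]
```

Until the window fills, the average is taken over the samples seen so far, so the first values track the raw counts closely.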

### Completed: Improved Reward Function
The improved reward function has been implemented in `improved_reward_function.py`:
```python
def calculate_reward(self, action, price_change, position_held_time=0,
                     volatility=None, is_profitable=False):
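    """
    Calculate the improved reward with risk adjustment

    Args:
        action: 0 = BUY, 1 = SELL, any other value = HOLD
        price_change: Price change of the position, as a fraction like base_fee_rate
        position_held_time: How long the position has been held (HOLD only)
        volatility: Not used in this excerpt
        is_profitable: Whether the open position is in profit (HOLD only)
    """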
    # Calculate trading fee
    fee = self.base_fee_rate

    # Calculate frequency penalty
    frequency_penalty = self._calculate_frequency_penalty()

    # Base reward calculation
    if action == 0:  # BUY
        # Small penalty for transaction plus frequency penalty
        reward = -fee - frequency_penalty

    elif action == 1:  # SELL
        # Calculate profit percentage minus fees (both entry and exit)
        profit_pct = price_change
        net_profit = profit_pct - (fee * 2)

        # Scale reward and apply frequency penalty
        reward = net_profit * 10  # Scale reward
        reward -= frequency_penalty

        # Record PnL for risk adjustment
        self.record_pnl(net_profit)

    else:  # HOLD
        # Small reward for holding a profitable position, small cost otherwise
        if is_profitable:
            reward = self._calculate_holding_reward(position_held_time, price_change)
        else:
            reward = -0.0001  # Very small negative reward

    # Apply risk adjustment if enabled
    if self.risk_adjusted:
        reward = self._calculate_risk_adjustment(reward)

    # Record this action for future frequency calculations
    self.record_trade(action=action)

    return reward
```
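Helpers such as `_calculate_frequency_penalty` are not shown in this document. As an illustration only, a frequency penalty could grow linearly once the number of recent trades passes a threshold. The function below is a hypothetical standalone version; `max_trades` and `penalty_per_trade` are assumed parameters, not values taken from `improved_reward_function.py`.

```python
def frequency_penalty(recent_actions, max_trades=3, penalty_per_trade=0.001):
    """Hypothetical penalty for trading too often within a recent window.

    recent_actions uses the same encoding as calculate_reward:
    0 = BUY, 1 = SELL, any other value = HOLD.
    """
    trade_count = sum(1 for action in recent_actions if action in (0, 1))
    excess = max(0, trade_count - max_trades)
    return excess * penalty_per_trade


calm = frequency_penalty([2, 2, 0, 2, 1, 2])      # 2 trades, under the limit
busy = frequency_penalty([0, 1, 0, 1, 0, 1, 2])   # 6 trades, 3 over the limit
```

A linear penalty is easy to reason about next to the fee term; a steeper schedule would discourage bursts of trades more aggressively.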

### Completed: GPU Optimization
Added GPU optimization in `train_rl_with_realtime.py`:
```python
import logging

import torch

logger = logging.getLogger(__name__)


def setup_gpu():
    """
    Configure GPU usage for PyTorch training

    Returns:
        tuple: (success, device, message)
    """
    try:
        if torch.cuda.is_available():
            gpu_count = torch.cuda.device_count()
            device_info = [torch.cuda.get_device_name(i) for i in range(gpu_count)]
            gpu_names = ', '.join(device_info)
            logger.info(f"Found {gpu_count} GPU(s): {gpu_names}")

            device = torch.device("cuda:0")

            # Test CUDA by creating a small tensor; this raises if the device is unusable
            torch.tensor([1.0, 2.0, 3.0], device=device)

            # Report BFloat16 support; this function only logs it and enables nothing itself
            if hasattr(torch.cuda, 'amp') and torch.cuda.is_bf16_supported():
                logger.info("BFloat16 is supported and can be used for mixed precision training")

            return True, device, f"GPU enabled: {gpu_names}"
        else:
            return False, torch.device("cpu"), "GPU not available, using CPU"
    except Exception as e:
        return False, torch.device("cpu"), f"GPU setup failed: {e}"
```

### CNN Price Direction Prediction (To be implemented)
```python
import numpy as np


def generate_direction_examples(self, historical_data, timeframes=('1m', '1h', '1d')):
    """Generate price direction examples from historical data"""
    examples = []
    labels = []

    for tf in timeframes:
        df = historical_data[tf]
        # Stop 20 candles before the end so every label compares against a real future candle
        for i in range(20, len(df) - 20):
            # Use window of 20 candles for input
            window = df.iloc[i-20:i]

            # Create labels for future price direction (next 5, 10, 20 candles)
            future_5 = df.iloc[i].close < df.iloc[i+5].close  # True if price goes up
            future_10 = df.iloc[i].close < df.iloc[i+10].close
            future_20 = df.iloc[i].close < df.iloc[i+20].close

            examples.append(window.values)
            labels.append([future_5, future_10, future_20])

    return np.array(examples), np.array(labels)
```

## Validation Plan
After implementing these improvements, we should validate the system with:
1. Backtesting on historical data
2. Forward testing with small position sizes
3. A/B testing of different reward functions
4. Measuring the improvement in profitability and Sharpe ratio
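For step 4, a minimal sketch of an annualized Sharpe ratio computed from per-period returns, which could be used to compare two reward-function variants on the same backtest. The function name, the zero default risk-free rate, and `periods_per_year=365` (daily returns on a market that trades every day) are assumptions, and the return series are made-up illustrative numbers, not results.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=365):
    """Annualized Sharpe ratio; returns 0.0 when it is undefined.

    returns: per-period returns as fractions (0.01 = 1%).
    risk_free_rate: per-period risk-free return in the same units.
    """
    if len(returns) < 2:
        return 0.0
    excess = [r - risk_free_rate for r in returns]
    std = statistics.stdev(excess)  # sample standard deviation
    if std == 0:
        return 0.0
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)


# Illustrative daily returns for two reward-function variants
baseline_returns = [0.010, -0.020, 0.015, -0.005]
candidate_returns = [0.010, -0.010, 0.012, 0.002]
candidate_is_better = sharpe_ratio(candidate_returns) > sharpe_ratio(baseline_returns)
```

With only a handful of periods the estimate is very noisy, so an A/B comparison should use the full backtest window for both variants.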

## Progress Tracking
- Implementation started: June 2023
- GPU utilization fixed: July 2023
- Trade signal rate display implemented: July 2023
- Reward function optimized: July 2023
- CNN direction prediction added: To be completed
- Full system tested: To be completed