gogo2/docs/main/_MODEL_INPUT_OUTPUT_STRUCTURE.md
2025-11-12 14:36:28 +02:00


Transformer Model Input/Output Structure

FIXED ISSUE: Batch Data Deletion Bug

Problem: Training failed after epoch 1 with "At least one timeframe must be provided".

Root Cause: Batch tensors were deleted after each use inside the training loop, but the same batch dictionaries were reused across all epochs. From epoch 2 onward, the batches no longer contained any timeframe data.

Solution: Removed the batch deletion from inside the epoch loop and moved the cleanup to after all epochs complete.
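
The failure can be reproduced without the model. The sketch below is a minimal, framework-agnostic illustration. It assumes batches are plain dicts reused across epochs. run_epochs, fake_train_step and TIMEFRAME_KEYS are hypothetical stand-ins, not the project's training loop. The real code also deleted tensors rather than clearing dicts.

```python
# Hypothetical: the timeframe keys the model checks for (see INPUT Structure).
TIMEFRAME_KEYS = ("price_data_1s", "price_data_1m", "price_data_1h", "price_data_1d")


def fake_train_step(batch: dict) -> None:
    """Stand-in for train_step()/forward(): only reproduces the timeframe check."""
    if not any(batch.get(key) is not None for key in TIMEFRAME_KEYS):
        raise ValueError("At least one timeframe must be provided")


def run_epochs(batches: list, epochs: int, cleanup_after_each_use: bool) -> int:
    """Runs `epochs` passes over the SAME batch dicts and returns the step count."""
    steps = 0
    for _ in range(epochs):
        for batch in batches:
            fake_train_step(batch)
            steps += 1
            if cleanup_after_each_use:
                # BUG: empties a dict that the next epoch will reuse.
                batch.clear()
    if not cleanup_after_each_use:
        # FIX: free the batch data only once every epoch has finished.
        for batch in batches:
            batch.clear()
    return steps
```

With cleanup_after_each_use=True, epoch 1 succeeds and epoch 2 raises the same ValueError seen in training. With False, every epoch runs and the batches are emptied only at the end.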

Current Model Architecture

INPUT Structure (Multi-Timeframe)

The model accepts the following inputs in the forward() method:

forward(
    # Price data for different timeframes - [batch, seq_len, 5] OHLCV
    price_data_1s=None,   # 1-second timeframe
    price_data_1m=None,   # 1-minute timeframe  
    price_data_1h=None,   # 1-hour timeframe
    price_data_1d=None,   # 1-day timeframe
    
    # Reference data
    btc_data_1m=None,     # BTC reference - [batch, seq_len, 5]
    
    # Additional features
    cob_data=None,        # COB orderbook data - [batch, seq_len, 100]
    tech_data=None,       # Technical indicators - [batch, 40]
    market_data=None,     # Market context (pivots, volume) - [batch, 30]
    position_state=None,  # Current position state - [batch, 5]
    
    # Legacy support
    price_data=None       # Fallback to single timeframe
)

At least one timeframe (price_data_1s, 1m, 1h, or 1d) must be provided, otherwise the model raises:

ValueError: At least one timeframe must be provided
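
The input contract above can be expressed as a small pre-flight check run before calling the model. This is a sketch under stated assumptions:

  • Shapes are passed as tuples instead of tensors.
  • The trailing feature sizes are taken from the comments in the signature.
  • The legacy price_data argument is rejected, because its fallback behaviour is not described here.
  • check_model_inputs is a hypothetical helper, not part of the model.

```python
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]

# Expected shapes from the forward() comments; None marks a free dimension.
EXPECTED_SHAPES: Dict[str, Tuple[Optional[int], ...]] = {
    "price_data_1s": (None, None, 5),   # [batch, seq_len, 5] OHLCV
    "price_data_1m": (None, None, 5),
    "price_data_1h": (None, None, 5),
    "price_data_1d": (None, None, 5),
    "btc_data_1m": (None, None, 5),
    "cob_data": (None, None, 100),      # [batch, seq_len, 100]
    "tech_data": (None, 40),            # [batch, 40]
    "market_data": (None, 30),          # [batch, 30]
    "position_state": (None, 5),        # [batch, 5]
}
TIMEFRAMES = ("price_data_1s", "price_data_1m", "price_data_1h", "price_data_1d")


def check_model_inputs(shapes: Dict[str, Shape]) -> int:
    """Validates input shapes and returns the shared batch size.

    `shapes` maps argument name -> shape; omitted names mean None was passed.
    """
    if not any(name in shapes for name in TIMEFRAMES):
        raise ValueError("At least one timeframe must be provided")
    batch_sizes = set()
    for name, shape in shapes.items():
        expected = EXPECTED_SHAPES.get(name)
        if expected is None:
            raise ValueError(f"Unexpected input: {name}")
        if len(shape) != len(expected) or any(
            want is not None and got != want for got, want in zip(shape, expected)
        ):
            raise ValueError(f"{name}: expected {expected}, got {shape}")
        batch_sizes.add(shape[0])
    if len(batch_sizes) != 1:
        raise ValueError(f"Inconsistent batch sizes: {sorted(batch_sizes)}")
    return batch_sizes.pop()
```

Running this check when a batch is built surfaces an empty or malformed batch at the point where it was created, instead of deep inside forward().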

OUTPUT Structure

The model returns a dictionary with the following predictions:

outputs = {
    # PRIMARY OUTPUTS (trained with loss):
    'action_logits': tensor,        # [batch, 3] - BUY/SELL/HOLD logits
    'action_probs': tensor,         # [batch, 3] - softmax probabilities
    'price_prediction': tensor,     # [batch, 1] - next price change ratio
    'confidence': tensor,           # [batch, 1] - prediction confidence
    
    # TREND ANALYSIS (trained with loss):
    'trend_analysis': {
        'angle_radians': tensor,    # [batch, 1] - trend angle in radians
        'steepness': tensor,        # [batch, 1] - trend steepness (0-1)
        'direction': tensor         # [batch, 1] - direction (-1/0/+1)
    },
    
    # NEXT CANDLE PREDICTIONS (evaluated but NOT trained):
    'next_candles': {
        '1s': tensor,               # [batch, 5] - predicted OHLCV for 1s
        '1m': tensor,               # [batch, 5] - predicted OHLCV for 1m
        '1h': tensor,               # [batch, 5] - predicted OHLCV for 1h
        '1d': tensor,               # [batch, 5] - predicted OHLCV for 1d
    },
    'btc_next_candle': tensor,      # [batch, 5] - predicted BTC OHLCV
    
    # PIVOT PREDICTIONS:
    'next_pivots': {
        'L1': {
            'price': tensor,            # [batch, 1] - pivot price
            'type_prob_high': tensor,   # [batch, 1] - probability of high
            'type_prob_low': tensor,    # [batch, 1] - probability of low
            'pivot_type': tensor,       # [batch, 1] - 0=high, 1=low
            'confidence': tensor        # [batch, 1] - confidence
        },
        # Same structure for L2, L3, L4, L5
    },
    
    # AUXILIARY OUTPUTS:
    'volatility_prediction': tensor,    # [batch, 1]
    'trend_strength_prediction': tensor, # [batch, 1]
    'uncertainty_mean': tensor,         # [batch, 1]
    'uncertainty_std': tensor           # [batch, 1]
}
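
Downstream code usually reduces these outputs to a decision. The sketch below does this for a single sample using plain Python lists. It makes two assumptions, both to verify against the model code:

  • Indices 0/1/2 of action_logits mean BUY/SELL/HOLD, following the order in the comment above.
  • pivot_type is simply whichever of type_prob_high and type_prob_low is larger.

decode_action and decode_pivot_type are hypothetical helpers.

```python
import math
from typing import List, Tuple

# Assumed index order, taken from the "BUY/SELL/HOLD" comment on action_logits.
ACTIONS = ("BUY", "SELL", "HOLD")


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax, i.e. what action_probs holds per sample."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_action(logits: List[float]) -> Tuple[str, float]:
    """Returns the most likely action and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIONS[best], probs[best]


def decode_pivot_type(prob_high: float, prob_low: float) -> int:
    """Assumed rule: 0 = high, 1 = low, matching the pivot_type comment."""
    return 0 if prob_high >= prob_low else 1
```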

TRAINING TARGETS (in batch)

batch = {
    # Input features (see INPUT Structure above)
    'price_data_1s': tensor,
    'price_data_1m': tensor,
    'price_data_1h': tensor,
    'price_data_1d': tensor,
    'btc_data_1m': tensor,
    'cob_data': tensor,
    'tech_data': tensor,
    'market_data': tensor,
    'position_state': tensor,
    
    # Training targets:
    'actions': tensor,              # [batch] - target action (0/1/2)
    'future_prices': tensor,        # [batch, 1] - actual price change ratio
    'trade_success': tensor,        # [batch, 1] - 1.0 if profitable
    'trend_target': tensor,         # [batch, 3] - [angle, steepness, direction]
}
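
This document does not say how trend_target is computed. The sketch below shows one possible construction and is entirely hypothetical:

  • Fit a least-squares line to a window of closes, normalized to the first close.
  • Take atan of the slope as the angle.
  • Scale the absolute angle into 0-1 as the steepness.
  • Threshold the slope to get the direction.

The normalization, the flat_eps threshold and the function name are all assumptions.

```python
import math
from typing import List, Tuple


def trend_target(closes: List[float], flat_eps: float = 1e-4) -> Tuple[float, float, float]:
    """Hypothetical [angle, steepness, direction] target from a window of closes."""
    if len(closes) < 2:
        raise ValueError("need at least two closes")
    # Relative change vs. the first close, so the slope is scale-free.
    ys = [c / closes[0] - 1.0 for c in closes]
    xs = range(len(ys))
    x_mean = sum(xs) / len(ys)
    y_mean = sum(ys) / len(ys)
    # Ordinary least-squares slope: change in relative price per candle.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    angle = math.atan(slope)                # radians, in (-pi/2, pi/2)
    steepness = abs(angle) / (math.pi / 2)  # mapped into [0, 1)
    direction = 0.0 if abs(slope) < flat_eps else math.copysign(1.0, slope)
    return angle, steepness, direction
```

Normalizing to the first close keeps the angle comparable across assets with very different price levels. Whatever the real construction is, the model's trend head must produce values on the same scale.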

LOSS CALCULATION

Current loss function in train_step():

total_loss = action_loss + 0.1 * price_loss + 0.05 * trend_loss

where:
- action_loss: CrossEntropyLoss(action_logits, actions)
- price_loss: MSELoss(price_prediction, future_prices)
- trend_loss: MSELoss(trend_pred, trend_target)
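
To make the weighting concrete, the sketch below recomputes the formula in plain Python for a batch of lists. It takes the mean cross-entropy over the action logits and adds 0.1 and 0.05 times two mean-squared errors. This matches PyTorch's CrossEntropyLoss and MSELoss with their default mean reduction. The sketch assumes trend_pred is the three trend_analysis outputs stacked into [batch, 3] in target order (angle, steepness, direction).

```python
import math
from typing import List

Matrix = List[List[float]]


def cross_entropy(logits: Matrix, targets: List[int]) -> float:
    """Mean over the batch of -log softmax(logits)[target]."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[target]
    return total / len(targets)


def mse(pred: Matrix, target: Matrix) -> float:
    """Mean squared error over every element, like MSELoss(reduction='mean')."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def total_loss(action_logits: Matrix, actions: List[int],
               price_prediction: Matrix, future_prices: Matrix,
               trend_pred: Matrix, trend_target: Matrix) -> float:
    """action_loss + 0.1 * price_loss + 0.05 * trend_loss."""
    action_loss = cross_entropy(action_logits, actions)
    price_loss = mse(price_prediction, future_prices)
    trend_loss = mse(trend_pred, trend_target)
    return action_loss + 0.1 * price_loss + 0.05 * trend_loss
```

With uniform logits, the action term alone is ln 3 ≈ 1.099. A squared price-change ratio such as 0.02² = 0.0004 contributes almost nothing at weight 0.1, so the relative scale of each term matters as much as its coefficient.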

NOTE: Next candle predictions are currently only used for accuracy evaluation, NOT trained directly.

CURRENT ISSUES AND RECOMMENDATIONS

Issue 1: Next Candle Predictions Not Trained

Status: The model outputs next candle predictions for each timeframe, but these are NOT included in the loss function.

Impact: The model is not explicitly learning to predict next candle OHLCV values.

Recommendation: Add next candle loss to training:

import torch.nn.functional as F

# Calculate next candle loss for each available timeframe
candle_loss = 0.0
if 'next_candles' in outputs:
    for tf in ['1s', '1m', '1h', '1d']:
        if tf in outputs['next_candles'] and f'future_candle_{tf}' in batch:
            pred_candle = outputs['next_candles'][tf]      # [batch, 5]
            target_candle = batch[f'future_candle_{tf}']   # [batch, 5]
            # F.mse_loss is the functional form; nn.MSELoss is a module class
            # that must be instantiated before it can be called on tensors.
            candle_loss += F.mse_loss(pred_candle, target_candle)

total_loss = action_loss + 0.1 * price_loss + 0.05 * trend_loss + 0.1 * candle_loss

Issue 2: Annotation Timeframe vs Prediction Timeframe

Current Behavior:

  • Annotations are created at a specific point in time
  • The model receives multiple timeframes (1s, 1m, 1h, 1d) as input
  • Predictions are made for ALL timeframes simultaneously
  • Only the 1m timeframe prediction is currently evaluated for accuracy

Question: Should predictions be specific to the annotation's timeframe?

Options:

  1. Multi-timeframe predictions (current): Keep predicting all timeframes, add loss for each
  2. Annotation-specific predictions: Only predict/train on the timeframe that matches the annotation
  3. Weighted predictions: Weight the loss by the annotation's timeframe (e.g., if annotated on 1m, weight 1m prediction higher)
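
All three options can be expressed as one weighted average over the timeframes that have both a prediction and a target; they differ only in the weights. The sketch below works on a single sample and assumes per-timeframe OHLCV lists keyed by timeframe. focus_weight and other_weight are illustrative parameters, not existing settings.

```python
from typing import Dict, List, Optional

Candle = List[float]  # [open, high, low, close, volume]


def candle_mse(pred: Candle, target: Candle) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def weighted_candle_loss(
    preds: Dict[str, Candle],
    targets: Dict[str, Candle],
    annotation_tf: Optional[str],
    focus_weight: float = 2.0,
    other_weight: float = 1.0,
) -> float:
    """Option 1: focus=other=1. Option 2: other=0. Option 3: focus > other."""
    weighted_sum = 0.0
    weight_total = 0.0
    for tf, pred in preds.items():
        if tf not in targets:
            continue  # no target for this timeframe, nothing to train on
        weight = focus_weight if tf == annotation_tf else other_weight
        weighted_sum += weight * candle_mse(pred, targets[tf])
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0
```

Dividing by the total weight keeps the loss on the same scale whichever timeframes are present. A fixed coefficient such as the 0.1 proposed in Issue 1 then does not silently grow with the number of timeframes, as it would with a plain sum.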

Issue 3: Missing Target Data for Next Candles

Current: The batch only contains future_prices (the next close price change ratio).

Needed: To train next candle predictions, we need full OHLCV targets:

  • future_candle_1s: [batch, 5] - next 1s candle OHLCV
  • future_candle_1m: [batch, 5] - next 1m candle OHLCV
  • future_candle_1h: [batch, 5] - next 1h candle OHLCV
  • future_candle_1d: [batch, 5] - next 1d candle OHLCV

Location to add: ANNOTATE/core/real_training_adapter.py in _convert_annotation_to_transformer_batch()
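
The sketch below shows the lookup such a change needs, under these assumptions:

  • Each timeframe's candles are available as time-sorted (timestamp, open, high, low, close, volume) tuples.
  • "Next candle" means the first candle opening strictly after the annotation's anchor timestamp.

Both assumptions and the helper names are hypothetical. The real _convert_annotation_to_transformer_batch() may hold its data differently. Any normalization applied to the input candles would also need to be applied to these targets.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Row = Tuple[float, float, float, float, float, float]  # (ts, o, h, l, c, v)


def next_candle(candles: List[Row], anchor_ts: float) -> Optional[List[float]]:
    """OHLCV of the first candle whose timestamp is > anchor_ts, else None."""
    timestamps = [row[0] for row in candles]
    idx = bisect_right(timestamps, anchor_ts)
    if idx == len(candles):
        return None  # the future candle is not in the data yet
    return list(candles[idx][1:])


def add_future_candle_targets(batch: dict,
                              candles_by_tf: Dict[str, List[Row]],
                              anchor_ts: float) -> dict:
    """Adds future_candle_{tf} entries shaped [1, 5] (a batch of one sample)."""
    for tf, candles in candles_by_tf.items():
        target = next_candle(candles, anchor_ts)
        if target is not None:
            batch[f"future_candle_{tf}"] = [target]
    return batch
```

Timeframes without a future candle are simply left out. This works with the loss code in Issue 1, which only uses a timeframe when f'future_candle_{tf}' is in the batch.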

SUMMARY

  • Fixed: Batch deletion bug causing epoch 2+ failures
  • Working: Model can predict next candles for all timeframes
  • Working: Model can predict trend vector (angle, steepness, direction)
  • Missing: Loss calculation for next candle predictions
  • Missing: Target data (future OHLCV) for next candle training
  • ⚠️ Unclear: Should predictions be timeframe-specific or multi-timeframe?

NEXT STEPS

  1. Add future OHLCV target data to training batches
  2. Add next candle loss to the training loop
  3. Clarify prediction strategy: Single timeframe vs multi-timeframe
  4. Test training with enhanced loss function