Transformer Model Input/Output Structure

FIXED ISSUE: Batch Data Deletion Bug

Problem: Training failed from epoch 2 onward with "At least one timeframe must be provided".
Root Cause: The training loop deleted each batch's tensors after using them, but the same batch dictionaries are reused in every epoch. From epoch 2 onward the batches no longer contained any timeframe data.
Solution: Removed the batch deletion from inside the epoch loop. Cleanup now runs once, after all epochs complete.
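The failure mode can be reproduced without a model. The sketch below is a minimal illustration, not the project's training loop: `train_step`, `train_buggy` and `train_fixed` are hypothetical stand-ins, and batches are plain dicts rather than tensor dictionaries. Deleting entries inside the epoch loop empties the dicts that the next epoch reuses. Deferring cleanup until after the last epoch avoids the problem.

```python
# Hypothetical reproduction: each batch is a plain dict that the training
# loop reuses in every epoch, as described above.
TIMEFRAME_KEYS = ("price_data_1s", "price_data_1m", "price_data_1h", "price_data_1d")


def train_step(batch):
    """Stand-in for the real train_step(): only checks that inputs exist."""
    if not any(batch.get(key) is not None for key in TIMEFRAME_KEYS):
        raise ValueError("At least one timeframe must be provided")


def train_buggy(batches, epochs):
    for _ in range(epochs):
        for batch in batches:
            train_step(batch)
            # BUG: deletes the data in place, but this same dict is
            # used again in the next epoch.
            for key in list(batch):
                del batch[key]


def train_fixed(batches, epochs):
    steps = 0
    for _ in range(epochs):
        for batch in batches:
            train_step(batch)
            steps += 1
    # FIX: release batch data once, after every epoch has finished.
    for batch in batches:
        batch.clear()
    return steps
```

With a single batch, `train_buggy` succeeds for one epoch but raises `ValueError` at the start of the second. `train_fixed` runs all epochs and only then empties the batches.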

Current Model Architecture

INPUT Structure (Multi-Timeframe)

The model accepts the following inputs in the forward() method:

def forward(
    self,
    # Price data for different timeframes - [batch, seq_len, 5] OHLCV
    price_data_1s=None,   # 1-second timeframe
    price_data_1m=None,   # 1-minute timeframe
    price_data_1h=None,   # 1-hour timeframe
    price_data_1d=None,   # 1-day timeframe

    # Reference data
    btc_data_1m=None,     # BTC reference - [batch, seq_len, 5]

    # Additional features
    cob_data=None,        # COB orderbook data - [batch, seq_len, 100]
    tech_data=None,       # Technical indicators - [batch, 40]
    market_data=None,     # Market context (pivots, volume) - [batch, 30]
    position_state=None,  # Current position state - [batch, 5]

    # Legacy support
    price_data=None,      # Fallback to single timeframe
):
    ...

At least one of price_data_1s, price_data_1m, price_data_1h, or price_data_1d must be provided. Otherwise the model raises:

ValueError: At least one timeframe must be provided
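The check can be sketched as follows. This is not the model's actual code. `collect_timeframes` is a hypothetical helper that keeps only the timeframe arguments that were supplied and raises the documented error when none are present. It deliberately does not model the legacy `price_data` fallback, because these notes do not say how that fallback interacts with the check.

```python
TIMEFRAME_ARGS = ("price_data_1s", "price_data_1m", "price_data_1h", "price_data_1d")


def collect_timeframes(**inputs):
    """Return {arg_name: data} for the timeframe inputs that were provided.

    Non-timeframe inputs (cob_data, tech_data, ...) are ignored here; they
    do not satisfy the "at least one timeframe" requirement.
    """
    provided = {
        name: inputs[name]
        for name in TIMEFRAME_ARGS
        if inputs.get(name) is not None
    }
    if not provided:
        raise ValueError("At least one timeframe must be provided")
    return provided
```

For example, passing only `cob_data` raises the error. Passing `price_data_1m` together with `cob_data` returns just the `price_data_1m` entry.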

OUTPUT Structure

The model returns a dictionary with the following predictions:

outputs = {
    # PRIMARY OUTPUTS (trained with loss):
    'action_logits': tensor,        # [batch, 3] - BUY/SELL/HOLD logits
    'action_probs': tensor,         # [batch, 3] - softmax probabilities
    'price_prediction': tensor,     # [batch, 1] - next price change ratio
    'confidence': tensor,           # [batch, 1] - prediction confidence
    
    # TREND ANALYSIS (trained with loss):
    'trend_analysis': {
        'angle_radians': tensor,    # [batch, 1] - trend angle in radians
        'steepness': tensor,        # [batch, 1] - trend steepness (0-1)
        'direction': tensor         # [batch, 1] - direction (-1/0/+1)
    },
    
    # NEXT CANDLE PREDICTIONS (evaluated but NOT trained):
    'next_candles': {
        '1s': tensor,               # [batch, 5] - predicted OHLCV for 1s
        '1m': tensor,               # [batch, 5] - predicted OHLCV for 1m
        '1h': tensor,               # [batch, 5] - predicted OHLCV for 1h
        '1d': tensor,               # [batch, 5] - predicted OHLCV for 1d
    },
    'btc_next_candle': tensor,      # [batch, 5] - predicted BTC OHLCV
    
    # PIVOT PREDICTIONS:
    'next_pivots': {
        'L1': {
            'price': tensor,            # [batch, 1] - pivot price
            'type_prob_high': tensor,   # [batch, 1] - probability of high
            'type_prob_low': tensor,    # [batch, 1] - probability of low
            'pivot_type': tensor,       # [batch, 1] - 0=high, 1=low
            'confidence': tensor        # [batch, 1] - confidence
        },
        # Same structure for L2, L3, L4, L5
    },
    
    # AUXILIARY OUTPUTS:
    'volatility_prediction': tensor,    # [batch, 1]
    'trend_strength_prediction': tensor, # [batch, 1]
    'uncertainty_mean': tensor,         # [batch, 1]
    'uncertainty_std': tensor           # [batch, 1]
}
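A consumer typically reduces this dictionary to a decision per sample. The sketch below shows one way to do that. It assumes each sample's tensors have already been converted to plain Python values, for example with PyTorch's `.tolist()`. It also assumes the action index order BUY=0, SELL=1, HOLD=2, inferred from the order in the `action_logits` comment. `decode_action` and `decode_pivot` are hypothetical helpers.

```python
# Assumed index order, taken from the "BUY/SELL/HOLD" comment on action_logits.
ACTION_NAMES = ("BUY", "SELL", "HOLD")


def decode_action(action_probs):
    """action_probs: one sample of outputs['action_probs'] as 3 floats.

    Returns (action_name, probability) for the most likely action.
    """
    best = max(range(len(action_probs)), key=lambda i: action_probs[i])
    return ACTION_NAMES[best], action_probs[best]


def decode_pivot(pivot):
    """pivot: one sample of outputs['next_pivots']['L1'] (or L2..L5) as floats.

    Uses the documented encoding pivot_type: 0 = high, 1 = low.
    """
    kind = "high" if pivot["pivot_type"] == 0 else "low"
    return kind, pivot["price"], pivot["confidence"]
```

If the real action order differs, only `ACTION_NAMES` needs to change.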

TRAINING TARGETS (in batch)

batch = {
    # Input features (see INPUT Structure above)
    'price_data_1s': tensor,
    'price_data_1m': tensor,
    'price_data_1h': tensor,
    'price_data_1d': tensor,
    'btc_data_1m': tensor,
    'cob_data': tensor,
    'tech_data': tensor,
    'market_data': tensor,
    'position_state': tensor,
    
    # Training targets:
    'actions': tensor,              # [batch] - target action (0/1/2)
    'future_prices': tensor,        # [batch, 1] - actual price change ratio
    'trade_success': tensor,        # [batch, 1] - 1.0 if profitable
    'trend_target': tensor,         # [batch, 3] - [angle, steepness, direction]
}
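Because one dictionary holds both model inputs and training targets, the two must be separated before calling the model. A minimal sketch, using a hypothetical `split_batch` helper whose key lists are copied from the structures above, builds new dicts and leaves the original batch untouched. This matters when batches are reused across epochs. The inputs can then be passed as `model(**inputs)`.

```python
# Keys copied from the forward() signature and the batch layout above.
INPUT_KEYS = (
    "price_data_1s", "price_data_1m", "price_data_1h", "price_data_1d",
    "btc_data_1m", "cob_data", "tech_data", "market_data", "position_state",
)
TARGET_KEYS = ("actions", "future_prices", "trade_success", "trend_target")


def split_batch(batch):
    """Return (inputs, targets, missing_targets) without mutating `batch`."""
    inputs = {k: batch[k] for k in INPUT_KEYS if k in batch}
    targets = {k: batch[k] for k in TARGET_KEYS if k in batch}
    missing = [k for k in TARGET_KEYS if k not in batch]
    return inputs, targets, missing
```

Reporting missing targets, rather than raising, leaves it to the caller to decide whether a loss term can be skipped for that batch.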

LOSS CALCULATION

Current loss function in train_step():

total_loss = action_loss + 0.1 * price_loss + 0.05 * trend_loss

where:
- action_loss: CrossEntropyLoss(action_logits, actions)
- price_loss: MSELoss(price_prediction, future_prices)
- trend_loss: MSELoss(trend_pred, trend_target)
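The weighted sum can be checked by hand with a small worked example. The pure-Python sketch below mirrors PyTorch's default `reduction='mean'` for both `CrossEntropyLoss` and `MSELoss`, using nested lists instead of tensors. How `trend_pred` is built is not specified above. Here it is assumed to be `angle_radians`, `steepness` and `direction` concatenated per sample into `[batch, 3]`, matching the layout of `trend_target`.

```python
import math


def cross_entropy(logits, targets):
    """Mean cross-entropy. logits: [batch][3] floats, targets: [batch] ints."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[t]
    return total / len(logits)


def mse(pred, target):
    """Mean squared error over all elements of two equally shaped 2-D lists."""
    diffs = [(p - t) ** 2 for prow, trow in zip(pred, target) for p, t in zip(prow, trow)]
    return sum(diffs) / len(diffs)


def total_loss(outputs, batch):
    action_loss = cross_entropy(outputs["action_logits"], batch["actions"])
    price_loss = mse(outputs["price_prediction"], batch["future_prices"])
    # Assumption: trend_pred = [angle, steepness, direction] per sample.
    ta = outputs["trend_analysis"]
    trend_pred = [a + s + d for a, s, d in
                  zip(ta["angle_radians"], ta["steepness"], ta["direction"])]
    trend_loss = mse(trend_pred, batch["trend_target"])
    return action_loss + 0.1 * price_loss + 0.05 * trend_loss
```

With uniform logits the action term is ln 3 ≈ 1.0986, so a price error of 0.02 adds only 0.1 × 0.0004 = 0.00004. This shows how strongly the current weights favor the action loss.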

NOTE: Next candle predictions are currently only used for accuracy evaluation, NOT trained directly.

CURRENT ISSUES AND RECOMMENDATIONS

Issue 1: Next Candle Predictions Not Trained

Status: The model outputs next candle predictions for each timeframe, but these are NOT included in the loss function.
Impact: The model is not explicitly learning to predict next candle OHLCV values.

Recommendation: Add next candle loss to training:

import torch.nn.functional as F

# Calculate next candle loss for each timeframe that has both a prediction and a target
candle_loss = 0.0
if 'next_candles' in outputs:
    for tf in ['1s', '1m', '1h', '1d']:
        if tf in outputs['next_candles'] and f'future_candle_{tf}' in batch:
            pred_candle = outputs['next_candles'][tf]      # [batch, 5]
            target_candle = batch[f'future_candle_{tf}']   # [batch, 5]
            # F.mse_loss is the functional form of nn.MSELoss (mean reduction).
            # The sum grows with the number of timeframes that have targets.
            candle_loss += F.mse_loss(pred_candle, target_candle)

total_loss = action_loss + 0.1 * price_loss + 0.05 * trend_loss + 0.1 * candle_loss

Issue 2: Annotation Timeframe vs Prediction Timeframe

Current Behavior:

Annotations are created at a specific point in time
The model receives multiple timeframes (1s, 1m, 1h, 1d) as input
Predictions are made for ALL timeframes simultaneously
Only the 1m timeframe prediction is currently evaluated for accuracy

Question: Should predictions be specific to the annotation's timeframe?

Options:

Multi-timeframe predictions (current): Keep predicting all timeframes, add loss for each
Annotation-specific predictions: Only predict/train on the timeframe that matches the annotation
Weighted predictions: Weight the loss by the annotation's timeframe (e.g., if annotated on 1m, weight 1m prediction higher)
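The three options differ only in how much each timeframe's candle loss counts. A minimal sketch, with hypothetical helper names and weight values chosen purely for illustration, expresses them as one weighting scheme. The annotated timeframe gets `primary_weight`, all others get `other_weight`, and timeframes without a target are skipped.

```python
TIMEFRAMES = ("1s", "1m", "1h", "1d")


def timeframe_weights(annotation_tf, primary_weight=1.0, other_weight=0.25):
    """Hypothetical weighting: the annotated timeframe gets the largest weight."""
    if annotation_tf not in TIMEFRAMES:
        raise ValueError(f"unknown timeframe: {annotation_tf}")
    return {tf: (primary_weight if tf == annotation_tf else other_weight)
            for tf in TIMEFRAMES}


def weighted_candle_loss(per_tf_loss, weights):
    """Weighted mean of per-timeframe losses; timeframes with no loss are skipped."""
    used = [tf for tf in per_tf_loss if tf in weights]
    total_w = sum(weights[tf] for tf in used)
    if total_w == 0:
        return 0.0
    return sum(weights[tf] * per_tf_loss[tf] for tf in used) / total_w
```

Setting `other_weight=primary_weight` reproduces the multi-timeframe option with equal weights. Setting `other_weight=0.0` reproduces the annotation-specific option. Normalizing by the total weight also keeps the loss scale independent of how many timeframes have targets.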

Issue 3: Missing Target Data for Next Candles

Current: The batch only contains future_prices (the next close-price change ratio).
Needed: To train the next candle predictions, the batch also needs full OHLCV targets:

future_candle_1s: [batch, 5] - next 1s candle OHLCV
future_candle_1m: [batch, 5] - next 1m candle OHLCV
future_candle_1h: [batch, 5] - next 1h candle OHLCV
future_candle_1d: [batch, 5] - next 1d candle OHLCV

Location to add: ANNOTATE/core/real_training_adapter.py in _convert_annotation_to_transformer_batch()
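The body of `_convert_annotation_to_transformer_batch()` is not shown here, so the following is only a sketch of the idea, with hypothetical helpers. It assumes that, for each timeframe, the annotation's timestamp has already been mapped to an index into a chronologically ordered list of `[open, high, low, close, volume]` candles. The target is then the candle at the next index. Keys are omitted when that candle does not exist yet, which matches the `f'future_candle_{tf}' in batch` check in the recommended loss code. Raw OHLCV is returned. Whether targets should be normalized like the input price data (or expressed as ratios like `future_prices`) is left open.

```python
def future_candle_target(candles, index):
    """Return the OHLCV candle right after `index`, or None if unavailable.

    candles: chronologically ordered list of [open, high, low, close, volume].
    """
    if index + 1 >= len(candles):
        return None
    return list(candles[index + 1])


def add_future_candles(batch, candles_by_tf, index_by_tf):
    """Add future_candle_<tf> entries ([batch=1, 5]) where a next candle exists."""
    for tf, candles in candles_by_tf.items():
        target = future_candle_target(candles, index_by_tf[tf])
        if target is not None:
            batch[f"future_candle_{tf}"] = [target]
    return batch
```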

SUMMARY

✅ Fixed: Batch deletion bug causing epoch 2+ failures
✅ Working: Model outputs next candle predictions for all timeframes
✅ Working: Model predicts the trend vector (angle, steepness, direction)
❌ Missing: Loss calculation for next candle predictions
❌ Missing: Target data (future OHLCV) for next candle training
⚠️ Unclear: Should predictions be timeframe-specific or multi-timeframe?

NEXT STEPS

Add future OHLCV target data to training batches
Add next candle loss to the training loop
Clarify prediction strategy: Single timeframe vs multi-timeframe
Test training with enhanced loss function
