
# Training Improvements Summary

## What Changed

### 1. Extended Data Fetching Window

**Before:**

```python
from datetime import timedelta

context_window = 5  # only ±5 minutes of context
start_time = timestamp - timedelta(minutes=context_window)
end_time = timestamp + timedelta(minutes=context_window)
```

**After:**

```python
context_window = 5
negative_samples_window = 15  # ±15 candles
extended_window = max(context_window, negative_samples_window + 10)  # = 25 minutes

start_time = timestamp - timedelta(minutes=extended_window)
end_time = timestamp + timedelta(minutes=extended_window)
```

**Impact:** Fetches enough data to create the ±15-candle negative samples.


### 2. Dynamic Candle Limits

**Before:**

```python
limit = 200  # fixed for all timeframes
```

**After:**

```python
if timeframe == '1s':
    limit = extended_window_minutes * 60 * 2 + 100  # ~3100
elif timeframe == '1m':
    limit = extended_window_minutes * 2 + 50        # ~100
elif timeframe == '1h':
    limit = max(200, extended_window_minutes // 30) # 200+
elif timeframe == '1d':
    limit = 200
```

**Impact:** Requests an appropriate amount of data for each timeframe.
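
For illustration, the same branching can live in a small helper. This is a minimal sketch; `candle_limit` and its default window are names assumed here, not the adapter's actual API:

```python
def candle_limit(timeframe: str, extended_window_minutes: int = 25) -> int:
    """Return how many candles to request for the extended window (sketch)."""
    if timeframe == '1s':
        return extended_window_minutes * 60 * 2 + 100   # ~3100 one-second candles
    if timeframe == '1m':
        return extended_window_minutes * 2 + 50         # ~100 one-minute candles
    if timeframe == '1h':
        return max(200, extended_window_minutes // 30)  # at least 200 hourly candles
    return 200                                          # '1d' and anything else

assert candle_limit('1s') == 3100 and candle_limit('1m') == 100
```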


### 3. Improved Logging

**Before:**

```
DEBUG - Added 30 negative samples
```

**After:**

```
INFO - Test case 1: ENTRY sample - LONG @ 2500.0
INFO - Test case 1: Added 30 HOLD samples (during position)
INFO - Test case 1: EXIT sample @ 2562.5 (2.50%)
INFO - Test case 1: Added 30 NO_TRADE samples (±15 candles)
INFO -    → 15 before signal, 15 after signal
```

**Impact:** Clear visibility into the training data composition.


### 4. Historical Data Priority

**Before:**

```python
df = data_provider.get_historical_data(limit=100)  # latest data only
```

**After:**

```python
# Try DuckDB first (historical candles at the specific timestamp)
df = duckdb_storage.get_ohlcv_data(
    start_time=start_time,
    end_time=end_time
)

# Fall back to replaying the provider's history
if df is None:
    df = data_provider.get_historical_data_replay(
        start_time=start_time,
        end_time=end_time
    )

# Last resort: latest data (with a warning)
if df is None:
    logger.warning("Using latest data as fallback")
    df = data_provider.get_historical_data(limit=limit)
```

**Impact:** Trains on the correct historical data around the annotation timestamp, not on whatever data is current.
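
Taken together, the three-tier lookup behaves like the helper below. This is a minimal sketch assuming the `duckdb_storage` and `data_provider` objects shown above; `fetch_training_ohlcv` is a hypothetical name:

```python
import logging

logger = logging.getLogger(__name__)

def fetch_training_ohlcv(duckdb_storage, data_provider, start_time, end_time, limit):
    # 1. DuckDB: historical candles stored around the annotation timestamp
    df = duckdb_storage.get_ohlcv_data(start_time=start_time, end_time=end_time)
    if df is not None:
        return df
    # 2. Replay: reconstruct the window from the provider's history
    df = data_provider.get_historical_data_replay(start_time=start_time, end_time=end_time)
    if df is not None:
        return df
    # 3. Last resort: latest data, which may not match the annotation window
    logger.warning("Using latest data as fallback")
    return data_provider.get_historical_data(limit=limit)
```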


## Training Data Composition

### Per Annotation

| Sample Type | Count | Repetitions | Total Batches |
|-------------|-------|-------------|---------------|
| ENTRY       | 1     | 100         | 100           |
| HOLD        | ~30   | 25          | 750           |
| EXIT        | 1     | 100         | 100           |
| NO_TRADE    | ~30   | 50          | 1,500         |
| **Total**   | ~62   | -           | **~2,450**    |

### 5 Annotations

| Sample Type | Count | Total Batches |
|-------------|-------|---------------|
| ENTRY       | 5     | 500           |
| HOLD        | ~150  | 3,750         |
| EXIT        | 5     | 500           |
| NO_TRADE    | ~150  | 7,500         |
| **Total**   | ~310  | **~12,250**   |

**Key Ratio:** 1:30 (entry:no_trade), which teaches the model to be selective.
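
The batch totals can be sanity-checked in a few lines; the numbers are copied straight from the tables above:

```python
# (samples per annotation, training repetitions) for each sample type
per_annotation = {
    'ENTRY':    (1, 100),
    'HOLD':     (30, 25),
    'EXIT':     (1, 100),
    'NO_TRADE': (30, 50),
}
batches = sum(count * reps for count, reps in per_annotation.values())
print(batches)      # 2450 batches per annotation
print(batches * 5)  # 12250 batches for 5 annotations
print(per_annotation['NO_TRADE'][0] // per_annotation['ENTRY'][0])  # 30 -> 1:30 ratio
```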


## What This Achieves

### 1. Continuous Data Training

- Trains on every candle within ±15 candles of each signal
- Not just on isolated entry/exit points
- Learns from continuous price action

### 2. Negative Sampling

- 30 NO_TRADE samples per annotation
- 15 before the signal (don't enter too early)
- 15 after the signal (don't chase); see the sketch below
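
A minimal sketch of the idea (hypothetical helper, not the adapter's actual code): walk the ±15 candles around the signal index and label every candle except the signal itself as NO_TRADE.

```python
def make_no_trade_samples(df, signal_idx: int, window: int = 15):
    """Label the candles around a signal as NO_TRADE negative samples."""
    samples = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the signal candle itself is the ENTRY sample
        idx = signal_idx + offset
        if 0 <= idx < len(df):
            samples.append({
                'timestamp': df.index[idx],
                'label': 'NO_TRADE',
                'offset': offset,  # negative = before signal, positive = after
            })
    return samples  # up to 30 samples: 15 before + 15 after
```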

### 3. Context Learning

- The model sees what happened before the signal
- The model sees what happened after the signal
- It learns timing and context

### 4. Selective Trading

- A high ratio of NO_TRADE samples
- Teaches the model to wait for quality setups
- Reduces false signals

## Example Training Output

```
Starting REAL training with 5 test cases for model Transformer

Preparing training data from 5 test cases...
   Negative sampling: +/-15 candles around signals
   Training repetitions: 100x per sample

   Fetching market state dynamically for test case 1...
   Fetching HISTORICAL market state for ETH/USDT at 2025-10-27 14:00
   Timeframes: ['1s', '1m', '1h', '1d'], Extended window: ±25 minutes
   (Includes ±15 candles for negative sampling)
       1m: 100 candles from DuckDB (historical)
       1h: 200 candles from DuckDB (historical)
       1d: 200 candles from DuckDB (historical)
    Fetched market state with 3 timeframes

   Test case 1: ENTRY sample - LONG @ 2500.0
   Test case 1: Added 30 HOLD samples (during position)
   Test case 1: EXIT sample @ 2562.5 (2.50%)
   Test case 1: Added 30 NO_TRADE samples (±15 candles)
      → 15 before signal, 15 after signal

 Prepared 310 training samples from 5 test cases
   ENTRY samples: 5
   HOLD samples: 150
   EXIT samples: 5
   NO_TRADE samples: 150
   Ratio: 1:30.0 (entry:no_trade)

 Starting Transformer training...
    Converting annotation data to transformer format...
     Converted 310 samples to 12,250 training batches
```

## Files Modified

1. `ANNOTATE/core/real_training_adapter.py`
   - Extended data fetching window
   - Dynamic candle limits
   - Improved logging
   - Historical data priority

## New Documentation

1. `ANNOTATE/CONTINUOUS_DATA_TRAINING_STRATEGY.md`
   - Detailed explanation of the training strategy
   - Sample composition breakdown
   - Configuration guidelines
   - Monitoring tips
2. `ANNOTATE/DATA_LOADING_ARCHITECTURE.md`
   - Data storage architecture
   - Dynamic loading strategy
   - Troubleshooting guide
3. `MODEL_INPUTS_OUTPUTS_REFERENCE.md`
   - All model inputs/outputs
   - Shape specifications
   - Integration examples

## Next Steps

1. **Test Training**
   - Run training with 5+ annotations
   - Verify that NO_TRADE samples are created
   - Check the logs for data fetching
2. **Monitor Ratios**
   - Ideal: 1:20 to 1:40 (entry:no_trade)
   - Adjust `negative_samples_window` if needed
3. **Verify Data**
   - Ensure DuckDB has historical data
   - Check for "fallback" warnings
   - Confirm timestamps match annotations
4. **Tune Parameters**
   - Adjust `extended_window_minutes` if needed
   - Modify repetitions based on dataset size
   - Balance training time against accuracy