# Training Improvements Summary

## What Changed

### 1. Extended Data Fetching Window ✅
**Before:**
```python
from datetime import timedelta

context_window = 5  # only ±5 minutes of context
start_time = timestamp - timedelta(minutes=context_window)
end_time = timestamp + timedelta(minutes=context_window)
```

**After:**
```python
from datetime import timedelta

context_window = 5
negative_samples_window = 15  # ±15 candles for NO_TRADE samples
extended_window = max(context_window, negative_samples_window + 10)  # = 25 minutes
start_time = timestamp - timedelta(minutes=extended_window)
end_time = timestamp + timedelta(minutes=extended_window)
```
**Impact:** Fetches enough data to create the ±15-candle negative samples.
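As a sanity check, here is a minimal sketch of the window computation as a standalone helper (`fetch_window` and `buffer_minutes` are illustrative names, not the adapter's actual API):

```python
from datetime import datetime, timedelta

def fetch_window(timestamp, context_window=5,
                 negative_samples_window=15, buffer_minutes=10):
    """Return (start_time, end_time) wide enough to cover negative sampling."""
    extended = max(context_window, negative_samples_window + buffer_minutes)
    return (timestamp - timedelta(minutes=extended),
            timestamp + timedelta(minutes=extended))

start, end = fetch_window(datetime(2025, 10, 27, 14, 0))
print(start, end)  # 2025-10-27 13:35:00  2025-10-27 14:25:00
```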
### 2. Dynamic Candle Limits ✅

**Before:**
```python
limit = 200  # fixed for all timeframes
```
**After:**
```python
if timeframe == '1s':
    limit = extended_window_minutes * 60 * 2 + 100   # ~3100
elif timeframe == '1m':
    limit = extended_window_minutes * 2 + 50         # ~100
elif timeframe == '1h':
    limit = max(200, extended_window_minutes // 30)  # 200+
elif timeframe == '1d':
    limit = 200
```
**Impact:** Requests an appropriate amount of data for each timeframe.
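The same branching, wrapped as a hypothetical helper so the limits can be checked at the default 25-minute window:

```python
def candle_limit(timeframe, extended_window_minutes=25):
    """Per-timeframe candle limit for the extended fetch window."""
    if timeframe == '1s':
        return extended_window_minutes * 60 * 2 + 100
    if timeframe == '1m':
        return extended_window_minutes * 2 + 50
    if timeframe == '1h':
        return max(200, extended_window_minutes // 30)
    return 200  # '1d' and anything else keeps the fixed limit

for tf in ('1s', '1m', '1h', '1d'):
    print(tf, candle_limit(tf))  # 1s 3100, 1m 100, 1h 200, 1d 200
```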
### 3. Improved Logging ✅

**Before:**
```
DEBUG - Added 30 negative samples
```
**After:**
```
INFO - Test case 1: ENTRY sample - LONG @ 2500.0
INFO - Test case 1: Added 30 HOLD samples (during position)
INFO - Test case 1: EXIT sample @ 2562.5 (2.50%)
INFO - Test case 1: Added 30 NO_TRADE samples (±15 candles)
INFO -   → 15 before signal, 15 after signal
```
**Impact:** Clear visibility into the composition of the training data.
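The INFO lines above map to ordinary `logging` calls; a minimal sketch with hard-coded illustrative values (the adapter derives them from the annotation itself):

```python
import logging

logger = logging.getLogger("real_training_adapter")

case, direction, entry, exit_price, pnl = 1, "LONG", 2500.0, 2562.5, 2.50

logger.info("Test case %d: ENTRY sample - %s @ %s", case, direction, entry)
logger.info("Test case %d: Added %d HOLD samples (during position)", case, 30)
logger.info("Test case %d: EXIT sample @ %s (%.2f%%)", case, exit_price, pnl)
logger.info("Test case %d: Added %d NO_TRADE samples (±15 candles)", case, 30)
logger.info("  → %d before signal, %d after signal", 15, 15)
```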
### 4. Historical Data Priority ✅

**Before:**
```python
df = data_provider.get_historical_data(limit=100)  # latest data, not historical
```
**After:**
```python
# Try DuckDB first (historical data at the specific timestamp)
df = duckdb_storage.get_ohlcv_data(
    start_time=start_time,
    end_time=end_time
)

# Fall back to replaying historical data
if df is None:
    df = data_provider.get_historical_data_replay(
        start_time=start_time,
        end_time=end_time
    )

# Last resort: latest data (with a warning)
if df is None:
    logger.warning("Using latest data as fallback")
    df = data_provider.get_historical_data(limit=limit)
```
**Impact:** Trains on the correct historical data, not on current data.
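The fallback chain reads naturally as one priority function; a sketch assuming the `duckdb_storage` and `data_provider` interfaces shown above:

```python
import logging

logger = logging.getLogger(__name__)

def fetch_ohlcv(duckdb_storage, data_provider, start_time, end_time, limit):
    """Historical-first fetch: DuckDB, then replay, then latest data."""
    # 1. DuckDB: historical candles at the exact annotation window
    df = duckdb_storage.get_ohlcv_data(start_time=start_time, end_time=end_time)
    if df is not None:
        return df
    # 2. Replay: reconstruct the window from the provider
    df = data_provider.get_historical_data_replay(start_time=start_time,
                                                  end_time=end_time)
    if df is not None:
        return df
    # 3. Last resort: latest data, which is the wrong era - warn loudly
    logger.warning("Using latest data as fallback")
    return data_provider.get_historical_data(limit=limit)
```

Centralizing the chain in one function also makes the priority order easy to unit-test with stub providers.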
## Training Data Composition

### Per Annotation
| Sample Type | Count | Repetitions | Total Batches |
|---|---|---|---|
| ENTRY | 1 | 100 | 100 |
| HOLD | ~30 | 25 | 750 |
| EXIT | 1 | 100 | 100 |
| NO_TRADE | ~30 | 50 | 1,500 |
| Total | ~62 | - | ~2,450 |
### 5 Annotations
| Sample Type | Count | Total Batches |
|---|---|---|
| ENTRY | 5 | 500 |
| HOLD | ~150 | 3,750 |
| EXIT | 5 | 500 |
| NO_TRADE | ~150 | 7,500 |
| Total | ~310 | ~12,250 |
**Key ratio:** 1:30 (entry:no_trade). The model learns to be selective!
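The batch totals are just counts × repetitions; a quick verification script:

```python
# (count per annotation, repetitions) for each sample type, from the tables above
composition = {
    'ENTRY':    (1, 100),
    'HOLD':     (30, 25),
    'EXIT':     (1, 100),
    'NO_TRADE': (30, 50),
}

per_annotation = sum(count * reps for count, reps in composition.values())
print(per_annotation)      # 2450 batches per annotation
print(per_annotation * 5)  # 12250 batches for 5 annotations

ratio = composition['NO_TRADE'][0] / composition['ENTRY'][0]
print(f"1:{ratio:.0f}")    # 1:30 entry:no_trade
```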
## What This Achieves

### 1. Continuous Data Training ✅
- Trains on every candle within ±15 of a signal
- Not just on isolated entry/exit points
- Learns from continuous price action
### 2. Negative Sampling ✅
- 30 NO_TRADE samples per annotation
- 15 before signal (don't enter too early)
- 15 after signal (don't chase)
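A minimal sketch of the index selection, assuming candles are indexed in order and `signal_idx` marks the annotated entry (names are illustrative, not the adapter's exact code):

```python
def no_trade_indices(signal_idx, n_candles, window=15):
    """NO_TRADE sample indices: `window` candles before and after the signal."""
    before = range(max(0, signal_idx - window), signal_idx)
    after = range(signal_idx + 1, min(n_candles, signal_idx + window + 1))
    return list(before) + list(after)

idx = no_trade_indices(signal_idx=100, n_candles=300)
print(len(idx))  # 30 -> 15 before the signal, 15 after
```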
### 3. Context Learning ✅
- Model sees what happened before signal
- Model sees what happened after signal
- Learns timing and context
### 4. Selective Trading ✅
- High ratio of NO_TRADE samples
- Teaches model to wait for quality setups
- Reduces false signals
## Example Training Output

```
Starting REAL training with 5 test cases for model Transformer
Preparing training data from 5 test cases...
  Negative sampling: +/-15 candles around signals
  Training repetitions: 100x per sample
Fetching market state dynamically for test case 1...
  Fetching HISTORICAL market state for ETH/USDT at 2025-10-27 14:00
  Timeframes: ['1s', '1m', '1h', '1d'], Extended window: ±25 minutes
  (Includes ±15 candles for negative sampling)
  1m: 100 candles from DuckDB (historical)
  1h: 200 candles from DuckDB (historical)
  1d: 200 candles from DuckDB (historical)
  Fetched market state with 3 timeframes
Test case 1: ENTRY sample - LONG @ 2500.0
Test case 1: Added 30 HOLD samples (during position)
Test case 1: EXIT sample @ 2562.5 (2.50%)
Test case 1: Added 30 NO_TRADE samples (±15 candles)
  → 15 before signal, 15 after signal
Prepared 310 training samples from 5 test cases
  ENTRY samples: 5
  HOLD samples: 150
  EXIT samples: 5
  NO_TRADE samples: 150
  Ratio: 1:30.0 (entry:no_trade)
Starting Transformer training...
Converting annotation data to transformer format...
Converted 310 samples to 12,250 training batches
```
## Files Modified

`ANNOTATE/core/real_training_adapter.py`
- Extended data fetching window
- Dynamic candle limits
- Improved logging
- Historical data priority
## New Documentation

- `ANNOTATE/CONTINUOUS_DATA_TRAINING_STRATEGY.md`
  - Detailed explanation of the training strategy
  - Sample composition breakdown
  - Configuration guidelines
  - Monitoring tips
- `ANNOTATE/DATA_LOADING_ARCHITECTURE.md`
  - Data storage architecture
  - Dynamic loading strategy
  - Troubleshooting guide
- `MODEL_INPUTS_OUTPUTS_REFERENCE.md`
  - All model inputs/outputs
  - Shape specifications
  - Integration examples
## Next Steps

1. **Test Training**
   - Run training with 5+ annotations
   - Verify NO_TRADE samples are created
   - Check the logs for data fetching
2. **Monitor Ratios**
   - Ideal: 1:20 to 1:40 (entry:no_trade)
   - Adjust `negative_samples_window` if needed
3. **Verify Data**
   - Ensure DuckDB has the historical data
   - Check for "fallback" warnings
   - Confirm timestamps match annotations
4. **Tune Parameters**
   - Adjust `extended_window_minutes` if needed
   - Modify repetitions based on dataset size
   - Balance training time vs. accuracy