ANNOTATE - Training Data Format
🎯 Overview
The ANNOTATE system generates training data that includes ±5 minutes of market data around each trade signal. This allows models to learn:
- WHERE to generate signals (at entry/exit points)
- WHERE NOT to generate signals (before entry, after exit)
- Context around the signal (what led to the trade)
📦 Test Case Structure
Complete Format
{
  "test_case_id": "annotation_uuid",
  "symbol": "ETH/USDT",
  "timestamp": "2024-01-15T10:30:00Z",
  "action": "BUY",
  "market_state": {
    "ohlcv_1s": {
      "timestamps": [...],  // ±5 minutes of 1s candles (~600 candles)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1m": {
      "timestamps": [...],  // ±5 minutes of 1m candles (~10 candles)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1h": {
      "timestamps": [...],  // ±5 minutes of 1h candles (usually 1 candle)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1d": {
      "timestamps": [...],  // ±5 minutes of 1d candles (usually 1 candle)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "training_labels": {
      "labels_1m": [0, 0, 0, 1, 2, 2, 3, 0, 0, 0],  // Label for each 1m candle
      "direction": "LONG",
      "entry_timestamp": "2024-01-15T10:30:00",
      "exit_timestamp": "2024-01-15T10:35:00"
    }
  },
  "expected_outcome": {
    "direction": "LONG",
    "profit_loss_pct": 2.5,
    "entry_price": 2400.50,
    "exit_price": 2460.75,
    "holding_period_seconds": 300
  },
  "annotation_metadata": {
    "annotator": "manual",
    "confidence": 1.0,
    "notes": "",
    "created_at": "2024-01-15T11:00:00Z",
    "timeframe": "1m"
  }
}
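The `expected_outcome` fields can be derived from the entry and exit of an annotation. The sketch below shows one plausible derivation. The function name `expected_outcome` and the formula (simple percentage return, sign flipped for SHORT, no fees or slippage) are assumptions for illustration, not the ANNOTATE implementation.

```python
from datetime import datetime


def expected_outcome(direction, entry_price, exit_price, entry_ts, exit_ts):
    """Hypothetical helper: build an expected_outcome dict from one annotation."""
    # A SHORT profits when the price falls, so the sign of the return is flipped.
    sign = 1 if direction == "LONG" else -1
    pnl_pct = sign * (exit_price - entry_price) / entry_price * 100
    held = datetime.fromisoformat(exit_ts) - datetime.fromisoformat(entry_ts)
    return {
        "direction": direction,
        "profit_loss_pct": round(pnl_pct, 2),
        "entry_price": entry_price,
        "exit_price": exit_price,
        "holding_period_seconds": int(held.total_seconds()),
    }


outcome = expected_outcome(
    "LONG", 2400.50, 2460.75, "2024-01-15T10:30:00", "2024-01-15T10:35:00"
)
print(outcome)
```

With the sample prices above this gives a return of about 2.51%, which the example test case shows rounded to 2.5.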
🏷️ Training Labels
Label System
Each timestamp in the ±5 minute window is labeled:
| Label | Meaning | Description |
|---|---|---|
| 0 | NO SIGNAL | Before entry or after exit - model should NOT signal |
| 1 | ENTRY SIGNAL | At entry time - model SHOULD signal BUY/SELL |
| 2 | HOLD | Between entry and exit - model should maintain position |
| 3 | EXIT SIGNAL | At exit time - model SHOULD signal close position |
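In code, the four label values are easier to read as named constants. The following is a minimal sketch using an `IntEnum`; the class name `Label` and its member names are illustrative choices, not identifiers from the ANNOTATE codebase.

```python
from enum import IntEnum


class Label(IntEnum):
    """Hypothetical names for the integer labels in the table above."""
    NO_SIGNAL = 0
    ENTRY = 1
    HOLD = 2
    EXIT = 3


# Decode the sample labels_1m array from the test case format.
labels_1m = [0, 0, 0, 1, 2, 2, 3, 0, 0, 0]
names = [Label(value).name for value in labels_1m]
print(names)
```

Because `IntEnum` members compare equal to plain integers, `Label.ENTRY == 1` holds and the stored JSON does not need to change.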
Example Timeline
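| Time | 10:25 | 10:26 | 10:27 | 10:28 | 10:29 | 10:30 | 10:31 | 10:32 | 10:33 | 10:34 | 10:35 | 10:36 | 10:37 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Label | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 2 | 2 | 3 | 0 | 0 |
| Action | NO | NO | NO | NO | NO | ENTRY | HOLD | HOLD | HOLD | HOLD | EXIT | NO | NO |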
Why This Matters
- Negative Examples: Model learns NOT to signal at random times
- Context: Model sees what happens before/after the signal
- Precision: Model learns exact timing, not just "buy somewhere"
📊 Data Window
Time Window: ±5 Minutes
Entry Time: 10:30:00
Window Start: 10:25:00 (5 minutes before)
Window End: 10:35:00 (5 minutes after)
Candle Counts by Timeframe
| Timeframe | Candles in ±5min | Purpose |
|---|---|---|
| 1s | ~600 candles | Micro-structure, order flow |
| 1m | ~10 candles | Short-term patterns |
| 1h | ~1 candle | Trend context |
| 1d | ~1 candle | Market regime |
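The approximate counts in the table can be reproduced with a little arithmetic. The sketch below counts candles that overlap the window, assuming candles are aligned to multiples of their duration since the Unix epoch and that both window ends are inclusive. Both assumptions, and the helper name `candles_overlapping`, are illustrative.

```python
from datetime import datetime, timedelta, timezone

TIMEFRAME_SECONDS = {"1s": 1, "1m": 60, "1h": 3600, "1d": 86400}


def candles_overlapping(start, end, timeframe):
    """Count epoch-aligned candles that overlap the inclusive window [start, end]."""
    step = TIMEFRAME_SECONDS[timeframe]
    first_bucket = int(start.timestamp()) // step
    last_bucket = int(end.timestamp()) // step
    return last_bucket - first_bucket + 1


entry = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
start = entry - timedelta(minutes=5)
end = entry + timedelta(minutes=5)

counts = {tf: candles_overlapping(start, end, tf) for tf in TIMEFRAME_SECONDS}
print(counts)
```

Under these assumptions the inclusive window yields 601 one-second candles and 11 one-minute candles, one more than the rounded figures in the table, plus a single 1h and a single 1d candle.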
🎓 Training Strategy
Positive Examples (Signal Points)
- Entry Point (Label 1): Model learns to recognize entry conditions
- Exit Point (Label 3): Model learns to recognize exit conditions
Negative Examples (Non-Signal Points)
- Before Entry (Label 0): Model learns NOT to signal too early
- After Exit (Label 0): Model learns NOT to signal too late
- During Hold (Label 2): Model learns to maintain position
Balanced Training
For each annotation:
- 1 entry signal (Label 1)
- 1 exit signal (Label 3)
- ~3-5 hold periods (Label 2)
- ~5-8 no-signal periods (Label 0)
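This gives each annotation a deliberate mix of action and non-action examples, so the model learns:
- When TO act (entry and exit, roughly 20% of candles)
- When NOT to act (hold and no-signal, roughly 80% of candles)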
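Since entry and exit labels are much rarer than the other two, a classifier trained on raw counts may under-weight them. One common remedy, not described in this document and shown here only as an assumption, is inverse-frequency class weighting. The sketch computes the action share and per-class weights for one 11-candle annotation.

```python
from collections import Counter

# Labels for one annotation: 5 no-signal, 1 entry, 4 hold, 1 exit.
labels = [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3]

counts = Counter(labels)
total = len(labels)
n_classes = 4

# Share of candles where the model is expected to act (entry or exit).
action_share = (counts[1] + counts[3]) / total

# Inverse-frequency weights: rare classes receive proportionally larger weights.
class_weights = {c: total / (n_classes * counts[c]) for c in sorted(counts)}

print(f"action share: {action_share:.1%}")
print(class_weights)
```

The resulting weights could be passed to a weighted loss, so that a missed entry costs more than a missed no-signal candle.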
🔧 Implementation Details
Data Fetching
from datetime import timedelta

# Get ±5 minutes around entry
entry_time = annotation.entry['timestamp']
start_time = entry_time - timedelta(minutes=5)
end_time = entry_time + timedelta(minutes=5)

# Fetch data for window
df = data_provider.get_historical_data(
    symbol=symbol,
    timeframe=timeframe,
    limit=1000
)

# Filter to window (inclusive on both ends)
df_window = df[(df.index >= start_time) & (df.index <= end_time)]
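The fetch above is capped at `limit=1000` candles. If `get_historical_data` returns only the most recent candles (an assumption about its behavior), 1000 one-second candles span under 17 minutes and may not reach back to an older annotation. A small guard can detect this before labels are generated. The helper `window_is_covered` below is hypothetical and works on plain lists of timestamps.

```python
from datetime import datetime, timedelta


def window_is_covered(timestamps, start_time, end_time):
    """Return True if the fetched timestamps span the whole [start, end] window."""
    if not timestamps:
        return False
    return min(timestamps) <= start_time and max(timestamps) >= end_time


entry_time = datetime(2024, 1, 15, 10, 30)
start_time = entry_time - timedelta(minutes=5)
end_time = entry_time + timedelta(minutes=5)

# 1000 one-second candles ending at 10:40 reach back to 10:23:21: window covered.
recent_1s = [datetime(2024, 1, 15, 10, 40) - timedelta(seconds=i) for i in range(1000)]
# 1000 one-second candles ending at 11:00 start at 10:43:21: window missed entirely.
late_1s = [datetime(2024, 1, 15, 11, 0) - timedelta(seconds=i) for i in range(1000)]

covered = window_is_covered(recent_1s, start_time, end_time)
missed = window_is_covered(late_1s, start_time, end_time)
print(covered, missed)
```

With a pandas DataFrame, the same check could be applied to `df.index` before filtering.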
Label Generation
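# near_entry, near_exit and between_entry_and_exit are helper predicates
labels = []
for timestamp in timestamps:
    if near_entry(timestamp):
        label = 1  # ENTRY SIGNAL
    elif near_exit(timestamp):
        label = 3  # EXIT SIGNAL
    elif between_entry_and_exit(timestamp):
        label = 2  # HOLD
    else:
        label = 0  # NO SIGNAL
    labels.append(label)  # one label per timestamp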
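The labeling loop can be made concrete with explicit time comparisons. This is a minimal sketch. The `tolerance` parameter (how close a candle must be to count as "near" the entry or exit) and the function name `generate_labels` are assumptions, since the document does not say how `near_entry` and `near_exit` are defined.

```python
from datetime import datetime, timedelta

NO_SIGNAL, ENTRY, HOLD, EXIT = 0, 1, 2, 3


def generate_labels(timestamps, entry_time, exit_time,
                    tolerance=timedelta(seconds=30)):
    """Assign one label per timestamp; entry/exit checks take priority over HOLD."""
    labels = []
    for ts in timestamps:
        if abs(ts - entry_time) <= tolerance:
            labels.append(ENTRY)
        elif abs(ts - exit_time) <= tolerance:
            labels.append(EXIT)
        elif entry_time < ts < exit_time:
            labels.append(HOLD)
        else:
            labels.append(NO_SIGNAL)
    return labels


entry = datetime(2024, 1, 15, 10, 30)
exit_ = datetime(2024, 1, 15, 10, 35)
# One-minute candles from 10:25 to 10:35 inclusive.
timestamps = [entry + timedelta(minutes=m) for m in range(-5, 6)]

labels = generate_labels(timestamps, entry, exit_)
print(labels)
```

A tolerance of half a candle keeps exactly one 1m candle in the entry class. A wider tolerance would mark neighboring candles as entries too, which may or may not be desirable.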
📈 Model Training Usage
CNN Training
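# Input: OHLCV data for ±5 minutes
# Output: Probability distribution over labels [0, 1, 2, 3]
# Assumes `optimizer` was created for `model` beforehand
for timestamp, label in zip(timestamps, labels):
    features = extract_features(ohlcv_data, timestamp)
    prediction = model(features)
    loss = cross_entropy(prediction, label)
    optimizer.zero_grad()  # clear gradients from the previous step
    loss.backward()
    optimizer.step()       # apply the weight update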
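To see what the cross-entropy loss measures for a single candle, here is a dependency-free worked example. It assumes the model emits four raw scores (logits), one per label. The functions below are illustrative re-implementations, not the training code.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(softmax(logits)[label])


# Scores for labels [0, 1, 2, 3]; the model favors label 0 (NO SIGNAL).
logits = [2.0, 0.5, 0.1, -1.0]

loss_if_no_signal = cross_entropy(logits, 0)  # true label agrees with the model
loss_if_entry = cross_entropy(logits, 1)      # true label is ENTRY instead

print(f"{loss_if_no_signal:.3f} vs {loss_if_entry:.3f}")
```

The loss is small when the model puts high probability on the annotated label and grows as that probability shrinks, which is what pushes the model toward the labeled timing.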
DQN Training
# State: Current market state
# Action: BUY/SELL/HOLD
# Reward: Based on label and outcome
for timestamp, label in zip(timestamps, labels):
    state = get_state(ohlcv_data, timestamp)
    action = agent.select_action(state)
    if label == 1:  # Should signal entry (BUY for a LONG annotation)
        reward = +1 if action == BUY else -1
    elif label == 0:  # Should NOT signal
        reward = +1 if action == HOLD else -1
    else:  # Labels 2 (HOLD) and 3 (EXIT) are not rewarded in this sketch
        continue
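The reward rule above covers only labels 1 and 0. One way to extend it to all four labels and to SHORT annotations is sketched below. The mapping is an assumption for illustration: entry expects the opening action for the annotated direction, exit expects the opposite action, and both HOLD and NO SIGNAL expect the agent to do nothing.

```python
BUY, SELL, HOLD = "BUY", "SELL", "HOLD"


def expected_action(label, direction):
    """Hypothetical mapping from a training label to the action it calls for."""
    open_action = BUY if direction == "LONG" else SELL
    close_action = SELL if direction == "LONG" else BUY
    if label == 1:   # ENTRY SIGNAL
        return open_action
    if label == 3:   # EXIT SIGNAL
        return close_action
    return HOLD      # labels 0 (NO SIGNAL) and 2 (HOLD)


def reward(label, action, direction="LONG"):
    """+1 when the agent's action matches the label, -1 otherwise."""
    return 1 if action == expected_action(label, direction) else -1


print(reward(1, BUY), reward(3, SELL), reward(2, BUY))
```

A symmetric ±1 reward is the simplest choice. Scaling the entry and exit rewards by `profit_loss_pct` would be a natural variation if outcome-based rewards are wanted.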
🎯 Benefits
1. Precision Training
- Model learns exact timing of signals
- Not just "buy somewhere in this range"
- Reduces false positives
2. Negative Examples
- Model learns when NOT to trade
- Critical for avoiding bad signals
- Improves precision/recall balance
3. Context Awareness
- Model sees what led to the signal
- Understands market conditions before entry
- Better pattern recognition
4. Realistic Scenarios
- Includes normal market noise
- Not just "perfect" entry points
- Model learns to filter noise
📊 Example Use Case
Scenario: Breakout Trade
Annotation:
- Entry: 10:30:00 @ $2400 (breakout)
- Exit: 10:35:00 @ $2460 (+2.5%)
Training Data Generated:
10:25 - 10:29: NO SIGNAL (consolidation before breakout)
10:30: ENTRY SIGNAL (breakout confirmed)
10:31 - 10:34: HOLD (price moving up)
10:35: EXIT SIGNAL (target reached)
10:36 - 10:40: NO SIGNAL (after exit)
Model Learns:
- Don't signal during consolidation
- Signal at breakout confirmation
- Hold during profitable move
- Exit at target
- Don't signal after exit
🔍 Verification
Check Test Case Quality
import json

# Load test case
with open('test_case.json') as f:
    tc = json.load(f)

# Verify data completeness
assert 'market_state' in tc
assert 'ohlcv_1m' in tc['market_state']
assert 'training_labels' in tc['market_state']

# Check label distribution
labels = tc['market_state']['training_labels']['labels_1m']
print(f"NO_SIGNAL: {labels.count(0)}")
print(f"ENTRY: {labels.count(1)}")
print(f"HOLD: {labels.count(2)}")
print(f"EXIT: {labels.count(3)}")
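Beyond counting labels, it can help to check that a test case is internally consistent. The sketch below is a hypothetical validator, not part of ANNOTATE. It assumes every OHLCV series and the label array must have the same length, that each case has exactly one entry, and that the exit may be missing if it falls outside the window.

```python
def validate_test_case(tc, timeframe="1m"):
    """Return a list of problems found; an empty list means the case passed."""
    state = tc["market_state"]
    candles = state[f"ohlcv_{timeframe}"]
    labels = state["training_labels"][f"labels_{timeframe}"]
    problems = []

    # Every OHLCV series and the label array should align with the timestamps.
    n = len(candles["timestamps"])
    for field in ("open", "high", "low", "close", "volume"):
        if len(candles[field]) != n:
            problems.append(f"{field}: expected {n} values, found {len(candles[field])}")
    if len(labels) != n:
        problems.append(f"labels: expected {n} values, found {len(labels)}")

    # Signal sanity: one entry, at most one exit, and entry before exit.
    if labels.count(1) != 1:
        problems.append("expected exactly one ENTRY label")
    if labels.count(3) > 1:
        problems.append("expected at most one EXIT label")
    if 1 in labels and 3 in labels and labels.index(3) < labels.index(1):
        problems.append("EXIT label appears before ENTRY label")
    return problems


def make_case(labels):
    """Build a tiny synthetic test case with placeholder candle values."""
    n = len(labels)
    series = {k: [0.0] * n for k in ("open", "high", "low", "close", "volume")}
    series["timestamps"] = list(range(n))
    return {"market_state": {"ohlcv_1m": series,
                             "training_labels": {"labels_1m": labels}}}


good = validate_test_case(make_case([0, 0, 1, 2, 2, 3, 0]))
bad = validate_test_case(make_case([0, 3, 2, 1, 0, 1]))
print(good, bad)
```

Only `labels_1m` appears in the documented format, so the `labels_{timeframe}` key for other timeframes is an assumption.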
Summary
The ANNOTATE system generates production-ready training data with:
- ±5 minutes of context around each signal
- Training labels for each timestamp
- Negative examples (where NOT to signal)
- Positive examples (where TO signal)
- All 4 timeframes (1s, 1m, 1h, 1d)
- Complete market state (OHLCV data)
This enables models to learn:
- Precise timing of entry/exit signals
- When NOT to trade (avoiding false positives)
- Context awareness (what leads to signals)
- Realistic scenarios (including market noise)
Intended result: better-trained models with higher precision and fewer false signals. 🎯