Dobromir Popov b8f54e61fa remove emojis from console

2025-10-25 16:35:08 +03:00

ANNOTATE - Training Data Format

🎯 Overview

The ANNOTATE system generates training data that includes ±5 minutes of market data around each trade signal. This allows models to learn:

WHERE to generate signals (at entry/exit points)
WHERE NOT to generate signals (before entry, after exit)
Context around the signal (what led to the trade)

📦 Test Case Structure

Complete Format

```json
{
  "test_case_id": "annotation_uuid",
  "symbol": "ETH/USDT",
  "timestamp": "2024-01-15T10:30:00Z",
  "action": "BUY",

  "market_state": {
    "ohlcv_1s": {
      "timestamps": [...],  // ±5 minutes of 1s candles (~600 candles)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1m": {
      "timestamps": [...],  // ±5 minutes of 1m candles (~10 candles)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1h": {
      "timestamps": [...],  // ±5 minutes of 1h candles (usually 1 candle)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1d": {
      "timestamps": [...],  // ±5 minutes of 1d candles (usually 1 candle)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },

    "training_labels": {
      "labels_1m": [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3],  // Label for each 1m candle (10:25 to 10:35)
      "direction": "LONG",
      "entry_timestamp": "2024-01-15T10:30:00",
      "exit_timestamp": "2024-01-15T10:35:00"
    }
  },

  "expected_outcome": {
    "direction": "LONG",
    "profit_loss_pct": 2.5,
    "entry_price": 2400.50,
    "exit_price": 2460.75,
    "holding_period_seconds": 300
  },

  "annotation_metadata": {
    "annotator": "manual",
    "confidence": 1.0,
    "notes": "",
    "created_at": "2024-01-15T11:00:00Z",
    "timeframe": "1m"
  }
}
```
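Each `ohlcv_*` entry stores candles column by column: one list per field, all aligned index-by-index with `timestamps`. A minimal sketch of producing that layout from row-oriented candles, assuming a hypothetical `to_ohlcv_columns` helper and ISO-8601 timestamp strings (the format above does not specify how timestamps are encoded). The prices are illustrative values only:

```python
from datetime import datetime, timezone

# Hypothetical helper: convert row-oriented candles
# (timestamp, open, high, low, close, volume) into the column-oriented
# dict used for each "ohlcv_*" key.
def to_ohlcv_columns(candles):
    fields = ("open", "high", "low", "close", "volume")
    # Assumed encoding: ISO-8601 strings for timestamps
    out = {"timestamps": [ts.isoformat() for ts, *_ in candles]}
    for i, name in enumerate(fields, start=1):
        out[name] = [row[i] for row in candles]
    return out

# Illustrative candles, not real market data
candles = [
    (datetime(2024, 1, 15, 10, 29, tzinfo=timezone.utc), 2398.0, 2401.0, 2397.5, 2400.5, 12.3),
    (datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc), 2400.5, 2405.0, 2400.0, 2404.0, 20.1),
]
ohlcv_1m = to_ohlcv_columns(candles)
print(ohlcv_1m["close"])  # [2400.5, 2404.0]
```

Keeping every column the same length as `timestamps` lets a consumer index all fields with the same candle position.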

🏷️ Training Labels

Label System

Each timestamp in the ±5 minute window is labeled:

Label	Meaning	Description
0	NO SIGNAL	Before entry or after exit - model should NOT signal
1	ENTRY SIGNAL	At entry time - model SHOULD signal BUY/SELL
2	HOLD	Between entry and exit - model should maintain position
3	EXIT SIGNAL	At exit time - model SHOULD signal close position

Example Timeline

Time:    10:25  10:26  10:27  10:28  10:29  10:30  10:31  10:32  10:33  10:34  10:35  10:36  10:37
Label:     0      0      0      0      0      1      2      2      2      2      3      0      0
Action:   NO     NO     NO     NO     NO    ENTRY  HOLD   HOLD   HOLD   HOLD   EXIT    NO     NO
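Each label value corresponds to exactly one action in the table. A small sketch that decodes the example timeline's Label row back into its Action row. The `TrainingLabel` enum is a hypothetical name used for illustration, not part of the ANNOTATE code:

```python
from enum import IntEnum

class TrainingLabel(IntEnum):
    NO_SIGNAL = 0  # before entry or after exit
    ENTRY = 1      # open the position (BUY/SELL)
    HOLD = 2       # keep the position open
    EXIT = 3       # close the position

# Label row from the example timeline (10:25 ... 10:37)
timeline = [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 0, 0]
actions = [TrainingLabel(v).name for v in timeline]
print(actions[5], actions[10])  # ENTRY EXIT
```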

Why This Matters

Negative Examples: Model learns NOT to signal at random times
Context: Model sees what happens before/after the signal
Precision: Model learns exact timing, not just "buy somewhere"

📊 Data Window

Time Window: ±5 Minutes

Entry Time: 10:30:00
Window Start: 10:25:00 (5 minutes before)
Window End: 10:35:00 (5 minutes after)

Candle Counts by Timeframe

Timeframe	Candles in ±5min	Purpose
1s	~600 candles	Micro-structure, order flow
1m	~10 candles	Short-term patterns
1h	~1 candle	Trend context
1d	~1 candle	Market regime
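The "~" in the table comes from the window edges. If every candle that overlaps the inclusive window 10:25:00-10:35:00 is counted, the window contains 601 one-second candles and 11 one-minute candles. A 1h or 1d window usually sits inside a single candle but spans two when it crosses an hour or day boundary. For example, an entry at 10:58 touches both the 10:00 and 11:00 hourly candles. Below is a sketch of that arithmetic. It assumes candles are aligned to multiples of their length since the Unix epoch, and `candles_in_window` is a hypothetical helper:

```python
from datetime import datetime, timedelta

def candles_in_window(start, end, step):
    """Count candles of length `step` overlapping [start, end], assuming
    candles are aligned to multiples of `step` since the Unix epoch."""
    epoch = datetime(1970, 1, 1)
    first = (start - epoch) // step  # index of the candle containing start
    last = (end - epoch) // step     # index of the candle containing end
    return last - first + 1

entry = datetime(2024, 1, 15, 10, 30)
start, end = entry - timedelta(minutes=5), entry + timedelta(minutes=5)
steps = {"1s": timedelta(seconds=1), "1m": timedelta(minutes=1),
         "1h": timedelta(hours=1), "1d": timedelta(days=1)}
for name, step in steps.items():
    print(name, candles_in_window(start, end, step))
# 1s 601
# 1m 11
# 1h 1
# 1d 1
```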

🎓 Training Strategy

Positive Examples (Signal Points)

Entry Point (Label 1): Model learns to recognize entry conditions
Exit Point (Label 3): Model learns to recognize exit conditions

Negative Examples (Non-Signal Points)

Before Entry (Label 0): Model learns NOT to signal too early
After Exit (Label 0): Model learns NOT to signal too late
During Hold (Label 2): Model learns to maintain position

Balanced Training

For each annotation:

1 entry signal (Label 1)
1 exit signal (Label 3)
~3-5 hold periods (Label 2)
~5-8 no-signal periods (Label 0)

This gives every annotation the same label mix, in which the model learns:

When TO act (entry and exit: 2 of ~10-15 samples, roughly 13-20%)
When NOT to act (hold and no-signal: roughly 80-87%)
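The act/no-act split follows directly from those counts. Two signal candles out of 10-15 per annotation gives 2/15 ≈ 13% to 2/10 = 20%. A sketch that applies this to the example timeline (2 signal candles out of 13), using a hypothetical `act_fraction` helper:

```python
from collections import Counter

def act_fraction(labels):
    """Share of samples where the model should act (ENTRY=1 or EXIT=3)."""
    counts = Counter(labels)
    return (counts[1] + counts[3]) / len(labels)

# Label row from the example timeline (10:25 ... 10:37)
timeline = [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 0, 0]
print(f"{act_fraction(timeline):.1%}")  # 15.4%
```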

🔧 Implementation Details

Data Fetching

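```python
from datetime import timedelta
import pandas as pd

# Candle length per timeframe (used to keep candles that overlap the window)
TIMEFRAME_DELTA = {'1s': timedelta(seconds=1), '1m': timedelta(minutes=1),
                   '1h': timedelta(hours=1), '1d': timedelta(days=1)}

# Get ±5 minutes around entry (the stored timestamp may be an ISO string;
# entry_time and df.index must both be tz-aware or both tz-naive)
entry_time = pd.Timestamp(annotation.entry['timestamp'])
start_time = entry_time - timedelta(minutes=5)
end_time = entry_time + timedelta(minutes=5)

# Fetch data for window. limit=1000 caps how many candles come back; if these
# are the most recent candles, an older annotation may not be covered, so
# check that df actually reaches back to start_time.
df = data_provider.get_historical_data(
    symbol=symbol,
    timeframe=timeframe,
    limit=1000
)

# Filter to window, assuming df.index holds candle open times. Keeping every
# candle that overlaps the window retains the 1h/1d candle that opened before
# start_time, which a plain df.index >= start_time filter would drop.
candle_len = TIMEFRAME_DELTA[timeframe]
df_window = df[(df.index + candle_len > start_time) & (df.index <= end_time)]
```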

Label Generation

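```python
from datetime import datetime, timedelta

def label_for(ts: datetime, entry: datetime, exit_: datetime,
              tolerance: timedelta = timedelta(0)) -> int:
    """Return the training label for one candle timestamp.
    `tolerance` stands in for the near_entry/near_exit checks; how close
    counts as "near" is an assumption (exact match by default)."""
    if abs(ts - entry) <= tolerance:
        return 1  # ENTRY SIGNAL
    if abs(ts - exit_) <= tolerance:
        return 3  # EXIT SIGNAL
    if entry < ts < exit_:
        return 2  # HOLD
    return 0  # NO SIGNAL

labels = [label_for(ts, entry_time, exit_time) for ts in timestamps]
```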

📈 Model Training Usage

CNN Training

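```python
import torch
import torch.nn.functional as F

# Input: OHLCV data for ±5 minutes
# Output: logits over labels [0, 1, 2, 3] (softmax gives the probabilities)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer/lr

for timestamp, label in zip(timestamps, labels):
    features = extract_features(ohlcv_data, timestamp)  # batch of one sample
    logits = model(features)                             # shape (1, 4)
    loss = F.cross_entropy(logits, torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```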

DQN Training

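```python
# State: current market state
# Action: BUY/SELL/HOLD (HOLD = do nothing / keep the current position)
# Reward: based on label and outcome (this sketch uses the label only)
BUY, SELL, HOLD = 0, 1, 2  # assumed action encoding

# Correct action per label (mapping for labels 2 and 3 is assumed);
# a non-LONG trade swaps the entry and exit actions
direction = training_labels["direction"]  # e.g. "LONG"
entry_action = BUY if direction == "LONG" else SELL
exit_action = SELL if direction == "LONG" else BUY
expected_action = {0: HOLD, 1: entry_action, 2: HOLD, 3: exit_action}

for timestamp, label in zip(timestamps, labels):
    state = get_state(ohlcv_data, timestamp)
    action = agent.select_action(state)
    reward = 1.0 if action == expected_action[label] else -1.0
```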

🎯 Benefits

1. Precision Training

Model learns exact timing of signals
Not just "buy somewhere in this range"
Reduces false positives

2. Negative Examples

Model learns when NOT to trade
Critical for avoiding bad signals
Improves precision/recall balance

3. Context Awareness

Model sees what led to the signal
Understands market conditions before entry
Better pattern recognition

4. Realistic Scenarios

Includes normal market noise
Not just "perfect" entry points
Model learns to filter noise

📊 Example Use Case

Scenario: Breakout Trade

Annotation:

Entry: 10:30:00 @ $2400 (breakout)
Exit: 10:35:00 @ $2460 (+2.5%)

Training Data Generated:

10:25 - 10:29: NO SIGNAL (consolidation before breakout)
10:30:        ENTRY SIGNAL (breakout confirmed)
10:31 - 10:34: HOLD (price moving up)
10:35:        EXIT SIGNAL (target reached)
10:36 - 10:40: NO SIGNAL (after exit)

Model Learns:

Don't signal during consolidation
Signal at breakout confirmation
Hold during profitable move
Exit at target
Don't signal after exit
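The +2.5% in the annotation is the plain LONG return, (exit - entry) / entry. A sketch of that arithmetic, together with the holding period stored in `expected_outcome`. The formula is an assumption, because the format does not spell it out, and a SHORT trade would flip the sign. Using the JSON example's prices (2400.50 to 2460.75), the same formula gives about 2.51%, which rounds to the 2.5 shown there.

```python
from datetime import datetime

entry_price, exit_price = 2400.0, 2460.0
entry_time = datetime(2024, 1, 15, 10, 30)
exit_time = datetime(2024, 1, 15, 10, 35)

# Assumed LONG-trade formula; negate for a SHORT trade
profit_loss_pct = (exit_price - entry_price) / entry_price * 100
holding_period_seconds = (exit_time - entry_time).total_seconds()
print(profit_loss_pct, holding_period_seconds)  # 2.5 300.0
```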

🔍 Verification

Check Test Case Quality

```python
import json

# Load test case
with open('test_case.json') as f:
    tc = json.load(f)

# Verify data completeness
assert 'market_state' in tc
assert 'ohlcv_1m' in tc['market_state']
assert 'training_labels' in tc['market_state']

# Check label distribution
labels = tc['market_state']['training_labels']['labels_1m']
print(f"NO_SIGNAL: {labels.count(0)}")
print(f"ENTRY: {labels.count(1)}")
print(f"HOLD: {labels.count(2)}")
print(f"EXIT: {labels.count(3)}")
```
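The checks above only confirm that the keys exist. Below is a sketch of a stricter check that relies on two assumptions from the format description: there is one label per 1m candle, and each annotation has exactly one entry and one exit. `check_test_case` is a hypothetical helper. The exit rule may need relaxing if an exit can fall outside the window.

```python
def check_test_case(tc):
    """Extra sanity checks on one test case; returns a list of problems."""
    problems = []
    state = tc["market_state"]
    # Every OHLCV column must line up with its timestamps
    for key, block in state.items():
        if not key.startswith("ohlcv_"):
            continue
        n = len(block["timestamps"])
        for col in ("open", "high", "low", "close", "volume"):
            if len(block[col]) != n:
                problems.append(f"{key}.{col} has {len(block[col])} values, expected {n}")
    # One label per 1m candle, with exactly one ENTRY and one EXIT
    labels = state["training_labels"]["labels_1m"]
    if len(labels) != len(state["ohlcv_1m"]["timestamps"]):
        problems.append("labels_1m does not match the number of 1m candles")
    if labels.count(1) != 1 or labels.count(3) != 1:
        problems.append("expected exactly one ENTRY (1) and one EXIT (3) label")
    return problems
```

Run it on the loaded `tc`. An empty list means the test case passed every check.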

Summary

The ANNOTATE system generates production-ready training data with:

±5 minutes of context around each signal
Training labels for each timestamp
Negative examples (where NOT to signal)
Positive examples (where TO signal)
All 4 timeframes (1s, 1m, 1h, 1d)
Complete market state (OHLCV data)

This enables models to learn:

Precise timing of entry/exit signals
When NOT to trade (avoiding false positives)
Context awareness (what leads to signals)
Realistic scenarios (including market noise)

Result: Better trained models with higher precision and fewer false signals! 🎯
