# ANNOTATE - Training Data Format

## 🎯 Overview

The ANNOTATE system generates training data that includes **±5 minutes of market data** around each trade signal. This allows models to learn:

- **WHERE to generate signals** (at entry/exit points)
- **WHERE NOT to generate signals** (before entry, after exit)
- **Context around the signal** (what led to the trade)

---

## 📦 Test Case Structure

### Complete Format

```json
{
  "test_case_id": "annotation_uuid",
  "symbol": "ETH/USDT",
  "timestamp": "2024-01-15T10:30:00Z",
  "action": "BUY",
  "market_state": {
    "ohlcv_1s": {
      "timestamps": [...],  // ±5 minutes of 1s candles (~600 candles)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1m": {
      "timestamps": [...],  // ±5 minutes of 1m candles (~10 candles)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1h": {
      "timestamps": [...],  // ±5 minutes of 1h candles (usually 1 candle)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1d": {
      "timestamps": [...],  // ±5 minutes of 1d candles (usually 1 candle)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "training_labels": {
      "labels_1m": [0, 0, 0, 1, 2, 2, 3, 0, 0, 0],  // Label for each 1m candle
      "direction": "LONG",
      "entry_timestamp": "2024-01-15T10:30:00",
      "exit_timestamp": "2024-01-15T10:35:00"
    }
  },
  "expected_outcome": {
    "direction": "LONG",
    "profit_loss_pct": 2.5,
    "entry_price": 2400.50,
    "exit_price": 2460.75,
    "holding_period_seconds": 300
  },
  "annotation_metadata": {
    "annotator": "manual",
    "confidence": 1.0,
    "notes": "",
    "created_at": "2024-01-15T11:00:00Z",
    "timeframe": "1m"
  }
}
```

---

## 🏷️ Training Labels

### Label System

Each timestamp in the ±5-minute window is assigned one label:

| Label | Meaning | Description |
|-------|---------|-------------|
| **0** | NO SIGNAL | Before entry or after exit - model should NOT signal |
| **1** | ENTRY SIGNAL | At entry time - model SHOULD signal BUY/SELL |
| **2** | HOLD | Between entry and exit - model should maintain the position |
| **3** | EXIT SIGNAL | At exit time - model SHOULD signal to close the position |

### Example Timeline

```
Time:    10:25  10:26  10:27  10:28  10:29  10:30  10:31  10:32  10:33  10:34  10:35  10:36  10:37
Label:   0      0      0      0      0      1      2      2      2      2      3      0      0
Action:  NO     NO     NO     NO     NO     ENTRY  HOLD   HOLD   HOLD   HOLD   EXIT   NO     NO
```

### Why This Matters

- **Negative Examples**: Model learns NOT to signal at random times
- **Context**: Model sees what happens before/after the signal
- **Precision**: Model learns exact timing, not just "buy somewhere"

---

## 📊 Data Window

### Time Window: ±5 Minutes

- **Entry Time**: 10:30:00
- **Window Start**: 10:25:00 (5 minutes before entry)
- **Window End**: 10:35:00 (5 minutes after entry)

### Candle Counts by Timeframe

| Timeframe | Candles in ±5 min | Purpose |
|-----------|-------------------|---------|
| **1s** | ~600 candles | Micro-structure, order flow |
| **1m** | ~10 candles | Short-term patterns |
| **1h** | ~1 candle | Trend context |
| **1d** | ~1 candle | Market regime |

---

## 🎓 Training Strategy

### Positive Examples (Signal Points)

- **Entry Point** (Label 1): Model learns to recognize entry conditions
- **Exit Point** (Label 3): Model learns to recognize exit conditions

### Negative Examples (Non-Signal Points)

- **Before Entry** (Label 0): Model learns NOT to signal too early
- **After Exit** (Label 0): Model learns NOT to signal too late
- **During Hold** (Label 2): Model learns to maintain the position

### Balanced Training

For each annotation:

- **1 entry signal** (Label 1)
- **1 exit signal** (Label 3)
- **~3-5 hold periods** (Label 2)
- **~5-8 no-signal periods** (Label 0)

This pairs every action point with several non-action points, so the model learns:

- When TO act (entry/exit: about 13-20% of labels with the counts above)
- When NOT to act (hold/no signal: about 80-87% of labels)

---

## 🔧 Implementation Details

### Data Fetching

```python
from datetime import timedelta

import pandas as pd

# Get ±5 minutes around entry
entry_time = pd.Timestamp(annotation.entry['timestamp'])
start_time = entry_time - timedelta(minutes=5)
end_time = entry_time + timedelta(minutes=5)

# Fetch data for the window (data_provider is the project's own data source)
df = data_provider.get_historical_data(
    symbol=symbol,
    timeframe=timeframe,
    limit=1000
)

# Filter to the window (assumes df has a DatetimeIndex in the same timezone as entry_time).
# As written, this keeps only candles whose timestamp falls inside the window.
df_window = df[(df.index >= start_time) & (df.index <= end_time)]
```

### Label Generation

```python
from datetime import timedelta

def generate_labels(timestamps, entry_time, exit_time, tolerance=timedelta(seconds=30)):
    # Hypothetical helper: "near" entry/exit is assumed to mean within `tolerance`
    # (e.g. half the candle interval, so exactly one 1m candle gets label 1 or 3).
    labels = []
    for timestamp in timestamps:
        if abs(timestamp - entry_time) <= tolerance:
            label = 1  # ENTRY SIGNAL
        elif abs(timestamp - exit_time) <= tolerance:
            label = 3  # EXIT SIGNAL
        elif entry_time < timestamp < exit_time:
            label = 2  # HOLD
        else:
            label = 0  # NO SIGNAL
        labels.append(label)
    return labels
```

---

## 📈 Model Training Usage

### CNN Training

```python
import torch
import torch.nn.functional as F

# Input: OHLCV data for ±5 minutes
# Output: logits over labels [0, 1, 2, 3] (softmax gives the probability distribution)
# model, optimizer, extract_features and ohlcv_data are assumed to be defined elsewhere
for timestamp, label in zip(timestamps, labels):
    features = extract_features(ohlcv_data, timestamp)
    logits = model(features)  # expected shape: (1, 4)
    loss = F.cross_entropy(logits, torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

### DQN Training

```python
# State: Current market state
# Action: BUY/SELL/HOLD
# Reward: Based on label and outcome
# agent, get_state, ohlcv_data, direction and BUY/SELL/HOLD are assumed to exist
# Assumed mapping: a LONG trade opens with BUY and closes with SELL; SHORT is the reverse
entry_action = BUY if direction == "LONG" else SELL
exit_action = SELL if direction == "LONG" else BUY

for timestamp, label in zip(timestamps, labels):
    state = get_state(ohlcv_data, timestamp)
    action = agent.select_action(state)

    if label == 1:    # Should signal entry
        reward = +1 if action == entry_action else -1
    elif label == 3:  # Should signal exit
        reward = +1 if action == exit_action else -1
    else:             # Label 0 (no signal) or 2 (hold): should NOT trade
        reward = +1 if action == HOLD else -1
    # reward is then fed to the agent's update step (not shown)
```
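The `df.index >= start_time` filter in the Data Fetching snippet keeps only candles whose timestamp lies inside the window. If timestamps are candle open times, a 1h candle that opened at 10:00 or a 1d candle that opened at midnight falls outside 10:25-10:35, even though it covers the entry. The sketch below selects candles whose span *overlaps* the window, which is one way to get the "~1 candle" rows in the Candle Counts table. It assumes candles are aligned to the Unix epoch in UTC and indexed by open time. `INTERVALS` and `candle_opens_overlapping` are hypothetical names.

```python
from datetime import datetime, timedelta

# Candle intervals for the four timeframes in the test case format.
INTERVALS = {
    "1s": timedelta(seconds=1),
    "1m": timedelta(minutes=1),
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
}

EPOCH = datetime(1970, 1, 1)  # candles assumed aligned to the Unix epoch (UTC)


def candle_opens_overlapping(start, end, interval):
    """Open times of candles whose span [open, open + interval) overlaps [start, end]."""
    # Floor `start` to the open time of the candle that contains it.
    t = EPOCH + ((start - EPOCH) // interval) * interval
    opens = []
    while t <= end:
        opens.append(t)
        t += interval
    return opens


entry = datetime(2024, 1, 15, 10, 30)
start, end = entry - timedelta(minutes=5), entry + timedelta(minutes=5)
counts = {tf: len(candle_opens_overlapping(start, end, iv)) for tf, iv in INTERVALS.items()}
print(counts)  # {'1s': 601, '1m': 11, '1h': 1, '1d': 1}
```

These counts match the approximate figures in the Candle Counts table. The extra 1s and 1m candle comes from including both window endpoints. With a pandas `DatetimeIndex` of open times, the equivalent filter is `df[(df.index <= end_time) & (df.index + interval > start_time)]`.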
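The Label System table implies a fixed shape for each 1m label sequence: no-signal labels, one entry, a run of holds, one exit, then no-signal labels again. Checking that shape before labels reach either training loop can catch annotations whose entry and exit are swapped or whose hold run has gaps. Below is a minimal sketch, assuming exactly one entry (1) and one exit (3) per annotation, as in `labels_1m`. `is_valid_label_sequence` is a hypothetical helper, not part of ANNOTATE.

```python
def is_valid_label_sequence(labels):
    """Check the 0* 1 2* 3 0* shape described in the Label System table.

    Assumes exactly one entry (1) and one exit (3), with HOLD (2) only between them.
    """
    if labels.count(1) != 1 or labels.count(3) != 1:
        return False
    entry, exit_ = labels.index(1), labels.index(3)
    if entry >= exit_:
        return False  # exit must come after entry
    before = labels[:entry]
    between = labels[entry + 1:exit_]
    after = labels[exit_ + 1:]
    return (
        all(x == 0 for x in before)
        and all(x == 2 for x in between)
        and all(x == 0 for x in after)
    )


timeline = [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 0, 0]  # Example Timeline, 10:25-10:37
print(is_valid_label_sequence(timeline))            # True
print(is_valid_label_sequence([0, 1, 0, 2, 3, 0]))  # False: gap in the hold run
```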
---

## 🎯 Benefits

### 1. Precision Training

- Model learns **exact timing** of signals
- Not just "buy somewhere in this range"
- Reduces false positives

### 2. Negative Examples

- Model learns when **NOT** to trade
- Critical for avoiding bad signals
- Improves precision/recall balance

### 3. Context Awareness

- Model sees **what led to the signal**
- Understands market conditions before entry
- Better pattern recognition

### 4. Realistic Scenarios

- Includes normal market noise
- Not just "perfect" entry points
- Model learns to filter noise

---

## 📊 Example Use Case

### Scenario: Breakout Trade

**Annotation:**

- Entry: 10:30:00 @ $2400 (breakout)
- Exit: 10:35:00 @ $2460 (+2.5%)

**Training Data Generated:**

```
10:25 - 10:29: NO SIGNAL (consolidation before breakout)
10:30:         ENTRY SIGNAL (breakout confirmed)
10:31 - 10:34: HOLD (price moving up)
10:35:         EXIT SIGNAL (target reached)
10:36 - 10:40: NO SIGNAL (after exit)
```

**Model Learns:**

- Don't signal during consolidation
- Signal at breakout confirmation
- Hold during the profitable move
- Exit at target
- Don't signal after exit

---

## 🔍 Verification

### Check Test Case Quality

```python
import json

# Load test case
with open('test_case.json') as f:
    tc = json.load(f)

# Verify data completeness
assert 'market_state' in tc
assert 'ohlcv_1m' in tc['market_state']
assert 'training_labels' in tc['market_state']

# Check label distribution
labels = tc['market_state']['training_labels']['labels_1m']
print(f"NO_SIGNAL: {labels.count(0)}")
print(f"ENTRY: {labels.count(1)}")
print(f"HOLD: {labels.count(2)}")
print(f"EXIT: {labels.count(3)}")
```

---

## Summary

The ANNOTATE system generates **production-ready training data** with:

- **±5 minutes of context** around each signal
- **Training labels** for each timestamp
- **Negative examples** (where NOT to signal)
- **Positive examples** (where TO signal)
- **All 4 timeframes** (1s, 1m, 1h, 1d)
- **Complete market state** (OHLCV data)

This enables models to learn:

- **Precise timing** of entry/exit signals
- **When NOT to trade** (avoiding false positives)
- **Context awareness** (what leads to signals)
- **Realistic scenarios** (including market noise)

**Intended result**: Better-trained models with higher precision and fewer false signals! 🎯