# ANNOTATE - Training Data Format

## 🎯 Overview

The ANNOTATE system generates training data that includes **±5 minutes of market data** around each trade signal. This allows models to learn:
- **WHERE to generate signals** (at entry/exit points)
- **WHERE NOT to generate signals** (before entry, after exit)
- **Context around the signal** (what led to the trade)

---

## 📦 Test Case Structure

### Complete Format
```json
{
  "test_case_id": "annotation_uuid",
  "symbol": "ETH/USDT",
  "timestamp": "2024-01-15T10:30:00Z",
  "action": "BUY",

  "market_state": {
    "ohlcv_1s": {
      "timestamps": [...],  // ±5 minutes of 1s candles (~600 candles)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1m": {
      "timestamps": [...],  // ±5 minutes of 1m candles (~10 candles)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1h": {
      "timestamps": [...],  // ±5 minutes of 1h candles (usually 1 candle)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1d": {
      "timestamps": [...],  // ±5 minutes of 1d candles (usually 1 candle)
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },

    "training_labels": {
      "labels_1m": [0, 0, 0, 1, 2, 2, 3, 0, 0, 0],  // Label for each 1m candle
      "direction": "LONG",
      "entry_timestamp": "2024-01-15T10:30:00",
      "exit_timestamp": "2024-01-15T10:35:00"
    }
  },

  "expected_outcome": {
    "direction": "LONG",
    "profit_loss_pct": 2.5,
    "entry_price": 2400.50,
    "exit_price": 2460.75,
    "holding_period_seconds": 300
  },

  "annotation_metadata": {
    "annotator": "manual",
    "confidence": 1.0,
    "notes": "",
    "created_at": "2024-01-15T11:00:00Z",
    "timeframe": "1m"
  }
}
```

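
Each `ohlcv_*` block stores candles column-wise, as six parallel arrays. The following is a minimal sketch of turning one block into per-candle rows, assuming all six arrays are present and of equal length. The helper name `ohlcv_rows` and the sample values are hypothetical and not part of ANNOTATE.

```python
from typing import Dict, List

OHLCV_KEYS = ("timestamps", "open", "high", "low", "close", "volume")


def ohlcv_rows(block: Dict[str, list]) -> List[dict]:
    """Convert a column-wise OHLCV block into a list of per-candle dicts.

    Raises ValueError if a column is missing or the columns differ in length.
    """
    missing = [key for key in OHLCV_KEYS if key not in block]
    if missing:
        raise ValueError(f"missing OHLCV columns: {missing}")
    lengths = {len(block[key]) for key in OHLCV_KEYS}
    if len(lengths) > 1:
        raise ValueError(f"OHLCV columns differ in length: {sorted(lengths)}")
    columns = [block[key] for key in OHLCV_KEYS]
    return [dict(zip(OHLCV_KEYS, values)) for values in zip(*columns)]


# Illustrative values only, not real market data.
example_block = {
    "timestamps": ["2024-01-15T10:29:00Z", "2024-01-15T10:30:00Z"],
    "open": [2399.0, 2400.5],
    "high": [2401.0, 2405.0],
    "low": [2398.5, 2400.0],
    "close": [2400.5, 2404.0],
    "volume": [12.3, 45.6],
}
rows = ohlcv_rows(example_block)
print(rows[1]["close"])
```
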
---

## 🏷️ Training Labels

### Label System
Each timestamp in the ±5 minute window is labeled:

| Label | Meaning | Description |
|-------|---------|-------------|
| **0** | NO SIGNAL | Before entry or after exit - model should NOT signal |
| **1** | ENTRY SIGNAL | At entry time - model SHOULD signal BUY/SELL |
| **2** | HOLD | Between entry and exit - model should maintain position |
| **3** | EXIT SIGNAL | At exit time - model SHOULD signal close position |

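
In code, the four integer labels are easier to read as named constants. This is a small sketch using Python's `IntEnum`; the class name `SignalLabel` and the helper `describe` are hypothetical, chosen here for illustration.

```python
from enum import IntEnum


class SignalLabel(IntEnum):
    """Integer codes from the label table."""
    NO_SIGNAL = 0
    ENTRY = 1
    HOLD = 2
    EXIT = 3


def describe(labels):
    """Map raw integer labels to their names."""
    return [SignalLabel(value).name for value in labels]


names = describe([0, 0, 0, 1, 2, 2, 3, 0, 0, 0])
print(names)
```

Because `IntEnum` members compare equal to plain integers, existing label lists keep working unchanged.
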
### Example Timeline
```
Time:   10:25  10:26  10:27  10:28  10:29  10:30  10:31  10:32  10:33  10:34  10:35  10:36  10:37
Label:    0      0      0      0      0      1      2      2      2      2      3      0      0
Action:  NO     NO     NO     NO     NO    ENTRY  HOLD   HOLD   HOLD   HOLD   EXIT    NO     NO
```

With a window centered on a 10:30 entry, the data ends at 10:35; the 10:36 and 10:37 columns only illustrate the label that applies after an exit.

### Why This Matters
- **Negative Examples**: Model learns NOT to signal at random times
- **Context**: Model sees what happens before/after the signal
- **Precision**: Model learns exact timing, not just "buy somewhere"

---

## 📊 Data Window

### Time Window: ±5 Minutes

- **Entry Time**: 10:30:00
- **Window Start**: 10:25:00 (5 minutes before)
- **Window End**: 10:35:00 (5 minutes after)

### Candle Counts by Timeframe

| Timeframe | Candles in ±5min | Purpose |
|-----------|------------------|---------|
| **1s** | ~600 candles | Micro-structure, order flow |
| **1m** | ~10 candles | Short-term patterns |
| **1h** | ~1 candle | Trend context |
| **1d** | ~1 candle | Market regime |

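
The candle counts follow from dividing the 600-second window by the timeframe length. The sketch below counts candle open times inside an inclusive window, assuming candles open on multiples of the timeframe length since the Unix epoch (UTC); the function name `candles_in_window` is hypothetical. With inclusive bounds the 1s and 1m counts come out as 601 and 11, in line with the approximate figures in the table. Under the same assumption, a 1h or 1d candle is counted only when it opens inside the window, so a pipeline that wants the enclosing hourly or daily candle may need to look it up separately.

```python
from datetime import datetime, timedelta, timezone

TIMEFRAME_SECONDS = {"1s": 1, "1m": 60, "1h": 3600, "1d": 86400}


def candles_in_window(entry: datetime, timeframe: str,
                      half_window: timedelta = timedelta(minutes=5)) -> int:
    """Count candle open times t with entry - half_window <= t <= entry + half_window.

    Assumes candles open on multiples of the timeframe length since the epoch.
    """
    step = TIMEFRAME_SECONDS[timeframe]
    start = int((entry - half_window).timestamp())
    end = int((entry + half_window).timestamp())
    first = -(-start // step) * step  # round start up to the next candle open
    if first > end:
        return 0
    return (end - first) // step + 1


entry = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
counts = {tf: candles_in_window(entry, tf) for tf in TIMEFRAME_SECONDS}
print(counts)
```
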
---

## 🎓 Training Strategy

### Positive Examples (Signal Points)
- **Entry Point** (Label 1): Model learns to recognize entry conditions
- **Exit Point** (Label 3): Model learns to recognize exit conditions

### Negative Examples (Non-Signal Points)
- **Before Entry** (Label 0): Model learns NOT to signal too early
- **After Exit** (Label 0): Model learns NOT to signal too late
- **During Hold** (Label 2): Model learns to maintain position

### Balanced Training
For each annotation:
- **1 entry signal** (Label 1)
- **1 exit signal** (Label 3)
- **~3-5 hold periods** (Label 2)
- **~5-8 no-signal periods** (Label 0)

Signal candles are therefore the minority class. Across a window the model sees:
- When TO act (roughly 15-20% of candles)
- When NOT to act (roughly 80-85% of candles)

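
The label mix can be measured directly from a label list. The sketch below computes per-label shares and, as one common way to compensate for the rarity of entry and exit labels, inverse-frequency class weights. The weighting scheme is an illustrative choice, not something the ANNOTATE format prescribes, and both function names are hypothetical.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of candles carrying each label 0..3."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts.get(label, 0) / total for label in range(4)}


def inverse_frequency_weights(labels):
    """Weight each label that occurs by total / (num_present_labels * count)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * count) for label, count in counts.items()}


labels_1m = [0, 0, 0, 1, 2, 2, 3, 0, 0, 0]  # example from the test case format
shares = label_shares(labels_1m)
weights = inverse_frequency_weights(labels_1m)
print(shares)
print(weights)
```
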
---

## 🔧 Implementation Details

### Data Fetching
```python
from datetime import timedelta

import pandas as pd

# Get ±5 minutes around entry (to_datetime accepts both strings and datetimes)
entry_time = pd.to_datetime(annotation.entry['timestamp'])
start_time = entry_time - timedelta(minutes=5)
end_time = entry_time + timedelta(minutes=5)

# Fetch data; limit=1000 caps the number of candles returned, so whether they
# cover the window depends on the provider and on how old the annotation is
df = data_provider.get_historical_data(
    symbol=symbol,
    timeframe=timeframe,
    limit=1000
)

# Filter to window (assumes a DatetimeIndex with the same timezone handling as entry_time)
df_window = df[(df.index >= start_time) & (df.index <= end_time)]
```

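
The example test case mixes timestamps with a `Z` suffix (`timestamp`) and without one (`entry_timestamp`, `exit_timestamp`). In Python, comparing a timezone-aware datetime with a naive one raises `TypeError`, so it helps to normalize before computing the window. The sketch below uses only the standard library and assumes that a timestamp without an offset is meant as UTC; `parse_utc` and `window_bounds` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 string; treat a missing offset as UTC (an assumption)."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def window_bounds(entry: str, half_window: timedelta = timedelta(minutes=5)):
    """Return (start, end) of the window centered on the entry timestamp."""
    center = parse_utc(entry)
    return center - half_window, center + half_window


start, end = window_bounds("2024-01-15T10:30:00Z")
exit_time = parse_utc("2024-01-15T10:35:00")  # no offset in the example JSON
print(start.isoformat(), end.isoformat(), start <= exit_time <= end)
```

Replacing `Z` with `+00:00` keeps the parser compatible with Python versions older than 3.11, where `fromisoformat` does not accept the `Z` suffix.
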
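### Label Generation
```python
from datetime import timedelta

# "Near" is made concrete with a tolerance; 30 s (half a 1m candle) is an assumed default.
def generate_labels(timestamps, entry_time, exit_time, tolerance=timedelta(seconds=30)):
    labels = []
    for timestamp in timestamps:
        if abs(timestamp - entry_time) <= tolerance:
            labels.append(1)  # ENTRY SIGNAL
        elif abs(timestamp - exit_time) <= tolerance:
            labels.append(3)  # EXIT SIGNAL
        elif entry_time < timestamp < exit_time:
            labels.append(2)  # HOLD
        else:
            labels.append(0)  # NO SIGNAL
    return labels
```
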
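
As a worked example of the labeling rule on a 1m timeline, the sketch below labels the candle that contains the entry as ENTRY and the candle that contains the exit as EXIT, by flooring each timestamp to its candle open. This is one possible interpretation, assumed here for illustration; the function names and the second-level entry and exit times are hypothetical.

```python
from datetime import datetime, timedelta


def floor_to_candle(ts: datetime, step: timedelta) -> datetime:
    """Round a timestamp down to the open time of its candle."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // step) * step


def label_candles(candle_opens, entry, exit_, step=timedelta(minutes=1)):
    """Label each candle: 1 if it contains the entry, 3 the exit, 2 between, else 0."""
    entry_candle = floor_to_candle(entry, step)
    exit_candle = floor_to_candle(exit_, step)
    labels = []
    for candle_open in candle_opens:
        if candle_open == entry_candle:
            labels.append(1)  # ENTRY SIGNAL
        elif candle_open == exit_candle:
            labels.append(3)  # EXIT SIGNAL
        elif entry_candle < candle_open < exit_candle:
            labels.append(2)  # HOLD
        else:
            labels.append(0)  # NO SIGNAL
    return labels


first = datetime(2024, 1, 15, 10, 25)
candle_opens = [first + timedelta(minutes=i) for i in range(11)]  # 10:25 .. 10:35
labels = label_candles(
    candle_opens,
    entry=datetime(2024, 1, 15, 10, 30, 17),
    exit_=datetime(2024, 1, 15, 10, 35, 2),
)
print(labels)
```
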
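---

## 📈 Model Training Usage

### CNN Training
```python
import torch
import torch.nn.functional as F

# Input: OHLCV data for ±5 minutes
# Output: Scores (logits) over labels [0, 1, 2, 3]; softmax turns them into probabilities
# model, optimizer, extract_features and ohlcv_data are assumed to be defined elsewhere

for timestamp, label in zip(timestamps, labels):
    features = extract_features(ohlcv_data, timestamp)  # tensor with batch size 1
    prediction = model(features)                        # logits of shape (1, 4)
    loss = F.cross_entropy(prediction, torch.tensor([label]))
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()
    optimizer.step()       # apply the update
```
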
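
To make the loss concrete: for a single candle, cross-entropy is the negative log of the probability the model assigns to the true label. The following is a plain-Python sketch with no framework dependency; the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(softmax(logits)[label])


# True label is 1 (ENTRY) in both cases.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], label=1)
confident_loss = cross_entropy([0.0, 5.0, 0.0, 0.0], label=1)
print(uniform_loss, confident_loss)
```

A model that spreads probability evenly over the four labels pays a loss of ln 4; the loss shrinks toward zero as more probability goes to the correct label.
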
### DQN Training
```python
# State: Current market state
# Action: BUY/SELL/HOLD
# Reward: Based on label and outcome
# agent, get_state, ohlcv_data and the BUY/SELL/HOLD constants are assumed to be defined elsewhere

for timestamp, label in zip(timestamps, labels):
    state = get_state(ohlcv_data, timestamp)
    action = agent.select_action(state)

    if label == 1:    # Should signal entry (BUY shown here, as for a LONG annotation)
        reward = +1 if action == BUY else -1
    elif label == 0:  # Should NOT signal
        reward = +1 if action == HOLD else -1
    else:             # Labels 2 (HOLD) and 3 (EXIT) are not covered by this snippet
        reward = 0    # neutral placeholder so that `reward` is always defined
```

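
A reward rule that covers all four labels can be factored into a pure function. The mapping below is an assumption made for illustration, not taken from the ANNOTATE source: a LONG trade enters with BUY and exits with SELL, a SHORT trade does the reverse, and labels 0 and 2 both expect HOLD. The name `label_reward` and the string action constants are hypothetical.

```python
BUY, SELL, HOLD = "BUY", "SELL", "HOLD"


def label_reward(label: int, action: str, direction: str = "LONG") -> int:
    """Return +1 when the action matches what the label asks for, else -1.

    Assumed mapping: label 1 expects the entry action, label 3 the exit
    action, labels 0 and 2 expect HOLD.
    """
    entry_action, exit_action = (BUY, SELL) if direction == "LONG" else (SELL, BUY)
    expected = {0: HOLD, 1: entry_action, 2: HOLD, 3: exit_action}[label]
    return 1 if action == expected else -1


rewards = [label_reward(label, BUY) for label in (0, 1, 2, 3)]
print(rewards)
```
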
---

## 🎯 Benefits

### 1. Precision Training
- Model learns **exact timing** of signals
- Not just "buy somewhere in this range"
- Reduces false positives

### 2. Negative Examples
- Model learns when **NOT** to trade
- Critical for avoiding bad signals
- Improves precision/recall balance

### 3. Context Awareness
- Model sees **what led to the signal**
- Understands market conditions before entry
- Better pattern recognition

### 4. Realistic Scenarios
- Includes normal market noise
- Not just "perfect" entry points
- Model learns to filter noise

---

## 📊 Example Use Case

### Scenario: Breakout Trade

**Annotation:**
- Entry: 10:30:00 @ $2400 (breakout)
- Exit: 10:35:00 @ $2460 (+2.5%)

**Training Data Generated:**
```
10:25 - 10:29: NO SIGNAL (consolidation before breakout)
10:30:         ENTRY SIGNAL (breakout confirmed)
10:31 - 10:34: HOLD (price moving up)
10:35:         EXIT SIGNAL (target reached)
10:36 - 10:40: NO SIGNAL (after exit)
```

**Model Learns:**
- Don't signal during consolidation
- Signal at breakout confirmation
- Hold during profitable move
- Exit at target
- Don't signal after exit

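
The +2.5% figure and the 300-second holding period in this scenario follow from simple arithmetic. A short sketch, ignoring fees and slippage; the function name `profit_loss_pct` mirrors the JSON field but is otherwise hypothetical:

```python
def profit_loss_pct(entry_price: float, exit_price: float, direction: str = "LONG") -> float:
    """Percentage return of a trade, ignoring fees and slippage."""
    change = (exit_price - entry_price) / entry_price * 100.0
    return change if direction == "LONG" else -change


scenario_pnl = profit_loss_pct(2400.0, 2460.0)
# Seconds from 10:30:00 to 10:35:00 on the same day.
holding_period_seconds = (10 * 3600 + 35 * 60) - (10 * 3600 + 30 * 60)
print(round(scenario_pnl, 2), holding_period_seconds)
```
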
---

## 🔍 Verification

### Check Test Case Quality
```python
import json

# Load test case
with open('test_case.json') as f:
    tc = json.load(f)

# Verify data completeness
assert 'market_state' in tc
assert 'ohlcv_1m' in tc['market_state']
assert 'training_labels' in tc['market_state']

# Check label distribution
labels = tc['market_state']['training_labels']['labels_1m']
print(f"NO_SIGNAL: {labels.count(0)}")
print(f"ENTRY: {labels.count(1)}")
print(f"HOLD: {labels.count(2)}")
print(f"EXIT: {labels.count(3)}")
```

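
Beyond counting labels, the order of labels can be checked too. The sketch below assumes the whole trade fits inside the window, so that a valid sequence contains exactly one ENTRY followed by HOLD labels and exactly one EXIT; the function name `check_label_sequence` is hypothetical.

```python
def check_label_sequence(labels):
    """Return a list of problems found in a label sequence (empty if none)."""
    problems = []
    if labels.count(1) != 1:
        problems.append(f"expected exactly one ENTRY, found {labels.count(1)}")
    if labels.count(3) != 1:
        problems.append(f"expected exactly one EXIT, found {labels.count(3)}")
    if problems:
        return problems
    entry_idx, exit_idx = labels.index(1), labels.index(3)
    if entry_idx >= exit_idx:
        problems.append("ENTRY must come before EXIT")
        return problems
    if any(label != 2 for label in labels[entry_idx + 1:exit_idx]):
        problems.append("only HOLD is allowed between ENTRY and EXIT")
    outside = labels[:entry_idx] + labels[exit_idx + 1:]
    if any(label != 0 for label in outside):
        problems.append("only NO SIGNAL is allowed outside the trade")
    return problems


ok = check_label_sequence([0, 0, 0, 1, 2, 2, 3, 0, 0, 0])
bad = check_label_sequence([0, 1, 0, 2, 3, 0])
print(ok, bad)
```
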
---

## Summary

The ANNOTATE system generates **production-ready training data** with:

- **±5 minutes of context** around each signal
- **Training labels** for each timestamp
- **Negative examples** (where NOT to signal)
- **Positive examples** (where TO signal)
- **All 4 timeframes** (1s, 1m, 1h, 1d)
- **Complete market state** (OHLCV data)

This enables models to learn:
- **Precise timing** of entry/exit signals
- **When NOT to trade** (avoiding false positives)
- **Context awareness** (what leads to signals)
- **Realistic scenarios** (including market noise)

**Result**: Better trained models with higher precision and fewer false signals! 🎯