# ANNOTATE - Model Training & Inference Guide

## 🎯 Overview

This guide covers how to use the ANNOTATE system for:
1. **Generating Training Data** - From manual annotations
2. **Training Models** - Using annotated test cases
3. **Real-Time Inference** - Live model predictions with streaming data

---

## 📦 Test Case Generation

### Automatic Generation
When you save an annotation, a test case is **automatically generated** and saved to disk.

**Location**: `ANNOTATE/data/test_cases/annotation_<id>.json`

### What's Included
Each test case contains:
- **Market State** - OHLCV data for all 4 timeframes (1s, 1m, 1h, 1d; 100 candles each)
- **Entry/Exit Prices** - Exact prices from the annotation
- **Expected Outcome** - Direction (LONG/SHORT) and P&L percentage
- **Timestamp** - When the trade occurred
- **Action** - BUY or SELL signal
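As a rough illustration of how the P&L percentage relates to the entry and exit prices, here is a minimal sketch. The formula (percentage change relative to the entry price, sign flipped for SHORT) and the function name `profit_loss_pct` are assumptions for illustration; the guide does not state how ANNOTATE computes the value.

```python
def profit_loss_pct(entry_price: float, exit_price: float, direction: str) -> float:
    """Return the percentage P&L of a trade relative to its entry price (assumed formula)."""
    if entry_price <= 0:
        raise ValueError("entry_price must be positive")
    change = (exit_price - entry_price) / entry_price * 100.0
    if direction == "LONG":
        return change       # a LONG profits when price rises
    if direction == "SHORT":
        return -change      # a SHORT profits when price falls
    raise ValueError(f"unknown direction: {direction!r}")


long_pnl = profit_loss_pct(2400.50, 2460.75, "LONG")
short_pnl = profit_loss_pct(2400.50, 2460.75, "SHORT")
print(f"LONG: {long_pnl:.2f}%  SHORT: {short_pnl:.2f}%")
```

With the prices used in this guide's sample test case, the LONG result is about 2.51%, which is consistent with the rounded `2.5` shown there.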

### Test Case Format
```json
{
  "test_case_id": "annotation_uuid",
  "symbol": "ETH/USDT",
  "timestamp": "2024-01-15T10:30:00Z",
  "action": "BUY",
  "market_state": {
    "ohlcv_1s": {
      "timestamps": [...],  // 100 candles
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1m": {...},  // 100 candles
    "ohlcv_1h": {...},  // 100 candles
    "ohlcv_1d": {...}   // 100 candles
  },
  "expected_outcome": {
    "direction": "LONG",
    "profit_loss_pct": 2.5,
    "entry_price": 2400.50,
    "exit_price": 2460.75,
    "holding_period_seconds": 300
  }
}
```
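To sanity-check a test case against the format above, something like the following could be used. This is a sketch, not part of ANNOTATE: `validate_test_case` and the constants are hypothetical names, and the checks only cover the keys visible in the sample.

```python
import json

REQUIRED_KEYS = {"test_case_id", "symbol", "timestamp", "action",
                 "market_state", "expected_outcome"}
TIMEFRAMES = ("ohlcv_1s", "ohlcv_1m", "ohlcv_1h", "ohlcv_1d")
OHLCV_FIELDS = ("timestamps", "open", "high", "low", "close", "volume")


def validate_test_case(case: dict) -> list:
    """Return a list of problems found; an empty list means the case looks well-formed."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - case.keys())]
    market_state = case.get("market_state", {})
    for tf in TIMEFRAMES:
        candles = market_state.get(tf)
        if candles is None:
            problems.append(f"missing timeframe: {tf}")
            continue
        # Every OHLCV array in a timeframe should have the same number of candles.
        lengths = {len(candles.get(field, [])) for field in OHLCV_FIELDS}
        if len(lengths) != 1:
            problems.append(f"{tf}: OHLCV arrays differ in length")
    return problems


def _candles(n):
    return {field: [0.0] * n for field in OHLCV_FIELDS}


example = {
    "test_case_id": "annotation_uuid",
    "symbol": "ETH/USDT",
    "timestamp": "2024-01-15T10:30:00Z",
    "action": "BUY",
    "market_state": {tf: _candles(100) for tf in TIMEFRAMES},
    "expected_outcome": {"direction": "LONG", "profit_loss_pct": 2.5},
}
# Round-trip through JSON to mimic reading the file from disk.
case = json.loads(json.dumps(example))
print(validate_test_case(case))
```

An empty list means nothing obvious is wrong; a real file would be read with `json.load` instead of the in-memory round trip.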

---

## 🎓 Model Training

### Available Models
The system integrates with your existing models:
- **StandardizedCNN** - CNN model for pattern recognition
- **DQN** - Deep Q-Network for reinforcement learning
- **Transformer** - Transformer model for sequence analysis
- **COB** - Order book-based RL model

### Training Process

#### Step 1: Create Annotations
1. Mark profitable trades on historical data
2. Test cases are auto-generated and saved
3. Verify test cases exist in `ANNOTATE/data/test_cases/`

#### Step 2: Select Model
1. Open the training panel (right sidebar)
2. Select a model from the dropdown
3. Available models are loaded from the orchestrator

#### Step 3: Start Training
1. Click the **"Train Model"** button
2. System loads all test cases from disk
3. Training starts in a background thread
4. Progress is displayed in real time
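The background-thread pattern in Step 3 can be sketched as follows. This is only an illustration under assumptions: `TrainingJob` and the `train_one_epoch` callable are hypothetical names, not ANNOTATE's actual classes.

```python
import threading


class TrainingJob:
    """Runs a training callable in a background thread and exposes its progress."""

    def __init__(self, train_one_epoch, total_epochs=10):
        self.train_one_epoch = train_one_epoch  # hypothetical: callable(epoch) -> loss
        self.total_epochs = total_epochs
        self.status = "pending"
        self.current_epoch = 0
        self.loss = None
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.status = "running"
        self._thread.start()

    def wait(self, timeout=None):
        self._thread.join(timeout)

    def snapshot(self):
        """Thread-safe view of progress, suitable for polling from a UI."""
        with self._lock:
            return {"status": self.status, "epoch": self.current_epoch, "loss": self.loss}

    def _run(self):
        try:
            for epoch in range(1, self.total_epochs + 1):
                loss = self.train_one_epoch(epoch)
                with self._lock:
                    self.current_epoch, self.loss = epoch, loss
            with self._lock:
                self.status = "completed"
        except Exception:
            with self._lock:
                self.status = "failed"


# Dummy "training" whose loss shrinks each epoch.
job = TrainingJob(lambda epoch: 1.0 / epoch, total_epochs=10)
job.start()
job.wait()
print(job.snapshot())
```

Polling `snapshot()` while the thread runs is one way a UI could show the epoch, loss, and status values without blocking.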

#### Step 4: Monitor Progress
- **Current Epoch** - Shows training progress
- **Loss** - Training loss value
- **Status** - Running/Completed/Failed

### Training Details

**What Happens During Training:**
1. System loads all test cases from `ANNOTATE/data/test_cases/`
2. Prepares training data (market state → expected outcome)
3. Calls model's training method
4. Updates model weights based on annotations
5. Saves updated model checkpoint

**Training Parameters:**
- **Epochs**: 10 (configurable)
- **Batch Size**: Depends on model
- **Learning Rate**: Model-specific
- **Data**: All available test cases
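The first two steps under "What Happens During Training" (load all test cases, then map market state to expected outcome) might look roughly like this. The function names and the binary LONG=1 / SHORT=0 label are assumptions for illustration; each real model may encode its targets differently.

```python
import json
import tempfile
from pathlib import Path


def load_test_cases(directory):
    """Load every annotation_<id>.json file in `directory`, sorted by file name."""
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(Path(directory).glob("annotation_*.json"))]


def to_training_pair(case):
    """Map one test case to (features, label): market state in, expected direction out."""
    label = 1 if case["expected_outcome"]["direction"] == "LONG" else 0  # assumed encoding
    return case["market_state"], label


# Demo with a throwaway directory standing in for ANNOTATE/data/test_cases/
with tempfile.TemporaryDirectory() as tmp:
    for i, direction in enumerate(["LONG", "SHORT"]):
        case = {
            "test_case_id": f"annotation_{i}",
            "market_state": {"ohlcv_1m": {"close": [1.0, 2.0]}},
            "expected_outcome": {"direction": direction},
        }
        (Path(tmp) / f"annotation_{i}.json").write_text(json.dumps(case), encoding="utf-8")
    pairs = [to_training_pair(c) for c in load_test_cases(tmp)]

print([label for _, label in pairs])
```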

---

## Real-Time Inference

### Overview
Real-time inference mode runs your trained model on **live streaming data** from the DataProvider, generating predictions in real time.

### Starting Real-Time Inference

#### Step 1: Select Model
Choose the model you want to run inference with.

#### Step 2: Start Inference
1. Click the **"Start Live Inference"** button
2. System loads the model from the orchestrator
3. Connects to the DataProvider's live data stream
4. Begins generating predictions every second

#### Step 3: Monitor Signals
- **Latest Signal** - BUY/SELL/HOLD
- **Confidence** - Model confidence (0-100%)
- **Price** - Current market price
- **Timestamp** - When the signal was generated

### How It Works

```
DataProvider (Live Data)
        ↓
Latest Market State (4 timeframes)
        ↓
Model Inference
        ↓
Prediction (Action + Confidence)
        ↓
Display on UI + Chart Markers
```
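One pass through the pipeline above can be sketched as a single function. Everything here is a stand-in: `infer_once`, `Signal`, and the two callables are hypothetical, and the shapes of the market state and prediction are assumptions rather than the DataProvider's or the models' real interfaces.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Signal:
    action: str        # "BUY", "SELL" or "HOLD"
    confidence: float  # 0.0 to 1.0
    price: float
    timestamp: datetime


def infer_once(get_market_state, predict):
    """Fetch the latest market state, run the model, and wrap the result as a Signal."""
    state = get_market_state()           # assumed: dict with a "price" key plus 4 timeframes
    action, confidence = predict(state)  # assumed: returns (action, confidence)
    return Signal(action, confidence, state["price"], datetime.now(timezone.utc))


def fake_market_state():
    """Stand-in for the DataProvider's latest market state."""
    return {"price": 2400.5, "ohlcv_1s": {}, "ohlcv_1m": {}, "ohlcv_1h": {}, "ohlcv_1d": {}}


def fake_model(state):
    """Stand-in for a trained model."""
    return "BUY", 0.82


signal = infer_once(fake_market_state, fake_model)
print(signal.action, f"{signal.confidence:.0%}", signal.price)
```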

### Signal Display
- Signals appear in the training panel
- The latest 50 signals are stored
- Chart display is planned (future feature)
- Updates every second
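A bounded buffer is one simple way to keep only the latest 50 signals. This sketch uses `collections.deque` with `maxlen`; whether ANNOTATE stores signals this way is an assumption.

```python
from collections import deque

MAX_SIGNALS = 50  # matches the "latest 50 signals" limit described above

signals = deque(maxlen=MAX_SIGNALS)
for i in range(60):  # simulate 60 one-second predictions
    signals.append({"seq": i, "action": "HOLD"})

# The oldest 10 entries were dropped automatically.
print(len(signals), signals[0]["seq"], signals[-1]["seq"])
```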

### Stopping Inference
1. Click **"Stop Inference"** button
2. Inference loop terminates
3. Final signals remain visible
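A common way to make such a loop stoppable is a `threading.Event` that the loop checks on every iteration. The sketch below assumes that design; `inference_loop` and `step` are hypothetical names.

```python
import threading
import time


def inference_loop(step, stop_event, interval=1.0):
    """Call `step()` every `interval` seconds until `stop_event` is set."""
    while not stop_event.is_set():
        step()
        stop_event.wait(interval)  # sleeps, but wakes immediately when stop is requested


stop_event = threading.Event()
ticks = []
worker = threading.Thread(
    target=inference_loop,
    args=(lambda: ticks.append(time.time()), stop_event, 0.01),
)
worker.start()
while not ticks:          # wait until at least one prediction has run
    time.sleep(0.001)
stop_event.set()          # what a "Stop Inference" click would trigger
worker.join(timeout=1.0)
print(worker.is_alive(), len(ticks) >= 1)
```

Using `stop_event.wait(interval)` instead of `time.sleep(interval)` means the loop exits promptly rather than finishing a full sleep first. Signals already collected (here, `ticks`) stay available after the loop ends.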

---

## 🔧 Integration with Orchestrator

### Model Loading
Models are loaded directly from the orchestrator:

```python
# CNN Model
model = orchestrator.cnn_model

# DQN Agent
model = orchestrator.rl_agent

# Transformer
model = orchestrator.primary_transformer

# COB RL
model = orchestrator.cob_rl_agent
```
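A dropdown selection could be resolved to one of these attributes with a small lookup table. In this sketch the attribute names come from the snippet above, while `MODEL_ATTRS`, `get_model`, and the pairing of dropdown labels with attributes are assumptions; a `SimpleNamespace` stands in for the real orchestrator.

```python
from types import SimpleNamespace

# Dropdown name -> orchestrator attribute (pairing is an assumption)
MODEL_ATTRS = {
    "StandardizedCNN": "cnn_model",
    "DQN": "rl_agent",
    "Transformer": "primary_transformer",
    "COB": "cob_rl_agent",
}


def get_model(orchestrator, name):
    """Return the model for `name`, or raise a clear error if unknown or not loaded."""
    try:
        attr = MODEL_ATTRS[name]
    except KeyError:
        raise ValueError(f"unknown model: {name!r}") from None
    model = getattr(orchestrator, attr, None)
    if model is None:
        raise RuntimeError(f"{name} is not loaded (orchestrator.{attr} is missing or None)")
    return model


# Stand-in orchestrator: CNN loaded, DQN not loaded
orchestrator = SimpleNamespace(cnn_model="cnn-stub", rl_agent=None)
print(get_model(orchestrator, "StandardizedCNN"))
```

Failing loudly when an attribute is `None` makes the "check model is loaded in orchestrator" troubleshooting step easier to diagnose.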

### Data Consistency
- Uses the **same DataProvider** as the main system
- Same cached data
- Same data structure
- Training and inference therefore see the same inputs as the main system

---

## 📊 Training Workflow Example

### Scenario: Train CNN on Breakout Patterns

**Step 1: Annotate Trades**
```
1. Find 10 clear breakout patterns
2. Mark entry/exit for each
3. Test cases auto-generated
4. Result: 10 test cases in ANNOTATE/data/test_cases/
```

**Step 2: Train Model**
```
1. Select "StandardizedCNN" from dropdown
2. Click "Train Model"
3. System loads 10 test cases
4. Training runs for 10 epochs
5. Model learns breakout patterns
```

**Step 3: Test with Real-Time Inference**
```
1. Click "Start Live Inference"
2. Model analyzes live data
3. Generates BUY signals on breakouts
4. Monitor confidence levels
5. Verify model learned correctly
```

---

## 🎯 Best Practices

### For Training

**1. Quality Over Quantity**
- Start with 10-20 high-quality annotations
- Focus on clear, obvious patterns
- Verify each annotation is correct

**2. Diverse Scenarios**
- Include different market conditions
- Mix LONG and SHORT trades
- Various timeframes and volatility levels

**3. Incremental Training**
- Train with small batches first
- Verify model learns correctly
- Add more annotations gradually

**4. Test After Training**
- Use real-time inference to verify
- Check if model recognizes patterns
- Adjust annotations if needed

### For Real-Time Inference

**1. Monitor Confidence**
- High confidence (>70%) = Strong signal
- Medium confidence (50-70%) = Moderate signal
- Low confidence (<50%) = Weak signal

**2. Verify Against Charts**
- Check if signals make sense
- Compare with your own analysis
- Look for false positives

**3. Track Performance**
- Note which signals were correct
- Identify patterns in errors
- Use insights to improve annotations
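The confidence bands listed under "Monitor Confidence" translate directly into a small helper. The function name `confidence_band` is hypothetical, and treating exactly 70% as "moderate" is an assumption, since the guide only says ">70%" and "50-70%".

```python
def confidence_band(confidence_pct: float) -> str:
    """Classify a confidence percentage (0-100) into the bands used in this guide."""
    if not 0.0 <= confidence_pct <= 100.0:
        raise ValueError("confidence_pct must be between 0 and 100")
    if confidence_pct > 70.0:
        return "strong"
    if confidence_pct >= 50.0:
        return "moderate"
    return "weak"


for pct in (82.0, 70.0, 49.9):
    print(pct, confidence_band(pct))
```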

---

## 🔍 Troubleshooting

### Training Issues

**Issue**: "No test cases found"
- **Solution**: Create annotations first; test cases are generated automatically when an annotation is saved. Then confirm the files exist in `ANNOTATE/data/test_cases/`

**Issue**: Training fails immediately
- **Solution**: Check that the model is loaded in the orchestrator and that the test case files match the expected format

**Issue**: Loss not decreasing
- **Solution**: Add more or higher-quality annotations and check data quality

### Inference Issues

**Issue**: No signals generated
- **Solution**: Verify that the DataProvider has live data and that the model is loaded

**Issue**: All signals are HOLD
- **Solution**: The model may need more training; also check the confidence levels

**Issue**: Signals don't match expectations
- **Solution**: Review the training data; different annotations may be needed

---

## 📈 Performance Metrics

### Training Metrics
- **Loss** - Lower is better (target: <0.1)
- **Accuracy** - Higher is better (target: >80%)
- **Epochs** - More epochs give the model more passes over the data, but with few annotations they also raise the risk of overfitting
- **Duration** - Training time in seconds

### Inference Metrics
- **Latency** - Time to generate prediction (~1s)
- **Confidence** - Model certainty (0-100%)
- **Signal Rate** - Predictions per minute
- **Accuracy** - Correct predictions vs total
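Accuracy and signal rate can be derived from a log of past signals. The sketch below assumes a hypothetical record format of `(timestamp, was_correct)` tuples; how correctness is judged is left to the reader and is not defined by this guide.

```python
from datetime import datetime, timedelta


def inference_metrics(records):
    """Compute accuracy and signals per minute.

    records: list of (timestamp, was_correct) tuples, oldest first (assumed format).
    """
    if not records:
        return {"accuracy": None, "signals_per_minute": None}
    correct = sum(1 for _, ok in records if ok)
    span_s = (records[-1][0] - records[0][0]).total_seconds()
    # Rate is measured over the intervals between the first and last signal.
    rate = (len(records) - 1) / span_s * 60.0 if span_s > 0 else None
    return {"accuracy": correct / len(records), "signals_per_minute": rate}


# 61 signals, one per second; every 4th one is marked incorrect for the demo.
start = datetime(2024, 1, 15, 10, 30)
records = [(start + timedelta(seconds=i), i % 4 != 0) for i in range(61)]
metrics = inference_metrics(records)
print(metrics)
```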

---

## Advanced Usage

### Custom Training Parameters
Edit `ANNOTATE/core/training_simulator.py`:
```python
'total_epochs': 10,  # Increase for more training
```

### Model-Specific Training
Each model type has its own training method:
- `_train_cnn()` - For CNN models
- `_train_dqn()` - For DQN agents
- `_train_transformer()` - For Transformers
- `_train_cob()` - For COB models
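Routing a selected model to its training method could be done with a dispatch table, as in this sketch. Only the method names come from the list above; the `TrainerSketch` class, the method signatures, and the return values are illustrative assumptions.

```python
class TrainerSketch:
    """Illustrative stand-in that reuses the method names listed above."""

    def _train_cnn(self, cases):
        return f"cnn trained on {len(cases)} cases"

    def _train_dqn(self, cases):
        return f"dqn trained on {len(cases)} cases"

    def _train_transformer(self, cases):
        return f"transformer trained on {len(cases)} cases"

    def _train_cob(self, cases):
        return f"cob trained on {len(cases)} cases"

    def train(self, model_name, cases):
        """Pick the model-specific method by name and run it."""
        handlers = {
            "StandardizedCNN": self._train_cnn,
            "DQN": self._train_dqn,
            "Transformer": self._train_transformer,
            "COB": self._train_cob,
        }
        if model_name not in handlers:
            raise ValueError(f"no training method for model: {model_name!r}")
        return handlers[model_name](cases)


result = TrainerSketch().train("DQN", [{"test_case_id": "annotation_1"}])
print(result)
```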

### Batch Training
Training on a selected subset of annotations is planned but not yet available. The intended input is a list of annotation IDs:
```python
# In future: Select specific annotations for training
annotation_ids = ['id1', 'id2', 'id3']
```
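Because the snippet above is marked as a future feature, the following is only a sketch of how a subset might be picked from already-loaded test cases. It assumes `test_case_id` is `"annotation_"` followed by the annotation ID, mirroring the file naming; `select_cases` is a hypothetical helper.

```python
def select_cases(test_cases, annotation_ids):
    """Keep only the test cases whose ID matches one of the given annotation IDs."""
    wanted = {f"annotation_{a}" for a in annotation_ids}  # assumed ID convention
    return [c for c in test_cases if c.get("test_case_id") in wanted]


all_cases = [{"test_case_id": f"annotation_id{i}"} for i in range(1, 6)]
annotation_ids = ["id1", "id2", "id3"]
subset = select_cases(all_cases, annotation_ids)
print([c["test_case_id"] for c in subset])
```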

---

## 📝 File Locations

### Test Cases
```
ANNOTATE/data/test_cases/annotation_<id>.json
```

### Training Results
```
ANNOTATE/data/training_results/
```

### Model Checkpoints
```
models/checkpoints/  (main system)
```

---

## 🎊 Summary

The ANNOTATE system provides:

- **Automatic Test Case Generation** - From annotations
- **Production-Ready Training** - Integrates with orchestrator
- **Real-Time Inference** - Live predictions on streaming data
- **Data Consistency** - Same data as main system
- **Easy Monitoring** - Real-time progress and signals

**You can now:**
1. Mark profitable trades
2. Generate training data automatically
3. Train models with your annotations
4. Test models with real-time inference
5. Monitor model performance live

---

**Happy Training!**