gogo2/ANNOTATE/TRAINING_GUIDE.md
2025-10-25 16:35:08 +03:00

# ANNOTATE - Model Training & Inference Guide
## 🎯 Overview
This guide covers how to use the ANNOTATE system for:
1. **Generating Training Data** - From manual annotations
2. **Training Models** - Using annotated test cases
3. **Real-Time Inference** - Live model predictions with streaming data
---
## 📦 Test Case Generation
### Automatic Generation
When you save an annotation, a test case is **automatically generated** and saved to disk.
**Location**: `ANNOTATE/data/test_cases/annotation_<id>.json`
### What's Included
Each test case contains:
- **Market State** - OHLCV data for all 4 timeframes (1s, 1m, 1h, 1d), 100 candles each
- **Entry/Exit Prices** - Exact prices from annotation
- **Expected Outcome** - Direction (LONG/SHORT) and P&L percentage
- **Timestamp** - When the trade occurred
- **Action** - BUY or SELL signal
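The P&L percentage can be checked by hand from the entry and exit prices. The sketch below assumes P&L is the simple percentage price change, signed by direction, with fees and slippage ignored; the guide does not state the exact formula, and `profit_loss_pct` is a hypothetical helper name.

```python
def profit_loss_pct(entry_price: float, exit_price: float, direction: str) -> float:
    """Percentage P&L of a trade, signed by direction (assumed formula, no fees)."""
    if entry_price <= 0:
        raise ValueError("entry_price must be positive")
    change = (exit_price - entry_price) / entry_price * 100.0
    if direction == "LONG":
        return change
    if direction == "SHORT":
        # A short position profits when the price falls.
        return -change
    raise ValueError(f"unknown direction: {direction!r}")
```

With the prices from the sample test case in this guide (entry 2400.50, exit 2460.75, LONG), this gives roughly 2.51, in line with the 2.5 stored in `profit_loss_pct`.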
### Test Case Format
```json
{
  "test_case_id": "annotation_uuid",
  "symbol": "ETH/USDT",
  "timestamp": "2024-01-15T10:30:00Z",
  "action": "BUY",
  "market_state": {
    "ohlcv_1s": {
      "timestamps": [...], // 100 candles
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1m": {...}, // 100 candles
    "ohlcv_1h": {...}, // 100 candles
    "ohlcv_1d": {...} // 100 candles
  },
  "expected_outcome": {
    "direction": "LONG",
    "profit_loss_pct": 2.5,
    "entry_price": 2400.50,
    "exit_price": 2460.75,
    "holding_period_seconds": 300
  }
}
```
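Before training, it can help to sanity-check a test case against the format above. The following is a minimal sketch, not part of ANNOTATE: `validate_test_case` is a hypothetical helper, and the set of required keys is taken from the example rather than from the actual code.

```python
REQUIRED_KEYS = (
    "test_case_id", "symbol", "timestamp",
    "action", "market_state", "expected_outcome",
)
OHLCV_FIELDS = ("timestamps", "open", "high", "low", "close", "volume")


def validate_test_case(case: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in case]
    for name, frame in case.get("market_state", {}).items():
        missing = [field for field in OHLCV_FIELDS if field not in frame]
        if missing:
            problems.append(f"{name}: missing columns {missing}")
            continue
        # Every OHLCV column should describe the same number of candles.
        lengths = {len(frame[field]) for field in OHLCV_FIELDS}
        if len(lengths) != 1:
            problems.append(f"{name}: column lengths differ {sorted(lengths)}")
    return problems
```

An empty result means the structure matches the example; it says nothing about whether the prices themselves are sensible.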
---
## 🎓 Model Training
### Available Models
The system integrates with your existing models:
- **StandardizedCNN** - CNN model for pattern recognition
- **DQN** - Deep Q-Network for reinforcement learning
- **Transformer** - Transformer model for sequence analysis
- **COB** - Order book-based RL model
### Training Process
#### Step 1: Create Annotations
1. Mark profitable trades on historical data
2. Test cases are auto-generated and saved
3. Verify test cases exist in `ANNOTATE/data/test_cases/`
#### Step 2: Select Model
1. Open training panel (right sidebar)
2. Select model from dropdown
3. Available models are loaded from orchestrator
#### Step 3: Start Training
1. Click **"Train Model"** button
2. System loads all test cases from disk
3. Training starts in background thread
4. Progress displayed in real-time
#### Step 4: Monitor Progress
- **Current Epoch** - Shows training progress
- **Loss** - Training loss value
- **Status** - Running/Completed/Failed
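The background-thread pattern with a pollable status can be sketched as follows. This is an illustration under stated assumptions, not the ANNOTATE implementation: `TrainingJob` and the `train_epoch` callable (which takes an epoch number and returns a loss) are hypothetical.

```python
import threading


class TrainingJob:
    """Runs a per-epoch training callable in a background thread."""

    def __init__(self, train_epoch, total_epochs=10):
        self._train_epoch = train_epoch  # hypothetical: epoch number -> loss
        self._total_epochs = total_epochs
        self._lock = threading.Lock()
        self._status = {"state": "Running", "epoch": 0, "loss": None, "error": None}
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def join(self, timeout=None):
        self._thread.join(timeout)

    def snapshot(self):
        # Return a copy so the caller never sees a half-updated status.
        with self._lock:
            return dict(self._status)

    def _update(self, **fields):
        with self._lock:
            self._status.update(fields)

    def _run(self):
        try:
            for epoch in range(1, self._total_epochs + 1):
                loss = self._train_epoch(epoch)
                self._update(epoch=epoch, loss=loss)
            self._update(state="Completed")
        except Exception as exc:
            self._update(state="Failed", error=str(exc))
```

A UI would call `snapshot()` on a timer to refresh the epoch, loss, and status fields while training continues in the background.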
### Training Details
**What Happens During Training:**
1. System loads all test cases from `ANNOTATE/data/test_cases/`
2. Prepares training data (market state → expected outcome)
3. Calls model's training method
4. Updates model weights based on annotations
5. Saves updated model checkpoint
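Steps 1 and 2 can be sketched as below. The function names are hypothetical, and the real loader in ANNOTATE may do additional preprocessing; only the directory layout and the `market_state` / `expected_outcome` keys come from this guide.

```python
import json
from pathlib import Path


def load_test_cases(directory):
    """Load every annotation_<id>.json file in the directory, sorted by name."""
    cases = []
    for path in sorted(Path(directory).glob("annotation_*.json")):
        with path.open(encoding="utf-8") as handle:
            cases.append(json.load(handle))
    return cases


def to_training_pairs(cases):
    """Pair each market state (input) with its expected outcome (target)."""
    return [(case["market_state"], case["expected_outcome"]) for case in cases]
```

For example, `to_training_pairs(load_test_cases("ANNOTATE/data/test_cases"))` would yield one `(market_state, expected_outcome)` tuple per saved annotation.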
**Training Parameters:**
- **Epochs**: 10 (configurable)
- **Batch Size**: Depends on model
- **Learning Rate**: Model-specific
- **Data**: All available test cases
---
## Real-Time Inference
### Overview
Real-time inference mode runs your trained model on **live streaming data** from the DataProvider and produces a new prediction about once per second.
### Starting Real-Time Inference
#### Step 1: Select Model
Choose the model you want to run inference with.
#### Step 2: Start Inference
1. Click **"Start Live Inference"** button
2. System loads model from orchestrator
3. Connects to DataProvider's live data stream
4. Begins generating predictions every second
#### Step 3: Monitor Signals
- **Latest Signal** - BUY/SELL/HOLD
- **Confidence** - Model confidence (0-100%)
- **Price** - Current market price
- **Timestamp** - When signal was generated
### How It Works
```
DataProvider (Live Data)
        ↓
Latest Market State (4 timeframes)
        ↓
Model Inference
        ↓
Prediction (Action + Confidence)
        ↓
Display on UI + Chart Markers
```
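One pass through this pipeline can be sketched as a single function. The sketch assumes the model exposes a callable that returns three probabilities in BUY, SELL, HOLD order; that output shape, the `Signal` dataclass, and `infer_once` are all assumptions for illustration, not the actual model interface.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

ACTIONS = ("BUY", "SELL", "HOLD")  # assumed output order


@dataclass
class Signal:
    action: str
    confidence: float  # percent, 0-100
    price: float
    timestamp: datetime


def infer_once(market_state, predict, price):
    """Run one prediction and turn it into a displayable signal."""
    probs = predict(market_state)
    if len(probs) != len(ACTIONS):
        raise ValueError(f"expected {len(ACTIONS)} probabilities, got {len(probs)}")
    # Pick the most probable action and report its probability as confidence.
    best = max(range(len(ACTIONS)), key=lambda i: probs[i])
    return Signal(
        action=ACTIONS[best],
        confidence=probs[best] * 100.0,
        price=price,
        timestamp=datetime.now(timezone.utc),
    )
```

The four fields of `Signal` mirror what the training panel shows under Monitor Signals.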
### Signal Display
- Signals appear in training panel
- Latest 50 signals stored
- Chart markers for signals are planned (future feature)
- Updates every second
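Keeping only the latest 50 signals is a natural fit for a bounded deque. The following sketch shows the idea; `SignalBuffer` is a hypothetical name, and the actual storage used by ANNOTATE may differ.

```python
from collections import deque


class SignalBuffer:
    """Holds the most recent signals and silently drops the oldest."""

    def __init__(self, maxlen=50):
        self._signals = deque(maxlen=maxlen)

    def add(self, signal):
        # With maxlen set, appending to a full deque evicts the oldest item.
        self._signals.append(signal)

    def latest(self):
        return self._signals[-1] if self._signals else None

    def all(self):
        return list(self._signals)
```

Because eviction happens inside `append`, the inference loop never needs to trim the history itself.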
### Stopping Inference
1. Click **"Stop Inference"** button
2. Inference loop terminates
3. Final signals remain visible
---
## 🔧 Integration with Orchestrator
### Model Loading
Models are loaded directly from the orchestrator:
```python
# CNN Model
model = orchestrator.cnn_model
# DQN Agent
model = orchestrator.rl_agent
# Transformer
model = orchestrator.primary_transformer
# COB RL
model = orchestrator.cob_rl_agent
```
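The dropdown names and the orchestrator attributes above can be tied together with a small lookup. This is a sketch only: the pairing of each display name with each attribute is inferred from the comments in the snippet above, and `resolve_model` is a hypothetical helper.

```python
# Assumed pairing of dropdown names and orchestrator attributes.
MODEL_ATTRIBUTES = {
    "StandardizedCNN": "cnn_model",
    "DQN": "rl_agent",
    "Transformer": "primary_transformer",
    "COB": "cob_rl_agent",
}


def resolve_model(orchestrator, model_name):
    """Return the model object for a dropdown name, or raise a clear error."""
    try:
        attribute = MODEL_ATTRIBUTES[model_name]
    except KeyError:
        raise ValueError(f"unknown model: {model_name!r}") from None
    model = getattr(orchestrator, attribute, None)
    if model is None:
        raise RuntimeError(f"{model_name} is not loaded in the orchestrator")
    return model
```

Raising a distinct error when the attribute is missing or `None` makes the "model not loaded" case easy to tell apart from a typo in the model name.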
### Data Consistency
- Uses the **same DataProvider** as the main system
- Reads the same cached data
- Uses the same data structure
- Training and inference therefore see the same inputs as the main system
---
## 📊 Training Workflow Example
### Scenario: Train CNN on Breakout Patterns
**Step 1: Annotate Trades**
```
1. Find 10 clear breakout patterns
2. Mark entry/exit for each
3. Test cases auto-generated
4. Result: 10 test cases in ANNOTATE/data/test_cases/
```
**Step 2: Train Model**
```
1. Select "StandardizedCNN" from dropdown
2. Click "Train Model"
3. System loads 10 test cases
4. Training runs for 10 epochs
5. Model learns breakout patterns
```
**Step 3: Test with Real-Time Inference**
```
1. Click "Start Live Inference"
2. Model analyzes live data
3. Generates BUY signals on breakouts
4. Monitor confidence levels
5. Verify model learned correctly
```
---
## 🎯 Best Practices
### For Training
**1. Quality Over Quantity**
- Start with 10-20 high-quality annotations
- Focus on clear, obvious patterns
- Verify each annotation is correct
**2. Diverse Scenarios**
- Include different market conditions
- Mix LONG and SHORT trades
- Various timeframes and volatility levels
**3. Incremental Training**
- Train with small batches first
- Verify model learns correctly
- Add more annotations gradually
**4. Test After Training**
- Use real-time inference to verify
- Check if model recognizes patterns
- Adjust annotations if needed
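To check the LONG/SHORT mix recommended under Diverse Scenarios, the directions in the loaded test cases can be counted. A minimal sketch, assuming each test case is a dict in the format shown earlier; `direction_balance` is a hypothetical helper.

```python
from collections import Counter


def direction_balance(cases):
    """Return the share of each direction (e.g. LONG, SHORT) across test cases."""
    counts = Counter(case["expected_outcome"]["direction"] for case in cases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {direction: count / total for direction, count in counts.items()}
```

A result such as `{"LONG": 0.9, "SHORT": 0.1}` would suggest adding more SHORT annotations before the next training run.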
### For Real-Time Inference
**1. Monitor Confidence**
- High confidence (>70%) = Strong signal
- Medium confidence (50-70%) = Moderate signal
- Low confidence (<50%) = Weak signal
**2. Verify Against Charts**
- Check if signals make sense
- Compare with your own analysis
- Look for false positives
**3. Track Performance**
- Note which signals were correct
- Identify patterns in errors
- Use insights to improve annotations
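The confidence bands under Monitor Confidence translate directly into a small classifier. The sketch below treats exactly 70% as moderate and exactly 50% as moderate, following the ranges as written; `confidence_band` is a hypothetical helper.

```python
def confidence_band(confidence_pct: float) -> str:
    """Map a confidence percentage to the bands used in this guide."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence_pct > 70:
        return "strong"
    if confidence_pct >= 50:
        return "moderate"
    return "weak"
```

For instance, a signal at 82% confidence falls in the strong band, while one at 45% is weak.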
---
## 🔍 Troubleshooting
### Training Issues
**Issue**: "No test cases found"
- **Solution**: Create annotations first; test cases are generated automatically when an annotation is saved
**Issue**: Training fails immediately
- **Solution**: Check that the model is loaded in the orchestrator and verify the test case format
**Issue**: Loss not decreasing
- **Solution**: Add more or higher-quality annotations and check the data quality
### Inference Issues
**Issue**: No signals generated
- **Solution**: Verify that the DataProvider has live data and that the model is loaded
**Issue**: All signals are HOLD
- **Solution**: The model may need more training; check the confidence levels
**Issue**: Signals don't match expectations
- **Solution**: Review the training data; different annotations may be needed
---
## 📈 Performance Metrics
### Training Metrics
- **Loss** - Lower is better (target: <0.1)
- **Accuracy** - Higher is better (target: >80%)
- **Epochs** - Number of passes over the test cases; more epochs usually lower training loss but can overfit a small annotation set
- **Duration** - Training time in seconds
### Inference Metrics
- **Latency** - Time to generate prediction (~1s)
- **Confidence** - Model certainty (0-100%)
- **Signal Rate** - Predictions per minute
- **Accuracy** - Correct predictions vs total
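Signal rate and accuracy can be computed from a log of signals. The sketch assumes timestamps are available as ascending epoch seconds and that each past signal has been marked correct or incorrect by hand; both function names are hypothetical.

```python
def signal_rate_per_minute(timestamps):
    """Average number of signals per minute over the observed time span."""
    if len(timestamps) < 2:
        return 0.0
    span_seconds = timestamps[-1] - timestamps[0]
    if span_seconds <= 0:
        return 0.0
    # N timestamps cover N - 1 intervals.
    return (len(timestamps) - 1) / span_seconds * 60.0


def accuracy(outcomes):
    """Fraction of signals judged correct; outcomes is an iterable of booleans."""
    outcomes = list(outcomes)
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)
```

With one prediction per second, the signal rate comes out at 60 per minute.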
---
## Advanced Usage
### Custom Training Parameters
Edit `ANNOTATE/core/training_simulator.py`:
```python
'total_epochs': 10, # Increase for more training
```
### Model-Specific Training
Each model type has its own training method:
- `_train_cnn()` - For CNN models
- `_train_dqn()` - For DQN agents
- `_train_transformer()` - For Transformers
- `_train_cob()` - For COB models
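Dispatching to these methods by model name might look like the following. Only the four method names come from this guide; the `TrainerSketch` class, the name-to-method pairing, and the placeholder method bodies are assumptions for illustration.

```python
class TrainerSketch:
    """Illustrative dispatcher; the real training_simulator.py may differ."""

    # Assumed pairing of dropdown names and training methods.
    _METHODS = {
        "StandardizedCNN": "_train_cnn",
        "DQN": "_train_dqn",
        "Transformer": "_train_transformer",
        "COB": "_train_cob",
    }

    def train(self, model_name, test_cases):
        method_name = self._METHODS.get(model_name)
        if method_name is None:
            raise ValueError(f"no training method for model: {model_name!r}")
        return getattr(self, method_name)(test_cases)

    # Placeholder bodies: each returns a label instead of training anything.
    def _train_cnn(self, test_cases):
        return ("cnn", len(test_cases))

    def _train_dqn(self, test_cases):
        return ("dqn", len(test_cases))

    def _train_transformer(self, test_cases):
        return ("transformer", len(test_cases))

    def _train_cob(self, test_cases):
        return ("cob", len(test_cases))
```

Keeping the mapping in one table means adding a model type touches a single dictionary entry plus its method.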
### Batch Training
Train on specific annotations:
```python
# In future: Select specific annotations for training
annotation_ids = ['id1', 'id2', 'id3']
```
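Until that feature exists, one way to approximate it is to pick test case files by annotation ID before loading them. This is a sketch that relies only on the `annotation_<id>.json` naming described in this guide; `select_test_case_files` is a hypothetical helper, not an ANNOTATE API.

```python
from pathlib import Path


def select_test_case_files(directory, annotation_ids):
    """Return the test case files whose names match the given annotation IDs."""
    wanted = {f"annotation_{annotation_id}.json" for annotation_id in annotation_ids}
    return sorted(
        path
        for path in Path(directory).glob("annotation_*.json")
        if path.name in wanted
    )
```

IDs with no matching file are skipped silently, so comparing the length of the result with the length of `annotation_ids` is a quick way to spot missing test cases.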
---
## 📝 File Locations
### Test Cases
```
ANNOTATE/data/test_cases/annotation_<id>.json
```
### Training Results
```
ANNOTATE/data/training_results/
```
### Model Checkpoints
```
models/checkpoints/ (main system)
```
---
## 🎊 Summary
The ANNOTATE system provides:
- **Automatic Test Case Generation** - From annotations
- **Production-Ready Training** - Integrates with orchestrator
- **Real-Time Inference** - Live predictions on streaming data
- **Data Consistency** - Same data as main system
- **Easy Monitoring** - Real-time progress and signals
**You can now:**
1. Mark profitable trades
2. Generate training data automatically
3. Train models with your annotations
4. Test models with real-time inference
5. Monitor model performance live
---
**Happy Training!**