# ANNOTATE - Model Training & Inference Guide

## 🎯 Overview

This guide covers how to use the ANNOTATE system for:
1. **Generating Training Data** - From manual annotations
2. **Training Models** - Using annotated test cases
3. **Real-Time Inference** - Live model predictions with streaming data

---

## 📦 Test Case Generation

### Automatic Generation
When you save an annotation, a test case is **automatically generated** and saved to disk.

**Location**: `ANNOTATE/data/test_cases/annotation_<id>.json`

### What's Included
Each test case contains:
- **Market State** - OHLCV data for all 4 timeframes (100 candles each)
- **Entry/Exit Prices** - Exact prices from annotation
- **Expected Outcome** - Direction (LONG/SHORT) and P&L percentage
- **Timestamp** - When the trade occurred
- **Action** - BUY or SELL signal
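
The P&L percentage can be derived from the entry and exit prices. The sketch below is an illustration only: `profit_loss_pct` is a hypothetical helper, not a function from the ANNOTATE code base, and it assumes fees and slippage are ignored.

```python
def profit_loss_pct(entry_price: float, exit_price: float, direction: str) -> float:
    """Percentage P&L of a trade; a SHORT trade profits when the price falls."""
    if entry_price <= 0:
        raise ValueError("entry_price must be positive")
    change = (exit_price - entry_price) / entry_price * 100.0
    if direction == "LONG":
        return change
    if direction == "SHORT":
        return -change
    raise ValueError(f"unknown direction: {direction!r}")


# Prices taken from the sample test case in this guide.
pnl = profit_loss_pct(2400.50, 2460.75, "LONG")
print(round(pnl, 2))  # 2.51
```

With the sample prices the result is roughly 2.5%, which matches the `profit_loss_pct` value shown in the test case format.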

### Test Case Format
```json
{
  "test_case_id": "annotation_uuid",
  "symbol": "ETH/USDT",
  "timestamp": "2024-01-15T10:30:00Z",
  "action": "BUY",
  "market_state": {
    "ohlcv_1s": {
      "timestamps": [...],  // 100 candles
      "open": [...],
      "high": [...],
      "low": [...],
      "close": [...],
      "volume": [...]
    },
    "ohlcv_1m": {...},  // 100 candles
    "ohlcv_1h": {...},  // 100 candles
    "ohlcv_1d": {...}   // 100 candles
  },
  "expected_outcome": {
    "direction": "LONG",
    "profit_loss_pct": 2.5,
    "entry_price": 2400.50,
    "exit_price": 2460.75,
    "holding_period_seconds": 300
  }
}
```
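
Before training, it can help to check that a loaded test case has the structure shown above. This is a minimal sketch under assumptions: `validate_test_case` is a hypothetical helper, and the 100-candle check simply mirrors the numbers stated in this guide rather than any rule enforced by the system.

```python
REQUIRED_KEYS = ("test_case_id", "symbol", "timestamp", "action",
                 "market_state", "expected_outcome")
REQUIRED_TIMEFRAMES = ("ohlcv_1s", "ohlcv_1m", "ohlcv_1h", "ohlcv_1d")
OHLCV_FIELDS = ("timestamps", "open", "high", "low", "close", "volume")


def validate_test_case(case: dict, expected_candles: int = 100) -> list:
    """Return a list of problems; an empty list means the structure looks right."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in case:
            problems.append(f"missing key: {key}")
    market_state = case.get("market_state", {})
    for timeframe in REQUIRED_TIMEFRAMES:
        frame = market_state.get(timeframe)
        if frame is None:
            problems.append(f"missing timeframe: {timeframe}")
            continue
        # Every OHLCV column should hold the same number of candles.
        lengths = {len(frame.get(field, [])) for field in OHLCV_FIELDS}
        if lengths != {expected_candles}:
            problems.append(f"{timeframe}: expected {expected_candles} candles per column")
    return problems
```

A test case parsed with `json.load` can be passed straight to `validate_test_case`; anything it reports is worth fixing before the file is used for training.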

---

## 🎓 Model Training

### Available Models
The system integrates with your existing models:
- **StandardizedCNN** - CNN model for pattern recognition
- **DQN** - Deep Q-Network for reinforcement learning
- **Transformer** - Transformer model for sequence analysis
- **COB** - Order book-based RL model

### Training Process

#### Step 1: Create Annotations
1. Mark profitable trades on historical data
2. Test cases are auto-generated and saved
3. Verify test cases exist in `ANNOTATE/data/test_cases/`

#### Step 2: Select Model
1. Open training panel (right sidebar)
2. Select model from dropdown
3. Available models are loaded from orchestrator

#### Step 3: Start Training
1. Click **"Train Model"** button
2. System loads all test cases from disk
3. Training starts in background thread
4. Progress displayed in real-time

#### Step 4: Monitor Progress
- **Current Epoch** - Shows training progress
- **Loss** - Training loss value
- **Status** - Running/Completed/Failed
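
Steps 3 and 4 describe training in a background thread with progress reported to the UI. The sketch below shows one way such a pattern can look. `TrainingSession` and `run_training` are hypothetical names, and the per-epoch training function is a stand-in; none of this is taken from the actual ANNOTATE implementation.

```python
import threading


class TrainingSession:
    """Hypothetical progress holder shared between the worker and the UI."""

    def __init__(self, total_epochs: int = 10):
        self.total_epochs = total_epochs
        self.current_epoch = 0
        self.loss = None
        self.status = "Running"
        self._lock = threading.Lock()

    def update(self, **fields):
        with self._lock:
            for name, value in fields.items():
                setattr(self, name, value)

    def snapshot(self) -> dict:
        """Consistent copy of the progress fields, safe to read from another thread."""
        with self._lock:
            return {"current_epoch": self.current_epoch,
                    "loss": self.loss,
                    "status": self.status}


def run_training(session: TrainingSession, train_one_epoch) -> None:
    """Run all epochs, recording progress; any exception marks the session Failed."""
    try:
        for epoch in range(1, session.total_epochs + 1):
            loss = train_one_epoch(epoch)
            session.update(current_epoch=epoch, loss=loss)
        session.update(status="Completed")
    except Exception:
        session.update(status="Failed")


# Demo with a stand-in training function whose loss shrinks each epoch.
session = TrainingSession(total_epochs=3)
worker = threading.Thread(target=run_training,
                          args=(session, lambda epoch: 1.0 / epoch),
                          daemon=True)
worker.start()
worker.join()
print(session.snapshot())
```

The UI would poll `snapshot()` while the worker runs, which is why reads and writes go through the same lock.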

### Training Details

**What Happens During Training:**
1. System loads all test cases from `ANNOTATE/data/test_cases/`
2. Prepares training data (market state → expected outcome)
3. Calls model's training method
4. Updates model weights based on annotations
5. Saves updated model checkpoint

**Training Parameters:**
- **Epochs**: 10 (configurable)
- **Batch Size**: Depends on model
- **Learning Rate**: Model-specific
- **Data**: All available test cases
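
As an illustration of the first two steps, the sketch below reads every test case from a directory and turns each one into a (features, label) pair. The helper names are hypothetical, and the encoding (close prices as features, 1 for LONG and 0 for SHORT) is an assumption chosen for brevity; each real model prepares its inputs in its own way.

```python
import json
from pathlib import Path

TIMEFRAMES = ("ohlcv_1s", "ohlcv_1m", "ohlcv_1h", "ohlcv_1d")


def load_test_cases(directory) -> list:
    """Read every annotation_<id>.json file in the directory, sorted by name."""
    cases = []
    for path in sorted(Path(directory).glob("annotation_*.json")):
        with path.open(encoding="utf-8") as fh:
            cases.append(json.load(fh))
    return cases


def to_training_pair(case: dict) -> tuple:
    """Assumed encoding: close prices of all timeframes -> 1 (LONG) or 0 (SHORT)."""
    features = []
    for timeframe in TIMEFRAMES:
        features.extend(case["market_state"][timeframe]["close"])
    label = 1 if case["expected_outcome"]["direction"] == "LONG" else 0
    return features, label


def build_dataset(directory) -> list:
    """Load all test cases and convert each one to a training pair."""
    return [to_training_pair(case) for case in load_test_cases(directory)]
```

Calling `build_dataset("ANNOTATE/data/test_cases")` would return one pair per saved annotation, or an empty list when no test cases exist yet.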

---

## Real-Time Inference

### Overview
Real-time inference mode runs your trained model on **live streaming data** from the DataProvider and generates predictions as new data arrives.

### Starting Real-Time Inference

#### Step 1: Select Model
Choose the model you want to run inference with.

#### Step 2: Start Inference
1. Click **"Start Live Inference"** button
2. System loads model from orchestrator
3. Connects to DataProvider's live data stream
4. Begins generating predictions every second

#### Step 3: Monitor Signals
- **Latest Signal** - BUY/SELL/HOLD
- **Confidence** - Model confidence (0-100%)
- **Price** - Current market price
- **Timestamp** - When signal was generated

### How It Works

```
DataProvider (Live Data)
    ↓
Latest Market State (4 timeframes)
    ↓
Model Inference
    ↓
Prediction (Action + Confidence)
    ↓
Display on UI (chart markers planned)
```

### Signal Display
- Signals appear in the training panel
- The latest 50 signals are kept; older ones are discarded
- Chart display of signals is a planned feature
- The display updates every second
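
The loop described above (predict on a fixed interval, keep the latest 50 signals, stop on request) can be sketched as follows. `run_inference`, `predict`, and `get_market_state` are hypothetical stand-ins for the real model and DataProvider calls, which this guide does not specify.

```python
import threading
from collections import deque
from datetime import datetime, timezone

# A bounded deque drops the oldest entry automatically once 50 are stored.
signals = deque(maxlen=50)


def run_inference(predict, get_market_state, stop_event, interval_seconds=1.0):
    """Generate one signal per interval until stop_event is set."""
    while not stop_event.is_set():
        state = get_market_state()
        action, confidence = predict(state)
        signals.append({
            "action": action,            # BUY / SELL / HOLD
            "confidence": confidence,    # percent, 0-100
            "price": state["price"],
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        # wait() doubles as the sleep and returns early when a stop is requested.
        stop_event.wait(interval_seconds)


# Demo: a fake data source that requests a stop after 60 ticks.
stop = threading.Event()
ticks = {"count": 0}


def fake_market_state():
    ticks["count"] += 1
    if ticks["count"] >= 60:
        stop.set()
    return {"price": 2400.0 + ticks["count"]}


run_inference(lambda state: ("HOLD", 50.0), fake_market_state, stop, interval_seconds=0)
print(len(signals))  # 50
```

Because the signals stay in the deque after the loop exits, the final signals remain available for display once inference has stopped.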

### Stopping Inference
1. Click **"Stop Inference"** button
2. Inference loop terminates
3. Final signals remain visible

---

## 🔧 Integration with Orchestrator

### Model Loading
Models are loaded directly from the orchestrator:

```python
# CNN Model
model = orchestrator.cnn_model

# DQN Agent
model = orchestrator.rl_agent

# Transformer
model = orchestrator.primary_transformer

# COB RL
model = orchestrator.cob_rl_agent
```
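
A lookup table keeps the model selection in one place and gives a clear error when a model is not loaded. This is a sketch: the orchestrator attribute names come from the snippet above, while `get_model` and the dropdown names other than `StandardizedCNN` are assumptions.

```python
from types import SimpleNamespace

# Dropdown name -> orchestrator attribute (dropdown names are assumed).
MODEL_ATTRIBUTES = {
    "StandardizedCNN": "cnn_model",
    "DQN": "rl_agent",
    "Transformer": "primary_transformer",
    "COB": "cob_rl_agent",
}


def get_model(orchestrator, model_name: str):
    """Return the requested model, or raise if it is unknown or not loaded."""
    try:
        attribute = MODEL_ATTRIBUTES[model_name]
    except KeyError:
        raise ValueError(f"unknown model: {model_name!r}") from None
    model = getattr(orchestrator, attribute, None)
    if model is None:
        raise RuntimeError(f"{model_name} is not loaded in the orchestrator")
    return model


# Demo with a stand-in orchestrator that only has the CNN loaded.
fake_orchestrator = SimpleNamespace(cnn_model="cnn-instance", rl_agent=None)
print(get_model(fake_orchestrator, "StandardizedCNN"))
```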

### Data Consistency
- Uses the **same DataProvider** as the main system
- Reads the same cached data
- Uses the same data structures
- Training and inference therefore see the same inputs as the main system

---

## 📊 Training Workflow Example

### Scenario: Train CNN on Breakout Patterns

**Step 1: Annotate Trades**
```
1. Find 10 clear breakout patterns
2. Mark entry/exit for each
3. Test cases auto-generated
4. Result: 10 test cases in ANNOTATE/data/test_cases/
```

**Step 2: Train Model**
```
1. Select "StandardizedCNN" from dropdown
2. Click "Train Model"
3. System loads 10 test cases
4. Training runs for 10 epochs
5. Model learns breakout patterns
```

**Step 3: Test with Real-Time Inference**
```
1. Click "Start Live Inference"
2. Model analyzes live data
3. Generates BUY signals on breakouts
4. Monitor confidence levels
5. Verify model learned correctly
```

---

## 🎯 Best Practices

### For Training

**1. Quality Over Quantity**
- Start with 10-20 high-quality annotations
- Focus on clear, obvious patterns
- Verify each annotation is correct

**2. Diverse Scenarios**
- Include different market conditions
- Mix LONG and SHORT trades
- Various timeframes and volatility levels

**3. Incremental Training**
- Train with small batches first
- Verify model learns correctly
- Add more annotations gradually

**4. Test After Training**
- Use real-time inference to verify
- Check if model recognizes patterns
- Adjust annotations if needed

### For Real-Time Inference

**1. Monitor Confidence**
- High confidence (>70%) = Strong signal
- Medium confidence (50-70%) = Moderate signal
- Low confidence (<50%) = Weak signal
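
The thresholds above translate directly into a small helper. `signal_strength` is a hypothetical name used for illustration; it treats exactly 50% and exactly 70% as moderate, following the ranges listed.

```python
def signal_strength(confidence_pct: float) -> str:
    """Bucket a confidence value (0-100%) into strong, moderate, or weak."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence_pct > 70:
        return "strong"
    if confidence_pct >= 50:
        return "moderate"
    return "weak"


print(signal_strength(82.5))  # strong
```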

**2. Verify Against Charts**
- Check if signals make sense
- Compare with your own analysis
- Look for false positives

**3. Track Performance**
- Note which signals were correct
- Identify patterns in errors
- Use insights to improve annotations

---

## 🔍 Troubleshooting

### Training Issues

**Issue**: "No test cases found"
- **Solution**: Create annotations first; test cases are generated automatically when an annotation is saved.

**Issue**: Training fails immediately
- **Solution**: Check that the model is loaded in the orchestrator and verify the test case format.

**Issue**: Loss not decreasing
- **Solution**: Add more or higher-quality annotations and check the data quality.

### Inference Issues

**Issue**: No signals generated
- **Solution**: Verify that the DataProvider has live data and that the model is loaded.

**Issue**: All signals are HOLD
- **Solution**: The model may need more training; also check the confidence levels.

**Issue**: Signals don't match expectations
- **Solution**: Review the training data; different annotations may be needed.

---

## 📈 Performance Metrics

### Training Metrics
- **Loss** - Lower is better (target: <0.1)
- **Accuracy** - Higher is better (target: >80%)
- **Epochs** - More epochs give the model more passes over the data, but with few annotations they can also lead to overfitting
- **Duration** - Training time in seconds

### Inference Metrics
- **Latency** - Time to generate prediction (~1s)
- **Confidence** - Model certainty (0-100%)
- **Signal Rate** - Predictions per minute
- **Accuracy** - Correct predictions vs total
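
Signal rate and accuracy are simple ratios. The helpers below are a sketch with hypothetical names; how a signal is judged "correct" is left to the reader, since this guide does not define it.

```python
def signal_rate_per_minute(signal_count: int, window_seconds: float) -> float:
    """Predictions per minute over an observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return signal_count * 60.0 / window_seconds


def accuracy_pct(outcomes):
    """Percentage of correct signals; outcomes holds True for each correct one."""
    outcomes = list(outcomes)
    if not outcomes:
        return None  # nothing graded yet
    return 100.0 * sum(outcomes) / len(outcomes)


print(signal_rate_per_minute(50, 50))           # 60.0
print(accuracy_pct([True, True, False, True]))  # 75.0
```

One prediction per second corresponds to a signal rate of 60 per minute, as the first call shows.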

---

## Advanced Usage

### Custom Training Parameters
Edit `ANNOTATE/core/training_simulator.py`:
```python
'total_epochs': 10,  # Increase for more training
```

### Model-Specific Training
Each model type has its own training method:
- `_train_cnn()` - For CNN models
- `_train_dqn()` - For DQN agents
- `_train_transformer()` - For Transformers
- `_train_cob()` - For COB models

### Batch Training
Train on specific annotations:
```python
# In future: Select specific annotations for training
annotation_ids = ['id1', 'id2', 'id3']
```

---

## 📝 File Locations

### Test Cases
```
ANNOTATE/data/test_cases/annotation_<id>.json
```

### Training Results
```
ANNOTATE/data/training_results/
```

### Model Checkpoints
```
models/checkpoints/  (main system)
```

---

## 🎊 Summary

The ANNOTATE system provides:

- **Automatic Test Case Generation** - From annotations
- **Production-Ready Training** - Integrates with orchestrator
- **Real-Time Inference** - Live predictions on streaming data
- **Data Consistency** - Same data as main system
- **Easy Monitoring** - Real-time progress and signals

**You can now:**
1. Mark profitable trades
2. Generate training data automatically
3. Train models with your annotations
4. Test models with real-time inference
5. Monitor model performance live

---

**Happy Training!**