# Event-Driven Inference Training System

## Overview

This system provides a flexible, efficient, and robust training pipeline that:

1. **Stores inference frames by reference** (not copying 600 candles every second)
2. **Uses DuckDB** for efficient data storage and retrieval
3. **Subscribes to events** (candle completion, pivot points) as training triggers
4. **Supports multiple training methods** (backprop for the Transformer, other methods for other models)

## Architecture

### Components

1. **InferenceTrainingCoordinator** (`inference_training_system.py`)
   - Manages inference frame references
   - Matches inference frames to actual results
   - Distributes training events to subscribers

2. **TrainingEventSubscriber** (interface; sketched below)
   - Implemented by training adapters
   - Receives callbacks for candle completion and pivot events

3. **DataProvider Extensions**
   - `subscribe_candle_completion()` - Subscribe to candle completion events
   - `subscribe_pivot_events()` - Subscribe to pivot events (L2L, L2H, etc.)

4. **DuckDB Storage**
   - Stores OHLCV data, MA indicators, pivots
   - Efficient queries by timestamp range
   - No data copying - just references

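Concretely, the subscriber interface amounts to two callbacks. The signatures are taken from Step 2 below; the `Protocol` formulation is an assumption about how `inference_training_system.py` defines it:

```python
from typing import List, Optional, Protocol

class TrainingEventSubscriber(Protocol):
    """Interface implemented by training adapters (see Step 2)."""

    def on_candle_completion(
        self,
        event: "CandleCompletionEvent",
        inference_ref: Optional["InferenceFrameReference"],
    ) -> None:
        """Called when a subscribed candle closes."""
        ...

    def on_pivot_event(
        self,
        event: "PivotEvent",
        inference_refs: List["InferenceFrameReference"],
    ) -> None:
        """Called when a subscribed pivot type is detected."""
        ...
```
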
## Data Flow

### 1. Inference Phase

```
Model Inference
    ↓
Create InferenceFrameReference
    ↓
Store reference (timestamp range, norm_params, prediction metadata)
    ↓
Register with InferenceTrainingCoordinator
```

**No copying** - just store (see the dataclass sketch below):
- `data_range_start` / `data_range_end` (timestamp range for 600 candles)
- `norm_params` (small dict)
- `predicted_action`, `predicted_candle`, `confidence`
- `target_timestamp` (for candles - when the result will be available)

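Put together, the stored reference is a small record. A sketch consistent with the fields used in Step 3 below (the exact class in `inference_training_system.py` may differ):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional

@dataclass
class InferenceFrameReference:
    """Lightweight reference to an inference frame; the candles stay in DuckDB."""
    inference_id: str
    symbol: str
    timeframe: str
    prediction_timestamp: datetime
    target_timestamp: Optional[datetime]  # when the actual candle will be known
    data_range_start: datetime            # start of the 600-candle window
    data_range_end: datetime
    norm_params: Dict[str, Any] = field(default_factory=dict)
    predicted_action: Optional[str] = None
    predicted_candle: Optional[Dict[str, float]] = None
    confidence: float = 0.0
```
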
### 2. Training Trigger Phase

#### Time-Based (Candle Completion)

```
Candle Closes
    ↓
DataProvider emits CandleCompletionEvent
    ↓
InferenceTrainingCoordinator matches inference frames
    ↓
Calls subscriber.on_candle_completion(event, inference_ref)
    ↓
Training adapter retrieves data from DuckDB using the reference
    ↓
Train model with the actual candle result
```

#### Event-Based (Pivot Points)

```
Pivot Detected (L2L, L2H, etc.)
    ↓
DataProvider emits PivotEvent
    ↓
InferenceTrainingCoordinator finds matching inference frames
    ↓
Calls subscriber.on_pivot_event(event, inference_refs)
    ↓
Training adapter retrieves data from DuckDB
    ↓
Train model with the pivot result
```

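Both triggers reduce to a lookup over registered references. A minimal sketch of how the coordinator could match them (the pending-frame dictionary, the exact-timestamp match, and the `_subscribers` list are assumptions; the real matching logic may differ):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class InferenceTrainingCoordinator:
    def __init__(self, data_provider, duckdb_storage):
        self.data_provider = data_provider
        self.duckdb_storage = duckdb_storage
        # Pending frame references keyed by (symbol, timeframe) - references only
        self._pending: Dict[Tuple[str, str], List[InferenceFrameReference]] = defaultdict(list)
        self._subscribers: List[TrainingEventSubscriber] = []

    def register_inference_frame(self, ref: InferenceFrameReference) -> None:
        self._pending[(ref.symbol, ref.timeframe)].append(ref)

    def _on_candle_completion(self, event: CandleCompletionEvent) -> None:
        # Time-based: the frame whose target_timestamp is the candle that just closed
        frames = self._pending[(event.symbol, event.timeframe)]
        match = next((f for f in frames if f.target_timestamp == event.timestamp), None)
        for subscriber in self._subscribers:
            subscriber.on_candle_completion(event, match)
        if match:
            frames.remove(match)  # each frame is trained on at most once

    def _on_pivot_event(self, event: PivotEvent) -> None:
        # Event-based: every pending frame predicted before the pivot is a candidate
        frames = self._pending[(event.symbol, event.timeframe)]
        matches = [f for f in frames if f.prediction_timestamp <= event.timestamp]
        for subscriber in self._subscribers:
            subscriber.on_pivot_event(event, matches)
```
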
## Implementation Steps

### Step 1: Extend DataProvider

Add subscription methods to `core/data_provider.py`. The sketch below assumes the subscriber registries (`_candle_completion_subscribers`, `_pivot_subscribers`) are `defaultdict(list)` attributes initialized in `__init__`:

```python
from typing import Callable, List

def subscribe_candle_completion(self, callback: Callable, symbol: str, timeframe: str) -> None:
    """Subscribe to candle completion events."""
    # Register the callback; DataProvider invokes it when the candle closes
    self._candle_completion_subscribers[(symbol, timeframe)].append(callback)

def subscribe_pivot_events(self, callback: Callable, symbol: str, timeframe: str,
                           pivot_types: List[str]) -> None:
    """Subscribe to pivot events (L2L, L2H, etc.)."""
    # Register the callback; DataProvider invokes it when a matching pivot is detected
    for pivot_type in pivot_types:
        self._pivot_subscribers[(symbol, timeframe, pivot_type)].append(callback)
```

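For reference, the event objects these callbacks receive can be small dataclasses. A sketch consistent with `event.ohlcv` in Step 2 and the `pivot_points` columns in the schema below; everything beyond those is an assumption:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

@dataclass
class CandleCompletionEvent:
    symbol: str
    timeframe: str
    timestamp: datetime      # open time of the candle that just closed
    ohlcv: Dict[str, float]  # {'open': ..., 'high': ..., 'low': ..., 'close': ..., 'volume': ...}

@dataclass
class PivotEvent:
    symbol: str
    timeframe: str
    timestamp: datetime
    price: float
    pivot_type: str          # 'L2L', 'L2H', etc.
    level: int
    strength: float
```
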
### Step 2: Update RealTrainingAdapter

Make `RealTrainingAdapter` implement `TrainingEventSubscriber`:

```python
class RealTrainingAdapter(TrainingEventSubscriber):
    def __init__(self, ...):
        # Initialize InferenceTrainingCoordinator
        self.training_coordinator = InferenceTrainingCoordinator(
            data_provider=self.data_provider,
            duckdb_storage=self.data_provider.duckdb_storage
        )

        # Subscribe to events
        self.training_coordinator.subscribe_to_candle_completion(
            self, symbol='ETH/USDT', timeframe='1m'
        )
        self.training_coordinator.subscribe_to_pivot_events(
            self, symbol='ETH/USDT', timeframe='1m',
            pivot_types=['L2L', 'L2H', 'L3L', 'L3H']
        )

    def on_candle_completion(self, event: CandleCompletionEvent,
                             inference_ref: Optional[InferenceFrameReference]):
        """Called when a candle completes"""
        if not inference_ref:
            return  # No matching inference frame

        # Retrieve inference data from DuckDB
        model_inputs = self.training_coordinator.get_inference_data(inference_ref)
        if not model_inputs:
            return

        # Create training batch with the actual candle
        batch = self._create_training_batch(model_inputs, event.ohlcv, inference_ref)

        # Train model (backprop for Transformer, other methods for other models)
        self._train_on_batch(batch, inference_ref)

    def on_pivot_event(self, event: PivotEvent,
                       inference_refs: List[InferenceFrameReference]):
        """Called when a pivot is detected"""
        for inference_ref in inference_refs:
            # Retrieve inference data
            model_inputs = self.training_coordinator.get_inference_data(inference_ref)
            if not model_inputs:
                continue

            # Create training batch with the pivot result
            batch = self._create_pivot_training_batch(model_inputs, event, inference_ref)

            # Train model
            self._train_on_batch(batch, inference_ref)
```

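`_create_training_batch` and `_train_on_batch` are left to the adapter. For the Transformer path, `_train_on_batch` can be a single supervised backprop step; a minimal PyTorch sketch, where `self.model`, `self.optimizer`, and the `'inputs'`/`'target_candle'` batch keys are assumptions:

```python
import torch
import torch.nn.functional as F

def _train_on_batch(self, batch: dict, inference_ref: InferenceFrameReference) -> None:
    """One backprop step against the realized outcome (Transformer path)."""
    self.model.train()
    self.optimizer.zero_grad()

    # Forward pass on the stored inputs; compare against the actual candle
    predicted = self.model(batch['inputs'])
    loss = F.mse_loss(predicted, batch['target_candle'])

    loss.backward()
    torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
    self.optimizer.step()
```
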
### Step 3: Update Inference Loop

In `_realtime_inference_loop()`, register inference frames:

```python
# After making a prediction
prediction = self._make_realtime_prediction(...)

# Create inference frame reference
inference_ref = InferenceFrameReference(
    inference_id=str(uuid.uuid4()),
    symbol=symbol,
    timeframe=timeframe,
    prediction_timestamp=datetime.now(timezone.utc),
    target_timestamp=next_candle_time,  # for candles
    data_range_start=start_time,        # 600 candles before
    data_range_end=current_time,
    norm_params=norm_params,
    predicted_action=prediction['action'],
    predicted_candle=prediction['predicted_candle'],
    confidence=prediction['confidence']
)

# Register with the coordinator (no copying!)
self.training_coordinator.register_inference_frame(inference_ref)
```

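Here `current_time`, `next_candle_time`, and `start_time` must line up with the 600-candle window. One way to derive them for the 1m timeframe (a sketch, not the production logic):

```python
from datetime import datetime, timedelta, timezone

# Align to the current 1m candle boundary
current_time = datetime.now(timezone.utc).replace(second=0, microsecond=0)
next_candle_time = current_time + timedelta(minutes=1)  # when the result candle closes
start_time = current_time - timedelta(minutes=600)      # 600 candles back
```
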
## Benefits

1. **Memory Efficient**: No copying 600 candles every second
2. **Flexible**: Supports time-based (candle) and event-based (pivot) training
3. **Robust**: Event-driven architecture with proper error handling
4. **Simple**: Clear separation of concerns
5. **Scalable**: DuckDB handles efficient queries
6. **Extensible**: Easy to add new training methods or event types

## DuckDB Schema Extensions

Ensure DuckDB stores:
- OHLCV data (already exists)
- MA indicators (add to `ohlcv_data` or a separate table)
- Pivot points (add a `pivot_points` table)

```sql
-- Add technical indicators to ohlcv_data
ALTER TABLE ohlcv_data ADD COLUMN sma_10 DOUBLE;
ALTER TABLE ohlcv_data ADD COLUMN sma_20 DOUBLE;
ALTER TABLE ohlcv_data ADD COLUMN ema_12 DOUBLE;
-- ... etc

-- Create pivot points table
CREATE TABLE IF NOT EXISTS pivot_points (
    id INTEGER PRIMARY KEY,
    symbol VARCHAR NOT NULL,
    timeframe VARCHAR NOT NULL,
    timestamp BIGINT NOT NULL,
    price DOUBLE NOT NULL,
    pivot_type VARCHAR NOT NULL,  -- 'L2L', 'L2H', etc.
    level INTEGER NOT NULL,
    strength DOUBLE NOT NULL,
    UNIQUE(symbol, timeframe, timestamp, pivot_type)
);
```

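With the schema in place, `get_inference_data` becomes a single range query. A sketch assuming `duckdb_storage` exposes its DuckDB connection as `conn` and stores timestamps as millisecond epochs (both assumptions):

```python
from typing import Optional

def get_inference_data(self, ref: InferenceFrameReference) -> Optional[dict]:
    """Re-read the 600-candle window by timestamp range - no copies were ever kept."""
    df = self.duckdb_storage.conn.execute(
        """
        SELECT timestamp, open, high, low, close, volume, sma_10, sma_20, ema_12
        FROM ohlcv_data
        WHERE symbol = ? AND timeframe = ?
          AND timestamp BETWEEN ? AND ?
        ORDER BY timestamp
        """,
        [ref.symbol, ref.timeframe,
         int(ref.data_range_start.timestamp() * 1000),
         int(ref.data_range_end.timestamp() * 1000)],
    ).fetchdf()
    # Return column arrays so callers can truth-test the result
    return {col: df[col].to_numpy() for col in df.columns} if not df.empty else None
```
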
## Next Steps

1. Implement the DataProvider subscription methods
2. Update `RealTrainingAdapter` to use `InferenceTrainingCoordinator`
3. Extend the DuckDB schema for indicators and pivots
4. Test with live inference
5. Add support for other model types (not just the Transformer)