# Event-Driven Inference Training System

## Overview

This system provides a flexible, efficient, and robust training pipeline that:

1. **Stores inference frames by reference** (not copying 600 candles every second)
2. **Uses DuckDB** for efficient data storage and retrieval
3. **Subscribes to events** (candle completion, pivot points) for training triggers
4. **Supports multiple training methods** (backprop for the Transformer, other methods for other models)

## Architecture

### Components

1. **InferenceTrainingCoordinator** (`inference_training_system.py`)
   - Manages inference frame references
   - Matches inference frames to actual results
   - Distributes training events to subscribers
2. **TrainingEventSubscriber** (interface)
   - Implemented by training adapters
   - Receives callbacks for candle completion and pivot events
3. **DataProvider Extensions**
   - `subscribe_candle_completion()` - subscribe to candle completion events
   - `subscribe_pivot_events()` - subscribe to pivot events (L2L, L2H, etc.)
4. **DuckDB Storage**
   - Stores OHLCV data, MA indicators, and pivot points
   - Efficient queries by timestamp range
   - No data copying - just references

## Data Flow

### 1. Inference Phase

```
Model Inference
    ↓
Create InferenceFrameReference
    ↓
Store reference (timestamp range, norm_params, prediction metadata)
    ↓
Register with InferenceTrainingCoordinator
```

**No copying** - only the following are stored:

- `data_range_start` / `data_range_end` (timestamp range covering the 600 candles)
- `norm_params` (small dict)
- `predicted_action`, `predicted_candle`, `confidence`
- `target_timestamp` (for candles - when the actual result will be available)
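A minimal sketch of what such a reference could look like as a dataclass. The field names follow the list above; the defaults (a generated `inference_id`, a UTC `prediction_timestamp`) are assumptions, not the project's actual definition:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class InferenceFrameReference:
    """Lightweight reference to an inference frame: timestamps and metadata
    only - never the ~600 candles themselves."""
    symbol: str
    timeframe: str
    data_range_start: datetime            # first candle of the input window
    data_range_end: datetime              # last candle at inference time
    norm_params: dict                     # small dict of normalization params
    predicted_action: str
    predicted_candle: Optional[dict]
    confidence: float
    target_timestamp: Optional[datetime] = None   # when the result is known
    prediction_timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    inference_id: str = field(default_factory=lambda: str(uuid.uuid4()))
```

Because the frame holds only timestamps and a small dict, registering one per second costs a few hundred bytes rather than a full 600-candle copy.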
### 2. Training Trigger Phase

#### Time-Based (Candle Completion)

```
Candle Closes
    ↓
DataProvider emits CandleCompletionEvent
    ↓
InferenceTrainingCoordinator matches inference frames
    ↓
Calls subscriber.on_candle_completion(event, inference_ref)
    ↓
Training adapter retrieves data from DuckDB using the reference
    ↓
Train model with actual candle result
```

#### Event-Based (Pivot Points)

```
Pivot Detected (L2L, L2H, etc.)
    ↓
DataProvider emits PivotEvent
    ↓
InferenceTrainingCoordinator finds matching inference frames
    ↓
Calls subscriber.on_pivot_event(event, inference_refs)
    ↓
Training adapter retrieves data from DuckDB
    ↓
Train model with pivot result
```

## Implementation Steps

### Step 1: Extend DataProvider

Add subscription methods to `core/data_provider.py`:

```python
def subscribe_candle_completion(self, callback: Callable, symbol: str, timeframe: str):
    """Subscribe to candle completion events."""
    # Register callback
    # Emit event when candle closes

def subscribe_pivot_events(self, callback: Callable, symbol: str, timeframe: str,
                           pivot_types: List[str]):
    """Subscribe to pivot events (L2L, L2H, etc.)."""
    # Register callback
    # Emit event when pivot detected
```

### Step 2: Update RealTrainingAdapter

Make `RealTrainingAdapter` implement `TrainingEventSubscriber`:

```python
class RealTrainingAdapter(TrainingEventSubscriber):
    def __init__(self, ...):
        # Initialize InferenceTrainingCoordinator
        self.training_coordinator = InferenceTrainingCoordinator(
            data_provider=self.data_provider,
            duckdb_storage=self.data_provider.duckdb_storage
        )

        # Subscribe to events
        self.training_coordinator.subscribe_to_candle_completion(
            self, symbol='ETH/USDT', timeframe='1m'
        )
        self.training_coordinator.subscribe_to_pivot_events(
            self, symbol='ETH/USDT', timeframe='1m',
            pivot_types=['L2L', 'L2H', 'L3L', 'L3H']
        )

    def on_candle_completion(self, event: CandleCompletionEvent,
                             inference_ref: Optional[InferenceFrameReference]):
        """Called when a candle completes."""
        if not inference_ref:
            return  # No matching inference frame

        # Retrieve inference data from DuckDB
        model_inputs = self.training_coordinator.get_inference_data(inference_ref)
        if not model_inputs:
            return

        # Create training batch with the actual candle
        batch = self._create_training_batch(model_inputs, event.ohlcv, inference_ref)

        # Train the model (backprop for the Transformer, other methods for other models)
        self._train_on_batch(batch, inference_ref)

    def on_pivot_event(self, event: PivotEvent,
                       inference_refs: List[InferenceFrameReference]):
        """Called when a pivot is detected."""
        for inference_ref in inference_refs:
            # Retrieve inference data
            model_inputs = self.training_coordinator.get_inference_data(inference_ref)
            if not model_inputs:
                continue

            # Create training batch with the pivot result
            batch = self._create_pivot_training_batch(model_inputs, event, inference_ref)

            # Train the model
            self._train_on_batch(batch, inference_ref)
```

### Step 3: Update the Inference Loop

In `_realtime_inference_loop()`, register an inference frame after each prediction:

```python
# After making a prediction
prediction = self._make_realtime_prediction(...)

# Create the inference frame reference
inference_ref = InferenceFrameReference(
    inference_id=str(uuid.uuid4()),
    symbol=symbol,
    timeframe=timeframe,
    prediction_timestamp=datetime.now(timezone.utc),
    target_timestamp=next_candle_time,   # for candle-based training
    data_range_start=start_time,         # 600 candles before current_time
    data_range_end=current_time,
    norm_params=norm_params,
    predicted_action=prediction['action'],
    predicted_candle=prediction['predicted_candle'],
    confidence=prediction['confidence']
)

# Register with the coordinator (no copying!)
self.training_coordinator.register_inference_frame(inference_ref)
```

## Benefits

1. **Memory Efficient**: No copying of 600 candles every second
2. **Flexible**: Supports time-based (candle) and event-based (pivot) training
3. **Robust**: Event-driven architecture with proper error handling
4. **Simple**: Clear separation of concerns
5. **Scalable**: DuckDB handles efficient queries
6. **Extensible**: Easy to add new training methods or event types
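To make the time-based trigger concrete, here is a minimal, self-contained sketch of the coordinator's matching step. Plain dicts stand in for `InferenceFrameReference` and `CandleCompletionEvent`, and `subscribe`/`on_candle_closed` are assumed method names, not the actual API:

```python
class InferenceTrainingCoordinator:
    """Sketch of the coordinator's matching step: frames are indexed by
    (symbol, timeframe) and matched to a completed candle by target_timestamp."""

    def __init__(self):
        self._frames = {}       # (symbol, timeframe) -> list of pending frame refs
        self._subscribers = []  # objects implementing on_candle_completion(...)

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def register_inference_frame(self, ref):
        # Store the lightweight reference only - no candle data is copied
        key = (ref['symbol'], ref['timeframe'])
        self._frames.setdefault(key, []).append(ref)

    def on_candle_closed(self, symbol, timeframe, candle_ts, ohlcv):
        """Called by the DataProvider when a candle completes."""
        key = (symbol, timeframe)
        pending = self._frames.get(key, [])
        matched = [r for r in pending if r['target_timestamp'] == candle_ts]
        # Drop matched frames so each inference is trained on at most once
        self._frames[key] = [r for r in pending
                             if r['target_timestamp'] != candle_ts]
        event = {'symbol': symbol, 'timeframe': timeframe,
                 'timestamp': candle_ts, 'ohlcv': ohlcv}
        for ref in matched:
            for sub in self._subscribers:
                sub.on_candle_completion(event, ref)
        return matched
```

Matching on `target_timestamp` and removing matched frames guarantees each inference is trained on at most once; a production version would also expire frames whose target candle never arrives.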
## DuckDB Schema Extensions

Ensure DuckDB stores:

- OHLCV data (already exists)
- MA indicators (add to `ohlcv_data` or a separate table)
- Pivot points (add a `pivot_points` table)

```sql
-- Add technical indicators to ohlcv_data
ALTER TABLE ohlcv_data ADD COLUMN sma_10 DOUBLE;
ALTER TABLE ohlcv_data ADD COLUMN sma_20 DOUBLE;
ALTER TABLE ohlcv_data ADD COLUMN ema_12 DOUBLE;
-- ... etc.

-- Create the pivot points table
CREATE TABLE IF NOT EXISTS pivot_points (
    id INTEGER PRIMARY KEY,
    symbol VARCHAR NOT NULL,
    timeframe VARCHAR NOT NULL,
    timestamp BIGINT NOT NULL,
    price DOUBLE NOT NULL,
    pivot_type VARCHAR NOT NULL,  -- 'L2L', 'L2H', etc.
    level INTEGER NOT NULL,
    strength DOUBLE NOT NULL,
    UNIQUE(symbol, timeframe, timestamp, pivot_type)
);
```

## Next Steps

1. Implement the DataProvider subscription methods
2. Update RealTrainingAdapter to use InferenceTrainingCoordinator
3. Extend the DuckDB schema for indicators and pivots
4. Test with live inference
5. Add support for other model types (not just the Transformer)