# Event-Driven Inference Training System - Architecture & Refactoring

## Overview

Implemented a complete event-driven, reference-based inference training system that eliminates code duplication and provides a flexible, robust training pipeline.

## Architecture Decisions

### Component Placement

#### 1. **InferenceTrainingCoordinator → TradingOrchestrator** ✅

**Rationale:**
- Orchestrator already manages models, training, and predictions
- Centralizes coordination logic
- Reduces duplication (orchestrator has model access)
- Natural fit for inference-training coordination

**Location:** `core/orchestrator.py` (line ~514)

```python
self.inference_training_coordinator = InferenceTrainingCoordinator(
    data_provider=self.data_provider,
    duckdb_storage=self.data_provider.duckdb_storage
)
```

**Benefits:**
- Single source of truth for inference frame management
- Reuses orchestrator's model access
- Eliminates duplicate prediction storage logic

#### 2. **Event Subscription Methods → DataProvider** ✅

**Rationale:**
- Data layer responsibility - emits events when data changes
- Natural place for candle completion and pivot detection

**Location:** `core/data_provider.py`
- `subscribe_candle_completion()` - Subscribe to candle events
- `subscribe_pivot_events()` - Subscribe to pivot events
- `_emit_candle_completion()` - Emit when a candle closes
- `_emit_pivot_event()` - Emit when a pivot is detected
- `_check_and_emit_pivot_events()` - Check for new pivots

**Benefits:**
- Clean separation of concerns
- Event-driven architecture
- Easy to extend with new event types
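The subscribe/emit pattern behind these methods can be sketched as a small observer registry. This is a minimal illustration, not the real `DataProvider` implementation: the event payload fields (`symbol`, `timeframe`, `timestamp`, `close`) are assumptions, and only the candle-completion channel is shown.

```python
import threading
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List

# Hypothetical event payload - field names are illustrative, not the real API
@dataclass
class CandleCompletionEvent:
    symbol: str
    timeframe: str
    timestamp: datetime
    close: float

class DataProviderEvents:
    """Sketch of the pub/sub surface described above."""

    def __init__(self):
        self._candle_subscribers: List[Callable[[CandleCompletionEvent], None]] = []
        self._lock = threading.Lock()

    def subscribe_candle_completion(self, callback: Callable[[CandleCompletionEvent], None]) -> None:
        """Register a callback invoked whenever a candle closes."""
        with self._lock:
            self._candle_subscribers.append(callback)

    def _emit_candle_completion(self, event: CandleCompletionEvent) -> None:
        """Notify all subscribers; a failing subscriber must not break the data layer."""
        with self._lock:
            subscribers = list(self._candle_subscribers)  # snapshot under lock
        for callback in subscribers:
            try:
                callback(event)
            except Exception as exc:
                print(f"candle subscriber failed: {exc}")
```

Taking a snapshot of the subscriber list under the lock, then invoking callbacks outside it, keeps subscription thread-safe without holding the lock during potentially slow training callbacks.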
#### 3. **TrainingEventSubscriber Interface → RealTrainingAdapter** ✅

**Rationale:**
- Training layer implements the subscriber interface
- Receives callbacks for training events

**Location:** `ANNOTATE/core/real_training_adapter.py`
- Implements `TrainingEventSubscriber`
- `on_candle_completion()` - Train on candle completion
- `on_pivot_event()` - Train on pivot detection
- Uses orchestrator's coordinator (no duplication)

**Benefits:**
- Clear interface for training adapters
- Supports multiple training methods
- Easy to add new adapters

## Code Duplication Reduction

### Before (Duplicated Logic)

1. **Data Retrieval:**
   - `_get_realtime_market_data()` in RealTrainingAdapter
   - Similar logic in orchestrator
   - Similar logic in data_provider
2. **Prediction Storage:**
   - `store_transformer_prediction()` in orchestrator
   - `inference_input_cache` in the RealTrainingAdapter session (copying 600 candles!)
   - `prediction_cache` in app.py
3. **Training Coordination:**
   - Training logic scattered across multiple files
   - No centralized coordination

### After (Centralized)

1. **Data Retrieval:**
   - Single source: `data_provider.get_historical_data()` queries DuckDB
   - Coordinator retrieves data on demand using references
   - **No copying** - just timestamp ranges
2. **Prediction Storage:**
   - Orchestrator's `inference_training_coordinator` manages references
   - References stored (not copied) - just timestamp ranges + norm_params
   - Data retrieved from DuckDB when needed
3. **Training Coordination:**
   - Orchestrator's coordinator handles event distribution
   - RealTrainingAdapter implements the subscriber interface
   - Single training lock in RealTrainingAdapter

## Implementation Details

### Reference-Based Storage

**InferenceFrameReference** stores:
- `data_range_start` / `data_range_end` (timestamp range for 600 candles)
- `norm_params` (small dict - can be stored)
- `predicted_action`, `predicted_candle`, `confidence`
- `target_timestamp` (for candles - when the result will be available)

**No copying** - when training is triggered:
1. Coordinator retrieves data from DuckDB using the reference
2. Normalizes using stored params
3. Creates training batch
4. Trains model

### Event-Driven Training

#### Time-Based (Candle Completion)

```
Candle Closes
  ↓
DataProvider._update_candle() detects new candle
  ↓
_emit_candle_completion() called
  ↓
InferenceTrainingCoordinator._handle_candle_completion()
  ↓
Matches inference frames by target_timestamp
  ↓
Calls subscriber.on_candle_completion(event, inference_ref)
  ↓
RealTrainingAdapter retrieves data from DuckDB
  ↓
Trains model with actual candle result
```

#### Event-Based (Pivot Points)

```
Pivot Detected (L2L, L2H, etc.)
  ↓
DataProvider.get_williams_pivot_levels() calculates pivots
  ↓
_check_and_emit_pivot_events() finds new pivots
  ↓
_emit_pivot_event() called
  ↓
InferenceTrainingCoordinator._handle_pivot_event()
  ↓
Finds matching inference frames (within a 5-minute window)
  ↓
Calls subscriber.on_pivot_event(event, inference_refs)
  ↓
RealTrainingAdapter retrieves data from DuckDB
  ↓
Trains model with pivot result
```

## Key Benefits

1. **Memory Efficient**: No copying of 600 candles every second
2. **Event-Driven**: Clean separation of concerns
3. **Flexible**: Supports time-based (candles) and event-based (pivots) training
4. **Centralized**: Coordinator in orchestrator reduces duplication
5. **Extensible**: Easy to add new training methods or event types
6. **Robust**: Proper error handling and thread safety
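The candle-completion flow hinges on storing lightweight references and later matching them by `target_timestamp`. The sketch below illustrates that idea using the field names listed under Reference-Based Storage; the matching tolerance, the `FrameMatcher` class, and the exact field types are assumptions for illustration, not the real coordinator.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

@dataclass
class InferenceFrameReference:
    """Reference-based frame: timestamp range + params, never candle copies."""
    data_range_start: datetime
    data_range_end: datetime
    norm_params: Dict[str, float]
    predicted_action: str
    confidence: float
    target_timestamp: Optional[datetime] = None  # when the actual candle lands

class FrameMatcher:
    """Sketch of how a coordinator might match frames on candle completion."""

    def __init__(self):
        self._frames: List[InferenceFrameReference] = []

    def add_frame(self, ref: InferenceFrameReference) -> None:
        self._frames.append(ref)

    def match_candle_completion(self, candle_ts: datetime,
                                tolerance: timedelta = timedelta(seconds=1)
                                ) -> List[InferenceFrameReference]:
        # Frames whose target_timestamp matches the completed candle are ready
        # to train; each frame trains at most once, so matched frames are removed.
        matched = [f for f in self._frames
                   if f.target_timestamp is not None
                   and abs(f.target_timestamp - candle_ts) <= tolerance]
        for f in matched:
            self._frames.remove(f)
        return matched
```

A matched reference carries only the timestamp range and `norm_params`; the 600 candles themselves would be fetched from DuckDB at training time, which is what makes this memory-efficient.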
## Files Modified

1. **`ANNOTATE/core/inference_training_system.py`** (NEW)
   - Core system with coordinator and events
2. **`core/data_provider.py`**
   - Added subscription methods
   - Added event emission
   - Added pivot event checking
3. **`core/orchestrator.py`**
   - Integrated InferenceTrainingCoordinator
4. **`ANNOTATE/core/real_training_adapter.py`**
   - Implements TrainingEventSubscriber
   - Uses orchestrator's coordinator
   - Removed old caching code (reference-based now)

## Next Steps

1. **Test the System**
   - Test candle completion events
   - Test pivot events
   - Test data retrieval from DuckDB
   - Test training on inference frames
2. **Optimize Pivot Detection**
   - Add periodic pivot checking (background thread)
   - Cache pivot calculations
   - Emit events more efficiently
3. **Extend DuckDB Schema**
   - Add MA indicators to ohlcv_data
   - Create pivot_points table
   - Store technical indicators
4. **Remove Old Code**
   - Remove `inference_input_cache` from the session
   - Remove `_make_realtime_prediction_with_cache()` (deprecated)
   - Clean up duplicate code

## Summary

The system is now:
- ✅ **Memory efficient** - no copying of 600 candles
- ✅ **Event-driven** - clean architecture
- ✅ **Centralized** - coordinator in orchestrator
- ✅ **Flexible** - supports multiple training methods
- ✅ **Robust** - proper error handling

The refactoring successfully reduces code duplication by:
1. Centralizing coordination in the orchestrator
2. Using reference-based storage instead of copying
3. Implementing an event-driven architecture
4. Reusing existing data provider and orchestrator infrastructure
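The `TrainingEventSubscriber` interface that `RealTrainingAdapter` implements could look like the sketch below. The callback names come from the document; the signatures, the `Any` payload types, and the toy `LoggingTrainingAdapter` are assumptions for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Any, List

class TrainingEventSubscriber(ABC):
    """Interface a training adapter implements to receive training events."""

    @abstractmethod
    def on_candle_completion(self, event: Any, inference_ref: Any) -> None:
        """Called when a candle closes and a matching inference frame exists."""

    @abstractmethod
    def on_pivot_event(self, event: Any, inference_refs: List[Any]) -> None:
        """Called when a pivot is detected near stored inference frames."""

class LoggingTrainingAdapter(TrainingEventSubscriber):
    """Toy adapter: records events instead of training, for illustration."""

    def __init__(self):
        self.events = []

    def on_candle_completion(self, event: Any, inference_ref: Any) -> None:
        self.events.append(("candle", event))

    def on_pivot_event(self, event: Any, inference_refs: List[Any]) -> None:
        self.events.append(("pivot", event))
```

Defining the contract as an ABC is what makes the design extensible: a new training method only needs to implement these two callbacks and register with the coordinator, with no changes to the data provider or orchestrator.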