226 lines
6.7 KiB
Markdown
226 lines
6.7 KiB
Markdown
# Event-Driven Inference Training System - Architecture & Refactoring
|
|
|
|
## Overview
|
|
|
|
Implemented a complete event-driven, reference-based inference training system that eliminates code duplication and provides a flexible, robust training pipeline.
|
|
|
|
## Architecture Decisions
|
|
|
|
### Component Placement
|
|
|
|
#### 1. **InferenceTrainingCoordinator → TradingOrchestrator** ✅
|
|
|
|
**Rationale:**
|
|
- Orchestrator already manages models, training, and predictions
|
|
- Centralizes coordination logic
|
|
- Reduces duplication (orchestrator has model access)
|
|
- Natural fit for inference-training coordination
|
|
|
|
**Location:** `core/orchestrator.py` (line ~514)
|
|
```python
|
|
self.inference_training_coordinator = InferenceTrainingCoordinator(
|
|
data_provider=self.data_provider,
|
|
duckdb_storage=self.data_provider.duckdb_storage
|
|
)
|
|
```
|
|
|
|
**Benefits:**
|
|
- Single source of truth for inference frame management
|
|
- Reuses orchestrator's model access
|
|
- Eliminates duplicate prediction storage logic
|
|
|
|
#### 2. **Event Subscription Methods → DataProvider** ✅
|
|
|
|
**Rationale:**
|
|
- Data layer responsibility - emits events when data changes
|
|
- Natural place for candle completion and pivot detection
|
|
|
|
**Location:** `core/data_provider.py`
|
|
- `subscribe_candle_completion()` - Subscribe to candle events
|
|
- `subscribe_pivot_events()` - Subscribe to pivot events
|
|
- `_emit_candle_completion()` - Emit when candle closes
|
|
- `_emit_pivot_event()` - Emit when pivot detected
|
|
- `_check_and_emit_pivot_events()` - Check for new pivots
|
|
|
|
**Benefits:**
|
|
- Clean separation of concerns
|
|
- Event-driven architecture
|
|
- Easy to extend with new event types
|
|
|
|
#### 3. **TrainingEventSubscriber Interface → RealTrainingAdapter** ✅
|
|
|
|
**Rationale:**
|
|
- Training layer implements subscriber interface
|
|
- Receives callbacks for training events
|
|
|
|
**Location:** `ANNOTATE/core/real_training_adapter.py`
|
|
- Implements `TrainingEventSubscriber`
|
|
- `on_candle_completion()` - Train on candle completion
|
|
- `on_pivot_event()` - Train on pivot detection
|
|
- Uses orchestrator's coordinator (no duplication)
|
|
|
|
**Benefits:**
|
|
- Clear interface for training adapters
|
|
- Supports multiple training methods
|
|
- Easy to add new adapters
|
|
|
|
## Code Duplication Reduction
|
|
|
|
### Before (Duplicated Logic)
|
|
|
|
1. **Data Retrieval:**
|
|
- `_get_realtime_market_data()` in RealTrainingAdapter
|
|
- Similar logic in orchestrator
|
|
- Similar logic in data_provider
|
|
|
|
2. **Prediction Storage:**
|
|
- `store_transformer_prediction()` in orchestrator
|
|
- `inference_input_cache` in RealTrainingAdapter session (copying 600 candles!)
|
|
- `prediction_cache` in app.py
|
|
|
|
3. **Training Coordination:**
|
|
- Training logic scattered across multiple files
|
|
- No centralized coordination
|
|
|
|
### After (Centralized)
|
|
|
|
1. **Data Retrieval:**
|
|
- Single source: `data_provider.get_historical_data()` queries DuckDB
|
|
- Coordinator retrieves data on-demand using references
|
|
- **No copying** - just timestamp ranges
|
|
|
|
2. **Prediction Storage:**
|
|
- Orchestrator's `inference_training_coordinator` manages references
|
|
- References stored (not copied) - just timestamp ranges + norm_params
|
|
- Data retrieved from DuckDB when needed
|
|
|
|
3. **Training Coordination:**
|
|
- Orchestrator's coordinator handles event distribution
|
|
- RealTrainingAdapter implements subscriber interface
|
|
- Single training lock in RealTrainingAdapter
|
|
|
|
## Implementation Details
|
|
|
|
### Reference-Based Storage
|
|
|
|
**InferenceFrameReference** stores:
|
|
- `data_range_start` / `data_range_end` (timestamp range for 600 candles)
|
|
- `norm_params` (small dict - can be stored)
|
|
- `predicted_action`, `predicted_candle`, `confidence`
|
|
- `target_timestamp` (for candles - when result will be available)
|
|
|
|
**No copying** - when training is triggered:
|
|
1. Coordinator retrieves data from DuckDB using reference
|
|
2. Normalizes using stored params
|
|
3. Creates training batch
|
|
4. Trains model
|
|
|
|
### Event-Driven Training
|
|
|
|
#### Time-Based (Candle Completion)
|
|
|
|
```
|
|
Candle Closes
|
|
↓
|
|
DataProvider._update_candle() detects new candle
|
|
↓
|
|
_emit_candle_completion() called
|
|
↓
|
|
InferenceTrainingCoordinator._handle_candle_completion()
|
|
↓
|
|
Matches inference frames by target_timestamp
|
|
↓
|
|
Calls subscriber.on_candle_completion(event, inference_ref)
|
|
↓
|
|
RealTrainingAdapter retrieves data from DuckDB
|
|
↓
|
|
Trains model with actual candle result
|
|
```
|
|
|
|
#### Event-Based (Pivot Points)
|
|
|
|
```
|
|
Pivot Detected (L2L, L2H, etc.)
|
|
↓
|
|
DataProvider.get_williams_pivot_levels() calculates pivots
|
|
↓
|
|
_check_and_emit_pivot_events() finds new pivots
|
|
↓
|
|
_emit_pivot_event() called
|
|
↓
|
|
InferenceTrainingCoordinator._handle_pivot_event()
|
|
↓
|
|
Finds matching inference frames (within 5-minute window)
|
|
↓
|
|
Calls subscriber.on_pivot_event(event, inference_refs)
|
|
↓
|
|
RealTrainingAdapter retrieves data from DuckDB
|
|
↓
|
|
Trains model with pivot result
|
|
```
|
|
|
|
## Key Benefits
|
|
|
|
1. **Memory Efficient**: No copying 600 candles every second
|
|
2. **Event-Driven**: Clean separation of concerns
|
|
3. **Flexible**: Supports time-based (candles) and event-based (pivots)
|
|
4. **Centralized**: Coordinator in orchestrator reduces duplication
|
|
5. **Extensible**: Easy to add new training methods or event types
|
|
6. **Robust**: Proper error handling and thread safety
|
|
|
|
## Files Modified
|
|
|
|
1. **`ANNOTATE/core/inference_training_system.py`** (NEW)
|
|
- Core system with coordinator and events
|
|
|
|
2. **`core/data_provider.py`**
|
|
- Added subscription methods
|
|
- Added event emission
|
|
- Added pivot event checking
|
|
|
|
3. **`core/orchestrator.py`**
|
|
- Integrated InferenceTrainingCoordinator
|
|
|
|
4. **`ANNOTATE/core/real_training_adapter.py`**
|
|
- Implements TrainingEventSubscriber
|
|
- Uses orchestrator's coordinator
|
|
- Removed old caching code (reference-based now)
|
|
|
|
## Next Steps
|
|
|
|
1. **Test the System**
|
|
- Test candle completion events
|
|
- Test pivot events
|
|
- Test data retrieval from DuckDB
|
|
- Test training on inference frames
|
|
|
|
2. **Optimize Pivot Detection**
|
|
- Add periodic pivot checking (background thread)
|
|
- Cache pivot calculations
|
|
- Emit events more efficiently
|
|
|
|
3. **Extend DuckDB Schema**
|
|
- Add MA indicators to ohlcv_data
|
|
- Create pivot_points table
|
|
- Store technical indicators
|
|
|
|
4. **Remove Old Code**
|
|
- Remove `inference_input_cache` from session
|
|
- Remove `_make_realtime_prediction_with_cache()` (deprecated)
|
|
- Clean up duplicate code
|
|
|
|
## Summary
|
|
|
|
The system is now:
|
|
- ✅ **Memory efficient** - No copying 600 candles
|
|
- ✅ **Event-driven** - Clean architecture
|
|
- ✅ **Centralized** - Coordinator in orchestrator
|
|
- ✅ **Flexible** - Supports multiple training methods
|
|
- ✅ **Robust** - Proper error handling
|
|
|
|
The refactoring successfully reduces code duplication by:
|
|
1. Centralizing coordination in orchestrator
|
|
2. Using reference-based storage instead of copying
|
|
3. Implementing event-driven architecture
|
|
4. Reusing existing data provider and orchestrator infrastructure
|