Files
gogo2/ANNOTATE/ARCHITECTURE_REFACTORING.md
2025-12-09 11:59:15 +02:00

226 lines
6.7 KiB
Markdown

# Event-Driven Inference Training System - Architecture & Refactoring
## Overview
Implemented a complete event-driven, reference-based inference training system that eliminates code duplication and provides a flexible, robust training pipeline.
## Architecture Decisions
### Component Placement
#### 1. **InferenceTrainingCoordinator → TradingOrchestrator** ✅
**Rationale:**
- Orchestrator already manages models, training, and predictions
- Centralizes coordination logic
- Reduces duplication (orchestrator has model access)
- Natural fit for inference-training coordination
**Location:** `core/orchestrator.py` (line ~514)
```python
self.inference_training_coordinator = InferenceTrainingCoordinator(
data_provider=self.data_provider,
duckdb_storage=self.data_provider.duckdb_storage
)
```
**Benefits:**
- Single source of truth for inference frame management
- Reuses orchestrator's model access
- Eliminates duplicate prediction storage logic
#### 2. **Event Subscription Methods → DataProvider** ✅
**Rationale:**
- Data layer responsibility - emits events when data changes
- Natural place for candle completion and pivot detection
**Location:** `core/data_provider.py`
- `subscribe_candle_completion()` - Subscribe to candle events
- `subscribe_pivot_events()` - Subscribe to pivot events
- `_emit_candle_completion()` - Emit when candle closes
- `_emit_pivot_event()` - Emit when pivot detected
- `_check_and_emit_pivot_events()` - Check for new pivots
**Benefits:**
- Clean separation of concerns
- Event-driven architecture
- Easy to extend with new event types
#### 3. **TrainingEventSubscriber Interface → RealTrainingAdapter** ✅
**Rationale:**
- Training layer implements subscriber interface
- Receives callbacks for training events
**Location:** `ANNOTATE/core/real_training_adapter.py`
- Implements `TrainingEventSubscriber`
- `on_candle_completion()` - Train on candle completion
- `on_pivot_event()` - Train on pivot detection
- Uses orchestrator's coordinator (no duplication)
**Benefits:**
- Clear interface for training adapters
- Supports multiple training methods
- Easy to add new adapters
## Code Duplication Reduction
### Before (Duplicated Logic)
1. **Data Retrieval:**
- `_get_realtime_market_data()` in RealTrainingAdapter
- Similar logic in orchestrator
- Similar logic in data_provider
2. **Prediction Storage:**
- `store_transformer_prediction()` in orchestrator
- `inference_input_cache` in RealTrainingAdapter session (copying 600 candles!)
- `prediction_cache` in app.py
3. **Training Coordination:**
- Training logic scattered across multiple files
- No centralized coordination
### After (Centralized)
1. **Data Retrieval:**
- Single source: `data_provider.get_historical_data()` queries DuckDB
- Coordinator retrieves data on-demand using references
- **No copying** - just timestamp ranges
2. **Prediction Storage:**
- Orchestrator's `inference_training_coordinator` manages references
- References stored (not copied) - just timestamp ranges + norm_params
- Data retrieved from DuckDB when needed
3. **Training Coordination:**
- Orchestrator's coordinator handles event distribution
- RealTrainingAdapter implements subscriber interface
- Single training lock in RealTrainingAdapter
## Implementation Details
### Reference-Based Storage
**InferenceFrameReference** stores:
- `data_range_start` / `data_range_end` (timestamp range for 600 candles)
- `norm_params` (small dict - can be stored)
- `predicted_action`, `predicted_candle`, `confidence`
- `target_timestamp` (for candles - when result will be available)
**No copying** - when training is triggered:
1. Coordinator retrieves data from DuckDB using reference
2. Normalizes using stored params
3. Creates training batch
4. Trains model
### Event-Driven Training
#### Time-Based (Candle Completion)
```
Candle Closes
DataProvider._update_candle() detects new candle
_emit_candle_completion() called
InferenceTrainingCoordinator._handle_candle_completion()
Matches inference frames by target_timestamp
Calls subscriber.on_candle_completion(event, inference_ref)
RealTrainingAdapter retrieves data from DuckDB
Trains model with actual candle result
```
#### Event-Based (Pivot Points)
```
Pivot Detected (L2L, L2H, etc.)
DataProvider.get_williams_pivot_levels() calculates pivots
_check_and_emit_pivot_events() finds new pivots
_emit_pivot_event() called
InferenceTrainingCoordinator._handle_pivot_event()
Finds matching inference frames (within 5-minute window)
Calls subscriber.on_pivot_event(event, inference_refs)
RealTrainingAdapter retrieves data from DuckDB
Trains model with pivot result
```
## Key Benefits
1. **Memory Efficient**: No copying 600 candles every second
2. **Event-Driven**: Clean separation of concerns
3. **Flexible**: Supports time-based (candles) and event-based (pivots)
4. **Centralized**: Coordinator in orchestrator reduces duplication
5. **Extensible**: Easy to add new training methods or event types
6. **Robust**: Proper error handling and thread safety
## Files Modified
1. **`ANNOTATE/core/inference_training_system.py`** (NEW)
- Core system with coordinator and events
2. **`core/data_provider.py`**
- Added subscription methods
- Added event emission
- Added pivot event checking
3. **`core/orchestrator.py`**
- Integrated InferenceTrainingCoordinator
4. **`ANNOTATE/core/real_training_adapter.py`**
- Implements TrainingEventSubscriber
- Uses orchestrator's coordinator
- Removed old caching code (reference-based now)
## Next Steps
1. **Test the System**
- Test candle completion events
- Test pivot events
- Test data retrieval from DuckDB
- Test training on inference frames
2. **Optimize Pivot Detection**
- Add periodic pivot checking (background thread)
- Cache pivot calculations
- Emit events more efficiently
3. **Extend DuckDB Schema**
- Add MA indicators to ohlcv_data
- Create pivot_points table
- Store technical indicators
4. **Remove Old Code**
- Remove `inference_input_cache` from session
- Remove `_make_realtime_prediction_with_cache()` (deprecated)
- Clean up duplicate code
## Summary
The system is now:
-**Memory efficient** - No copying 600 candles
-**Event-driven** - Clean architecture
-**Centralized** - Coordinator in orchestrator
-**Flexible** - Supports multiple training methods
-**Robust** - Proper error handling
The refactoring successfully reduces code duplication by:
1. Centralizing coordination in orchestrator
2. Using reference-based storage instead of copying
3. Implementing event-driven architecture
4. Reusing existing data provider and orchestrator infrastructure