Event-Driven Inference Training System - Architecture & Refactoring

Overview

This refactoring implements a complete event-driven, reference-based inference training system that eliminates code duplication and provides a flexible, robust training pipeline.

Architecture Decisions

Component Placement

1. InferenceTrainingCoordinator → TradingOrchestrator

Rationale:

  • Orchestrator already manages models, training, and predictions
  • Centralizes coordination logic
  • Reduces duplication (orchestrator has model access)
  • Natural fit for inference-training coordination

Location: core/orchestrator.py (line ~514)

self.inference_training_coordinator = InferenceTrainingCoordinator(
    data_provider=self.data_provider,
    duckdb_storage=self.data_provider.duckdb_storage
)

Benefits:

  • Single source of truth for inference frame management
  • Reuses orchestrator's model access
  • Eliminates duplicate prediction storage logic

2. Event Subscription Methods → DataProvider

Rationale:

  • Data layer responsibility - emits events when data changes
  • Natural place for candle completion and pivot detection

Location: core/data_provider.py

  • subscribe_candle_completion() - Subscribe to candle completion events
  • subscribe_pivot_events() - Subscribe to pivot events
  • _emit_candle_completion() - Emit when candle closes
  • _emit_pivot_event() - Emit when pivot detected
  • _check_and_emit_pivot_events() - Check for new pivots
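
A minimal sketch of the subscription mechanism behind these methods, assuming a simple in-process callback registry. The event payload fields, the lock, and the mixin class name are illustrative assumptions; only the method names come from the actual DataProvider.

import threading
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class CandleCompletionEvent:
    symbol: str
    timeframe: str
    timestamp: datetime      # close time of the completed candle
    candle: dict             # OHLCV values of the completed candle


class DataProviderEventsSketch:
    def __init__(self):
        self._candle_subscribers: List[Callable[[CandleCompletionEvent], None]] = []
        self._subscriber_lock = threading.Lock()

    def subscribe_candle_completion(self, callback: Callable[[CandleCompletionEvent], None]) -> None:
        """Register a callback that fires whenever a candle closes."""
        with self._subscriber_lock:
            self._candle_subscribers.append(callback)

    def _emit_candle_completion(self, event: CandleCompletionEvent) -> None:
        """Notify all subscribers; a failing subscriber must not block the data path."""
        with self._subscriber_lock:
            subscribers = list(self._candle_subscribers)
        for callback in subscribers:
            try:
                callback(event)
            except Exception as exc:
                print(f"candle completion subscriber failed: {exc}")

subscribe_pivot_events() / _emit_pivot_event() follow the same register-then-fan-out pattern for pivot events.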

Benefits:

  • Clean separation of concerns
  • Event-driven architecture
  • Easy to extend with new event types

3. TrainingEventSubscriber Interface → RealTrainingAdapter

Rationale:

  • Training layer implements subscriber interface
  • Receives callbacks for training events

Location: ANNOTATE/core/real_training_adapter.py

  • Implements TrainingEventSubscriber
  • on_candle_completion() - Train on candle completion
  • on_pivot_event() - Train on pivot detection
  • Uses orchestrator's coordinator (no duplication)
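
An illustrative sketch of how RealTrainingAdapter might implement the subscriber interface. Only on_candle_completion(), on_pivot_event(), the reuse of the orchestrator's coordinator, and the single training lock are taken from the design above; the build_training_batch() helper and attribute names are hypothetical stand-ins.

import threading


class TrainingEventSubscriber:
    """Interface implemented by training adapters."""

    def on_candle_completion(self, event, inference_ref):
        raise NotImplementedError

    def on_pivot_event(self, event, inference_refs):
        raise NotImplementedError


class RealTrainingAdapterSketch(TrainingEventSubscriber):
    def __init__(self, orchestrator):
        # Reuse the coordinator owned by the orchestrator (no duplication)
        self.coordinator = orchestrator.inference_training_coordinator
        self._training_lock = threading.Lock()

    def on_candle_completion(self, event, inference_ref):
        # Train on the now-known candle result for one stored inference frame
        with self._training_lock:
            batch = self.coordinator.build_training_batch(inference_ref, event)  # hypothetical helper
            self._train_one_step(batch)

    def on_pivot_event(self, event, inference_refs):
        # Train on every inference frame matched to the detected pivot
        with self._training_lock:
            for ref in inference_refs:
                batch = self.coordinator.build_training_batch(ref, event)  # hypothetical helper
                self._train_one_step(batch)

    def _train_one_step(self, batch):
        ...  # delegate to the model-specific training routine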

Benefits:

  • Clear interface for training adapters
  • Supports multiple training methods
  • Easy to add new adapters

Code Duplication Reduction

Before (Duplicated Logic)

  1. Data Retrieval:

    • _get_realtime_market_data() in RealTrainingAdapter
    • Similar logic in orchestrator
    • Similar logic in data_provider
  2. Prediction Storage:

    • store_transformer_prediction() in orchestrator
    • inference_input_cache in RealTrainingAdapter session (copying 600 candles!)
    • prediction_cache in app.py
  3. Training Coordination:

    • Training logic scattered across multiple files
    • No centralized coordination

After (Centralized)

  1. Data Retrieval:

    • Single source: data_provider.get_historical_data() queries DuckDB
    • Coordinator retrieves data on-demand using references
    • No copying - just timestamp ranges
  2. Prediction Storage:

    • Orchestrator's inference_training_coordinator manages references
    • References stored (not copied) - just timestamp ranges + norm_params
    • Data retrieved from DuckDB when needed
  3. Training Coordination:

    • Orchestrator's coordinator handles event distribution
    • RealTrainingAdapter implements subscriber interface
    • Single training lock in RealTrainingAdapter

Implementation Details

Reference-Based Storage

InferenceFrameReference stores:

  • data_range_start / data_range_end (timestamp range for 600 candles)
  • norm_params (small dict - can be stored)
  • predicted_action, predicted_candle, confidence
  • target_timestamp (for candles - when result will be available)
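
A sketch of the reference structure using the field names listed above. The types, and the symbol/timeframe fields, are assumptions; the point is that only timestamps and small normalization params are stored, never the 600-candle window itself.

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class InferenceFrameReference:
    symbol: str
    timeframe: str
    data_range_start: datetime            # first candle of the 600-candle window
    data_range_end: datetime              # last candle of the 600-candle window
    norm_params: Dict[str, float]         # small dict, cheap to keep in memory
    predicted_action: str
    predicted_candle: Optional[dict] = None
    confidence: float = 0.0
    target_timestamp: Optional[datetime] = None  # when the actual result becomes available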

No copying is needed - when training is triggered:

  1. Coordinator retrieves data from DuckDB using reference
  2. Normalizes using stored params
  3. Creates training batch
  4. Trains model
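
A hedged sketch of these four steps. get_historical_data() is named in this document, but the keyword arguments shown, the assumption that it returns a list of candle dicts (it may return a DataFrame), and the _apply_norm()/train_step() helpers are illustrative stand-ins for the adapter's real training code.

def _apply_norm(value: float, params: dict) -> float:
    # Illustrative normalization only; the real scheme lives with the model code
    return (value - params.get("mean", 0.0)) / max(params.get("std", 1.0), 1e-9)


def train_on_reference(data_provider, model_trainer, ref, actual_candle):
    # 1. Retrieve the 600-candle window from DuckDB using the stored reference
    candles = data_provider.get_historical_data(
        symbol=ref.symbol,
        timeframe=ref.timeframe,
        start_time=ref.data_range_start,
        end_time=ref.data_range_end,
    )

    # 2. Normalize with the params captured at inference time
    normalized = [
        {key: _apply_norm(value, ref.norm_params) if isinstance(value, (int, float)) else value
         for key, value in candle.items()}
        for candle in candles
    ]

    # 3. Pair the inputs with the now-known outcome and the original prediction
    batch = {
        "inputs": normalized,
        "target": actual_candle,
        "predicted": ref.predicted_candle,
        "action": ref.predicted_action,
    }

    # 4. Train the model on the single batch
    model_trainer.train_step(batch)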

Event-Driven Training

Time-Based (Candle Completion)

Candle Closes
    ↓
DataProvider._update_candle() detects new candle
    ↓
_emit_candle_completion() called
    ↓
InferenceTrainingCoordinator._handle_candle_completion()
    ↓
Matches inference frames by target_timestamp
    ↓
Calls subscriber.on_candle_completion(event, inference_ref)
    ↓
RealTrainingAdapter retrieves data from DuckDB
    ↓
Trains model with actual candle result
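
A sketch of the coordinator's candle-completion handling. Matching on target_timestamp follows the flow above; the in-memory storage keyed by (symbol, timeframe), the subscriber list, and the 1-second tolerance are assumptions.

from datetime import timedelta


class CoordinatorCandleHandlingSketch:
    def __init__(self):
        self._frames = {}           # (symbol, timeframe) -> list[InferenceFrameReference]
        self._subscribers = []      # TrainingEventSubscriber instances

    def _handle_candle_completion(self, event):
        key = (event.symbol, event.timeframe)
        tolerance = timedelta(seconds=1)
        remaining = []
        for ref in self._frames.get(key, []):
            if ref.target_timestamp and abs(ref.target_timestamp - event.timestamp) <= tolerance:
                # The predicted candle's actual result is now known: hand it to training
                for subscriber in self._subscribers:
                    subscriber.on_candle_completion(event, ref)
            else:
                remaining.append(ref)
        self._frames[key] = remaining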

Event-Based (Pivot Points)

Pivot Detected (L2L, L2H, etc.)
    ↓
DataProvider.get_williams_pivot_levels() calculates pivots
    ↓
_check_and_emit_pivot_events() finds new pivots
    ↓
_emit_pivot_event() called
    ↓
InferenceTrainingCoordinator._handle_pivot_event()
    ↓
Finds matching inference frames (within a 5-minute window)
    ↓
Calls subscriber.on_pivot_event(event, inference_refs)
    ↓
RealTrainingAdapter retrieves data from DuckDB
    ↓
Trains model with pivot result
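
A sketch of pivot-event matching against stored inference frames, using the 5-minute window mentioned above. The frame storage mirrors the candle-handling sketch, and comparing the pivot timestamp against each frame's data_range_end is an assumption about how "matching" is defined.

from datetime import timedelta


class CoordinatorPivotHandlingSketch:
    def __init__(self):
        self._frames = {}       # (symbol, timeframe) -> list[InferenceFrameReference]
        self._subscribers = []  # TrainingEventSubscriber instances

    def _handle_pivot_event(self, event):
        window = timedelta(minutes=5)
        matched = [
            ref
            for refs in self._frames.values()
            for ref in refs
            if ref.symbol == event.symbol
            and abs(ref.data_range_end - event.timestamp) <= window
        ]
        if matched:
            for subscriber in self._subscribers:
                subscriber.on_pivot_event(event, matched)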

Key Benefits

  1. Memory Efficient: No copying of 600 candles every second
  2. Event-Driven: Clean separation of concerns
  3. Flexible: Supports time-based (candles) and event-based (pivots)
  4. Centralized: Coordinator in orchestrator reduces duplication
  5. Extensible: Easy to add new training methods or event types
  6. Robust: Proper error handling and thread safety

Files Modified

  1. ANNOTATE/core/inference_training_system.py (NEW)

    • Core system with coordinator and events
  2. core/data_provider.py

    • Added subscription methods
    • Added event emission
    • Added pivot event checking
  3. core/orchestrator.py

    • Integrated InferenceTrainingCoordinator
  4. ANNOTATE/core/real_training_adapter.py

    • Implements TrainingEventSubscriber
    • Uses orchestrator's coordinator
    • Removed old caching code (reference-based now)

Next Steps

  1. Test the System

    • Test candle completion events
    • Test pivot events
    • Test data retrieval from DuckDB
    • Test training on inference frames
  2. Optimize Pivot Detection

    • Add periodic pivot checking (background thread)
    • Cache pivot calculations
    • Emit events more efficiently
  3. Extend DuckDB Schema

    • Add MA indicators to ohlcv_data
    • Create pivot_points table
    • Store technical indicators
  4. Remove Old Code

    • Remove inference_input_cache from session
    • Remove _make_realtime_prediction_with_cache() (deprecated)
    • Clean up duplicate code

Summary

The system is now:

  • Memory efficient - No copying of 600-candle windows
  • Event-driven - Clean architecture
  • Centralized - Coordinator in orchestrator
  • Flexible - Supports multiple training methods
  • Robust - Proper error handling

The refactoring successfully reduces code duplication by:

  1. Centralizing coordination in orchestrator
  2. Using reference-based storage instead of copying
  3. Implementing event-driven architecture
  4. Reusing existing data provider and orchestrator infrastructure