
Event-Driven Inference Training System

Overview

This system provides a flexible, efficient, and robust training pipeline that:

  1. Stores inference frames by reference (not copying 600 candles every second)
  2. Uses DuckDB for efficient data storage and retrieval
  3. Subscribes to events (candle completion, pivot points) for training triggers
  4. Supports multiple training methods (backprop for Transformer, others for different models)

Architecture

Components

  1. InferenceTrainingCoordinator (inference_training_system.py)

    • Manages inference frame references
    • Matches inference frames to actual results
    • Distributes training events to subscribers
  2. TrainingEventSubscriber (interface)

    • Implemented by training adapters
    • Receives callbacks for candle completion and pivot events
  3. DataProvider Extensions

    • subscribe_candle_completion() - Subscribe to candle completion events
    • subscribe_pivot_events() - Subscribe to pivot events (L2L, L2H, etc.)
  4. DuckDB Storage

    • Stores OHLCV data, MA indicators, pivots
    • Efficient queries by timestamp range
    • No data copying - just references

Data Flow

1. Inference Phase

Model Inference
    ↓
Create InferenceFrameReference
    ↓
Store reference (timestamp range, norm_params, prediction metadata)
    ↓
Register with InferenceTrainingCoordinator

No copying - only a lightweight reference is stored (see the dataclass sketch after this list):

  • data_range_start / data_range_end (timestamp range for 600 candles)
  • norm_params (small dict)
  • predicted_action, predicted_candle, confidence
  • target_timestamp (for candle predictions: when the actual result becomes available)
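
A minimal sketch of what such a reference could look like as a Python dataclass. The field names follow the list above and the Step 3 snippet; the actual definition in inference_training_system.py may differ:

from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional

@dataclass
class InferenceFrameReference:
    """Lightweight handle to one inference - no candle data is copied."""
    inference_id: str
    symbol: str
    timeframe: str
    prediction_timestamp: datetime
    target_timestamp: Optional[datetime]  # when the actual candle result is due
    data_range_start: datetime            # first of the ~600 input candles
    data_range_end: datetime              # last input candle (inference time)
    norm_params: Dict[str, float] = field(default_factory=dict)
    predicted_action: str = ''
    predicted_candle: Optional[Dict[str, float]] = None
    confidence: float = 0.0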

2. Training Trigger Phase

Time-Based (Candle Completion)

Candle Closes
    ↓
DataProvider emits CandleCompletionEvent
    ↓
InferenceTrainingCoordinator matches inference frames
    ↓
Calls subscriber.on_candle_completion(event, inference_ref)
    ↓
Training adapter retrieves data from DuckDB using reference
    ↓
Train model with actual candle result

Event-Based (Pivot Points)

Pivot Detected (L2L, L2H, etc.)
    ↓
DataProvider emits PivotEvent
    ↓
InferenceTrainingCoordinator finds matching inference frames
    ↓
Calls subscriber.on_pivot_event(event, inference_refs)
    ↓
Training adapter retrieves data from DuckDB
    ↓
Train model with pivot result
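
Both flows hinge on the coordinator matching incoming events to pending references. A sketch of that matching step, assuming frames are matched on (symbol, timeframe, target_timestamp); the internal attribute names here are illustrative, not the actual implementation:

class InferenceTrainingCoordinator:
    def __init__(self, data_provider, duckdb_storage):
        self.data_provider = data_provider
        self.duckdb_storage = duckdb_storage
        self._pending_frames = []      # registered InferenceFrameReference objects
        self._candle_subscribers = []  # TrainingEventSubscriber instances

    def register_inference_frame(self, ref):
        # Store only the reference - the 600 input candles stay in DuckDB
        self._pending_frames.append(ref)

    def _on_candle_completion(self, event):
        # Hand each frame whose prediction targeted this candle to the
        # subscribers, then retire it from the pending list
        for ref in list(self._pending_frames):
            if (ref.symbol, ref.timeframe, ref.target_timestamp) == \
                    (event.symbol, event.timeframe, event.timestamp):
                self._pending_frames.remove(ref)
                for subscriber in self._candle_subscribers:
                    subscriber.on_candle_completion(event, ref)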

Implementation Steps

Step 1: Extend DataProvider

Add subscription methods to core/data_provider.py:

# Assumes two defaultdict(list) registries created in DataProvider.__init__,
# e.g. self._candle_callbacks and self._pivot_callbacks (names illustrative)

def subscribe_candle_completion(self, callback: Callable, symbol: str, timeframe: str):
    """Subscribe to candle completion events for (symbol, timeframe)"""
    # Register the callback; the provider emits a CandleCompletionEvent
    # to it each time a candle for this symbol/timeframe closes
    self._candle_callbacks[(symbol, timeframe)].append(callback)

def subscribe_pivot_events(self, callback: Callable, symbol: str, timeframe: str, pivot_types: List[str]):
    """Subscribe to pivot events (L2L, L2H, etc.)"""
    # Register the callback; the provider emits a PivotEvent to it
    # whenever a pivot of one of the requested types is detected
    for pivot_type in pivot_types:
        self._pivot_callbacks[(symbol, timeframe, pivot_type)].append(callback)
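
These callbacks receive event objects that the document references but never defines. A plausible minimal shape for them, inferred from event.ohlcv in Step 2 and the pivot_points schema below (field names are assumptions):

from dataclasses import dataclass
from datetime import datetime
from typing import Dict

@dataclass
class CandleCompletionEvent:
    symbol: str
    timeframe: str
    timestamp: datetime        # open time of the completed candle
    ohlcv: Dict[str, float]    # open/high/low/close/volume of that candle

@dataclass
class PivotEvent:
    symbol: str
    timeframe: str
    timestamp: datetime
    price: float
    pivot_type: str            # 'L2L', 'L2H', etc.
    level: int
    strength: float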

Step 2: Update RealTrainingAdapter

Make RealTrainingAdapter implement TrainingEventSubscriber:

class RealTrainingAdapter(TrainingEventSubscriber):
    def __init__(self, ...):
        # Initialize InferenceTrainingCoordinator
        self.training_coordinator = InferenceTrainingCoordinator(
            data_provider=self.data_provider,
            duckdb_storage=self.data_provider.duckdb_storage
        )
        
        # Subscribe to events
        self.training_coordinator.subscribe_to_candle_completion(
            self, symbol='ETH/USDT', timeframe='1m'
        )
        self.training_coordinator.subscribe_to_pivot_events(
            self, symbol='ETH/USDT', timeframe='1m',
            pivot_types=['L2L', 'L2H', 'L3L', 'L3H']
        )
    
    def on_candle_completion(self, event: CandleCompletionEvent, inference_ref: Optional[InferenceFrameReference]):
        """Called when candle completes"""
        if not inference_ref:
            return  # No matching inference frame
        
        # Retrieve inference data from DuckDB
        model_inputs = self.training_coordinator.get_inference_data(inference_ref)
        if not model_inputs:
            return
        
        # Create training batch with actual candle
        batch = self._create_training_batch(model_inputs, event.ohlcv, inference_ref)
        
        # Train model (backprop for Transformer, other methods for other models)
        self._train_on_batch(batch, inference_ref)
    
    def on_pivot_event(self, event: PivotEvent, inference_refs: List[InferenceFrameReference]):
        """Called when pivot detected"""
        for inference_ref in inference_refs:
            # Retrieve inference data
            model_inputs = self.training_coordinator.get_inference_data(inference_ref)
            if not model_inputs:
                continue
            
            # Create training batch with pivot result
            batch = self._create_pivot_training_batch(model_inputs, event, inference_ref)
            
            # Train model
            self._train_on_batch(batch, inference_ref)
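
_train_on_batch is the natural place to honor the "multiple training methods" goal from the overview. A hedged sketch of such a dispatch; self.model_type and the trainer objects are placeholders, not the repo's actual attributes:

def _train_on_batch(self, batch, inference_ref):
    """Route the batch to the training method suited to the model type."""
    if self.model_type == 'transformer':
        # Standard supervised backprop against the actual candle/pivot
        loss = self.transformer_trainer.train_step(batch)
    elif self.model_type == 'dqn':
        # Off-policy RL: buffer the experience, train when enough samples
        self.replay_buffer.add(batch)
        loss = self.dqn_trainer.train_if_ready(self.replay_buffer)
    else:
        logger.warning("No training method for model type %s", self.model_type)
        return
    logger.debug("Trained on inference %s, loss=%s", inference_ref.inference_id, loss)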

Step 3: Update Inference Loop

In _realtime_inference_loop(), register inference frames:

# After making prediction
prediction = self._make_realtime_prediction(...)

# Create inference frame reference
inference_ref = InferenceFrameReference(
    inference_id=str(uuid.uuid4()),
    symbol=symbol,
    timeframe=timeframe,
    prediction_timestamp=datetime.now(timezone.utc),
    target_timestamp=next_candle_time,  # For candles
    data_range_start=start_time,  # 600 candles before
    data_range_end=current_time,
    norm_params=norm_params,
    predicted_action=prediction['action'],
    predicted_candle=prediction['predicted_candle'],
    confidence=prediction['confidence']
)

# Register with coordinator (no copying!)
self.training_coordinator.register_inference_frame(inference_ref)
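
The snippet assumes next_candle_time, start_time, and norm_params are already in scope. For a 1m timeframe and a 600-candle window, they could be derived like this (an illustrative sketch, not code from the repo):

from datetime import datetime, timedelta, timezone

candle_interval = timedelta(minutes=1)  # 1m timeframe
current_time = datetime.now(timezone.utc)

# Open time of the candle the model is predicting
next_candle_time = current_time.replace(second=0, microsecond=0) + candle_interval

# Start of the 600-candle input window
start_time = current_time - 600 * candle_interval

# norm_params is the small dict of normalization constants the model
# applied to its inputs, stored so training can re-apply the same
# normalization later; its exact keys depend on the model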

Benefits

  1. Memory Efficient: Avoids copying 600 candles every second
  2. Flexible: Supports time-based (candles) and event-based (pivots) training
  3. Robust: Event-driven architecture with proper error handling
  4. Simple: Clear separation of concerns
  5. Scalable: DuckDB handles efficient queries
  6. Extensible: Easy to add new training methods or event types

DuckDB Schema Extensions

Ensure DuckDB stores:

  • OHLCV data (already exists)
  • MA indicators (add to ohlcv_data or a separate table)
  • Pivot points (add a pivot_points table, as below)

-- Add technical indicators to ohlcv_data
ALTER TABLE ohlcv_data ADD COLUMN sma_10 DOUBLE;
ALTER TABLE ohlcv_data ADD COLUMN sma_20 DOUBLE;
ALTER TABLE ohlcv_data ADD COLUMN ema_12 DOUBLE;
-- ... etc

-- Create pivot points table
CREATE TABLE IF NOT EXISTS pivot_points (
    id INTEGER PRIMARY KEY,
    symbol VARCHAR NOT NULL,
    timeframe VARCHAR NOT NULL,
    timestamp BIGINT NOT NULL,
    price DOUBLE NOT NULL,
    pivot_type VARCHAR NOT NULL,  -- 'L2L', 'L2H', etc.
    level INTEGER NOT NULL,
    strength DOUBLE NOT NULL,
    UNIQUE(symbol, timeframe, timestamp, pivot_type)
);
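
Retrieval by reference then reduces to a range query, which is what keeps the pipeline zero-copy until training actually needs the data. A sketch using the duckdb Python API (column names follow the schema above; get_inference_data's real signature may differ, and timestamps are assumed stored in a form comparable to the reference's bounds):

import duckdb

def get_inference_data(con: duckdb.DuckDBPyConnection, ref):
    """Fetch the ~600 input candles for an inference frame on demand."""
    return con.execute(
        """
        SELECT timestamp, open, high, low, close, volume,
               sma_10, sma_20, ema_12
        FROM ohlcv_data
        WHERE symbol = ? AND timeframe = ?
          AND timestamp BETWEEN ? AND ?
        ORDER BY timestamp
        """,
        [ref.symbol, ref.timeframe, ref.data_range_start, ref.data_range_end],
    ).fetchdf()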

Next Steps

  1. Implement DataProvider subscription methods
  2. Update RealTrainingAdapter to use InferenceTrainingCoordinator
  3. Extend DuckDB schema for indicators and pivots
  4. Test with live inference
  5. Add support for other model types (not just Transformer)