update requirements

This commit is contained in:
Dobromir Popov
2025-10-09 15:22:49 +03:00
parent a86e07f556
commit 6cf4d902df
3 changed files with 450 additions and 57 deletions


@@ -37,54 +37,326 @@ graph TD
## Components and Interfaces
### 1. Data Provider
### 1. Data Provider Backbone - Multi-Layered Architecture
The Data Provider is the foundation of the system, responsible for collecting, processing, and distributing market data to all other components.
The Data Provider backbone is the foundation of the system, implemented as a multi-layered architecture with clear separation of concerns:
#### Key Classes and Interfaces
#### Architecture Layers
- **DataProvider**: Central class that manages data collection, processing, and distribution.
- **MarketTick**: Data structure for standardized market tick data.
- **DataSubscriber**: Interface for components that subscribe to market data.
- **PivotBounds**: Data structure for pivot-based normalization bounds.
```
┌─────────────────────────────────────────────────────────────┐
│ COBY System (Standalone) │
│ Multi-Exchange Aggregation │ TimescaleDB │ Redis Cache │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Core DataProvider (core/data_provider.py) │
│ Automatic Maintenance │ Williams Pivots │ COB Integration │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ StandardizedDataProvider (core/standardized_data_provider.py) │
│ BaseDataInput │ ModelOutputManager │ Unified Interface │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Models (CNN, RL, etc.) │
└─────────────────────────────────────────────────────────────┘
```
#### Layer 1: COBY System (Multi-Exchange Aggregation)
**Purpose**: Standalone system for comprehensive multi-exchange data collection and storage
**Key Components**:
- **Exchange Connectors**: Binance, Coinbase, Kraken, Huobi, Bitfinex, KuCoin
- **TimescaleDB Storage**: Optimized time-series data persistence
- **Redis Caching**: High-performance data caching layer
- **REST API**: HTTP endpoints for data access
- **WebSocket Server**: Real-time data distribution
- **Monitoring**: Performance metrics, memory monitoring, health checks
**Data Models**:
- `OrderBookSnapshot`: Standardized order book data
- `TradeEvent`: Individual trade events
- `PriceBuckets`: Aggregated price bucket data
- `HeatmapData`: Visualization-ready heatmap data
- `ConnectionStatus`: Exchange connection monitoring
**Current Status**: ✅ Fully implemented and operational
#### Layer 2: Core DataProvider (Real-Time Trading Operations)
**Purpose**: High-performance real-time data provider for trading operations
**Key Classes**:
- **DataProvider**: Central class managing data collection, processing, and distribution
- **EnhancedCOBWebSocket**: Real-time Binance WebSocket integration
- **WilliamsMarketStructure**: Recursive pivot point calculation
- **RealTimeTickAggregator**: Tick-to-OHLCV aggregation
- **COBIntegration**: COB data collection and aggregation
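
To make the tick-to-OHLCV step concrete, here is a minimal sketch of how raw ticks could be folded into fixed-width candles. It is not the actual `RealTimeTickAggregator`; the `Candle` class, the `aggregate_ticks` function, and the `(timestamp, price, size)` tick layout are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Candle:
    """Hypothetical OHLCV candle, not the project's own class."""
    start: int      # bucket start, epoch seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


def aggregate_ticks(ticks: Iterable[Tuple[float, float, float]],
                    period_s: int = 1) -> List[Candle]:
    """Fold (timestamp, price, size) ticks into candles of period_s seconds."""
    candles: Dict[int, Candle] = {}
    for ts, price, size in sorted(ticks, key=lambda t: t[0]):
        start = int(ts // period_s) * period_s
        candle = candles.get(start)
        if candle is None:
            # First tick of the period sets open, high, low, and close.
            candles[start] = Candle(start, price, price, price, price, size)
        else:
            candle.high = max(candle.high, price)
            candle.low = min(candle.low, price)
            candle.close = price
            candle.volume += size
    return [candles[key] for key in sorted(candles)]


ticks = [(0.1, 100.0, 1.0), (0.6, 101.0, 2.0), (0.9, 99.5, 0.5), (1.2, 100.5, 1.0)]
bars = aggregate_ticks(ticks)
```

A streaming implementation would keep only the open candle in memory and emit it when the period rolls over; the batch form above is just easier to read.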
**Key Features**:
1. **Automatic Data Maintenance**:
- Background worker updating data every half-candle period
- 1500 candles cached per symbol/timeframe
- Automatic fallback between Binance and MEXC
- Rate limiting and error handling
2. **Williams Market Structure Pivot Points**:
- Recursive pivot detection with 5 levels
- Monthly 1s data analysis for comprehensive context
- Pivot-based normalization bounds (PivotBounds)
- Support/resistance level tracking
3. **COB Integration**:
- EnhancedCOBWebSocket with multiple Binance streams:
- `depth@100ms`: High-frequency order book updates
- `ticker`: 24hr statistics and volume
- `aggTrade`: Large order detection
- 1s COB aggregation with price buckets ($1 ETH, $10 BTC)
- Multi-timeframe imbalance MA (1s, 5s, 15s, 60s)
- 30-minute raw tick buffer (180,000 ticks)
4. **Centralized Data Distribution**:
- Subscriber management with callbacks
- Thread-safe data access with locks
- Performance tracking per subscriber
- Tick buffers (1000 ticks per symbol)
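
The 1s COB aggregation into price buckets can be sketched as follows. This is an illustration under stated assumptions, not the code in `core/data_provider.py`: the function names, the `(price, quantity)` level layout, flooring prices to the bucket boundary, and the imbalance formula `(bid - ask) / (bid + ask)` are all hypothetical choices. Only the bucket sizes ($1 for ETH, $10 for BTC) come from the text above.

```python
import math
from typing import Dict, List, Tuple

BUCKET_SIZE_USD = {"ETH": 1.0, "BTC": 10.0}  # sizes stated in the text above

Level = Tuple[float, float]  # (price, quantity), an assumed layout


def bucket_order_book(bids: List[Level], asks: List[Level], mid_price: float,
                      bucket_size: float, n_buckets: int = 20
                      ) -> Dict[float, Dict[str, float]]:
    """Sum USD volume into buckets within n_buckets of the mid price."""
    center = math.floor(mid_price / bucket_size) * bucket_size
    buckets = {
        center + i * bucket_size: {"bid_vol": 0.0, "ask_vol": 0.0}
        for i in range(-n_buckets, n_buckets + 1)
    }
    for side, levels in (("bid_vol", bids), ("ask_vol", asks)):
        for price, qty in levels:
            key = math.floor(price / bucket_size) * bucket_size
            if key in buckets:  # levels outside the window are ignored
                buckets[key][side] += price * qty
    return buckets


def imbalance(bucket: Dict[str, float]) -> float:
    """Assumed imbalance ratio in [-1, 1]; 0.0 for an empty bucket."""
    total = bucket["bid_vol"] + bucket["ask_vol"]
    if total == 0.0:
        return 0.0
    return (bucket["bid_vol"] - bucket["ask_vol"]) / total


bids = [(2000.4, 1.0), (1999.2, 2.0)]
asks = [(2001.1, 0.5), (2030.0, 1.0)]
buckets = bucket_order_book(bids, asks, mid_price=2000.75,
                            bucket_size=BUCKET_SIZE_USD["ETH"])
```

With `n_buckets=20` the window holds 41 buckets (20 on each side plus the center bucket); whether the real system counts the center bucket separately is not stated in the source.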
**Data Structures**:
- `MarketTick`: Standardized tick data
- `PivotBounds`: Pivot-based normalization bounds
- `DataSubscriber`: Subscriber information
- `SimplePivotLevel`: Fallback pivot structure
**Current Status**: ✅ Fully implemented with ongoing enhancements
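
As a rough illustration of the centralized distribution described for this layer (callbacks, lock-protected access, per-subscriber tracking), the sketch below shows one common pattern. The `Subscriber` and `TickDistributor` names and the dict-shaped tick are assumptions; the real `DataSubscriber` and `MarketTick` structures may look different.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subscriber:
    """Hypothetical subscriber record with simple delivery counters."""
    name: str
    callback: Callable[[dict], None]
    symbols: List[str]
    delivered: int = 0
    errors: int = 0


class TickDistributor:
    """Minimal thread-safe publish/subscribe hub (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: Dict[str, Subscriber] = {}

    def subscribe(self, name: str, callback: Callable[[dict], None],
                  symbols: List[str]) -> None:
        with self._lock:
            self._subscribers[name] = Subscriber(name, callback, list(symbols))

    def unsubscribe(self, name: str) -> None:
        with self._lock:
            self._subscribers.pop(name, None)

    def publish(self, tick: dict) -> int:
        # Snapshot the targets under the lock, then call back outside it so a
        # slow subscriber does not block subscribe/unsubscribe calls.
        with self._lock:
            targets = [s for s in self._subscribers.values()
                       if tick["symbol"] in s.symbols]
        sent = 0
        for sub in targets:
            try:
                sub.callback(tick)
                sub.delivered += 1
                sent += 1
            except Exception:
                sub.errors += 1  # one failing subscriber must not stop the rest
        return sent


received: List[dict] = []
dist = TickDistributor()
dist.subscribe("cnn", received.append, ["ETH/USDT"])
dist.subscribe("broken", lambda tick: 1 / 0, ["ETH/USDT"])
sent = dist.publish({"symbol": "ETH/USDT", "price": 2000.0})
```

Calling the callbacks outside the lock is a design choice of this sketch: it trades strict ordering guarantees for lower contention.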
#### Layer 3: StandardizedDataProvider (Unified Model Interface)
**Purpose**: Provide standardized, validated data in unified format for all models
**Key Classes**:
- **StandardizedDataProvider**: Extends DataProvider with unified interface
- **ModelOutputManager**: Centralized storage for cross-model feeding
- **BaseDataInput**: Standardized input format for all models
- **COBData**: Comprehensive COB data structure
- **ModelOutput**: Extensible output format
**Key Features**:
1. **Unified Data Format (BaseDataInput)**:
```python
@dataclass
class BaseDataInput:
    symbol: str
    timestamp: datetime
    ohlcv_1s: List[OHLCVBar]      # 300 frames
    ohlcv_1m: List[OHLCVBar]      # 300 frames
    ohlcv_1h: List[OHLCVBar]      # 300 frames
    ohlcv_1d: List[OHLCVBar]      # 300 frames
    btc_ohlcv_1s: List[OHLCVBar]  # 300 frames
    cob_data: Optional[COBData]
    technical_indicators: Dict[str, float]
    pivot_points: List[PivotPoint]
    last_predictions: Dict[str, ModelOutput]
    market_microstructure: Dict[str, Any]
```
2. **COB Data Structure**:
- ±20 price buckets around current price
- Bid/ask volumes and imbalances per bucket
- MA (1s, 5s, 15s, 60s) of imbalances for ±5 buckets
- Volume-weighted prices within buckets
- Order flow metrics
3. **Model Output Management**:
- Extensible ModelOutput format supporting all model types
- Cross-model feeding with hidden states
- Historical output storage (1000 entries)
- Efficient query by model_name, symbol, timestamp
4. **Data Validation**:
- Minimum 100 frames per timeframe
- Non-null COB data validation
- Data completeness scoring
- Validation before model inference
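
The validation rules above could be checked along these lines. This is a sketch, not the project's validator: `validate_input` and `completeness_score` are hypothetical names, and the scoring formula (average fill ratio against the 300-frame target) is an assumption, since the source does not define how completeness is scored.

```python
from typing import Dict, List, Optional, Sequence

MIN_FRAMES = 100     # minimum per timeframe, from the list above
TARGET_FRAMES = 300  # frames per timeframe in BaseDataInput


def validate_input(frames: Dict[str, Sequence],
                   cob_data: Optional[object]) -> List[str]:
    """Return human-readable problems; an empty list means the input is valid."""
    errors: List[str] = []
    for timeframe, bars in frames.items():
        if len(bars) < MIN_FRAMES:
            errors.append(f"{timeframe}: {len(bars)} frames, need {MIN_FRAMES}")
    if cob_data is None:
        errors.append("cob_data is missing")
    return errors


def completeness_score(frames: Dict[str, Sequence]) -> float:
    """Assumed score: mean fill ratio per timeframe, each capped at 1.0."""
    if not frames:
        return 0.0
    ratios = [min(len(bars) / TARGET_FRAMES, 1.0) for bars in frames.values()]
    return sum(ratios) / len(ratios)


frames = {"1s": [0] * 300, "1m": [0] * 150, "1h": [0] * 50}
errors = validate_input(frames, cob_data=None)
score = completeness_score(frames)
```

Returning a list of messages rather than a bare boolean keeps the door open for the detailed error reporting that the design lists as a planned enhancement.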
**Current Status**: ✅ Implemented with enhancements needed for heatmap integration
#### Implementation Details
The DataProvider class will:
- Collect data from multiple sources (Binance, MEXC)
- Support multiple timeframes (1s, 1m, 1h, 1d)
- Support multiple symbols (ETH, BTC)
- Calculate technical indicators
- Identify pivot points
- Normalize data
- Distribute data to subscribers
- Perform any other algorithmic manipulations/calculations on the data
- Cache up to 3x the model input data (300 ticks of OHLCV, etc.) so that proper backtesting can be done in up to 2x time in the future
**Existing Strengths**:
- ✅ Robust automatic data maintenance with background workers
- ✅ Williams Market Structure with 5-level pivot analysis
- ✅ Real-time COB streaming with multiple Binance streams
- ✅ Thread-safe data access and subscriber management
- ✅ Comprehensive error handling and fallback mechanisms
- ✅ Pivot-based normalization for improved model training
- ✅ Centralized model output storage for cross-feeding
Based on the existing implementation in `core/data_provider.py`, we'll enhance it to:
- Improve pivot point calculation using recursive Williams Market Structure
- Optimize data caching for better performance
- Enhance real-time data streaming
- Implement better error handling and fallback mechanisms
**Areas for Enhancement**:
- ❌ Unified integration between COBY and core DataProvider
- ❌ COB heatmap matrix generation for model inputs
- ❌ Configurable price ranges for COB imbalance calculation
- ❌ Comprehensive data quality scoring and monitoring
- ❌ Missing data interpolation strategies
- ❌ Enhanced validation with detailed error reporting
### BASE FOR ALL MODELS ###
- ***INPUTS***: COB+OHLCV data frame as described:
- OHLCV: 300 frames of (1s, 1m, 1h, 1d) ETH + 300s of 1s BTC
- COB: for each 1s OHLCV we have ±20 buckets of COB amounts in USD
- 1s, 5s, 15s, and 60s MA of the COB imbalance over ±5 COB buckets
- ***OUTPUTS***:
- suggested trade action (BUY/SELL/HOLD), paired with a confidence
- immediate price movement direction vector (-1: vertical down, 1: vertical up, 0: horizontal) - linear; with its own confidence
### Standardized Model Input/Output Format
#### Base Input Format (BaseDataInput)
All models receive data through `StandardizedDataProvider.get_base_data_input()` which returns:
```python
@dataclass
class BaseDataInput:
    """Unified base data input for all models"""
    symbol: str          # Primary symbol (e.g., 'ETH/USDT')
    timestamp: datetime  # Current timestamp

    # Standardized input for all models (dict form):
    # {
    #     'primary_symbol': 'ETH/USDT',
    #     'reference_symbol': 'BTC/USDT',
    #     'eth_data': {'ETH_1s': df, 'ETH_1m': df, 'ETH_1h': df, 'ETH_1d': df},
    #     'btc_data': {'BTC_1s': df},
    #     'current_prices': {'ETH': price, 'BTC': price},
    #     'data_completeness': {...}
    # }

    # OHLCV Data (300 frames each)
    ohlcv_1s: List[OHLCVBar]      # 300 x 1-second bars
    ohlcv_1m: List[OHLCVBar]      # 300 x 1-minute bars
    ohlcv_1h: List[OHLCVBar]      # 300 x 1-hour bars
    ohlcv_1d: List[OHLCVBar]      # 300 x 1-day bars
    btc_ohlcv_1s: List[OHLCVBar]  # 300 x 1-second BTC bars

    # COB Data
    cob_data: Optional[COBData]   # COB with ±20 buckets + MA

    # Technical Analysis
    technical_indicators: Dict[str, float]  # RSI, MACD, Bollinger, etc.
    pivot_points: List[PivotPoint]          # Williams Market Structure pivots

    # Cross-Model Feeding
    last_predictions: Dict[str, ModelOutput]  # Outputs from all models

    # Market Microstructure
    market_microstructure: Dict[str, Any]  # Order flow, liquidity, etc.

    # Optional: COB Heatmap (for visualization and advanced models)
    cob_heatmap_times: Optional[List[datetime]]  # Heatmap time axis
    cob_heatmap_prices: Optional[List[float]]    # Heatmap price axis
    cob_heatmap_values: Optional[np.ndarray]     # Heatmap matrix (time x price)
```
**OHLCVBar Structure**:
```python
@dataclass
class OHLCVBar:
    symbol: str
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
    timeframe: str
    indicators: Dict[str, float]  # Technical indicators for this bar
```
**COBData Structure**:
```python
@dataclass
class COBData:
    symbol: str
    timestamp: datetime
    current_price: float
    bucket_size: float  # $1 for ETH, $10 for BTC

    # Price Buckets (±20 around current price)
    price_buckets: Dict[float, Dict[str, float]]  # {price: {bid_vol, ask_vol, ...}}
    bid_ask_imbalance: Dict[float, float]         # {price: imbalance_ratio}
    volume_weighted_prices: Dict[float, float]    # {price: VWAP}

    # Moving Averages of Imbalance (±5 buckets)
    ma_1s_imbalance: Dict[float, float]   # 1-second MA
    ma_5s_imbalance: Dict[float, float]   # 5-second MA
    ma_15s_imbalance: Dict[float, float]  # 15-second MA
    ma_60s_imbalance: Dict[float, float]  # 60-second MA

    # Order Flow Metrics
    order_flow_metrics: Dict[str, float]  # Aggressive buy/sell ratios, etc.
```
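
The four imbalance moving averages in `COBData` can be maintained with a single rolling buffer. The sketch below tracks one bucket's imbalance series and assumes one sample per second, so a window of N seconds equals the last N samples; the `ImbalanceMA` class is hypothetical, and a real implementation would keep one such tracker per price bucket.

```python
from collections import deque
from typing import Deque, Dict, Tuple

WINDOWS_S: Tuple[int, ...] = (1, 5, 15, 60)  # MA windows named in COBData


class ImbalanceMA:
    """Rolling simple moving averages over one imbalance series (illustrative)."""

    def __init__(self, windows: Tuple[int, ...] = WINDOWS_S) -> None:
        self.windows = tuple(windows)
        # Only the longest window needs to be retained.
        self._history: Deque[float] = deque(maxlen=max(self.windows))

    def update(self, imbalance: float) -> Dict[str, float]:
        """Add one 1-second sample and return the current MA per window."""
        self._history.append(imbalance)
        recent = list(self._history)
        result: Dict[str, float] = {}
        for window in self.windows:
            tail = recent[-window:]  # shorter than window while warming up
            result[f"ma_{window}s_imbalance"] = sum(tail) / len(tail)
        return result


ma = ImbalanceMA()
latest: Dict[str, float] = {}
for value in (0.25, 0.5, 0.75):
    latest = ma.update(value)
```

During warm-up the longer windows average over whatever samples exist so far; a stricter variant could return `None` until a window is full.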
#### Base Output Format (ModelOutput)
All models output predictions through standardized `ModelOutput`:
```python
@dataclass
class ModelOutput:
    """Extensible model output format supporting all model types"""
    model_type: str    # 'cnn', 'rl', 'lstm', 'transformer'
    model_name: str    # Specific model identifier
    symbol: str
    timestamp: datetime
    confidence: float  # Overall confidence (0.0 to 1.0)

    # Model-Specific Predictions
    predictions: Dict[str, Any]  # Flexible prediction format

    # Cross-Model Feeding
    hidden_states: Optional[Dict[str, Any]]  # For feeding to other models

    # Extensibility
    metadata: Dict[str, Any]  # Additional model-specific info
```
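
To show how outputs in this format might be stored for cross-model feeding, here is a minimal bounded store keyed by model name and symbol. It is not the actual `ModelOutputManager`: the `OutputStore` class and its method names are hypothetical, and applying the 1000-entry limit per (model, symbol) pair rather than globally is an assumption.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Any, DefaultDict, Deque, List, Optional, Tuple

MAX_HISTORY = 1000  # history size mentioned for model output storage

Key = Tuple[str, str]           # (model_name, symbol)
Entry = Tuple[datetime, Any]    # (timestamp, output)


class OutputStore:
    """Bounded per-model, per-symbol history of outputs (illustrative)."""

    def __init__(self, max_history: int = MAX_HISTORY) -> None:
        self._history: DefaultDict[Key, Deque[Entry]] = defaultdict(
            lambda: deque(maxlen=max_history)
        )

    def store(self, model_name: str, symbol: str,
              timestamp: datetime, output: Any) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._history[(model_name, symbol)].append((timestamp, output))

    def latest(self, model_name: str, symbol: str) -> Optional[Any]:
        entries = self._history.get((model_name, symbol))
        return entries[-1][1] if entries else None

    def since(self, model_name: str, symbol: str, start: datetime) -> List[Any]:
        entries = self._history.get((model_name, symbol), ())
        return [output for ts, output in entries if ts >= start]


t0 = datetime(2025, 1, 1)
store = OutputStore(max_history=3)
for i in range(5):
    store.store("cnn", "ETH/USDT", t0 + timedelta(seconds=i), {"action": "HOLD", "i": i})
newest = store.latest("cnn", "ETH/USDT")
retained = store.since("cnn", "ETH/USDT", t0)
```

In a multi-threaded provider the `store` and query methods would also need a lock around the shared dictionary; that is left out here for brevity.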
**Standard Prediction Fields**:
- `action`: 'BUY', 'SELL', or 'HOLD'
- `action_confidence`: Confidence in the action (0.0 to 1.0)
- `direction_vector`: Price movement direction (-1.0 to 1.0)
- `direction_confidence`: Confidence in direction (0.0 to 1.0)
- `probabilities`: Dict of action probabilities {'BUY': 0.3, 'SELL': 0.2, 'HOLD': 0.5}
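
A consumer of `predictions` may want to check these fields before acting on them. The sketch below is one possible check, written under assumptions: the `check_predictions` helper is hypothetical, only `action` is treated as required, and `probabilities` is expected to cover exactly the three actions and sum to 1.0 (as in the example values above), which the source does not state as a rule.

```python
from typing import Any, Dict, List

ACTIONS = ("BUY", "SELL", "HOLD")


def check_predictions(pred: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the standard prediction fields."""
    problems: List[str] = []
    if pred.get("action") not in ACTIONS:
        problems.append("action must be one of BUY, SELL, HOLD")
    for key in ("action_confidence", "direction_confidence"):
        value = pred.get(key)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be within [0.0, 1.0]")
    direction = pred.get("direction_vector")
    if direction is not None and not -1.0 <= direction <= 1.0:
        problems.append("direction_vector must be within [-1.0, 1.0]")
    probs = pred.get("probabilities")
    if probs is not None:
        if set(probs) != set(ACTIONS):
            problems.append("probabilities must cover BUY, SELL, HOLD")
        elif abs(sum(probs.values()) - 1.0) > 1e-6:
            problems.append("probabilities must sum to 1.0")  # assumed rule
    return problems


good = {
    "action": "HOLD",
    "action_confidence": 0.78,
    "direction_vector": 0.1,
    "direction_confidence": 0.65,
    "probabilities": {"BUY": 0.3, "SELL": 0.2, "HOLD": 0.5},
}
bad = {"action": "WAIT", "direction_vector": 1.5}
good_problems = check_predictions(good)
bad_problems = check_predictions(bad)
```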
**Example CNN Output**:
```python
ModelOutput(
    model_type='cnn',
    model_name='williams_cnn_v2',
    symbol='ETH/USDT',
    timestamp=datetime.now(),
    confidence=0.85,
    predictions={
        'action': 'BUY',
        'action_confidence': 0.85,
        'pivot_points': [...],    # Predicted pivot points
        'direction_vector': 0.7,  # Upward movement
        'direction_confidence': 0.82
    },
    hidden_states={
        'conv_features': tensor(...),
        'lstm_hidden': tensor(...)
    },
    metadata={'model_version': '2.1', 'training_date': '2025-01-08'}
)
```
**Example RL Output**:
```python
ModelOutput(
    model_type='rl',
    model_name='dqn_agent_v1',
    symbol='ETH/USDT',
    timestamp=datetime.now(),
    confidence=0.78,
    predictions={
        'action': 'HOLD',
        'action_confidence': 0.78,
        'q_values': {'BUY': 0.45, 'SELL': 0.32, 'HOLD': 0.78},
        'expected_reward': 0.023,
        'direction_vector': 0.1,
        'direction_confidence': 0.65
    },
    hidden_states={
        'state_value': 0.56,
        'advantage_values': [0.12, -0.08, 0.22]
    },
    metadata={'epsilon': 0.1, 'replay_buffer_size': 10000}
)
```
### 2. CNN Model