update requirements
@@ -37,54 +37,326 @@ graph TD

## Components and Interfaces

### 1. Data Provider
### 1. Data Provider Backbone - Multi-Layered Architecture

The Data Provider is the foundation of the system, responsible for collecting, processing, and distributing market data to all other components.
The Data Provider backbone is the foundation of the system, implemented as a multi-layered architecture with clear separation of concerns:

#### Key Classes and Interfaces
#### Architecture Layers

- **DataProvider**: Central class that manages data collection, processing, and distribution.
- **MarketTick**: Data structure for standardized market tick data.
- **DataSubscriber**: Interface for components that subscribe to market data.
- **PivotBounds**: Data structure for pivot-based normalization bounds.

```
┌───────────────────────────────────────────────────────────────┐
│ COBY System (Standalone)                                      │
│ Multi-Exchange Aggregation │ TimescaleDB │ Redis Cache        │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ Core DataProvider (core/data_provider.py)                     │
│ Automatic Maintenance │ Williams Pivots │ COB Integration     │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ StandardizedDataProvider (core/standardized_data_provider.py) │
│ BaseDataInput │ ModelOutputManager │ Unified Interface        │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│ Models (CNN, RL, etc.)                                        │
└───────────────────────────────────────────────────────────────┘
```

#### Layer 1: COBY System (Multi-Exchange Aggregation)

**Purpose**: Standalone system for comprehensive multi-exchange data collection and storage

**Key Components**:
- **Exchange Connectors**: Binance, Coinbase, Kraken, Huobi, Bitfinex, KuCoin
- **TimescaleDB Storage**: Optimized time-series data persistence
- **Redis Caching**: High-performance data caching layer
- **REST API**: HTTP endpoints for data access
- **WebSocket Server**: Real-time data distribution
- **Monitoring**: Performance metrics, memory monitoring, health checks

**Data Models**:
- `OrderBookSnapshot`: Standardized order book data
- `TradeEvent`: Individual trade events
- `PriceBuckets`: Aggregated price bucket data
- `HeatmapData`: Visualization-ready heatmap data
- `ConnectionStatus`: Exchange connection monitoring

**Current Status**: ✅ Fully implemented and operational

#### Layer 2: Core DataProvider (Real-Time Trading Operations)

**Purpose**: High-performance real-time data provider for trading operations

**Key Classes**:
- **DataProvider**: Central class managing data collection, processing, and distribution
- **EnhancedCOBWebSocket**: Real-time Binance WebSocket integration
- **WilliamsMarketStructure**: Recursive pivot point calculation
- **RealTimeTickAggregator**: Tick-to-OHLCV aggregation
- **COBIntegration**: COB data collection and aggregation

**Key Features**:
1. **Automatic Data Maintenance**:
   - Background worker updating data every half-candle period
   - 1500 candles cached per symbol/timeframe
   - Automatic fallback between Binance and MEXC
   - Rate limiting and error handling

2. **Williams Market Structure Pivot Points**:
   - Recursive pivot detection with 5 levels
   - Monthly 1s data analysis for comprehensive context
   - Pivot-based normalization bounds (PivotBounds)
   - Support/resistance level tracking

3. **COB Integration**:
   - EnhancedCOBWebSocket with multiple Binance streams:
     - `depth@100ms`: High-frequency order book updates
     - `ticker`: 24hr statistics and volume
     - `aggTrade`: Large order detection
   - 1s COB aggregation with price buckets ($1 ETH, $10 BTC)
   - Multi-timeframe imbalance MA (1s, 5s, 15s, 60s)
   - 30-minute raw tick buffer (180,000 ticks)

4. **Centralized Data Distribution**:
   - Subscriber management with callbacks
   - Thread-safe data access with locks
   - Performance tracking per subscriber
   - Tick buffers (1000 ticks per symbol)
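
The centralized distribution feature (callbacks, locks, bounded tick buffers) could look roughly like the sketch below. It is not the code in `core/data_provider.py`; `Tick` and `TickDistributor` are hypothetical names used only for illustration, and the 1000-tick default mirrors the buffer size listed above.

```python
import threading
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Tick:
    """Hypothetical, reduced stand-in for MarketTick."""
    symbol: str
    price: float
    volume: float


class TickDistributor:
    """Illustrative only: buffer ticks per symbol and fan them out to subscribers."""

    def __init__(self, buffer_size: int = 1000) -> None:
        self._lock = threading.Lock()
        self._subscribers: Dict[str, Callable[[Tick], None]] = {}
        # deque(maxlen=...) silently drops the oldest tick once the buffer is full
        self._buffers: Dict[str, Deque[Tick]] = defaultdict(
            lambda: deque(maxlen=buffer_size)
        )

    def subscribe(self, name: str, callback: Callable[[Tick], None]) -> None:
        with self._lock:
            self._subscribers[name] = callback

    def publish(self, tick: Tick) -> None:
        with self._lock:
            self._buffers[tick.symbol].append(tick)
            callbacks = list(self._subscribers.values())
        # Invoke callbacks outside the lock so a slow subscriber cannot block writers
        for callback in callbacks:
            callback(tick)

    def recent(self, symbol: str) -> List[Tick]:
        with self._lock:
            return list(self._buffers[symbol])
```

Copying the callback list while holding the lock and calling it afterwards is one common way to keep the critical section short; the real implementation may make a different trade-off.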

**Data Structures**:
- `MarketTick`: Standardized tick data
- `PivotBounds`: Pivot-based normalization bounds
- `DataSubscriber`: Subscriber information
- `SimplePivotLevel`: Fallback pivot structure
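
As a rough illustration of recursive pivot detection and pivot-based normalization, the sketch below treats level 1 as swing highs/lows of the raw prices and detects each higher level on the previous level's pivots. This is one simplified reading of the idea, not the `WilliamsMarketStructure` implementation; `find_pivots`, `recursive_pivots`, `normalize`, and the `strength` parameter are all assumptions made for the example.

```python
from typing import List, Tuple

# (index in the original series, price, "high" | "low")
Pivot = Tuple[int, float, str]


def find_pivots(points: List[Tuple[int, float]], strength: int = 2) -> List[Pivot]:
    """Mark a point as a swing high/low if it is the strict extreme
    among `strength` neighbours on each side."""
    pivots: List[Pivot] = []
    for i in range(strength, len(points) - strength):
        idx, price = points[i]
        window = points[i - strength:i] + points[i + 1:i + 1 + strength]
        neighbours = [p for _, p in window]
        if all(price > n for n in neighbours):
            pivots.append((idx, price, "high"))
        elif all(price < n for n in neighbours):
            pivots.append((idx, price, "low"))
    return pivots


def recursive_pivots(prices: List[float], levels: int = 5,
                     strength: int = 2) -> List[List[Pivot]]:
    """Level 1 uses raw prices; each higher level runs on the previous level's pivots."""
    points = list(enumerate(prices))
    result: List[List[Pivot]] = []
    for _ in range(levels):
        level = find_pivots(points, strength)
        if not level:
            break  # not enough structure left for another level
        result.append(level)
        points = [(idx, price) for idx, price, _ in level]
    return result


def normalize(price: float, lower: float, upper: float) -> float:
    """Scale a price into [0, 1] relative to pivot-derived bounds."""
    if upper <= lower:
        return 0.5  # degenerate bounds: fall back to the midpoint
    return (price - lower) / (upper - lower)
```

With short series the higher levels run out of pivots quickly, which is presumably why the design analyzes a month of 1s data for context.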

**Current Status**: ✅ Fully implemented with ongoing enhancements

#### Layer 3: StandardizedDataProvider (Unified Model Interface)

**Purpose**: Provide standardized, validated data in unified format for all models

**Key Classes**:
- **StandardizedDataProvider**: Extends DataProvider with unified interface
- **ModelOutputManager**: Centralized storage for cross-model feeding
- **BaseDataInput**: Standardized input format for all models
- **COBData**: Comprehensive COB data structure
- **ModelOutput**: Extensible output format

**Key Features**:
1. **Unified Data Format (BaseDataInput)**:
   ```python
   @dataclass
   class BaseDataInput:
       symbol: str
       timestamp: datetime
       ohlcv_1s: List[OHLCVBar]      # 300 frames
       ohlcv_1m: List[OHLCVBar]      # 300 frames
       ohlcv_1h: List[OHLCVBar]      # 300 frames
       ohlcv_1d: List[OHLCVBar]      # 300 frames
       btc_ohlcv_1s: List[OHLCVBar]  # 300 frames
       cob_data: Optional[COBData]
       technical_indicators: Dict[str, float]
       pivot_points: List[PivotPoint]
       last_predictions: Dict[str, ModelOutput]
       market_microstructure: Dict[str, Any]
   ```

2. **COB Data Structure**:
   - ±20 price buckets around current price
   - Bid/ask volumes and imbalances per bucket
   - MA (1s, 5s, 15s, 60s) of imbalances for ±5 buckets
   - Volume-weighted prices within buckets
   - Order flow metrics

3. **Model Output Management**:
   - Extensible ModelOutput format supporting all model types
   - Cross-model feeding with hidden states
   - Historical output storage (1000 entries)
   - Efficient query by model_name, symbol, timestamp

4. **Data Validation**:
   - Minimum 100 frames per timeframe
   - Non-null COB data validation
   - Data completeness scoring
   - Validation before model inference
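
The validation rules above can be pictured with a small sketch. The 100-frame minimum and the 300-frame window come from this document; the function names and the scoring formula (average fill ratio per timeframe) are assumptions, since the source does not define how completeness is scored.

```python
from typing import Dict, List, Optional, Sequence

MIN_FRAMES = 100     # minimum frames per timeframe, from the list above
TARGET_FRAMES = 300  # full window size used by BaseDataInput


def completeness_score(frames: Dict[str, Sequence[object]]) -> float:
    """Assumed scoring rule: average fill ratio across timeframes, capped at 1.0."""
    if not frames:
        return 0.0
    ratios = [min(len(bars) / TARGET_FRAMES, 1.0) for bars in frames.values()]
    return sum(ratios) / len(ratios)


def validate_input(frames: Dict[str, Sequence[object]],
                   cob_data: Optional[object]) -> List[str]:
    """Return human-readable problems; an empty list means the input passes."""
    errors: List[str] = []
    for timeframe, bars in frames.items():
        if len(bars) < MIN_FRAMES:
            errors.append(
                f"{timeframe}: {len(bars)} frames, need at least {MIN_FRAMES}"
            )
    if cob_data is None:
        errors.append("cob_data is missing")
    return errors
```

Returning a list of messages rather than a bare boolean makes it easier to report which timeframe failed before inference is skipped.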

**Current Status**: ✅ Implemented with enhancements needed for heatmap integration

#### Implementation Details

The DataProvider class will:
- Collect data from multiple sources (Binance, MEXC)
- Support multiple timeframes (1s, 1m, 1h, 1d)
- Support multiple symbols (ETH, BTC)
- Calculate technical indicators
- Identify pivot points
- Normalize data
- Distribute data to subscribers
- Perform any other algorithmic manipulations/calculations on the data
- Cache up to 3x the model input data (300 OHLCV frames, etc.) so that proper backtesting can be done in up to 2x time in the future
**Existing Strengths**:
- ✅ Robust automatic data maintenance with background workers
- ✅ Williams Market Structure with 5-level pivot analysis
- ✅ Real-time COB streaming with multiple Binance streams
- ✅ Thread-safe data access and subscriber management
- ✅ Comprehensive error handling and fallback mechanisms
- ✅ Pivot-based normalization for improved model training
- ✅ Centralized model output storage for cross-feeding

Based on the existing implementation in `core/data_provider.py`, we'll enhance it to:
- Improve pivot point calculation using recursive Williams Market Structure
- Optimize data caching for better performance
- Enhance real-time data streaming
- Implement better error handling and fallback mechanisms
**Areas for Enhancement**:
- ❌ Unified integration between COBY and core DataProvider
- ❌ COB heatmap matrix generation for model inputs
- ❌ Configurable price ranges for COB imbalance calculation
- ❌ Comprehensive data quality scoring and monitoring
- ❌ Missing data interpolation strategies
- ❌ Enhanced validation with detailed error reporting

### BASE FOR ALL MODELS ###
- ***INPUTS***: COB+OHLCV data frame as described:
  - OHLCV: 300 frames each of (1s, 1m, 1h, 1d) ETH + 300 frames of 1s BTC
  - COB: for each 1s OHLCV frame we have ±20 buckets of COB amounts in USD
  - 1s, 5s, 15s, and 60s MA of the COB imbalance, computed over ±5 COB buckets
- ***OUTPUTS***:
  - suggested trade action (BUY/SELL/HOLD), paired with a confidence
  - immediate price movement direction vector (-1: vertical down, 1: vertical up, 0: horizontal), linear, with its own confidence
### Standardized Model Input/Output Format

#### Base Input Format (BaseDataInput)

All models receive data through `StandardizedDataProvider.get_base_data_input()` which returns:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional

import numpy as np


@dataclass
class BaseDataInput:
    """Unified base data input for all models"""
    symbol: str          # Primary symbol (e.g., 'ETH/USDT')
    timestamp: datetime  # Current timestamp

    # Standardized input for all models:
    # {
    #     'primary_symbol': 'ETH/USDT',
    #     'reference_symbol': 'BTC/USDT',
    #     'eth_data': {'ETH_1s': df, 'ETH_1m': df, 'ETH_1h': df, 'ETH_1d': df},
    #     'btc_data': {'BTC_1s': df},
    #     'current_prices': {'ETH': price, 'BTC': price},
    #     'data_completeness': {...}
    # }

    # OHLCV Data (300 frames each)
    ohlcv_1s: List[OHLCVBar]      # 300 x 1-second bars
    ohlcv_1m: List[OHLCVBar]      # 300 x 1-minute bars
    ohlcv_1h: List[OHLCVBar]      # 300 x 1-hour bars
    ohlcv_1d: List[OHLCVBar]      # 300 x 1-day bars
    btc_ohlcv_1s: List[OHLCVBar]  # 300 x 1-second BTC bars

    # COB Data
    cob_data: Optional[COBData]   # COB with ±20 buckets + MA

    # Technical Analysis
    technical_indicators: Dict[str, float]  # RSI, MACD, Bollinger, etc.
    pivot_points: List[PivotPoint]          # Williams Market Structure pivots

    # Cross-Model Feeding
    last_predictions: Dict[str, ModelOutput]  # Outputs from all models

    # Market Microstructure
    market_microstructure: Dict[str, Any]  # Order flow, liquidity, etc.

    # Optional: COB Heatmap (for visualization and advanced models)
    cob_heatmap_times: Optional[List[datetime]]  # Heatmap time axis
    cob_heatmap_prices: Optional[List[float]]    # Heatmap price axis
    cob_heatmap_values: Optional[np.ndarray]     # Heatmap matrix (time x price)
```

**OHLCVBar Structure**:
```python
@dataclass
class OHLCVBar:
    symbol: str
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
    timeframe: str
    indicators: Dict[str, float]  # Technical indicators for this bar
```

**COBData Structure**:
```python
@dataclass
class COBData:
    symbol: str
    timestamp: datetime
    current_price: float
    bucket_size: float  # $1 for ETH, $10 for BTC

    # Price Buckets (±20 around current price)
    price_buckets: Dict[float, Dict[str, float]]  # {price: {bid_vol, ask_vol, ...}}
    bid_ask_imbalance: Dict[float, float]         # {price: imbalance_ratio}
    volume_weighted_prices: Dict[float, float]    # {price: VWAP}

    # Moving Averages of Imbalance (±5 buckets)
    ma_1s_imbalance: Dict[float, float]   # 1-second MA
    ma_5s_imbalance: Dict[float, float]   # 5-second MA
    ma_15s_imbalance: Dict[float, float]  # 15-second MA
    ma_60s_imbalance: Dict[float, float]  # 60-second MA

    # Order Flow Metrics
    order_flow_metrics: Dict[str, float]  # Aggressive buy/sell ratios, etc.
```
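
To make the bucket and imbalance fields more concrete, here is a minimal sketch of how order book levels might be grouped into ±20 price buckets. The helper names, the `(price, usd_volume, side)` input shape, and the imbalance formula `(bid - ask) / (bid + ask)` are assumptions for illustration; the source does not specify how `bid_ask_imbalance` is computed.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bucket_price(price: float, bucket_size: float) -> float:
    """Lower edge of the bucket containing `price` (e.g. 2999.4 -> 2999.0 for $1 buckets)."""
    return math.floor(price / bucket_size) * bucket_size


def aggregate_buckets(
    levels: Iterable[Tuple[float, float, str]],  # (price, usd_volume, "bid" | "ask")
    current_price: float,
    bucket_size: float,
    n_buckets: int = 20,
) -> Dict[float, Dict[str, float]]:
    """Sum USD volume per bucket, keeping only buckets within ±n_buckets of the price."""
    center = bucket_price(current_price, bucket_size)
    lower = center - n_buckets * bucket_size
    upper = center + n_buckets * bucket_size
    buckets: Dict[float, Dict[str, float]] = defaultdict(
        lambda: {"bid_vol": 0.0, "ask_vol": 0.0}
    )
    for price, usd_volume, side in levels:
        bucket = bucket_price(price, bucket_size)
        if lower <= bucket <= upper:
            key = "bid_vol" if side == "bid" else "ask_vol"
            buckets[bucket][key] += usd_volume
    return dict(buckets)


def imbalance(bid_vol: float, ask_vol: float) -> float:
    """Assumed definition: +1 means all bids, -1 means all asks, 0 means balanced."""
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total
```

The moving-average fields would then be rolling means of `imbalance(...)` over the last 1, 5, 15, and 60 one-second snapshots.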

#### Base Output Format (ModelOutput)

All models output predictions through standardized `ModelOutput`:

```python
@dataclass
class ModelOutput:
    """Extensible model output format supporting all model types"""
    model_type: str      # 'cnn', 'rl', 'lstm', 'transformer'
    model_name: str      # Specific model identifier
    symbol: str
    timestamp: datetime
    confidence: float    # Overall confidence (0.0 to 1.0)

    # Model-Specific Predictions
    predictions: Dict[str, Any]  # Flexible prediction format

    # Cross-Model Feeding
    hidden_states: Optional[Dict[str, Any]]  # For feeding to other models

    # Extensibility
    metadata: Dict[str, Any]  # Additional model-specific info
```

**Standard Prediction Fields**:
- `action`: 'BUY', 'SELL', or 'HOLD'
- `action_confidence`: Confidence in the action (0.0 to 1.0)
- `direction_vector`: Price movement direction (-1.0 to 1.0)
- `direction_confidence`: Confidence in direction (0.0 to 1.0)
- `probabilities`: Dict of action probabilities {'BUY': 0.3, 'SELL': 0.2, 'HOLD': 0.5}
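
A consumer of `ModelOutput.predictions` may want to check these fields before trusting them. The sketch below is a hypothetical helper, not part of the described codebase; it assumes the first four fields are required and that `probabilities`, when present, should sum to 1.

```python
from typing import Any, Dict, List

ACTIONS = ("BUY", "SELL", "HOLD")


def check_predictions(pred: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the standard prediction fields (empty if none)."""
    problems: List[str] = []
    if pred.get("action") not in ACTIONS:
        problems.append("action must be one of BUY, SELL, HOLD")
    for key in ("action_confidence", "direction_confidence"):
        value = pred.get(key)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be in [0.0, 1.0]")
    direction = pred.get("direction_vector")
    if not isinstance(direction, (int, float)) or not -1.0 <= direction <= 1.0:
        problems.append("direction_vector must be in [-1.0, 1.0]")
    probs = pred.get("probabilities")
    # Tolerate small floating-point error when checking the probability sum
    if probs is not None and abs(sum(probs.values()) - 1.0) > 1e-6:
        problems.append("probabilities must sum to 1.0")
    return problems
```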

**Example CNN Output**:
```python
ModelOutput(
    model_type='cnn',
    model_name='williams_cnn_v2',
    symbol='ETH/USDT',
    timestamp=datetime.now(),
    confidence=0.85,
    predictions={
        'action': 'BUY',
        'action_confidence': 0.85,
        'pivot_points': [...],      # Predicted pivot points
        'direction_vector': 0.7,    # Upward movement
        'direction_confidence': 0.82
    },
    hidden_states={
        'conv_features': tensor(...),
        'lstm_hidden': tensor(...)
    },
    metadata={'model_version': '2.1', 'training_date': '2025-01-08'}
)
```

**Example RL Output**:
```python
ModelOutput(
    model_type='rl',
    model_name='dqn_agent_v1',
    symbol='ETH/USDT',
    timestamp=datetime.now(),
    confidence=0.78,
    predictions={
        'action': 'HOLD',
        'action_confidence': 0.78,
        'q_values': {'BUY': 0.45, 'SELL': 0.32, 'HOLD': 0.78},
        'expected_reward': 0.023,
        'direction_vector': 0.1,
        'direction_confidence': 0.65
    },
    hidden_states={
        'state_value': 0.56,
        'advantage_values': [0.12, -0.08, 0.22]
    },
    metadata={'epsilon': 0.1, 'replay_buffer_size': 10000}
)
```
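
Outputs like these feed back into `BaseDataInput.last_predictions`. The following sketch shows one way a store for cross-model feeding might work. `Output` and `OutputStore` are hypothetical, reduced stand-ins for `ModelOutput` and `ModelOutputManager`, and keeping 1000 entries per (symbol, model) pair is an assumption about how the 1000-entry history is scoped.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Deque, Dict, Optional, Tuple


@dataclass
class Output:
    """Hypothetical, reduced stand-in for ModelOutput."""
    model_name: str
    symbol: str
    timestamp: datetime
    confidence: float
    predictions: Dict[str, Any] = field(default_factory=dict)


class OutputStore:
    """Illustrative only: bounded history keyed by (symbol, model_name)."""

    def __init__(self, max_history: int = 1000) -> None:
        self._history: Dict[Tuple[str, str], Deque[Output]] = defaultdict(
            lambda: deque(maxlen=max_history)
        )

    def store(self, output: Output) -> None:
        self._history[(output.symbol, output.model_name)].append(output)

    def latest(self, symbol: str, model_name: str) -> Optional[Output]:
        history = self._history.get((symbol, model_name))
        return history[-1] if history else None

    def last_predictions(self, symbol: str) -> Dict[str, Output]:
        """Latest output of every model for a symbol, shaped like `last_predictions`."""
        return {
            model: history[-1]
            for (sym, model), history in self._history.items()
            if sym == symbol and history
        }
```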

### 2. CNN Model
