update requirements
@@ -37,54 +37,326 @@ graph TD
## Components and Interfaces

### 1. Data Provider Backbone - Multi-Layered Architecture

The Data Provider backbone is the foundation of the system, implemented as a multi-layered architecture with clear separation of concerns:

#### Architecture Layers

```
┌─────────────────────────────────────────────────────────────┐
│                   COBY System (Standalone)                   │
│    Multi-Exchange Aggregation │ TimescaleDB │ Redis Cache    │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│          Core DataProvider (core/data_provider.py)           │
│  Automatic Maintenance │ Williams Pivots │ COB Integration   │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ StandardizedDataProvider (core/standardized_data_provider.py)│
│   BaseDataInput │ ModelOutputManager │ Unified Interface     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                    Models (CNN, RL, etc.)                    │
└─────────────────────────────────────────────────────────────┘
```
#### Layer 1: COBY System (Multi-Exchange Aggregation)

**Purpose**: Standalone system for comprehensive multi-exchange data collection and storage

**Key Components**:
- **Exchange Connectors**: Binance, Coinbase, Kraken, Huobi, Bitfinex, KuCoin
- **TimescaleDB Storage**: Optimized time-series data persistence
- **Redis Caching**: High-performance data caching layer
- **REST API**: HTTP endpoints for data access
- **WebSocket Server**: Real-time data distribution
- **Monitoring**: Performance metrics, memory monitoring, health checks

**Data Models**:
- `OrderBookSnapshot`: Standardized order book data (see the sketch below)
- `TradeEvent`: Individual trade events
- `PriceBuckets`: Aggregated price bucket data
- `HeatmapData`: Visualization-ready heatmap data
- `ConnectionStatus`: Exchange connection monitoring

**Current Status**: ✅ Fully implemented and operational
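To make the data models above more concrete, here is a minimal sketch of what an `OrderBookSnapshot` could look like. The class name comes from the list above; the field names, types, and `mid_price` helper are illustrative assumptions rather than the actual COBY definitions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class OrderBookSnapshot:
    """Hypothetical standardized order book snapshot (field names are illustrative)."""
    exchange: str      # e.g. 'binance'
    symbol: str        # e.g. 'ETH/USDT'
    timestamp: datetime
    bids: List[Tuple[float, float]] = field(default_factory=list)  # (price, size), best bid first
    asks: List[Tuple[float, float]] = field(default_factory=list)  # (price, size), best ask first

    @property
    def mid_price(self) -> float:
        """Mid price from the best bid/ask; 0.0 if either side is empty."""
        if not self.bids or not self.asks:
            return 0.0
        return (self.bids[0][0] + self.asks[0][0]) / 2.0
```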
#### Layer 2: Core DataProvider (Real-Time Trading Operations)

**Purpose**: High-performance real-time data provider for trading operations

**Key Classes**:
- **DataProvider**: Central class managing data collection, processing, and distribution
- **EnhancedCOBWebSocket**: Real-time Binance WebSocket integration
- **WilliamsMarketStructure**: Recursive pivot point calculation
- **RealTimeTickAggregator**: Tick-to-OHLCV aggregation
- **COBIntegration**: COB data collection and aggregation

**Key Features**:

1. **Automatic Data Maintenance**:
   - Background worker updating data every half-candle period
   - 1500 candles cached per symbol/timeframe
   - Automatic fallback between Binance and MEXC
   - Rate limiting and error handling

2. **Williams Market Structure Pivot Points**:
   - Recursive pivot detection with 5 levels
   - Monthly 1s data analysis for comprehensive context
   - Pivot-based normalization bounds (PivotBounds)
   - Support/resistance level tracking

3. **COB Integration**:
   - EnhancedCOBWebSocket with multiple Binance streams:
     - `depth@100ms`: High-frequency order book updates
     - `ticker`: 24hr statistics and volume
     - `aggTrade`: Large order detection
   - 1s COB aggregation with price buckets ($1 ETH, $10 BTC)
   - Multi-timeframe imbalance MA (1s, 5s, 15s, 60s)
   - 30-minute raw tick buffer (180,000 ticks)

4. **Centralized Data Distribution** (see the sketch after this section):
   - Subscriber management with callbacks
   - Thread-safe data access with locks
   - Performance tracking per subscriber
   - Tick buffers (1000 ticks per symbol)

**Data Structures**:
- `MarketTick`: Standardized tick data
- `PivotBounds`: Pivot-based normalization bounds
- `DataSubscriber`: Subscriber information
- `SimplePivotLevel`: Fallback pivot structure

**Current Status**: ✅ Fully implemented with ongoing enhancements
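To illustrate the centralized data distribution feature, the following is a minimal sketch of callback-based, thread-safe subscriber management as described above. The `MarketTick` fields and the `subscribe_to_ticks`/`publish` method names are assumptions for illustration, not the confirmed DataProvider API.

```python
import threading
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass
class MarketTick:
    """Simplified stand-in for the DataProvider's standardized tick structure."""
    symbol: str
    timestamp: datetime
    price: float
    volume: float


class TickDistributor:
    """Sketch of callback-based, thread-safe tick distribution (assumed design)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: Dict[str, List[Callable[[MarketTick], None]]] = {}

    def subscribe_to_ticks(self, symbol: str, callback: Callable[[MarketTick], None]) -> None:
        with self._lock:
            self._subscribers.setdefault(symbol, []).append(callback)

    def publish(self, tick: MarketTick) -> None:
        with self._lock:
            callbacks = list(self._subscribers.get(tick.symbol, []))
        for cb in callbacks:  # call outside the lock so slow subscribers don't block producers
            cb(tick)


# Usage: a model or dashboard registers a callback and receives ticks as they arrive.
distributor = TickDistributor()
distributor.subscribe_to_ticks('ETH/USDT', lambda t: print(t.symbol, t.price))
distributor.publish(MarketTick('ETH/USDT', datetime.now(), 3120.5, 0.25))
```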
#### Layer 3: StandardizedDataProvider (Unified Model Interface)

**Purpose**: Provide standardized, validated data in a unified format for all models

**Key Classes**:
- **StandardizedDataProvider**: Extends DataProvider with a unified interface
- **ModelOutputManager**: Centralized storage for cross-model feeding
- **BaseDataInput**: Standardized input format for all models
- **COBData**: Comprehensive COB data structure
- **ModelOutput**: Extensible output format

**Key Features**:

1. **Unified Data Format (BaseDataInput)**:

   ```python
   @dataclass
   class BaseDataInput:
       symbol: str
       timestamp: datetime
       ohlcv_1s: List[OHLCVBar]      # 300 frames
       ohlcv_1m: List[OHLCVBar]      # 300 frames
       ohlcv_1h: List[OHLCVBar]      # 300 frames
       ohlcv_1d: List[OHLCVBar]      # 300 frames
       btc_ohlcv_1s: List[OHLCVBar]  # 300 frames
       cob_data: Optional[COBData]
       technical_indicators: Dict[str, float]
       pivot_points: List[PivotPoint]
       last_predictions: Dict[str, ModelOutput]
       market_microstructure: Dict[str, Any]
   ```

2. **COB Data Structure**:
   - ±20 price buckets around the current price
   - Bid/ask volumes and imbalances per bucket
   - MA (1s, 5s, 15s, 60s) of imbalances for ±5 buckets
   - Volume-weighted prices within buckets
   - Order flow metrics

3. **Model Output Management**:
   - Extensible ModelOutput format supporting all model types
   - Cross-model feeding with hidden states
   - Historical output storage (1000 entries)
   - Efficient queries by model_name, symbol, and timestamp

4. **Data Validation** (see the sketch after this section):
   - Minimum 100 frames per timeframe
   - Non-null COB data validation
   - Data completeness scoring
   - Validation before model inference

**Current Status**: ✅ Implemented, with enhancements needed for heatmap integration
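A minimal sketch of the pre-inference validation described in the Data Validation feature, assuming a hypothetical `validate_base_data_input` helper. The thresholds (at least 100 frames per timeframe, non-null COB data) come from the design above; the function itself and its return shape are illustrative.

```python
from typing import List, Tuple


def validate_base_data_input(base_input) -> Tuple[bool, List[str]]:
    """Return (is_valid, problems) for a BaseDataInput-like object.

    Illustrative only: the checks mirror the rules above (>= 100 frames per
    timeframe, non-null COB data), but the real implementation may differ.
    """
    problems: List[str] = []

    for attr in ('ohlcv_1s', 'ohlcv_1m', 'ohlcv_1h', 'ohlcv_1d', 'btc_ohlcv_1s'):
        frames = getattr(base_input, attr, None) or []
        if len(frames) < 100:
            problems.append(f'{attr}: only {len(frames)} frames, need >= 100')

    if getattr(base_input, 'cob_data', None) is None:
        problems.append('cob_data is missing')

    return (len(problems) == 0, problems)


# Usage: refuse to run inference on incomplete data instead of padding with synthetic values.
# ok, problems = validate_base_data_input(base_data)
# if not ok:
#     logger.warning('Skipping inference: %s', problems)
```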
#### Implementation Details

**Existing Strengths**:
- ✅ Robust automatic data maintenance with background workers
- ✅ Williams Market Structure with 5-level pivot analysis
- ✅ Real-time COB streaming with multiple Binance streams
- ✅ Thread-safe data access and subscriber management
- ✅ Comprehensive error handling and fallback mechanisms
- ✅ Pivot-based normalization for improved model training
- ✅ Centralized model output storage for cross-feeding

**Areas for Enhancement**:
- ❌ Unified integration between COBY and core DataProvider
- ❌ COB heatmap matrix generation for model inputs
- ❌ Configurable price ranges for COB imbalance calculation
- ❌ Comprehensive data quality scoring and monitoring
- ❌ Missing-data interpolation strategies
- ❌ Enhanced validation with detailed error reporting
### Standardized Model Input/Output Format

#### Base Input Format (BaseDataInput)

All models receive data through `StandardizedDataProvider.get_base_data_input()`, which returns:

```python
@dataclass
class BaseDataInput:
    """Unified base data input for all models"""
    symbol: str                                   # Primary symbol (e.g., 'ETH/USDT')
    timestamp: datetime                           # Current timestamp

    # OHLCV Data (300 frames each)
    ohlcv_1s: List[OHLCVBar]                      # 300 x 1-second bars
    ohlcv_1m: List[OHLCVBar]                      # 300 x 1-minute bars
    ohlcv_1h: List[OHLCVBar]                      # 300 x 1-hour bars
    ohlcv_1d: List[OHLCVBar]                      # 300 x 1-day bars
    btc_ohlcv_1s: List[OHLCVBar]                  # 300 x 1-second BTC bars

    # COB Data
    cob_data: Optional[COBData]                   # COB with ±20 buckets + MA

    # Technical Analysis
    technical_indicators: Dict[str, float]        # RSI, MACD, Bollinger, etc.
    pivot_points: List[PivotPoint]                # Williams Market Structure pivots

    # Cross-Model Feeding
    last_predictions: Dict[str, ModelOutput]      # Outputs from all models

    # Market Microstructure
    market_microstructure: Dict[str, Any]         # Order flow, liquidity, etc.

    # Optional: COB Heatmap (for visualization and advanced models)
    cob_heatmap_times: Optional[List[datetime]]   # Heatmap time axis
    cob_heatmap_prices: Optional[List[float]]     # Heatmap price axis
    cob_heatmap_values: Optional[np.ndarray]      # Heatmap matrix (time x price)
```
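For orientation, a short sketch of how a model wrapper might consume this interface. `get_base_data_input()` is named in the design above; the failure behaviour (returning None) follows the validation requirements, and the `predict`/`store_output` calls are hypothetical placeholders rather than the actual API.

```python
# Sketch only: assumes StandardizedDataProvider and a model object already exist.
def run_inference_once(provider, model, symbol: str = 'ETH/USDT'):
    """Fetch a unified BaseDataInput and run one inference pass if the data is complete."""
    base_data = provider.get_base_data_input(symbol)    # named in the design; signature assumed

    if base_data is None or base_data.cob_data is None:
        # Per the design, never substitute synthetic data -- skip inference instead.
        return None

    output = model.predict(base_data)                    # hypothetical model API
    provider.model_output_manager.store_output(output)   # hypothetical storage call
    return output
```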
**OHLCVBar Structure**:
```python
@dataclass
class OHLCVBar:
    symbol: str
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
    timeframe: str
    indicators: Dict[str, float]  # Technical indicators for this bar
```

**COBData Structure**:
```python
@dataclass
class COBData:
    symbol: str
    timestamp: datetime
    current_price: float
    bucket_size: float  # $1 for ETH, $10 for BTC

    # Price Buckets (±20 around current price)
    price_buckets: Dict[float, Dict[str, float]]  # {price: {bid_vol, ask_vol, ...}}
    bid_ask_imbalance: Dict[float, float]         # {price: imbalance_ratio}
    volume_weighted_prices: Dict[float, float]    # {price: VWAP}

    # Moving Averages of Imbalance (±5 buckets)
    ma_1s_imbalance: Dict[float, float]   # 1-second MA
    ma_5s_imbalance: Dict[float, float]   # 5-second MA
    ma_15s_imbalance: Dict[float, float]  # 15-second MA
    ma_60s_imbalance: Dict[float, float]  # 60-second MA

    # Order Flow Metrics
    order_flow_metrics: Dict[str, float]  # Aggressive buy/sell ratios, etc.
```
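The sketch below shows one way the ±20 price buckets and per-bucket bid/ask imbalance could be derived from raw order book levels. The bucket sizes ($1 for ETH, $10 for BTC) and the ±20 range come from the design; the helper names and the simple sum-based imbalance formula are assumptions.

```python
from typing import Dict, List, Tuple


def build_price_buckets(
    current_price: float,
    bids: List[Tuple[float, float]],   # (price, volume) levels
    asks: List[Tuple[float, float]],
    bucket_size: float = 1.0,          # $1 for ETH, $10 for BTC (from the design)
    num_buckets: int = 20,             # ±20 buckets around the current price
) -> Dict[float, Dict[str, float]]:
    """Aggregate raw order book levels into fixed-size price buckets (illustrative).

    Buckets are keyed by price, matching COBData.price_buckets; whole-dollar
    bucket sizes keep the float keys exact.
    """
    center = round(current_price / bucket_size) * bucket_size
    buckets = {
        center + i * bucket_size: {'bid_volume': 0.0, 'ask_volume': 0.0}
        for i in range(-num_buckets, num_buckets + 1)
    }
    for price, volume in bids:
        key = round(price / bucket_size) * bucket_size
        if key in buckets:
            buckets[key]['bid_volume'] += volume
    for price, volume in asks:
        key = round(price / bucket_size) * bucket_size
        if key in buckets:
            buckets[key]['ask_volume'] += volume
    return buckets


def bucket_imbalance(bucket: Dict[str, float]) -> float:
    """Simple imbalance in [-1, 1]: positive = bid-heavy, negative = ask-heavy."""
    total = bucket['bid_volume'] + bucket['ask_volume']
    return 0.0 if total == 0 else (bucket['bid_volume'] - bucket['ask_volume']) / total
```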
#### Base Output Format (ModelOutput)

All models output predictions through the standardized `ModelOutput`:

```python
@dataclass
class ModelOutput:
    """Extensible model output format supporting all model types"""
    model_type: str    # 'cnn', 'rl', 'lstm', 'transformer'
    model_name: str    # Specific model identifier
    symbol: str
    timestamp: datetime
    confidence: float  # Overall confidence (0.0 to 1.0)

    # Model-Specific Predictions
    predictions: Dict[str, Any]  # Flexible prediction format

    # Cross-Model Feeding
    hidden_states: Optional[Dict[str, Any]]  # For feeding to other models

    # Extensibility
    metadata: Dict[str, Any]  # Additional model-specific info
```

**Standard Prediction Fields**:
- `action`: 'BUY', 'SELL', or 'HOLD'
- `action_confidence`: Confidence in the action (0.0 to 1.0)
- `direction_vector`: Price movement direction (-1.0 to 1.0)
- `direction_confidence`: Confidence in the direction (0.0 to 1.0)
- `probabilities`: Dict of action probabilities, e.g. {'BUY': 0.3, 'SELL': 0.2, 'HOLD': 0.5}

**Example CNN Output**:
```python
ModelOutput(
    model_type='cnn',
    model_name='williams_cnn_v2',
    symbol='ETH/USDT',
    timestamp=datetime.now(),
    confidence=0.85,
    predictions={
        'action': 'BUY',
        'action_confidence': 0.85,
        'pivot_points': [...],    # Predicted pivot points
        'direction_vector': 0.7,  # Upward movement
        'direction_confidence': 0.82
    },
    hidden_states={
        'conv_features': tensor(...),
        'lstm_hidden': tensor(...)
    },
    metadata={'model_version': '2.1', 'training_date': '2025-01-08'}
)
```

**Example RL Output**:
```python
ModelOutput(
    model_type='rl',
    model_name='dqn_agent_v1',
    symbol='ETH/USDT',
    timestamp=datetime.now(),
    confidence=0.78,
    predictions={
        'action': 'HOLD',
        'action_confidence': 0.78,
        'q_values': {'BUY': 0.45, 'SELL': 0.32, 'HOLD': 0.78},
        'expected_reward': 0.023,
        'direction_vector': 0.1,
        'direction_confidence': 0.65
    },
    hidden_states={
        'state_value': 0.56,
        'advantage_values': [0.12, -0.08, 0.22]
    },
    metadata={'epsilon': 0.1, 'replay_buffer_size': 10000}
)
```
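To illustrate how these outputs feed back into `BaseDataInput.last_predictions`, here is a minimal sketch of a ModelOutputManager-style store. The class responsibility and the ~1000-entry retention come from the design; the method names and internal layout are assumptions.

```python
from collections import deque
from typing import Deque, Dict, Optional


class ModelOutputManagerSketch:
    """Illustrative store for ModelOutput objects keyed by (model_name, symbol)."""

    def __init__(self, max_history: int = 1000) -> None:  # design mentions ~1000 entries
        self._history: Dict[tuple, Deque] = {}
        self._max_history = max_history

    def store_output(self, output) -> None:
        key = (output.model_name, output.symbol)
        self._history.setdefault(key, deque(maxlen=self._max_history)).append(output)

    def get_latest(self, model_name: str, symbol: str) -> Optional[object]:
        buf = self._history.get((model_name, symbol))
        return buf[-1] if buf else None

    def get_all_latest(self, symbol: str) -> Dict[str, object]:
        """Latest output per model for a symbol -- the shape fed back via last_predictions."""
        latest: Dict[str, object] = {}
        for (model_name, sym), buf in self._history.items():
            if sym == symbol and buf:
                latest[model_name] = buf[-1]
        return latest
```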
### 2. CNN Model

@@ -2,30 +2,150 @@
## Introduction

The Multi-Modal Trading System is an advanced algorithmic trading platform that combines Convolutional Neural Networks (CNN) and Reinforcement Learning (RL) models orchestrated by a decision-making module. The system processes multi-timeframe and multi-symbol market data (primarily ETH and BTC) to generate trading actions.

**Current System Architecture:**
- **COBY System**: Standalone multi-exchange data aggregation system with TimescaleDB storage, Redis caching, and WebSocket distribution
- **Core Data Provider**: Unified data provider (`core/data_provider.py`) with automatic data maintenance, Williams Market Structure pivot points, and COB integration
- **Enhanced COB WebSocket**: Real-time order book streaming (`core/enhanced_cob_websocket.py`) with multiple Binance streams (depth, ticker, aggTrade)
- **Standardized Data Provider**: Extension layer (`core/standardized_data_provider.py`) providing the unified BaseDataInput format for all models
- **Model Output Manager**: Centralized storage for cross-model feeding with the extensible ModelOutput format
- **Orchestrator**: Central coordination hub managing data subscriptions, model inference, and training pipelines

The system is designed to adapt to current market conditions through continuous learning from past experiences, with the CNN module trained on historical data to predict pivot points and the RL module optimizing trading decisions based on these predictions and market data.
## Requirements

### Requirement 1: Data Collection and Processing Backbone

**User Story:** As a trader, I want a robust, multi-layered data collection system that provides real-time and historical market data from multiple sources, so that the models have comprehensive, reliable market information for making accurate trading decisions.

#### Current Implementation Status

**IMPLEMENTED:**
- ✅ Core DataProvider with automatic data maintenance (1500 candles cached per symbol/timeframe)
- ✅ Multi-exchange COB integration via EnhancedCOBWebSocket (Binance depth@100ms, ticker, aggTrade streams)
- ✅ Williams Market Structure pivot point calculation with monthly data analysis
- ✅ Pivot-based normalization system with PivotBounds caching
- ✅ Real-time tick aggregation with RealTimeTickAggregator
- ✅ COB 1s aggregation with price buckets ($1 for ETH, $10 for BTC)
- ✅ Multi-timeframe imbalance calculations (1s, 5s, 15s, 60s MA)
- ✅ Centralized data distribution with subscriber management
- ✅ COBY standalone system with TimescaleDB storage and Redis caching

**PARTIALLY IMPLEMENTED:**
- ⚠️ COB raw tick storage (30 min buffer) - implemented but needs validation
- ⚠️ Training data collection callbacks - structure exists but needs integration
- ⚠️ Cross-exchange COB consolidation - COBY system separate from core

**NEEDS ENHANCEMENT:**
- ❌ Unified integration between COBY and core DataProvider
- ❌ Configurable price range for COB imbalance (currently hardcoded at $5 for ETH, $50 for BTC)
- ❌ COB heatmap matrix generation for model inputs
- ❌ Validation of 600-bar caching for backtesting support

#### Acceptance Criteria

0. NEVER USE GENERATED/SYNTHETIC DATA or mock implementations and UI. If something is not implemented yet, it should be obvious.
1. WHEN the system starts THEN it SHALL initialize both the core DataProvider and the COBY system for comprehensive data coverage.
2. WHEN collecting data THEN the system SHALL maintain in DataProvider:
   - 1500 candles of OHLCV data per timeframe (1s, 1m, 1h, 1d) for ETH and BTC
   - 300 seconds (5 min) of COB 1s aggregated data with price buckets
   - 180,000 raw COB ticks (30 min buffer at ~100 ticks/second)
   - Williams Market Structure pivot points with 5 levels
   - Technical indicators calculated on all timeframes
3. WHEN collecting COB data THEN the system SHALL use EnhancedCOBWebSocket with:
   - Binance depth@100ms stream for high-frequency order book updates
   - Binance ticker stream for 24hr statistics and volume
   - Binance aggTrade stream for large order detection
   - Automatic reconnection with exponential backoff
   - Proper order book synchronization with REST API snapshots
4. WHEN aggregating COB data THEN the system SHALL create 1s buckets with:
   - ±20 price buckets around the current price ($1 for ETH, $10 for BTC)
   - Bid/ask volumes and imbalances per bucket
   - Multi-timeframe MA of imbalances (1s, 5s, 15s, 60s) for ±5 buckets
   - Volume-weighted prices within buckets
5. WHEN processing data THEN the system SHALL calculate Williams Market Structure pivot points using:
   - Recursive pivot detection with configurable min_pivot_distance
   - 5 levels of trend analysis
   - Monthly 1s data for comprehensive analysis
   - Pivot-based normalization bounds for model inputs
6. WHEN new data arrives THEN the system SHALL update caches in real-time with:
   - An automatic data maintenance worker updating every half-candle period
   - Thread-safe access to cached data
   - A subscriber notification system for real-time distribution
7. WHEN normalizing data THEN the system SHALL use pivot-based normalization (see the sketch after this list):
   - PivotBounds derived from Williams Market Structure
   - Price normalization using pivot support/resistance levels
   - Distance calculations to the nearest support/resistance
8. WHEN storing data THEN the system SHALL cache 1500 bars (not 600) to support:
   - Model inputs (300 bars)
   - Backtesting with 3x historical context
   - Prediction outcome validation
9. WHEN distributing data THEN the system SHALL provide centralized access via:
   - StandardizedDataProvider.get_base_data_input() for unified model inputs
   - Subscriber callbacks for real-time updates
   - ModelOutputManager for cross-model feeding
10. WHEN integrating COBY THEN the system SHALL maintain separation:
    - COBY as a standalone multi-exchange aggregation system
    - Core DataProvider for real-time trading operations
    - Future: a unified interface for accessing both systems
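A minimal sketch of the pivot-based normalization referenced in criterion 7, assuming PivotBounds exposes pivot-derived min/max prices and support/resistance levels. The field and function names here are illustrative, not the actual `core/data_provider.py` API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PivotBoundsSketch:
    """Illustrative subset of the pivot-derived bounds described in the design."""
    price_min: float                 # lowest relevant pivot low
    price_max: float                 # highest relevant pivot high
    support_levels: List[float]
    resistance_levels: List[float]


def normalize_price(price: float, bounds: PivotBoundsSketch) -> float:
    """Scale a price into [0, 1] using pivot-derived bounds instead of raw min/max."""
    span = bounds.price_max - bounds.price_min
    if span <= 0:
        return 0.5
    return max(0.0, min(1.0, (price - bounds.price_min) / span))


def distance_to_nearest(price: float, levels: List[float]) -> float:
    """Distance (as a fraction of price) to the nearest support or resistance level."""
    if not levels:
        return 0.0
    return min(abs(price - level) for level in levels) / price
```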
### Requirement 1.1: Standardized Data Provider Architecture

**User Story:** As a model developer, I want a standardized data provider that delivers consistent, validated input data in a unified format, so that all models receive the same high-quality data structure and can be easily extended.

#### Current Implementation Status

**IMPLEMENTED:**
- ✅ StandardizedDataProvider extending the core DataProvider
- ✅ BaseDataInput dataclass with comprehensive fields
- ✅ OHLCVBar, COBData, PivotPoint, ModelOutput dataclasses
- ✅ ModelOutputManager for extensible cross-model feeding
- ✅ COB moving average calculation with thread-safe access
- ✅ Input validation before model inference
- ✅ Live price fetching with multiple fallbacks

**NEEDS ENHANCEMENT:**
- ❌ COB heatmap matrix integration in BaseDataInput
- ❌ Comprehensive data completeness validation
- ❌ Automatic data quality scoring
- ❌ Missing-data interpolation strategies

#### Acceptance Criteria

1. WHEN a model requests data THEN StandardizedDataProvider SHALL return BaseDataInput containing:
   - 300 frames of OHLCV for each timeframe (1s, 1m, 1h, 1d) for the primary symbol
   - 300 frames of 1s OHLCV for the BTC reference symbol
   - COBData with ±20 price buckets and MA (1s, 5s, 15s, 60s) for ±5 buckets
   - Technical indicators dictionary
   - List of PivotPoint objects from Williams Market Structure
   - Dictionary of last predictions from all models (ModelOutput format)
   - Market microstructure data including order flow metrics
2. WHEN BaseDataInput is created THEN it SHALL validate (see the sketch after this list):
   - Minimum 100 frames of data for each required timeframe
   - Non-null COB data with valid price buckets
   - Valid timestamp and symbol
   - Data completeness score > 0.8
3. WHEN COB data is processed THEN the system SHALL calculate:
   - Bid/ask imbalance for each price bucket
   - Moving averages (1s, 5s, 15s, 60s) of imbalance for ±5 buckets around the current price
   - Volume-weighted prices within buckets
   - Order flow metrics (aggressive buy/sell ratios)
4. WHEN models output predictions THEN ModelOutputManager SHALL store:
   - Standardized ModelOutput with model_type, model_name, symbol, timestamp
   - Model-specific predictions dictionary
   - Hidden states for cross-model feeding (optional)
   - Metadata for extensibility
5. WHEN retrieving model outputs THEN the system SHALL provide:
   - Current outputs for all models by symbol
   - Historical outputs with configurable retention (default 1000)
   - Efficient queries by model_name, symbol, and timestamp
6. WHEN data is unavailable THEN the system SHALL:
   - Return None instead of synthetic data
   - Log specific missing components
   - Provide data completeness metrics
   - NOT proceed with model inference on incomplete data
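Criterion 2 above requires a data completeness score greater than 0.8. The sketch below shows one possible way such a score could be computed; the threshold comes from the criterion, while the component weighting is purely an assumption.

```python
def compute_completeness_score(base_input, required_frames: int = 300) -> float:
    """Illustrative completeness score in [0, 1] for a BaseDataInput-like object.

    Each OHLCV series contributes proportionally to how full it is (capped at
    the required frame count); COB data contributes a fixed share. The weights
    are assumptions, not taken from the actual implementation.
    """
    series_names = ('ohlcv_1s', 'ohlcv_1m', 'ohlcv_1h', 'ohlcv_1d', 'btc_ohlcv_1s')
    series_weight = 0.8 / len(series_names)  # OHLCV series share 80% of the score
    cob_weight = 0.2                         # COB presence contributes the remaining 20%

    score = 0.0
    for name in series_names:
        frames = getattr(base_input, name, None) or []
        score += series_weight * min(1.0, len(frames) / required_frames)
    if getattr(base_input, 'cob_data', None) is not None:
        score += cob_weight
    return score


# Per criterion 2, inference would only proceed when the score exceeds 0.8.
```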
### Requirement 2: CNN Model Implementation

@@ -31,7 +31,8 @@

  - Integrate with COB data for enhanced pivot detection
  - _Requirements: 1.5, 2.7_

- [x] 1.4. Optimize real-time data streaming with COB integration
  - Enhance existing WebSocket connections in enhanced_cob_websocket.py
  - Implement 10Hz COB data streaming alongside OHLCV data
  - Add data synchronization across different refresh rates