# Design Document

## Overview

The Multi-Exchange Data Aggregation System is a data collection and processing subsystem that serves as the foundational data layer for the trading orchestrator. The system will collect real-time order book and OHLCV data from the top 10 cryptocurrency exchanges, aggregate it into standardized formats, store it in a TimescaleDB time-series database, and provide both live data feeds and historical replay capabilities.

The system follows a microservices architecture with containerized components, ensuring scalability, maintainability, and seamless integration with the existing trading infrastructure.

We implement the system in the `.\COBY` subfolder for easy integration with the existing codebase.

## Architecture

### High-Level Architecture

```mermaid
graph TB
    subgraph "Exchange Connectors"
        E1[Binance WebSocket]
        E2[Coinbase WebSocket]
        E3[Kraken WebSocket]
        E4[Bybit WebSocket]
        E5[OKX WebSocket]
        E6[Huobi WebSocket]
        E7[KuCoin WebSocket]
        E8[Gate.io WebSocket]
        E9[Bitfinex WebSocket]
        E10[MEXC WebSocket]
    end

    subgraph "Data Processing Layer"
        DP[Data Processor]
        NORM[Data Normalizer]
        AGG[Aggregation Engine]
    end

    subgraph "Storage Layer"
        TSDB[(TimescaleDB)]
        CACHE[Redis Cache]
    end

    subgraph "API Layer"
        LIVE[Live Data API]
        REPLAY[Replay API]
        WEB[Web Dashboard]
    end

    subgraph "Integration Layer"
        ORCH[Orchestrator Interface]
        ADAPTER[Data Adapter]
    end

    E1 --> DP
    E2 --> DP
    E3 --> DP
    E4 --> DP
    E5 --> DP
    E6 --> DP
    E7 --> DP
    E8 --> DP
    E9 --> DP
    E10 --> DP

    DP --> NORM
    NORM --> AGG
    AGG --> TSDB
    AGG --> CACHE

    CACHE --> LIVE
    TSDB --> REPLAY
    LIVE --> WEB
    REPLAY --> WEB

    LIVE --> ADAPTER
    REPLAY --> ADAPTER
    ADAPTER --> ORCH
```

### Component Architecture

The system is organized into several key components:

1. **Exchange Connectors**: WebSocket clients for each exchange
2. **Data Processing Engine**: Normalizes and validates incoming data
3. **Aggregation Engine**: Creates price buckets and heatmaps
4. **Storage Layer**: TimescaleDB for persistence, Redis for caching
5. **API Layer**: REST and WebSocket APIs for data access
6. **Web Dashboard**: Real-time visualization interface
7. **Integration Layer**: Orchestrator-compatible interface

## Components and Interfaces

### Exchange Connector Interface

```python
from __future__ import annotations

from typing import Callable


class ExchangeConnector:
    """Base interface for exchange WebSocket connectors"""

    async def connect(self) -> bool: ...
    async def disconnect(self) -> None: ...
    async def subscribe_orderbook(self, symbol: str) -> None: ...
    async def subscribe_trades(self, symbol: str) -> None: ...
    def get_connection_status(self) -> ConnectionStatus: ...
    def add_data_callback(self, callback: Callable) -> None: ...
```

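To make the interface concrete, here is a minimal sketch of one connector. This is not a definitive implementation: the stream URL, the subscription message shape, and the `run()` dispatch loop are assumptions to verify against Binance's WebSocket documentation, and reconnection is deliberately left to the connection-management logic described under Error Handling.

```python
import json
from typing import Callable, List

import websockets  # third-party dependency: pip install websockets


class BinanceConnector(ExchangeConnector):
    """Sketch of a Binance connector; endpoint and message formats are assumptions."""

    WS_URL = "wss://stream.binance.com:9443/ws"  # assumed public stream endpoint

    def __init__(self) -> None:
        self._ws = None
        self._callbacks: List[Callable] = []

    async def connect(self) -> bool:
        self._ws = await websockets.connect(self.WS_URL)
        return True

    async def disconnect(self) -> None:
        if self._ws is not None:
            await self._ws.close()

    async def subscribe_orderbook(self, symbol: str) -> None:
        # Binance expects lowercase stream names, e.g. "btcusdt@depth".
        request = {"method": "SUBSCRIBE", "params": [f"{symbol.lower()}@depth"], "id": 1}
        await self._ws.send(json.dumps(request))

    def add_data_callback(self, callback: Callable) -> None:
        self._callbacks.append(callback)

    async def run(self) -> None:
        # Hand every raw message to the registered data-processing callbacks.
        async for raw in self._ws:
            message = json.loads(raw)
            for callback in self._callbacks:
                callback(message)
```
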
### Data Processing Interface

```python
from __future__ import annotations

from typing import Dict, Union


class DataProcessor:
    """Processes and normalizes raw exchange data"""

    def normalize_orderbook(self, raw_data: Dict, exchange: str) -> OrderBookSnapshot: ...
    def normalize_trade(self, raw_data: Dict, exchange: str) -> TradeEvent: ...
    def validate_data(self, data: Union[OrderBookSnapshot, TradeEvent]) -> bool: ...
    def calculate_metrics(self, orderbook: OrderBookSnapshot) -> OrderBookMetrics: ...
```

### Aggregation Engine Interface

```python
from __future__ import annotations


class AggregationEngine:
    """Aggregates data into price buckets and heatmaps"""

    def create_price_buckets(self, orderbook: OrderBookSnapshot, bucket_size: float) -> PriceBuckets: ...
    def update_heatmap(self, symbol: str, buckets: PriceBuckets) -> HeatmapData: ...
    def calculate_imbalances(self, orderbook: OrderBookSnapshot) -> ImbalanceMetrics: ...
    def aggregate_across_exchanges(self, symbol: str) -> ConsolidatedOrderBook: ...
```

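The bucketing itself reduces to flooring each price into a fixed-width bin and summing volume. A minimal sketch of `create_price_buckets` (the floor-rounding convention is an assumption; the data classes are defined under Data Models):

```python
import math
from collections import defaultdict


def create_price_buckets(orderbook: "OrderBookSnapshot", bucket_size: float) -> "PriceBuckets":
    """Sketch: floor each price level into a fixed-width bucket and sum its volume."""
    bid_buckets: defaultdict = defaultdict(float)
    ask_buckets: defaultdict = defaultdict(float)
    for level in orderbook.bids:
        bid_buckets[math.floor(level.price / bucket_size) * bucket_size] += level.size
    for level in orderbook.asks:
        ask_buckets[math.floor(level.price / bucket_size) * bucket_size] += level.size
    return PriceBuckets(
        symbol=orderbook.symbol,
        timestamp=orderbook.timestamp,
        bucket_size=bucket_size,
        bid_buckets=dict(bid_buckets),
        ask_buckets=dict(ask_buckets),
    )
```

With the configured sizes, a BTC bid at $50,007 lands in the $50,000 bucket, and an ETH ask at $3,000.40 lands in the $3,000 bucket.
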
### Storage Interface

```python
from __future__ import annotations

from datetime import datetime
from typing import Dict, List


class StorageManager:
    """Manages data persistence and retrieval"""

    async def store_orderbook(self, data: OrderBookSnapshot) -> bool: ...
    async def store_trade(self, data: TradeEvent) -> bool: ...
    async def get_historical_data(self, symbol: str, start: datetime, end: datetime) -> List[Dict]: ...
    async def get_latest_data(self, symbol: str) -> Dict: ...
    def setup_database_schema(self) -> None: ...
```

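A sketch of `store_orderbook` against the schema defined below, assuming asyncpg as the Postgres driver (asyncpg accepts JSONB parameters as JSON-encoded strings by default):

```python
from __future__ import annotations

import json

import asyncpg  # third-party dependency: pip install asyncpg


async def store_orderbook(pool: asyncpg.Pool, snap: OrderBookSnapshot) -> bool:
    """Sketch: write one snapshot into the order_book_snapshots hypertable."""
    await pool.execute(
        """
        INSERT INTO order_book_snapshots (symbol, exchange, timestamp, bids, asks, sequence_id)
        VALUES ($1, $2, $3, $4, $5, $6)
        """,
        snap.symbol,
        snap.exchange,
        snap.timestamp,
        json.dumps([[level.price, level.size] for level in snap.bids]),
        json.dumps([[level.price, level.size] for level in snap.asks]),
        snap.sequence_id,
    )
    return True
```
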
### Replay Interface

```python
from __future__ import annotations

from datetime import datetime


class ReplayManager:
    """Provides historical data replay functionality"""

    def create_replay_session(self, start_time: datetime, end_time: datetime, speed: float) -> str: ...
    async def start_replay(self, session_id: str) -> None: ...
    async def pause_replay(self, session_id: str) -> None: ...
    async def stop_replay(self, session_id: str) -> None: ...
    def get_replay_status(self, session_id: str) -> ReplayStatus: ...
```

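The `speed` parameter scales the recorded inter-arrival gaps. A minimal pacing sketch, assuming rows arrive sorted by timestamp (e.g. from an `ORDER BY timestamp` query in the storage layer):

```python
import asyncio
from datetime import datetime
from typing import Callable, Iterable, Optional


async def replay_rows(rows: Iterable[dict], emit: Callable[[dict], None], speed: float) -> None:
    """Sketch: re-emit historical rows, compressing real time gaps by the speed factor."""
    prev_ts: Optional[datetime] = None
    for row in rows:
        if prev_ts is not None:
            gap_seconds = (row["timestamp"] - prev_ts).total_seconds()
            await asyncio.sleep(max(gap_seconds, 0.0) / speed)  # speed=2.0 replays twice as fast
        prev_ts = row["timestamp"]
        emit(row)
```
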
## Data Models

### Core Data Structures

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class OrderBookSnapshot:
    """Standardized order book snapshot"""
    symbol: str
    exchange: str
    timestamp: datetime
    bids: List[PriceLevel]
    asks: List[PriceLevel]
    sequence_id: Optional[int] = None


@dataclass
class PriceLevel:
    """Individual price level in order book"""
    price: float
    size: float
    count: Optional[int] = None


@dataclass
class TradeEvent:
    """Standardized trade event"""
    symbol: str
    exchange: str
    timestamp: datetime
    price: float
    size: float
    side: str  # 'buy' or 'sell'
    trade_id: str


@dataclass
class PriceBuckets:
    """Aggregated price buckets for heatmap"""
    symbol: str
    timestamp: datetime
    bucket_size: float
    bid_buckets: Dict[float, float]  # price -> volume
    ask_buckets: Dict[float, float]  # price -> volume


@dataclass
class HeatmapData:
    """Heatmap visualization data"""
    symbol: str
    timestamp: datetime
    bucket_size: float
    data: List[HeatmapPoint]


@dataclass
class HeatmapPoint:
    """Individual heatmap data point"""
    price: float
    volume: float
    intensity: float  # 0.0 to 1.0
    side: str  # 'bid' or 'ask'
```

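As a worked example of how `intensity` can be derived, one option is to normalize each bucket's volume by the largest bucket in the snapshot. The normalization choice is an assumption, not a fixed part of the design:

```python
def buckets_to_heatmap(buckets: PriceBuckets) -> HeatmapData:
    """Sketch: convert price buckets into heatmap points with volume-normalized intensity."""
    volumes = list(buckets.bid_buckets.values()) + list(buckets.ask_buckets.values())
    max_volume = max(volumes) if volumes else 1.0
    points = [
        HeatmapPoint(price=price, volume=volume, intensity=volume / max_volume, side="bid")
        for price, volume in buckets.bid_buckets.items()
    ] + [
        HeatmapPoint(price=price, volume=volume, intensity=volume / max_volume, side="ask")
        for price, volume in buckets.ask_buckets.items()
    ]
    return HeatmapData(
        symbol=buckets.symbol,
        timestamp=buckets.timestamp,
        bucket_size=buckets.bucket_size,
        data=points,
    )
```
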
### Database Schema

#### TimescaleDB Tables

```sql
-- Order book snapshots table
CREATE TABLE order_book_snapshots (
    id BIGSERIAL,
    symbol VARCHAR(20) NOT NULL,
    exchange VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    bids JSONB NOT NULL,
    asks JSONB NOT NULL,
    sequence_id BIGINT,
    mid_price DECIMAL(20,8),
    spread DECIMAL(20,8),
    bid_volume DECIMAL(30,8),
    ask_volume DECIMAL(30,8),
    PRIMARY KEY (timestamp, symbol, exchange)
);

-- Convert to hypertable
SELECT create_hypertable('order_book_snapshots', 'timestamp');

-- Trade events table
CREATE TABLE trade_events (
    id BIGSERIAL,
    symbol VARCHAR(20) NOT NULL,
    exchange VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    price DECIMAL(20,8) NOT NULL,
    size DECIMAL(30,8) NOT NULL,
    side VARCHAR(4) NOT NULL,
    trade_id VARCHAR(100) NOT NULL,
    PRIMARY KEY (timestamp, symbol, exchange, trade_id)
);

-- Convert to hypertable
SELECT create_hypertable('trade_events', 'timestamp');

-- Aggregated heatmap data table
CREATE TABLE heatmap_data (
    symbol VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    bucket_size DECIMAL(10,2) NOT NULL,
    price_bucket DECIMAL(20,8) NOT NULL,
    volume DECIMAL(30,8) NOT NULL,
    side VARCHAR(3) NOT NULL,
    exchange_count INTEGER NOT NULL,
    PRIMARY KEY (timestamp, symbol, bucket_size, price_bucket, side)
);

-- Convert to hypertable
SELECT create_hypertable('heatmap_data', 'timestamp');

-- OHLCV data table
CREATE TABLE ohlcv_data (
    symbol VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    timeframe VARCHAR(10) NOT NULL,
    open_price DECIMAL(20,8) NOT NULL,
    high_price DECIMAL(20,8) NOT NULL,
    low_price DECIMAL(20,8) NOT NULL,
    close_price DECIMAL(20,8) NOT NULL,
    volume DECIMAL(30,8) NOT NULL,
    trade_count INTEGER,
    PRIMARY KEY (timestamp, symbol, timeframe)
);

-- Convert to hypertable
SELECT create_hypertable('ohlcv_data', 'timestamp');
```

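Reading these hypertables back is ordinary SQL. A sketch of `get_historical_data` over `trade_events`, again assuming asyncpg; the half-open time range keeps consecutive queries from overlapping:

```python
from datetime import datetime
from typing import Dict, List

import asyncpg


async def get_historical_data(
    pool: asyncpg.Pool, symbol: str, start: datetime, end: datetime
) -> List[Dict]:
    """Sketch: fetch one symbol's trades in [start, end), ordered for replay."""
    rows = await pool.fetch(
        """
        SELECT symbol, exchange, timestamp, price, size, side, trade_id
        FROM trade_events
        WHERE symbol = $1 AND timestamp >= $2 AND timestamp < $3
        ORDER BY timestamp
        """,
        symbol,
        start,
        end,
    )
    return [dict(row) for row in rows]
```
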
## Error Handling

### Connection Management

The system implements robust error handling for exchange connections:

1. **Exponential Backoff**: Failed connections retry with increasing delays (see the sketch after this list)
2. **Circuit Breaker**: Temporarily disable problematic exchanges
3. **Graceful Degradation**: Continue operation with available exchanges
4. **Health Monitoring**: Continuous monitoring of connection status

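
A minimal sketch combining items 1 and 2; the failure threshold, the backoff cap, and the `run()` loop from the connector sketch above are illustrative assumptions:

```python
import asyncio
import random


async def maintain_connection(connector: ExchangeConnector, max_failures: int = 5) -> None:
    """Sketch: reconnect with jittered exponential backoff; open the circuit after repeated failures."""
    failures = 0
    while failures < max_failures:
        try:
            await connector.connect()
            failures = 0           # healthy again: reset the failure count
            await connector.run()  # returns or raises when the stream drops
        except Exception:
            failures += 1
            delay = min(2 ** failures, 60) + random.random()  # 2s, 4s, 8s, ... capped at 60s, plus jitter
            await asyncio.sleep(delay)
    # Circuit open: leave this exchange disabled until health monitoring re-enables it.
```
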
### Data Validation

All incoming data undergoes validation:

1. **Schema Validation**: Ensure data structure compliance
2. **Range Validation**: Check price and volume ranges
3. **Timestamp Validation**: Verify temporal consistency
4. **Duplicate Detection**: Prevent duplicate data storage

Checks 1–3 are sketched after this list; duplicate detection is additionally enforced by the composite primary keys in the schema above.

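A condensed sketch of checks 1–3 for order book snapshots, assuming bids and asks are sorted best-first; the 30-second staleness window and the crossed-book check are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone


def validate_orderbook(snap: "OrderBookSnapshot", max_age: timedelta = timedelta(seconds=30)) -> bool:
    """Sketch: structural, range, and timestamp checks for one snapshot."""
    if not snap.bids or not snap.asks:
        return False  # schema: both sides of the book must be present
    if any(level.price <= 0 or level.size < 0 for level in snap.bids + snap.asks):
        return False  # range: prices must be positive, sizes non-negative
    if snap.bids[0].price >= snap.asks[0].price:
        return False  # consistency: best bid must sit below best ask
    if datetime.now(timezone.utc) - snap.timestamp > max_age:
        return False  # timestamp: reject stale snapshots
    return True
```
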
### Database Resilience

Database operations include comprehensive error handling:

1. **Connection Pooling**: Maintain multiple database connections
2. **Transaction Management**: Ensure data consistency
3. **Retry Logic**: Automatic retry for transient failures (see the helper below)
4. **Backup Strategies**: Regular data backups and recovery procedures

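
Item 3 can be a small async helper; the attempt count and linear backoff are illustrative:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(op: Callable[[], Awaitable[T]], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Sketch: retry a transient-failure-prone database operation."""
    for attempt in range(1, attempts + 1):
        try:
            return await op()
        except Exception:
            if attempt == attempts:
                raise  # attempts exhausted: surface the error to the caller
            await asyncio.sleep(base_delay * attempt)
```

Call sites wrap the operation in a zero-argument coroutine factory, e.g. `await with_retries(lambda: pool.execute(query))`.
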
## Testing Strategy

### Unit Testing

Each component will have comprehensive unit tests:

1. **Exchange Connectors**: Mock WebSocket responses
2. **Data Processing**: Test normalization and validation
3. **Aggregation Engine**: Verify bucket calculations (an example test follows this list)
4. **Storage Layer**: Test database operations
5. **API Layer**: Test endpoint responses

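
For example, a test for the bucket math sketched in the aggregation section (pytest is assumed as the runner; the prices are illustrative):

```python
from datetime import datetime, timezone


def test_create_price_buckets_floors_into_ten_dollar_bins():
    snapshot = OrderBookSnapshot(
        symbol="BTCUSDT",
        exchange="binance",
        timestamp=datetime.now(timezone.utc),
        bids=[PriceLevel(price=50007.0, size=1.0), PriceLevel(price=50003.0, size=2.0)],
        asks=[PriceLevel(price=50012.0, size=0.5)],
    )
    buckets = create_price_buckets(snapshot, bucket_size=10.0)
    assert buckets.bid_buckets[50000.0] == 3.0  # both bids share the $50,000 bucket
    assert buckets.ask_buckets[50010.0] == 0.5
```
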
### Integration Testing

End-to-end testing scenarios:

1. **Multi-Exchange Data Flow**: Test the complete data pipeline
2. **Database Integration**: Verify TimescaleDB operations
3. **API Integration**: Test orchestrator interface compatibility
4. **Performance Testing**: Load testing with high-frequency data

### Performance Testing

Performance benchmarks and testing:

1. **Throughput Testing**: Measure data processing capacity
2. **Latency Testing**: Measure end-to-end data latency
3. **Memory Usage**: Monitor memory consumption patterns
4. **Database Performance**: Query performance optimization

### Monitoring and Observability

Comprehensive monitoring system:

1. **Metrics Collection**: Prometheus-compatible metrics (sketched below)
2. **Logging**: Structured logging with correlation IDs
3. **Alerting**: Real-time alerts for system issues
4. **Dashboards**: Grafana dashboards for system monitoring

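
A sketch of item 1 using the prometheus_client library; the metric names and port are illustrative assumptions:

```python
from prometheus_client import Counter, Gauge, start_http_server

# Illustrative metric names for the aggregator.
MESSAGES_TOTAL = Counter(
    "coby_exchange_messages_total", "Raw messages received", ["exchange", "symbol"]
)
CONNECTED_EXCHANGES = Gauge(
    "coby_connected_exchanges", "Exchanges with a live WebSocket connection"
)

start_http_server(9100)  # expose /metrics for Prometheus to scrape
MESSAGES_TOTAL.labels(exchange="binance", symbol="BTCUSDT").inc()
CONNECTED_EXCHANGES.set(10)
```
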
## Deployment Architecture

### Docker Containerization

The system will be deployed using Docker containers:

```yaml
# docker-compose.yml
version: '3.8'
services:
  timescaledb:
    image: timescale/timescaledb:latest-pg14
    environment:
      POSTGRES_DB: market_data
      POSTGRES_USER: market_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - timescale_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  data-aggregator:
    build: ./data-aggregator
    environment:
      - DB_HOST=timescaledb
      - REDIS_HOST=redis
      - LOG_LEVEL=INFO
    depends_on:
      - timescaledb
      - redis

  web-dashboard:
    build: ./web-dashboard
    ports:
      - "8080:8080"
    environment:
      - API_HOST=data-aggregator
    depends_on:
      - data-aggregator

volumes:
  timescale_data:
  redis_data:
```

### Configuration Management

Environment-based configuration:

```python
# config.py
import os
from dataclasses import dataclass, field
from typing import List


@dataclass
class Config:
    # Defaults are read from the environment once, at import time.

    # Database settings
    db_host: str = os.getenv('DB_HOST', 'localhost')
    db_port: int = int(os.getenv('DB_PORT', '5432'))
    db_name: str = os.getenv('DB_NAME', 'market_data')
    db_user: str = os.getenv('DB_USER', 'market_user')
    db_password: str = os.getenv('DB_PASSWORD', '')

    # Redis settings
    redis_host: str = os.getenv('REDIS_HOST', 'localhost')
    redis_port: int = int(os.getenv('REDIS_PORT', '6379'))

    # Exchange settings
    exchanges: List[str] = field(default_factory=lambda: [
        'binance', 'coinbase', 'kraken', 'bybit', 'okx',
        'huobi', 'kucoin', 'gateio', 'bitfinex', 'mexc'
    ])

    # Aggregation settings
    btc_bucket_size: float = 10.0  # $10 USD buckets for BTC
    eth_bucket_size: float = 1.0   # $1 USD buckets for ETH

    # Performance settings
    max_connections_per_exchange: int = 5
    data_buffer_size: int = 10000
    batch_write_size: int = 1000

    # API settings
    api_host: str = os.getenv('API_HOST', '0.0.0.0')
    api_port: int = int(os.getenv('API_PORT', '8080'))
    websocket_port: int = int(os.getenv('WS_PORT', '8081'))
```

This design provides a robust, scalable foundation for multi-exchange data aggregation that seamlessly integrates with the existing trading orchestrator while providing the flexibility for future enhancements and additional exchange integrations.