# COBY - Multi-Exchange Data Aggregation System

COBY (Cryptocurrency Order Book Yielder) is a comprehensive data collection and aggregation subsystem designed to serve as the foundational data layer for trading systems. It collects real-time order book and OHLCV data from multiple cryptocurrency exchanges, aggregates it into standardized formats, and provides both live data feeds and historical replay capabilities.

## 🏗️ Architecture

The system follows a modular architecture with clear separation of concerns:

```
COBY/
├── config.py                    # Configuration management
├── models/                      # Data models and structures
│   ├── __init__.py
│   └── core.py                  # Core data models
├── interfaces/                  # Abstract interfaces
│   ├── __init__.py
│   ├── exchange_connector.py
│   ├── data_processor.py
│   ├── aggregation_engine.py
│   ├── storage_manager.py
│   └── replay_manager.py
├── utils/                       # Utility functions
│   ├── __init__.py
│   ├── exceptions.py
│   ├── logging.py
│   ├── validation.py
│   └── timing.py
└── README.md
```

## 🚀 Features

- **Multi-Exchange Support**: Connect to 10+ major cryptocurrency exchanges
- **Real-Time Data**: High-frequency order book and trade data collection
- **Price Bucket Aggregation**: Configurable price buckets ($10 for BTC, $1 for ETH)
- **Heatmap Visualization**: Real-time market depth heatmaps
- **Historical Replay**: Replay past market events for model training
- **TimescaleDB Storage**: Optimized time-series data storage
- **Redis Caching**: High-performance data caching layer
- **Orchestrator Integration**: Compatible with existing trading systems

## 📊 Data Models

### Core Models

- **OrderBookSnapshot**: Standardized order book data
- **TradeEvent**: Individual trade events
- **PriceBuckets**: Aggregated price bucket data
- **HeatmapData**: Visualization-ready heatmap data
- **ConnectionStatus**: Exchange connection monitoring
- **ReplaySession**: Historical data replay management

### Key Features

- Automatic data validation and normalization
- Configurable price bucket sizes per symbol
- Real-time metrics calculation
- Cross-exchange data consolidation
- Quality scoring and anomaly detection

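To make the shape of these models concrete, here is a minimal sketch of hypothetical `PriceLevel` and `OrderBookSnapshot` dataclasses with the derived `mid_price` and `spread` properties used in the Usage section below. Everything beyond the fields shown there is an assumption, not the actual `COBY/models/core.py` definition.

```python
# Hypothetical sketch of the core models; the real definitions live in COBY/models/core.py.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PriceLevel:
    """A single price level on one side of an order book."""
    price: float
    size: float


@dataclass
class OrderBookSnapshot:
    """Standardized order book snapshot from one exchange."""
    symbol: str
    exchange: str
    timestamp: datetime
    bids: List[PriceLevel] = field(default_factory=list)  # best (highest) bid first
    asks: List[PriceLevel] = field(default_factory=list)  # best (lowest) ask first

    @property
    def mid_price(self) -> Optional[float]:
        """Midpoint between best bid and best ask, if both sides are present."""
        if not self.bids or not self.asks:
            return None
        return (self.bids[0].price + self.asks[0].price) / 2

    @property
    def spread(self) -> Optional[float]:
        """Best ask minus best bid, if both sides are present."""
        if not self.bids or not self.asks:
            return None
        return self.asks[0].price - self.bids[0].price
```
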
## ⚙️ Configuration

The system uses environment variables for configuration:

```bash
# Database settings
DB_HOST=192.168.0.10
DB_PORT=5432
DB_NAME=market_data
DB_USER=market_user
DB_PASSWORD=your_password

# Redis settings
REDIS_HOST=192.168.0.10
REDIS_PORT=6379
REDIS_PASSWORD=your_password

# Aggregation settings
BTC_BUCKET_SIZE=10.0
ETH_BUCKET_SIZE=1.0
HEATMAP_DEPTH=50
UPDATE_FREQUENCY=0.5

# Performance settings
DATA_BUFFER_SIZE=10000
BATCH_WRITE_SIZE=1000
MAX_MEMORY_USAGE=2048
```

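As a rough illustration of how these variables are consumed, the sketch below reads them with defaults and derives the `get_database_url()` / `get_bucket_size()` helpers used in the Usage section. It is an assumption about `COBY/config.py`, not its actual implementation.

```python
# Hypothetical config loader sketch; the real implementation is COBY/config.py.
import os


def _env(name: str, default: str) -> str:
    """Read an environment variable, falling back to a default."""
    return os.environ.get(name, default)


class Config:
    def __init__(self) -> None:
        self.db_host = _env("DB_HOST", "localhost")
        self.db_port = int(_env("DB_PORT", "5432"))
        self.db_name = _env("DB_NAME", "market_data")
        self.db_user = _env("DB_USER", "market_user")
        self.db_password = _env("DB_PASSWORD", "")
        self.bucket_sizes = {
            "BTC": float(_env("BTC_BUCKET_SIZE", "10.0")),
            "ETH": float(_env("ETH_BUCKET_SIZE", "1.0")),
        }

    def get_database_url(self) -> str:
        """Build a PostgreSQL/TimescaleDB connection URL."""
        return (
            f"postgresql://{self.db_user}:{self.db_password}"
            f"@{self.db_host}:{self.db_port}/{self.db_name}"
        )

    def get_bucket_size(self, symbol: str) -> float:
        """Bucket size for a symbol, defaulting to $1 buckets."""
        for prefix, size in self.bucket_sizes.items():
            if symbol.startswith(prefix):
                return size
        return 1.0


config = Config()  # module-level instance, mirroring `from COBY.config import config`
```
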
## 🔌 Interfaces

### ExchangeConnector

Abstract base class for exchange WebSocket connectors, providing:

- Connection management with auto-reconnect
- Order book and trade subscriptions
- Data normalization callbacks
- Health monitoring

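A minimal sketch of what such a base class can look like follows; the method names and signatures are assumptions, not the actual contents of `COBY/interfaces/exchange_connector.py`.

```python
# Hypothetical connector interface sketch; method names are assumptions.
from abc import ABC, abstractmethod
from typing import Awaitable, Callable

from COBY.models import OrderBookSnapshot


class ExchangeConnector(ABC):
    """Base class for exchange WebSocket connectors."""

    def __init__(self, exchange_name: str) -> None:
        self.exchange_name = exchange_name

    @abstractmethod
    async def connect(self) -> None:
        """Open the WebSocket connection, reconnecting automatically on failure."""

    @abstractmethod
    async def subscribe_orderbook(
        self,
        symbol: str,
        callback: Callable[[OrderBookSnapshot], Awaitable[None]],
    ) -> None:
        """Subscribe to order book updates; data is normalized before the callback fires."""

    @abstractmethod
    def is_healthy(self) -> bool:
        """Report whether the connection is alive and receiving data."""
```
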
### DataProcessor

Interface for data processing and validation:

- Raw data normalization
- Quality validation
- Metrics calculation
- Anomaly detection

### AggregationEngine

Interface for data aggregation:

- Price bucket creation (sketched below)
- Heatmap generation
- Cross-exchange consolidation
- Imbalance calculations

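To make the bucketing idea concrete, here is a small self-contained sketch of folding order book volume into fixed-width price buckets (for example, $10 buckets for BTC). `build_buckets` is a hypothetical helper, not the engine's actual API.

```python
# Hypothetical price-bucket sketch, not the actual AggregationEngine implementation.
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def build_buckets(levels: List[Tuple[float, float]], bucket_size: float) -> Dict[float, float]:
    """Sum (price, size) levels into fixed-width buckets keyed by the bucket floor."""
    buckets: Dict[float, float] = defaultdict(float)
    for price, size in levels:
        floor = math.floor(price / bucket_size) * bucket_size
        buckets[floor] += size
    return dict(buckets)


# With $10 BTC buckets, 50004.2 and 50009.9 land in the same 50000.0 bucket:
bids = [(50004.2, 1.5), (50009.9, 0.5), (49998.0, 2.0)]
print(build_buckets(bids, 10.0))  # {50000.0: 2.0, 49990.0: 2.0}
```
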
### StorageManager

Interface for data persistence:

- TimescaleDB operations
- Batch processing (sketched below)
- Historical data retrieval
- Storage optimization

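The batch-processing idea can be sketched as a simple in-memory buffer that releases full batches for a single bulk write; the class below is illustrative and mirrors the `BATCH_WRITE_SIZE` setting, not the actual StorageManager interface.

```python
# Hypothetical batch-buffering sketch; the real interface is
# COBY/interfaces/storage_manager.py.
from typing import Any, List


class BatchBuffer:
    """Accumulate rows in memory and hand them off in fixed-size batches."""

    def __init__(self, batch_size: int = 1000) -> None:  # mirrors BATCH_WRITE_SIZE
        self.batch_size = batch_size
        self._rows: List[Any] = []

    def add(self, row: Any) -> List[Any]:
        """Buffer one row; return a full batch ready to write, or an empty list."""
        self._rows.append(row)
        if len(self._rows) >= self.batch_size:
            batch, self._rows = self._rows, []
            return batch
        return []
```
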
### ReplayManager

Interface for historical data replay:

- Session management
- Configurable playback speeds (sketched below)
- Time-based seeking
- Real-time compatibility

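Playback speed can be sketched as scaling the real gap between consecutive events; the loop below is illustrative only and assumes each event carries a datetime `timestamp` attribute.

```python
# Illustrative replay pacing, not the actual ReplayManager implementation.
import asyncio
from typing import AsyncIterator, Iterable


async def replay(events: Iterable, speed: float = 1.0) -> AsyncIterator:
    """Yield historical events, sleeping the scaled real-time gap between them."""
    previous = None
    for event in events:
        if previous is not None:
            gap = (event.timestamp - previous.timestamp).total_seconds()
            await asyncio.sleep(max(gap, 0.0) / speed)  # speed=2.0 plays twice as fast
        previous = event
        yield event
```
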
## 🛠️ Utilities

### Logging

- Structured logging with correlation IDs
- Configurable log levels and outputs
- Rotating file handlers
- Context-aware logging

### Validation

- Symbol format validation (sketched below)
- Price and volume validation
- Configuration validation
- Data quality checks

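For example, symbol and price validation might look like the following sketch; the `BTCUSDT`-style pattern is an assumption, and the actual rules live in `COBY/utils/validation.py`.

```python
# Hypothetical validator sketch; see COBY/utils/validation.py for the real checks.
import math
import re

_SYMBOL_RE = re.compile(r"^[A-Z0-9]{5,12}$")  # e.g. BTCUSDT, ETHUSDT


def validate_symbol(symbol: str) -> bool:
    """Check that a symbol looks like an uppercase exchange pair."""
    return bool(_SYMBOL_RE.match(symbol))


def validate_price(price: float) -> bool:
    """Prices must be finite and strictly positive."""
    return math.isfinite(price) and price > 0
```
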
### Timing

- UTC timestamp handling
- Performance measurement
- Time-based operations
- Interval calculations

### Exceptions

- Custom exception hierarchy (sketched below)
- Error code management
- Detailed error context
- Structured error responses

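A plausible shape for such a hierarchy, as a sketch (class names, error codes, and fields here are assumptions):

```python
# Hypothetical exception hierarchy; the real one is in COBY/utils/exceptions.py.
class COBYError(Exception):
    """Base class for COBY errors, carrying an error code and structured context."""

    def __init__(self, message: str, error_code: str = "COBY_ERROR", **context) -> None:
        super().__init__(message)
        self.error_code = error_code
        self.context = context


class ExchangeConnectionError(COBYError):
    """Raised when an exchange WebSocket cannot be reached or drops repeatedly."""


class DataValidationError(COBYError):
    """Raised when incoming data fails validation or quality checks."""


# Structured context travels with the error and can be logged as-is:
# raise DataValidationError("negative size", error_code="VAL_001", symbol="BTCUSDT")
```
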
## 🔧 Usage

### Basic Configuration

```python
from COBY.config import config

# Access configuration
db_url = config.get_database_url()
bucket_size = config.get_bucket_size('BTCUSDT')
```

### Data Models

```python
from datetime import datetime, timezone

from COBY.models import OrderBookSnapshot, PriceLevel

# Create an order book snapshot
orderbook = OrderBookSnapshot(
    symbol='BTCUSDT',
    exchange='binance',
    timestamp=datetime.now(timezone.utc),
    bids=[PriceLevel(50000.0, 1.5)],
    asks=[PriceLevel(50100.0, 2.0)]
)

# Access calculated properties
mid_price = orderbook.mid_price
spread = orderbook.spread
```

### Logging

```python
from COBY.utils import setup_logging, get_logger, set_correlation_id

# Set up logging
setup_logging(level='INFO', log_file='logs/coby.log')

# Get a logger
logger = get_logger(__name__)

# Use a correlation ID
set_correlation_id('req-123')
logger.info("Processing order book data")
```

## 🏃 Next Steps

This is the foundational structure for the COBY system. The next implementation tasks will build upon these interfaces and models to create:

1. TimescaleDB integration
2. Exchange connector implementations
3. Data processing engines
4. Aggregation algorithms
5. Web dashboard
6. API endpoints
7. Replay functionality

Each component will implement the defined interfaces, ensuring consistency and maintainability across the entire system.

## 📝 Development Guidelines

- All components must implement the defined interfaces
- Use the provided data models for consistency
- Follow the logging and error handling patterns
- Validate all input data using the utility functions
- Maintain backward compatibility with the orchestrator interface
- Write comprehensive tests for all functionality

## 🔍 Monitoring

The system provides comprehensive monitoring through:

- Structured logging with correlation IDs
- Performance metrics collection
- Health check endpoints
- Connection status monitoring
- Data quality indicators
- System resource tracking