COBY : specs + task 1

This commit is contained in:
Dobromir Popov
2025-08-04 15:50:54 +03:00
parent e223bc90e9
commit de9fa4a421
28 changed files with 4165 additions and 1 deletions

View File

@ -0,0 +1,103 @@
# Requirements Document
## Introduction
This document outlines the requirements for a comprehensive data collection and aggregation subsystem that will serve as a foundational component for the trading orchestrator. The system will collect, aggregate, and store real-time order book and OHLCV data from multiple cryptocurrency exchanges, providing both live data feeds and historical replay capabilities for model training and backtesting.
## Requirements
### Requirement 1
**User Story:** As a trading system developer, I want to collect real-time order book data from top 10 cryptocurrency exchanges, so that I can have comprehensive market data for analysis and trading decisions.
#### Acceptance Criteria
1. WHEN the system starts THEN it SHALL establish WebSocket connections to up to 10 major cryptocurrency exchanges
2. WHEN order book updates are received THEN the system SHALL process and store raw order book events in real-time
3. WHEN processing order book data THEN the system SHALL handle connection failures gracefully and automatically reconnect
4. WHEN multiple exchanges provide data THEN the system SHALL normalize data formats to a consistent structure
5. IF an exchange connection fails THEN the system SHALL log the failure and attempt reconnection with exponential backoff
### Requirement 2
**User Story:** As a trading analyst, I want order book data aggregated into price buckets with heatmap visualization, so that I can quickly identify market depth and liquidity patterns.
#### Acceptance Criteria
1. WHEN processing BTC order book data THEN the system SHALL aggregate orders into $10 USD price range buckets
2. WHEN processing ETH order book data THEN the system SHALL aggregate orders into $1 USD price range buckets
3. WHEN aggregating order data THEN the system SHALL maintain separate bid and ask heatmaps
4. WHEN building heatmaps THEN the system SHALL update distribution data at high frequency (sub-second)
5. WHEN displaying heatmaps THEN the system SHALL show volume intensity using color gradients or progress bars
### Requirement 3
**User Story:** As a system architect, I want all market data stored in a TimescaleDB database, so that I can efficiently query time-series data and maintain historical records.
#### Acceptance Criteria
1. WHEN the system initializes THEN it SHALL connect to a TimescaleDB instance running in a Docker container
2. WHEN storing order book events THEN the system SHALL use TimescaleDB's time-series optimized storage
3. WHEN storing OHLCV data THEN the system SHALL create appropriate time-series tables with proper indexing
4. WHEN writing to database THEN the system SHALL batch writes for optimal performance
5. IF database connection fails THEN the system SHALL queue data in memory and retry with backoff strategy
### Requirement 4
**User Story:** As a trading system operator, I want a web-based dashboard to monitor real-time order book heatmaps, so that I can visualize market conditions across multiple exchanges.
#### Acceptance Criteria
1. WHEN accessing the web dashboard THEN it SHALL display real-time order book heatmaps for BTC and ETH
2. WHEN viewing heatmaps THEN the dashboard SHALL show aggregated data from all connected exchanges
3. WHEN displaying progress bars THEN they SHALL always show aggregated values across price buckets
4. WHEN updating the display THEN the dashboard SHALL refresh data at least once per second
5. WHEN an exchange goes offline THEN the dashboard SHALL indicate the status change visually
### Requirement 5
**User Story:** As a model trainer, I want a replay interface that can provide historical data in the same format as live data, so that I can train models on past market events.
#### Acceptance Criteria
1. WHEN requesting historical data THEN the replay interface SHALL provide data in the same structure as live feeds
2. WHEN replaying data THEN the system SHALL maintain original timing relationships between events
3. WHEN using replay mode THEN the interface SHALL support configurable playback speeds
4. WHEN switching between live and replay modes THEN the orchestrator SHALL receive data through the same interface
5. IF replay data is requested for unavailable time periods THEN the system SHALL return appropriate error messages
### Requirement 6
**User Story:** As a trading system integrator, I want the data aggregation system to follow the same interface as the current orchestrator data provider, so that I can seamlessly integrate it into existing workflows.
#### Acceptance Criteria
1. WHEN the orchestrator requests data THEN the aggregation system SHALL provide data in the expected format
2. WHEN integrating with existing systems THEN the interface SHALL be compatible with current data provider contracts
3. WHEN providing aggregated data THEN the system SHALL include metadata about data sources and quality
4. WHEN the orchestrator switches data sources THEN it SHALL work without code changes
5. IF data quality issues are detected THEN the system SHALL provide quality indicators in the response
### Requirement 7
**User Story:** As a system administrator, I want the data collection system to be containerized and easily deployable, so that I can manage it alongside other system components.
#### Acceptance Criteria
1. WHEN deploying the system THEN it SHALL run in Docker containers with proper resource allocation
2. WHEN starting services THEN TimescaleDB SHALL be automatically provisioned in its own container
3. WHEN configuring the system THEN all settings SHALL be externalized through environment variables or config files
4. WHEN monitoring the system THEN it SHALL provide health check endpoints for container orchestration
5. IF containers need to be restarted THEN the system SHALL recover gracefully without data loss
### Requirement 8
**User Story:** As a performance engineer, I want the system to handle high-frequency data efficiently, so that it can process order book updates from multiple exchanges without latency issues.
#### Acceptance Criteria
1. WHEN processing order book updates THEN the system SHALL handle at least 10 updates per second per exchange
2. WHEN aggregating data THEN processing latency SHALL be less than 10 milliseconds per update
3. WHEN storing data THEN the system SHALL use efficient batching to minimize database overhead
4. WHEN memory usage grows THEN the system SHALL implement appropriate cleanup and garbage collection
5. IF processing falls behind THEN the system SHALL prioritize recent data and log performance warnings