COBY : specs + task 1

2025-08-04 15:50:54 +03:00
parent e223bc90e9
commit de9fa4a421
28 changed files with 4165 additions and 1 deletions
--- a/.kiro/specs/multi-exchange-data-aggregation/requirements.md
+++ b/.kiro/specs/multi-exchange-data-aggregation/requirements.md
@@ -0,0 +1,103 @@
+# Requirements Document
+
+## Introduction
+
+This document outlines the requirements for a comprehensive data collection and aggregation subsystem that will serve as a foundational component for the trading orchestrator. The system will collect, aggregate, and store real-time order book and OHLCV data from multiple cryptocurrency exchanges, providing both live data feeds and historical replay capabilities for model training and backtesting.
+
+## Requirements
+
+### Requirement 1
+
+**User Story:** As a trading system developer, I want to collect real-time order book data from the top 10 cryptocurrency exchanges, so that I can have comprehensive market data for analysis and trading decisions.
+
+#### Acceptance Criteria
+
+1. WHEN the system starts THEN it SHALL establish WebSocket connections to up to 10 major cryptocurrency exchanges
+2. WHEN order book updates are received THEN the system SHALL process and store raw order book events in real-time
+3. WHEN processing order book data THEN the system SHALL handle connection failures gracefully and automatically reconnect
+4. WHEN multiple exchanges provide data THEN the system SHALL normalize data formats to a consistent structure
+5. IF an exchange connection fails THEN the system SHALL log the failure and attempt reconnection with exponential backoff
+
+### Requirement 2
+
+**User Story:** As a trading analyst, I want order book data aggregated into price buckets with heatmap visualization, so that I can quickly identify market depth and liquidity patterns.
+
+#### Acceptance Criteria
+
+1. WHEN processing BTC order book data THEN the system SHALL aggregate orders into price buckets 10 USD wide
+2. WHEN processing ETH order book data THEN the system SHALL aggregate orders into price buckets 1 USD wide
+3. WHEN aggregating order data THEN the system SHALL maintain separate bid and ask heatmaps
+4. WHEN building heatmaps THEN the system SHALL update distribution data at high frequency (sub-second)
+5. WHEN displaying heatmaps THEN the system SHALL show volume intensity using color gradients or progress bars
+
+### Requirement 3
+
+**User Story:** As a system architect, I want all market data stored in a TimescaleDB database, so that I can efficiently query time-series data and maintain historical records.
+
+#### Acceptance Criteria
+
+1. WHEN the system initializes THEN it SHALL connect to a TimescaleDB instance running in a Docker container
+2. WHEN storing order book events THEN the system SHALL use TimescaleDB's time-series optimized storage
+3. WHEN storing OHLCV data THEN the system SHALL create appropriate time-series tables with proper indexing
+4. WHEN writing to the database THEN the system SHALL batch writes for optimal performance
+5. IF the database connection fails THEN the system SHALL queue data in memory and retry with a backoff strategy
+
+### Requirement 4
+
+**User Story:** As a trading system operator, I want a web-based dashboard to monitor real-time order book heatmaps, so that I can visualize market conditions across multiple exchanges.
+
+#### Acceptance Criteria
+
+1. WHEN accessing the web dashboard THEN it SHALL display real-time order book heatmaps for BTC and ETH
+2. WHEN viewing heatmaps THEN the dashboard SHALL show aggregated data from all connected exchanges
+3. WHEN displaying progress bars THEN they SHALL always show aggregated values across price buckets
+4. WHEN updating the display THEN the dashboard SHALL refresh data at least once per second
+5. WHEN an exchange goes offline THEN the dashboard SHALL indicate the status change visually
+
+### Requirement 5
+
+**User Story:** As a model trainer, I want a replay interface that can provide historical data in the same format as live data, so that I can train models on past market events.
+
+#### Acceptance Criteria
+
+1. WHEN requesting historical data THEN the replay interface SHALL provide data in the same structure as live feeds
+2. WHEN replaying data THEN the system SHALL maintain original timing relationships between events
+3. WHEN using replay mode THEN the interface SHALL support configurable playback speeds
+4. WHEN switching between live and replay modes THEN the orchestrator SHALL receive data through the same interface
+5. IF replay data is requested for unavailable time periods THEN the system SHALL return appropriate error messages
+
+### Requirement 6
+
+**User Story:** As a trading system integrator, I want the data aggregation system to follow the same interface as the current orchestrator data provider, so that I can seamlessly integrate it into existing workflows.
+
+#### Acceptance Criteria
+
+1. WHEN the orchestrator requests data THEN the aggregation system SHALL provide data in the expected format
+2. WHEN integrating with existing systems THEN the interface SHALL be compatible with current data provider contracts
+3. WHEN providing aggregated data THEN the system SHALL include metadata about data sources and quality
+4. WHEN the orchestrator switches data sources THEN it SHALL work without code changes
+5. IF data quality issues are detected THEN the system SHALL provide quality indicators in the response
+
+### Requirement 7
+
+**User Story:** As a system administrator, I want the data collection system to be containerized and easily deployable, so that I can manage it alongside other system components.
+
+#### Acceptance Criteria
+
+1. WHEN deploying the system THEN it SHALL run in Docker containers with proper resource allocation
+2. WHEN starting services THEN TimescaleDB SHALL be automatically provisioned in its own container
+3. WHEN configuring the system THEN all settings SHALL be externalized through environment variables or config files
+4. WHEN monitoring the system THEN it SHALL provide health check endpoints for container orchestration
+5. IF containers need to be restarted THEN the system SHALL recover gracefully without data loss
+
+### Requirement 8
+
+**User Story:** As a performance engineer, I want the system to handle high-frequency data efficiently, so that it can process order book updates from multiple exchanges without latency issues.
+
+#### Acceptance Criteria
+
+1. WHEN processing order book updates THEN the system SHALL handle at least 10 updates per second per exchange
+2. WHEN aggregating data THEN processing latency SHALL be less than 10 milliseconds per update
+3. WHEN storing data THEN the system SHALL use efficient batching to minimize database overhead
+4. WHEN memory usage grows THEN the system SHALL implement appropriate cleanup and garbage collection
+5. IF processing falls behind THEN the system SHALL prioritize recent data and log performance warnings
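
The price-bucket aggregation described in Requirement 2 (acceptance criteria 1 to 3) could be sketched roughly as follows. This is a minimal Python illustration, not the project's implementation: `BUCKET_SIZE`, `bucket_floor`, and `build_heatmaps` are hypothetical names, the input format (lists of `(price, volume)` pairs per side) is an assumption, and labelling each bucket by its lower price bound is just one possible convention.

```python
from collections import defaultdict
from decimal import Decimal

# Bucket widths in USD, taken from Requirement 2. The symbol keys are illustrative.
BUCKET_SIZE = {"BTC": Decimal("10"), "ETH": Decimal("1")}


def bucket_floor(price: Decimal, size: Decimal) -> Decimal:
    """Return the lower bound of the bucket that contains `price`."""
    return (price // size) * size


def build_heatmaps(symbol, bids, asks):
    """Aggregate (price, volume) levels into separate bid and ask heatmaps.

    Returns {"bids": {bucket: volume}, "asks": {bucket: volume}}.
    """
    size = BUCKET_SIZE[symbol]
    heatmaps = {"bids": defaultdict(Decimal), "asks": defaultdict(Decimal)}
    for side, levels in (("bids", bids), ("asks", asks)):
        for price, volume in levels:
            # Convert through str() so float inputs do not carry binary noise.
            bucket = bucket_floor(Decimal(str(price)), size)
            heatmaps[side][bucket] += Decimal(str(volume))
    return {side: dict(buckets) for side, buckets in heatmaps.items()}
```

Using `Decimal` rather than `float` keeps bucket boundaries exact, so a level priced exactly on a boundary always lands in the same bucket. A caller would pass one order book snapshot per call, for example `build_heatmaps("BTC", bids, asks)`, and merge the per-exchange results to get the cross-exchange view that Requirement 4 asks for.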