Requirements Document
Introduction
This document outlines the requirements for a comprehensive data collection and aggregation subsystem that will serve as a foundational component for the trading orchestrator. The system will collect, aggregate, and store real-time order book and OHLCV data from multiple cryptocurrency exchanges, providing both live data feeds and historical replay capabilities for model training and backtesting.
Requirements
Requirement 1
User Story: As a trading system developer, I want to collect real-time order book data from the top 10 cryptocurrency exchanges, so that I have comprehensive market data for analysis and trading decisions.
Acceptance Criteria
- WHEN the system starts THEN it SHALL establish WebSocket connections to up to 10 major cryptocurrency exchanges
- WHEN order book updates are received THEN the system SHALL process and store raw order book events in real-time
- WHEN processing order book data THEN the system SHALL handle connection failures gracefully and automatically reconnect
- WHEN multiple exchanges provide data THEN the system SHALL normalize data formats to a consistent structure
- IF an exchange connection fails THEN the system SHALL log the failure and attempt reconnection with exponential backoff
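The reconnection behavior described in Requirement 1 can be illustrated with a minimal asyncio sketch. This is not the implementation: the exchange URL, subscription message, and backoff ceiling are assumptions, and the third-party `websockets` package is used for the connection.

```python
import asyncio
import json
import logging
import random

import websockets  # third-party: pip install websockets

log = logging.getLogger("collector")

async def collect_order_book(exchange: str, url: str, subscribe_msg: dict,
                             max_backoff: float = 60.0) -> None:
    """Keep one exchange WebSocket feed alive, reconnecting with exponential backoff."""
    backoff = 1.0
    while True:
        try:
            async with websockets.connect(url) as ws:
                await ws.send(json.dumps(subscribe_msg))
                backoff = 1.0  # reset once a connection succeeds
                async for raw in ws:
                    event = json.loads(raw)
                    # Hand the raw event to normalization and storage (not shown).
                    log.debug("%s event: %s", exchange, event)
        except Exception as exc:
            delay = backoff + random.uniform(0, 1)  # jitter to avoid reconnect bursts
            log.warning("%s connection failed (%s); retrying in %.1fs", exchange, exc, delay)
            await asyncio.sleep(delay)
            backoff = min(backoff * 2, max_backoff)
```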
Requirement 2
User Story: As a trading analyst, I want order book data aggregated into price buckets with heatmap visualization, so that I can quickly identify market depth and liquidity patterns.
Acceptance Criteria
- WHEN processing BTC order book data THEN the system SHALL aggregate orders into $10 USD price range buckets
- WHEN processing ETH order book data THEN the system SHALL aggregate orders into $1 USD price range buckets
- WHEN aggregating order data THEN the system SHALL maintain separate bid and ask heatmaps
- WHEN building heatmaps THEN the system SHALL update distribution data at high frequency (sub-second)
- WHEN displaying heatmaps THEN the system SHALL show volume intensity using color gradients or progress bars
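A minimal sketch of the price-bucket aggregation above, assuming the $10/$1 bucket widths from the acceptance criteria and an order book side represented as (price, size) levels; calling it separately for bids and asks keeps the two heatmaps distinct.

```python
import math
from collections import defaultdict

# Bucket widths from the acceptance criteria: $10 for BTC, $1 for ETH.
BUCKET_WIDTH_USD = {"BTC": 10.0, "ETH": 1.0}

def bucket_levels(symbol: str, levels: list[tuple[float, float]]) -> dict[float, float]:
    """Aggregate (price, size) levels into fixed-width USD price buckets.

    Returns {bucket_floor_price: total_size}. Call once for the bid side and
    once for the ask side so the two heatmaps stay separate.
    """
    width = BUCKET_WIDTH_USD[symbol]
    buckets: dict[float, float] = defaultdict(float)
    for price, size in levels:
        floor = math.floor(price / width) * width
        buckets[floor] += size
    return dict(buckets)

# Example: three BTC ask levels collapse into two $10-wide buckets.
asks = [(64003.5, 0.4), (64007.1, 1.1), (64012.0, 0.7)]
print(bucket_levels("BTC", asks))  # {64000.0: 1.5, 64010.0: 0.7}
```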
Requirement 3
User Story: As a system architect, I want all market data stored in a TimescaleDB database, so that I can efficiently query time-series data and maintain historical records.
Acceptance Criteria
- WHEN the system initializes THEN it SHALL connect to a TimescaleDB instance running in a Docker container
- WHEN storing order book events THEN the system SHALL use TimescaleDB's time-series optimized storage
- WHEN storing OHLCV data THEN the system SHALL create appropriate time-series tables with proper indexing
- WHEN writing to the database THEN the system SHALL batch writes for optimal performance
- IF database connection fails THEN the system SHALL queue data in memory and retry with backoff strategy
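One possible shape for the TimescaleDB schema and batched writes in Requirement 3, sketched with `psycopg2`. The table name, columns, and connection string are illustrative; `create_hypertable` is TimescaleDB's standard call for converting a table to time-series storage.

```python
import psycopg2
from psycopg2.extras import execute_values

DDL = """
CREATE TABLE IF NOT EXISTS ohlcv (
    time     TIMESTAMPTZ NOT NULL,
    exchange TEXT        NOT NULL,
    symbol   TEXT        NOT NULL,
    open   DOUBLE PRECISION, high  DOUBLE PRECISION,
    low    DOUBLE PRECISION, close DOUBLE PRECISION,
    volume DOUBLE PRECISION
);
SELECT create_hypertable('ohlcv', 'time', if_not_exists => TRUE);
CREATE INDEX IF NOT EXISTS ohlcv_symbol_time ON ohlcv (symbol, time DESC);
"""

def write_batch(conn, rows: list[tuple]) -> None:
    """Insert a batch of OHLCV rows in a single round trip."""
    with conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO ohlcv (time, exchange, symbol, open, high, low, close, volume) VALUES %s",
            rows,
        )
    conn.commit()

# Connection string is illustrative; in practice it comes from configuration (Requirement 7).
conn = psycopg2.connect("postgresql://postgres:password@localhost:5432/marketdata")
with conn.cursor() as cur:
    cur.execute(DDL)
conn.commit()
```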
Requirement 4
User Story: As a trading system operator, I want a web-based dashboard to monitor real-time order book heatmaps, so that I can visualize market conditions across multiple exchanges.
Acceptance Criteria
- WHEN accessing the web dashboard THEN it SHALL display real-time order book heatmaps for BTC and ETH
- WHEN viewing heatmaps THEN the dashboard SHALL show aggregated data from all connected exchanges
- WHEN displaying progress bars THEN they SHALL always show aggregated values across price buckets
- WHEN updating the display THEN the dashboard SHALL refresh data at least once per second
- WHEN an exchange goes offline THEN the dashboard SHALL indicate the status change visually
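Requirement 4's once-per-second refresh could be served by pushing snapshots over a WebSocket. The sketch below assumes a FastAPI app and an in-memory `latest_heatmaps` structure maintained by the aggregation layer; both names and the payload shape are hypothetical.

```python
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

# In-memory snapshots kept current by the aggregation layer (hypothetical shapes).
latest_heatmaps: dict = {"BTC": {"bids": {}, "asks": {}}, "ETH": {"bids": {}, "asks": {}}}
exchange_status: dict = {}  # e.g. {"binance": "online", "kraken": "offline"}

@app.websocket("/ws/heatmap")
async def stream_heatmaps(ws: WebSocket) -> None:
    """Push heatmap snapshots and exchange status to the dashboard once per second."""
    await ws.accept()
    try:
        while True:
            await ws.send_json({"heatmaps": latest_heatmaps, "exchanges": exchange_status})
            await asyncio.sleep(1.0)  # satisfies the once-per-second refresh criterion
    except (WebSocketDisconnect, RuntimeError):
        pass  # client went away; nothing to clean up in this sketch
```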
Requirement 5
User Story: As a model trainer, I want a replay interface that can provide historical data in the same format as live data, so that I can train models on past market events.
Acceptance Criteria
- WHEN requesting historical data THEN the replay interface SHALL provide data in the same structure as live feeds
- WHEN replaying data THEN the system SHALL maintain original timing relationships between events
- WHEN using replay mode THEN the interface SHALL support configurable playback speeds
- WHEN switching between live and replay modes THEN the orchestrator SHALL receive data through the same interface
- IF replay data is requested for unavailable time periods THEN the system SHALL return appropriate error messages
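The timing-preserving, speed-configurable playback in Requirement 5 amounts to sleeping the original inter-event gaps divided by the playback speed. A sketch, assuming time-ordered event dicts with a `time` field in the same shape the live feed produces:

```python
import asyncio
from datetime import datetime
from typing import AsyncIterator

async def replay(events: list[dict], speed: float = 1.0) -> AsyncIterator[dict]:
    """Yield stored events with their original inter-arrival gaps, scaled by `speed`.

    speed=1.0 reproduces real time; speed=2.0 plays back twice as fast.
    """
    previous: datetime | None = None
    for event in events:
        if previous is not None:
            gap = (event["time"] - previous).total_seconds()
            await asyncio.sleep(max(gap, 0.0) / speed)
        previous = event["time"]
        yield event  # same dict shape the live feed produces
```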
Requirement 6
User Story: As a trading system integrator, I want the data aggregation system to follow the same interface as the current orchestrator data provider, so that I can seamlessly integrate it into existing workflows.
Acceptance Criteria
- WHEN the orchestrator requests data THEN the aggregation system SHALL provide data in the expected format
- WHEN integrating with existing systems THEN the interface SHALL be compatible with current data provider contracts
- WHEN providing aggregated data THEN the system SHALL include metadata about data sources and quality
- WHEN the orchestrator switches data sources THEN the switch SHALL require no changes to orchestrator code
- IF data quality issues are detected THEN the system SHALL provide quality indicators in the response
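One way to keep live and replay sources behind a single orchestrator-facing contract (Requirement 6) is a shared abstract interface. The method and field names below are assumptions, since the existing data provider contract is not specified in this document.

```python
from abc import ABC, abstractmethod
from typing import AsyncIterator

class MarketDataProvider(ABC):
    """Single contract the orchestrator consumes, regardless of data source."""

    @abstractmethod
    def stream(self, symbol: str) -> AsyncIterator[dict]:
        """Yield normalized events, e.g. {'time', 'exchange', 'symbol', 'bids', 'asks', 'quality'}."""

class LiveProvider(MarketDataProvider):
    def stream(self, symbol: str) -> AsyncIterator[dict]:
        ...  # wraps the WebSocket collectors and aggregation pipeline

class ReplayProvider(MarketDataProvider):
    def stream(self, symbol: str) -> AsyncIterator[dict]:
        ...  # wraps the TimescaleDB-backed replay reader
```

Because both providers satisfy the same interface, swapping data sources requires configuration rather than orchestrator code changes.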
Requirement 7
User Story: As a system administrator, I want the data collection system to be containerized and easily deployable, so that I can manage it alongside other system components.
Acceptance Criteria
- WHEN deploying the system THEN it SHALL run in Docker containers with proper resource allocation
- WHEN starting services THEN TimescaleDB SHALL be automatically provisioned in its own container
- WHEN configuring the system THEN all settings SHALL be externalized through environment variables or config files
- WHEN monitoring the system THEN it SHALL provide health check endpoints for container orchestration
- IF containers need to be restarted THEN the system SHALL recover gracefully without data loss
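A sketch of the deployment shape described in Requirement 7 as a Docker Compose file. Service names, the TimescaleDB image tag, the memory limit, and the volume layout are placeholders to be adjusted for the actual stack.

```yaml
services:
  timescaledb:
    image: timescale/timescaledb:latest-pg16   # pin an exact tag in practice
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}        # externalized configuration
    volumes:
      - tsdb-data:/var/lib/postgresql/data     # survives container restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      retries: 5

  collector:
    build: .
    mem_limit: 512m                            # placeholder resource allocation
    environment:
      DB_DSN: postgresql://postgres:${DB_PASSWORD}@timescaledb:5432/marketdata
    depends_on:
      timescaledb:
        condition: service_healthy

volumes:
  tsdb-data:
```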
Requirement 8
User Story: As a performance engineer, I want the system to handle high-frequency data efficiently, so that it can process order book updates from multiple exchanges without latency issues.
Acceptance Criteria
- WHEN processing order book updates THEN the system SHALL handle at least 10 updates per second per exchange
- WHEN aggregating data THEN processing latency SHALL be less than 10 milliseconds per update
- WHEN storing data THEN the system SHALL use efficient batching to minimize database overhead
- WHEN memory usage grows THEN the system SHALL keep in-memory buffers bounded and clean up stale data
- IF processing falls behind THEN the system SHALL prioritize recent data and log performance warnings
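The batching and backlog behavior in Requirement 8 can be sketched as a bounded queue that flushes in batches and drops the oldest events when full, so recent data keeps flowing; `flush_fn` stands in for the TimescaleDB batch insert, and the batch size, queue size, and flush interval are assumed values.

```python
import asyncio
import logging
import time

log = logging.getLogger("writer")

class BatchWriter:
    """Buffer incoming updates and flush them in batches; prefer recent data when backlogged."""

    def __init__(self, flush_fn, max_batch: int = 500, max_queue: int = 10_000,
                 flush_interval: float = 0.25) -> None:
        self._flush_fn = flush_fn              # e.g. the TimescaleDB batch insert
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
        self._max_batch = max_batch
        self._flush_interval = flush_interval

    def submit(self, event: dict) -> None:
        try:
            self._queue.put_nowait(event)
        except asyncio.QueueFull:
            # Backlogged: drop the oldest queued event so recent data keeps flowing.
            self._queue.get_nowait()
            self._queue.put_nowait(event)
            log.warning("write queue full; dropped oldest event")

    async def run(self) -> None:
        while True:
            await asyncio.sleep(self._flush_interval)
            batch = []
            while not self._queue.empty() and len(batch) < self._max_batch:
                batch.append(self._queue.get_nowait())
            if batch:
                start = time.perf_counter()
                await self._flush_fn(batch)
                log.debug("flushed %d events in %.1f ms",
                          len(batch), (time.perf_counter() - start) * 1000)
```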