# Implementation Plan

- [x] 1. Set up TimescaleDB schema and infrastructure (a schema sketch follows task 2)
  - Create database schema with hypertables for OHLCV, order book, and trade data
  - Implement continuous aggregates for multi-timeframe data generation
  - Configure compression and retention policies
  - Create all necessary indexes for query optimization
  - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.1, 3.2, 3.3, 3.4, 3.5, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6_

- [ ] 2. Implement data models and validation
  - [x] 2.1 Create InferenceDataFrame and OrderBookDataFrame data classes (see the dataclass sketch below)
    - Write dataclasses for standardized data structures
    - Include all required fields (OHLCV, order book, imbalances, indicators)
    - Add serialization/deserialization methods
    - _Requirements: 1.4, 10.1, 10.2, 10.3_
  - [ ] 2.2 Implement DataValidator class (see the validator sketch below)
    - Write OHLCV validation logic (high >= low, positive volume)
    - Write order book validation logic (best bid strictly below best ask)
    - Write timestamp validation and UTC timezone enforcement
    - Add comprehensive error logging for validation failures
    - _Requirements: 10.1, 10.2, 10.3, 10.4_
  - [ ]* 2.3 Write unit tests for data models and validation
    - Test InferenceDataFrame creation and serialization
    - Test OrderBookDataFrame creation and serialization
    - Test DataValidator with valid and invalid data
    - Test edge cases and boundary conditions
    - _Requirements: 10.1, 10.2, 10.3, 10.4_
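For reference, a minimal sketch of the kind of DDL task 1 implies, with only the OHLCV hypertable shown. The table name `ohlcv_1m`, column names, and the 5-minute/7-day/90-day intervals are illustrative assumptions, not the deployed schema; `WITH NO DATA` keeps the continuous-aggregate creation safe to run from a script.

```python
import asyncio

import asyncpg

# Each statement is executed separately; table/column names are assumptions.
DDL_STATEMENTS = [
    # Base 1-minute candles; the composite primary key doubles as the query index.
    """
    CREATE TABLE IF NOT EXISTS ohlcv_1m (
        symbol  TEXT             NOT NULL,
        ts      TIMESTAMPTZ      NOT NULL,
        open    DOUBLE PRECISION NOT NULL,
        high    DOUBLE PRECISION NOT NULL,
        low     DOUBLE PRECISION NOT NULL,
        close   DOUBLE PRECISION NOT NULL,
        volume  DOUBLE PRECISION NOT NULL,
        PRIMARY KEY (symbol, ts)
    )
    """,
    "SELECT create_hypertable('ohlcv_1m', 'ts', if_not_exists => TRUE)",
    # Continuous aggregate rolling 1m candles up to 5m.
    """
    CREATE MATERIALIZED VIEW IF NOT EXISTS ohlcv_5m
    WITH (timescaledb.continuous) AS
    SELECT symbol,
           time_bucket('5 minutes', ts) AS bucket,
           first(open, ts) AS open,
           max(high)       AS high,
           min(low)        AS low,
           last(close, ts) AS close,
           sum(volume)     AS volume
    FROM ohlcv_1m
    GROUP BY symbol, bucket
    WITH NO DATA
    """,
    # Compression after 7 days, retention at 90 days (illustrative intervals).
    "ALTER TABLE ohlcv_1m SET (timescaledb.compress, timescaledb.compress_segmentby = 'symbol')",
    "SELECT add_compression_policy('ohlcv_1m', INTERVAL '7 days', if_not_exists => TRUE)",
    "SELECT add_retention_policy('ohlcv_1m', INTERVAL '90 days', if_not_exists => TRUE)",
]

async def create_schema(dsn: str) -> None:
    conn = await asyncpg.connect(dsn)
    try:
        for stmt in DDL_STATEMENTS:
            await conn.execute(stmt)
    finally:
        await conn.close()

if __name__ == "__main__":
    asyncio.run(create_schema("postgresql://user:pass@localhost:5432/market_data"))
```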
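A sketch of the `InferenceDataFrame` dataclass from task 2.1 with the serialization round-trip; the exact field layout is an assumption inferred from the bullets above (OHLCV, order book, imbalances, indicators).

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple

@dataclass
class InferenceDataFrame:
    """Standardized inference payload combining OHLCV, order book, and imbalance data."""
    symbol: str
    timestamp: datetime                          # timezone-aware UTC
    ohlcv: Dict[str, List[Dict[str, float]]]     # timeframe -> list of candle dicts
    orderbook_bids: List[Tuple[float, float]]    # (price, size), best bid first
    orderbook_asks: List[Tuple[float, float]]    # (price, size), best ask first
    imbalances: Dict[str, float] = field(default_factory=dict)  # timeframe -> imbalance
    indicators: Dict[str, float] = field(default_factory=dict)  # indicator name -> value

    def to_dict(self) -> Dict[str, Any]:
        # Serialize with an ISO-8601 timestamp so the payload is JSON-friendly.
        d = asdict(self)
        d["timestamp"] = self.timestamp.isoformat()
        return d

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "InferenceDataFrame":
        d = dict(d)
        d["timestamp"] = datetime.fromisoformat(d["timestamp"]).astimezone(timezone.utc)
        return cls(**d)
```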
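The checks in task 2.2 reduce to a few invariants; a sketch, assuming order book levels arrive as (price, size) pairs with best prices first.

```python
import logging
from datetime import datetime, timedelta
from typing import List, Tuple

logger = logging.getLogger(__name__)

class DataValidator:
    @staticmethod
    def validate_ohlcv(open_: float, high: float, low: float,
                       close: float, volume: float) -> bool:
        # High must bound all prices from above, low from below; volume must be positive.
        if high < low or high < max(open_, close) or low > min(open_, close):
            logger.error("OHLCV bounds violated: o=%s h=%s l=%s c=%s",
                         open_, high, low, close)
            return False
        if volume <= 0:
            logger.error("Non-positive volume: %s", volume)
            return False
        return True

    @staticmethod
    def validate_orderbook(bids: List[Tuple[float, float]],
                           asks: List[Tuple[float, float]]) -> bool:
        # A valid book is never crossed: best bid strictly below best ask.
        if bids and asks and bids[0][0] >= asks[0][0]:
            logger.error("Crossed book: best bid %s >= best ask %s",
                         bids[0][0], asks[0][0])
            return False
        return True

    @staticmethod
    def validate_timestamp(ts: datetime) -> bool:
        # Enforce timezone-aware UTC timestamps.
        if ts.tzinfo is None or ts.utcoffset() != timedelta(0):
            logger.error("Timestamp is not timezone-aware UTC: %r", ts)
            return False
        return True
```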
- [x] 3. Implement cache layer
  - [x] 3.1 Create DataCacheManager class (see the cache sketch following task 5)
    - Implement in-memory cache with deque structures
    - Add methods for OHLCV, order book, and imbalance data
    - Implement cache eviction logic (5-minute rolling window)
    - Add cache statistics tracking (hits, misses)
    - _Requirements: 5.1, 5.2, 5.3, 5.4_
  - [ ] 3.2 Implement cache retrieval methods
    - Write get_latest_ohlcv() with timeframe support
    - Write get_latest_orderbook() for current snapshot
    - Write get_latest_imbalances() for multi-timeframe metrics
    - Ensure <10ms latency for cache reads
    - _Requirements: 5.1, 5.2_
  - [ ]* 3.3 Write unit tests for cache layer
    - Test cache insertion and retrieval
    - Test cache eviction logic
    - Test cache statistics
    - Test concurrent access patterns
    - _Requirements: 5.1, 5.2, 5.3, 5.4_

- [x] 4. Implement database connection and query layer
  - [x] 4.1 Create DatabaseConnectionManager class (see the query sketch following task 5)
    - Implement asyncpg connection pool management
    - Add health monitoring and automatic reconnection
    - Configure connection pool settings (min/max connections)
    - Add connection statistics and logging
    - _Requirements: 2.1, 2.5, 9.6_
  - [x] 4.2 Implement OHLCV query methods
    - Write query_ohlcv_data() for single timeframe retrieval
    - Write query_multi_timeframe_ohlcv() for aligned multi-timeframe data
    - Optimize queries with time_bucket and proper indexes
    - Ensure <100ms query latency for typical queries
    - _Requirements: 3.1, 3.2, 3.3, 3.4, 6.1, 6.2, 6.5, 9.2, 9.3_
  - [ ] 4.3 Implement order book query methods
    - Write query_orderbook_snapshots() for raw order book data
    - Write query_orderbook_aggregated() for 1s/1m aggregations
    - Write query_orderbook_imbalances() for multi-timeframe imbalances
    - Optimize queries for fast retrieval
    - _Requirements: 4.1, 4.2, 4.3, 4.6, 6.1, 6.2, 6.5_
  - [ ]* 4.4 Write integration tests for database layer
    - Test connection pool management
    - Test OHLCV queries with various time ranges
    - Test order book queries
    - Test query performance and latency
    - _Requirements: 6.1, 6.2, 6.5, 9.2, 9.3_

- [-] 5. Implement data ingestion pipeline
  - [ ] 5.1 Create DataIngestionPipeline class (see the ingestion sketch following this task)
    - Implement batch write buffers for OHLCV, order book, and trade data
    - Add batch size and timeout configuration
    - Implement async batch flush methods
    - Add error handling and retry logic
    - _Requirements: 2.5, 5.3, 9.1, 9.4_
  - [x] 5.2 Implement OHLCV ingestion
    - Write ingest_ohlcv_candle() method
    - Add immediate cache write
    - Implement batch buffering for database writes
    - Add data validation before ingestion
    - _Requirements: 2.1, 2.2, 2.5, 5.1, 5.3, 9.1, 9.4, 10.1, 10.2_
  - [x] 5.3 Implement order book ingestion
    - Write ingest_orderbook_snapshot() method
    - Calculate and cache imbalance metrics
    - Implement batch buffering for database writes
    - Add data validation before ingestion
    - _Requirements: 2.1, 2.2, 4.1, 4.2, 4.3, 5.1, 5.3, 9.1, 9.4, 10.3_
  - [x] 5.4 Implement retry logic and error handling
    - Create RetryableDBOperation wrapper class
    - Implement exponential backoff retry strategy
    - Add comprehensive error logging
    - Handle database connection failures gracefully
    - _Requirements: 2.5, 9.6_
  - [ ]* 5.5 Write integration tests for ingestion pipeline
    - Test OHLCV ingestion flow (cache → database)
    - Test order book ingestion flow
    - Test batch write operations
    - Test error handling and retry logic
    - _Requirements: 2.5, 5.3, 9.1, 9.4_
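A sketch of the deque-backed cache from tasks 3.1 and 3.2, with the 5-minute rolling eviction and hit/miss counters; method names beyond those listed in the tasks are assumptions.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple

class DataCacheManager:
    """In-memory rolling cache: deques for OHLCV, latest-snapshot map for order books."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds  # 5-minute rolling window
        # (symbol, timeframe) -> deque of (monotonic_ts, candle_dict)
        self._ohlcv: Dict[Tuple[str, str], Deque[Tuple[float, dict]]] = {}
        self._orderbook: Dict[str, dict] = {}  # symbol -> latest snapshot
        self.hits = 0
        self.misses = 0

    def add_ohlcv(self, symbol: str, timeframe: str, candle: dict) -> None:
        q = self._ohlcv.setdefault((symbol, timeframe), deque())
        q.append((time.monotonic(), candle))
        self._evict(q)

    def get_latest_ohlcv(self, symbol: str, timeframe: str, limit: int = 100) -> list:
        q = self._ohlcv.get((symbol, timeframe))
        if not q:
            self.misses += 1
            return []
        self._evict(q)
        self.hits += 1
        return [candle for _, candle in list(q)[-limit:]]

    def add_orderbook(self, symbol: str, snapshot: dict) -> None:
        self._orderbook[symbol] = snapshot

    def get_latest_orderbook(self, symbol: str) -> Optional[dict]:
        snapshot = self._orderbook.get(symbol)
        if snapshot is None:
            self.misses += 1
        else:
            self.hits += 1
        return snapshot

    def _evict(self, q: Deque[Tuple[float, dict]]) -> None:
        # Drop entries older than the rolling window.
        cutoff = time.monotonic() - self.window_seconds
        while q and q[0][0] < cutoff:
            q.popleft()
```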
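Tasks 4.1 and 4.2 pair an asyncpg pool with TimescaleDB's `time_bucket()` rollups; a sketch, reusing the hypothetical `ohlcv_1m` table from the schema sketch above.

```python
from datetime import datetime
from typing import List, Optional

import asyncpg

class DatabaseConnectionManager:
    """asyncpg pool wrapper; health monitoring and stats are elided in this sketch."""

    def __init__(self, dsn: str, min_size: int = 2, max_size: int = 10):
        self.dsn = dsn
        self.min_size = min_size
        self.max_size = max_size
        self.pool: Optional[asyncpg.Pool] = None

    async def connect(self) -> None:
        self.pool = await asyncpg.create_pool(
            self.dsn, min_size=self.min_size, max_size=self.max_size
        )

    async def query_ohlcv_data(self, symbol: str, bucket: str,
                               start: datetime, end: datetime) -> List[asyncpg.Record]:
        # time_bucket() rolls the 1m base rows up to the requested timeframe
        # (e.g. bucket='5 minutes'); first()/last() are TimescaleDB aggregates.
        sql = """
            SELECT time_bucket($1::interval, ts) AS bucket,
                   first(open, ts) AS open,
                   max(high)       AS high,
                   min(low)        AS low,
                   last(close, ts) AS close,
                   sum(volume)     AS volume
            FROM ohlcv_1m
            WHERE symbol = $2 AND ts >= $3 AND ts < $4
            GROUP BY bucket
            ORDER BY bucket
        """
        async with self.pool.acquire() as conn:
            return await conn.fetch(sql, bucket, symbol, start, end)
```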
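A sketch of the write path from tasks 5.1, 5.2, and 5.4: immediate cache write, batch buffer, and an exponential-backoff flush. The buffer shape, candle dict keys, and SQL are assumptions consistent with the earlier sketches.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

class DataIngestionPipeline:
    """Batch-buffered writer: cache first, database in batches with retry."""

    def __init__(self, pool, cache, batch_size: int = 100, max_attempts: int = 5):
        self.pool = pool            # asyncpg pool
        self.cache = cache          # DataCacheManager from task 3
        self.batch_size = batch_size
        self.max_attempts = max_attempts
        self._ohlcv_buffer: list = []

    async def ingest_ohlcv_candle(self, symbol: str, timeframe: str, candle: dict) -> None:
        # Cache write is immediate so real-time readers never wait on the database.
        self.cache.add_ohlcv(symbol, timeframe, candle)
        self._ohlcv_buffer.append((symbol, candle["ts"], candle["open"],
                                   candle["high"], candle["low"],
                                   candle["close"], candle["volume"]))
        if len(self._ohlcv_buffer) >= self.batch_size:
            await self._flush_ohlcv()

    async def _flush_ohlcv(self) -> None:
        rows, self._ohlcv_buffer = self._ohlcv_buffer, []
        for attempt in range(1, self.max_attempts + 1):
            try:
                async with self.pool.acquire() as conn:
                    await conn.executemany(
                        """INSERT INTO ohlcv_1m
                           (symbol, ts, open, high, low, close, volume)
                           VALUES ($1, $2, $3, $4, $5, $6, $7)
                           ON CONFLICT (symbol, ts) DO NOTHING""",
                        rows,
                    )
                return
            except Exception as exc:
                if attempt == self.max_attempts:
                    logger.error("OHLCV flush failed permanently: %s", exc)
                    raise
                delay = 0.5 * (2 ** (attempt - 1))  # exponential backoff: 0.5s, 1s, 2s...
                logger.warning("Flush attempt %d failed (%s), retrying in %.1fs",
                               attempt, exc, delay)
                await asyncio.sleep(delay)
```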
- [x] 6. Implement unified data provider API
  - [x] 6.1 Create UnifiedDataProvider class
    - Initialize with database connection pool and cache manager
    - Configure symbols and timeframes
    - Add connection to existing DataProvider components
    - _Requirements: 1.1, 1.2, 1.3_
  - [ ] 6.2 Implement get_inference_data() method (see the read-path sketch following task 7)
    - Handle timestamp=None for real-time data from cache
    - Handle specific timestamp for historical data from database
    - Implement context window retrieval (±N minutes)
    - Combine OHLCV, order book, and imbalance data
    - Return standardized InferenceDataFrame
    - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 5.2, 6.1, 6.2, 6.3, 6.4, 7.1, 7.2, 7.3_
  - [ ] 6.3 Implement get_multi_timeframe_data() method
    - Query multiple timeframes efficiently
    - Align timestamps across timeframes
    - Handle missing data by generating it from lower timeframes
    - Return dictionary mapping timeframe to DataFrame
    - _Requirements: 3.1, 3.2, 3.3, 3.4, 6.1, 6.2, 6.3, 10.5_
  - [ ] 6.4 Implement get_order_book_data() method
    - Handle different aggregation levels (raw, 1s, 1m)
    - Include multi-timeframe imbalance metrics
    - Return standardized OrderBookDataFrame
    - _Requirements: 4.1, 4.2, 4.3, 4.6, 6.1, 6.2_
  - [ ]* 6.5 Write integration tests for unified API
    - Test get_inference_data() with real-time and historical data
    - Test get_multi_timeframe_data() with various timeframes
    - Test get_order_book_data() with different aggregations
    - Test context window retrieval
    - Test data consistency across methods
    - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 6.1, 6.2, 6.3, 6.4, 10.5, 10.6_

- [ ] 7. Implement data migration system
  - [ ] 7.1 Create DataMigrationManager class
    - Initialize with database connection and cache directory path
    - Add methods for discovering existing Parquet files
    - Implement symbol format conversion utilities
    - _Requirements: 8.1, 8.2, 8.6_
  - [ ] 7.2 Implement Parquet file migration (see the migration sketch following this task)
    - Write _migrate_ohlcv_data() to process all Parquet files
    - Parse filenames to extract symbol and timeframe
    - Read Parquet files and convert to database format
    - Implement batch insertion with conflict handling
    - _Requirements: 8.1, 8.2, 8.3, 8.5_
  - [ ] 7.3 Implement migration verification
    - Write _verify_migration() to compare record counts
    - Check data integrity (no missing timestamps)
    - Validate data ranges match original files
    - Generate migration report
    - _Requirements: 8.3, 8.4_
  - [ ] 7.4 Implement rollback capability
    - Add transaction support for migration operations
    - Implement rollback on verification failure
    - Preserve original Parquet files until verification passes
    - Add option to archive old files after successful migration
    - _Requirements: 8.4, 8.5_
  - [ ]* 7.5 Write integration tests for migration
    - Test Parquet file discovery and parsing
    - Test data migration with sample files
    - Test verification logic
    - Test rollback on failure
    - _Requirements: 8.1, 8.2, 8.3, 8.4_
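A sketch of the two read paths in task 6.2: cache when `timestamp=None`, database plus a ±N-minute context window otherwise. It reuses the hypothetical classes from the earlier sketches and elides the order book join on the historical path.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

class UnifiedDataProvider:
    """Single entry point for inference data; real-time from cache, historical from DB."""

    def __init__(self, db, cache):
        self.db = db        # DatabaseConnectionManager (query sketch above)
        self.cache = cache  # DataCacheManager (cache sketch above)

    async def get_inference_data(self, symbol: str,
                                 timestamp: Optional[datetime] = None,
                                 context_minutes: int = 5) -> "InferenceDataFrame":
        if timestamp is None:
            # Real-time path: everything comes from the in-memory cache.
            candles = self.cache.get_latest_ohlcv(symbol, "1m")
            orderbook = self.cache.get_latest_orderbook(symbol) or {}
        else:
            # Historical path: pull a ±N-minute context window from the database.
            start = timestamp - timedelta(minutes=context_minutes)
            end = timestamp + timedelta(minutes=context_minutes)
            records = await self.db.query_ohlcv_data(symbol, "1 minute", start, end)
            candles = [dict(r) for r in records]
            orderbook = {}  # historical order book retrieval elided in this sketch
        return InferenceDataFrame(  # dataclass from the task 2.1 sketch
            symbol=symbol,
            timestamp=timestamp or datetime.now(timezone.utc),
            ohlcv={"1m": candles},
            orderbook_bids=orderbook.get("bids", []),
            orderbook_asks=orderbook.get("asks", []),
        )
```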
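A sketch of task 7.2. The `<symbol>_<timeframe>.parquet` filename convention and the OHLCV column names are assumptions about the existing cache layout, not a description of it; routing non-1m timeframes to other tables is elided.

```python
from pathlib import Path

import pandas as pd

async def migrate_ohlcv_parquet(pool, cache_dir: str) -> int:
    """Walk Parquet files, parse symbol/timeframe from the filename (assumed
    convention: ETHUSDT_1m.parquet), and batch-insert with conflict handling."""
    migrated = 0
    for path in Path(cache_dir).glob("*.parquet"):
        symbol, timeframe = path.stem.rsplit("_", 1)  # timeframe routing elided
        df = pd.read_parquet(path)  # assumes a DatetimeIndex and OHLCV columns
        rows = [
            (symbol, ts.to_pydatetime(), r.open, r.high, r.low, r.close, r.volume)
            for ts, r in df.iterrows()
        ]
        async with pool.acquire() as conn:
            await conn.executemany(
                """INSERT INTO ohlcv_1m (symbol, ts, open, high, low, close, volume)
                   VALUES ($1, $2, $3, $4, $5, $6, $7)
                   ON CONFLICT (symbol, ts) DO NOTHING""",
                rows,
            )
        migrated += len(rows)
    return migrated
```

Keeping `ON CONFLICT ... DO NOTHING` makes the migration idempotent, which pairs naturally with the rollback and re-run requirements in task 7.4.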
- [ ] 8. Integrate with existing DataProvider
  - [ ] 8.1 Update DataProvider class to use UnifiedDataProvider
    - Replace existing data retrieval methods with unified API calls
    - Update get_data() method to use get_inference_data()
    - Update multi-timeframe methods to use get_multi_timeframe_data()
    - Maintain backward compatibility with existing interfaces
    - _Requirements: 1.1, 1.2, 1.3, 8.6_
  - [ ] 8.2 Update real-time data flow
    - Connect WebSocket data to DataIngestionPipeline
    - Update tick aggregator to write to cache and database
    - Update COB integration to use new ingestion methods
    - Ensure no data loss during transition
    - _Requirements: 2.1, 2.2, 5.1, 5.3, 8.6_
  - [ ] 8.3 Update annotation system integration
    - Update ANNOTATE/core/data_loader.py to use unified API
    - Ensure annotation system uses get_inference_data() with timestamps
    - Test annotation workflow with new data provider
    - _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5_
  - [ ] 8.4 Update backtesting system integration
    - Update backtesting data access to use unified API
    - Ensure sequential data access works efficiently
    - Test backtesting performance with new data provider
    - _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_
  - [ ]* 8.5 Write end-to-end integration tests
    - Test complete data flow: WebSocket → ingestion → cache → database → retrieval
    - Test annotation system with unified data provider
    - Test backtesting system with unified data provider
    - Test real-time trading with unified data provider
    - _Requirements: 1.1, 1.2, 1.3, 6.1, 6.2, 7.1, 8.6_

- [ ] 9. Performance optimization and monitoring
  - [ ] 9.1 Implement performance monitoring (see the latency sketch at the end of this plan)
    - Add latency tracking for cache reads (<10ms target)
    - Add latency tracking for database queries (<100ms target)
    - Add throughput monitoring for ingestion (>1000 ops/sec target)
    - Create performance dashboard or logging
    - _Requirements: 5.2, 6.5, 9.1, 9.2, 9.3_
  - [ ] 9.2 Optimize database queries
    - Analyze query execution plans
    - Add missing indexes if needed
    - Optimize time_bucket usage
    - Implement query result caching where appropriate
    - _Requirements: 6.5, 9.2, 9.3, 9.6_
  - [ ] 9.3 Implement compression and retention
    - Verify compression policies are working (>80% compression target)
    - Monitor storage growth over time
    - Verify retention policies are cleaning old data
    - Add alerts for storage issues
    - _Requirements: 2.6, 9.5_
  - [ ]* 9.4 Write performance tests
    - Test cache read latency under load
    - Test database query latency with various time ranges
    - Test ingestion throughput with high-frequency data
    - Test concurrent access patterns
    - _Requirements: 5.2, 6.5, 9.1, 9.2, 9.3, 9.6_

- [ ] 10. Documentation and deployment
  - [ ] 10.1 Create deployment documentation
    - Document TimescaleDB setup and configuration
    - Document migration process and steps
    - Document rollback procedures
    - Create troubleshooting guide
    - _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6_
  - [ ] 10.2 Create API documentation
    - Document UnifiedDataProvider API methods
    - Provide usage examples for each method
    - Document data models and structures
    - Create migration guide for existing code
    - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
  - [ ] 10.3 Create monitoring and alerting setup
    - Document key metrics to monitor
    - Set up alerts for performance degradation
    - Set up alerts for data validation failures
    - Create operational runbook
    - _Requirements: 9.1, 9.2, 9.3, 9.5, 9.6, 10.4_
  - [ ] 10.4 Execute phased deployment
    - Phase 1: Deploy with dual-write (Parquet + TimescaleDB)
    - Phase 2: Run migration script for historical data
    - Phase 3: Verify data integrity
    - Phase 4: Switch reads to TimescaleDB
    - Phase 5: Deprecate Parquet writes
    - Phase 6: Archive old Parquet files
    - _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6_
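A sketch of the latency tracking from task 9.1 as a decorator that logs budget overruns (10ms for cache reads, 100ms for database queries, per the targets above); an async-aware variant would be needed for the asyncpg paths.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def track_latency(name: str, budget_ms: float):
    """Time each call and warn when it exceeds its latency budget."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                if elapsed_ms > budget_ms:
                    logger.warning("%s took %.2fms (budget %.0fms)",
                                   name, elapsed_ms, budget_ms)
        return wrapper
    return decorator

# Hypothetical usage: wrap the cache read from task 3.2 with a 10ms budget.
# DataCacheManager.get_latest_ohlcv = track_latency(
#     "cache.get_latest_ohlcv", 10.0)(DataCacheManager.get_latest_ohlcv)
```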