uni data storage

`.kiro/specs/unified-data-storage/tasks.md` (new file, 286 lines)
# Implementation Plan

- [x] 1. Set up TimescaleDB schema and infrastructure
  - Create database schema with hypertables for OHLCV, order book, and trade data
  - Implement continuous aggregates for multi-timeframe data generation
  - Configure compression and retention policies
  - Create all necessary indexes for query optimization
  - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.1, 3.2, 3.3, 3.4, 3.5, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6_
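
The schema setup in task 1 could be driven by DDL strings built in Python. As a sketch only: the table name `ohlcv_candles`, the column layout, and the policy intervals are illustrative assumptions, not the project's actual schema.

```python
def build_ohlcv_schema_sql(chunk_interval="1 day"):
    """Return DDL that creates and configures an OHLCV hypertable."""
    return [
        # Plain table first; TimescaleDB converts it below.
        """
        CREATE TABLE IF NOT EXISTS ohlcv_candles (
            symbol    TEXT             NOT NULL,
            timeframe TEXT             NOT NULL,
            ts        TIMESTAMPTZ      NOT NULL,
            open      DOUBLE PRECISION NOT NULL,
            high      DOUBLE PRECISION NOT NULL,
            low       DOUBLE PRECISION NOT NULL,
            close     DOUBLE PRECISION NOT NULL,
            volume    DOUBLE PRECISION NOT NULL,
            PRIMARY KEY (symbol, timeframe, ts)
        );
        """,
        # Partition on ts so time-range queries touch few chunks.
        f"SELECT create_hypertable('ohlcv_candles', 'ts', "
        f"chunk_time_interval => INTERVAL '{chunk_interval}', if_not_exists => TRUE);",
        # Compression policy, toward the plan's >80% compression target.
        "ALTER TABLE ohlcv_candles SET (timescaledb.compress, "
        "timescaledb.compress_segmentby = 'symbol, timeframe');",
        "SELECT add_compression_policy('ohlcv_candles', INTERVAL '7 days', if_not_exists => TRUE);",
    ]
```

In deployment these statements would be executed once at startup through the connection pool from task 4.1.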
- [ ] 2. Implement data models and validation
- [ ] 2.1 Create InferenceDataFrame and OrderBookDataFrame data classes
  - Write dataclasses for standardized data structures
  - Include all required fields (OHLCV, order book, imbalances, indicators)
  - Add serialization/deserialization methods
  - _Requirements: 1.4, 10.1, 10.2, 10.3_

- [ ] 2.2 Implement DataValidator class
  - Write OHLCV validation logic (high >= low, positive volume)
  - Write order book validation logic (best bid < best ask)
  - Write timestamp validation and UTC timezone enforcement
  - Add comprehensive error logging for validation failures
  - _Requirements: 10.1, 10.2, 10.3, 10.4_

- [ ]* 2.3 Write unit tests for data models and validation
  - Test InferenceDataFrame creation and serialization
  - Test OrderBookDataFrame creation and serialization
  - Test DataValidator with valid and invalid data
  - Test edge cases and boundary conditions
  - _Requirements: 10.1, 10.2, 10.3, 10.4_

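
A minimal sketch of the validation rules in task 2.2, assuming candles arrive as dicts and errors are reported as a list of strings; the real DataValidator's interface may differ.

```python
from datetime import datetime, timedelta, timezone

def validate_ohlcv(candle):
    """Return validation errors for one candle; an empty list means valid."""
    errors = []
    if candle["high"] < candle["low"]:
        errors.append("high is below low")
    if not (candle["low"] <= candle["open"] <= candle["high"]):
        errors.append("open outside [low, high]")
    if not (candle["low"] <= candle["close"] <= candle["high"]):
        errors.append("close outside [low, high]")
    if candle["volume"] < 0:  # plan asks for positive volume; zero is tolerated here
        errors.append("negative volume")
    ts = candle["ts"]
    if ts.tzinfo is None or ts.utcoffset() != timedelta(0):
        errors.append("timestamp is not UTC")  # UTC enforcement per task 2.2
    return errors
```

The caller (the ingestion pipeline of task 5) would log the error list and drop or quarantine the candle when it is non-empty.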
- [ ] 3. Implement cache layer
- [ ] 3.1 Create DataCacheManager class
  - Implement in-memory cache with deque structures
  - Add methods for OHLCV, order book, and imbalance data
  - Implement cache eviction logic (5-minute rolling window)
  - Add cache statistics tracking (hits, misses)
  - _Requirements: 5.1, 5.2, 5.3, 5.4_

- [ ] 3.2 Implement cache retrieval methods
  - Write get_latest_ohlcv() with timeframe support
  - Write get_latest_orderbook() for current snapshot
  - Write get_latest_imbalances() for multi-timeframe metrics
  - Ensure <10ms latency for cache reads
  - _Requirements: 5.1, 5.2_

- [ ]* 3.3 Write unit tests for cache layer
  - Test cache insertion and retrieval
  - Test cache eviction logic
  - Test cache statistics
  - Test concurrent access patterns
  - _Requirements: 5.1, 5.2, 5.3, 5.4_

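
Task 3.1's cache can be sketched as one deque per data stream. The 5-minute rolling window and the hit/miss counters come from the plan; the entry shape and the injectable clock are assumptions for illustration.

```python
import time
from collections import deque

class RollingCache:
    """In-memory cache for one data stream with a rolling time window."""

    def __init__(self, window_seconds=300.0):
        self.window = window_seconds
        self.entries = deque()   # (timestamp, payload), oldest first
        self.hits = 0
        self.misses = 0

    def put(self, payload, now=None):
        # `now` is injectable so eviction can be tested without a real clock.
        now = time.monotonic() if now is None else now
        self.entries.append((now, payload))
        # Evict entries that fell out of the rolling window.
        while self.entries and now - self.entries[0][0] > self.window:
            self.entries.popleft()

    def latest(self):
        """Most recent payload, tracking hit/miss statistics (task 3.1)."""
        if self.entries:
            self.hits += 1
            return self.entries[-1][1]
        self.misses += 1
        return None
```

A DataCacheManager along these lines would hold one `RollingCache` per symbol and timeframe, giving the <10 ms reads task 3.2 targets since `latest()` is O(1).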
- [ ] 4. Implement database connection and query layer
- [ ] 4.1 Create DatabaseConnectionManager class
  - Implement asyncpg connection pool management
  - Add health monitoring and automatic reconnection
  - Configure connection pool settings (min/max connections)
  - Add connection statistics and logging
  - _Requirements: 2.1, 2.5, 9.6_

- [ ] 4.2 Implement OHLCV query methods
  - Write query_ohlcv_data() for single timeframe retrieval
  - Write query_multi_timeframe_ohlcv() for aligned multi-timeframe data
  - Optimize queries with time_bucket and proper indexes
  - Ensure <100ms query latency for typical queries
  - _Requirements: 3.1, 3.2, 3.3, 3.4, 6.1, 6.2, 6.5, 9.2, 9.3_

- [ ] 4.3 Implement order book query methods
  - Write query_orderbook_snapshots() for raw order book data
  - Write query_orderbook_aggregated() for 1s/1m aggregations
  - Write query_orderbook_imbalances() for multi-timeframe imbalances
  - Optimize queries for fast retrieval
  - _Requirements: 4.1, 4.2, 4.3, 4.6, 6.1, 6.2, 6.5_

- [ ]* 4.4 Write integration tests for database layer
  - Test connection pool management
  - Test OHLCV queries with various time ranges
  - Test order book queries
  - Test query performance and latency
  - _Requirements: 6.1, 6.2, 6.5, 9.2, 9.3_

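
One plausible shape for the SQL behind query_ohlcv_data() (task 4.2), using TimescaleDB's `time_bucket` with its `first`/`last` aggregates; the table and column names are assumptions carried through these sketches.

```python
def build_ohlcv_query(bucket="1 minute"):
    """time_bucket aggregation query with asyncpg-style parameters $1..$4."""
    return f"""
        SELECT time_bucket(INTERVAL '{bucket}', ts) AS bucket_ts,
               first(open, ts)  AS open,
               max(high)        AS high,
               min(low)         AS low,
               last(close, ts)  AS close,
               sum(volume)      AS volume
        FROM ohlcv_candles
        WHERE symbol = $1 AND timeframe = $2 AND ts >= $3 AND ts < $4
        GROUP BY bucket_ts
        ORDER BY bucket_ts
    """
```

A query method would then run something like `await pool.fetch(build_ohlcv_query("5 minutes"), symbol, timeframe, start, end)` against the asyncpg pool from task 4.1; the composite `(symbol, timeframe, ts)` primary key keeps the time-range scan index-backed.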
- [ ] 5. Implement data ingestion pipeline
- [ ] 5.1 Create DataIngestionPipeline class
  - Implement batch write buffers for OHLCV, order book, and trade data
  - Add batch size and timeout configuration
  - Implement async batch flush methods
  - Add error handling and retry logic
  - _Requirements: 2.5, 5.3, 9.1, 9.4_

- [ ] 5.2 Implement OHLCV ingestion
  - Write ingest_ohlcv_candle() method
  - Add immediate cache write
  - Implement batch buffering for database writes
  - Add data validation before ingestion
  - _Requirements: 2.1, 2.2, 2.5, 5.1, 5.3, 9.1, 9.4, 10.1, 10.2_

- [ ] 5.3 Implement order book ingestion
  - Write ingest_orderbook_snapshot() method
  - Calculate and cache imbalance metrics
  - Implement batch buffering for database writes
  - Add data validation before ingestion
  - _Requirements: 2.1, 2.2, 4.1, 4.2, 4.3, 5.1, 5.3, 9.1, 9.4, 10.3_

- [ ] 5.4 Implement retry logic and error handling
  - Create RetryableDBOperation wrapper class
  - Implement exponential backoff retry strategy
  - Add comprehensive error logging
  - Handle database connection failures gracefully
  - _Requirements: 2.5, 9.6_

- [ ]* 5.5 Write integration tests for ingestion pipeline
  - Test OHLCV ingestion flow (cache → database)
  - Test order book ingestion flow
  - Test batch write operations
  - Test error handling and retry logic
  - _Requirements: 2.5, 5.3, 9.1, 9.4_

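
Task 5.4's wrapper might look like the following. The class name and the exponential backoff come from the plan; the injectable `sleep` and the choice of `ConnectionError` as the transient-error class are assumptions.

```python
import time

class RetryableDBOperation:
    """Run an operation, retrying transient failures with exponential backoff."""

    def __init__(self, max_attempts=5, base_delay=0.1, sleep=time.sleep):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests do not actually wait

    def run(self, operation):
        last_error = None
        for attempt in range(self.max_attempts):
            try:
                return operation()
            except ConnectionError as exc:  # assumed transient-error class
                last_error = exc
                # Backoff doubles each attempt: 0.1s, 0.2s, 0.4s, ...
                self.sleep(self.base_delay * (2 ** attempt))
        raise last_error
```

The batch flush methods of task 5.1 would go through `run()`, so a dropped database connection delays a flush rather than losing the buffered batch.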
- [ ] 6. Implement unified data provider API
- [ ] 6.1 Create UnifiedDataProvider class
  - Initialize with database connection pool and cache manager
  - Configure symbols and timeframes
  - Add connection to existing DataProvider components
  - _Requirements: 1.1, 1.2, 1.3_

- [ ] 6.2 Implement get_inference_data() method
  - Handle timestamp=None for real-time data from cache
  - Handle specific timestamp for historical data from database
  - Implement context window retrieval (±N minutes)
  - Combine OHLCV, order book, and imbalance data
  - Return standardized InferenceDataFrame
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 5.2, 6.1, 6.2, 6.3, 6.4, 7.1, 7.2, 7.3_

- [ ] 6.3 Implement get_multi_timeframe_data() method
  - Query multiple timeframes efficiently
  - Align timestamps across timeframes
  - Handle missing data by generating from lower timeframes
  - Return dictionary mapping timeframe to DataFrame
  - _Requirements: 3.1, 3.2, 3.3, 3.4, 6.1, 6.2, 6.3, 10.5_

- [ ] 6.4 Implement get_order_book_data() method
  - Handle different aggregation levels (raw, 1s, 1m)
  - Include multi-timeframe imbalance metrics
  - Return standardized OrderBookDataFrame
  - _Requirements: 4.1, 4.2, 4.3, 4.6, 6.1, 6.2_

- [ ]* 6.5 Write integration tests for unified API
  - Test get_inference_data() with real-time and historical data
  - Test get_multi_timeframe_data() with various timeframes
  - Test get_order_book_data() with different aggregations
  - Test context window retrieval
  - Test data consistency across methods
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 6.1, 6.2, 6.3, 6.4, 10.5, 10.6_

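
The ±N-minute context window in task 6.2 reduces to computing two UTC query bounds around the requested timestamp; a sketch, with the function name chosen here for illustration:

```python
from datetime import datetime, timedelta, timezone

def context_window(ts, context_minutes=5):
    """Return (start, end) covering ±context_minutes around ts, in UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # UTC enforcement, as in task 2.2
    delta = timedelta(minutes=context_minutes)
    return ts - delta, ts + delta
```

get_inference_data() would pass these bounds as the `$3`/`$4` parameters of the task 4.2 queries when a specific historical timestamp is requested, and skip the database entirely (cache path) when `timestamp` is None.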
- [ ] 7. Implement data migration system
- [ ] 7.1 Create DataMigrationManager class
  - Initialize with database connection and cache directory path
  - Add methods for discovering existing Parquet files
  - Implement symbol format conversion utilities
  - _Requirements: 8.1, 8.2, 8.6_

- [ ] 7.2 Implement Parquet file migration
  - Write _migrate_ohlcv_data() to process all Parquet files
  - Parse filenames to extract symbol and timeframe
  - Read Parquet files and convert to database format
  - Implement batch insertion with conflict handling
  - _Requirements: 8.1, 8.2, 8.3, 8.5_

- [ ] 7.3 Implement migration verification
  - Write _verify_migration() to compare record counts
  - Check data integrity (no missing timestamps)
  - Validate data ranges match original files
  - Generate migration report
  - _Requirements: 8.3, 8.4_

- [ ] 7.4 Implement rollback capability
  - Add transaction support for migration operations
  - Implement rollback on verification failure
  - Preserve original Parquet files until verification passes
  - Add option to archive old files after successful migration
  - _Requirements: 8.4, 8.5_

- [ ]* 7.5 Write integration tests for migration
  - Test Parquet file discovery and parsing
  - Test data migration with sample files
  - Test verification logic
  - Test rollback on failure
  - _Requirements: 8.1, 8.2, 8.3, 8.4_

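
The symbol conversion and filename parsing of tasks 7.1 and 7.2 could work as below. The `SYMBOL_timeframe.parquet` naming and the quote-currency list are assumed cache conventions, not confirmed by the plan.

```python
import re

# Quote currencies tried longest-match-first ("USDT" before "USD").
_QUOTES = ("USDT", "USDC", "BUSD", "USD", "BTC", "ETH")

def to_slash_symbol(compact):
    """'ETHUSDT' -> 'ETH/USDT' by matching a known quote-currency suffix."""
    for quote in _QUOTES:
        if compact.endswith(quote) and len(compact) > len(quote):
            return f"{compact[:-len(quote)]}/{quote}"
    raise ValueError(f"unknown quote currency in {compact!r}")

def parse_parquet_name(filename):
    """'ETHUSDT_1m.parquet' -> ('ETH/USDT', '1m')."""
    m = re.fullmatch(r"([A-Z0-9]+)_(\w+)\.parquet", filename)
    if not m:
        raise ValueError(f"unrecognized cache file name: {filename!r}")
    return to_slash_symbol(m.group(1)), m.group(2)
```

Raising on unrecognized names (rather than guessing) keeps the migration conservative: unparseable files surface in the task 7.3 report instead of landing under a wrong symbol.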
- [ ] 8. Integrate with existing DataProvider
- [ ] 8.1 Update DataProvider class to use UnifiedDataProvider
  - Replace existing data retrieval methods with unified API calls
  - Update get_data() method to use get_inference_data()
  - Update multi-timeframe methods to use get_multi_timeframe_data()
  - Maintain backward compatibility with existing interfaces
  - _Requirements: 1.1, 1.2, 1.3, 8.6_

- [ ] 8.2 Update real-time data flow
  - Connect WebSocket data to DataIngestionPipeline
  - Update tick aggregator to write to cache and database
  - Update COB integration to use new ingestion methods
  - Ensure no data loss during transition
  - _Requirements: 2.1, 2.2, 5.1, 5.3, 8.6_

- [ ] 8.3 Update annotation system integration
  - Update ANNOTATE/core/data_loader.py to use unified API
  - Ensure annotation system uses get_inference_data() with timestamps
  - Test annotation workflow with new data provider
  - _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5_

- [ ] 8.4 Update backtesting system integration
  - Update backtesting data access to use unified API
  - Ensure sequential data access works efficiently
  - Test backtesting performance with new data provider
  - _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_

- [ ]* 8.5 Write end-to-end integration tests
  - Test complete data flow: WebSocket → ingestion → cache → database → retrieval
  - Test annotation system with unified data provider
  - Test backtesting system with unified data provider
  - Test real-time trading with unified data provider
  - _Requirements: 1.1, 1.2, 1.3, 6.1, 6.2, 7.1, 8.6_

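
The backward-compatibility requirement in task 8.1 amounts to a thin facade: legacy call sites keep their signatures while delegating to the unified API. A sketch, with a stub standing in for UnifiedDataProvider (its real interface is defined by task 6):

```python
class UnifiedStub:
    """Stand-in for UnifiedDataProvider, for illustration only."""

    def get_inference_data(self, symbol, timestamp=None):
        # timestamp=None -> real-time cache path; otherwise historical DB path.
        source = "cache" if timestamp is None else "database"
        return {"symbol": symbol, "timestamp": timestamp, "source": source}

class DataProvider:
    """Legacy facade: old signature, new data path underneath."""

    def __init__(self, unified):
        self._unified = unified

    def get_data(self, symbol, timestamp=None):
        # Existing callers keep working unchanged (task 8.1).
        return self._unified.get_inference_data(symbol, timestamp=timestamp)
```

Keeping the facade in place through the task 10.4 rollout lets the annotation and backtesting integrations (8.3, 8.4) migrate independently.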
- [ ] 9. Performance optimization and monitoring
- [ ] 9.1 Implement performance monitoring
  - Add latency tracking for cache reads (<10ms target)
  - Add latency tracking for database queries (<100ms target)
  - Add throughput monitoring for ingestion (>1000 ops/sec target)
  - Create performance dashboard or logging
  - _Requirements: 5.2, 6.5, 9.1, 9.2, 9.3_

- [ ] 9.2 Optimize database queries
  - Analyze query execution plans
  - Add missing indexes if needed
  - Optimize time_bucket usage
  - Implement query result caching where appropriate
  - _Requirements: 6.5, 9.2, 9.3, 9.6_

- [ ] 9.3 Implement compression and retention
  - Verify compression policies are working (>80% compression target)
  - Monitor storage growth over time
  - Verify retention policies are cleaning old data
  - Add alerts for storage issues
  - _Requirements: 2.6, 9.5_

- [ ]* 9.4 Write performance tests
  - Test cache read latency under load
  - Test database query latency with various time ranges
  - Test ingestion throughput with high-frequency data
  - Test concurrent access patterns
  - _Requirements: 5.2, 6.5, 9.1, 9.2, 9.3, 9.6_

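
Task 9.1's latency tracking can be sketched as a rolling sample window checked against the plan's targets; using p95 rather than the mean, and the 1000-sample window, are choices made here for illustration.

```python
from collections import deque

class LatencyTracker:
    """Rolling latency samples checked against a millisecond target."""

    def __init__(self, name, target_ms, window=1000):
        self.name = name
        self.target_ms = target_ms
        self.samples = deque(maxlen=window)  # oldest samples age out

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def p95(self):
        # Nearest-rank-style 95th percentile over the current window.
        ordered = sorted(self.samples)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def within_target(self):
        return bool(self.samples) and self.p95() <= self.target_ms
```

One tracker per path (cache reads at 10 ms, database queries at 100 ms) gives the task 10.3 alerting something concrete to poll: alert when `within_target()` turns false.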
- [ ] 10. Documentation and deployment
- [ ] 10.1 Create deployment documentation
  - Document TimescaleDB setup and configuration
  - Document migration process and steps
  - Document rollback procedures
  - Create troubleshooting guide
  - _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6_

- [ ] 10.2 Create API documentation
  - Document UnifiedDataProvider API methods
  - Provide usage examples for each method
  - Document data models and structures
  - Create migration guide for existing code
  - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_

- [ ] 10.3 Create monitoring and alerting setup
  - Document key metrics to monitor
  - Set up alerts for performance degradation
  - Set up alerts for data validation failures
  - Create operational runbook
  - _Requirements: 9.1, 9.2, 9.3, 9.5, 9.6, 10.4_

- [ ] 10.4 Execute phased deployment
  - Phase 1: Deploy with dual-write (Parquet + TimescaleDB)
  - Phase 2: Run migration script for historical data
  - Phase 3: Verify data integrity
  - Phase 4: Switch reads to TimescaleDB
  - Phase 5: Deprecate Parquet writes
  - Phase 6: Archive old Parquet files
  - _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6_
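
Phase 1's dual-write can be sketched with two independent sinks, so a failure in the new TimescaleDB path never loses data on the legacy Parquet path (and vice versa); the sink interface here is an illustrative stand-in.

```python
class DualWriter:
    """Phase 1 dual-write: every record goes to both stores."""

    def __init__(self, parquet_sink, timescale_sink):
        self.sinks = [parquet_sink, timescale_sink]

    def write(self, candle):
        # A failure in one sink must not block the other.
        errors = []
        for sink in self.sinks:
            try:
                sink.append(candle)
            except Exception as exc:
                errors.append(exc)
        if len(errors) == len(self.sinks):
            # Only total failure is fatal; partial failure is logged/alerted.
            raise RuntimeError(f"all sinks failed: {errors}")
```

Because both stores receive every write during Phase 1, the Phase 3 verification can compare them record-for-record before Phase 4 switches reads over.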