gogo2/.kiro/specs/unified-data-storage/tasks.md

Implementation Plan

  • 1. Set up TimescaleDB schema and infrastructure (see the schema sketch after this task)

    • Create database schema with hypertables for OHLCV, order book, and trade data

    • Implement continuous aggregates for multi-timeframe data generation

    • Configure compression and retention policies

    • Create all necessary indexes for query optimization

    • Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.1, 3.2, 3.3, 3.4, 3.5, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6
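
A minimal sketch of what the task 1 schema setup could look like when applied from Python with asyncpg; the `ohlcv_candles` table, its columns, and the 7-day compression / 2-year retention intervals are illustrative placeholders rather than values fixed by the spec. Order book and trade hypertables would follow the same pattern.

```python
import asyncio
import asyncpg

SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS ohlcv_candles (
    ts        TIMESTAMPTZ      NOT NULL,
    symbol    TEXT             NOT NULL,
    timeframe TEXT             NOT NULL,
    open      DOUBLE PRECISION NOT NULL,
    high      DOUBLE PRECISION NOT NULL,
    low       DOUBLE PRECISION NOT NULL,
    close     DOUBLE PRECISION NOT NULL,
    volume    DOUBLE PRECISION NOT NULL,
    PRIMARY KEY (ts, symbol, timeframe)
);
SELECT create_hypertable('ohlcv_candles', 'ts', if_not_exists => TRUE);
CREATE INDEX IF NOT EXISTS idx_ohlcv_symbol_tf_ts
    ON ohlcv_candles (symbol, timeframe, ts DESC);

-- Compression and retention; intervals are placeholders.
ALTER TABLE ohlcv_candles SET (timescaledb.compress,
    timescaledb.compress_segmentby = 'symbol, timeframe');
SELECT add_compression_policy('ohlcv_candles', INTERVAL '7 days', if_not_exists => TRUE);
SELECT add_retention_policy('ohlcv_candles', INTERVAL '2 years', if_not_exists => TRUE);
"""

async def apply_schema(dsn: str) -> None:
    # Continuous aggregates (CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous))
    # cannot run inside a transaction, so they would be issued as separate statements.
    conn = await asyncpg.connect(dsn)
    try:
        await conn.execute(SCHEMA_SQL)
    finally:
        await conn.close()

if __name__ == "__main__":
    asyncio.run(apply_schema("postgresql://user:pass@localhost:5432/gogo2"))
```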

  • 2. Implement data models and validation (see the sketch after this task)

    • 2.1 Create InferenceDataFrame and OrderBookDataFrame data classes

      • Write dataclasses for standardized data structures
      • Include all required fields (OHLCV, order book, imbalances, indicators)
      • Add serialization/deserialization methods
      • Requirements: 1.4, 10.1, 10.2, 10.3
    • 2.2 Implement DataValidator class

      • Write OHLCV validation logic (high >= low, positive volume)
      • Write order book validation logic (best bid < best ask; reject crossed books)
      • Write timestamp validation and UTC timezone enforcement
      • Add comprehensive error logging for validation failures
      • Requirements: 10.1, 10.2, 10.3, 10.4
    • * 2.3 Write unit tests for data models and validation

      • Test InferenceDataFrame creation and serialization
      • Test OrderBookDataFrame creation and serialization
      • Test DataValidator with valid and invalid data
      • Test edge cases and boundary conditions
      • Requirements: 10.1, 10.2, 10.3, 10.4
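
To make tasks 2.1 and 2.2 concrete, a minimal sketch of the data classes and validator, assuming Python dataclasses; field names, and any checks beyond those listed above (e.g. open/close bounded by high/low), are illustrative assumptions.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timedelta
from typing import Dict, List

@dataclass
class InferenceDataFrame:
    """Standardized inference payload (field names are illustrative)."""
    symbol: str
    timestamp: datetime
    ohlcv: Dict[str, List[float]]           # timeframe -> [open, high, low, close, volume]
    orderbook_imbalances: Dict[str, float]  # aggregation window -> imbalance metric
    indicators: Dict[str, float] = field(default_factory=dict)

    def to_dict(self) -> dict:
        """Serialize to a plain dict (timestamp as an ISO-8601 string)."""
        d = asdict(self)
        d["timestamp"] = self.timestamp.isoformat()
        return d

class DataValidator:
    """Rejects obviously malformed candles, order books, and timestamps."""

    @staticmethod
    def validate_ohlcv(open_: float, high: float, low: float,
                       close: float, volume: float) -> bool:
        # Spec checks: high >= low and positive volume; the open/close bounds
        # are an extra sanity check, not mandated by the requirements.
        if high < low or volume <= 0:
            return False
        return low <= open_ <= high and low <= close <= high

    @staticmethod
    def validate_orderbook(best_bid: float, best_ask: float) -> bool:
        # A crossed book (best bid >= best ask) indicates a bad snapshot.
        return best_bid < best_ask

    @staticmethod
    def validate_timestamp(ts: datetime) -> bool:
        # Timestamps must be timezone-aware and in UTC.
        return ts.tzinfo is not None and ts.utcoffset() == timedelta(0)
```
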
  • 3. Implement cache layer (see the sketch after this task)

    • 3.1 Create DataCacheManager class

      • Implement in-memory cache with deque structures
      • Add methods for OHLCV, order book, and imbalance data
      • Implement cache eviction logic (5-minute rolling window)
      • Add cache statistics tracking (hits, misses)
      • Requirements: 5.1, 5.2, 5.3, 5.4
    • 3.2 Implement cache retrieval methods

      • Write get_latest_ohlcv() with timeframe support
      • Write get_latest_orderbook() for current snapshot
      • Write get_latest_imbalances() for multi-timeframe metrics
      • Ensure <10ms latency for cache reads
      • Requirements: 5.1, 5.2
    • * 3.3 Write unit tests for cache layer

      • Test cache insertion and retrieval
      • Test cache eviction logic
      • Test cache statistics
      • Test concurrent access patterns
      • Requirements: 5.1, 5.2, 5.3, 5.4
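
A rough sketch of the DataCacheManager from tasks 3.1–3.2, assuming per-(symbol, timeframe) deques and a 5-minute rolling window; the payload type and wall-clock eviction strategy are assumptions, not requirements. Order book snapshots and imbalance metrics would reuse the same deque-plus-eviction pattern.

```python
import time
from collections import deque
from typing import Any, Deque, Dict, List, Tuple

class DataCacheManager:
    """In-memory rolling cache with hit/miss statistics (illustrative)."""

    def __init__(self, window_seconds: int = 300) -> None:
        self.window_seconds = window_seconds   # 5-minute rolling window
        # (wall-clock timestamp, payload) pairs, newest on the right.
        self._ohlcv: Dict[Tuple[str, str], Deque[Tuple[float, Any]]] = {}
        self.hits = 0
        self.misses = 0

    def add_ohlcv(self, symbol: str, timeframe: str, candle: Any) -> None:
        dq = self._ohlcv.setdefault((symbol, timeframe), deque())
        dq.append((time.time(), candle))
        self._evict(dq)

    def get_latest_ohlcv(self, symbol: str, timeframe: str, limit: int = 100) -> List[Any]:
        dq = self._ohlcv.get((symbol, timeframe))
        if not dq:
            self.misses += 1
            return []
        self.hits += 1
        return [candle for _, candle in list(dq)[-limit:]]

    def _evict(self, dq: Deque[Tuple[float, Any]]) -> None:
        # Drop entries older than the rolling window.
        cutoff = time.time() - self.window_seconds
        while dq and dq[0][0] < cutoff:
            dq.popleft()
```
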
  • 4. Implement database connection and query layer (see the sketch after this task)

    • 4.1 Create DatabaseConnectionManager class

      • Implement asyncpg connection pool management
      • Add health monitoring and automatic reconnection
      • Configure connection pool settings (min/max connections)
      • Add connection statistics and logging
      • Requirements: 2.1, 2.5, 9.6
    • 4.2 Implement OHLCV query methods

      • Write query_ohlcv_data() for single timeframe retrieval
      • Write query_multi_timeframe_ohlcv() for aligned multi-timeframe data
      • Optimize queries with time_bucket and proper indexes
      • Ensure <100ms query latency for typical queries
      • Requirements: 3.1, 3.2, 3.3, 3.4, 6.1, 6.2, 6.5, 9.2, 9.3
    • 4.3 Implement order book query methods

      • Write query_orderbook_snapshots() for raw order book data
      • Write query_orderbook_aggregated() for 1s/1m aggregations
      • Write query_orderbook_imbalances() for multi-timeframe imbalances
      • Optimize queries for fast retrieval
      • Requirements: 4.1, 4.2, 4.3, 4.6, 6.1, 6.2, 6.5
    • * 4.4 Write integration tests for database layer

      • Test connection pool management
      • Test OHLCV queries with various time ranges
      • Test order book queries
      • Test query performance and latency
      • Requirements: 6.1, 6.2, 6.5, 9.2, 9.3
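
A hedged sketch for tasks 4.1–4.2: an asyncpg pool wrapper plus a single-timeframe OHLCV query. The DSN, pool sizes, and the `ohlcv_candles` table and column names follow the placeholder schema sketched under task 1; a production query that resamples raw data would additionally use time_bucket.

```python
from typing import Optional

import asyncpg
import pandas as pd

class DatabaseConnectionManager:
    """Thin wrapper around an asyncpg connection pool (illustrative)."""

    def __init__(self, dsn: str, min_size: int = 2, max_size: int = 10) -> None:
        self._dsn = dsn
        self._min_size = min_size
        self._max_size = max_size
        self.pool: Optional[asyncpg.Pool] = None

    async def connect(self) -> None:
        self.pool = await asyncpg.create_pool(
            self._dsn, min_size=self._min_size, max_size=self._max_size)

    async def query_ohlcv_data(self, symbol: str, timeframe: str,
                               start, end, limit: int = 1000) -> pd.DataFrame:
        """Fetch candles for one symbol/timeframe within [start, end]."""
        sql = """
            SELECT ts, open, high, low, close, volume
            FROM ohlcv_candles
            WHERE symbol = $1 AND timeframe = $2 AND ts BETWEEN $3 AND $4
            ORDER BY ts
            LIMIT $5
        """
        async with self.pool.acquire() as conn:
            rows = await conn.fetch(sql, symbol, timeframe, start, end, limit)
        return pd.DataFrame([dict(r) for r in rows])
```
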
  • 5. Implement data ingestion pipeline (see the sketch after this task)

    • 5.1 Create DataIngestionPipeline class

      • Implement batch write buffers for OHLCV, order book, and trade data
      • Add batch size and timeout configuration
      • Implement async batch flush methods
      • Add error handling and retry logic
      • Requirements: 2.5, 5.3, 9.1, 9.4
    • 5.2 Implement OHLCV ingestion

      • Write ingest_ohlcv_candle() method
      • Add immediate cache write
      • Implement batch buffering for database writes
      • Add data validation before ingestion
      • Requirements: 2.1, 2.2, 2.5, 5.1, 5.3, 9.1, 9.4, 10.1, 10.2
    • 5.3 Implement order book ingestion

      • Write ingest_orderbook_snapshot() method
      • Calculate and cache imbalance metrics
      • Implement batch buffering for database writes
      • Add data validation before ingestion
      • Requirements: 2.1, 2.2, 4.1, 4.2, 4.3, 5.1, 5.3, 9.1, 9.4, 10.3
    • 5.4 Implement retry logic and error handling

      • Create RetryableDBOperation wrapper class
      • Implement exponential backoff retry strategy
      • Add comprehensive error logging
      • Handle database connection failures gracefully
      • Requirements: 2.5, 9.6
    • * 5.5 Write integration tests for ingestion pipeline

      • Test OHLCV ingestion flow (cache → database)
      • Test order book ingestion flow
      • Test batch write operations
      • Test error handling and retry logic
      • Requirements: 2.5, 5.3, 9.1, 9.4
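
For tasks 5.1–5.4, a compact sketch of batch-buffered ingestion plus the exponential-backoff wrapper; the batch size, delays, and the set of retryable exceptions are project decisions, not mandated by the spec.

```python
import asyncio
import logging
import random
from typing import Callable, List

logger = logging.getLogger(__name__)

class RetryableDBOperation:
    """Retries an async operation with exponential backoff and jitter."""

    def __init__(self, max_attempts: int = 5, base_delay: float = 0.5) -> None:
        self.max_attempts = max_attempts
        self.base_delay = base_delay

    async def run(self, operation: Callable):
        for attempt in range(1, self.max_attempts + 1):
            try:
                return await operation()
            except (ConnectionError, OSError, asyncio.TimeoutError) as exc:
                if attempt == self.max_attempts:
                    logger.error("DB operation failed after %d attempts: %s", attempt, exc)
                    raise
                delay = self.base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
                logger.warning("DB operation failed (attempt %d), retrying in %.2fs",
                               attempt, delay)
                await asyncio.sleep(delay)

class DataIngestionPipeline:
    """Buffers rows in memory and flushes them to TimescaleDB in batches."""

    def __init__(self, db, batch_size: int = 100) -> None:
        self.db = db                       # DatabaseConnectionManager from task 4
        self.batch_size = batch_size
        self._ohlcv_buffer: List[tuple] = []
        self._retry = RetryableDBOperation()

    async def ingest_ohlcv_candle(self, row: tuple) -> None:
        # Validation (task 2.2) and the immediate cache write (task 5.2) would go here.
        self._ohlcv_buffer.append(row)
        if len(self._ohlcv_buffer) >= self.batch_size:
            await self.flush_ohlcv()

    async def flush_ohlcv(self) -> None:
        batch, self._ohlcv_buffer = self._ohlcv_buffer, []
        sql = """INSERT INTO ohlcv_candles (ts, symbol, timeframe, open, high, low, close, volume)
                 VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
                 ON CONFLICT DO NOTHING"""

        async def write():
            async with self.db.pool.acquire() as conn:
                await conn.executemany(sql, batch)

        await self._retry.run(write)
```
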
  • 6. Implement unified data provider API (see the sketch after this task)

    • 6.1 Create UnifiedDataProvider class

      • Initialize with database connection pool and cache manager
      • Configure symbols and timeframes
      • Add connection to existing DataProvider components
      • Requirements: 1.1, 1.2, 1.3
    • 6.2 Implement get_inference_data() method

      • Handle timestamp=None for real-time data from cache
      • Handle specific timestamp for historical data from database
      • Implement context window retrieval (±N minutes)
      • Combine OHLCV, order book, and imbalance data
      • Return standardized InferenceDataFrame
      • Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 5.2, 6.1, 6.2, 6.3, 6.4, 7.1, 7.2, 7.3
    • 6.3 Implement get_multi_timeframe_data() method

      • Query multiple timeframes efficiently
      • Align timestamps across timeframes
      • Handle missing data by generating from lower timeframes
      • Return dictionary mapping timeframe to DataFrame
      • Requirements: 3.1, 3.2, 3.3, 3.4, 6.1, 6.2, 6.3, 10.5
    • 6.4 Implement get_order_book_data() method

      • Handle different aggregation levels (raw, 1s, 1m)
      • Include multi-timeframe imbalance metrics
      • Return standardized OrderBookDataFrame
      • Requirements: 4.1, 4.2, 4.3, 4.6, 6.1, 6.2
    • * 6.5 Write integration tests for unified API

      • Test get_inference_data() with real-time and historical data
      • Test get_multi_timeframe_data() with various timeframes
      • Test get_order_book_data() with different aggregations
      • Test context window retrieval
      • Test data consistency across methods
      • Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 6.1, 6.2, 6.3, 6.4, 10.5, 10.6
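
Task 6.2's dispatch between the real-time and historical paths might look roughly like this; the timeframe list, the 5-minute default context window, and the plain-dict return value (standing in for the InferenceDataFrame from task 2.1) are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

class UnifiedDataProvider:
    """Read-side facade over the cache layer and TimescaleDB (illustrative)."""

    def __init__(self, db, cache, timeframes=("1s", "1m", "1h")) -> None:
        self.db = db                 # DatabaseConnectionManager (task 4)
        self.cache = cache           # DataCacheManager (task 3)
        self.timeframes = timeframes

    async def get_inference_data(self, symbol: str,
                                 timestamp: Optional[datetime] = None,
                                 context_minutes: int = 5) -> dict:
        if timestamp is None:
            # Real-time path: serve the latest data straight from the cache.
            ohlcv = {tf: self.cache.get_latest_ohlcv(symbol, tf) for tf in self.timeframes}
            imbalances = self.cache.get_latest_imbalances(symbol)
            timestamp = datetime.now(timezone.utc)
        else:
            # Historical path: pull a +/- context window around the requested time.
            start = timestamp - timedelta(minutes=context_minutes)
            end = timestamp + timedelta(minutes=context_minutes)
            ohlcv = {tf: await self.db.query_ohlcv_data(symbol, tf, start, end)
                     for tf in self.timeframes}
            imbalances = await self.db.query_orderbook_imbalances(symbol, start, end)
        # The real implementation would pack this into an InferenceDataFrame.
        return {"symbol": symbol, "timestamp": timestamp,
                "ohlcv": ohlcv, "imbalances": imbalances}
```
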
  • 7. Implement data migration system (see the sketch after this task)

    • 7.1 Create DataMigrationManager class

      • Initialize with database connection and cache directory path
      • Add methods for discovering existing Parquet files
      • Implement symbol format conversion utilities
      • Requirements: 8.1, 8.2, 8.6
    • 7.2 Implement Parquet file migration

      • Write _migrate_ohlcv_data() to process all Parquet files
      • Parse filenames to extract symbol and timeframe
      • Read Parquet files and convert to database format
      • Implement batch insertion with conflict handling
      • Requirements: 8.1, 8.2, 8.3, 8.5
    • 7.3 Implement migration verification

      • Write _verify_migration() to compare record counts
      • Check data integrity (no missing timestamps)
      • Validate data ranges match original files
      • Generate migration report
      • Requirements: 8.3, 8.4
    • 7.4 Implement rollback capability

      • Add transaction support for migration operations
      • Implement rollback on verification failure
      • Preserve original Parquet files until verification passes
      • Add option to archive old files after successful migration
      • Requirements: 8.4, 8.5
    • * 7.5 Write integration tests for migration

      • Test Parquet file discovery and parsing
      • Test data migration with sample files
      • Test verification logic
      • Test rollback on failure
      • Requirements: 8.1, 8.2, 8.3, 8.4
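
For tasks 7.1–7.2, a sketch of Parquet discovery and per-file migration; the `SYMBOL_TIMEFRAME.parquet` filename convention, the DatetimeIndex/column layout of the Parquet files, and the target table are all assumptions that would need to match the actual cache directory. Verification (task 7.3) can then compare the row count returned here against a `COUNT(*)` over the same symbol, timeframe, and time range.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

import pandas as pd

# Assumed filename convention, e.g. "ETHUSDT_1m.parquet".
FILENAME_RE = re.compile(r"(?P<symbol>[A-Z0-9]+)_(?P<timeframe>\w+)\.parquet$")

def discover_parquet_files(cache_dir: str) -> Iterator[Tuple[Path, str, str]]:
    """Yield (path, symbol, timeframe) for every recognizable Parquet file."""
    for path in Path(cache_dir).rglob("*.parquet"):
        m = FILENAME_RE.search(path.name)
        if m:
            yield path, m.group("symbol"), m.group("timeframe")

async def migrate_file(db, path: Path, symbol: str, timeframe: str) -> int:
    """Copy one Parquet file into ohlcv_candles; returns rows attempted."""
    # Assumes a tz-aware DatetimeIndex and open/high/low/close/volume columns.
    df = pd.read_parquet(path)
    rows = [(ts.to_pydatetime(), symbol, timeframe,
             float(r.open), float(r.high), float(r.low), float(r.close), float(r.volume))
            for ts, r in df.iterrows()]
    sql = """INSERT INTO ohlcv_candles (ts, symbol, timeframe, open, high, low, close, volume)
             VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
             ON CONFLICT DO NOTHING"""
    async with db.pool.acquire() as conn:
        await conn.executemany(sql, rows)
    return len(rows)
```
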
  • 8. Integrate with existing DataProvider

    • 8.1 Update DataProvider class to use UnifiedDataProvider

      • Replace existing data retrieval methods with unified API calls
      • Update get_data() method to use get_inference_data()
      • Update multi-timeframe methods to use get_multi_timeframe_data()
      • Maintain backward compatibility with existing interfaces
      • Requirements: 1.1, 1.2, 1.3, 8.6
    • 8.2 Update real-time data flow

      • Connect WebSocket data to DataIngestionPipeline
      • Update tick aggregator to write to cache and database
      • Update COB integration to use new ingestion methods
      • Ensure no data loss during transition
      • Requirements: 2.1, 2.2, 5.1, 5.3, 8.6
    • 8.3 Update annotation system integration

      • Update ANNOTATE/core/data_loader.py to use unified API
      • Ensure annotation system uses get_inference_data() with timestamps
      • Test annotation workflow with new data provider
      • Requirements: 7.1, 7.2, 7.3, 7.4, 7.5
    • 8.4 Update backtesting system integration

      • Update backtesting data access to use unified API
      • Ensure sequential data access works efficiently
      • Test backtesting performance with new data provider
      • Requirements: 6.1, 6.2, 6.3, 6.4, 6.5
    • * 8.5 Write end-to-end integration tests

      • Test complete data flow: WebSocket → ingestion → cache → database → retrieval
      • Test annotation system with unified data provider
      • Test backtesting system with unified data provider
      • Test real-time trading with unified data provider
      • Requirements: 1.1, 1.2, 1.3, 6.1, 6.2, 7.1, 8.6
  • 9. Performance optimization and monitoring (see the sketch after this task)

    • 9.1 Implement performance monitoring

      • Add latency tracking for cache reads (<10ms target)
      • Add latency tracking for database queries (<100ms target)
      • Add throughput monitoring for ingestion (>1000 ops/sec target)
      • Create a performance dashboard or structured performance logging
      • Requirements: 5.2, 6.5, 9.1, 9.2, 9.3
    • 9.2 Optimize database queries

      • Analyze query execution plans
      • Add missing indexes if needed
      • Optimize time_bucket usage
      • Implement query result caching where appropriate
      • Requirements: 6.5, 9.2, 9.3, 9.6
    • 9.3 Implement compression and retention

      • Verify compression policies are working (>80% compression target)
      • Monitor storage growth over time
      • Verify retention policies are cleaning old data
      • Add alerts for storage issues
      • Requirements: 2.6, 9.5
    • * 9.4 Write performance tests

      • Test cache read latency under load
      • Test database query latency with various time ranges
      • Test ingestion throughput with high-frequency data
      • Test concurrent access patterns
      • Requirements: 5.2, 6.5, 9.1, 9.2, 9.3, 9.6
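
A small sketch for task 9.1's latency tracking, using the budgets stated in the plan (cache reads <10 ms, database queries <100 ms); the operation names and the log-only behavior are assumptions.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)

# Latency budgets from the plan: cache reads <10 ms, DB queries <100 ms.
LATENCY_BUDGETS_MS = {"cache_read": 10.0, "db_query": 100.0}

@contextmanager
def track_latency(op: str):
    """Log a warning whenever an operation exceeds its latency budget."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        budget = LATENCY_BUDGETS_MS.get(op)
        if budget is not None and elapsed_ms > budget:
            logger.warning("%s took %.1f ms (budget %.0f ms)", op, elapsed_ms, budget)
        else:
            logger.debug("%s took %.1f ms", op, elapsed_ms)

# Example usage inside an async query method:
#     with track_latency("db_query"):
#         rows = await conn.fetch(sql, symbol, timeframe, start, end)
```
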
  • 10. Documentation and deployment

    • 10.1 Create deployment documentation

      • Document TimescaleDB setup and configuration
      • Document migration process and steps
      • Document rollback procedures
      • Create troubleshooting guide
      • Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6
    • 10.2 Create API documentation

      • Document UnifiedDataProvider API methods
      • Provide usage examples for each method
      • Document data models and structures
      • Create migration guide for existing code
      • Requirements: 1.1, 1.2, 1.3, 1.4, 1.5
    • 10.3 Create monitoring and alerting setup

      • Document key metrics to monitor
      • Set up alerts for performance degradation
      • Set up alerts for data validation failures
      • Create operational runbook
      • Requirements: 9.1, 9.2, 9.3, 9.5, 9.6, 10.4
    • 10.4 Execute phased deployment

      • Phase 1: Deploy with dual-write (Parquet + TimescaleDB)
      • Phase 2: Run migration script for historical data
      • Phase 3: Verify data integrity
      • Phase 4: Switch reads to TimescaleDB
      • Phase 5: Deprecate Parquet writes
      • Phase 6: Archive old Parquet files
      • Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6