uni data storage

This commit is contained in:
Dobromir Popov
2025-10-20 09:48:59 +03:00
parent 002d0f7858
commit f464a412dc
12 changed files with 2905 additions and 181 deletions

@@ -0,0 +1,860 @@
# Design Document: Unified Data Storage System
## Overview
This design document outlines the architecture for unifying all data storage and retrieval methods in the trading system. The current system uses multiple fragmented approaches (Parquet files, pickle files, in-memory caches, and TimescaleDB), which creates complexity and inconsistency. The unified system will consolidate these into a single, efficient TimescaleDB-based storage backend with a clean, unified API.
### Key Design Principles
1. **Single Source of Truth**: TimescaleDB as the primary storage backend for all time-series data
2. **Unified Interface**: One method (`get_inference_data()`) for all data retrieval needs
3. **Performance First**: In-memory caching for real-time data, optimized queries for historical data
4. **Backward Compatibility**: Seamless migration from existing storage formats
5. **Separation of Concerns**: Clear boundaries between storage, caching, and business logic
## Architecture
### High-Level Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                       │
│        (Models, Backtesting, Annotation, Dashboard)          │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                  Unified Data Provider API                   │
│                                                              │
│  get_inference_data(symbol, timestamp=None, context_window)  │
│  get_multi_timeframe_data(symbol, timeframes, timestamp)     │
│  get_order_book_data(symbol, timestamp, aggregation)         │
└────────────────────────┬────────────────────────────────────┘
                         │
            ┌────────────┴────────────┐
            ▼                         ▼
┌──────────────────┐        ┌──────────────────┐
│   Cache Layer    │        │  Storage Layer   │
│   (In-Memory)    │        │  (TimescaleDB)   │
│                  │        │                  │
│ - Last 5 min     │        │ - OHLCV Data     │
│ - Real-time      │        │ - Order Book     │
│ - Low latency    │        │ - Trade Data     │
└──────────────────┘        │ - Aggregations   │
                            └──────────────────┘
```
### Data Flow
```
Real-Time Data Flow:
  WebSocket → Tick Aggregator → Cache Layer → TimescaleDB (async)
                                     │
                                     └→ Application (fast read)

Historical Data Flow:
  Application → Unified API → TimescaleDB → Cache (optional) → Application
```
## Components and Interfaces
### 1. Unified Data Provider
The central component that provides a single interface for all data access.
```python
class UnifiedDataProvider:
    """
    Unified interface for all market data access.
    Handles both real-time and historical data retrieval.
    """

    def __init__(self, db_connection_pool, cache_manager):
        self.db = db_connection_pool
        self.cache = cache_manager
        self.symbols = ['ETH/USDT', 'BTC/USDT']
        self.timeframes = ['1s', '1m', '5m', '15m', '1h', '1d']

    async def get_inference_data(
        self,
        symbol: str,
        timestamp: Optional[datetime] = None,
        context_window_minutes: int = 5
    ) -> InferenceDataFrame:
        """
        Get complete inference data for a symbol at a specific time.

        Args:
            symbol: Trading symbol (e.g., 'ETH/USDT')
            timestamp: Target timestamp (None = latest real-time data)
            context_window_minutes: Minutes of context data before/after timestamp

        Returns:
            InferenceDataFrame with OHLCV, indicators, COB data, imbalances
        """

    async def get_multi_timeframe_data(
        self,
        symbol: str,
        timeframes: List[str],
        timestamp: Optional[datetime] = None,
        limit: int = 100
    ) -> Dict[str, pd.DataFrame]:
        """
        Get aligned multi-timeframe candlestick data.

        Args:
            symbol: Trading symbol
            timeframes: List of timeframes to retrieve
            timestamp: Target timestamp (None = latest)
            limit: Number of candles per timeframe

        Returns:
            Dictionary mapping timeframe to DataFrame
        """

    async def get_order_book_data(
        self,
        symbol: str,
        timestamp: Optional[datetime] = None,
        aggregation: str = '1s',
        limit: int = 300
    ) -> OrderBookDataFrame:
        """
        Get order book data with imbalance metrics.

        Args:
            symbol: Trading symbol
            timestamp: Target timestamp (None = latest)
            aggregation: Aggregation level ('raw', '1s', '1m')
            limit: Number of data points

        Returns:
            OrderBookDataFrame with bids, asks, imbalances
        """
```
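The core routing decision behind `get_inference_data()` is timestamp-based: no timestamp means the real-time cache path, an explicit timestamp means a TimescaleDB query around that point. The sketch below illustrates that split under stated assumptions; the helper names `_from_cache()` and `_from_database()` are hypothetical placeholders, not existing methods.

```python
# Sketch: how get_inference_data() might route between cache and TimescaleDB.
# _from_cache()/_from_database() are illustrative assumptions, not existing code.
import time
from datetime import datetime, timedelta
from typing import Optional


async def get_inference_data_sketch(provider, symbol: str,
                                     timestamp: Optional[datetime] = None,
                                     context_window_minutes: int = 5):
    """Route to cache for real-time requests, to TimescaleDB for historical ones."""
    start = time.perf_counter()

    if timestamp is None:
        # Real-time path: serve entirely from the in-memory cache (<10ms target).
        data = await provider._from_cache(symbol)            # hypothetical helper
        source = 'cache'
    else:
        # Historical path: query TimescaleDB around the requested timestamp.
        window = timedelta(minutes=context_window_minutes)
        data = await provider._from_database(                 # hypothetical helper
            symbol,
            start=timestamp - window,
            end=timestamp + window,
        )
        source = 'database'

    # Metadata fields defined on InferenceDataFrame below.
    data.data_source = source
    data.query_latency_ms = (time.perf_counter() - start) * 1000.0
    return data
```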
### 2. Storage Layer (TimescaleDB)
TimescaleDB schema and access patterns.
#### Database Schema
```sql
-- OHLCV Data (Hypertable)
CREATE TABLE ohlcv_data (
timestamp TIMESTAMPTZ NOT NULL,
symbol VARCHAR(20) NOT NULL,
timeframe VARCHAR(10) NOT NULL,
open_price DECIMAL(20,8) NOT NULL,
high_price DECIMAL(20,8) NOT NULL,
low_price DECIMAL(20,8) NOT NULL,
close_price DECIMAL(20,8) NOT NULL,
volume DECIMAL(30,8) NOT NULL,
trade_count INTEGER,
-- Technical Indicators (pre-calculated)
rsi_14 DECIMAL(10,4),
macd DECIMAL(20,8),
macd_signal DECIMAL(20,8),
bb_upper DECIMAL(20,8),
bb_middle DECIMAL(20,8),
bb_lower DECIMAL(20,8),
PRIMARY KEY (timestamp, symbol, timeframe)
);
SELECT create_hypertable('ohlcv_data', 'timestamp');
CREATE INDEX idx_ohlcv_symbol_tf ON ohlcv_data (symbol, timeframe, timestamp DESC);
-- Order Book Snapshots (Hypertable)
CREATE TABLE order_book_snapshots (
timestamp TIMESTAMPTZ NOT NULL,
symbol VARCHAR(20) NOT NULL,
exchange VARCHAR(20) NOT NULL,
bids JSONB NOT NULL, -- Top 50 levels
asks JSONB NOT NULL, -- Top 50 levels
mid_price DECIMAL(20,8),
spread DECIMAL(20,8),
bid_volume DECIMAL(30,8),
ask_volume DECIMAL(30,8),
PRIMARY KEY (timestamp, symbol, exchange)
);
SELECT create_hypertable('order_book_snapshots', 'timestamp');
CREATE INDEX idx_obs_symbol ON order_book_snapshots (symbol, timestamp DESC);
-- Order Book Aggregated 1s (Hypertable)
CREATE TABLE order_book_1s_agg (
timestamp TIMESTAMPTZ NOT NULL,
symbol VARCHAR(20) NOT NULL,
price_bucket DECIMAL(20,2) NOT NULL, -- $1 buckets
bid_volume DECIMAL(30,8),
ask_volume DECIMAL(30,8),
bid_count INTEGER,
ask_count INTEGER,
imbalance DECIMAL(10,6),
PRIMARY KEY (timestamp, symbol, price_bucket)
);
SELECT create_hypertable('order_book_1s_agg', 'timestamp');
CREATE INDEX idx_ob1s_symbol ON order_book_1s_agg (symbol, timestamp DESC);
-- Order Book Imbalances (Hypertable)
CREATE TABLE order_book_imbalances (
timestamp TIMESTAMPTZ NOT NULL,
symbol VARCHAR(20) NOT NULL,
imbalance_1s DECIMAL(10,6),
imbalance_5s DECIMAL(10,6),
imbalance_15s DECIMAL(10,6),
imbalance_60s DECIMAL(10,6),
volume_imbalance_1s DECIMAL(10,6),
volume_imbalance_5s DECIMAL(10,6),
volume_imbalance_15s DECIMAL(10,6),
volume_imbalance_60s DECIMAL(10,6),
price_range DECIMAL(10,2),
PRIMARY KEY (timestamp, symbol)
);
SELECT create_hypertable('order_book_imbalances', 'timestamp');
CREATE INDEX idx_obi_symbol ON order_book_imbalances (symbol, timestamp DESC);
-- Trade Events (Hypertable)
CREATE TABLE trade_events (
timestamp TIMESTAMPTZ NOT NULL,
symbol VARCHAR(20) NOT NULL,
exchange VARCHAR(20) NOT NULL,
price DECIMAL(20,8) NOT NULL,
size DECIMAL(30,8) NOT NULL,
side VARCHAR(4) NOT NULL,
trade_id VARCHAR(100) NOT NULL,
PRIMARY KEY (timestamp, symbol, exchange, trade_id)
);
SELECT create_hypertable('trade_events', 'timestamp');
CREATE INDEX idx_trades_symbol ON trade_events (symbol, timestamp DESC);
```
#### Continuous Aggregates
```sql
-- 1m OHLCV from 1s data
CREATE MATERIALIZED VIEW ohlcv_1m_continuous
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 minute', timestamp) AS timestamp,
symbol,
'1m' AS timeframe,
first(open_price, timestamp) AS open_price,
max(high_price) AS high_price,
min(low_price) AS low_price,
last(close_price, timestamp) AS close_price,
sum(volume) AS volume,
sum(trade_count) AS trade_count
FROM ohlcv_data
WHERE timeframe = '1s'
GROUP BY time_bucket('1 minute', timestamp), symbol;
-- 5m OHLCV from 1m data
CREATE MATERIALIZED VIEW ohlcv_5m_continuous
WITH (timescaledb.continuous) AS
SELECT
time_bucket('5 minutes', timestamp) AS timestamp,
symbol,
'5m' AS timeframe,
first(open_price, timestamp) AS open_price,
max(high_price) AS high_price,
min(low_price) AS low_price,
last(close_price, timestamp) AS close_price,
sum(volume) AS volume,
sum(trade_count) AS trade_count
FROM ohlcv_data
WHERE timeframe = '1m'
GROUP BY time_bucket('5 minutes', timestamp), symbol;
-- Similar for 15m, 1h, 1d
```
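Defining the continuous aggregates only creates the views; TimescaleDB also needs a refresh policy per aggregate to keep them up to date. The offsets and schedule interval below are illustrative assumptions rather than values taken from the existing configuration.

```sql
-- Illustrative refresh policy for the 1m aggregate (offsets/schedule are assumptions)
SELECT add_continuous_aggregate_policy('ohlcv_1m_continuous',
    start_offset      => INTERVAL '1 hour',
    end_offset        => INTERVAL '1 minute',
    schedule_interval => INTERVAL '1 minute');
```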
#### Compression Policies
```sql
-- Compress data older than 7 days
SELECT add_compression_policy('ohlcv_data', INTERVAL '7 days');
SELECT add_compression_policy('order_book_snapshots', INTERVAL '1 day');
SELECT add_compression_policy('order_book_1s_agg', INTERVAL '2 days');
SELECT add_compression_policy('order_book_imbalances', INTERVAL '2 days');
SELECT add_compression_policy('trade_events', INTERVAL '7 days');
```
#### Retention Policies
```sql
-- Retain data for specified periods
SELECT add_retention_policy('order_book_snapshots', INTERVAL '30 days');
SELECT add_retention_policy('order_book_1s_agg', INTERVAL '60 days');
SELECT add_retention_policy('order_book_imbalances', INTERVAL '60 days');
SELECT add_retention_policy('trade_events', INTERVAL '90 days');
SELECT add_retention_policy('ohlcv_data', INTERVAL '2 years');
```
### 3. Cache Layer
In-memory caching for low-latency real-time data access.
```python
class DataCacheManager:
    """
    Manages in-memory cache for real-time data.
    Provides <10ms latency for latest data access.
    """

    def __init__(self, cache_duration_seconds: int = 300):
        # Cache last 5 minutes of data
        self.cache_duration = cache_duration_seconds

        # In-memory storage
        self.ohlcv_cache: Dict[str, Dict[str, deque]] = {}
        self.orderbook_cache: Dict[str, deque] = {}
        self.imbalance_cache: Dict[str, deque] = {}
        self.trade_cache: Dict[str, deque] = {}

        # Cache statistics
        self.cache_hits = 0
        self.cache_misses = 0

    def add_ohlcv_candle(self, symbol: str, timeframe: str, candle: Dict):
        """Add OHLCV candle to cache"""

    def add_orderbook_snapshot(self, symbol: str, snapshot: Dict):
        """Add order book snapshot to cache"""

    def add_imbalance_data(self, symbol: str, imbalance: Dict):
        """Add imbalance metrics to cache"""

    def get_latest_ohlcv(self, symbol: str, timeframe: str, limit: int = 100) -> List[Dict]:
        """Get latest OHLCV candles from cache"""

    def get_latest_orderbook(self, symbol: str) -> Optional[Dict]:
        """Get latest order book snapshot from cache"""

    def get_latest_imbalances(self, symbol: str, limit: int = 60) -> List[Dict]:
        """Get latest imbalance metrics from cache"""

    def evict_old_data(self):
        """Remove data older than cache duration"""
```
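A minimal sketch of the deque-backed methods follows, assuming 1s candles with timezone-aware timestamps and a 5-minute window (so `maxlen` equals the cache duration in seconds). The exact eviction strategy is an assumption, not the final implementation.

```python
# Minimal sketch of the deque-backed cache methods. Assumes candle['timestamp']
# is a timezone-aware datetime and that 1s candles fill the 5-minute window.
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Dict, List


class CacheSketch:
    def __init__(self, cache_duration_seconds: int = 300):
        self.cache_duration = cache_duration_seconds
        self.ohlcv_cache: Dict[str, Dict[str, deque]] = {}

    def add_ohlcv_candle(self, symbol: str, timeframe: str, candle: Dict) -> None:
        # Bounded deque: old candles fall off automatically once the window is full.
        tf_cache = self.ohlcv_cache.setdefault(symbol, {})
        tf_cache.setdefault(timeframe, deque(maxlen=self.cache_duration)).append(candle)

    def get_latest_ohlcv(self, symbol: str, timeframe: str, limit: int = 100) -> List[Dict]:
        candles = self.ohlcv_cache.get(symbol, {}).get(timeframe, deque())
        return list(candles)[-limit:]

    def evict_old_data(self) -> None:
        # Time-based eviction for deques that are not yet full.
        cutoff = datetime.now(timezone.utc) - timedelta(seconds=self.cache_duration)
        for tf_cache in self.ohlcv_cache.values():
            for candles in tf_cache.values():
                while candles and candles[0]['timestamp'] < cutoff:
                    candles.popleft()
```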
### 4. Data Models
Standardized data structures for all components.
```python
@dataclass
class InferenceDataFrame:
    """Complete inference data for a single timestamp"""
    symbol: str
    timestamp: datetime

    # Multi-timeframe OHLCV
    ohlcv_1s: pd.DataFrame
    ohlcv_1m: pd.DataFrame
    ohlcv_5m: pd.DataFrame
    ohlcv_15m: pd.DataFrame
    ohlcv_1h: pd.DataFrame
    ohlcv_1d: pd.DataFrame

    # Order book data
    orderbook_snapshot: Optional[Dict]
    orderbook_1s_agg: pd.DataFrame

    # Imbalance metrics
    imbalances: pd.DataFrame  # Multi-timeframe imbalances

    # Technical indicators (pre-calculated)
    indicators: Dict[str, float]

    # Context window data (±N minutes)
    context_data: Optional[pd.DataFrame]

    # Metadata
    data_source: str  # 'cache' or 'database'
    query_latency_ms: float


@dataclass
class OrderBookDataFrame:
    """Order book data with imbalances"""
    symbol: str
    timestamp: datetime

    # Raw order book
    bids: List[Tuple[float, float]]  # (price, size)
    asks: List[Tuple[float, float]]

    # Aggregated data
    price_buckets: pd.DataFrame  # $1 buckets

    # Imbalance metrics
    imbalance_1s: float
    imbalance_5s: float
    imbalance_15s: float
    imbalance_60s: float

    # Volume-weighted imbalances
    volume_imbalance_1s: float
    volume_imbalance_5s: float
    volume_imbalance_15s: float
    volume_imbalance_60s: float

    # Statistics
    mid_price: float
    spread: float
    bid_volume: float
    ask_volume: float
```
### 5. Data Ingestion Pipeline
Real-time data ingestion with async persistence.
```python
class DataIngestionPipeline:
    """
    Handles real-time data ingestion from WebSocket sources.
    Writes to cache immediately, persists to DB asynchronously.
    """

    def __init__(self, cache_manager, db_connection_pool):
        self.cache = cache_manager
        self.db = db_connection_pool

        # Batch write buffers
        self.ohlcv_buffer: List[Dict] = []
        self.orderbook_buffer: List[Dict] = []
        self.trade_buffer: List[Dict] = []

        # Batch write settings
        self.batch_size = 100
        self.batch_timeout_seconds = 5

    async def ingest_ohlcv_candle(self, symbol: str, timeframe: str, candle: Dict):
        """
        Ingest OHLCV candle.
        1. Add to cache immediately
        2. Buffer for batch write to DB
        """
        # Immediate cache write
        self.cache.add_ohlcv_candle(symbol, timeframe, candle)

        # Buffer for DB write
        self.ohlcv_buffer.append({
            'symbol': symbol,
            'timeframe': timeframe,
            **candle
        })

        # Flush if buffer full
        if len(self.ohlcv_buffer) >= self.batch_size:
            await self._flush_ohlcv_buffer()

    async def ingest_orderbook_snapshot(self, symbol: str, snapshot: Dict):
        """Ingest order book snapshot"""
        # Immediate cache write
        self.cache.add_orderbook_snapshot(symbol, snapshot)

        # Calculate and cache imbalances
        imbalances = self._calculate_imbalances(symbol, snapshot)
        self.cache.add_imbalance_data(symbol, imbalances)

        # Buffer for DB write
        self.orderbook_buffer.append({
            'symbol': symbol,
            **snapshot
        })

        # Flush if buffer full
        if len(self.orderbook_buffer) >= self.batch_size:
            await self._flush_orderbook_buffer()

    async def _flush_ohlcv_buffer(self):
        """Batch write OHLCV data to database"""
        if not self.ohlcv_buffer:
            return

        try:
            # Prepare batch insert
            values = [
                (
                    item['timestamp'],
                    item['symbol'],
                    item['timeframe'],
                    item['open'],
                    item['high'],
                    item['low'],
                    item['close'],
                    item['volume'],
                    item.get('trade_count', 0)
                )
                for item in self.ohlcv_buffer
            ]

            # Batch insert
            await self.db.executemany(
                """
                INSERT INTO ohlcv_data
                (timestamp, symbol, timeframe, open_price, high_price,
                 low_price, close_price, volume, trade_count)
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
                ON CONFLICT (timestamp, symbol, timeframe) DO UPDATE
                SET close_price = EXCLUDED.close_price,
                    high_price = GREATEST(ohlcv_data.high_price, EXCLUDED.high_price),
                    low_price = LEAST(ohlcv_data.low_price, EXCLUDED.low_price),
                    volume = ohlcv_data.volume + EXCLUDED.volume,
                    trade_count = ohlcv_data.trade_count + EXCLUDED.trade_count
                """,
                values
            )

            # Clear buffer
            self.ohlcv_buffer.clear()
        except Exception as e:
            logger.error(f"Error flushing OHLCV buffer: {e}")
```
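The pipeline above flushes only when a buffer reaches `batch_size`; the `batch_timeout_seconds` setting implies a second, time-based trigger so quiet periods still get persisted. The following is a minimal sketch of that periodic flush, intended to be mixed into `DataIngestionPipeline`; the task/loop structure and method names are assumptions for illustration.

```python
# Sketch of the time-based flush that complements the size-based flush above.
# Assumes the attributes/methods of DataIngestionPipeline (batch_timeout_seconds,
# _flush_ohlcv_buffer, _flush_orderbook_buffer) are available on self.
import asyncio


class PeriodicFlushMixin:
    """Flush buffers every batch_timeout_seconds even when they are not full."""

    def start(self) -> None:
        # Run the flush loop alongside the WebSocket consumers.
        self._flush_task = asyncio.create_task(self._flush_loop())

    async def _flush_loop(self) -> None:
        while True:
            await asyncio.sleep(self.batch_timeout_seconds)
            # Flush whatever has accumulated since the last tick.
            await self._flush_ohlcv_buffer()
            await self._flush_orderbook_buffer()

    async def stop(self) -> None:
        self._flush_task.cancel()
        # Final flush so buffered rows are not lost on shutdown.
        await self._flush_ohlcv_buffer()
        await self._flush_orderbook_buffer()
```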
### 6. Migration System
Migrate existing Parquet/pickle data to TimescaleDB.
```python
class DataMigrationManager:
    """
    Migrates existing data from Parquet/pickle files to TimescaleDB.
    Ensures data integrity and provides rollback capability.
    """

    def __init__(self, db_connection_pool, cache_dir: Path):
        self.db = db_connection_pool
        self.cache_dir = cache_dir

    async def migrate_all_data(self):
        """Migrate all existing data to TimescaleDB"""
        logger.info("Starting data migration to TimescaleDB")

        # Migrate OHLCV data from Parquet files
        await self._migrate_ohlcv_data()

        # Migrate order book data if it exists
        await self._migrate_orderbook_data()

        # Verify migration
        await self._verify_migration()

        logger.info("Data migration completed successfully")

    async def _migrate_ohlcv_data(self):
        """Migrate OHLCV data from Parquet files"""
        parquet_files = list(self.cache_dir.glob("*.parquet"))

        for parquet_file in parquet_files:
            try:
                # Parse filename: ETHUSDT_1m.parquet
                filename = parquet_file.stem
                parts = filename.split('_')
                if len(parts) != 2:
                    continue

                symbol_raw = parts[0]
                timeframe = parts[1]

                # Convert symbol format
                symbol = self._convert_symbol_format(symbol_raw)

                # Read Parquet file
                df = pd.read_parquet(parquet_file)

                # Migrate data in batches
                await self._migrate_ohlcv_batch(symbol, timeframe, df)

                logger.info(f"Migrated {len(df)} rows from {parquet_file.name}")
            except Exception as e:
                logger.error(f"Error migrating {parquet_file}: {e}")

    async def _migrate_ohlcv_batch(self, symbol: str, timeframe: str, df: pd.DataFrame):
        """Migrate a batch of OHLCV data"""
        # Prepare data for insertion
        values = []
        for idx, row in df.iterrows():
            values.append((
                row['timestamp'],
                symbol,
                timeframe,
                row['open'],
                row['high'],
                row['low'],
                row['close'],
                row['volume'],
                row.get('trade_count', 0)
            ))

        # Batch insert
        await self.db.executemany(
            """
            INSERT INTO ohlcv_data
            (timestamp, symbol, timeframe, open_price, high_price,
             low_price, close_price, volume, trade_count)
            VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
            ON CONFLICT (timestamp, symbol, timeframe) DO NOTHING
            """,
            values
        )
```
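`_convert_symbol_format()` is referenced above but not defined. A minimal sketch follows, assuming filenames use the exchange-style concatenated symbol (`ETHUSDT`) and the database uses the slash style (`ETH/USDT`); the set of quote currencies is an assumption about the data in the cache directory.

```python
# Hypothetical helper: convert a filename-style symbol (ETHUSDT) to the
# slash style used in the database (ETH/USDT). The quote-currency list is an
# assumption, not taken from the existing codebase.
def _convert_symbol_format(symbol_raw: str) -> str:
    for quote in ('USDT', 'USDC', 'BUSD', 'USD'):
        if symbol_raw.endswith(quote) and len(symbol_raw) > len(quote):
            return f"{symbol_raw[:-len(quote)]}/{quote}"
    # Fall back to the raw value if no known quote currency matches.
    return symbol_raw
```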
## Error Handling
### Data Validation
```python
class DataValidator:
    """Validates all incoming data before storage"""

    @staticmethod
    def validate_ohlcv(candle: Dict) -> bool:
        """Validate OHLCV candle data"""
        try:
            # Check required fields
            required = ['timestamp', 'open', 'high', 'low', 'close', 'volume']
            if not all(field in candle for field in required):
                return False

            # Validate OHLC relationships
            if candle['high'] < candle['low']:
                logger.warning("Invalid OHLCV: high < low")
                return False

            if candle['high'] < candle['open'] or candle['high'] < candle['close']:
                logger.warning("Invalid OHLCV: high < open/close")
                return False

            if candle['low'] > candle['open'] or candle['low'] > candle['close']:
                logger.warning("Invalid OHLCV: low > open/close")
                return False

            # Validate non-negative volume
            if candle['volume'] < 0:
                logger.warning("Invalid OHLCV: negative volume")
                return False

            return True
        except Exception as e:
            logger.error(f"Error validating OHLCV: {e}")
            return False

    @staticmethod
    def validate_orderbook(orderbook: Dict) -> bool:
        """Validate order book data"""
        try:
            # Check required fields
            if 'bids' not in orderbook or 'asks' not in orderbook:
                return False

            # Validate bid/ask relationship
            if orderbook['bids'] and orderbook['asks']:
                best_bid = max(bid[0] for bid in orderbook['bids'])
                best_ask = min(ask[0] for ask in orderbook['asks'])
                if best_bid >= best_ask:
                    logger.warning("Invalid orderbook: bid >= ask")
                    return False

            return True
        except Exception as e:
            logger.error(f"Error validating orderbook: {e}")
            return False
```
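Requirement 10.1 also calls for validating that timestamps are in UTC, which the validator above does not yet cover. A minimal sketch follows, assuming timestamps arrive as timezone-aware `datetime` objects; naive datetimes are rejected as ambiguous.

```python
# Sketch of the UTC timestamp check from Requirement 10.1. Assumes timestamps
# arrive as timezone-aware datetime objects.
from datetime import datetime, timedelta


def validate_timestamp(ts: datetime) -> bool:
    """Accept only timezone-aware timestamps expressed in UTC."""
    if not isinstance(ts, datetime) or ts.tzinfo is None:
        return False  # naive timestamps are ambiguous and rejected
    return ts.utcoffset() == timedelta(0)
```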
### Retry Logic
```python
class RetryableDBOperation:
    """Wrapper for database operations with retry logic"""

    @staticmethod
    async def execute_with_retry(
        operation: Callable,
        max_retries: int = 3,
        backoff_seconds: float = 1.0
    ):
        """Execute database operation with exponential backoff retry"""
        for attempt in range(max_retries):
            try:
                return await operation()
            except Exception as e:
                if attempt == max_retries - 1:
                    logger.error(f"Operation failed after {max_retries} attempts: {e}")
                    raise
                wait_time = backoff_seconds * (2 ** attempt)
                logger.warning(
                    f"Operation failed (attempt {attempt + 1}), retrying in {wait_time}s: {e}"
                )
                await asyncio.sleep(wait_time)
```
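Usage is straightforward: the wrapper expects a zero-argument coroutine function, so an existing method can be passed directly. The example below wraps the ingestion pipeline's flush method; it is a usage sketch, not code from the repository.

```python
# Usage sketch: retry a batch flush with exponential backoff (1s, 2s, 4s).
async def flush_with_retry(pipeline) -> None:
    await RetryableDBOperation.execute_with_retry(
        pipeline._flush_ohlcv_buffer,   # zero-arg coroutine function, called per attempt
        max_retries=3,
        backoff_seconds=1.0,
    )
```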
## Testing Strategy
### Unit Tests
1. **Data Validation Tests**
- Test OHLCV validation logic
- Test order book validation logic
- Test timestamp validation and timezone handling
2. **Cache Manager Tests**
- Test cache insertion and retrieval
- Test cache eviction logic
- Test cache hit/miss statistics
3. **Data Model Tests**
- Test InferenceDataFrame creation
- Test OrderBookDataFrame creation
- Test data serialization/deserialization
### Integration Tests
1. **Database Integration Tests**
- Test TimescaleDB connection and queries
- Test batch insert operations
- Test continuous aggregates
- Test compression and retention policies
2. **End-to-End Data Flow Tests**
- Test real-time data ingestion → cache → database
- Test historical data retrieval from database
- Test multi-timeframe data alignment
3. **Migration Tests**
- Test Parquet file migration
- Test data integrity after migration
- Test rollback capability
### Performance Tests
1. **Latency Tests**
- Cache read latency (<10ms target)
- Database query latency (<100ms target)
- Batch write throughput (>1000 ops/sec target)
2. **Load Tests**
- Concurrent read/write operations
- High-frequency data ingestion
- Large time-range queries
3. **Storage Tests**
- Compression ratio validation (>80% target)
- Storage growth over time
- Query performance with compressed data
## Performance Optimization
### Query Optimization
```sql
-- Use time_bucket for efficient time-range queries
SELECT
time_bucket('1 minute', timestamp) AS bucket,
symbol,
first(close_price, timestamp) AS price
FROM ohlcv_data
WHERE symbol = 'ETH/USDT'
AND timeframe = '1s'
AND timestamp >= NOW() - INTERVAL '1 hour'
GROUP BY bucket, symbol
ORDER BY bucket DESC;
-- Use indexes for symbol-based queries
CREATE INDEX CONCURRENTLY idx_ohlcv_symbol_tf_ts
ON ohlcv_data (symbol, timeframe, timestamp DESC);
```
### Caching Strategy
1. **Hot Data**: Last 5 minutes in memory (all symbols, all timeframes)
2. **Warm Data**: Last 1 hour in TimescaleDB uncompressed
3. **Cold Data**: Older than 1 hour in TimescaleDB compressed
### Batch Operations
- Batch size: 100 records or 5 seconds (whichever comes first)
- Use `executemany()` for bulk inserts
- Use `COPY` command for large migrations (see the sketch below)
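For the one-off backfill, a COPY-based load avoids per-row overhead. The sketch below uses asyncpg's `copy_records_to_table()`, which streams rows over the COPY protocol; note that COPY cannot handle conflicts, so it suits an empty target table or a staging table that is merged afterwards. The chunk size is an arbitrary assumption.

```python
# Sketch of a COPY-based bulk load for migrations, assuming an asyncpg pool.
async def copy_ohlcv_rows(pool, rows, chunk_size: int = 10_000) -> None:
    """rows: list of 9-tuples matching the ohlcv_data column order below."""
    columns = [
        'timestamp', 'symbol', 'timeframe',
        'open_price', 'high_price', 'low_price', 'close_price',
        'volume', 'trade_count',
    ]
    async with pool.acquire() as conn:
        for start in range(0, len(rows), chunk_size):
            # COPY streams each chunk in one round trip; no ON CONFLICT handling.
            await conn.copy_records_to_table(
                'ohlcv_data',
                records=rows[start:start + chunk_size],
                columns=columns,
            )
```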
## Deployment Considerations
### Database Setup
1. Install TimescaleDB extension
2. Run schema creation scripts
3. Create hypertables and indexes
4. Set up continuous aggregates
5. Configure compression and retention policies
### Migration Process
1. **Phase 1**: Deploy new code with dual-write (Parquet + TimescaleDB); see the sketch after this list
2. **Phase 2**: Run migration script to backfill historical data
3. **Phase 3**: Verify data integrity
4. **Phase 4**: Switch reads to TimescaleDB
5. **Phase 5**: Deprecate Parquet writes
6. **Phase 6**: Archive old Parquet files
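The Phase 1 dual-write step can be as simple as a thin wrapper that keeps the existing Parquet writer authoritative while also feeding the new ingestion pipeline. The sketch below illustrates this under assumed names; the flag and the Parquet writer interface are hypothetical.

```python
# Sketch of the Phase 1 dual-write wrapper. parquet_writer.append() and the
# timescale_enabled flag are assumptions for illustration.
class DualWriteCandleSink:
    def __init__(self, parquet_writer, ingestion_pipeline, timescale_enabled: bool = True):
        self.parquet_writer = parquet_writer
        self.pipeline = ingestion_pipeline
        self.timescale_enabled = timescale_enabled

    async def write_candle(self, symbol: str, timeframe: str, candle: dict) -> None:
        # Existing path stays authoritative until Phase 4 switches reads over.
        self.parquet_writer.append(symbol, timeframe, candle)
        if self.timescale_enabled:
            await self.pipeline.ingest_ohlcv_candle(symbol, timeframe, candle)
```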
### Monitoring
1. **Database Metrics**
- Query latency (p50, p95, p99)
- Write throughput
- Storage size and compression ratio
- Connection pool utilization
2. **Cache Metrics**
- Hit/miss ratio
- Cache size
- Eviction rate
3. **Application Metrics**
- Data retrieval latency
- Error rates
- Data validation failures
## Security Considerations
1. **Database Access**
- Use connection pooling with proper credentials
- Implement read-only users for query-only operations
- Use SSL/TLS for database connections
2. **Data Validation**
- Validate all incoming data before storage
- Sanitize inputs to prevent SQL injection
- Implement rate limiting for API endpoints
3. **Backup and Recovery**
- Regular database backups (daily)
- Point-in-time recovery capability
- Disaster recovery plan
## Future Enhancements
1. **Multi-Exchange Support**
- Store data from multiple exchanges
- Cross-exchange arbitrage analysis
- Exchange-specific data normalization
2. **Advanced Analytics**
- Real-time pattern detection
- Anomaly detection
- Predictive analytics
3. **Distributed Storage**
- Horizontal scaling with TimescaleDB clustering
- Read replicas for query load distribution
- Geographic distribution for low-latency access

@@ -0,0 +1,134 @@
# Requirements Document
## Introduction
This feature aims to unify all data storage and retrieval methods across the trading system into a single, coherent interface. Currently, the system uses multiple storage approaches (Parquet files, pickle files, in-memory caches, TimescaleDB) and has fragmented data access patterns. This creates complexity, inconsistency, and performance issues.
The unified data storage system will provide a single endpoint for retrieving inference data, supporting both real-time streaming data and historical backtesting/annotation scenarios. It will consolidate storage methods into the most efficient approach and ensure all components use consistent data access patterns.
## Requirements
### Requirement 1: Unified Data Retrieval Interface
**User Story:** As a developer, I want a single method to retrieve inference data regardless of whether I need real-time or historical data, so that I can simplify my code and ensure consistency.
#### Acceptance Criteria
1. WHEN a component requests inference data THEN the system SHALL provide a unified `get_inference_data()` method that accepts a timestamp parameter
2. WHEN timestamp is None or "latest" THEN the system SHALL return the most recent cached real-time data
3. WHEN timestamp is a specific datetime THEN the system SHALL return historical data from local storage at that timestamp
4. WHEN requesting inference data THEN the system SHALL return data in a standardized format with all required features (OHLCV, technical indicators, COB data, order book imbalances)
5. WHEN the requested timestamp is not available THEN the system SHALL return the nearest available data point with a warning
### Requirement 2: Consolidated Storage Backend
**User Story:** As a system architect, I want all market data stored using a single, optimized storage method, so that I can reduce complexity and improve performance.
#### Acceptance Criteria
1. WHEN storing candlestick data THEN the system SHALL use TimescaleDB as the primary storage backend
2. WHEN storing raw order book ticks THEN the system SHALL use TimescaleDB with appropriate compression
3. WHEN storing aggregated 1s/1m data THEN the system SHALL use TimescaleDB hypertables for efficient time-series queries
4. WHEN the system starts THEN it SHALL migrate existing Parquet and pickle files to TimescaleDB
5. WHEN data is written THEN the system SHALL ensure atomic writes with proper error handling
6. WHEN querying data THEN the system SHALL leverage TimescaleDB's time-series optimizations for fast retrieval
### Requirement 3: Multi-Timeframe Data Storage
**User Story:** As a trading model, I need access to multiple timeframes (1s, 1m, 5m, 15m, 1h, 1d) of candlestick data, so that I can perform multi-timeframe analysis.
#### Acceptance Criteria
1. WHEN storing candlestick data THEN the system SHALL store all configured timeframes (1s, 1m, 5m, 15m, 1h, 1d)
2. WHEN aggregating data THEN the system SHALL use TimescaleDB continuous aggregates to automatically generate higher timeframes from 1s data
3. WHEN requesting multi-timeframe data THEN the system SHALL return aligned timestamps across all timeframes
4. WHEN a timeframe is missing data THEN the system SHALL generate it from lower timeframes if available
5. WHEN storing timeframe data THEN the system SHALL maintain at least 1500 candles per timeframe for each symbol
### Requirement 4: Raw Order Book and Trade Data Storage
**User Story:** As a machine learning model, I need access to raw 1s and 1m aggregated order book and trade book data, so that I can analyze market microstructure.
#### Acceptance Criteria
1. WHEN receiving order book updates THEN the system SHALL store raw ticks in TimescaleDB with full bid/ask depth
2. WHEN aggregating order book data THEN the system SHALL create 1s aggregations with $1 price buckets
3. WHEN aggregating order book data THEN the system SHALL create 1m aggregations with $10 price buckets
4. WHEN storing trade data THEN the system SHALL store individual trades with price, size, side, and timestamp
5. WHEN storing order book data THEN the system SHALL maintain 30 minutes of raw data and 24 hours of aggregated data
6. WHEN querying order book data THEN the system SHALL provide efficient access to imbalance metrics across multiple timeframes (1s, 5s, 15s, 60s)
### Requirement 5: Real-Time Data Caching
**User Story:** As a real-time trading system, I need low-latency access to the latest market data, so that I can make timely trading decisions.
#### Acceptance Criteria
1. WHEN receiving real-time data THEN the system SHALL maintain an in-memory cache of the last 5 minutes of data
2. WHEN requesting latest data THEN the system SHALL serve from cache with <10ms latency
3. WHEN cache is updated THEN the system SHALL asynchronously persist to TimescaleDB without blocking
4. WHEN cache reaches capacity THEN the system SHALL evict oldest data while maintaining continuity
5. WHEN system restarts THEN the system SHALL rebuild cache from TimescaleDB automatically
### Requirement 6: Historical Data Access for Backtesting
**User Story:** As a backtesting system, I need efficient access to historical data at any timestamp, so that I can simulate trading strategies accurately.
#### Acceptance Criteria
1. WHEN requesting historical data THEN the system SHALL query TimescaleDB with timestamp-based indexing
2. WHEN requesting a time range THEN the system SHALL return all data points within that range efficiently
3. WHEN requesting data with context window THEN the system SHALL return ±N minutes of surrounding data
4. WHEN backtesting THEN the system SHALL support sequential data access without loading entire dataset into memory
5. WHEN querying historical data THEN the system SHALL return results in <100ms for typical queries (single timestamp, single symbol)
### Requirement 7: Data Annotation Support
**User Story:** As a data annotator, I need to retrieve historical market data at specific timestamps to manually label trading signals, so that I can create training datasets.
#### Acceptance Criteria
1. WHEN annotating data THEN the system SHALL provide the same `get_inference_data()` interface with timestamp parameter
2. WHEN retrieving annotation data THEN the system SHALL include ±5 minutes of context data
3. WHEN loading annotation sessions THEN the system SHALL support efficient random access to any timestamp
4. WHEN displaying charts THEN the system SHALL provide multi-timeframe data aligned to the annotation timestamp
5. WHEN saving annotations THEN the system SHALL link annotations to exact timestamps in the database
### Requirement 8: Data Migration and Backward Compatibility
**User Story:** As a system administrator, I want existing data migrated to the new storage system without data loss, so that I can maintain historical continuity.
#### Acceptance Criteria
1. WHEN migration starts THEN the system SHALL detect existing Parquet files in cache directory
2. WHEN migrating Parquet data THEN the system SHALL import all data into TimescaleDB with proper timestamps
3. WHEN migration completes THEN the system SHALL verify data integrity by comparing record counts
4. WHEN migration fails THEN the system SHALL rollback changes and preserve original files
5. WHEN migration succeeds THEN the system SHALL optionally archive old Parquet files
6. WHEN accessing data during migration THEN the system SHALL continue serving from existing storage
### Requirement 9: Performance and Scalability
**User Story:** As a system operator, I need the data storage system to handle high-frequency data ingestion and queries efficiently, so that the system remains responsive under load.
#### Acceptance Criteria
1. WHEN ingesting real-time data THEN the system SHALL handle at least 1000 updates per second per symbol
2. WHEN querying data THEN the system SHALL return single-timestamp queries in <100ms
3. WHEN querying time ranges THEN the system SHALL return 1 hour of 1s data in <500ms
4. WHEN storing data THEN the system SHALL use batch writes to optimize database performance
5. WHEN database grows THEN the system SHALL use TimescaleDB compression to reduce storage size by 80%+
6. WHEN running multiple queries THEN the system SHALL support concurrent access without performance degradation
### Requirement 10: Data Consistency and Validation
**User Story:** As a trading system, I need to ensure all data is consistent and validated, so that models receive accurate information.
#### Acceptance Criteria
1. WHEN storing data THEN the system SHALL validate timestamps are in UTC timezone
2. WHEN storing OHLCV data THEN the system SHALL validate high >= low and high >= open/close
3. WHEN storing order book data THEN the system SHALL validate bids < asks
4. WHEN detecting invalid data THEN the system SHALL log warnings and reject the data point
5. WHEN querying data THEN the system SHALL ensure all timeframes are properly aligned
6. WHEN data gaps exist THEN the system SHALL identify and log missing periods

@@ -0,0 +1,286 @@
# Implementation Plan
- [x] 1. Set up TimescaleDB schema and infrastructure
- Create database schema with hypertables for OHLCV, order book, and trade data
- Implement continuous aggregates for multi-timeframe data generation
- Configure compression and retention policies
- Create all necessary indexes for query optimization
- _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.1, 3.2, 3.3, 3.4, 3.5, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6_
- [ ] 2. Implement data models and validation
- [ ] 2.1 Create InferenceDataFrame and OrderBookDataFrame data classes
- Write dataclasses for standardized data structures
- Include all required fields (OHLCV, order book, imbalances, indicators)
- Add serialization/deserialization methods
- _Requirements: 1.4, 10.1, 10.2, 10.3_
- [ ] 2.2 Implement DataValidator class
- Write OHLCV validation logic (high >= low, positive volume)
- Write order book validation logic (bids < asks)
- Write timestamp validation and UTC timezone enforcement
- Add comprehensive error logging for validation failures
- _Requirements: 10.1, 10.2, 10.3, 10.4_
- [ ]* 2.3 Write unit tests for data models and validation
- Test InferenceDataFrame creation and serialization
- Test OrderBookDataFrame creation and serialization
- Test DataValidator with valid and invalid data
- Test edge cases and boundary conditions
- _Requirements: 10.1, 10.2, 10.3, 10.4_
- [ ] 3. Implement cache layer
- [ ] 3.1 Create DataCacheManager class
- Implement in-memory cache with deque structures
- Add methods for OHLCV, order book, and imbalance data
- Implement cache eviction logic (5-minute rolling window)
- Add cache statistics tracking (hits, misses)
- _Requirements: 5.1, 5.2, 5.3, 5.4_
- [ ] 3.2 Implement cache retrieval methods
- Write get_latest_ohlcv() with timeframe support
- Write get_latest_orderbook() for current snapshot
- Write get_latest_imbalances() for multi-timeframe metrics
- Ensure <10ms latency for cache reads
- _Requirements: 5.1, 5.2_
- [ ]* 3.3 Write unit tests for cache layer
- Test cache insertion and retrieval
- Test cache eviction logic
- Test cache statistics
- Test concurrent access patterns
- _Requirements: 5.1, 5.2, 5.3, 5.4_
- [ ] 4. Implement database connection and query layer
- [ ] 4.1 Create DatabaseConnectionManager class
- Implement asyncpg connection pool management
- Add health monitoring and automatic reconnection
- Configure connection pool settings (min/max connections)
- Add connection statistics and logging
- _Requirements: 2.1, 2.5, 9.6_
- [ ] 4.2 Implement OHLCV query methods
- Write query_ohlcv_data() for single timeframe retrieval
- Write query_multi_timeframe_ohlcv() for aligned multi-timeframe data
- Optimize queries with time_bucket and proper indexes
- Ensure <100ms query latency for typical queries
- _Requirements: 3.1, 3.2, 3.3, 3.4, 6.1, 6.2, 6.5, 9.2, 9.3_
- [ ] 4.3 Implement order book query methods
- Write query_orderbook_snapshots() for raw order book data
- Write query_orderbook_aggregated() for 1s/1m aggregations
- Write query_orderbook_imbalances() for multi-timeframe imbalances
- Optimize queries for fast retrieval
- _Requirements: 4.1, 4.2, 4.3, 4.6, 6.1, 6.2, 6.5_
- [ ]* 4.4 Write integration tests for database layer
- Test connection pool management
- Test OHLCV queries with various time ranges
- Test order book queries
- Test query performance and latency
- _Requirements: 6.1, 6.2, 6.5, 9.2, 9.3_
- [ ] 5. Implement data ingestion pipeline
- [ ] 5.1 Create DataIngestionPipeline class
- Implement batch write buffers for OHLCV, order book, and trade data
- Add batch size and timeout configuration
- Implement async batch flush methods
- Add error handling and retry logic
- _Requirements: 2.5, 5.3, 9.1, 9.4_
- [ ] 5.2 Implement OHLCV ingestion
- Write ingest_ohlcv_candle() method
- Add immediate cache write
- Implement batch buffering for database writes
- Add data validation before ingestion
- _Requirements: 2.1, 2.2, 2.5, 5.1, 5.3, 9.1, 9.4, 10.1, 10.2_
- [ ] 5.3 Implement order book ingestion
- Write ingest_orderbook_snapshot() method
- Calculate and cache imbalance metrics
- Implement batch buffering for database writes
- Add data validation before ingestion
- _Requirements: 2.1, 2.2, 4.1, 4.2, 4.3, 5.1, 5.3, 9.1, 9.4, 10.3_
- [ ] 5.4 Implement retry logic and error handling
- Create RetryableDBOperation wrapper class
- Implement exponential backoff retry strategy
- Add comprehensive error logging
- Handle database connection failures gracefully
- _Requirements: 2.5, 9.6_
- [ ]* 5.5 Write integration tests for ingestion pipeline
- Test OHLCV ingestion flow (cache → database)
- Test order book ingestion flow
- Test batch write operations
- Test error handling and retry logic
- _Requirements: 2.5, 5.3, 9.1, 9.4_
- [ ] 6. Implement unified data provider API
- [ ] 6.1 Create UnifiedDataProvider class
- Initialize with database connection pool and cache manager
- Configure symbols and timeframes
- Add connection to existing DataProvider components
- _Requirements: 1.1, 1.2, 1.3_
- [ ] 6.2 Implement get_inference_data() method
- Handle timestamp=None for real-time data from cache
- Handle specific timestamp for historical data from database
- Implement context window retrieval (±N minutes)
- Combine OHLCV, order book, and imbalance data
- Return standardized InferenceDataFrame
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 5.2, 6.1, 6.2, 6.3, 6.4, 7.1, 7.2, 7.3_
- [ ] 6.3 Implement get_multi_timeframe_data() method
- Query multiple timeframes efficiently
- Align timestamps across timeframes
- Handle missing data by generating from lower timeframes
- Return dictionary mapping timeframe to DataFrame
- _Requirements: 3.1, 3.2, 3.3, 3.4, 6.1, 6.2, 6.3, 10.5_
- [ ] 6.4 Implement get_order_book_data() method
- Handle different aggregation levels (raw, 1s, 1m)
- Include multi-timeframe imbalance metrics
- Return standardized OrderBookDataFrame
- _Requirements: 4.1, 4.2, 4.3, 4.6, 6.1, 6.2_
- [ ]* 6.5 Write integration tests for unified API
- Test get_inference_data() with real-time and historical data
- Test get_multi_timeframe_data() with various timeframes
- Test get_order_book_data() with different aggregations
- Test context window retrieval
- Test data consistency across methods
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 6.1, 6.2, 6.3, 6.4, 10.5, 10.6_
- [ ] 7. Implement data migration system
- [ ] 7.1 Create DataMigrationManager class
- Initialize with database connection and cache directory path
- Add methods for discovering existing Parquet files
- Implement symbol format conversion utilities
- _Requirements: 8.1, 8.2, 8.6_
- [ ] 7.2 Implement Parquet file migration
- Write _migrate_ohlcv_data() to process all Parquet files
- Parse filenames to extract symbol and timeframe
- Read Parquet files and convert to database format
- Implement batch insertion with conflict handling
- _Requirements: 8.1, 8.2, 8.3, 8.5_
- [ ] 7.3 Implement migration verification
- Write _verify_migration() to compare record counts
- Check data integrity (no missing timestamps)
- Validate data ranges match original files
- Generate migration report
- _Requirements: 8.3, 8.4_
- [ ] 7.4 Implement rollback capability
- Add transaction support for migration operations
- Implement rollback on verification failure
- Preserve original Parquet files until verification passes
- Add option to archive old files after successful migration
- _Requirements: 8.4, 8.5_
- [ ]* 7.5 Write integration tests for migration
- Test Parquet file discovery and parsing
- Test data migration with sample files
- Test verification logic
- Test rollback on failure
- _Requirements: 8.1, 8.2, 8.3, 8.4_
- [ ] 8. Integrate with existing DataProvider
- [ ] 8.1 Update DataProvider class to use UnifiedDataProvider
- Replace existing data retrieval methods with unified API calls
- Update get_data() method to use get_inference_data()
- Update multi-timeframe methods to use get_multi_timeframe_data()
- Maintain backward compatibility with existing interfaces
- _Requirements: 1.1, 1.2, 1.3, 8.6_
- [ ] 8.2 Update real-time data flow
- Connect WebSocket data to DataIngestionPipeline
- Update tick aggregator to write to cache and database
- Update COB integration to use new ingestion methods
- Ensure no data loss during transition
- _Requirements: 2.1, 2.2, 5.1, 5.3, 8.6_
- [ ] 8.3 Update annotation system integration
- Update ANNOTATE/core/data_loader.py to use unified API
- Ensure annotation system uses get_inference_data() with timestamps
- Test annotation workflow with new data provider
- _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5_
- [ ] 8.4 Update backtesting system integration
- Update backtesting data access to use unified API
- Ensure sequential data access works efficiently
- Test backtesting performance with new data provider
- _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_
- [ ]* 8.5 Write end-to-end integration tests
- Test complete data flow: WebSocket ingestion → cache → database → retrieval
- Test annotation system with unified data provider
- Test backtesting system with unified data provider
- Test real-time trading with unified data provider
- _Requirements: 1.1, 1.2, 1.3, 6.1, 6.2, 7.1, 8.6_
- [ ] 9. Performance optimization and monitoring
- [ ] 9.1 Implement performance monitoring
- Add latency tracking for cache reads (<10ms target)
- Add latency tracking for database queries (<100ms target)
- Add throughput monitoring for ingestion (>1000 ops/sec target)
- Create performance dashboard or logging
- _Requirements: 5.2, 6.5, 9.1, 9.2, 9.3_
- [ ] 9.2 Optimize database queries
- Analyze query execution plans
- Add missing indexes if needed
- Optimize time_bucket usage
- Implement query result caching where appropriate
- _Requirements: 6.5, 9.2, 9.3, 9.6_
- [ ] 9.3 Implement compression and retention
- Verify compression policies are working (>80% compression target)
- Monitor storage growth over time
- Verify retention policies are cleaning old data
- Add alerts for storage issues
- _Requirements: 2.6, 9.5_
- [ ]* 9.4 Write performance tests
- Test cache read latency under load
- Test database query latency with various time ranges
- Test ingestion throughput with high-frequency data
- Test concurrent access patterns
- _Requirements: 5.2, 6.5, 9.1, 9.2, 9.3, 9.6_
- [ ] 10. Documentation and deployment
- [ ] 10.1 Create deployment documentation
- Document TimescaleDB setup and configuration
- Document migration process and steps
- Document rollback procedures
- Create troubleshooting guide
- _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6_
- [ ] 10.2 Create API documentation
- Document UnifiedDataProvider API methods
- Provide usage examples for each method
- Document data models and structures
- Create migration guide for existing code
- _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
- [ ] 10.3 Create monitoring and alerting setup
- Document key metrics to monitor
- Set up alerts for performance degradation
- Set up alerts for data validation failures
- Create operational runbook
- _Requirements: 9.1, 9.2, 9.3, 9.5, 9.6, 10.4_
- [ ] 10.4 Execute phased deployment
- Phase 1: Deploy with dual-write (Parquet + TimescaleDB)
- Phase 2: Run migration script for historical data
- Phase 3: Verify data integrity
- Phase 4: Switch reads to TimescaleDB
- Phase 5: Deprecate Parquet writes
- Phase 6: Archive old Parquet files
- _Requirements: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6_