# Data Provider Quick Reference Guide

## Overview

Quick reference for using the multi-layered data provider system in the multi-modal trading system.

## Architecture Layers

```
COBY System → Core DataProvider → StandardizedDataProvider → Models
```

## Getting Started

### Basic Usage

```python
from core.standardized_data_provider import StandardizedDataProvider

# Initialize provider
provider = StandardizedDataProvider(
    symbols=['ETH/USDT', 'BTC/USDT'],
    timeframes=['1s', '1m', '1h', '1d']
)

# Start real-time processing
provider.start_real_time_processing()

# Get standardized input for models
base_input = provider.get_base_data_input('ETH/USDT')

# Validate data quality
if base_input and base_input.validate():
    # Use data for model inference
    pass
```

## BaseDataInput Structure

```python
@dataclass
class BaseDataInput:
    symbol: str                                    # 'ETH/USDT'
    timestamp: datetime                            # Current time
    
    # OHLCV Data (300 frames each)
    ohlcv_1s: List[OHLCVBar]                      # 1-second bars
    ohlcv_1m: List[OHLCVBar]                      # 1-minute bars
    ohlcv_1h: List[OHLCVBar]                      # 1-hour bars
    ohlcv_1d: List[OHLCVBar]                      # 1-day bars
    btc_ohlcv_1s: List[OHLCVBar]                  # BTC reference
    
    # COB Data
    cob_data: Optional[COBData]                    # Order book data
    
    # Technical Analysis
    technical_indicators: Dict[str, float]         # RSI, MACD, etc.
    pivot_points: List[PivotPoint]                 # Williams pivots
    
    # Cross-Model Feeding
    last_predictions: Dict[str, ModelOutput]       # Other model outputs
    
    # Market Microstructure
    market_microstructure: Dict[str, Any]          # Order flow, etc.
```

## Common Operations

### Get Current Price

```python
# Multiple fallback methods
price = provider.get_current_price('ETH/USDT')

# Direct API call with cache
price = provider.get_live_price_from_api('ETH/USDT')
```

### Get Historical Data

```python
# Get OHLCV data
df = provider.get_historical_data(
    symbol='ETH/USDT',
    timeframe='1h',
    limit=300
)
```

### Get COB Data

```python
# Get latest COB snapshot
cob_data = provider.get_latest_cob_data('ETH/USDT')

# Get COB imbalance metrics
imbalance = provider.get_current_cob_imbalance('ETH/USDT')
```

### Get Pivot Points

```python
# Get Williams Market Structure pivots
pivots = provider.calculate_williams_pivot_points('ETH/USDT')
```

### Store Model Output

```python
from core.data_models import ModelOutput

# Create model output
output = ModelOutput(
    model_type='cnn',
    model_name='williams_cnn_v2',
    symbol='ETH/USDT',
    timestamp=datetime.now(),
    confidence=0.85,
    predictions={
        'action': 'BUY',
        'action_confidence': 0.85,
        'direction_vector': 0.7
    },
    hidden_states={'conv_features': tensor(...)},
    metadata={'version': '2.1'}
)

# Store for cross-model feeding
provider.store_model_output(output)
```

### Get Model Outputs

```python
# Get all model outputs for a symbol
outputs = provider.get_model_outputs('ETH/USDT')

# Access specific model output
cnn_output = outputs.get('williams_cnn_v2')
```

## Data Validation

### Validate BaseDataInput

```python
base_input = provider.get_base_data_input('ETH/USDT')

if base_input:
    # Check validation
    is_valid = base_input.validate()
    
    # Check data completeness
    if len(base_input.ohlcv_1s) >= 100:
        # Sufficient data for inference
        pass
```

### Check Data Quality

```python
# Get data completeness metrics
if base_input:
    ohlcv_complete = all([
        len(base_input.ohlcv_1s) >= 100,
        len(base_input.ohlcv_1m) >= 100,
        len(base_input.ohlcv_1h) >= 100,
        len(base_input.ohlcv_1d) >= 100
    ])
    
    cob_complete = base_input.cob_data is not None
    
    # Overall quality score (implement in Task 2.3)
    # quality_score = base_input.data_quality_score()
```

## COB Data Access

### COB Data Structure

```python
@dataclass
class COBData:
    symbol: str
    timestamp: datetime
    current_price: float
    bucket_size: float                             # $1 ETH, $10 BTC
    
    # Price Buckets (±20 around current price)
    price_buckets: Dict[float, Dict[str, float]]   # {price: {bid_vol, ask_vol}}
    bid_ask_imbalance: Dict[float, float]          # {price: imbalance}
    
    # Moving Averages (±5 buckets)
    ma_1s_imbalance: Dict[float, float]
    ma_5s_imbalance: Dict[float, float]
    ma_15s_imbalance: Dict[float, float]
    ma_60s_imbalance: Dict[float, float]
    
    # Order Flow
    order_flow_metrics: Dict[str, float]
```

### Access COB Buckets

```python
if base_input.cob_data:
    cob = base_input.cob_data
    
    # Get current price
    current_price = cob.current_price
    
    # Get bid/ask volumes for specific price
    price_level = current_price + cob.bucket_size  # One bucket up
    if price_level in cob.price_buckets:
        bucket = cob.price_buckets[price_level]
        bid_volume = bucket.get('bid_volume', 0)
        ask_volume = bucket.get('ask_volume', 0)
    
    # Get imbalance for price level
    imbalance = cob.bid_ask_imbalance.get(price_level, 0)
    
    # Get moving averages
    ma_1s = cob.ma_1s_imbalance.get(price_level, 0)
    ma_5s = cob.ma_5s_imbalance.get(price_level, 0)
```

## Subscriber Pattern

### Subscribe to Data Updates

```python
def my_data_callback(tick):
    """Handle real-time tick data"""
    print(f"Received tick: {tick.symbol} @ {tick.price}")

# Subscribe to data updates
subscriber_id = provider.subscribe_to_data(
    callback=my_data_callback,
    symbols=['ETH/USDT'],
    subscriber_name='my_model'
)

# Unsubscribe when done
provider.unsubscribe_from_data(subscriber_id)
```

## Configuration

### Key Configuration Options

```yaml
# config.yaml
data_provider:
  symbols:
    - ETH/USDT
    - BTC/USDT
  
  timeframes:
    - 1s
    - 1m
    - 1h
    - 1d
  
  cache:
    enabled: true
    candles_per_timeframe: 1500
  
  cob:
    enabled: true
    bucket_sizes:
      ETH/USDT: 1.0    # $1 buckets
      BTC/USDT: 10.0   # $10 buckets
    price_ranges:
      ETH/USDT: 5.0    # ±$5 for imbalance
      BTC/USDT: 50.0   # ±$50 for imbalance
    
  websocket:
    update_speed: 100ms
    max_depth: 1000
    reconnect_delay: 1.0
    max_reconnect_delay: 60.0
```

## Performance Tips

### Optimize Data Access

```python
# Cache BaseDataInput for multiple models
base_input = provider.get_base_data_input('ETH/USDT')

# Use cached data for all models
cnn_input = base_input  # CNN uses full data
rl_input = base_input   # RL uses full data + CNN outputs

# Avoid repeated calls
# BAD: base_input = provider.get_base_data_input('ETH/USDT')  # Called multiple times
# GOOD: Cache and reuse
```

### Monitor Performance

```python
# Check subscriber statistics
stats = provider.distribution_stats

print(f"Total ticks received: {stats['total_ticks_received']}")
print(f"Total ticks distributed: {stats['total_ticks_distributed']}")
print(f"Distribution errors: {stats['distribution_errors']}")
```

## Troubleshooting

### Common Issues

#### 1. No Data Available

```python
base_input = provider.get_base_data_input('ETH/USDT')

if base_input is None:
    # Check if data provider is started
    if not provider.data_maintenance_active:
        provider.start_automatic_data_maintenance()
    
    # Check if COB collection is started
    if not provider.cob_collection_active:
        provider.start_cob_collection()
```

#### 2. Incomplete Data

```python
if base_input:
    # Check frame counts
    print(f"1s frames: {len(base_input.ohlcv_1s)}")
    print(f"1m frames: {len(base_input.ohlcv_1m)}")
    print(f"1h frames: {len(base_input.ohlcv_1h)}")
    print(f"1d frames: {len(base_input.ohlcv_1d)}")
    
    # Wait for data to accumulate
    if len(base_input.ohlcv_1s) < 100:
        print("Waiting for more data...")
        time.sleep(60)  # Wait 1 minute
```

#### 3. COB Data Missing

```python
if base_input and base_input.cob_data is None:
    # Check COB collection status
    if not provider.cob_collection_active:
        provider.start_cob_collection()
    
    # Check WebSocket status
    if hasattr(provider, 'enhanced_cob_websocket'):
        ws = provider.enhanced_cob_websocket
        status = ws.status.get('ETH/USDT')
        print(f"WebSocket connected: {status.connected}")
        print(f"Last message: {status.last_message_time}")
```

#### 4. Price Data Stale

```python
# Force refresh price
price = provider.get_live_price_from_api('ETH/USDT')

# Check cache freshness
if 'ETH/USDT' in provider.live_price_cache:
    cached_price, timestamp = provider.live_price_cache['ETH/USDT']
    age = datetime.now() - timestamp
    print(f"Price cache age: {age.total_seconds()}s")
```

## Best Practices

### 1. Always Validate Data

```python
base_input = provider.get_base_data_input('ETH/USDT')

if base_input and base_input.validate():
    # Safe to use for inference
    model_output = model.predict(base_input)
else:
    # Log and skip inference
    logger.warning("Invalid or incomplete data, skipping inference")
```

### 2. Handle Missing Data Gracefully

```python
# Never use synthetic data
if base_input is None:
    logger.error("No data available")
    return None  # Don't proceed with inference

# Check specific components
if base_input.cob_data is None:
    logger.warning("COB data unavailable, using OHLCV only")
    # Proceed with reduced features or skip
```

### 3. Store Model Outputs

```python
# Always store outputs for cross-model feeding
output = model.predict(base_input)
provider.store_model_output(output)

# Other models can now access this output
```

### 4. Monitor Data Quality

```python
# Implement quality checks
def check_data_quality(base_input):
    if not base_input:
        return 0.0
    
    score = 0.0
    
    # OHLCV completeness (40%)
    ohlcv_score = min(1.0, len(base_input.ohlcv_1s) / 300) * 0.4
    score += ohlcv_score
    
    # COB availability (30%)
    cob_score = 0.3 if base_input.cob_data else 0.0
    score += cob_score
    
    # Pivot points (20%)
    pivot_score = 0.2 if base_input.pivot_points else 0.0
    score += pivot_score
    
    # Freshness (10%)
    age = (datetime.now() - base_input.timestamp).total_seconds()
    freshness_score = max(0, 1.0 - age / 60) * 0.1  # Decay over 1 minute
    score += freshness_score
    
    return score

# Use quality score
quality = check_data_quality(base_input)
if quality < 0.8:
    logger.warning(f"Low data quality: {quality:.2f}")
```

## File Locations

- **Core DataProvider**: `core/data_provider.py`
- **Standardized Provider**: `core/standardized_data_provider.py`
- **Enhanced COB WebSocket**: `core/enhanced_cob_websocket.py`
- **Williams Market Structure**: `core/williams_market_structure.py`
- **Data Models**: `core/data_models.py`
- **Model Output Manager**: `core/model_output_manager.py`
- **COBY System**: `COBY/` directory

## Additional Resources

- **Requirements**: `.kiro/specs/1.multi-modal-trading-system/requirements.md`
- **Design**: `.kiro/specs/1.multi-modal-trading-system/design.md`
- **Tasks**: `.kiro/specs/1.multi-modal-trading-system/tasks.md`
- **Audit Summary**: `.kiro/specs/1.multi-modal-trading-system/AUDIT_SUMMARY.md`

---

**Last Updated**: January 9, 2025  
**Version**: 1.0