# BaseDataInput Specification

## Overview

`BaseDataInput` is the **unified, standardized data structure** used across all models in the trading system for both inference and training. It ensures consistency, extensibility, and proper feature engineering across CNN, RL, LSTM, Transformer, and Orchestrator models.

**Location:** `core/data_models.py`

---

## Design Principles

1. **Single Source of Truth**: All models receive identical input structure
2. **Fixed Feature Size**: `get_feature_vector()` always returns exactly 7,850 features (22,850 when enhanced candle TA is enabled)
3. **Extensibility**: New features can be added without breaking existing models
4. **No Synthetic Data**: All features must come from real market data or be zero-padded
5. **Multi-Timeframe**: Supports multiple timeframes for comprehensive market analysis
6. **Cross-Model Feeding**: Includes predictions from other models for ensemble approaches

---

## Data Structure

### Core Fields

```python
@dataclass
class BaseDataInput:
    symbol: str          # Primary trading symbol (e.g., 'ETH/USDT')
    timestamp: datetime  # Current timestamp
```

### Multi-Timeframe OHLCV Data (Primary Symbol - ETH)

```python
ohlcv_1s: List[OHLCVBar]  # 300 frames of 1-second bars
ohlcv_1m: List[OHLCVBar]  # 300 frames of 1-minute bars
ohlcv_1h: List[OHLCVBar]  # 300 frames of 1-hour bars
ohlcv_1d: List[OHLCVBar]  # 300 frames of 1-day bars
```

**OHLCVBar Structure:**

```python
@dataclass
class OHLCVBar:
    symbol: str
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
    timeframe: str
    indicators: Dict[str, float] = field(default_factory=dict)

    # Enhanced TA properties (computed on-demand)
    @property
    def body_size(self) -> float: ...
    @property
    def upper_wick(self) -> float: ...
    @property
    def lower_wick(self) -> float: ...
    @property
    def total_range(self) -> float: ...
    @property
    def is_bullish(self) -> bool: ...
    @property
    def is_bearish(self) -> bool: ...
    @property
    def is_doji(self) -> bool: ...

    # Enhanced TA methods
    def get_body_to_range_ratio(self) -> float: ...
    def get_upper_wick_ratio(self) -> float: ...
    def get_lower_wick_ratio(self) -> float: ...
    def get_relative_size(self, reference_bars, method='avg') -> float: ...
    def get_candle_pattern(self) -> str: ...
    def get_ta_features(self, reference_bars=None) -> Dict[str, float]: ...
```

**See**: `docs/CANDLE_TA_FEATURES_REFERENCE.md` for complete TA feature documentation

### Reference Symbol Data (BTC)

```python
btc_ohlcv_1s: List[OHLCVBar]  # 300 seconds of 1-second BTC bars
```

Used for correlation analysis and market-wide context.
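Both the ETH and BTC lists hold `OHLCVBar` instances, so the candle TA helpers above apply to either. The sketch below constructs a bar and reads a few derived properties. It is illustrative only: the numbers are made up, the classes are assumed importable from `core/data_models.py`, the commented outputs assume the conventional candle definitions (body = |close - open|, range = high - low), and `base_data` stands in for an existing `BaseDataInput`.

```python
from datetime import datetime, timezone

# Hypothetical ETH 1-minute candle with a long upper wick
bar = OHLCVBar(
    symbol='ETH/USDT',
    timestamp=datetime.now(timezone.utc),
    open=3000.0,
    high=3030.0,
    low=2995.0,
    close=3005.0,
    volume=1250.0,
    timeframe='1m',
)

print(bar.is_bullish)                 # True  (close > open)
print(bar.body_size)                  # 5.0   (|close - open|)
print(bar.total_range)                # 35.0  (high - low)
print(bar.get_body_to_range_ratio())  # ~0.14 (small body relative to range)

# Full TA feature dict, sized relative to the preceding bars
ta_features = bar.get_ta_features(reference_bars=base_data.ohlcv_1m[-20:])
```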
### Consolidated Order Book (COB) Data

```python
cob_data: Optional[COBData]  # Real-time order book snapshot
```

**COBData Structure:**

```python
@dataclass
class COBData:
    symbol: str
    timestamp: datetime
    current_price: float
    bucket_size: float                            # $1 for ETH, $10 for BTC
    price_buckets: Dict[float, Dict[str, float]]  # ±20 buckets around current price
    bid_ask_imbalance: Dict[float, float]         # Imbalance ratio per bucket
    volume_weighted_prices: Dict[float, float]    # VWAP within each bucket
    order_flow_metrics: Dict[str, float]          # Order flow indicators

    # Moving averages of COB imbalance for ±5 buckets
    ma_1s_imbalance: Dict[float, float]   # 1-second MA
    ma_5s_imbalance: Dict[float, float]   # 5-second MA
    ma_15s_imbalance: Dict[float, float]  # 15-second MA
    ma_60s_imbalance: Dict[float, float]  # 60-second MA
```

**Price Bucket Details:**

Each bucket contains:
- `bid_volume`: Total bid volume in USD
- `ask_volume`: Total ask volume in USD
- `total_volume`: Combined volume
- `imbalance`: (bid_volume - ask_volume) / total_volume

### COB Heatmap (Time-Series)

```python
cob_heatmap_times: List[datetime]      # Timestamps for each snapshot
cob_heatmap_prices: List[float]        # Price levels tracked
cob_heatmap_values: List[List[float]]  # 2D array: time × price buckets
```

Provides temporal evolution of order book liquidity and imbalance.

### Technical Indicators

```python
technical_indicators: Dict[str, float]  # Calculated indicators
```

Common indicators include:
- `sma_5`, `sma_20`, `sma_50`, `sma_200`: Simple moving averages
- `ema_12`, `ema_26`: Exponential moving averages
- `rsi`: Relative Strength Index
- `macd`, `macd_signal`, `macd_hist`: MACD components
- `bb_upper`, `bb_middle`, `bb_lower`: Bollinger Bands
- `atr`: Average True Range
- `volatility`: Historical volatility
- `volume_ratio`: Current volume vs average
- `price_change_5m`, `price_change_15m`, `price_change_1h`: Price changes

### Pivot Points

```python
pivot_points: List[PivotPoint]  # Williams Market Structure pivots
```

**PivotPoint Structure:**

```python
@dataclass
class PivotPoint:
    symbol: str
    timestamp: datetime
    price: float
    type: str          # 'high' or 'low'
    level: int         # Pivot level (1, 2, 3, etc.)
    confidence: float  # Confidence score (0.0 to 1.0)
```

### Cross-Model Predictions

```python
last_predictions: Dict[str, ModelOutput]  # Previous predictions from all models
```

Enables ensemble approaches and cross-model feeding. Keys are model names (e.g., 'cnn_v1', 'rl_agent', 'transformer').

### Market Microstructure

```python
market_microstructure: Dict[str, Any]  # Additional market state data
```

May include:
- Spread metrics
- Liquidity depth
- Order arrival rates
- Trade flow toxicity
- Market impact estimates

### Position Information

```python
position_info: Dict[str, Any]  # Current trading position state
```

Contains:
- `has_position`: Boolean indicating if position is open
- `position_pnl`: Current profit/loss
- `position_size`: Size of position
- `entry_price`: Entry price of position
- `time_in_position_minutes`: Duration of position
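A populated `position_info` dict might look like the sketch below (the values are illustrative); these five keys map one-to-one onto the position slot of the feature vector described in the next section.

```python
# Hypothetical open long position on ETH/USDT
position_info = {
    'has_position': True,              # encoded as 1.0 in the feature vector
    'position_pnl': 12.35,             # unrealized PnL
    'position_size': 0.5,              # position size
    'entry_price': 2991.40,            # fill price of the entry order
    'time_in_position_minutes': 17.0,  # minutes since entry
}

# Flat (no position): all five position features fall back to 0.0
flat_position_info = {'has_position': False}
```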
---

## Feature Vector Conversion

The `get_feature_vector()` method converts the rich `BaseDataInput` structure into a **fixed-size numpy array** suitable for neural network input.

**Key Features:**
- **Automatic Normalization**: All OHLCV data normalized to 0-1 range by default
- **Independent Normalization**: Primary symbol and BTC normalized separately
- **Daily Range**: Uses daily (longest timeframe) min/max for widest coverage
- **Cached Bounds**: Normalization boundaries cached for performance and denormalization
- **Fixed Size**: 7,850 features (standard) or 22,850 features (with candle TA)

### Feature Vector Breakdown

| Component | Features | Description |
|-----------|----------|-------------|
| **OHLCV ETH (4 timeframes)** | 6,000 | 300 frames × 4 timeframes × 5 values (OHLCV) |
| **OHLCV BTC (1s)** | 1,500 | 300 frames × 5 values (OHLCV) |
| **COB Features** | 200 | Price buckets + MAs + heatmap aggregates |
| **Technical Indicators** | 100 | Calculated indicators |
| **Last Predictions** | 45 | Cross-model predictions (9 models × 5 features) |
| **Position Info** | 5 | Position state |
| **TOTAL** | **7,850** | Fixed size |

### Normalization

#### NormalizationBounds Class

```python
@dataclass
class NormalizationBounds:
    """Normalization boundaries for price and volume data"""
    price_min: float
    price_max: float
    volume_min: float
    volume_max: float
    symbol: str
    timeframe: str = 'all'

    def normalize_price(self, price: float) -> float:
        """Normalize price to 0-1 range"""
        return (price - self.price_min) / (self.price_max - self.price_min)

    def denormalize_price(self, normalized: float) -> float:
        """Denormalize price from 0-1 range back to original"""
        return normalized * (self.price_max - self.price_min) + self.price_min

    def normalize_volume(self, volume: float) -> float:
        """Normalize volume to 0-1 range"""
        return (volume - self.volume_min) / (self.volume_max - self.volume_min)

    def denormalize_volume(self, normalized: float) -> float:
        """Denormalize volume from 0-1 range back to original"""
        return normalized * (self.volume_max - self.volume_min) + self.volume_min
```
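Because the bounds are plain min/max scaling, normalization is exactly invertible. A quick round-trip check with made-up ETH bounds:

```python
bounds = NormalizationBounds(
    price_min=2800.0, price_max=3200.0,  # illustrative daily ETH range
    volume_min=0.0, volume_max=50_000.0,
    symbol='ETH/USDT',
)

x = bounds.normalize_price(3000.0)  # (3000 - 2800) / (3200 - 2800) = 0.5
assert abs(bounds.denormalize_price(x) - 3000.0) < 1e-9

# Prices outside the chosen range would fall outside 0-1, which is why the
# widest (daily) timeframe is used to compute the bounds, as described below.
```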
#### How Normalization Works

1. **Primary Symbol (ETH)**: Uses daily (1d) timeframe data to compute min/max
   - Ensures all shorter timeframes (1s, 1m, 1h) fit within 0-1 range
   - Daily has the widest price range, so all intraday prices normalize properly

2. **Reference Symbol (BTC)**: Uses its own 1s data to compute independent min/max
   - BTC and ETH have different price scales
   - Independent normalization ensures both are in 0-1 range

3. **Caching**: Bounds computed once and cached for performance
   - Access via `get_normalization_bounds()` and `get_btc_normalization_bounds()`
   - Useful for denormalizing model predictions back to actual prices

#### Usage Examples

```python
# Get feature vector with normalization (default)
features = base_data.get_feature_vector(normalize=True)
# All OHLCV values are now in 0-1 range

# Get raw features without normalization
features_raw = base_data.get_feature_vector(normalize=False)
# OHLCV values are in original price/volume units

# Access normalization bounds for denormalization
bounds = base_data.get_normalization_bounds()
print(f"Price range: {bounds.price_min:.2f} - {bounds.price_max:.2f}")

# Denormalize a model prediction
predicted_normalized = 0.75  # Model output
predicted_price = bounds.denormalize_price(predicted_normalized)
print(f"Predicted price: ${predicted_price:.2f}")

# BTC bounds (independent)
btc_bounds = base_data.get_btc_normalization_bounds()
print(f"BTC range: {btc_bounds.price_min:.2f} - {btc_bounds.price_max:.2f}")
```

### Feature Vector Implementation

```python
def get_feature_vector(self, include_candle_ta: bool = False, normalize: bool = True) -> np.ndarray:
    """
    Convert BaseDataInput to standardized feature vector for models

    Args:
        include_candle_ta: If True, include enhanced candle TA features
        normalize: If True, normalize OHLCV to 0-1 range (default: True)

    Returns:
        np.ndarray: FIXED SIZE standardized feature vector (7850 or 22850 features)
    """
    FIXED_FEATURE_SIZE = 22850 if include_candle_ta else 7850
    features = []

    # Get normalization bounds (cached)
    if normalize:
        norm_bounds = self._compute_normalization_bounds()
        btc_norm_bounds = self._compute_btc_normalization_bounds()

    # 1. OHLCV features for ETH (6000 features, normalized to 0-1)
    for ohlcv_list in [self.ohlcv_1s, self.ohlcv_1m, self.ohlcv_1h, self.ohlcv_1d]:
        ohlcv_frames = ohlcv_list[-300:] if len(ohlcv_list) >= 300 else ohlcv_list
        for bar in ohlcv_frames:
            if normalize:
                features.extend([
                    norm_bounds.normalize_price(bar.open),
                    norm_bounds.normalize_price(bar.high),
                    norm_bounds.normalize_price(bar.low),
                    norm_bounds.normalize_price(bar.close),
                    norm_bounds.normalize_volume(bar.volume)
                ])
            else:
                features.extend([bar.open, bar.high, bar.low, bar.close, bar.volume])
        frames_needed = 300 - len(ohlcv_frames)
        if frames_needed > 0:
            features.extend([0.0] * (frames_needed * 5))

    # 2. BTC OHLCV features (1500 features, normalized independently)
    btc_frames = self.btc_ohlcv_1s[-300:] if len(self.btc_ohlcv_1s) >= 300 else self.btc_ohlcv_1s
    for bar in btc_frames:
        if normalize:
            features.extend([
                btc_norm_bounds.normalize_price(bar.open),
                btc_norm_bounds.normalize_price(bar.high),
                btc_norm_bounds.normalize_price(bar.low),
                btc_norm_bounds.normalize_price(bar.close),
                btc_norm_bounds.normalize_volume(bar.volume)
            ])
        else:
            features.extend([bar.open, bar.high, bar.low, bar.close, bar.volume])
    btc_frames_needed = 300 - len(btc_frames)
    if btc_frames_needed > 0:
        features.extend([0.0] * (btc_frames_needed * 5))
    # 3. COB features (200 features)
    cob_features = []
    if self.cob_data:
        # Price bucket features (up to 160 features: 40 buckets × 4 metrics)
        price_keys = sorted(self.cob_data.price_buckets.keys())[:40]
        for price in price_keys:
            bucket_data = self.cob_data.price_buckets[price]
            cob_features.extend([
                bucket_data.get('bid_volume', 0.0),
                bucket_data.get('ask_volume', 0.0),
                bucket_data.get('total_volume', 0.0),
                bucket_data.get('imbalance', 0.0)
            ])

        # Moving averages (up to 10 features)
        ma_features = []
        for ma_dict in [self.cob_data.ma_1s_imbalance, self.cob_data.ma_5s_imbalance]:
            for price in sorted(list(ma_dict.keys())[:5]):
                ma_features.append(ma_dict[price])
                if len(ma_features) >= 10:
                    break
            if len(ma_features) >= 10:
                break
        cob_features.extend(ma_features)

        # Heatmap aggregates (remaining space)
        if self.cob_heatmap_values and self.cob_heatmap_prices:
            z = np.array(self.cob_heatmap_values, dtype=float)
            if z.ndim == 2 and z.size > 0:
                window_rows = z[-300:] if z.shape[0] >= 300 else z
                window_rows = np.nan_to_num(window_rows, nan=0.0)
                per_bucket_mean = window_rows.mean(axis=0).tolist()
                space_left = 200 - len(cob_features)
                if space_left > 0:
                    cob_features.extend(per_bucket_mean[:space_left])

    # Pad COB features to exactly 200
    cob_features.extend([0.0] * (200 - len(cob_features)))
    features.extend(cob_features[:200])

    # 4. Technical indicators (100 features)
    indicator_values = list(self.technical_indicators.values())
    features.extend(indicator_values[:100])
    features.extend([0.0] * max(0, 100 - len(indicator_values)))

    # 5. Last predictions (45 features)
    prediction_features = []
    for model_output in self.last_predictions.values():
        prediction_features.extend([
            model_output.confidence,
            model_output.predictions.get('buy_probability', 0.0),
            model_output.predictions.get('sell_probability', 0.0),
            model_output.predictions.get('hold_probability', 0.0),
            model_output.predictions.get('expected_reward', 0.0)
        ])
    features.extend(prediction_features[:45])
    features.extend([0.0] * max(0, 45 - len(prediction_features)))

    # 6. Position info (5 features)
    position_features = [
        1.0 if self.position_info.get('has_position', False) else 0.0,
        self.position_info.get('position_pnl', 0.0),
        self.position_info.get('position_size', 0.0),
        self.position_info.get('entry_price', 0.0),
        self.position_info.get('time_in_position_minutes', 0.0)
    ]
    features.extend(position_features)

    # Ensure exactly FIXED_FEATURE_SIZE
    if len(features) > FIXED_FEATURE_SIZE:
        features = features[:FIXED_FEATURE_SIZE]
    elif len(features) < FIXED_FEATURE_SIZE:
        features.extend([0.0] * (FIXED_FEATURE_SIZE - len(features)))

    assert len(features) == FIXED_FEATURE_SIZE
    return np.array(features, dtype=np.float32)
```

---

## Extensibility

### Adding New Features

The `BaseDataInput` structure is designed for extensibility. To add new features:

#### 1. Add New Field to BaseDataInput

```python
@dataclass
class BaseDataInput:
    # ... existing fields ...

    # NEW: Add your new feature
    sentiment_data: Dict[str, float] = field(default_factory=dict)
```

#### 2. Update get_feature_vector()

**Option A: Add to existing feature slots (if space available)**

```python
def get_feature_vector(self) -> np.ndarray:
    # ... existing code ...

    # Add sentiment features to technical indicators section
    sentiment_features = [
        self.sentiment_data.get('twitter_sentiment', 0.0),
        self.sentiment_data.get('news_sentiment', 0.0),
        self.sentiment_data.get('fear_greed_index', 0.0)
    ]
    indicator_values.extend(sentiment_features)

    # ... rest of code ...
```
**Option B: Increase FIXED_FEATURE_SIZE (requires model retraining)**

```python
def get_feature_vector(self) -> np.ndarray:
    FIXED_FEATURE_SIZE = 7900  # Increased from 7850

    # ... existing features (7850) ...

    # NEW: Sentiment features (50 features)
    sentiment_features = []
    for key in sorted(self.sentiment_data.keys())[:50]:
        sentiment_features.append(self.sentiment_data[key])
    features.extend(sentiment_features[:50])
    features.extend([0.0] * max(0, 50 - len(sentiment_features)))

    # ... ensure FIXED_FEATURE_SIZE ...
```

#### 3. Update Data Provider

Ensure your data provider populates the new field:

```python
def build_base_data_input(self, symbol: str) -> BaseDataInput:
    # ... existing code ...

    # NEW: Add sentiment data
    sentiment_data = self._get_sentiment_data(symbol)

    return BaseDataInput(
        # ... existing fields ...
        sentiment_data=sentiment_data
    )
```

### Best Practices for Extension

1. **Maintain Fixed Size**: If adding features, either:
   - Use existing padding space
   - Increase `FIXED_FEATURE_SIZE` and retrain all models
2. **Zero Padding**: Always pad missing data with zeros, never synthetic data
3. **Validation**: Update `validate()` method if new fields are required
4. **Documentation**: Update this document with new feature descriptions
5. **Backward Compatibility**: Consider versioning if making breaking changes

---

## Current Usage Status

### Models Using BaseDataInput

✅ **StandardizedCNN** (`NN/models/standardized_cnn.py`)
- Uses `get_feature_vector()` directly
- Expected input: 7,834 features (close to 7,850)

✅ **Orchestrator** (`core/orchestrator.py`)
- Builds BaseDataInput via `data_provider.build_base_data_input()`
- Passes to all models

✅ **UnifiedTrainingManager** (`core/unified_training_manager_v2.py`)
- Converts BaseDataInput to DQN state via `get_feature_vector()`

✅ **Dashboard** (`web/clean_dashboard.py`)
- Creates BaseDataInput for CNN predictions
- Uses `get_feature_vector()` for feature extraction

### Alternative Implementations Found

⚠️ **ModelInputData** (`core/unified_model_data_interface.py`)
- **Status**: Legacy/alternative interface
- **Usage**: Limited, primarily for model-specific preprocessing
- **Recommendation**: Migrate to BaseDataInput for consistency

⚠️ **MockBaseDataInput** (`COBY/integration/orchestrator_adapter.py`)
- **Status**: Temporary adapter for COBY integration
- **Usage**: Provides BaseDataInput interface for COBY data
- **Recommendation**: Replace with proper BaseDataInput construction

### Models NOT Using BaseDataInput

❌ **RealtimeRLCOBTrader** (`core/realtime_rl_cob_trader.py`)
- Uses custom `_extract_features()` method
- **Recommendation**: Migrate to BaseDataInput

❌ **Some legacy models** may use direct feature extraction
- **Recommendation**: Audit and migrate to BaseDataInput

---

## Validation

The `validate()` method ensures data quality:

```python
def validate(self) -> bool:
    """
    Validate that the BaseDataInput contains required data

    Returns:
        bool: True if valid, False otherwise
    """
    # Check minimum OHLCV data
    if len(self.ohlcv_1s) < 100:
        return False
    if len(self.btc_ohlcv_1s) < 100:
        return False

    # Check timestamp
    if not self.timestamp:
        return False

    # Check symbol format
    if not self.symbol or '/' not in self.symbol:
        return False

    return True
```

---

## Related Classes

### ModelOutput

Output structure for model predictions:

```python
@dataclass
class ModelOutput:
    model_type: str      # 'cnn', 'rl', 'lstm', 'transformer'
    model_name: str      # Specific model identifier
    symbol: str
    timestamp: datetime
    confidence: float
    predictions: Dict[str, Any]              # Model-specific predictions
    hidden_states: Optional[Dict[str, Any]]  # For cross-model feeding
    metadata: Dict[str, Any]                 # Additional info
```
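The last-predictions slot of the feature vector reads `confidence` plus the `buy_probability`, `sell_probability`, `hold_probability`, and `expected_reward` keys from each `ModelOutput.predictions` dict. A hand-built example, with illustrative values and `base_data` standing in for an existing `BaseDataInput`:

```python
from datetime import datetime, timezone

cnn_output = ModelOutput(
    model_type='cnn',
    model_name='cnn_v1',
    symbol='ETH/USDT',
    timestamp=datetime.now(timezone.utc),
    confidence=0.72,
    predictions={
        'buy_probability': 0.61,
        'sell_probability': 0.14,
        'hold_probability': 0.25,
        'expected_reward': 0.008,
    },
    hidden_states=None,
    metadata={},
)

# Feed it back into the next BaseDataInput for cross-model feeding
base_data.last_predictions['cnn_v1'] = cnn_output
```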
### COBSnapshot

Raw consolidated order book data (transformed into COBData):

```python
@dataclass
class COBSnapshot:
    symbol: str
    timestamp: datetime
    consolidated_bids: List[ConsolidatedOrderBookLevel]
    consolidated_asks: List[ConsolidatedOrderBookLevel]
    exchanges_active: List[str]
    volume_weighted_mid: float
    total_bid_liquidity: float
    total_ask_liquidity: float
    spread_bps: float
    liquidity_imbalance: float
    price_buckets: Dict[str, Dict[str, float]]
```

### PredictionSnapshot

Stores predictions with inputs for future training:

```python
@dataclass
class PredictionSnapshot:
    prediction_id: str
    symbol: str
    prediction_time: datetime
    target_horizon_minutes: int
    target_time: datetime
    current_price: float
    predicted_min_price: float
    predicted_max_price: float
    confidence: float
    model_inputs: Dict[str, Any]  # Includes BaseDataInput features
    market_state: Dict[str, Any]
    technical_indicators: Dict[str, Any]
    pivot_analysis: Dict[str, Any]
    actual_min_price: Optional[float]
    actual_max_price: Optional[float]
    outcome_known: bool
```

---

## Migration Guide

### For Models Not Using BaseDataInput

1. **Identify current input method**

   ```python
   # OLD
   features = self._extract_features(symbol, data)
   ```

2. **Update to use BaseDataInput**

   ```python
   # NEW
   base_data = self.data_provider.build_base_data_input(symbol)
   if base_data and base_data.validate():
       features = base_data.get_feature_vector()
   ```

3. **Update model interface**

   ```python
   # OLD
   def predict(self, features: np.ndarray) -> Dict:

   # NEW
   def predict(self, base_input: BaseDataInput) -> ModelOutput:
       features = base_input.get_feature_vector()
       # ... prediction logic ...
   ```

4. **Test thoroughly**
   - Verify feature vector size matches expectations
   - Check for NaN or infinite values
   - Validate predictions are reasonable

---

## Performance Considerations

### Memory Usage

- **BaseDataInput object**: ~2-5 MB per instance
- **Feature vector**: 7,850 × 4 bytes = 31.4 KB
- **Recommendation**: Cache BaseDataInput for 1-2 seconds, regenerate feature vectors as needed (see the cache sketch after the optimization tips below)

### Computation Time

- **Building BaseDataInput**: ~5-10 ms
- **get_feature_vector()**: ~1-2 ms
- **Total overhead**: Negligible for real-time trading

### Optimization Tips

1. **Reuse OHLCV data**: Cache OHLCV bars across multiple BaseDataInput instances
2. **Lazy evaluation**: Only compute features when `get_feature_vector()` is called
3. **Batch processing**: Process multiple symbols in parallel
4. **Avoid deep copies**: Use references where possible
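The 1-2 second caching recommendation can be implemented with a small TTL wrapper around the data provider. This is a sketch, not part of the codebase; it only assumes the `build_base_data_input()` and `validate()` calls shown earlier.

```python
import time
from typing import Dict, Optional, Tuple

class BaseDataInputCache:
    """Tiny TTL cache so repeated model calls within ~1-2 s reuse one BaseDataInput."""

    def __init__(self, data_provider, ttl_seconds: float = 1.5):
        self.data_provider = data_provider
        self.ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, BaseDataInput]] = {}

    def get(self, symbol: str) -> Optional[BaseDataInput]:
        now = time.monotonic()
        cached = self._cache.get(symbol)
        if cached and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: reuse without rebuilding
        base_data = self.data_provider.build_base_data_input(symbol)
        if base_data and base_data.validate():
            self._cache[symbol] = (now, base_data)
            return base_data
        return None
```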
---

## Testing

### Unit Tests

```python
def test_base_data_input_feature_vector():
    """Test that feature vector has correct size"""
    base_data = create_test_base_data_input()
    features = base_data.get_feature_vector()

    assert len(features) == 7850
    assert features.dtype == np.float32
    assert not np.isnan(features).any()
    assert not np.isinf(features).any()

def test_base_data_input_validation():
    """Test validation logic"""
    base_data = create_test_base_data_input()
    assert base_data.validate() == True

    # Test with insufficient data
    base_data.ohlcv_1s = []
    assert base_data.validate() == False
```

### Integration Tests

```python
def test_model_with_base_data_input():
    """Test model prediction with BaseDataInput"""
    orchestrator = create_test_orchestrator()
    base_data = orchestrator.data_provider.build_base_data_input('ETH/USDT')

    assert base_data is not None
    assert base_data.validate()

    # Test CNN prediction
    cnn_output = orchestrator.cnn_model.predict_from_base_input(base_data)
    assert isinstance(cnn_output, ModelOutput)
    assert 0.0 <= cnn_output.confidence <= 1.0
```

---

## Future Enhancements

### Planned Features

1. **Multi-Symbol Support**: Extend to support multiple correlated symbols
2. **Alternative Data**: Add social sentiment, on-chain metrics, macro indicators
3. **Feature Importance**: Track which features contribute most to predictions
4. **Compression**: Implement feature compression for faster transmission
5. **Versioning**: Add version field for backward compatibility

### Research Directions

1. **Adaptive Feature Selection**: Dynamically select relevant features per market regime
2. **Hierarchical Features**: Group related features for better model interpretability
3. **Temporal Attention**: Weight recent data more heavily than historical
4. **Cross-Asset Features**: Include correlations with other asset classes

---

## Conclusion

`BaseDataInput` is the cornerstone of the multi-modal trading system, providing:

- ✅ **Consistency**: All models use the same input format
- ✅ **Extensibility**: Easy to add new features without breaking existing code
- ✅ **Performance**: Fixed-size feature vectors enable efficient computation
- ✅ **Quality**: Validation ensures data integrity
- ✅ **Flexibility**: Supports multiple timeframes, order book data, and cross-model feeding

**All new models MUST use BaseDataInput** to ensure system-wide consistency and maintainability.

---

## References

- **Implementation**: `core/data_models.py`
- **Data Provider**: `core/standardized_data_provider.py`
- **Model Example**: `NN/models/standardized_cnn.py`
- **Training**: `core/unified_training_manager_v2.py`
- **FIFO Queue System**: `docs/fifo_queue_system.md`