gogo2/DATA_STREAM_README.md
2025-09-02 16:05:44 +03:00

Data Stream Monitor

A system that captures all model input data and streams it to the console as text. The output is suitable for snapshots, training, and replay.

Overview

The Data Stream Monitor captures real-time data as it flows through the trading system and outputs it in one of two formats:

  • Detailed: Human-readable format with clear sections
  • Compact: JSON format for programmatic processing

Data Streams Captured

Market Data

  • OHLCV Data: Multi-timeframe candlestick data (1m, 5m, 15m)
  • Tick Data: Real-time trade ticks with price, volume, and side
  • COB Data: Consolidated Order Book snapshots with imbalance and spread metrics

Model Data

  • Technical Indicators: RSI, MACD, Bollinger Bands, etc.
  • Model States: Current state vectors for each model (DQN, CNN, RL)
  • Predictions: Recent predictions from all active models
  • Training Experiences: State-action-reward tuples from RL training

Quick Start

1. Start the Dashboard

source venv/bin/activate
python run_clean_dashboard.py

2. Start Data Streaming

python data_stream_control.py start

3. Control Streaming

# Check status
python data_stream_control.py status

# Switch to compact format
python data_stream_control.py compact

# Save current snapshot
python data_stream_control.py snapshot

# Stop streaming
python data_stream_control.py stop

Output Formats

Detailed Format

================================================================================
DATA STREAM SAMPLE - 14:30:15
================================================================================
OHLCV (1m): ETH/USDT | O:4335.67 H:4338.92 L:4334.21 C:4336.67 V:125.8
TICK: ETH/USDT | Price:4336.67 Vol:0.0456 Side:buy
COB: ETH/USDT | Imbalance:0.234 Spread:2.3bps Mid:4336.67
DQN State: 15 features | Price:4336.67
DQN Prediction: BUY (conf:0.78)
Training Exp: Action:1 Reward:0.0234 Done:False
================================================================================
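
As a rough illustration of how lines in the detailed format could be assembled, the sketch below renders one OHLCV bar and one tick. The function names format_ohlcv and format_tick and the dictionary keys are assumptions made for this example; they are not taken from data_stream_monitor.py.

```python
def format_ohlcv(symbol, timeframe, bar):
    """Render one candle in the style of the OHLCV line in the sample above."""
    return (
        f"OHLCV ({timeframe}): {symbol} | "
        f"O:{bar['open']:.2f} H:{bar['high']:.2f} "
        f"L:{bar['low']:.2f} C:{bar['close']:.2f} V:{bar['volume']}"
    )


def format_tick(symbol, tick):
    """Render one trade tick in the style of the TICK line in the sample above."""
    return (
        f"TICK: {symbol} | Price:{tick['price']:.2f} "
        f"Vol:{tick['volume']} Side:{tick['side']}"
    )


# Values copied from the sample output; the key names are hypothetical.
bar = {"open": 4335.67, "high": 4338.92, "low": 4334.21, "close": 4336.67, "volume": 125.8}
tick = {"price": 4336.67, "volume": 0.0456, "side": "buy"}

print(format_ohlcv("ETH/USDT", "1m", bar))
print(format_tick("ETH/USDT", tick))
```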

Compact Format

{"timestamp":"2024-01-15T14:30:15","ohlcv_count":5,"ticks_count":12,"cob_count":8,"predictions_count":3,"experiences_count":7,"price":4336.67,"volume":125.8,"imbalance":0.234,"spread_bps":2.3}
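
Because each compact record is a single JSON object, a consumer can parse it with the standard library alone. The sketch below parses the sample line above; summing the *_count fields is only an illustrative use, not something the monitor itself is known to do.

```python
import json
from datetime import datetime

# The sample compact record shown above, as one line of text.
line = (
    '{"timestamp":"2024-01-15T14:30:15","ohlcv_count":5,"ticks_count":12,'
    '"cob_count":8,"predictions_count":3,"experiences_count":7,'
    '"price":4336.67,"volume":125.8,"imbalance":0.234,"spread_bps":2.3}'
)

record = json.loads(line)

# The timestamp is an ISO 8601 string, so fromisoformat can parse it.
timestamp = datetime.fromisoformat(record["timestamp"])

# Total number of buffered items reported across all streams.
total_items = sum(value for key, value in record.items() if key.endswith("_count"))

print(timestamp.time(), record["price"], total_items)
```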

Files

Core Components

  • data_stream_monitor.py - Main streaming engine
  • data_stream_control.py - Command-line control interface
  • demo_data_stream.py - Usage examples and demo

Integration Points

  • run_clean_dashboard.py - Auto-initializes streaming
  • core/orchestrator.py - Provides prediction data
  • NN/training/enhanced_realtime_training.py - Provides training data

Configuration

The streaming system is configurable via the stream_config dictionary:

stream_config = {
    'console_output': True,      # Enable/disable console output
    'compact_format': False,     # Use compact JSON format
    'include_timestamps': True,  # Include timestamps in output
    'filter_symbols': ['ETH/USDT'],  # Symbols to monitor
    'sampling_rate': 1.0         # Sampling rate in seconds
}
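
One way to apply overrides to these defaults without misspelling a key is sketched below. The helper make_stream_config is hypothetical and is not part of the project; how the real monitor receives its configuration is not described here.

```python
# Defaults copied from the stream_config shown above.
DEFAULT_STREAM_CONFIG = {
    'console_output': True,
    'compact_format': False,
    'include_timestamps': True,
    'filter_symbols': ['ETH/USDT'],
    'sampling_rate': 1.0,
}


def make_stream_config(**overrides):
    """Return a copy of the defaults with overrides applied (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_STREAM_CONFIG)
    if unknown:
        # Fail loudly on typos such as 'compact_fromat'.
        raise KeyError(f"unknown stream_config keys: {sorted(unknown)}")
    return {**DEFAULT_STREAM_CONFIG, **overrides}


stream_config = make_stream_config(compact_format=True, sampling_rate=5.0)
print(stream_config)
```

Rejecting unknown keys is a design choice for this sketch: a silently ignored typo in a config key is harder to notice than an exception at startup.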

Use Cases

Training Data Collection

  • Capture real market conditions during training
  • Build datasets for offline model validation
  • Replay specific market scenarios

Debugging and Monitoring

  • Monitor model input data in real-time
  • Debug prediction inconsistencies
  • Validate data pipeline integrity

Snapshot and Replay

  • Save complete system state for later analysis
  • Replay specific time periods
  • Compare model behavior across different market conditions
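
The snapshot file format is not specified in this README. As a minimal sketch, assuming a snapshot is a JSON list of records that each carry an ISO 8601 timestamp, saving and replaying a time window could look like this. The functions save_snapshot and replay are hypothetical, and the second and third records are made-up values for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_snapshot(records, path):
    """Write records to a JSON file (assumed format, not the project's own)."""
    Path(path).write_text(json.dumps(records, indent=2))


def replay(path, start, end):
    """Return the records whose timestamp falls within [start, end]."""
    records = json.loads(Path(path).read_text())
    # ISO 8601 strings in the same format sort chronologically as plain text.
    return [r for r in records if start <= r["timestamp"] <= end]


records = [
    {"timestamp": "2024-01-15T14:30:15", "price": 4336.67},
    {"timestamp": "2024-01-15T14:30:16", "price": 4337.10},  # made-up value
    {"timestamp": "2024-01-15T14:31:00", "price": 4335.90},  # made-up value
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "snapshot.json"
    save_snapshot(records, path)
    window = replay(path, "2024-01-15T14:30:16", "2024-01-15T14:31:00")

print([r["price"] for r in window])
```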

Technical Details

Data Collection

  • Thread-safe: Uses separate thread for data collection
  • Memory-efficient: Configurable buffer sizes with automatic cleanup
  • Error-resilient: Continues streaming even if individual data sources fail
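
The three properties above can be combined in a small amount of code. The following is a sketch under stated assumptions, not the implementation in data_stream_monitor.py: the Collector class, its sources mapping, and the error counter are all invented for illustration.

```python
import threading
import time
from collections import deque


class Collector:
    """Polls data sources on a background thread (hypothetical sketch)."""

    def __init__(self, sources, interval=0.01, maxlen=100):
        self.sources = sources  # maps a stream name to a zero-argument callable
        self.interval = interval
        # Bounded buffers: the oldest samples are dropped automatically.
        self.buffers = {name: deque(maxlen=maxlen) for name in sources}
        self.errors = 0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            for name, fetch in self.sources.items():
                try:
                    sample = fetch()
                except Exception:
                    # One failing source must not stop the other streams.
                    with self._lock:
                        self.errors += 1
                    continue
                with self._lock:
                    self.buffers[name].append(sample)
            self._stop.wait(self.interval)

    def snapshot(self):
        """Return a copy of every buffer, taken under the lock."""
        with self._lock:
            return {name: list(buf) for name, buf in self.buffers.items()}


def broken_feed():
    raise RuntimeError("feed down")


collector = Collector({"tick": lambda: 4336.67, "cob": broken_feed})
collector.start()
time.sleep(0.05)
collector.stop()
data = collector.snapshot()
print(len(data["tick"]) > 0, data["cob"], collector.errors > 0)
```

In this sketch the lock guards both the buffers and the error counter, so snapshot can be called from another thread while collection is running.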

Integration

  • Non-intrusive: Doesn't affect main trading system performance
  • Optional: Can be disabled without affecting core functionality
  • Extensible: Easy to add new data streams

Performance

  • Low overhead: Minimal CPU and memory usage
  • Configurable sampling: Adjust sampling rate based on needs
  • Bounded storage: Circular buffers cap memory use by overwriting the oldest entries
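
To make the sampling and buffering behavior concrete, the sketch below throttles incoming samples and stores them in a fixed-size buffer. It assumes that sampling_rate is the number of seconds between samples; the Throttle class is hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
from collections import deque


class Throttle:
    """Lets a sample through only if `interval` seconds have passed (sketch)."""

    def __init__(self, interval):
        self.interval = interval
        self._last = None

    def ready(self, now):
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False


buffer = deque(maxlen=3)          # circular buffer holding the 3 newest samples
throttle = Throttle(interval=1.0)

# Arrival times in seconds; samples arriving too soon are skipped.
for now in [0.0, 0.4, 1.0, 1.5, 2.0, 3.0, 4.5]:
    if throttle.ready(now):
        buffer.append(now)

print(list(buffer))  # [2.0, 3.0, 4.5]
```

Five samples pass the throttle (0.0, 1.0, 2.0, 3.0, 4.5), and the buffer keeps only the last three, so memory use stays fixed however long the stream runs.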

Command Reference

| Command  | Description                              |
|----------|------------------------------------------|
| start    | Start data streaming                     |
| stop     | Stop data streaming                      |
| status   | Show current status and buffer sizes     |
| snapshot | Save current data snapshot to file       |
| compact  | Switch to compact JSON format            |
| detailed | Switch to detailed human-readable format |
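
For readers who want to build a similar control script, a command-line interface with these six commands can be sketched with argparse. This is not the source of data_stream_control.py, whose internals are not shown in this README; the COMMANDS mapping and build_parser are assumptions for illustration.

```python
import argparse

# Command names and descriptions taken from the command reference above.
COMMANDS = {
    "start": "Start data streaming",
    "stop": "Stop data streaming",
    "status": "Show current status and buffer sizes",
    "snapshot": "Save current data snapshot to file",
    "compact": "Switch to compact JSON format",
    "detailed": "Switch to detailed human-readable format",
}


def build_parser():
    """Build a parser that accepts exactly one of the documented commands."""
    parser = argparse.ArgumentParser(prog="data_stream_control.py")
    parser.add_argument(
        "command",
        choices=list(COMMANDS),
        help="one of: " + ", ".join(COMMANDS),
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["status"])
print(args.command, "-", COMMANDS[args.command])
```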

Troubleshooting

Streaming Not Starting

  • Ensure dashboard is running first
  • Check that venv is activated
  • Verify data_stream_monitor.py is in project root

No Data Output

  • Check streaming status with python data_stream_control.py status
  • Verify market data is available (check dashboard logs)
  • Ensure models are active and making predictions

Performance Issues

  • Sample less often by adjusting sampling_rate in stream_config
  • Switch to compact format for less output
  • Decrease buffer sizes if memory is limited

Future Enhancements

  • File output: Save streaming data to rotating log files
  • WebSocket output: Stream data to external consumers
  • Compression: Automatic compression for long-term storage
  • Filtering: Advanced filtering based on market conditions
  • Metrics: Built-in performance metrics and statistics