data stream

This commit is contained in:
Dobromir Popov
2025-09-02 17:29:18 +03:00
parent e0fb76d9c7
commit 8068e554f3
8 changed files with 370 additions and 322 deletions

@@ -1,168 +1,37 @@
# Data Stream Monitor
A comprehensive system for capturing and streaming all model input data in console-friendly text format, suitable for snapshots, training, and replay functionality.
## Overview
The Data Stream Monitor captures real-time data flows through the trading system and outputs them in two formats:
- **Detailed**: Human-readable format with clear sections
- **Compact**: JSON format for programmatic processing
## Data Streams Captured
### Market Data
- **OHLCV Data**: Multi-timeframe candlestick data (1m, 5m, 15m)
- **Tick Data**: Real-time trade ticks with price, volume, and side
- **COB Data**: Consolidated Order Book snapshots with imbalance and spread metrics
### Model Data
- **Technical Indicators**: RSI, MACD, Bollinger Bands, etc.
- **Model States**: Current state vectors for each model (DQN, CNN, RL)
- **Predictions**: Recent predictions from all active models
- **Training Experiences**: State-action-reward tuples from RL training
The Data Stream Monitor captures and streams all model input data for analysis, snapshots, and replay. It is now fully managed by the `TradingOrchestrator` and starts automatically with the dashboard.
## Quick Start
### 1. Start the Dashboard
```bash
source venv/bin/activate
# Start the dashboard (starts the data stream automatically)
python run_clean_dashboard.py
```
### 2. Start Data Streaming
```bash
python data_stream_control.py start
## Status
The orchestrator manages the data stream. You can check status in the dashboard logs; you should see a line like:
```
INFO - Data stream monitor initialized and started by orchestrator
```
### 3. Control Streaming
```bash
# Check status
python data_stream_control.py status
## What it Collects
# Switch to compact format
python data_stream_control.py compact
- OHLCV data (1m, 5m, 15m)
- Tick data
- COB (order book) features (when available)
- Technical indicators
- Model states and predictions
- Training experiences for RL
# Save current snapshot
python data_stream_control.py snapshot
## Snapshots
# Stop streaming
python data_stream_control.py stop
```
Snapshots are saved from within the running system when needed. To save one programmatically, call the monitor's `save_snapshot(filepath)` method.
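A minimal sketch of a programmatic snapshot call. Only the `save_snapshot(filepath)` name comes from this README; the `StubMonitor` class, its buffers, the JSON layout, and the `snapshot_path` naming scheme are all hypothetical stand-ins so the example runs on its own.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


class StubMonitor:
    """Hypothetical stand-in for the real monitor object."""

    def __init__(self):
        # Assumed buffer names; the real monitor may organise data differently.
        self.buffers = {"ohlcv_1m": [], "ticks": [], "predictions": []}

    def save_snapshot(self, filepath):
        # Assumed behaviour: dump the current buffers to a JSON file.
        path = Path(filepath)
        path.parent.mkdir(parents=True, exist_ok=True)
        payload = {"saved_at": datetime.now().isoformat(), "buffers": self.buffers}
        path.write_text(json.dumps(payload), encoding="utf-8")
        return path


def snapshot_path(directory, now=None):
    """Build a timestamped snapshot filename (the naming scheme is an assumption)."""
    now = now or datetime.now()
    return Path(directory) / f"snapshot_{now:%Y%m%d_%H%M%S}.json"


if __name__ == "__main__":
    monitor = StubMonitor()
    target = snapshot_path(tempfile.mkdtemp())
    print("saved", monitor.save_snapshot(target))
```

In the real system you would obtain the monitor from the running orchestrator rather than constructing a stub.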
## Output Formats
## Notes
### Detailed Format
```
================================================================================
DATA STREAM SAMPLE - 14:30:15
================================================================================
OHLCV (1m): ETH/USDT | O:4335.67 H:4338.92 L:4334.21 C:4336.67 V:125.8
TICK: ETH/USDT | Price:4336.67 Vol:0.0456 Side:buy
COB: ETH/USDT | Imbalance:0.234 Spread:2.3bps Mid:4336.67
DQN State: 15 features | Price:4336.67
DQN Prediction: BUY (conf:0.78)
Training Exp: Action:1 Reward:0.0234 Done:False
================================================================================
```
### Compact Format
```json
{"timestamp":"2024-01-15T14:30:15","ohlcv_count":5,"ticks_count":12,"cob_count":8,"predictions_count":3,"experiences_count":7,"price":4336.67,"volume":125.8,"imbalance":0.234,"spread_bps":2.3}
```
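Because each compact record is a single JSON object per line, it can be consumed with the standard library alone. The sketch below parses the sample line shown above; the helper names `parse_compact` and `total_items` are illustrative and not part of the project.

```python
import json
from datetime import datetime

# Sample line copied from the compact-format example above.
SAMPLE = (
    '{"timestamp":"2024-01-15T14:30:15","ohlcv_count":5,"ticks_count":12,'
    '"cob_count":8,"predictions_count":3,"experiences_count":7,'
    '"price":4336.67,"volume":125.8,"imbalance":0.234,"spread_bps":2.3}'
)


def parse_compact(line):
    """Decode one compact record and convert its timestamp to a datetime."""
    record = json.loads(line)
    record["timestamp"] = datetime.fromisoformat(record["timestamp"])
    return record


def total_items(record):
    """Sum every field whose name ends in `_count`."""
    return sum(v for k, v in record.items() if k.endswith("_count"))


if __name__ == "__main__":
    rec = parse_compact(SAMPLE)
    print(rec["timestamp"], rec["price"], total_items(rec))
```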
## Files
### Core Components
- `data_stream_monitor.py` - Main streaming engine
- `data_stream_control.py` - Command-line control interface
- `demo_data_stream.py` - Usage examples and demo
### Integration Points
- `run_clean_dashboard.py` - Auto-initializes streaming
- `core/orchestrator.py` - Provides prediction data
- `NN/training/enhanced_realtime_training.py` - Provides training data
## Configuration
The streaming system is configurable via the `stream_config` dictionary:
```python
stream_config = {
    'console_output': True,          # Enable/disable console output
    'compact_format': False,         # Use compact JSON format
    'include_timestamps': True,      # Include timestamps in output
    'filter_symbols': ['ETH/USDT'],  # Symbols to monitor
    'sampling_rate': 1.0             # Sampling interval in seconds
}
```
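One way these settings could gate output is sketched below. This is an assumption about how the options interact, not the monitor's actual logic: `should_emit` is a hypothetical helper, `filter_symbols` is treated as an allow-list, and `sampling_rate` is treated as the minimum number of seconds between samples.

```python
stream_config = {
    'console_output': True,
    'compact_format': False,
    'include_timestamps': True,
    'filter_symbols': ['ETH/USDT'],
    'sampling_rate': 1.0,
}


def should_emit(symbol, last_emit_ts, now_ts, config):
    """Return True if a sample for `symbol` should be printed now.

    Hypothetical helper: applies the symbol allow-list and the
    minimum interval (in seconds) between two samples.
    """
    if not config['console_output']:
        return False
    if symbol not in config['filter_symbols']:
        return False
    return (now_ts - last_emit_ts) >= config['sampling_rate']


if __name__ == "__main__":
    print(should_emit('ETH/USDT', last_emit_ts=0.0, now_ts=1.0, config=stream_config))
    print(should_emit('BTC/USDT', last_emit_ts=0.0, now_ts=5.0, config=stream_config))
```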
## Use Cases
### Training Data Collection
- Capture real market conditions during training
- Build datasets for offline model validation
- Replay specific market scenarios
### Debugging and Monitoring
- Monitor model input data in real-time
- Debug prediction inconsistencies
- Validate data pipeline integrity
### Snapshot and Replay
- Save complete system state for later analysis
- Replay specific time periods
- Compare model behavior across different market conditions
## Technical Details
### Data Collection
- **Thread-safe**: Runs data collection on a separate thread
- **Memory-efficient**: Configurable buffer sizes with automatic cleanup
- **Error-resilient**: Continues streaming even if individual data sources fail
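The three properties above can be illustrated with a small background collector. This is a sketch under stated assumptions, not the project's implementation: the `Collector` class, its `sources` mapping of zero-argument callables, and the `errors` counter are all hypothetical.

```python
import threading
import time
from collections import deque


class Collector:
    """Hypothetical background collector; not the project's actual class."""

    def __init__(self, sources, interval=0.01, maxlen=100):
        self.sources = sources      # name -> zero-argument callable
        self.interval = interval    # seconds between polling rounds
        self.buffers = {name: deque(maxlen=maxlen) for name in sources}
        self.errors = 0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            for name, fetch in self.sources.items():
                try:
                    value = fetch()
                except Exception:
                    # A failing source is counted and skipped; the loop keeps going.
                    with self._lock:
                        self.errors += 1
                    continue
                with self._lock:
                    self.buffers[name].append(value)
            self._stop.wait(self.interval)

    def snapshot(self):
        # Copy under the lock so readers never see a buffer mid-update.
        with self._lock:
            return {name: list(buf) for name, buf in self.buffers.items()}


def broken_source():
    raise RuntimeError("source unavailable")


if __name__ == "__main__":
    collector = Collector({"ticks": time.time, "cob": broken_source})
    collector.start()
    time.sleep(0.1)
    collector.stop()
    print(len(collector.snapshot()["ticks"]), "ticks,", collector.errors, "errors")
```

The lock guards both the buffers and the error counter, so `snapshot()` can be called from another thread while collection is running.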
### Integration
- **Non-intrusive**: Doesn't affect main trading system performance
- **Optional**: Can be disabled without affecting core functionality
- **Extensible**: Easy to add new data streams
### Performance
- **Low overhead**: Minimal CPU and memory usage
- **Configurable sampling**: Adjust sampling rate based on needs
- **Efficient storage**: Circular buffers cap memory use by discarding the oldest entries
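As an illustration of the circular-buffer behaviour, Python's `collections.deque` with `maxlen` drops the oldest item once the buffer is full. Whether the monitor uses `deque` internally is an assumption; the buffer size of 3 is chosen only to keep the example short.

```python
from collections import deque

# A buffer that keeps only the 3 most recent prices.
recent_prices = deque(maxlen=3)

for price in [4335.67, 4336.10, 4336.67, 4337.02, 4338.92]:
    recent_prices.append(price)  # the oldest entry is evicted automatically when full

print(list(recent_prices))  # [4336.67, 4337.02, 4338.92]
```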
## Command Reference
| Command | Description |
|---------|-------------|
| `start` | Start data streaming |
| `stop` | Stop data streaming |
| `status` | Show current status and buffer sizes |
| `snapshot` | Save current data snapshot to file |
| `compact` | Switch to compact JSON format |
| `detailed` | Switch to detailed human-readable format |
## Troubleshooting
### Streaming Not Starting
- Ensure the dashboard is running first
- Check that the venv is activated
- Verify that `data_stream_monitor.py` is in the project root
### No Data Output
- Check streaming status with `python data_stream_control.py status`
- Verify market data is available (check dashboard logs)
- Ensure models are active and making predictions
### Performance Issues
- Sample less often by raising `sampling_rate` (seconds between samples) in `stream_config`
- Switch to compact format for less output
- Decrease buffer sizes if memory is limited
## Future Enhancements
- **File output**: Save streaming data to rotating log files
- **WebSocket output**: Stream data to external consumers
- **Compression**: Automatic compression for long-term storage
- **Filtering**: Advanced filtering based on market conditions
- **Metrics**: Built-in performance metrics and statistics
- No separate process or control script is required.
- The monitor runs inside the dashboard/orchestrator process for consistency.