kiro steering, live training wip

2025-11-13 15:09:20 +02:00
parent 1af3124be7
commit 25287d0e9e
8 changed files with 1319 additions and 302 deletions
--- a/.kiro/steering/product.md
+++ b/.kiro/steering/product.md
@@ -0,0 +1,40 @@
+# Product Overview
+
+## Clean Trading System
+
+A modular cryptocurrency trading system that uses deep learning (CNN and RL models) for multi-timeframe market analysis and automated trading decisions.
+
+## Core Capabilities
+
+- **Multi-timeframe analysis**: 1s, 1m, 5m, 1h, 4h, and 1d timeframes, geared toward scalping with a focus on ultra-fast execution
+- **Neural network models**: CNN for pattern recognition, RL/DQN for trading decisions, Transformer for long-range dependencies
+- **Real-time trading**: Live market data from multiple exchanges (Binance, Bybit, Deribit, MEXC)
+- **Web dashboard**: Real-time monitoring, visualization, and training controls
+- **Multi-horizon predictions**: 1m, 5m, 15m, 60m prediction horizons with deferred training
+
+## Key Subsystems
+
+### COBY (Cryptocurrency Order Book Yielder)
+Multi-exchange data aggregation system that collects real-time order book and OHLCV data, aggregates it into standardized formats, and provides both live feeds and historical replay.
+
+### NN (Neural Network Trading)
+500M+ parameter system using a Mixture of Experts (MoE) approach with CNN (100M params), Transformer, and RL models for pattern detection and trading signals.
+
+### ANNOTATE
+Manual trade annotation UI for marking profitable buy/sell signals on historical data to generate high-quality training test cases.
+
+## Critical Policy
+
+**NO SYNTHETIC DATA**: System uses EXCLUSIVELY real market data from cryptocurrency exchanges. No synthetic, generated, simulated, or mock data is allowed for training, testing, or inference. Zero tolerance policy.
+
+## Trading Modes
+
+- **Simulation**: Paper trading with simulated account
+- **Testnet**: Exchange testnet environments
+- **Live**: Real money trading (requires explicit configuration)
+
+## Primary Symbols
+
+- ETH/USDT (main trading pair for signal generation)
+- BTC/USDT (reference for correlation analysis)
+- SOL/USDT (reference for correlation analysis)
--- a/.kiro/steering/structure.md
+++ b/.kiro/steering/structure.md
@@ -0,0 +1,233 @@
+# Project Structure & Architecture
+
+## Module Organization
+
+### core/ - Core Trading System
+Central trading logic and data management.
+
+**Key modules**:
+- `orchestrator.py`: Decision coordination, combines CNN/RL predictions
+- `data_provider.py`: Real market data fetching (Binance API)
+- `data_models.py`: Shared data structures (OHLCV, features, predictions)
+- `config.py`: Configuration management
+- `trading_executor.py`: Order execution and position management
+- `exchanges/`: Exchange-specific implementations (Binance, Bybit, Deribit, MEXC)
+
+**Multi-horizon system**:
+- `multi_horizon_prediction_manager.py`: Generates 1m/5m/15m/60m predictions
+- `multi_horizon_trainer.py`: Deferred training once prediction outcomes are known
+- `prediction_snapshot_storage.py`: Efficient prediction storage
+
+**Training**:
+- `extrema_trainer.py`: Trains on market extrema (pivots)
+- `training_integration.py`: Training pipeline integration
+- `overnight_training_coordinator.py`: Scheduled training sessions
+
+### NN/ - Neural Network Models
+Deep learning models for pattern recognition and trading decisions.
+
+**models/**:
+- `enhanced_cnn.py`: CNN for pattern recognition (100M params)
+- `standardized_cnn.py`: Standardized CNN interface
+- `advanced_transformer_trading.py`: Transformer for long-range dependencies
+- `dqn_agent.py`: Deep Q-Network for RL trading
+- `model_interfaces.py`: Abstract interfaces for all models
+
+**training/**:
+- Training pipelines for each model type
+- Batch processing and optimization
+
+**utils/**:
+- `data_interface.py`: Connects to realtime data
+- Feature engineering and preprocessing
+
+### COBY/ - Data Aggregation System
+Multi-exchange order book and OHLCV data collection.
+
+**Structure**:
+- `main.py`: Entry point
+- `config.py`: COBY-specific configuration
+- `models/core.py`: Data models (OrderBookSnapshot, TradeEvent, PriceBuckets)
+- `interfaces/`: Abstract interfaces for connectors, processors, storage
+- `api/rest_api.py`: FastAPI REST endpoints
+- `web/static/`: Dashboard UI (http://localhost:8080)
+- `connectors/`: Exchange WebSocket connectors
+- `storage/`: TimescaleDB/Redis integration
+- `monitoring/`: System monitoring and metrics
+
+### ANNOTATE/ - Manual Annotation UI
+Web interface for marking profitable trades on historical data.
+
+**Structure**:
+- `web/app.py`: Flask/Dash application
+- `web/templates/`: Jinja2 HTML templates
+- `core/annotation_manager.py`: Annotation storage and retrieval
+- `core/training_simulator.py`: Simulates training with annotations
+- `core/data_loader.py`: Historical data loading
+- `data/annotations/`: Saved annotations
+- `data/test_cases/`: Generated training test cases
+
+### web/ - Main Dashboard
+Real-time monitoring and visualization.
+
+**Key files**:
+- `clean_dashboard.py`: Main dashboard application
+- `cob_realtime_dashboard.py`: COB-specific dashboard
+- `component_manager.py`: UI component management
+- `layout_manager.py`: Dashboard layout
+- `models_training_panel.py`: Training controls
+- `prediction_chart.py`: Prediction visualization
+
+### models/ - Model Checkpoints
+Trained model weights and checkpoints.
+
+**Organization**:
+- `cnn/`: CNN model checkpoints
+- `rl/`: RL model checkpoints
+- `enhanced_cnn/`: Enhanced CNN variants
+- `enhanced_rl/`: Enhanced RL variants
+- `best_models/`: Best performing models
+- `checkpoints/`: Training checkpoints
+
+### utils/ - Shared Utilities
+Common functionality across modules.
+
+**Key utilities**:
+- `checkpoint_manager.py`: Model checkpoint save/load
+- `cache_manager.py`: Data caching
+- `database_manager.py`: SQLite database operations
+- `inference_logger.py`: Prediction logging
+- `timezone_utils.py`: Timezone handling
+- `training_integration.py`: Training pipeline utilities
+
+### data/ - Data Storage
+Databases and cached data.
+
+**Contents**:
+- `predictions.db`: SQLite prediction database
+- `trading_system.db`: Trading metadata
+- `cache/`: Cached market data
+- `prediction_snapshots/`: Stored predictions for training
+- `text_exports/`: Exported data for analysis
+
+### cache/ - Data Caching
+High-performance data caching.
+
+**Contents**:
+- `trading_data.duckdb`: DuckDB time-series storage
+- `parquet_store/`: Parquet files for efficient storage
+- `monthly_1s_data/`: Monthly 1-second data cache
+- `pivot_bounds/`: Cached pivot calculations
+
+### @checkpoints/ - Checkpoint Archive
+Archived model checkpoints organized by type.
+
+**Organization**:
+- `cnn/`, `dqn/`, `hybrid/`, `rl/`, `transformer/`: By model type
+- `best_models/`: Best performers
+- `archive/`: Historical checkpoints
+
+## Architecture Patterns
+
+### Data Flow
+```
+Exchange APIs → DataProvider → Orchestrator → Models (CNN/RL/Transformer)
+                                    ↓
+                            Trading Executor → Exchange APIs
+```
+
+### Training Flow
+```
+Real Market Data → Feature Engineering → Model Training → Checkpoint Save
+                                              ↓
+                                    Validation & Metrics
+```
+
+### Multi-Horizon Flow
+```
+Orchestrator → PredictionManager → Generate predictions (1m/5m/15m/60m)
+                                          ↓
+                                  SnapshotStorage
+                                          ↓
+                            Wait for target time (deferred)
+                                          ↓
+                            MultiHorizonTrainer → Train models
+```
+
+### COBY Data Flow
+```
+Exchange WebSockets → Connectors → DataProcessor → AggregationEngine
+                                                          ↓
+                                                  StorageManager
+                                                          ↓
+                                            TimescaleDB + Redis
+```
+
+## Dependency Patterns
+
+### Core Dependencies
+- `orchestrator.py` depends on: all models, data_provider, trading_executor
+- `data_provider.py` depends on: cache_manager, timezone_utils
+- Models depend on: data_models, checkpoint_manager
+
+### Dashboard Dependencies
+- `clean_dashboard.py` depends on: orchestrator, data_provider, all models
+- Uses component_manager and layout_manager for UI
+
+### Circular Dependency Prevention
+- Use abstract interfaces (model_interfaces.py)
+- Dependency injection for orchestrator
+- Lazy imports where needed
+
+## Configuration Hierarchy
+
+1. **config.yaml**: Main system config (exchanges, symbols, trading params)
+2. **models.yml**: Model-specific settings (architecture, training)
+3. **.env**: Sensitive credentials (API keys, passwords)
+4. Module-specific configs in each subsystem (COBY/config.py, etc.)
+
+## Naming Conventions
+
+### Files
+- snake_case for Python files: `data_provider.py`
+- Descriptive names: `multi_horizon_prediction_manager.py`
+
+### Classes
+- PascalCase: `DataProvider`, `MultiHorizonTrainer`
+- Descriptive: `PredictionSnapshotStorage`
+
+### Functions
+- snake_case: `get_ohlcv_data()`, `train_model()`
+- Verb-noun pattern: `calculate_features()`, `save_checkpoint()`
+
+### Variables
+- snake_case: `prediction_data`, `model_output`
+- Descriptive: `cnn_confidence_threshold`
+
+## Import Patterns
+
+### Absolute imports preferred
+```python
+from core.data_provider import DataProvider
+from NN.models.enhanced_cnn import EnhancedCNN
+```
+
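+### Relative imports within a package
+```python
+from .data_models import OHLCV          # sibling module in the same package
+from ..utils import checkpoint_manager  # parent package's utils; valid only below the top-level package
+```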
+
+## Testing Structure
+
+- Unit tests in `tests/` directory
+- Integration tests: `test_integration.py`
+- Component-specific tests: `test_cnn_only.py`, `test_training.py`
+- Use pytest framework
+
+## Documentation
+
+- Module-level docstrings in each file
+- README.md in major subsystems (COBY/, NN/, ANNOTATE/)
+- Architecture docs in root: `COB_MODEL_ARCHITECTURE_DOCUMENTATION.md`, `MULTI_HORIZON_TRAINING_SYSTEM.md`
+- Implementation summaries: `IMPLEMENTATION_SUMMARY.md`, `TRAINING_IMPROVEMENTS_SUMMARY.md`
--- a/.kiro/steering/tech.md
+++ b/.kiro/steering/tech.md
@@ -0,0 +1,181 @@
+# Technology Stack
+
+## Core Technologies
+
+### Python Ecosystem
+- **Python 3.x**: Primary language
+- **PyTorch**: Deep learning framework (CPU/CUDA/DirectML support)
+- **NumPy/Pandas**: Data manipulation and analysis
+- **scikit-learn**: ML utilities and preprocessing
+
+### Web & API
+- **Dash/Plotly**: Interactive web dashboard
+- **Flask**: ANNOTATE web UI
+- **FastAPI**: COBY REST API
+- **WebSockets**: Real-time data streaming
+
+### Data Storage
+- **DuckDB**: Primary data storage (time-series optimized)
+- **SQLite**: Metadata and predictions database
+- **Redis**: High-performance caching (COBY)
+- **TimescaleDB**: Optional time-series storage (COBY)
+
+### Exchange Integration
+- **ccxt**: Multi-exchange API library
+- **websocket-client**: Real-time market data
+- **pybit**: Bybit-specific integration
+
+### Monitoring & Logging
+- **TensorBoard**: Training visualization
+- **wandb**: Experiment tracking
+- **structlog**: Structured logging (COBY)
+
+## Hardware Acceleration
+
+### GPU Support
+- NVIDIA CUDA (via PyTorch CUDA builds)
+- AMD DirectML (via onnxruntime-directml)
+- CPU fallback (default PyTorch CPU build)
+
+**Note**: PyTorch is NOT in requirements.txt to avoid pulling NVIDIA CUDA deps on AMD machines. Install manually based on hardware.
+
+## Project Structure
+
+```
+gogo2/
+├── core/              # Core trading system components
+├── models/            # Trained model checkpoints
+├── NN/                # Neural network models and training
+├── COBY/              # Multi-exchange data aggregation
+├── ANNOTATE/          # Manual annotation UI
+├── web/               # Main dashboard
+├── utils/             # Shared utilities
+├── cache/             # Data caching
+├── data/              # Databases and exports
+├── logs/              # System logs
+└── @checkpoints/      # Model checkpoints archive
+```
+
+## Configuration
+
+- **config.yaml**: Main system configuration (exchanges, symbols, timeframes, trading params)
+- **models.yml**: Model-specific settings (CNN, RL, training)
+- **.env**: Sensitive credentials (API keys, database passwords)
+
+## Common Commands
+
+### Running the System
+
+```bash
+# Main dashboard with live training
+python main_dashboard.py --port 8051
+
+# Dashboard without training
+python main_dashboard.py --port 8051 --no-training
+
+# Clean dashboard (alternative)
+python run_clean_dashboard.py
+```
+
+### Training
+
+```bash
+# Unified training runner - realtime mode
+python training_runner.py --mode realtime --duration 4
+
+# Backtest training
+python training_runner.py --mode backtest --start-date 2024-01-01 --end-date 2024-12-31
+
+# CNN training with TensorBoard
+python main_clean.py --mode cnn --symbol ETH/USDT
+tensorboard --logdir=runs
+
+# RL training
+python main_clean.py --mode rl --symbol ETH/USDT
+```
+
+### Backtesting
+
+```bash
+# One-month backtest (January 2024)
+python main_backtest.py --start 2024-01-01 --end 2024-01-31
+
+# Custom symbol and window
+python main_backtest.py --start 2024-01-01 --end 2024-12-31 --symbol BTC/USDT --window 48
+```
+
+### COBY System
+
+```bash
+# Start COBY data aggregation
+python COBY/main.py --debug
+
+# Access COBY dashboard: http://localhost:8080
+# COBY API: http://localhost:8080/api/...
+# COBY WebSocket: ws://localhost:8081/dashboard
+```
+
+### ANNOTATE System
+
+```bash
+# Start annotation UI
+python ANNOTATE/web/app.py
+
+# Access at: http://127.0.0.1:8051
+```
+
+### Testing
+
+```bash
+# Run tests
+python -m pytest tests/
+
+# Test specific components
+python test_cnn_only.py
+python test_training.py
+python test_duckdb_storage.py
+```
+
+### Monitoring
+
+```bash
+# TensorBoard for training metrics
+tensorboard --logdir=runs
+# Access at: http://localhost:6006
+
+# Check data stream status
+python check_stream.py status
+python check_stream.py ohlcv
+python check_stream.py cob
+```
+
+## Development Tools
+
+- **TensorBoard**: Training visualization (runs/ directory)
+- **wandb**: Experiment tracking
+- **pytest**: Testing framework
+- **Git**: Version control
+
+## Dependencies Management
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Install PyTorch (choose based on hardware)
+# CPU-only:
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+
+# NVIDIA GPU (CUDA 12.1):
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+
+# AMD (DirectML via onnxruntime-directml):
+pip install onnxruntime-directml onnx transformers optimum
+```
+
+## Performance Targets
+
+- **Memory Usage**: <2GB per model, <28GB total system
+- **Training Speed**: ~20 seconds for 50 epochs
+- **Inference Latency**: <200ms per prediction
+- **Real Data Processing**: 1000+ candles per timeframe