wip
parent 0fe8286787
commit 310f3c5bf9
196
CNN_TESTING_GUIDE.md
Normal file
@@ -0,0 +1,196 @@
# CNN Testing & Backtest Guide

## 📊 **CNN Test Cases and Training Data Location**

### **1. Test Scripts**

#### **Quick CNN Test (`test_cnn_only.py`)**
- **Purpose**: Fast CNN validation with real market data
- **Location**: `/test_cnn_only.py`
- **Test Configuration**:
  - Symbols: `['ETH/USDT']`
  - Timeframes: `['1m', '5m', '1h']`
  - Samples: `500` (for quick testing)
  - Epochs: `2`
  - Batch size: `16`
- **Data Source**: **Real Binance API data only**
- **Output**: `test_models/quick_cnn.pt`

#### **Comprehensive Training Test (`test_training.py`)**
- **Purpose**: Full training pipeline validation
- **Location**: `/test_training.py`
- **Functions**:
  - `test_cnn_training()` - Complete CNN training test
  - `test_rl_training()` - RL training validation
- **Output**: `test_models/test_cnn.pt`

### **2. Test Model Storage**

#### **Directory**: `/test_models/`
- **quick_cnn.pt** (586KB) - Latest quick test model
- **quick_cnn_best.pt** (587KB) - Best performing quick test model
- **regular_save.pt** (384MB) - Full-size training model
- **robust_save.pt** (17KB) - Optimized lightweight model
- **backup models** - Automatic backups with `.backup` extension

### **3. Training Data Sources**

#### **Real Market Data (Primary)**
- **Exchange**: Binance API
- **Symbols**: ETH/USDT, BTC/USDT, etc.
- **Timeframes**: 1s, 1m, 5m, 15m, 1h, 4h, 1d
- **Features**: 48 features (raw OHLCV columns plus technical indicators calculated from real OHLCV data)
- **Storage**: Cached in `/cache/` directory
- **Format**: JSON files with tick-by-tick and aggregated candle data

#### **Feature Matrix Structure**
```python
# Multi-timeframe feature matrix: (timeframes, window_size, features)
# Example: feature_matrix.shape == (4, 20, 48)  # 4 timeframes, 20 steps, 48 features

# The 48 features are:
features = [
    'ad_line', 'adx', 'adx_neg', 'adx_pos', 'atr',
    'bb_lower', 'bb_middle', 'bb_percent', 'bb_upper', 'bb_width',
    'close', 'ema_12', 'ema_26', 'ema_50', 'high',
    'keltner_lower', 'keltner_middle', 'keltner_upper', 'low',
    'macd', 'macd_histogram', 'macd_signal', 'mfi', 'momentum_composite',
    'obv', 'open', 'price_position', 'psar', 'roc',
    'rsi_14', 'rsi_21', 'rsi_7', 'sma_10', 'sma_20', 'sma_50',
    'stoch_d', 'stoch_k', 'trend_strength', 'true_range', 'ultimate_osc',
    'volatility_regime', 'volume', 'volume_sma_10', 'volume_sma_20',
    'volume_sma_50', 'vpt', 'vwap', 'williams_r'
]
```
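
The `(timeframes, window_size, features)` convention can be checked mechanically before a matrix reaches the model. The following is a minimal sketch, not the project's own validation code: it assumes the matrix is available as nested Python sequences (for example an array converted with `.tolist()`), and `matrix_shape` is a hypothetical helper name introduced only for this illustration.

```python
from typing import Sequence, Tuple


def matrix_shape(matrix: Sequence) -> Tuple[int, int, int]:
    """Return (timeframes, window_size, features) for a nested-sequence matrix.

    Hypothetical helper: raises ValueError if the matrix is empty or ragged.
    """
    if len(matrix) == 0 or len(matrix[0]) == 0:
        raise ValueError("matrix must contain at least one timeframe and one step")
    n_timeframes = len(matrix)
    window_size = len(matrix[0])
    n_features = len(matrix[0][0])
    for tf_index, timeframe in enumerate(matrix):
        if len(timeframe) != window_size:
            raise ValueError(f"timeframe {tf_index} has {len(timeframe)} steps, expected {window_size}")
        for step_index, row in enumerate(timeframe):
            if len(row) != n_features:
                raise ValueError(
                    f"timeframe {tf_index}, step {step_index} has {len(row)} features, expected {n_features}"
                )
    return (n_timeframes, window_size, n_features)


# Placeholder zeros that only illustrate the layout; this is not market data.
demo = [[[0.0] * 48 for _ in range(20)] for _ in range(4)]
print(matrix_shape(demo))  # (4, 20, 48)
```

A check like this tends to catch the "feature mismatch" class of errors early, when one timeframe returns fewer rows or columns than the others.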

### **4. Test Case Categories**

#### **Unit Tests**
- **Quick validation**: 500 samples, 2 epochs
- **Performance benchmarks**: Speed and accuracy metrics
- **Memory usage**: Resource consumption monitoring

#### **Integration Tests**
- **Full pipeline**: Data loading → Feature engineering → Training → Evaluation
- **Multi-symbol**: Testing across different cryptocurrency pairs
- **Multi-timeframe**: Validation across various time horizons

#### **Backtesting**
- **Historical performance**: Using past market data for validation
- **Walk-forward testing**: Progressive training on expanding datasets
- **Out-of-sample validation**: Testing on unseen data periods
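
Walk-forward testing with an expanding training window can be expressed as a small index generator. This is a sketch under stated assumptions rather than the repository's backtester: the function name `walk_forward_splits` and its parameters are hypothetical, and samples are assumed to be ordered by time so that each test block lies strictly after its training block.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) with an expanding training window.

    Each test block starts where the training block ends, so the model is
    always evaluated on samples it has not seen (out-of-sample).
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    train_end = initial_train
    while train_end + test_size <= n_samples:
        yield range(0, train_end), range(train_end, train_end + test_size)
        train_end += test_size  # grow the training set by the block just tested


for train_idx, test_idx in walk_forward_splits(n_samples=500, initial_train=300, test_size=100):
    print(f"train [0, {train_idx.stop}) -> test [{test_idx.start}, {test_idx.stop})")
```

With 500 samples, an initial window of 300, and test blocks of 100, this produces two folds; the trailing samples that do not fill a whole test block are left unused.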

### **5. VSCode Launch Configurations**

#### **Quick CNN Test**
```json
{
    "name": "Quick CNN Test (Real Data + TensorBoard)",
    "program": "test_cnn_only.py",
    "env": {"PYTHONUNBUFFERED": "1"}
}
```

#### **Realtime RL Training with Monitoring**
```json
{
    "name": "Realtime RL Training + TensorBoard + Web UI",
    "program": "train_realtime_with_tensorboard.py",
    "args": ["--episodes", "50", "--symbol", "ETH/USDT", "--web-port", "8051"]
}
```

### **6. Test Execution Commands**

#### **Quick CNN Test**
```bash
# Run quick CNN validation
python test_cnn_only.py

# Monitor training progress
tensorboard --logdir=runs

# Expected output:
# ✅ CNN Training completed!
# Best accuracy: 0.4600
# Total epochs: 2
# Training time: 0.61s
# TensorBoard logs: runs/cnn_training_1748043814
```

#### **Comprehensive Training Test**
```bash
# Run full training pipeline test
python test_training.py

# Monitor multiple training modes
tensorboard --logdir=runs
```

### **7. Test Data Validation**

#### **Real Market Data Policy**
- ✅ **No Synthetic Data**: All training uses authentic exchange data
- ✅ **Live API**: Direct connection to Binance for real-time prices
- ✅ **Multi-timeframe**: Consistent data across all time horizons
- ✅ **Technical Indicators**: Calculated from real OHLCV values

#### **Data Quality Checks**
- **Completeness**: Verifying all required timeframes have data
- **Consistency**: Cross-timeframe data alignment validation
- **Freshness**: Ensuring recent market data availability
- **Feature integrity**: Validating all 48 technical indicators
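
Three of the checks above (completeness, freshness, feature integrity) can be sketched in a few lines. This is an illustration only, not the project's `DataProvider` logic: the row layout (`{"timestamp": ..., "features": [...]}`), the function name `check_data_quality`, and the one-hour freshness threshold are all assumptions made for the example. Cross-timeframe alignment is left out because it depends on how candles are keyed.

```python
import math
import time


def check_data_quality(candles_by_timeframe, required_timeframes,
                       expected_features=48, max_age_seconds=3600, now=None):
    """Return a list of problem descriptions; an empty list means all checks passed."""
    now = time.time() if now is None else now
    problems = []
    for tf in required_timeframes:
        rows = candles_by_timeframe.get(tf) or []
        # Completeness: every required timeframe must have at least one row.
        if not rows:
            problems.append(f"{tf}: no data")
            continue
        # Freshness: the newest row must be recent enough.
        age = now - rows[-1]["timestamp"]
        if age > max_age_seconds:
            problems.append(f"{tf}: latest row is {age:.0f}s old")
        # Feature integrity: expected number of values and no NaN entries.
        for index, row in enumerate(rows):
            values = row["features"]
            if len(values) != expected_features:
                problems.append(f"{tf}[{index}]: {len(values)} features, expected {expected_features}")
            elif any(math.isnan(v) for v in values):
                problems.append(f"{tf}[{index}]: NaN in features")
    return problems


# Placeholder rows that only show the assumed layout; these are not market data.
sample = {
    "1m": [{"timestamp": 1000.0, "features": [1.0] * 48}],
    "5m": [{"timestamp": 1000.0, "features": [float("nan")] + [1.0] * 47}],
}
print(check_data_quality(sample, ["1m", "5m", "1h"], now=1060.0))
```

Returning a list of messages rather than raising on the first failure makes it easier to log every problem for a symbol in one pass.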

### **8. TensorBoard Monitoring**

#### **CNN Training Metrics**
- `Training/Loss` - Neural network training loss
- `Training/Accuracy` - Model prediction accuracy
- `Validation/Loss` - Validation dataset loss
- `Validation/Accuracy` - Out-of-sample accuracy
- `Best/ValidationAccuracy` - Best model performance
- `Data/InputShape` - Feature matrix dimensions
- `Model/TotalParams` - Neural network parameters

#### **Access URLs**
- **TensorBoard**: http://localhost:6006 (default; `start_monitoring.py` falls back to ports 6007-6012 if 6006 is in use)
- **Web Dashboard**: http://localhost:8051
- **Training Logs**: `/runs/` directory

### **9. Best Practices**

#### **Quick Testing**
1. **Start small**: Use `test_cnn_only.py` for fast validation
2. **Monitor metrics**: Keep TensorBoard open during training
3. **Check outputs**: Verify model files are created in `test_models/`
4. **Validate accuracy**: Ensure model performance meets expectations

#### **Production Training**
1. **Use full datasets**: Scale up sample sizes for production models
2. **Multi-symbol training**: Train on multiple cryptocurrency pairs
3. **Extended timeframes**: Include longer-term patterns
4. **Comprehensive validation**: Use walk-forward and out-of-sample testing

### **10. Troubleshooting**

#### **Common Issues**
- **Memory errors**: Reduce batch size or sample count
- **Data loading failures**: Check internet connection and API access
- **Feature mismatches**: Verify all timeframes have consistent data
- **TensorBoard not updating**: Restart TensorBoard after training starts
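
The "reduce batch size" advice for memory errors can be automated with a simple halving loop. The sketch below is a generic pattern, not code from this repository: `run_with_batch_backoff` and `fake_step` are hypothetical names, and the exception types to treat as out-of-memory are passed in as a parameter because GPU frameworks may raise their own exception classes rather than Python's built-in `MemoryError`.

```python
def run_with_batch_backoff(train_step, batch_size, min_batch_size=1,
                           oom_errors=(MemoryError,)):
    """Call train_step(batch_size), halving batch_size after each out-of-memory error.

    Returns (result, batch_size_used). Re-raises if min_batch_size also fails.
    """
    while True:
        try:
            return train_step(batch_size), batch_size
        except oom_errors:
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_step(batch_size):
    """Stand-in for a real training step; pretends anything above 4 runs out of memory."""
    if batch_size > 4:
        raise MemoryError("simulated out-of-memory")
    return f"trained with batch_size={batch_size}"


print(run_with_batch_backoff(fake_step, 16))
```

Starting from the quick-test batch size of 16, the simulated step fails at 16 and 8 and succeeds at 4.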

#### **Debug Commands**
```bash
# Check training status
python monitor_training.py

# Validate data availability
python -c "from core.data_provider import DataProvider; dp = DataProvider(['ETH/USDT']); print(dp.get_historical_data('ETH/USDT', '1m').shape)"

# Test feature generation
python -c "from core.data_provider import DataProvider; dp = DataProvider(['ETH/USDT']); print(dp.get_feature_matrix('ETH/USDT', ['1m', '5m', '1h'], 20).shape)"
```

---

**🔥 All CNN training and testing use REAL market data from cryptocurrency exchanges. No synthetic or simulated data is used anywhere in the system.**

@@ -140,3 +140,189 @@ python main_clean.py --mode rl
- **Memory usage**: Monitored and limited per model
- **Chart updates**: 2-second refresh for real-time display
- **Decision latency**: Optimized for scalping (< 100ms target)

## 🚀 **VSCode Launch Configurations**

### **1. Core Trading Modes**

#### **Live Trading (Demo)**
```json
{
    "name": "Live Trading (Demo)",
    "program": "main.py",
    "args": ["--mode", "live", "--demo", "true", "--symbol", "ETH/USDT", "--timeframe", "1m"]
}
```
- **Purpose**: Safe demo trading with virtual funds
- **Environment**: Paper trading mode
- **Risk**: Zero (no real money)

#### **Live Trading (Real)**
```json
{
    "name": "Live Trading (Real)",
    "program": "main.py",
    "args": ["--mode", "live", "--demo", "false", "--symbol", "ETH/USDT", "--leverage", "50"]
}
```
- **Purpose**: Real trading with actual funds
- **Environment**: Live exchange API
- **Risk**: High (real money)

### **2. Training & Development Modes**

#### **Train Bot**
```json
{
    "name": "Train Bot",
    "program": "main.py",
    "args": ["--mode", "train", "--episodes", "100"]
}
```
- **Purpose**: Standard RL agent training
- **Duration**: 100 episodes
- **Output**: Trained model files

#### **Evaluate Bot**
```json
{
    "name": "Evaluate Bot",
    "program": "main.py",
    "args": ["--mode", "eval", "--episodes", "10"]
}
```
- **Purpose**: Model performance evaluation
- **Duration**: 10 test episodes
- **Output**: Performance metrics

### **3. Neural Network Training**

#### **NN Training Pipeline**
```json
{
    "name": "NN Training Pipeline",
    "module": "NN.realtime_main",
    "args": ["--mode", "train", "--model-type", "cnn", "--epochs", "10"]
}
```
- **Purpose**: Deep learning model training
- **Framework**: PyTorch
- **Monitoring**: Automatic TensorBoard integration

#### **Quick CNN Test (Real Data + TensorBoard)**
```json
{
    "name": "Quick CNN Test (Real Data + TensorBoard)",
    "program": "test_cnn_only.py"
}
```
- **Purpose**: Fast CNN validation with real market data
- **Duration**: 2 epochs, 500 samples
- **Output**: `test_models/quick_cnn.pt`
- **Monitoring**: TensorBoard metrics

### **4. 🔥 Realtime RL Training + Monitoring**

#### **Realtime RL Training + TensorBoard + Web UI**
```json
{
    "name": "Realtime RL Training + TensorBoard + Web UI",
    "program": "train_realtime_with_tensorboard.py",
    "args": ["--episodes", "50", "--symbol", "ETH/USDT", "--web-port", "8051"]
}
```
- **Purpose**: Advanced RL training with comprehensive monitoring
- **Features**:
  - Real-time TensorBoard metrics logging
  - Live web dashboard at http://localhost:8051
  - Episode rewards, balance tracking, win rates
  - Trading performance metrics
  - Agent learning progression
- **Data**: 100% real ETH/USDT market data from Binance
- **Monitoring**: Dual monitoring (TensorBoard + Web UI)
- **Duration**: 50 episodes with real-time feedback

### **5. Monitoring & Visualization**

#### **TensorBoard Monitor (All Runs)**
```json
{
    "name": "TensorBoard Monitor (All Runs)",
    "program": "run_tensorboard.py"
}
```
- **Purpose**: Monitor all training sessions
- **Features**: Auto-discovery of training logs
- **Access**: http://localhost:6006

#### **Realtime Charts with NN Inference**
```json
{
    "name": "Realtime Charts with NN Inference",
    "program": "realtime.py"
}
```
- **Purpose**: Live trading charts with ML predictions
- **Features**: Real-time price updates + model inference
- **Models**: CNN + RL integration

### **6. Advanced Training Modes**

#### **TRAIN Realtime Charts with NN Inference**
```json
{
    "name": "TRAIN Realtime Charts with NN Inference",
    "program": "train_rl_with_realtime.py",
    "args": ["--episodes", "100", "--max-position", "0.1"]
}
```
- **Purpose**: RL training with live chart integration
- **Features**: Visual training feedback
- **Position limit**: 10% portfolio allocation

## 📊 **Monitoring URLs**

### **Development**
- **TensorBoard**: http://localhost:6006
- **Web Dashboard**: http://localhost:8051
- **Training Status**: `python monitor_training.py`
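
The ports above are defaults. Because `start_monitoring.py` in this commit writes the ports it actually chose to `monitoring_ports.json` (keys `tensorboard_port` and `web_dashboard_port`), a tool that needs the live URLs can read that file and fall back to the defaults when it is missing. The helper below is a hedged sketch of that idea; `load_monitoring_urls` and `DEFAULT_PORTS` are hypothetical names, not part of the repository.

```python
import json
from pathlib import Path

# Defaults taken from the URLs listed in this guide.
DEFAULT_PORTS = {"tensorboard_port": 6006, "web_dashboard_port": 8051}


def load_monitoring_urls(config_path="monitoring_ports.json"):
    """Return monitoring URLs, preferring ports saved in config_path over defaults."""
    ports = dict(DEFAULT_PORTS)
    path = Path(config_path)
    if path.exists():
        try:
            saved = json.loads(path.read_text())
        except json.JSONDecodeError:
            saved = {}  # unreadable config: keep the defaults
        if isinstance(saved, dict):
            for key in DEFAULT_PORTS:
                if isinstance(saved.get(key), int):
                    ports[key] = saved[key]
    return {
        "tensorboard": f"http://localhost:{ports['tensorboard_port']}",
        "web_dashboard": f"http://localhost:{ports['web_dashboard_port']}",
    }


print(load_monitoring_urls())
```

Ignoring malformed or non-integer values keeps the helper safe to call even if the config file was edited by hand.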

### **Production**
- **Live Trading Dashboard**: Integrated in trading interface
- **Performance Metrics**: Real-time P&L tracking
- **Risk Management**: Position size and drawdown monitoring

## 🎯 **Quick Start Recommendations**

### **For CNN Development**
1. **Start**: "Quick CNN Test (Real Data + TensorBoard)"
2. **Monitor**: Open TensorBoard at http://localhost:6006
3. **Validate**: Check `test_models/` for output files

### **For RL Development**
1. **Start**: "Realtime RL Training + TensorBoard + Web UI"
2. **Monitor**: TensorBoard (http://localhost:6006) + Web UI (http://localhost:8051)
3. **Track**: Episode rewards, balance progression, win rates

### **For Production Trading**
1. **Test**: "Live Trading (Demo)" first
2. **Validate**: Confirm strategy performance
3. **Deploy**: "Live Trading (Real)" with appropriate risk management

## ⚡ **Performance Features**

### **GPU Acceleration**
- Automatic CUDA detection and utilization
- Mixed precision training support
- Memory optimization for large datasets

### **Real-time Data**
- Direct Binance API integration
- Multi-timeframe data synchronization
- Live price feed with minimal latency

### **Professional Monitoring**
- Industry-standard TensorBoard integration
- Custom web dashboards for trading metrics
- Real-time performance tracking

## 🛡️ **Safety Features**

### **Pre-launch Tasks**
- **Kill Stale Processes**: Automatic cleanup before launch
- **Port Management**: Intelligent port allocation
- **Resource Monitoring**: Memory and GPU usage tracking

### **Real Market Data Policy**
- ✅ **No Synthetic Data**: All training uses authentic exchange data
- ✅ **Live API Integration**: Direct connection to cryptocurrency exchanges
- ✅ **Data Validation**: Quality checks for completeness and consistency
- ✅ **Multi-timeframe Sync**: Aligned data across all time horizons

---

✅ **Launch configuration** - Clean, modular mode selection
✅ **Professional monitoring** - TensorBoard + custom dashboards
✅ **Real market data** - Authentic cryptocurrency price data
✅ **Safety features** - Risk management and validation
✅ **GPU acceleration** - Optimized for high-performance training

160
start_monitoring.py
Normal file
@@ -0,0 +1,160 @@
#!/usr/bin/env python3
"""
Helper script to start monitoring services for RL training
"""

import subprocess
import sys
import time
import requests
import os
import json
from pathlib import Path

# Available ports to try for TensorBoard
TENSORBOARD_PORTS = [6006, 6007, 6008, 6009, 6010, 6011, 6012]

def check_port(port, service_name):
    """Check if a service is running on the specified port"""
    try:
        response = requests.get(f"http://localhost:{port}", timeout=3)
        print(f"✅ {service_name} is running on port {port}")
        return True
    except requests.exceptions.RequestException:
        return False

def is_port_in_use(port):
    """Check if a port is already in use"""
    import socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(('localhost', port))
            return False
        except OSError:
            return True

def find_available_port(ports_list, service_name):
    """Find an available port from the list"""
    for port in ports_list:
        if not is_port_in_use(port):
            print(f"🔍 Found available port {port} for {service_name}")
            return port
        else:
            print(f"⚠️ Port {port} is already in use")
    return None

def save_port_config(tensorboard_port):
    """Save the port configuration to a file"""
    config = {
        "tensorboard_port": tensorboard_port,
        "web_dashboard_port": 8051
    }
    with open("monitoring_ports.json", "w") as f:
        json.dump(config, f, indent=2)
    print(f"💾 Port configuration saved to monitoring_ports.json")

def start_tensorboard():
    """Start TensorBoard in background on an available port"""
    try:
        # First check if TensorBoard is already running on any of our ports
        for port in TENSORBOARD_PORTS:
            if check_port(port, "TensorBoard"):
                print(f"✅ TensorBoard already running on port {port}")
                save_port_config(port)
                return port

        # Find an available port
        port = find_available_port(TENSORBOARD_PORTS, "TensorBoard")
        if port is None:
            print(f"❌ No available ports found in range {TENSORBOARD_PORTS}")
            return None

        print(f"🚀 Starting TensorBoard on port {port}...")

        # Create runs directory if it doesn't exist
        Path("runs").mkdir(exist_ok=True)

        # Start TensorBoard
        if os.name == 'nt':  # Windows
            subprocess.Popen([
                sys.executable, "-m", "tensorboard",
                "--logdir=runs", f"--port={port}", "--reload_interval=1"
            ], creationflags=subprocess.CREATE_NEW_CONSOLE)
        else:  # Linux/Mac
            subprocess.Popen([
                sys.executable, "-m", "tensorboard",
                "--logdir=runs", f"--port={port}", "--reload_interval=1"
            ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

        # Wait for TensorBoard to start
        print(f"⏳ Waiting for TensorBoard to start on port {port}...")
        for i in range(15):
            time.sleep(2)
            if check_port(port, "TensorBoard"):
                save_port_config(port)
                return port

        print(f"⚠️ TensorBoard failed to start on port {port} within 30 seconds")
        return None

    except Exception as e:
        print(f"❌ Error starting TensorBoard: {e}")
        return None

def check_web_dashboard_port():
    """Check if web dashboard port is available"""
    port = 8051
    if is_port_in_use(port):
        print(f"⚠️ Web dashboard port {port} is in use")
        # Try alternative ports
        for alt_port in [8052, 8053, 8054, 8055]:
            if not is_port_in_use(alt_port):
                print(f"🔍 Alternative port {alt_port} available for web dashboard")
                return alt_port
        print("❌ No alternative ports found for web dashboard")
        return port
    else:
        print(f"✅ Web dashboard port {port} is available")
        return port

def main():
    """Main function"""
    print("=" * 60)
    print("🎯 RL TRAINING MONITORING SETUP")
    print("=" * 60)

    # Check web dashboard port
    web_port = check_web_dashboard_port()

    # Start TensorBoard
    tensorboard_port = start_tensorboard()

    print("\n" + "=" * 60)
    print("📊 MONITORING STATUS")
    print("=" * 60)

    if tensorboard_port:
        print(f"✅ TensorBoard: http://localhost:{tensorboard_port}")
        # Update port config
        save_port_config(tensorboard_port)
    else:
        print("❌ TensorBoard: Failed to start")
        print(" Manual start: python -m tensorboard --logdir=runs --port=6007")

    if web_port:
        print(f"✅ Web Dashboard: Ready on port {web_port}")

    print(f"\n🎯 Ready to start RL training!")
    # Suggest --web-port whenever a non-default port was chosen, even if TensorBoard failed
    if web_port != 8051:
        print(f"Run: python train_realtime_with_tensorboard.py --episodes 10 --web-port {web_port}")
    else:
        print("Run: python train_realtime_with_tensorboard.py --episodes 10")

    print(f"\n📋 Available URLs:")
    if tensorboard_port:
        print(f" 📊 TensorBoard: http://localhost:{tensorboard_port}")
    if web_port:
        print(f" 🌐 Web Dashboard: http://localhost:{web_port} (starts with training)")

if __name__ == "__main__":
    main()

@@ -293,6 +293,10 @@ class RealtimeRLTrainer:
         # Setup environment and agent
         environment, agent = self.rl_trainer.setup_environment_and_agent()
+
+        # Assign to trainer instance
+        self.rl_trainer.environment = environment
+        self.rl_trainer.agent = agent
 
         # Training loop
         for episode in range(episodes):
             self.current_episode = episode
@@ -362,6 +366,7 @@ async def main():
     parser.add_argument('--episodes', type=int, default=50, help='Number of episodes')
     parser.add_argument('--balance', type=float, default=1000.0, help='Initial balance')
     parser.add_argument('--web-port', type=int, default=8051, help='Web dashboard port')
+    parser.add_argument('--keep-alive', type=int, default=300, help='Keep monitoring alive for N seconds after training')
 
     args = parser.parse_args()
 
@@ -375,6 +380,41 @@ async def main():
     logger.info(f"Initial Balance: ${args.balance:.2f}")
     logger.info("=" * 60)
 
+    # Check if TensorBoard is accessible
+    try:
+        import requests
+        import time
+        import json
+
+        # Try to read port configuration
+        tensorboard_port = 6006  # default
+        try:
+            with open("monitoring_ports.json", "r") as f:
+                config = json.load(f)
+                tensorboard_port = config.get("tensorboard_port", 6006)
+                logger.info(f"📋 Using TensorBoard port {tensorboard_port} from config")
+        except FileNotFoundError:
+            logger.info("📋 No port config file found, using default ports")
+
+        logger.info("Checking TensorBoard accessibility...")
+
+        # Wait for TensorBoard to start
+        for i in range(10):
+            try:
+                response = requests.get(f"http://localhost:{tensorboard_port}", timeout=2)
+                logger.info(f"✅ TensorBoard is accessible at http://localhost:{tensorboard_port}")
+                break
+            except requests.exceptions.RequestException:
+                if i == 0:
+                    logger.info("⏳ Waiting for TensorBoard to start...")
+                await asyncio.sleep(2)
+        else:
+            logger.warning(f"⚠️ TensorBoard may not be running on port {tensorboard_port}")
+            logger.warning(" Run: python start_monitoring.py")
+    except ImportError:
+        tensorboard_port = 6006
+        logger.warning("requests module not available for TensorBoard check")
+
     try:
         # Create trainer
         trainer = RealtimeRLTrainer(
@@ -383,14 +423,23 @@ async def main():
         )
 
         # Start web dashboard
+        logger.info("🚀 Starting web dashboard...")
         trainer.start_web_dashboard(port=args.web_port)
 
         # Wait for dashboard to start
-        await asyncio.sleep(2)
+        await asyncio.sleep(3)
 
+        # Check if web dashboard is accessible
+        try:
+            import requests
+            response = requests.get(f"http://localhost:{args.web_port}", timeout=5)
+            logger.info(f"✅ Web Dashboard is accessible at http://localhost:{args.web_port}")
+        except Exception:
+            logger.warning(f"⚠️ Web Dashboard may not be fully ready at http://localhost:{args.web_port}")
+
         logger.info("MONITORING READY!")
-        logger.info(f"TensorBoard: http://localhost:6006")
-        logger.info(f"Web Dashboard: http://localhost:{args.web_port}")
+        logger.info(f"📊 TensorBoard: http://localhost:{tensorboard_port}")
+        logger.info(f"🌐 Web Dashboard: http://localhost:{args.web_port}")
         logger.info("=" * 60)
 
         # Run training
@@ -404,10 +453,17 @@ async def main():
         logger.info(f" Final PnL: ${results['final_pnl']:.2f}")
         logger.info(f" Model Saved: {results['model_path']}")
 
-        # Keep running for monitoring
-        logger.info("Training complete. Press Ctrl+C to exit monitoring.")
-        while True:
-            await asyncio.sleep(1)
+        # Keep monitoring alive for specified time
+        logger.info(f"🔄 Keeping monitoring alive for {args.keep_alive} seconds...")
+        logger.info(f"📊 TensorBoard: http://localhost:{tensorboard_port}")
+        logger.info(f"🌐 Web Dashboard: http://localhost:{args.web_port}")
+        logger.info("Press Ctrl+C to exit monitoring.")
+
+        for remaining in range(args.keep_alive, 0, -10):
+            logger.info(f"⏰ Monitoring active - {remaining} seconds remaining")
+            await asyncio.sleep(min(10, remaining))
+
+        logger.info("✅ Monitoring session completed.")
 
     except KeyboardInterrupt:
         logger.info("Training stopped by user")