# TensorBoard Monitoring Guide

## Overview

The trading system now uses **TensorBoard** for real-time training monitoring instead of static charts. This provides dynamic, interactive visualizations that update during training.

## 🚨 CRITICAL: Real Market Data Only

All TensorBoard metrics are derived from **REAL market data training**. No synthetic or generated data is used.

## Quick Start

### 1. Start Training with TensorBoard

```bash
# CNN Training with TensorBoard
python main_clean.py --mode cnn --symbol ETH/USDT

# RL Training with TensorBoard
python train_rl_with_realtime.py --episodes 10

# Quick CNN Test
python test_cnn_only.py
```

### 2. Launch TensorBoard

```bash
# Option 1: Direct command
tensorboard --logdir=runs

# Option 2: Convenience script
python run_tensorboard.py
```

### 3. Access TensorBoard

Open your browser to: **http://localhost:6006**

## Available Metrics

### CNN Training Metrics

#### **Training Progress**
- `Training/EpochLoss` - Training loss per epoch
- `Training/EpochAccuracy` - Training accuracy per epoch
- `Training/BatchLoss` - Batch-level loss
- `Training/BatchAccuracy` - Batch-level accuracy
- `Training/BatchConfidence` - Model confidence scores
- `Training/LearningRate` - Learning rate schedule
- `Training/EpochTime` - Time per epoch

#### **Validation Metrics**
- `Validation/Loss` - Validation loss
- `Validation/Accuracy` - Validation accuracy
- `Validation/AvgConfidence` - Average confidence on validation set
- `Validation/Class_0_Accuracy` - BUY class accuracy
- `Validation/Class_1_Accuracy` - SELL class accuracy
- `Validation/Class_2_Accuracy` - HOLD class accuracy

#### **Best Model Tracking**
- `Best/ValidationLoss` - Best validation loss achieved
- `Best/ValidationAccuracy` - Best validation accuracy achieved

#### **Data Statistics**
- `Data/TotalSamples` - Number of training samples from real data
- `Data/Features` - Number of features (detected from real data)
- `Data/Timeframes` - Number of timeframes used
- `Data/WindowSize` - Window size for temporal patterns
- `Data/Class_X_Count` - Sample count per class
- `Data/Feature_X_Mean/Std` - Feature statistics

#### **Model Architecture**
- `Model/TotalParameters` - Total model parameters
- `Model/TrainableParameters` - Trainable parameters

#### **Training Configuration**
- `Config/LearningRate` - Learning rate used
- `Config/BatchSize` - Batch size
- `Config/MaxEpochs` - Maximum epochs

### RL Training Metrics

#### **Episode Performance**
- `Episode/TotalReward` - Total reward per episode
- `Episode/FinalBalance` - Final balance after episode
- `Episode/TotalReturn` - Return percentage
- `Episode/Steps` - Steps taken in episode

#### **Trading Performance**
- `Trading/TotalTrades` - Number of trades executed
- `Trading/WinRate` - Percentage of profitable trades
- `Trading/ProfitFactor` - Gross profit / gross loss ratio
- `Trading/MaxDrawdown` - Maximum drawdown percentage

#### **Agent Learning**
- `Agent/Epsilon` - Exploration rate (epsilon)
- `Agent/LearningRate` - Agent learning rate
- `Agent/MemorySize` - Experience replay buffer size
- `Agent/Loss` - Training loss from experience replay

#### **Moving Averages**
- `Moving_Average/Reward_50ep` - 50-episode average reward
- `Moving_Average/Return_50ep` - 50-episode average return

#### **Best Performance**
- `Best/Return` - Best return percentage achieved

## Directory Structure

```
runs/
├── cnn_training_1748043814/          # CNN training session
│   ├── events.out.tfevents.*         # TensorBoard event files
│   └── ...
├── rl_training_1748043920/           # RL training session
│   ├── events.out.tfevents.*
│   └── ...
└── ...                               # Other training sessions
```

## TensorBoard Features

### **Scalars Tab**
- Real-time line charts of all metrics
- Smoothing controls for noisy metrics
- Multiple run comparisons
- Download data as CSV

### **Images Tab**
- Model architecture visualizations
- Training progression images

### **Graphs Tab**
- Computational graph of models
- Network architecture visualization

### **Histograms Tab**
- Weight and gradient distributions
- Activation patterns over time

### **Projector Tab**
- High-dimensional data visualization
- Feature embeddings

## Usage Examples

### 1. Monitor CNN Training

```bash
# Start CNN training (generates TensorBoard logs)
python main_clean.py --mode cnn --symbol ETH/USDT

# In another terminal, start TensorBoard
tensorboard --logdir=runs

# Open browser to http://localhost:6006
# Navigate to Scalars tab to see:
# - Training/EpochLoss declining over time
# - Validation/Accuracy improving
# - Training/LearningRate schedule
```

### 2. Compare Multiple Training Runs

```bash
# Run multiple training sessions
python test_cnn_only.py  # Creates cnn_training_X
python test_cnn_only.py  # Creates cnn_training_Y

# TensorBoard automatically shows both runs
# Compare performance across runs in the same charts
```

### 3. Monitor RL Agent Training

```bash
# Start RL training with TensorBoard logging
python main_clean.py --mode rl --symbol ETH/USDT

# View in TensorBoard:
# - Episode/TotalReward trending up
# - Trading/WinRate improving
# - Agent/Epsilon decreasing (less exploration)
```

## Real-Time Monitoring

### Key Indicators to Watch

#### **CNN Training Health**
- ✅ `Training/EpochLoss` should decrease over time
- ✅ `Validation/Accuracy` should increase
- ⚠️ Watch for overfitting (validation loss increases while training loss decreases)
- ✅ `Training/LearningRate` should follow the configured schedule

#### **RL Training Health**
- ✅ `Episode/TotalReward` trending upward
- ✅ `Trading/WinRate` above 50%
- ✅ `Moving_Average/Return_50ep` positive and stable
- ⚠️ `Agent/Epsilon` should decay over time

### Warning Signs
- **Loss not decreasing**: Check the learning rate and data quality
- **Accuracy plateauing**: May need more data or a different architecture
- **RL rewards oscillating**: Learning is unstable; adjust hyperparameters
- **Win rate dropping**: The strategy is not working; try a different approach

## Configuration

### Custom TensorBoard Setup

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Custom log directory
writer = SummaryWriter(log_dir='runs/my_experiment')

# Placeholder values for illustration; use your own metric, tensor, and step
value, step = 0.5, 0
weights = torch.randn(100)

# Log custom metrics
writer.add_scalar('Custom/Metric', value, step)
writer.add_histogram('Custom/Weights', weights, step)
writer.close()  # flush pending events to disk
```

### Advanced Features

```bash
# Start TensorBoard with custom port
tensorboard --logdir=runs --port=6007

# Enable the legacy debugger plugin (this flag may be unavailable in recent TensorBoard releases)
tensorboard --logdir=runs --debugger_port=6064

# Disable the experimental fast data loader (useful if it causes loading problems)
tensorboard --logdir=runs --load_fast=false
```

## Integration with Training

### CNN Trainer Integration
- Automatically logs all training metrics
- Model architecture visualization
- Real data statistics tracking
- Best-model checkpointing based on the logged validation metrics

### RL Trainer Integration
- Episode-by-episode performance tracking
- Trading strategy effectiveness monitoring
- Agent learning progress visualization
- Hyperparameter optimization guidance

## Benefits Over Static Charts

### ✅ **Real-Time Updates**
- See training progress as it happens
- No need to wait for training completion
- Immediate feedback on hyperparameter changes

### ✅ **Interactive Exploration**
- Zoom, pan, and explore metrics
- Smooth noisy data with built-in controls
- Compare multiple training runs side-by-side

### ✅ **Rich Visualizations**
- Scalars, histograms, images, and graphs
- Model architecture visualization
- High-dimensional data projections

### ✅ **Data Export**
- Download metrics as CSV
- Programmatic access to training data
- Integration with external analysis tools

## Troubleshooting

### TensorBoard Not Starting

```bash
# Install TensorBoard (does nothing if it is already installed)
pip install tensorboard

# Verify runs directory exists
dir runs  # Windows
ls runs   # Linux/Mac

# Kill existing TensorBoard processes
taskkill /F /IM tensorboard.exe  # Windows
pkill -f tensorboard             # Linux/Mac
```

### No Data Showing
- Ensure training is generating logs in the `runs/` directory
- Check the browser console for errors
- Try refreshing the page
- Verify the correct port (default 6006)

### Performance Issues
- Use `--load_fast=true` for faster loading
- Clear old log directories
- Reduce logging frequency in training code

## Best Practices

### 🎯 **Regular Monitoring**
- Check TensorBoard every 10-20 epochs during CNN training
- Monitor RL agents every 50-100 episodes
- Look for concerning trends early

### 📊 **Metric Organization**
- Use clear naming conventions (Training/, Validation/, etc.)
- Group related metrics together
- Log at appropriate frequencies (not every step)

### 💾 **Data Management**
- Archive old training runs periodically
- Keep successful run logs for reference
- Document experiment parameters in run names

### 🔍 **Hyperparameter Tuning**
- Compare multiple runs with different hyperparameters
- Use TensorBoard data to guide optimization
- Track which settings produce the best results

---

## Summary

TensorBoard integration provides **real-time, interactive monitoring** of training progress using **only real market data**. This replaces static plots with dynamic visualizations that help optimize model performance and catch issues early.

**Key Commands:**

```bash
# Train with TensorBoard logging
python main_clean.py --mode cnn --symbol ETH/USDT

# Start TensorBoard
python run_tensorboard.py

# Access the dashboard at http://localhost:6006
```

All metrics are derived from **real cryptocurrency market data** to ensure authentic trading model development.
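
For reference, the RL trading metrics listed earlier (`Trading/WinRate`, `Trading/ProfitFactor`, `Trading/MaxDrawdown`, and the 50-episode moving averages) could be computed along the following lines before being logged. This is a minimal sketch based only on the one-line definitions in this guide; the function and class names are hypothetical, and the trainers' actual implementations may differ (for example in how they treat zero-profit trades or an empty trade list).

```python
from collections import deque
from typing import Sequence


def win_rate(trade_pnls: Sequence[float]) -> float:
    """Percentage of trades with positive profit (0.0 when there are no trades)."""
    if not trade_pnls:
        return 0.0
    wins = sum(1 for pnl in trade_pnls if pnl > 0)
    return 100.0 * wins / len(trade_pnls)


def profit_factor(trade_pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss; inf when there are no losing trades."""
    gross_profit = sum(pnl for pnl in trade_pnls if pnl > 0)
    gross_loss = -sum(pnl for pnl in trade_pnls if pnl < 0)
    if gross_loss == 0:
        # Assumption: no losses and no profits is reported as 0.0
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def max_drawdown(balances: Sequence[float]) -> float:
    """Largest peak-to-trough decline of the balance curve, in percent."""
    peak = float("-inf")
    worst = 0.0
    for balance in balances:
        peak = max(peak, balance)
        if peak > 0:
            worst = max(worst, 100.0 * (peak - balance) / peak)
    return worst


class MovingAverage:
    """Mean of the most recent `window` values (window=50 for a 50-episode average)."""

    def __init__(self, window: int = 50) -> None:
        self.values = deque(maxlen=window)

    def update(self, value: float) -> float:
        self.values.append(value)
        return sum(self.values) / len(self.values)


# Illustrative numbers only, not real training output
pnls = [20.0, -30.0, 40.0, -10.0]
balances = [100.0, 120.0, 90.0, 130.0, 120.0]
print(win_rate(pnls))          # 50.0
print(profit_factor(pnls))     # 1.5
print(max_drawdown(balances))  # 25.0
```

Each result is a plain float, so it can be passed directly to `writer.add_scalar` under the matching tag, for example `writer.add_scalar('Trading/WinRate', win_rate(pnls), episode)`, where `episode` is the current episode index.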