TensorBoard Monitoring Guide
Overview
The trading system now uses TensorBoard for real-time training monitoring instead of static charts. This provides dynamic, interactive visualizations that update during training.
🚨 CRITICAL: Real Market Data Only
All TensorBoard metrics are derived from REAL market data training. No synthetic or generated data is used.
Quick Start
1. Start Training with TensorBoard
# CNN Training with TensorBoard
python main_clean.py --mode cnn --symbol ETH/USDT
# RL Training with TensorBoard
python train_rl_with_realtime.py --episodes 10
# Quick CNN Test
python test_cnn_only.py
2. Launch TensorBoard
# Option 1: Direct command
tensorboard --logdir=runs
# Option 2: Convenience script
python run_tensorboard.py
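The contents of run_tensorboard.py are not shown in this guide. As a rough sketch only, a convenience launcher of this kind could assemble the same command as Option 1 and run it as a subprocess. The function names and defaults below are assumptions for illustration, not the project's actual script.

```python
import subprocess
import sys


def build_tensorboard_command(logdir="runs", port=6006):
    """Return the argument list for launching TensorBoard (hypothetical helper)."""
    # Equivalent to: tensorboard --logdir=runs --port=6006
    return ["tensorboard", f"--logdir={logdir}", f"--port={port}"]


def launch_tensorboard(logdir="runs", port=6006):
    """Run TensorBoard in the foreground until it exits or Ctrl+C is pressed."""
    cmd = build_tensorboard_command(logdir, port)
    print("Launching:", " ".join(cmd))
    try:
        subprocess.run(cmd, check=True)
    except FileNotFoundError:
        # The tensorboard executable is not on PATH.
        sys.exit("tensorboard not found; install it with: pip install tensorboard")
    except KeyboardInterrupt:
        pass
```

Calling launch_tensorboard() from a script's entry point would then behave like typing the direct command.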
3. Access TensorBoard
Open your browser to: http://localhost:6006
Available Metrics
CNN Training Metrics
Training Progress
- Training/EpochLoss - Training loss per epoch
- Training/EpochAccuracy - Training accuracy per epoch
- Training/BatchLoss - Batch-level loss
- Training/BatchAccuracy - Batch-level accuracy
- Training/BatchConfidence - Model confidence scores
- Training/LearningRate - Learning rate schedule
- Training/EpochTime - Time per epoch
Validation Metrics
- Validation/Loss - Validation loss
- Validation/Accuracy - Validation accuracy
- Validation/AvgConfidence - Average confidence on validation set
- Validation/Class_0_Accuracy - BUY class accuracy
- Validation/Class_1_Accuracy - SELL class accuracy
- Validation/Class_2_Accuracy - HOLD class accuracy
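The per-class tags are easier to interpret with the calculation in view. The following is a minimal sketch, assuming integer labels (0 = BUY, 1 = SELL, 2 = HOLD) and a writer object that exposes add_scalar(), such as a SummaryWriter. The helper names are illustrative and not taken from the trainer code.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Return {class_index: accuracy} for every class present in labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y_true, y_pred in zip(labels, predictions):
        total[y_true] += 1
        if y_true == y_pred:
            correct[y_true] += 1
    return {c: correct[c] / total[c] for c in total}


def log_class_accuracy(writer, labels, predictions, epoch):
    """Write one Validation/Class_N_Accuracy scalar per class."""
    for c, acc in per_class_accuracy(labels, predictions).items():
        writer.add_scalar(f"Validation/Class_{c}_Accuracy", acc, epoch)
```

Classes that never appear in the validation labels are skipped rather than logged as zero, which avoids a misleading flat line in the chart.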
Best Model Tracking
- Best/ValidationLoss - Best validation loss achieved
- Best/ValidationAccuracy - Best validation accuracy achieved
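One plausible way to maintain these two tags is to keep running bests and write a scalar only when one improves. This is a sketch under that assumption. The BestTracker class is hypothetical, and the trainer may track its best values differently.

```python
class BestTracker:
    """Track the lowest validation loss and highest validation accuracy so far."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_accuracy = float("-inf")

    def update(self, writer, val_loss, val_accuracy, epoch):
        """Log Best/* scalars on improvement; return True if either improved."""
        improved = False
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            writer.add_scalar("Best/ValidationLoss", val_loss, epoch)
            improved = True
        if val_accuracy > self.best_accuracy:
            self.best_accuracy = val_accuracy
            writer.add_scalar("Best/ValidationAccuracy", val_accuracy, epoch)
            improved = True
        return improved
```

The boolean return value is a convenient hook for saving a checkpoint in the same epoch.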
Data Statistics
- Data/TotalSamples - Number of training samples from real data
- Data/Features - Number of features (detected from real data)
- Data/Timeframes - Number of timeframes used
- Data/WindowSize - Window size for temporal patterns
- Data/Class_X_Count - Sample count per class
- Data/Feature_X_Mean/Std - Feature statistics
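As an illustration of how such statistics could be produced, the sketch below assumes the samples are available as a flat list of equal-length feature rows (the real pipeline is windowed across timeframes, so its tensors are likely higher-dimensional). The exact tag spellings Data/Feature_N_Mean and Data/Feature_N_Std are a guess at what "Feature_X_Mean/Std" expands to.

```python
from collections import Counter
from statistics import mean, pstdev


def class_counts(labels):
    """Return {class_index: number_of_samples}."""
    return dict(Counter(labels))


def feature_stats(rows):
    """Return [(mean, population_std), ...], one pair per feature column."""
    columns = list(zip(*rows))
    return [(mean(col), pstdev(col)) for col in columns]


def log_data_statistics(writer, rows, labels):
    """Write dataset-level scalars once, at step 0."""
    writer.add_scalar("Data/TotalSamples", len(rows), 0)
    writer.add_scalar("Data/Features", len(rows[0]), 0)
    for c, n in sorted(class_counts(labels).items()):
        writer.add_scalar(f"Data/Class_{c}_Count", n, 0)
    for i, (m, s) in enumerate(feature_stats(rows)):
        writer.add_scalar(f"Data/Feature_{i}_Mean", m, 0)
        writer.add_scalar(f"Data/Feature_{i}_Std", s, 0)
```

Note that pstdev is the population standard deviation; a trainer built on tensors may report the sample standard deviation instead, so small differences are expected.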
Model Architecture
- Model/TotalParameters - Total model parameters
- Model/TrainableParameters - Trainable parameters
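For a PyTorch model, both counts are commonly derived by summing numel() over model.parameters(). The sketch below follows that idiom; it works with any object whose parameters() yields items with numel() and requires_grad, and the function name is illustrative.

```python
def count_parameters(model):
    """Return (total, trainable) parameter counts for a PyTorch-style model."""
    total = 0
    trainable = 0
    for p in model.parameters():
        n = p.numel()
        total += n
        # Frozen layers have requires_grad == False and are excluded here.
        if p.requires_grad:
            trainable += n
    return total, trainable
```

The two numbers differ only when some layers are frozen, for example during fine-tuning.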
Training Configuration
- Config/LearningRate - Learning rate used
- Config/BatchSize - Batch size
- Config/MaxEpochs - Maximum epochs
RL Training Metrics
Episode Performance
- Episode/TotalReward - Total reward per episode
- Episode/FinalBalance - Final balance after episode
- Episode/TotalReturn - Return percentage
- Episode/Steps - Steps taken in episode
Trading Performance
- Trading/TotalTrades - Number of trades executed
- Trading/WinRate - Percentage of profitable trades
- Trading/ProfitFactor - Gross profit / gross loss ratio
- Trading/MaxDrawdown - Maximum drawdown percentage
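The trading metrics follow standard definitions, sketched below for reference. The inputs (a list of per-trade profit/loss values and a list of account balances over time) and the handling of edge cases such as zero losing trades are assumptions; the RL trainer may compute them differently.

```python
def win_rate(trade_pnls):
    """Percentage of trades with positive profit."""
    if not trade_pnls:
        return 0.0
    wins = sum(1 for p in trade_pnls if p > 0)
    return 100.0 * wins / len(trade_pnls)


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (inf if there are profits but no losses)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def max_drawdown(balances):
    """Largest peak-to-trough decline of the balance curve, in percent."""
    peak = float("-inf")
    worst = 0.0
    for b in balances:
        peak = max(peak, b)
        if peak > 0:
            worst = max(worst, (peak - b) / peak * 100.0)
    return worst
```

A profit factor above 1 means gross profits exceed gross losses, which is a more informative signal than win rate alone when winning and losing trades differ in size.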
Agent Learning
- Agent/Epsilon - Exploration rate (epsilon)
- Agent/LearningRate - Agent learning rate
- Agent/MemorySize - Experience replay buffer size
- Agent/Loss - Training loss from experience replay
Moving Averages
- Moving_Average/Reward_50ep - 50-episode average reward
- Moving_Average/Return_50ep - 50-episode average return
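A fixed-size window is a simple way to obtain these averages. The sketch below uses a bounded deque; the class name is illustrative, and averaging over fewer than 50 episodes while the window fills is an assumption about the desired behavior.

```python
from collections import deque


class MovingAverage:
    """Running mean over the most recent `window` values."""

    def __init__(self, window=50):
        # A deque with maxlen silently discards the oldest value when full.
        self.values = deque(maxlen=window)

    def add(self, value):
        """Record a value and return the current average."""
        self.values.append(value)
        return sum(self.values) / len(self.values)
```

In a training loop, the value returned by add(episode_reward) would be written to Moving_Average/Reward_50ep with the episode index as the step.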
Best Performance
- Best/Return - Best return percentage achieved
Directory Structure
runs/
├── cnn_training_1748043814/ # CNN training session
│ ├── events.out.tfevents.* # TensorBoard event files
│ └── ...
├── rl_training_1748043920/ # RL training session
│ ├── events.out.tfevents.*
│ └── ...
└── ... # Other training sessions
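Run directories in this layout can be listed and sorted programmatically. The sketch below assumes the numeric suffix in names such as cnn_training_1748043814 is a Unix timestamp in seconds, which the guide does not state explicitly; the helper names are illustrative.

```python
from datetime import datetime, timezone
from pathlib import Path


def parse_run_name(name):
    """Split 'cnn_training_1748043814' into ('cnn_training', datetime) or return None."""
    prefix, _, suffix = name.rpartition("_")
    if not prefix or not suffix.isdigit():
        return None
    # Assumption: the suffix is seconds since the Unix epoch.
    return prefix, datetime.fromtimestamp(int(suffix), tz=timezone.utc)


def list_runs(logdir="runs"):
    """Return [(dir_name, prefix, started_at), ...] sorted oldest first."""
    root = Path(logdir)
    runs = []
    if not root.is_dir():
        return runs
    for child in root.iterdir():
        parsed = parse_run_name(child.name) if child.is_dir() else None
        if parsed:
            runs.append((child.name, parsed[0], parsed[1]))
    return sorted(runs, key=lambda r: r[2])
```

Directories that do not match the naming pattern are ignored, so manually named experiments need separate handling.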
TensorBoard Features
Scalars Tab
- Real-time line charts of all metrics
- Smoothing controls for noisy metrics
- Multiple run comparisons
- Download data as CSV
Images Tab
- Model architecture visualizations
- Training progression images
Graphs Tab
- Computational graph of models
- Network architecture visualization
Histograms Tab
- Weight and gradient distributions
- Activation patterns over time
Projector Tab
- High-dimensional data visualization
- Feature embeddings
Usage Examples
1. Monitor CNN Training
# Start CNN training (generates TensorBoard logs)
python main_clean.py --mode cnn --symbol ETH/USDT
# In another terminal, start TensorBoard
tensorboard --logdir=runs
# Open browser to http://localhost:6006
# Navigate to Scalars tab to see:
# - Training/EpochLoss declining over time
# - Validation/Accuracy improving
# - Training/LearningRate schedule
2. Compare Multiple Training Runs
# Run multiple training sessions
python test_cnn_only.py # Creates cnn_training_X
python test_cnn_only.py # Creates cnn_training_Y
# TensorBoard automatically shows both runs
# Compare performance across runs in the same charts
3. Monitor RL Agent Training
# Start RL training with TensorBoard logging
python main_clean.py --mode rl --symbol ETH/USDT
# View in TensorBoard:
# - Episode/TotalReward trending up
# - Trading/WinRate improving
# - Agent/Epsilon decreasing (less exploration)
Real-Time Monitoring
Key Indicators to Watch
CNN Training Health
- ✅ Training/EpochLoss should decrease over time
- ✅ Validation/Accuracy should increase
- ⚠️ Watch for overfitting (val loss increases while train loss decreases)
- ✅ Training/LearningRate should follow schedule
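The overfitting pattern can also be checked automatically from the logged loss histories. The sketch below flags it when validation loss has risen and training loss has fallen for several consecutive epochs; the function name and the patience threshold of 3 are illustrative choices, not project settings.

```python
def is_overfitting(train_losses, val_losses, patience=3):
    """True if train loss fell and val loss rose on each of the last `patience` epochs."""
    if len(train_losses) <= patience or len(val_losses) <= patience:
        return False  # not enough history to judge
    recent_train = train_losses[-(patience + 1):]
    recent_val = val_losses[-(patience + 1):]
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    return train_falling and val_rising
```

Requiring strict monotonic movement makes the check conservative; with noisy losses, comparing smoothed values may work better.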
RL Training Health
- ✅ Episode/TotalReward trending upward
- ✅ Trading/WinRate above 50%
- ✅ Moving_Average/Return_50ep positive and stable
- ⚠️ Agent/Epsilon should decay over time
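For reference, a common epsilon schedule multiplies epsilon by a constant factor after each episode and clamps it at a floor. The sketch below shows what a healthy Agent/Epsilon curve would then look like; the decay factor, floor, and function names are assumptions, since the agent's actual schedule is not documented here.

```python
def decay_epsilon(epsilon, decay=0.995, epsilon_min=0.01):
    """Apply one multiplicative decay step, never dropping below epsilon_min."""
    return max(epsilon_min, epsilon * decay)


def epsilon_schedule(episodes, start=1.0, decay=0.995, epsilon_min=0.01):
    """Return the epsilon value used in each episode."""
    values = []
    epsilon = start
    for _ in range(episodes):
        values.append(epsilon)
        epsilon = decay_epsilon(epsilon, decay, epsilon_min)
    return values
```

Under a schedule of this shape, a flat Agent/Epsilon chart usually means the decay step is not being called or the floor has already been reached.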
Warning Signs
- Loss not decreasing: Check learning rate, data quality
- Accuracy plateauing: May need more data or different architecture
- RL rewards oscillating: Unstable learning, adjust hyperparameters
- Win rate dropping: Strategy not working, need different approach
Configuration
Custom TensorBoard Setup
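import torch
from torch.utils.tensorboard import SummaryWriter
# Custom log directory
writer = SummaryWriter(log_dir='runs/my_experiment')
# Placeholder values for illustration; replace with real metrics from your training loop
value, step = 0.5, 0
weights = torch.randn(100)
# Log custom metrics
writer.add_scalar('Custom/Metric', value, step)
writer.add_histogram('Custom/Weights', weights, step)
# Flush pending events and close the event file
writer.close()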
Advanced Features
# Start TensorBoard with custom port
tensorboard --logdir=runs --port=6007
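# Legacy debugger plugin port (older TensorBoard releases; the flag may be unavailable in current versions)
tensorboard --logdir=runs --debugger_port=6064
# Disable the fast data loader if it causes loading problems
tensorboard --logdir=runs --load_fast=false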
Integration with Training
CNN Trainer Integration
- Automatically logs all training metrics
- Model architecture visualization
- Real data statistics tracking
- Best model checkpointing based on TensorBoard metrics
RL Trainer Integration
- Episode-by-episode performance tracking
- Trading strategy effectiveness monitoring
- Agent learning progress visualization
- Hyperparameter optimization guidance
Benefits Over Static Charts
✅ Real-Time Updates
- See training progress as it happens
- No need to wait for training completion
- Immediate feedback on hyperparameter changes
✅ Interactive Exploration
- Zoom, pan, and explore metrics
- Smooth noisy data with built-in controls
- Compare multiple training runs side-by-side
✅ Rich Visualizations
- Scalars, histograms, images, and graphs
- Model architecture visualization
- High-dimensional data projections
✅ Data Export
- Download metrics as CSV
- Programmatic access to training data
- Integration with external analysis tools
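As an example of working with exported data, the sketch below parses a scalar CSV downloaded from the Scalars tab and finds the best step. It assumes the export uses the column headers "Wall time", "Step", and "Value"; check the header row of your own file, as this may vary by TensorBoard version. The function names are illustrative.

```python
import csv
import io


def read_scalar_csv(text):
    """Parse exported CSV text into a list of (step, value) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["Step"]), float(row["Value"])) for row in reader]


def best_step(points, lower_is_better=True):
    """Return the (step, value) pair with the lowest (or highest) value."""
    pick = min if lower_is_better else max
    return pick(points, key=lambda p: p[1])
```

For a file on disk, pass the result of open(path, newline="").read() to read_scalar_csv; use lower_is_better=False for accuracy-style metrics.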
Troubleshooting
TensorBoard Not Starting
# Install TensorBoard (reports "Requirement already satisfied" if it is present)
pip install tensorboard
# Verify runs directory exists
dir runs # Windows
ls runs # Linux/Mac
# Kill existing TensorBoard processes
taskkill /F /IM tensorboard.exe # Windows
pkill -f tensorboard # Linux/Mac
No Data Showing
- Ensure training is generating logs in runs/ directory
- Check browser console for errors
- Try refreshing the page
- Verify correct port (default 6006)
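The first check can be scripted. The sketch below searches the log directory for files matching the events.out.tfevents.* pattern shown in the directory structure and groups them by run; the function name is illustrative.

```python
from pathlib import Path


def find_event_files(logdir="runs"):
    """Return {run_directory: [event_file_names]} for all event files under logdir."""
    root = Path(logdir)
    found = {}
    if not root.is_dir():
        return found  # the log directory itself is missing
    for path in root.rglob("events.out.tfevents.*"):
        run = str(path.parent.relative_to(root))
        found.setdefault(run, []).append(path.name)
    return found
```

An empty result means training has not written any logs to that directory yet, so TensorBoard has nothing to display regardless of port or browser state.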
Performance Issues
- Use --load_fast=true for faster loading
- Clear old log directories
- Reduce logging frequency in training code
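One way to reduce logging frequency without touching every call site is a thin wrapper that forwards only every Nth step. This is a sketch; the ThrottledLogger class is hypothetical, and the interval of 100 is an arbitrary default.

```python
class ThrottledLogger:
    """Forward add_scalar calls to `writer` only when step is a multiple of `every`."""

    def __init__(self, writer, every=100):
        if every < 1:
            raise ValueError("every must be >= 1")
        self.writer = writer
        self.every = every

    def add_scalar(self, tag, value, step):
        """Log if due; return True when the value was actually written."""
        if step % self.every == 0:
            self.writer.add_scalar(tag, value, step)
            return True
        return False
```

Batch-level tags such as Training/BatchLoss benefit most from this, since they are written far more often than epoch-level tags.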
Best Practices
🎯 Regular Monitoring
- Check TensorBoard every 10-20 epochs during CNN training
- Monitor RL agents every 50-100 episodes
- Look for concerning trends early
📊 Metric Organization
- Use clear naming conventions (Training/, Validation/, etc.)
- Group related metrics together
- Log at appropriate frequencies (not every step)
💾 Data Management
- Archive old training runs periodically
- Keep successful run logs for reference
- Document experiment parameters in run names
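Encoding parameters in the run name can be automated. The sketch below builds a directory name from a prefix, a parameter dictionary, and a timestamp; the naming scheme and the make_run_name helper are suggestions, not the project's existing convention.

```python
import time


def make_run_name(prefix, params, timestamp=None):
    """Build a run name such as 'cnn_training_bs64_lr0.001_1748043814'."""
    if timestamp is None:
        timestamp = int(time.time())
    parts = []
    for key, value in sorted(params.items()):
        # Replace path separators so values like 'ETH/USDT' stay in one directory.
        safe_value = str(value).replace("/", "-")
        parts.append(f"{key}{safe_value}")
    return "_".join([prefix, *parts, str(timestamp)])
```

The result can be passed as log_dir, for example 'runs/' + make_run_name('cnn_training', {'lr': 0.001, 'bs': 64}), so that each run is identifiable in the TensorBoard run selector.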
🔍 Hyperparameter Tuning
- Compare multiple runs with different hyperparameters
- Use TensorBoard data to guide optimization
- Track which settings produce best results
Summary
TensorBoard integration provides real-time, interactive monitoring of training progress using only real market data. This replaces static plots with dynamic visualizations that help optimize model performance and catch issues early.
Key Commands:
# Train with TensorBoard logging
python main_clean.py --mode cnn --symbol ETH/USDT
# Start TensorBoard
python run_tensorboard.py
# Access dashboard
http://localhost:6006
All metrics are derived from real cryptocurrency market data to ensure authentic trading model development.