TensorBoard Monitoring Guide
Overview
The trading system now uses TensorBoard for real-time training monitoring instead of static charts. This provides dynamic, interactive visualizations that update during training.
🚨 CRITICAL: Real Market Data Only
All TensorBoard metrics are derived from REAL market data training. No synthetic or generated data is used.
Quick Start
1. Start Training with TensorBoard
# CNN Training with TensorBoard
python main_clean.py --mode cnn --symbol ETH/USDT
# RL Training with TensorBoard
python train_rl_with_realtime.py --episodes 10
# Quick CNN Test
python test_cnn_only.py
2. Launch TensorBoard
# Option 1: Direct command
tensorboard --logdir=runs
# Option 2: Convenience script
python run_tensorboard.py
3. Access TensorBoard
Open your browser to: http://localhost:6006
Available Metrics
CNN Training Metrics
Training Progress
- Training/EpochLoss: Training loss per epoch
- Training/EpochAccuracy: Training accuracy per epoch
- Training/BatchLoss: Batch-level loss
- Training/BatchAccuracy: Batch-level accuracy
- Training/BatchConfidence: Model confidence scores
- Training/LearningRate: Learning rate schedule
- Training/EpochTime: Time per epoch
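A minimal sketch of how the per-epoch tags above could be written. It assumes a writer object with the `add_scalar(tag, value, step)` method that `torch.utils.tensorboard.SummaryWriter` provides; the function name `log_epoch_metrics` and its parameters are illustrative and not taken from the project's trainer.

```python
def log_epoch_metrics(writer, epoch, loss, accuracy, learning_rate, epoch_seconds):
    """Write one epoch's training scalars under the Training/ tag group.

    `writer` is any object exposing add_scalar(tag, value, step), such as
    torch.utils.tensorboard.SummaryWriter. The function name and argument
    list are hypothetical.
    """
    scalars = {
        "Training/EpochLoss": loss,
        "Training/EpochAccuracy": accuracy,
        "Training/LearningRate": learning_rate,
        "Training/EpochTime": epoch_seconds,
    }
    for tag, value in scalars.items():
        # The epoch index is used as the x-axis step for every tag.
        writer.add_scalar(tag, float(value), epoch)
    return scalars
```

Using the epoch number as the step keeps all four charts aligned on the same x-axis in the Scalars tab.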
Validation Metrics
- Validation/Loss: Validation loss
- Validation/Accuracy: Validation accuracy
- Validation/AvgConfidence: Average confidence on validation set
- Validation/Class_0_Accuracy: BUY class accuracy
- Validation/Class_1_Accuracy: SELL class accuracy
- Validation/Class_2_Accuracy: HOLD class accuracy
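The per-class tags above can be computed from true labels and predictions. The sketch below is one way to do it, assuming integer labels 0, 1, 2 for BUY, SELL, HOLD as listed; the helper names are hypothetical.

```python
CLASS_NAMES = {0: "BUY", 1: "SELL", 2: "HOLD"}  # mapping taken from the list above

def per_class_accuracy(labels, predictions, num_classes=3):
    """Return {class_index: accuracy} over the samples of each true class.

    Classes with no samples are omitted, since their accuracy is undefined.
    """
    correct = [0] * num_classes
    total = [0] * num_classes
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    return {c: correct[c] / total[c] for c in range(num_classes) if total[c]}

def class_accuracy_tags(labels, predictions):
    """Map the accuracies onto the Validation/Class_N_Accuracy tag names."""
    return {
        f"Validation/Class_{c}_Accuracy": acc
        for c, acc in per_class_accuracy(labels, predictions).items()
    }
```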
Best Model Tracking
- Best/ValidationLoss: Best validation loss achieved
- Best/ValidationAccuracy: Best validation accuracy achieved
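One way to maintain these two values is a small tracker updated after each validation pass. This is a sketch under the assumption that "best" means lowest loss and highest accuracy; the class name `BestTracker` is hypothetical.

```python
import math

class BestTracker:
    """Keep the lowest validation loss and highest validation accuracy seen so far."""

    def __init__(self):
        self.best_loss = math.inf
        self.best_accuracy = -math.inf

    def update(self, val_loss, val_accuracy):
        """Record one epoch's results; return True if either best value improved."""
        improved = False
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            improved = True
        if val_accuracy > self.best_accuracy:
            self.best_accuracy = val_accuracy
            improved = True
        return improved

    def as_tags(self):
        """Return the current bests keyed by their TensorBoard tag names."""
        return {
            "Best/ValidationLoss": self.best_loss,
            "Best/ValidationAccuracy": self.best_accuracy,
        }
```

The boolean returned by `update` is a natural trigger for saving a checkpoint.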
Data Statistics
- Data/TotalSamples: Number of training samples from real data
- Data/Features: Number of features (detected from real data)
- Data/Timeframes: Number of timeframes used
- Data/WindowSize: Window size for temporal patterns
- Data/Class_X_Count: Sample count per class
- Data/Feature_X_Mean/Std: Feature statistics
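A sketch of how several of the Data/ scalars above could be derived from a dataset held as plain Python lists. It assumes that `Data/Feature_X_Mean/Std` stands for two tags per feature (`..._Mean` and `..._Std`) and uses the population standard deviation; both are assumptions, as is the function name.

```python
from collections import Counter
from statistics import mean, pstdev

def data_statistics(features, labels):
    """Build Data/* scalars from feature rows and integer class labels.

    `features` is a list of equal-length rows, one row per sample.
    """
    stats = {
        "Data/TotalSamples": len(features),
        "Data/Features": len(features[0]) if features else 0,
    }
    # One count per class index that actually occurs in the labels.
    for cls, count in sorted(Counter(labels).items()):
        stats[f"Data/Class_{cls}_Count"] = count
    # zip(*features) transposes rows into per-feature columns.
    for i, column in enumerate(zip(*features)):
        stats[f"Data/Feature_{i}_Mean"] = mean(column)
        stats[f"Data/Feature_{i}_Std"] = pstdev(column)
    return stats
```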
Model Architecture
- Model/TotalParameters: Total model parameters
- Model/TrainableParameters: Trainable parameters
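Both counts can be obtained by iterating over a model's parameters. The sketch below assumes a PyTorch-style model whose `parameters()` yields tensors with `numel()` and `requires_grad`, as `torch.nn.Module` does; the function name is hypothetical.

```python
def count_parameters(model):
    """Return (total, trainable) parameter counts for a PyTorch-style model."""
    total = 0
    trainable = 0
    for p in model.parameters():
        n = p.numel()  # number of scalar elements in this tensor
        total += n
        if p.requires_grad:
            trainable += n
    return total, trainable
```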
Training Configuration
- Config/LearningRate: Learning rate used
- Config/BatchSize: Batch size
- Config/MaxEpochs: Maximum epochs
RL Training Metrics
Episode Performance
- Episode/TotalReward: Total reward per episode
- Episode/FinalBalance: Final balance after episode
- Episode/TotalReturn: Return percentage
- Episode/Steps: Steps taken in episode
Trading Performance
- Trading/TotalTrades: Number of trades executed
- Trading/WinRate: Percentage of profitable trades
- Trading/ProfitFactor: Gross profit / gross loss ratio
- Trading/MaxDrawdown: Maximum drawdown percentage
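The definitions above translate directly into arithmetic on a list of per-trade profits and an equity curve. The following is a sketch under those definitions; the function name, the input shapes, and the handling of edge cases (no trades, no losing trades) are assumptions.

```python
def trading_metrics(trade_pnls, equity_curve):
    """Compute the Trading/* scalars from per-trade PnL and account equity.

    trade_pnls: profit or loss of each closed trade.
    equity_curve: account balance sampled over the episode.
    """
    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p < 0]
    total = len(trade_pnls)
    win_rate = 100.0 * len(wins) / total if total else 0.0

    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # expressed as a positive number
    if gross_loss > 0:
        profit_factor = gross_profit / gross_loss
    else:
        # No losing trades: unbounded if there was any profit, else zero.
        profit_factor = float("inf") if gross_profit > 0 else 0.0

    # Max drawdown: largest percentage fall from a running peak.
    max_drawdown = 0.0
    peak = None
    for equity in equity_curve:
        if peak is None or equity > peak:
            peak = equity
        if peak > 0:
            max_drawdown = max(max_drawdown, 100.0 * (peak - equity) / peak)

    return {
        "Trading/TotalTrades": total,
        "Trading/WinRate": win_rate,
        "Trading/ProfitFactor": profit_factor,
        "Trading/MaxDrawdown": max_drawdown,
    }
```

An infinite profit factor may not plot well, so a real logger might clip it before writing the scalar.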
Agent Learning
- Agent/Epsilon: Exploration rate (epsilon)
- Agent/LearningRate: Agent learning rate
- Agent/MemorySize: Experience replay buffer size
- Agent/Loss: Training loss from experience replay
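For orientation, the curve plotted under `Agent/Epsilon` typically comes from a decay schedule. The guide does not say which schedule the agent uses, so the sketch below simply assumes a per-episode exponential decay with a floor; the function name and default values are hypothetical.

```python
def decayed_epsilon(episode, start=1.0, end=0.01, decay=0.995):
    """Exponentially decay epsilon per episode, never going below `end`.

    The schedule and its defaults are illustrative assumptions.
    """
    return max(end, start * decay ** episode)
```

With a schedule of this kind, the chart should fall smoothly and then flatten at the floor value.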
Moving Averages
- Moving_Average/Reward_50ep: 50-episode average reward
- Moving_Average/Return_50ep: 50-episode average return
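A fixed-size window makes the 50-episode averages cheap to maintain. This sketch assumes that, before 50 episodes have been seen, the average is taken over the episodes available so far; the class name is hypothetical.

```python
from collections import deque

class MovingAverage:
    """Average of the most recent `window` values."""

    def __init__(self, window=50):
        # deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=window)

    def update(self, value):
        """Add a value and return the current windowed average."""
        self.values.append(value)
        return sum(self.values) / len(self.values)
```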
Best Performance
- Best/Return: Best return percentage achieved
Directory Structure
runs/
├── cnn_training_1748043814/ # CNN training session
│ ├── events.out.tfevents.* # TensorBoard event files
│ └── ...
├── rl_training_1748043920/ # RL training session
│ ├── events.out.tfevents.*
│ └── ...
└── ... # Other training sessions
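The numeric suffixes in the tree above look like Unix timestamps in seconds. Assuming that is the convention, a run directory name could be built as in the sketch below; the function name and the `now` parameter (useful for reproducible names) are hypothetical.

```python
import time
from pathlib import Path

def new_run_dir(kind, root="runs", now=None):
    """Return <root>/<kind>_training_<unix seconds> without creating it.

    kind: session type such as "cnn" or "rl".
    now: optional timestamp override; defaults to the current time.
    """
    timestamp = int(time.time() if now is None else now)
    return Path(root) / f"{kind}_training_{timestamp}"
```

Timestamped names sort chronologically and keep repeated runs from overwriting each other's event files.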
TensorBoard Features
Scalars Tab
- Real-time line charts of all metrics
- Smoothing controls for noisy metrics
- Multiple run comparisons
- Download data as CSV
Images Tab
- Model architecture visualizations
- Training progression images
Graphs Tab
- Computational graph of models
- Network architecture visualization
Histograms Tab
- Weight and gradient distributions
- Activation patterns over time
Projector Tab
- High-dimensional data visualization
- Feature embeddings
Usage Examples
1. Monitor CNN Training
# Start CNN training (generates TensorBoard logs)
python main_clean.py --mode cnn --symbol ETH/USDT
# In another terminal, start TensorBoard
tensorboard --logdir=runs
# Open browser to http://localhost:6006
# Navigate to Scalars tab to see:
# - Training/EpochLoss declining over time
# - Validation/Accuracy improving
# - Training/LearningRate schedule
2. Compare Multiple Training Runs
# Run multiple training sessions
python test_cnn_only.py # Creates cnn_training_X
python test_cnn_only.py # Creates cnn_training_Y
# TensorBoard automatically shows both runs
# Compare performance across runs in the same charts
3. Monitor RL Agent Training
# Start RL training with TensorBoard logging
python main_clean.py --mode rl --symbol ETH/USDT
# View in TensorBoard:
# - Episode/TotalReward trending up
# - Trading/WinRate improving
# - Agent/Epsilon decreasing (less exploration)
Real-Time Monitoring
Key Indicators to Watch
CNN Training Health
- ✅ Training/EpochLoss should decrease over time
- ✅ Validation/Accuracy should increase
- ⚠️ Watch for overfitting (val loss increases while train loss decreases)
- ✅ Training/LearningRate should follow schedule
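The overfitting pattern described above (validation loss rising while training loss falls) can also be checked in code. The sketch below flags it when both trends hold for several consecutive epochs; the function name and the `patience` threshold are assumptions, not project settings.

```python
def is_overfitting(train_losses, val_losses, patience=3):
    """True if, over the last `patience` epoch-to-epoch steps, training loss
    fell every time while validation loss rose every time."""
    if len(train_losses) <= patience or len(val_losses) <= patience:
        return False  # not enough history to judge
    recent_train = train_losses[-(patience + 1):]
    recent_val = val_losses[-(patience + 1):]
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    return train_falling and val_rising
```

Requiring strictly monotonic trends is conservative; noisy curves may call for comparing smoothed values instead.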
RL Training Health
- ✅ Episode/TotalReward trending upward
- ✅ Trading/WinRate above 50%
- ✅ Moving_Average/Return_50ep positive and stable
- ⚠️ Agent/Epsilon should decay over time
Warning Signs
- Loss not decreasing: Check learning rate, data quality
- Accuracy plateauing: May need more data or different architecture
- RL rewards oscillating: Unstable learning, adjust hyperparameters
- Win rate dropping: Strategy not working, need different approach
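The first two warning signs (loss not decreasing, accuracy plateauing) amount to a metric that has stopped improving. A simple check is sketched below; the function name, window size, and `min_delta` threshold are illustrative assumptions.

```python
def has_plateaued(values, window=10, min_delta=1e-3, mode="min"):
    """True if the last `window` values fail to improve on the earlier best
    by at least `min_delta`.

    mode="min" suits losses (lower is better); mode="max" suits accuracies.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    if len(values) <= window:
        return False  # not enough history to compare against
    earlier, recent = values[:-window], values[-window:]
    if mode == "min":
        return min(earlier) - min(recent) < min_delta
    return max(recent) - max(earlier) < min_delta
```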
Configuration
Custom TensorBoard Setup
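import torch
from torch.utils.tensorboard import SummaryWriter
# Custom log directory
writer = SummaryWriter(log_dir='runs/my_experiment')
# Placeholder values for illustration; replace with real training outputs
step = 0
value = 0.5
weights = torch.randn(100)
# Log custom metrics
writer.add_scalar('Custom/Metric', value, step)
writer.add_histogram('Custom/Weights', weights, step)
# Flush pending events and release the event file
writer.close()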
Advanced Features
# Start TensorBoard with custom port
tensorboard --logdir=runs --port=6007
# Enable the debugger plugin (legacy flag; may be unavailable in recent TensorBoard releases)
tensorboard --logdir=runs --debugger_port=6064
# Disable the fast data loader (a fallback if runs fail to load; this flag does not profile anything)
tensorboard --logdir=runs --load_fast=false
Integration with Training
CNN Trainer Integration
- Automatically logs all training metrics
- Model architecture visualization
- Real data statistics tracking
- Best model checkpointing based on TensorBoard metrics
RL Trainer Integration
- Episode-by-episode performance tracking
- Trading strategy effectiveness monitoring
- Agent learning progress visualization
- Hyperparameter optimization guidance
Benefits Over Static Charts
✅ Real-Time Updates
- See training progress as it happens
- No need to wait for training completion
- Immediate feedback on hyperparameter changes
✅ Interactive Exploration
- Zoom, pan, and explore metrics
- Smooth noisy data with built-in controls
- Compare multiple training runs side-by-side
✅ Rich Visualizations
- Scalars, histograms, images, and graphs
- Model architecture visualization
- High-dimensional data projections
✅ Data Export
- Download metrics as CSV
- Programmatic access to training data
- Integration with external analysis tools
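As an illustration of programmatic access, logged scalars can be read back with TensorBoard's `EventAccumulator` and written to CSV. This is a sketch: the helper names are hypothetical, and the TensorBoard import is done lazily inside the loader so the CSV helpers work on any objects with `step`, `wall_time`, and `value` attributes.

```python
import csv

def load_scalar_events(run_dir, tag):
    """Read all stored events for one scalar tag from a run directory."""
    # Imported here so the other helpers do not require TensorBoard.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
    accumulator = EventAccumulator(run_dir)
    accumulator.Reload()  # parse the event files on disk
    return accumulator.Scalars(tag)

def scalar_events_to_rows(events):
    """Convert scalar events to (step, wall_time, value) tuples."""
    return [(e.step, e.wall_time, e.value) for e in events]

def write_scalar_csv(events, csv_path):
    """Write scalar events to a CSV file with a header row."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["step", "wall_time", "value"])
        writer.writerows(scalar_events_to_rows(events))
```

For example, `write_scalar_csv(load_scalar_events("runs/cnn_training_1748043814", "Training/EpochLoss"), "epoch_loss.csv")` would export one chart's data. Note that `EventAccumulator` caps the number of points it keeps per tag by default, so very long runs may come back downsampled.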
Troubleshooting
TensorBoard Not Starting
# Install TensorBoard (does nothing if it is already installed)
pip install tensorboard
# Verify runs directory exists
dir runs # Windows
ls runs # Linux/Mac
# Kill existing TensorBoard processes
taskkill /F /IM tensorboard.exe # Windows
pkill -f tensorboard # Linux/Mac
No Data Showing
- Ensure training is generating logs in the runs/ directory
- Check browser console for errors
- Try refreshing the page
- Verify correct port (default 6006)
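The first check can be automated by looking for event files under each run directory. The sketch below assumes the `events.out.tfevents.*` naming shown in the directory structure; the function name is hypothetical.

```python
from pathlib import Path

def find_event_files(logdir="runs"):
    """Return {run_name: [event file names]} for each run directory in logdir.

    A run mapped to an empty list has not written any TensorBoard logs.
    Returns an empty dict if logdir does not exist.
    """
    root = Path(logdir)
    if not root.is_dir():
        return {}
    found = {}
    for run in sorted(p for p in root.iterdir() if p.is_dir()):
        found[run.name] = sorted(f.name for f in run.glob("events.out.tfevents.*"))
    return found
```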
Performance Issues
- Use --load_fast=true for faster loading
- Clear old log directories
- Reduce logging frequency in training code
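Reducing logging frequency usually means writing batch-level scalars only every N steps. A minimal sketch, assuming a writer with `add_scalar(tag, value, step)`; the helper names and the default interval of 50 are illustrative.

```python
def should_log(step, every=50):
    """True on steps that are multiples of `every` (step 0 included)."""
    return step % every == 0

def log_batch_loss(writer, step, loss, every=50):
    """Write Training/BatchLoss only on selected steps; return whether it was written."""
    if should_log(step, every):
        writer.add_scalar("Training/BatchLoss", float(loss), step)
        return True
    return False
```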
Best Practices
🎯 Regular Monitoring
- Check TensorBoard every 10-20 epochs during CNN training
- Monitor RL agents every 50-100 episodes
- Look for concerning trends early
📊 Metric Organization
- Use clear naming conventions (Training/, Validation/, etc.)
- Group related metrics together
- Log at appropriate frequencies (not every step)
💾 Data Management
- Archive old training runs periodically
- Keep successful run logs for reference
- Document experiment parameters in run names
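One way to document experiment parameters in run names is to append them as key-value fragments. The sketch below is an illustration only: the function name and name format are hypothetical, and slashes are replaced so that values such as ETH/USDT do not create nested directories.

```python
def describe_run(kind, **hparams):
    """Build a run name like cnn_training_bs64_lr0.001 from hyperparameters.

    Keys are sorted so the same settings always yield the same name.
    """
    parts = [
        f"{key}{str(value).replace('/', '-')}"
        for key, value in sorted(hparams.items())
    ]
    return "_".join([f"{kind}_training", *parts])
```

The result can be passed as the final path component of a log directory, for example `runs/` followed by `describe_run("cnn", lr=0.001, bs=64)`.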
🔍 Hyperparameter Tuning
- Compare multiple runs with different hyperparameters
- Use TensorBoard data to guide optimization
- Track which settings produce best results
Summary
TensorBoard integration provides real-time, interactive monitoring of training progress using only real market data. This replaces static plots with dynamic visualizations that help optimize model performance and catch issues early.
Key Commands:
# Train with TensorBoard logging
python main_clean.py --mode cnn --symbol ETH/USDT
# Start TensorBoard
python run_tensorboard.py
# Access dashboard
http://localhost:6006
All metrics are derived from real cryptocurrency market data to ensure authentic trading model development.