# Trading System Logging Upgrade

## Overview
This upgrade implements a comprehensive logging and metadata management system that addresses the following key issues:
- Eliminates scattered "No checkpoints found" logs during runtime
- Fast checkpoint metadata access without loading full models
- Centralized inference logging with database and text file storage
- Structured tracking of model performance and checkpoints
## Key Components
### 1. Database Manager (utils/database_manager.py)
Purpose: SQLite-based storage for structured data
Features:
- Inference records logging with deduplication
- Checkpoint metadata storage (separate from model weights)
- Model performance tracking
- Fast queries without loading model files
Tables:
- `inference_records`: All model predictions with metadata
- `checkpoint_metadata`: Checkpoint info without model weights
- `model_performance`: Daily aggregated statistics
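The three tables above could be created with SQL roughly like this. This is a minimal sketch of a plausible schema based only on the descriptions in this document; the actual column set and indexes in `database_manager.py` may differ.

```python
import sqlite3

# Hypothetical minimal schema mirroring the three tables described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inference_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    model_name TEXT NOT NULL,
    symbol TEXT,
    action TEXT,
    confidence REAL,
    feature_hash TEXT,          -- used for deduplication
    processing_time_ms REAL,
    checkpoint_id TEXT
);
CREATE TABLE IF NOT EXISTS checkpoint_metadata (
    checkpoint_id TEXT PRIMARY KEY,
    model_name TEXT NOT NULL,
    file_path TEXT,
    performance_metrics TEXT,   -- JSON blob, e.g. {"loss": 0.0234}
    is_active INTEGER DEFAULT 1
);
CREATE TABLE IF NOT EXISTS model_performance (
    model_name TEXT,
    date TEXT,
    avg_confidence REAL,
    inference_count INTEGER,
    PRIMARY KEY (model_name, date)
);
"""

conn = sqlite3.connect(":memory:")  # use data/trading_system.db in production
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```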
### 2. Inference Logger (utils/inference_logger.py)
Purpose: Centralized logging for all model inferences
Features:
- Single function call replaces scattered `logger.info()` calls
- Automatic feature hashing for deduplication
- Memory usage tracking
- Processing time measurement
- Dual storage (database + text files)
Usage:

```python
from utils.inference_logger import log_model_inference

log_model_inference(
    model_name="dqn_agent",
    symbol="ETH/USDT",
    action="BUY",
    confidence=0.85,
    probabilities={"BUY": 0.85, "SELL": 0.10, "HOLD": 0.05},
    input_features=features_array,
    processing_time_ms=12.5,
    checkpoint_id="dqn_agent_20250725_143500"
)
```
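The "automatic feature hashing for deduplication" could work along these lines. This is a sketch; `hash_features` is a hypothetical helper name, not necessarily what `inference_logger.py` calls it internally.

```python
import hashlib
import json

def hash_features(features) -> str:
    """Return a short, stable hash of an input-feature vector.

    Identical feature vectors yield identical hashes, so repeated
    inferences on the same input can be deduplicated in the database.
    """
    # Round to tolerate float noise, then serialize deterministically
    rounded = [round(float(x), 6) for x in features]
    payload = json.dumps(rounded).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

a = hash_features([0.1, 0.2, 0.3])
b = hash_features([0.1, 0.2, 0.3])
c = hash_features([0.1, 0.2, 0.4])
```

Identical inputs map to the same hash (`a == b`), so the database layer can skip or collapse duplicate records by comparing hashes instead of whole feature arrays.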
### 3. Text Logger (utils/text_logger.py)
Purpose: Human-readable log files for tracking
Features:
- Separate files for different event types
- Clean, tabular format
- Automatic cleanup of old entries
- Easy to read and grep
Files:
- `logs/inference_records.txt`: All model predictions
- `logs/checkpoint_events.txt`: Save/load events
- `logs/system_events.txt`: General system events
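The "clean, tabular format" for `logs/inference_records.txt` can be sketched as a fixed-width formatter. `format_inference_line` is a hypothetical name for illustration; the real `text_logger.py` may format these lines differently.

```python
from datetime import datetime

def format_inference_line(model_name, symbol, action, confidence,
                          processing_time_ms, checkpoint_id=None) -> str:
    """Format one inference as a fixed-width, grep-friendly line."""
    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    line = (f"{ts} | {model_name:<14} | {symbol:<8} | {action:<4} | "
            f"conf={confidence:.3f} | time={processing_time_ms:5.1f}ms")
    if checkpoint_id:
        line += f" [checkpoint: {checkpoint_id}]"
    return line

line = format_inference_line("dqn_agent", "ETH/USDT", "BUY", 0.85, 12.5,
                             "dqn_agent_20250725_143500")
print(line)
```

Fixed column widths keep the file easy to scan by eye and easy to filter with `grep` or `awk`.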
### 4. Enhanced Checkpoint Manager (utils/checkpoint_manager.py)
Purpose: Improved checkpoint handling with metadata separation
Features:
- Database-backed metadata storage
- Fast metadata queries without loading models
- Eliminates "No checkpoints found" spam
- Backward compatibility with existing code
## Benefits

### 1. Performance Improvements
Before: Loading the full checkpoint just to get metadata

```python
# Old way - loads the entire model!
checkpoint_path, metadata = load_best_checkpoint("dqn_agent")
loss = metadata.loss  # Expensive operation
```

After: Fast metadata access from the database

```python
# New way - database query only
metadata = db_manager.get_best_checkpoint_metadata("dqn_agent")
loss = metadata.performance_metrics['loss']  # Fast!
```
### 2. Cleaner Runtime Logs

Before: Scattered logs everywhere

```
2025-07-25 14:34:39,749 - utils.checkpoint_manager - INFO - No checkpoints found for dqn_agent
2025-07-25 14:34:39,754 - utils.checkpoint_manager - INFO - No checkpoints found for enhanced_cnn
2025-07-25 14:34:39,756 - utils.checkpoint_manager - INFO - No checkpoints found for extrema_trainer
```

After: Clean, structured logging

```
2025-07-25 14:34:39 | dqn_agent | ETH/USDT | BUY | conf=0.850 | time= 12.5ms [checkpoint: dqn_agent_20250725_143500]
2025-07-25 14:34:40 | enhanced_cnn | ETH/USDT | HOLD | conf=0.720 | time= 8.2ms [checkpoint: enhanced_cnn_20250725_143501]
```
### 3. Structured Data Storage

Database Schema:

```sql
-- Fast metadata queries
SELECT * FROM checkpoint_metadata WHERE model_name = 'dqn_agent' AND is_active = TRUE;

-- Performance analysis
SELECT model_name, AVG(confidence), COUNT(*)
FROM inference_records
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY model_name;
```
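The performance-analysis query above can be exercised from Python against a throwaway in-memory database. This sketch uses a simplified table and drops the 24-hour window for brevity; the production database is `data/trading_system.db`.

```python
import sqlite3

# Throwaway in-memory table with the same shape as inference_records
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE inference_records (
    model_name TEXT, confidence REAL, timestamp TEXT)""")
rows = [("dqn_agent", 0.85, "2025-07-25 14:34:39"),
        ("dqn_agent", 0.80, "2025-07-25 14:35:10"),
        ("enhanced_cnn", 0.72, "2025-07-25 14:34:40")]
conn.executemany("INSERT INTO inference_records VALUES (?, ?, ?)", rows)

# Same aggregation as above, without the time filter
result = conn.execute("""
    SELECT model_name, AVG(confidence), COUNT(*)
    FROM inference_records
    GROUP BY model_name
    ORDER BY model_name
""").fetchall()
print(result)
```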
### 4. Easy Integration

In Model Code:

```python
# Replace scattered logging
# OLD: logger.info(f"DQN prediction: {action} confidence={conf}")
# NEW: Centralized logging
self.orchestrator.log_model_inference(
    model_name="dqn_agent",
    symbol=symbol,
    action=action,
    confidence=confidence,
    probabilities=probs,
    input_features=features,
    processing_time_ms=processing_time
)
```
## Implementation Guide

### 1. Update Model Classes

Add inference logging to prediction methods:

```python
class DQNAgent:
    def predict(self, state):
        start_time = time.time()

        # Your prediction logic here
        action = self._predict_action(state)
        confidence = self._calculate_confidence()

        processing_time = (time.time() - start_time) * 1000

        # Log the inference
        self.orchestrator.log_model_inference(
            model_name="dqn_agent",
            symbol=self.symbol,
            action=action,
            confidence=confidence,
            probabilities=self.action_probabilities,
            input_features=state,
            processing_time_ms=processing_time,
            checkpoint_id=self.current_checkpoint_id
        )

        return action
```
### 2. Update Checkpoint Saving

Use the enhanced checkpoint manager:

```python
from utils.checkpoint_manager import save_checkpoint

# Save with metadata
checkpoint_metadata = save_checkpoint(
    model=self.model,
    model_name="dqn_agent",
    model_type="rl",
    performance_metrics={"loss": 0.0234, "accuracy": 0.87},
    training_metadata={"epochs": 100, "lr": 0.001}
)
```
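Internally, separating weights from metadata could look roughly like the sketch below: the weight file goes to disk, while the metadata row goes to SQLite so later queries never open the (large) weight file. `save_checkpoint_sketch` is a hypothetical illustration, not the actual `checkpoint_manager.py` implementation.

```python
import json
import os
import sqlite3
import tempfile
from datetime import datetime
from pathlib import Path

def save_checkpoint_sketch(weights: bytes, model_name: str, model_type: str,
                           performance_metrics: dict, db_path: str,
                           ckpt_dir: str) -> str:
    """Write model weights to disk and metadata to SQLite separately."""
    checkpoint_id = f"{model_name}_{datetime.now():%Y%m%d_%H%M%S}"
    ckpt_path = Path(ckpt_dir) / f"{checkpoint_id}.pt"
    ckpt_path.parent.mkdir(parents=True, exist_ok=True)
    ckpt_path.write_bytes(weights)  # weights only; no metadata inside

    with sqlite3.connect(db_path) as conn:
        conn.execute("""CREATE TABLE IF NOT EXISTS checkpoint_metadata (
            checkpoint_id TEXT PRIMARY KEY, model_name TEXT,
            model_type TEXT, file_path TEXT, performance_metrics TEXT)""")
        conn.execute(
            "INSERT INTO checkpoint_metadata VALUES (?, ?, ?, ?, ?)",
            (checkpoint_id, model_name, model_type, str(ckpt_path),
             json.dumps(performance_metrics)))
    return checkpoint_id

tmp = tempfile.mkdtemp()
cid = save_checkpoint_sketch(b"\x00" * 16, "dqn_agent", "rl",
                             {"loss": 0.0234},
                             os.path.join(tmp, "meta.db"),
                             os.path.join(tmp, "checkpoints"))
```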
### 3. Fast Metadata Access

Get checkpoint info without loading models:

```python
# Fast metadata access
metadata = orchestrator.get_checkpoint_metadata_fast("dqn_agent")
if metadata:
    current_loss = metadata.performance_metrics['loss']
    checkpoint_id = metadata.checkpoint_id
```
## Migration Steps

- Install new dependencies (if any)
- Update model classes to use centralized logging
- Replace checkpoint loading with database queries where possible
- Remove scattered `logger.info()` calls for inferences
- Test with the demo script: `python demo_logging_system.py`
## File Structure

```
utils/
├── database_manager.py    # SQLite database management
├── inference_logger.py    # Centralized inference logging
├── text_logger.py         # Human-readable text logs
└── checkpoint_manager.py  # Enhanced checkpoint handling

logs/                      # Text log files
├── inference_records.txt
├── checkpoint_events.txt
└── system_events.txt

data/
└── trading_system.db      # SQLite database

demo_logging_system.py     # Demonstration script
```
## Monitoring and Maintenance

### Daily Tasks
- Check `logs/inference_records.txt` for recent activity
- Monitor database size: `ls -lh data/trading_system.db`

### Weekly Tasks
- Run cleanup: `inference_logger.cleanup_old_logs(days_to_keep=30)`
- Check model performance trends in the database
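The weekly cleanup might be implemented roughly as below. This is a sketch that assumes each log line starts with a `YYYY-MM-DD` timestamp (as in the examples above); `cleanup_old_logs_sketch` is a hypothetical stand-in for the real method.

```python
import os
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

def cleanup_old_logs_sketch(log_path: str, days_to_keep: int = 30) -> int:
    """Drop lines whose leading YYYY-MM-DD date is older than the
    cutoff; return the number of lines removed."""
    cutoff = datetime.now() - timedelta(days=days_to_keep)
    path = Path(log_path)
    lines = path.read_text().splitlines()
    kept = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:10], "%Y-%m-%d")
        except ValueError:
            kept.append(line)  # keep lines without a date prefix
            continue
        if stamp >= cutoff:
            kept.append(line)
    path.write_text("\n".join(kept) + ("\n" if kept else ""))
    return len(lines) - len(kept)

# Demo: one stale line and one recent line
tmp = os.path.join(tempfile.mkdtemp(), "inference_records.txt")
recent = datetime.now().strftime("%Y-%m-%d") + " | dqn_agent | ETH/USDT | BUY"
with open(tmp, "w") as f:
    f.write("2020-01-01 | dqn_agent | ETH/USDT | SELL\n" + recent + "\n")
removed = cleanup_old_logs_sketch(tmp, days_to_keep=30)
```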
### Monthly Tasks
- Archive old log files
- Analyze model performance statistics
- Review checkpoint storage usage
## Troubleshooting

### Common Issues

- Database locked: Multiple processes accessing SQLite
  - Solution: Use connection timeouts and proper context managers
- Log files growing too large
  - Solution: Run `text_logger.cleanup_old_logs(max_lines=10000)`
- Missing checkpoint metadata
  - Solution: The system falls back to the file-based approach automatically
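The "database locked" mitigation can be sketched as a connection timeout plus a context manager: SQLite's `timeout` parameter makes a writer wait for a lock held by another process instead of failing immediately. `db_connection` is a hypothetical helper; the real `database_manager.py` may organize this differently.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def db_connection(db_path: str, timeout_s: float = 10.0):
    """Open a SQLite connection that waits up to timeout_s seconds for
    a lock held by another process, and always closes cleanly."""
    conn = sqlite3.connect(db_path, timeout=timeout_s)
    try:
        yield conn
        conn.commit()
    finally:
        conn.close()

with db_connection(":memory:") as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)
```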
### Debug Commands

```python
# Check database status
db_manager = get_database_manager()
checkpoints = db_manager.list_checkpoints("dqn_agent")

# Check recent inferences
inference_logger = get_inference_logger()
stats = inference_logger.get_model_stats("dqn_agent", hours=24)

# View text logs
text_logger = get_text_logger()
recent = text_logger.get_recent_inferences(lines=50)
```
This upgrade provides a solid foundation for tracking model performance, eliminating log spam, and enabling fast metadata access without the overhead of loading full model checkpoints.