# Trading System Logging Upgrade

## Overview

This upgrade implements a comprehensive logging and metadata management system that delivers four improvements:

1. **Eliminates scattered "No checkpoints found" logs** during runtime
2. **Fast checkpoint metadata access** without loading full models
3. **Centralized inference logging** with database and text file storage
4. **Structured tracking** of model performance and checkpoints

## Key Components

### 1. Database Manager (`utils/database_manager.py`)

**Purpose**: SQLite-based storage for structured data

**Features**:
- Inference records logging with deduplication
- Checkpoint metadata storage (separate from model weights)
- Model performance tracking
- Fast queries without loading model files

**Tables**:
- `inference_records`: All model predictions with metadata
- `checkpoint_metadata`: Checkpoint info without model weights
- `model_performance`: Daily aggregated statistics

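
To make the three tables more concrete, here is a minimal sketch of how they might be laid out with Python's built-in `sqlite3` module. The column names, the `UNIQUE` constraint used for deduplication, and the helpers `init_db` and `insert_inference` are illustrative assumptions; the actual definitions in `utils/database_manager.py` may differ.

```python
import sqlite3

# Hypothetical schema: column names are assumptions for illustration only,
# not the actual definitions in utils/database_manager.py.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inference_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_name TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    symbol TEXT NOT NULL,
    action TEXT NOT NULL,
    confidence REAL NOT NULL,
    input_features_hash TEXT NOT NULL,
    processing_time_ms REAL,
    checkpoint_id TEXT,
    UNIQUE (model_name, symbol, timestamp, input_features_hash)
);
CREATE TABLE IF NOT EXISTS checkpoint_metadata (
    checkpoint_id TEXT PRIMARY KEY,
    model_name TEXT NOT NULL,
    model_type TEXT NOT NULL,
    created_at TEXT NOT NULL,
    performance_metrics TEXT,   -- JSON-encoded dict
    file_path TEXT,
    is_active INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS model_performance (
    model_name TEXT NOT NULL,
    date TEXT NOT NULL,
    total_predictions INTEGER NOT NULL DEFAULT 0,
    avg_confidence REAL,
    PRIMARY KEY (model_name, date)
);
"""


def init_db(path=":memory:"):
    """Open a connection and create the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_inference(conn, record):
    """Insert one record; return False if the UNIQUE constraint rejects a duplicate."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO inference_records "
            "(model_name, timestamp, symbol, action, confidence, input_features_hash) "
            "VALUES (:model_name, :timestamp, :symbol, :action, :confidence, "
            ":input_features_hash)",
            record,
        )
    return cur.rowcount == 1
```

In this sketch, deduplication is delegated to SQLite: a second insert with the same model, symbol, timestamp, and feature hash is silently ignored.
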
### 2. Inference Logger (`utils/inference_logger.py`)

**Purpose**: Centralized logging for all model inferences

**Features**:
- Single function call replaces scattered `logger.info()` calls
- Automatic feature hashing for deduplication
- Memory usage tracking
- Processing time measurement
- Dual storage (database + text files)

**Usage**:
```python
from utils.inference_logger import log_model_inference

log_model_inference(
    model_name="dqn_agent",
    symbol="ETH/USDT",
    action="BUY",
    confidence=0.85,
    probabilities={"BUY": 0.85, "SELL": 0.10, "HOLD": 0.05},
    input_features=features_array,
    processing_time_ms=12.5,
    checkpoint_id="dqn_agent_20250725_143500"
)
```

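
The feature hashing mentioned above is not spelled out here, so the following is only one plausible way to do it: round the feature values, pack them into bytes, and take a SHA-256 digest. The function name `hash_features`, the rounding precision, and the 16-character truncation are assumptions for illustration.

```python
import hashlib
import struct


def hash_features(features, precision=6):
    """Return a short, stable hex digest for a flat sequence of numeric features.

    Hypothetical helper: values are rounded first so that tiny floating-point
    noise does not produce a different hash for effectively identical inputs.
    """
    rounded = [round(float(x), precision) for x in features]
    # Pack as little-endian doubles so the digest does not depend on the platform.
    payload = struct.pack(f"<{len(rounded)}d", *rounded)
    return hashlib.sha256(payload).hexdigest()[:16]
```

Two inferences on the same model and symbol with the same hash can then be treated as duplicates. A NumPy array would need to be flattened first (for example with `array.ravel()`) before being passed to a helper like this.
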
### 3. Text Logger (`utils/text_logger.py`)

**Purpose**: Human-readable log files for tracking

**Features**:
- Separate files for different event types
- Clean, tabular format
- Automatic cleanup of old entries
- Easy to read and grep

**Files**:
- `logs/inference_records.txt`: All model predictions
- `logs/checkpoint_events.txt`: Save/load events
- `logs/system_events.txt`: General system events

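
As an illustration of the automatic cleanup listed above, a line-based trim could look roughly like this. It is a sketch under the assumption that cleanup means "keep the newest N lines"; `trim_log_file` is a hypothetical name, not the text logger's actual API.

```python
from collections import deque
from pathlib import Path


def trim_log_file(path, max_lines=10000):
    """Keep only the newest `max_lines` lines of a text log.

    Returns the number of lines that were dropped.
    """
    path = Path(path)
    if not path.exists():
        return 0

    total = 0
    tail = deque(maxlen=max_lines)  # holds at most the last max_lines lines
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            total += 1
            tail.append(line)

    dropped = total - len(tail)
    if dropped:
        # Write to a temporary file first, then swap it in, so a crash
        # mid-write does not leave a half-written log behind.
        tmp = path.with_suffix(path.suffix + ".tmp")
        tmp.write_text("".join(tail), encoding="utf-8")
        tmp.replace(path)
    return dropped
```

Reading through a bounded `deque` keeps memory use proportional to `max_lines` rather than to the size of the log file.
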
### 4. Enhanced Checkpoint Manager (`utils/checkpoint_manager.py`)

**Purpose**: Improved checkpoint handling with metadata separation

**Features**:
- Database-backed metadata storage
- Fast metadata queries without loading models
- Eliminates "No checkpoints found" spam
- Backward compatibility with existing code

## Benefits

### 1. Performance Improvements

**Before**: Loading full checkpoint just to get metadata
```python
# Old way - loads entire model (expensive) just to read one field
checkpoint_path, metadata = load_best_checkpoint("dqn_agent")
loss = metadata.loss
```

**After**: Fast metadata access from database
```python
# New way - database query only, no model weights are loaded
metadata = db_manager.get_best_checkpoint_metadata("dqn_agent")
loss = metadata.performance_metrics['loss']  # Fast!
```

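
The reason the database path is cheap is that it reads a single small row instead of deserializing model weights. The sketch below illustrates that idea with `sqlite3`. The table layout, the `CheckpointMetadata` dataclass, and the helper names are assumptions, and "best" is simplified here to "the row flagged `is_active`".

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class CheckpointMetadata:
    checkpoint_id: str
    model_name: str
    performance_metrics: dict


def create_checkpoint_table(conn):
    # Hypothetical minimal table; the real schema may contain more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint_metadata ("
        "checkpoint_id TEXT PRIMARY KEY, model_name TEXT NOT NULL, "
        "performance_metrics TEXT, is_active INTEGER NOT NULL DEFAULT 0)"
    )


def fetch_active_checkpoint(conn, model_name):
    """Return metadata for the active checkpoint, or None if there is none."""
    row = conn.execute(
        "SELECT checkpoint_id, model_name, performance_metrics "
        "FROM checkpoint_metadata "
        "WHERE model_name = ? AND is_active = 1 LIMIT 1",
        (model_name,),
    ).fetchone()
    if row is None:
        return None
    # Metrics are assumed to be stored as a JSON-encoded dict.
    return CheckpointMetadata(row[0], row[1], json.loads(row[2] or "{}"))
```

Returning `None` for an unknown model lets callers handle the "no checkpoint yet" case quietly instead of logging on every lookup.
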
### 2. Cleaner Runtime Logs

**Before**: Scattered logs everywhere
```
2025-07-25 14:34:39,749 - utils.checkpoint_manager - INFO - No checkpoints found for dqn_agent
2025-07-25 14:34:39,754 - utils.checkpoint_manager - INFO - No checkpoints found for enhanced_cnn
2025-07-25 14:34:39,756 - utils.checkpoint_manager - INFO - No checkpoints found for extrema_trainer
```

**After**: Clean, structured logging
```
2025-07-25 14:34:39 | dqn_agent | ETH/USDT | BUY | conf=0.850 | time= 12.5ms [checkpoint: dqn_agent_20250725_143500]
2025-07-25 14:34:40 | enhanced_cnn | ETH/USDT | HOLD | conf=0.720 | time= 8.2ms [checkpoint: enhanced_cnn_20250725_143501]
```

### 3. Structured Data Storage

**Example Queries**:
```sql
-- Fast metadata queries
SELECT * FROM checkpoint_metadata WHERE model_name = 'dqn_agent' AND is_active = TRUE;

-- Performance analysis
SELECT model_name, AVG(confidence), COUNT(*)
FROM inference_records
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY model_name;
```

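
The performance-analysis query can also be run from Python. The sketch below wraps it in a hypothetical helper, `confidence_summary`, and assumes that `timestamp` is stored as UTC text in SQLite's `YYYY-MM-DD HH:MM:SS` format, since the comparison against `datetime('now', ...)` is a string comparison.

```python
import sqlite3


def confidence_summary(conn, hours=24):
    """Return {model_name: (avg_confidence, count)} for the last `hours` hours."""
    rows = conn.execute(
        "SELECT model_name, AVG(confidence), COUNT(*) "
        "FROM inference_records "
        "WHERE timestamp > datetime('now', ?) "
        "GROUP BY model_name",
        (f"-{int(hours)} hours",),  # modifier string such as '-24 hours'
    ).fetchall()
    return {name: (avg, count) for name, avg, count in rows}
```

Passing the time window as a bound parameter avoids building the SQL string by hand.
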
### 4. Easy Integration

**In Model Code**:
```python
# Replace scattered logging
# OLD: logger.info(f"DQN prediction: {action} confidence={conf}")

# NEW: Centralized logging
self.orchestrator.log_model_inference(
    model_name="dqn_agent",
    symbol=symbol,
    action=action,
    confidence=confidence,
    probabilities=probs,
    input_features=features,
    processing_time_ms=processing_time
)
```

## Implementation Guide

### 1. Update Model Classes

Add inference logging to prediction methods:

```python
import time


class DQNAgent:
    def predict(self, state):
        start_time = time.time()

        # Your prediction logic here
        action = self._predict_action(state)
        confidence = self._calculate_confidence()

        # Elapsed wall-clock time in milliseconds
        processing_time = (time.time() - start_time) * 1000

        # Log the inference
        self.orchestrator.log_model_inference(
            model_name="dqn_agent",
            symbol=self.symbol,
            action=action,
            confidence=confidence,
            probabilities=self.action_probabilities,
            input_features=state,
            processing_time_ms=processing_time,
            checkpoint_id=self.current_checkpoint_id
        )

        return action
```

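
If several models need the same timing logic, the measurement in `predict` could be factored into a small helper. The context manager below is a sketch of that idea; `timed_ms` is a hypothetical name, and it uses `time.perf_counter()`, which is generally better suited to measuring short durations than `time.time()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_ms():
    """Yield a dict whose 'ms' key holds the elapsed milliseconds on exit."""
    result = {"ms": 0.0}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Filled in even if the timed block raises.
        result["ms"] = (time.perf_counter() - start) * 1000.0
```

A prediction method could then wrap its logic in `with timed_ms() as t:` and pass `t["ms"]` as `processing_time_ms`.
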
### 2. Update Checkpoint Saving

Use the enhanced checkpoint manager:

```python
from utils.checkpoint_manager import save_checkpoint

# Save with metadata
checkpoint_metadata = save_checkpoint(
    model=self.model,
    model_name="dqn_agent",
    model_type="rl",
    performance_metrics={"loss": 0.0234, "accuracy": 0.87},
    training_metadata={"epochs": 100, "lr": 0.001}
)
```

### 3. Fast Metadata Access

Get checkpoint info without loading models:

```python
# Fast metadata access
metadata = orchestrator.get_checkpoint_metadata_fast("dqn_agent")
if metadata:
    current_loss = metadata.performance_metrics['loss']
    checkpoint_id = metadata.checkpoint_id
```

## Migration Steps

1. **Install new dependencies** (if any)
2. **Update model classes** to use centralized logging
3. **Replace checkpoint loading** with database queries where possible
4. **Remove scattered `logger.info()` calls** for inferences
5. **Test with demo script**: `python demo_logging_system.py`

## File Structure

```
utils/
├── database_manager.py      # SQLite database management
├── inference_logger.py      # Centralized inference logging
├── text_logger.py           # Human-readable text logs
└── checkpoint_manager.py    # Enhanced checkpoint handling

logs/                        # Text log files
├── inference_records.txt
├── checkpoint_events.txt
└── system_events.txt

data/
└── trading_system.db        # SQLite database

demo_logging_system.py       # Demonstration script
```

## Monitoring and Maintenance

### Daily Tasks
- Check `logs/inference_records.txt` for recent activity
- Monitor database size: `ls -lh data/trading_system.db`

### Weekly Tasks
- Run cleanup: `inference_logger.cleanup_old_logs(days_to_keep=30)`
- Check model performance trends in database

### Monthly Tasks
- Archive old log files
- Analyze model performance statistics
- Review checkpoint storage usage

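
For the weekly cleanup, an age-based purge of database rows might look like the sketch below. It is not the actual `cleanup_old_logs` implementation: `delete_old_inferences` is a hypothetical helper, and it assumes `timestamp` is stored as UTC text in SQLite's `YYYY-MM-DD HH:MM:SS` format.

```python
import sqlite3


def delete_old_inferences(conn, days_to_keep=30):
    """Delete inference rows older than `days_to_keep` days; return rows removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM inference_records WHERE timestamp < datetime('now', ?)",
            (f"-{int(days_to_keep)} days",),
        )
    return cur.rowcount
```

After a large purge, running `VACUUM` on the database is what actually returns the freed pages to the filesystem, which matters when monitoring the file size.
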
## Troubleshooting

### Common Issues

1. **Database locked**: Multiple processes accessing SQLite
   - Solution: Use connection timeout and proper context managers

2. **Log files growing too large**:
   - Solution: Run `text_logger.cleanup_old_logs(max_lines=10000)`

3. **Missing checkpoint metadata**:
   - Solution: System falls back to file-based approach automatically

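
For the "database locked" case, the connection timeout and context managers can be combined as in the sketch below. `run_write` is a hypothetical helper and the 10-second default is an arbitrary choice; the `timeout` argument of `sqlite3.connect` and the use of a connection as a transaction context manager are standard library behavior.

```python
import sqlite3
from contextlib import closing


def run_write(db_path, sql, params=(), timeout_s=10.0):
    """Run one write statement and return the number of affected rows."""
    # timeout: how long to wait for a lock held by another connection
    # before sqlite3 raises "database is locked".
    with closing(sqlite3.connect(db_path, timeout=timeout_s)) as conn:
        # `with conn` wraps a transaction: commit on success, rollback on
        # error. It does not close the connection, hence closing() above.
        with conn:
            return conn.execute(sql, params).rowcount
```

Keeping each transaction short, as this helper does, reduces the window in which other processes can hit the lock.
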
### Debug Commands

```python
# Check database status
db_manager = get_database_manager()
checkpoints = db_manager.list_checkpoints("dqn_agent")

# Check recent inferences
inference_logger = get_inference_logger()
stats = inference_logger.get_model_stats("dqn_agent", hours=24)

# View text logs
text_logger = get_text_logger()
recent = text_logger.get_recent_inferences(lines=50)
```

This upgrade provides a solid foundation for tracking model performance, eliminating log spam, and enabling fast metadata access without the overhead of loading full model checkpoints.