gogo2/docs/logging_system_upgrade.md (2025-07-25)

Trading System Logging Upgrade

Overview

This upgrade implements a comprehensive logging and metadata management system that addresses the key issues:

  1. Eliminates scattered "No checkpoints found" logs during runtime
  2. Fast checkpoint metadata access without loading full models
  3. Centralized inference logging with database and text file storage
  4. Structured tracking of model performance and checkpoints

Key Components

1. Database Manager (utils/database_manager.py)

Purpose: SQLite-based storage for structured data

Features:

  • Inference records logging with deduplication
  • Checkpoint metadata storage (separate from model weights)
  • Model performance tracking
  • Fast queries without loading model files

Tables:

  • inference_records: All model predictions with metadata
  • checkpoint_metadata: Checkpoint info without model weights
  • model_performance: Daily aggregated statistics
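
To make the three tables concrete, here is a minimal sketch of what the schema might look like. The column names are assumptions for illustration; the real schema lives in utils/database_manager.py and may differ:

```python
import sqlite3

# Illustrative schema only; actual columns in utils/database_manager.py may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inference_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_name TEXT NOT NULL,
    symbol TEXT NOT NULL,
    action TEXT NOT NULL,
    confidence REAL,
    processing_time_ms REAL,
    feature_hash TEXT,                 -- deduplication key
    checkpoint_id TEXT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE checkpoint_metadata (
    checkpoint_id TEXT PRIMARY KEY,
    model_name TEXT NOT NULL,
    performance_metrics TEXT,          -- JSON blob, e.g. {"loss": 0.0234}
    is_active INTEGER DEFAULT 1,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE model_performance (
    model_name TEXT,
    day TEXT,
    avg_confidence REAL,
    inference_count INTEGER,
    PRIMARY KEY (model_name, day)
);
""")
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'"))
print(tables)
```

Keeping metadata in its own table is what allows the "fast queries without loading model files" benefit described above.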

2. Inference Logger (utils/inference_logger.py)

Purpose: Centralized logging for all model inferences

Features:

  • Single function call replaces scattered logger.info() calls
  • Automatic feature hashing for deduplication
  • Memory usage tracking
  • Processing time measurement
  • Dual storage (database + text files)

Usage:

from utils.inference_logger import log_model_inference

log_model_inference(
    model_name="dqn_agent",
    symbol="ETH/USDT", 
    action="BUY",
    confidence=0.85,
    probabilities={"BUY": 0.85, "SELL": 0.10, "HOLD": 0.05},
    input_features=features_array,
    processing_time_ms=12.5,
    checkpoint_id="dqn_agent_20250725_143500"
)
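
Hashing the input features gives the logger a cheap deduplication key: two calls with identical features produce the same hash and can be collapsed into one record. A sketch of the idea (the real implementation in utils/inference_logger.py may hash differently):

```python
import hashlib
import struct

def hash_features(features):
    """Digest a feature vector into a short hex key for deduplication."""
    packed = struct.pack(f"{len(features)}d", *features)  # stable binary encoding
    return hashlib.sha256(packed).hexdigest()[:16]

a = hash_features([0.1, 0.2, 0.3])
b = hash_features([0.1, 0.2, 0.3])
c = hash_features([0.1, 0.2, 0.4])
print(a == b, a == c)  # identical inputs match, different inputs do not
```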

3. Text Logger (utils/text_logger.py)

Purpose: Human-readable log files for tracking

Features:

  • Separate files for different event types
  • Clean, tabular format
  • Automatic cleanup of old entries
  • Easy to read and grep

Files:

  • logs/inference_records.txt: All model predictions
  • logs/checkpoint_events.txt: Save/load events
  • logs/system_events.txt: General system events
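
The "clean, tabular format" can be produced with fixed-width fields, which keeps the files easy to scan and grep. A sketch of a formatter matching the inference-record layout shown later in this document (field widths are assumptions):

```python
from datetime import datetime

def format_inference_line(model, symbol, action, conf, ms, checkpoint=None):
    """Render one fixed-width, grep-friendly log line (layout is illustrative)."""
    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    line = (f"{ts} | {model:<15} | {symbol:<10} | {action:<4} "
            f"| conf={conf:.3f} | time={ms:6.1f}ms")
    if checkpoint:
        line += f" [checkpoint: {checkpoint}]"
    return line

line = format_inference_line("dqn_agent", "ETH/USDT", "BUY", 0.85, 12.5,
                             "dqn_agent_20250725_143500")
print(line)
```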

4. Enhanced Checkpoint Manager (utils/checkpoint_manager.py)

Purpose: Improved checkpoint handling with metadata separation

Features:

  • Database-backed metadata storage
  • Fast metadata queries without loading models
  • Eliminates "No checkpoints found" spam
  • Backward compatibility with existing code

Benefits

1. Performance Improvements

Before: Loading full checkpoint just to get metadata

# Old way - loads the entire model from disk just to read its metadata
checkpoint_path, metadata = load_best_checkpoint("dqn_agent")
loss = metadata.loss

After: Fast metadata access from database

# New way - database query only
metadata = db_manager.get_best_checkpoint_metadata("dqn_agent")
loss = metadata.performance_metrics['loss']  # Fast!

2. Cleaner Runtime Logs

Before: Scattered logs everywhere

2025-07-25 14:34:39,749 - utils.checkpoint_manager - INFO - No checkpoints found for dqn_agent
2025-07-25 14:34:39,754 - utils.checkpoint_manager - INFO - No checkpoints found for enhanced_cnn
2025-07-25 14:34:39,756 - utils.checkpoint_manager - INFO - No checkpoints found for extrema_trainer

After: Clean, structured logging

2025-07-25 14:34:39 | dqn_agent       | ETH/USDT   | BUY  | conf=0.850 | time=  12.5ms [checkpoint: dqn_agent_20250725_143500]
2025-07-25 14:34:40 | enhanced_cnn    | ETH/USDT   | HOLD | conf=0.720 | time=   8.2ms [checkpoint: enhanced_cnn_20250725_143501]

3. Structured Data Storage

Database Schema:

-- Fast metadata queries
SELECT * FROM checkpoint_metadata WHERE model_name = 'dqn_agent' AND is_active = TRUE;

-- Performance analysis
SELECT model_name, AVG(confidence), COUNT(*) 
FROM inference_records 
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY model_name;
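
The performance-analysis query above can be exercised end to end with Python's built-in sqlite3 module; this self-contained sketch uses a simplified single-table schema just to demonstrate the aggregation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE inference_records (
    model_name TEXT, confidence REAL,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP)""")
rows = [("dqn_agent", 0.85), ("dqn_agent", 0.75), ("enhanced_cnn", 0.72)]
conn.executemany(
    "INSERT INTO inference_records (model_name, confidence) VALUES (?, ?)", rows)

# Same shape as the performance-analysis query above.
stats = conn.execute("""
    SELECT model_name, AVG(confidence), COUNT(*)
    FROM inference_records
    WHERE timestamp > datetime('now', '-24 hours')
    GROUP BY model_name
    ORDER BY model_name
""").fetchall()
print(stats)
```

Note that CURRENT_TIMESTAMP and datetime('now') both use UTC, so the 24-hour window is consistent.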

4. Easy Integration

In Model Code:

# Replace scattered logging
# OLD: logger.info(f"DQN prediction: {action} confidence={conf}")

# NEW: Centralized logging
self.orchestrator.log_model_inference(
    model_name="dqn_agent",
    symbol=symbol,
    action=action,
    confidence=confidence,
    probabilities=probs,
    input_features=features,
    processing_time_ms=processing_time
)

Implementation Guide

1. Update Model Classes

Add inference logging to prediction methods:

import time

class DQNAgent:
    def predict(self, state):
        start_time = time.time()
        
        # Your prediction logic here
        action = self._predict_action(state)
        confidence = self._calculate_confidence()
        
        processing_time = (time.time() - start_time) * 1000
        
        # Log the inference
        self.orchestrator.log_model_inference(
            model_name="dqn_agent",
            symbol=self.symbol,
            action=action,
            confidence=confidence,
            probabilities=self.action_probabilities,
            input_features=state,
            processing_time_ms=processing_time,
            checkpoint_id=self.current_checkpoint_id
        )
        
        return action

2. Update Checkpoint Saving

Use the enhanced checkpoint manager:

from utils.checkpoint_manager import save_checkpoint

# Save with metadata
checkpoint_metadata = save_checkpoint(
    model=self.model,
    model_name="dqn_agent",
    model_type="rl",
    performance_metrics={"loss": 0.0234, "accuracy": 0.87},
    training_metadata={"epochs": 100, "lr": 0.001}
)
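
The core of the metadata-separation idea is that the weights file is written once (e.g. with torch.save) while the metrics go into a small database row, so later reads never touch the model file. A minimal sketch, with a hypothetical helper name and simplified schema:

```python
import json
import sqlite3
from datetime import datetime

def save_checkpoint_metadata(conn, model_name, performance_metrics):
    """Record checkpoint metadata in SQLite; the weights themselves would be
    saved separately, so metadata queries never load the model file."""
    checkpoint_id = f"{model_name}_{datetime.now():%Y%m%d_%H%M%S}"
    conn.execute(
        "INSERT INTO checkpoint_metadata (checkpoint_id, model_name, performance_metrics) "
        "VALUES (?, ?, ?)",
        (checkpoint_id, model_name, json.dumps(performance_metrics)))
    return checkpoint_id

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE checkpoint_metadata (
    checkpoint_id TEXT PRIMARY KEY, model_name TEXT, performance_metrics TEXT)""")
cid = save_checkpoint_metadata(conn, "dqn_agent", {"loss": 0.0234, "accuracy": 0.87})

# Fast read-back: a single-row query, no model loading.
row = conn.execute(
    "SELECT performance_metrics FROM checkpoint_metadata WHERE checkpoint_id = ?",
    (cid,)).fetchone()
print(json.loads(row[0])["loss"])
```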

3. Fast Metadata Access

Get checkpoint info without loading models:

# Fast metadata access
metadata = orchestrator.get_checkpoint_metadata_fast("dqn_agent")
if metadata:
    current_loss = metadata.performance_metrics['loss']
    checkpoint_id = metadata.checkpoint_id

Migration Steps

  1. Install new dependencies (if any)
  2. Update model classes to use centralized logging
  3. Replace checkpoint loading with database queries where possible
  4. Remove scattered logger.info() calls for inferences
  5. Test with demo script: python demo_logging_system.py

File Structure

utils/
├── database_manager.py       # SQLite database management
├── inference_logger.py       # Centralized inference logging
├── text_logger.py            # Human-readable text logs
└── checkpoint_manager.py     # Enhanced checkpoint handling

logs/                         # Text log files
├── inference_records.txt
├── checkpoint_events.txt
└── system_events.txt

data/
└── trading_system.db         # SQLite database

demo_logging_system.py        # Demonstration script

Monitoring and Maintenance

Daily Tasks

  • Check logs/inference_records.txt for recent activity
  • Monitor database size: ls -lh data/trading_system.db

Weekly Tasks

  • Run cleanup: inference_logger.cleanup_old_logs(days_to_keep=30)
  • Check model performance trends in database
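
The weekly text-log cleanup amounts to keeping only the newest lines of each file. A self-contained sketch of that operation (the actual cleanup_old_logs implementation may differ):

```python
import os
import tempfile

def trim_log(path, max_lines):
    """Keep only the newest max_lines lines of a text log file."""
    with open(path) as f:
        lines = f.readlines()
    if len(lines) > max_lines:
        with open(path, "w") as f:
            f.writelines(lines[-max_lines:])

# Demo on a throwaway file with 100 lines.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as f:
    f.write("\n".join(f"line {i}" for i in range(100)) + "\n")
trim_log(path, max_lines=10)
with open(path) as f:
    kept = f.read().splitlines()
print(len(kept), kept[0])
os.remove(path)
```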

Monthly Tasks

  • Archive old log files
  • Analyze model performance statistics
  • Review checkpoint storage usage

Troubleshooting

Common Issues

  1. Database locked: multiple processes accessing SQLite
    • Solution: use a connection timeout and proper context managers
  2. Log files growing too large
    • Solution: run text_logger.cleanup_old_logs(max_lines=10000)
  3. Missing checkpoint metadata
    • Solution: the system falls back to the file-based approach automatically
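
For the "database locked" case, the built-in sqlite3 module already provides both remedies: the timeout argument to connect makes a writer wait for the lock instead of failing immediately, and using the connection as a context manager scopes each write to a transaction. A sketch:

```python
import os
import sqlite3
import tempfile

def insert_with_timeout(db_path, model_name, confidence):
    """Open with a lock timeout and use the connection as a transaction manager."""
    conn = sqlite3.connect(db_path, timeout=30.0)  # wait up to 30s if locked
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO inference_records (model_name, confidence) VALUES (?, ?)",
                (model_name, confidence))
    finally:
        conn.close()

# Demo against a throwaway database file.
db_path = os.path.join(tempfile.mkdtemp(), "trading_system.db")
with sqlite3.connect(db_path) as conn:
    conn.execute("CREATE TABLE inference_records (model_name TEXT, confidence REAL)")
insert_with_timeout(db_path, "dqn_agent", 0.85)
with sqlite3.connect(db_path) as conn:
    count = conn.execute("SELECT COUNT(*) FROM inference_records").fetchone()[0]
print(count)
```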

Debug Commands

# Check database status
# (getter import paths assumed from the module layout described above)
from utils.database_manager import get_database_manager
db_manager = get_database_manager()
checkpoints = db_manager.list_checkpoints("dqn_agent")

# Check recent inferences
from utils.inference_logger import get_inference_logger
inference_logger = get_inference_logger()
stats = inference_logger.get_model_stats("dqn_agent", hours=24)

# View text logs
from utils.text_logger import get_text_logger
text_logger = get_text_logger()
recent = text_logger.get_recent_inferences(lines=50)

This upgrade provides a solid foundation for tracking model performance, eliminating log spam, and enabling fast metadata access without the overhead of loading full model checkpoints.