# Docker Model Runner Integration
This guide shows how to integrate Docker Model Runner with your existing Docker stack for AI-powered trading applications.
## 📁 Files Overview
| File | Purpose |
|---|---|
| `docker-compose.yml` | Main compose file with model runner services |
| `docker-compose.model-runner.yml` | Standalone model runner configuration |
| `model-runner.env` | Environment variables for configuration |
| `integrate_model_runner.sh` | Integration script for existing stacks |
| `docker-compose.integration-example.yml` | Example integration with trading services |
## 🚀 Quick Start

### Option 1: Use with Existing Stack
```bash
# Run the integration script
./integrate_model_runner.sh

# Start services
docker-compose up -d

# Test the API
curl http://localhost:11434/api/tags
```
### Option 2: Standalone Model Runner
```bash
# Use the dedicated compose file
docker-compose -f docker-compose.model-runner.yml up -d

# Test with a specific profile
docker-compose -f docker-compose.model-runner.yml --profile llama-cpp up -d
```
## 🔧 Configuration

### Environment Variables (`model-runner.env`)
```bash
# AMD GPU Configuration
HSA_OVERRIDE_GFX_VERSION=11.0.0   # AMD GPU version override
GPU_LAYERS=35                     # Layers to offload to GPU
THREADS=8                         # CPU threads
BATCH_SIZE=512                    # Batch processing size
CONTEXT_SIZE=4096                 # Context window size

# API Configuration
MODEL_RUNNER_PORT=11434           # Main API port
LLAMA_CPP_PORT=8000               # llama.cpp server port
METRICS_PORT=9090                 # Metrics endpoint
```
### Ports Exposed
| Port | Service | Purpose |
|---|---|---|
| 11434 | Docker Model Runner | Ollama-compatible API |
| 8083 | Docker Model Runner | Alternative API port |
| 8000 | Llama.cpp Server | Advanced llama.cpp features |
| 9090 | Metrics | Prometheus metrics |
| 8050 | Trading Dashboard | Example dashboard |
| 9091 | Model Monitor | Performance monitoring |
## 🛠️ Usage Examples

### Basic Model Operations
```bash
# List available models
curl http://localhost:11434/api/tags

# Pull a model
docker-compose exec docker-model-runner /app/model-runner pull ai/smollm2:135M-Q4_K_M

# Run a model
docker-compose exec docker-model-runner /app/model-runner run ai/smollm2:135M-Q4_K_M "Hello!"

# Pull a Hugging Face model
docker-compose exec docker-model-runner /app/model-runner pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```
### API Usage
```bash
# Generate text (Ollama-compatible API)
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "Analyze market trends",
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Chat completion
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "messages": [{"role": "user", "content": "What is your analysis?"}]
  }'
```
### Integration with Your Services
```python
# Example: Python integration
import requests

class AIModelClient:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url

    def generate(self, prompt, model="ai/smollm2:135M-Q4_K_M"):
        response = requests.post(
            f"{self.base_url}/api/generate",
            json={"model": model, "prompt": prompt}
        )
        return response.json()

    def chat(self, messages, model="ai/smollm2:135M-Q4_K_M"):
        response = requests.post(
            f"{self.base_url}/api/chat",
            json={"model": model, "messages": messages}
        )
        return response.json()

# Usage
client = AIModelClient()
analysis = client.generate("Analyze BTC/USDT market")
```
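The same client covers the chat endpoint; a minimal usage sketch (the message content is illustrative):

```python
# Chat usage with the client defined above
reply = client.chat([
    {"role": "user", "content": "Summarize today's BTC/USDT price action."}
])
print(reply)
```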
## 🔗 Service Integration

### With Existing Trading Dashboard
```yaml
# Add to your existing docker-compose.yml
services:
  your-trading-service:
    # ... your existing config
    environment:
      - MODEL_RUNNER_URL=http://docker-model-runner:11434
    depends_on:
      - docker-model-runner
    networks:
      - model-runner-network
```
### Internal Networking

Services communicate over the shared Docker network:

- `http://docker-model-runner:11434` - Internal API calls
- `http://llama-cpp-server:8000` - Advanced llama.cpp features
- `http://model-manager:8001` - Management API
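Inside a container on the same network, a service can resolve these hostnames directly. A minimal sketch, assuming the `MODEL_RUNNER_URL` variable from the compose example above is set:

```python
import os
import requests

# Fall back to the internal hostname when the env var is not set
base_url = os.environ.get("MODEL_RUNNER_URL", "http://docker-model-runner:11434")

# Any endpoint works; /api/tags is a cheap connectivity check
models = requests.get(f"{base_url}/api/tags", timeout=5).json()
print(models)
```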
## 📊 Monitoring and Health Checks

### Health Endpoints
```bash
# Main service health
curl http://localhost:11434/api/tags

# Metrics endpoint
curl http://localhost:9090/metrics

# Model monitor (if enabled)
curl http://localhost:9091/health
curl http://localhost:9091/models
curl http://localhost:9091/performance
```
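Because `depends_on` only waits for the container to start, not for the model server to be ready, a dependent service may want to poll a health endpoint before sending work. A minimal sketch (the URL and timing values are illustrative):

```python
import time
import requests

def wait_until_ready(url="http://localhost:11434/api/tags", timeout=60):
    """Poll the tags endpoint until the model runner responds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=2).status_code == 200:
                return True
        except requests.RequestException:
            pass  # Server not up yet; keep polling
        time.sleep(1)
    return False

print("ready" if wait_until_ready() else "timed out")
```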
### Logs
```bash
# View all logs
docker-compose logs -f

# Specific service logs
docker-compose logs -f docker-model-runner
docker-compose logs -f llama-cpp-server
```
## ⚡ Performance Tuning

### GPU Optimization
```bash
# Adjust GPU layers based on VRAM
GPU_LAYERS=35    # For 8GB VRAM
GPU_LAYERS=50    # For 12GB VRAM
GPU_LAYERS=65    # For 16GB+ VRAM

# CPU threading
THREADS=8        # Match physical CPU cores
BATCH_SIZE=512   # Increase for better throughput
```
### Memory Management
```bash
# Context size affects memory usage
CONTEXT_SIZE=4096   # Standard context
CONTEXT_SIZE=8192   # Larger context (more memory)
CONTEXT_SIZE=2048   # Smaller context (less memory)
```
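The memory cost of a larger context is dominated by the KV cache, which grows linearly with `CONTEXT_SIZE`. A rough back-of-envelope sketch; the layer and head dimensions below are illustrative and should be read from your model's metadata:

```python
def kv_cache_bytes(context_size, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # One K and one V entry per layer per context position (fp16 = 2 bytes)
    return 2 * n_layers * context_size * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical 1B-class model: 16 layers, 8 KV heads, head_dim 64
for ctx in (2048, 4096, 8192):
    mib = kv_cache_bytes(ctx, 16, 8, 64) / 2**20
    print(f"CONTEXT_SIZE={ctx}: ~{mib:.0f} MiB KV cache")
```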
## 🧪 Testing and Validation

### Run Integration Tests
```bash
# Test basic connectivity
docker-compose exec docker-model-runner curl -f http://localhost:11434/api/tags

# Test model loading
docker-compose exec docker-model-runner /app/model-runner run ai/smollm2:135M-Q4_K_M "test"

# Test parallel requests
for i in {1..5}; do
  curl -X POST http://localhost:11434/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model": "ai/smollm2:135M-Q4_K_M", "prompt": "test '$i'"}' &
done
wait  # Block until all background requests finish
```
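The same parallel check from Python, which makes it easier to assert on the responses; a minimal sketch using only the endpoints shown above (`"stream": False` requests a single JSON response instead of a stream):

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def generate(prompt):
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "ai/smollm2:135M-Q4_K_M", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()

# Fire five requests concurrently and check they all succeed
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(generate, [f"test {i}" for i in range(5)]))
print(f"{len(results)} responses received")
```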
### Benchmarking
```bash
# Simple benchmark
time curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/smollm2:135M-Q4_K_M", "prompt": "Write a detailed analysis of market trends"}'
```
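For a number more meaningful than wall-clock time, Ollama-style responses include `eval_count` and `eval_duration` fields that yield tokens per second; a sketch assuming the server returns them (fall back to latency if it does not):

```python
import time

import requests

payload = {
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "Write a detailed analysis of market trends",
    "stream": False,
}

start = time.perf_counter()
data = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300).json()
elapsed = time.perf_counter() - start

print(f"latency: {elapsed:.2f}s")
# eval_duration is reported in nanoseconds in the Ollama API
if "eval_count" in data and "eval_duration" in data:
    print(f"throughput: {data['eval_count'] / (data['eval_duration'] / 1e9):.1f} tokens/s")
```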
## 🛡️ Security Considerations

### Network Security
```yaml
# Restrict network access
services:
  docker-model-runner:
    networks:
      - internal-network
    # No external ports for internal-only services

networks:
  internal-network:
    internal: true
```
### API Security
```bash
# Use API keys (if supported)
MODEL_RUNNER_API_KEY=your-secret-key

# Enable authentication
MODEL_RUNNER_AUTH_ENABLED=true
```
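If your deployment does enforce keys, clients typically pass them as a bearer token; a sketch under that assumption (`MODEL_RUNNER_API_KEY` is the variable above; the header scheme depends on your gateway):

```python
import os

import requests

api_key = os.environ["MODEL_RUNNER_API_KEY"]

resp = requests.post(
    "http://localhost:11434/api/generate",
    headers={"Authorization": f"Bearer {api_key}"},  # Scheme assumed; adjust to your setup
    json={"model": "ai/smollm2:135M-Q4_K_M", "prompt": "ping", "stream": False},
    timeout=60,
)
print(resp.status_code)
```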
## 📈 Scaling and Production

### Multiple GPU Support
```yaml
# Use multiple GPUs (NVIDIA; for AMD, see the HSA variables above)
environment:
  - CUDA_VISIBLE_DEVICES=0,1   # Use GPU 0 and 1
  - GPU_LAYERS=35              # Layers per GPU
```
### Load Balancing
```yaml
# Multiple model runner instances
services:
  model-runner-1:
    # ... config
    deploy:
      placement:
        constraints:
          - node.labels.gpu==true
  model-runner-2:
    # ... config
    deploy:
      placement:
        constraints:
          - node.labels.gpu==true
```
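Without a dedicated load balancer in front, a client can rotate across instances itself; a minimal round-robin sketch (the instance hostnames are the hypothetical service names above):

```python
import itertools

import requests

# Hypothetical instance URLs matching the service names above
INSTANCES = itertools.cycle([
    "http://model-runner-1:11434",
    "http://model-runner-2:11434",
])

def generate(prompt):
    base_url = next(INSTANCES)  # Alternate between instances per request
    return requests.post(
        f"{base_url}/api/generate",
        json={"model": "ai/smollm2:135M-Q4_K_M", "prompt": prompt, "stream": False},
        timeout=120,
    ).json()
```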
## 🔧 Troubleshooting

### Common Issues
- **GPU not detected**

  ```bash
  # Check NVIDIA drivers
  nvidia-smi

  # Check Docker GPU support
  docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
  ```

- **Port conflicts**

  ```bash
  # Check port usage
  netstat -tulpn | grep :11434

  # Change ports in model-runner.env
  MODEL_RUNNER_PORT=11435
  ```

- **Model loading failures**

  ```bash
  # Check available disk space
  df -h

  # Check model file permissions
  ls -la models/
  ```
### Debug Commands
```bash
# Full service logs
docker-compose logs

# Container resource usage
docker stats

# Model runner debug info
docker-compose exec docker-model-runner /app/model-runner --help

# Test internal connectivity
docker-compose exec trading-dashboard curl http://docker-model-runner:11434/api/tags
```
## 📚 Advanced Features

### Custom Model Loading
```bash
# Load a custom GGUF model
docker-compose exec docker-model-runner /app/model-runner pull /models/custom-model.gguf

# Use a specific model file
docker-compose exec docker-model-runner /app/model-runner run /models/my-model.gguf "prompt"
```
### Batch Processing
```bash
# Process multiple prompts in one request
# (prompt arrays are not part of the standard Ollama API; check your server build)
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": ["prompt1", "prompt2", "prompt3"],
    "batch_size": 3
  }'
```
### Streaming Responses
```bash
# Enable streaming
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "long analysis request",
    "stream": true
  }'
```
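Consuming the stream from Python: Ollama-compatible servers emit one JSON object per line, with `done` marking the final chunk; a minimal sketch:

```python
import json

import requests

payload = {
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "long analysis request",
    "stream": True,
}

with requests.post("http://localhost:11434/api/generate",
                   json=payload, stream=True, timeout=300) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)  # Print tokens as they arrive
        if chunk.get("done"):
            break
print()
```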
This integration provides a complete model-serving environment that plugs into your existing trading infrastructure while adding request parallelism and GPU acceleration.