# Docker Model Runner Integration

This guide shows how to integrate Docker Model Runner with your existing Docker stack for AI-powered trading applications.

## 📁 Files Overview

| File | Purpose |
|------|---------|
| `docker-compose.yml` | Main compose file with model runner services |
| `docker-compose.model-runner.yml` | Standalone model runner configuration |
| `model-runner.env` | Environment variables for configuration |
| `integrate_model_runner.sh` | Integration script for existing stacks |
| `docker-compose.integration-example.yml` | Example integration with trading services |

## 🚀 Quick Start

### Option 1: Use with Existing Stack

```bash
# Run integration script
./integrate_model_runner.sh

# Start services
docker-compose up -d

# Test API
curl http://localhost:11434/api/tags
```

### Option 2: Standalone Model Runner

```bash
# Use the dedicated compose file
docker-compose -f docker-compose.model-runner.yml up -d

# Start with a specific profile enabled
docker-compose -f docker-compose.model-runner.yml --profile llama-cpp up -d
```

## 🔧 Configuration

### Environment Variables (`model-runner.env`)

```bash
# AMD GPU Configuration
HSA_OVERRIDE_GFX_VERSION=11.0.0   # AMD GPU version override
GPU_LAYERS=35                     # Layers to offload to GPU
THREADS=8                         # CPU threads
BATCH_SIZE=512                    # Batch processing size
CONTEXT_SIZE=4096                 # Context window size

# API Configuration
MODEL_RUNNER_PORT=11434           # Main API port
LLAMA_CPP_PORT=8000               # Llama.cpp server port
METRICS_PORT=9090                 # Metrics endpoint
```

### Ports Exposed

| Port | Service | Purpose |
|------|---------|---------|
| 11434 | Docker Model Runner | Ollama-compatible API |
| 8083 | Docker Model Runner | Alternative API port |
| 8000 | Llama.cpp Server | Advanced llama.cpp features |
| 9090 | Metrics | Prometheus metrics |
| 8050 | Trading Dashboard | Example dashboard |
| 9091 | Model Monitor | Performance monitoring |

## 🛠️ Usage Examples

### Basic Model Operations

```bash
# List available models
curl http://localhost:11434/api/tags

# Pull a model
docker-compose exec docker-model-runner /app/model-runner pull ai/smollm2:135M-Q4_K_M

# Run a model
docker-compose exec docker-model-runner /app/model-runner run ai/smollm2:135M-Q4_K_M "Hello!"

# Pull a Hugging Face model
docker-compose exec docker-model-runner /app/model-runner pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```

### API Usage

```bash
# Generate text (Ollama-style endpoint)
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "Analyze market trends",
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Chat completion
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "messages": [{"role": "user", "content": "What is your analysis?"}]
  }'
```

### Integration with Your Services

```python
# Example: Python integration
import requests


class AIModelClient:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url

    def generate(self, prompt, model="ai/smollm2:135M-Q4_K_M"):
        response = requests.post(
            f"{self.base_url}/api/generate",
            json={"model": model, "prompt": prompt},
        )
        response.raise_for_status()
        return response.json()

    def chat(self, messages, model="ai/smollm2:135M-Q4_K_M"):
        response = requests.post(
            f"{self.base_url}/api/chat",
            json={"model": model, "messages": messages},
        )
        response.raise_for_status()
        return response.json()


# Usage
client = AIModelClient()
analysis = client.generate("Analyze BTC/USDT market")
```

## 🔗 Service Integration

### With Existing Trading Dashboard

```yaml
# Add to your existing docker-compose.yml
services:
  your-trading-service:
    # ... your existing config
    environment:
      - MODEL_RUNNER_URL=http://docker-model-runner:11434
    depends_on:
      - docker-model-runner
    networks:
      - model-runner-network
```

### Internal Networking

Services communicate over Docker networks using their service names (see the sketch after this list):

- `http://docker-model-runner:11434` - Internal API calls
- `http://llama-cpp-server:8000` - Advanced features
- `http://model-manager:8001` - Management API

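As a minimal sketch of how a containerized service might call the runner over the internal network: the snippet below reads the `MODEL_RUNNER_URL` environment variable (as injected in the compose example above) and falls back to the internal service name. The `/api/tags` path assumes the Ollama-compatible API listed in the ports table.

```python
import os

import requests

# MODEL_RUNNER_URL is injected via docker-compose (see the example above);
# fall back to the internal service name when running inside the network.
BASE_URL = os.environ.get("MODEL_RUNNER_URL", "http://docker-model-runner:11434")


def runner_is_reachable() -> bool:
    """Return True if the model runner answers on its model-listing endpoint."""
    try:
        return requests.get(f"{BASE_URL}/api/tags", timeout=5).ok
    except requests.RequestException:
        return False


if __name__ == "__main__":
    print("model runner reachable:", runner_is_reachable())
```
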
## 📊 Monitoring and Health Checks

### Health Endpoints

```bash
# Main service health
curl http://localhost:11434/api/tags

# Metrics endpoint
curl http://localhost:9090/metrics

# Model monitor (if enabled)
curl http://localhost:9091/health
curl http://localhost:9091/models
curl http://localhost:9091/performance
```

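Startup scripts that depend on the runner can poll the health endpoint until the service is up before issuing requests. A minimal sketch, assuming only the `/api/tags` endpoint shown above:

```python
import time

import requests


def wait_for_runner(base_url="http://localhost:11434", timeout_s=120, interval_s=2):
    """Poll /api/tags until the model runner responds, or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/api/tags", timeout=5).ok:
                return
        except requests.RequestException:
            pass  # not up yet; keep polling
        time.sleep(interval_s)
    raise TimeoutError(f"model runner at {base_url} not ready after {timeout_s}s")


wait_for_runner()
print("model runner is ready")
```
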
### Logs

```bash
# View all logs
docker-compose logs -f

# Specific service logs
docker-compose logs -f docker-model-runner
docker-compose logs -f llama-cpp-server
```

## ⚡ Performance Tuning

### GPU Optimization

```bash
# Adjust GPU layers based on available VRAM
GPU_LAYERS=35   # For 8GB VRAM
GPU_LAYERS=50   # For 12GB VRAM
GPU_LAYERS=65   # For 16GB+ VRAM

# CPU threading
THREADS=8       # Match CPU cores
BATCH_SIZE=512  # Increase for better throughput
```

### Memory Management

```bash
# Context size affects memory usage
CONTEXT_SIZE=4096   # Standard context
CONTEXT_SIZE=8192   # Larger context (more memory)
CONTEXT_SIZE=2048   # Smaller context (less memory)
```

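The memory cost of a larger context comes mainly from the KV cache, which grows linearly with context length. A rough back-of-the-envelope sketch of that relationship; the layer, head, and dimension values below are illustrative assumptions, not measurements of any particular model:

```python
def kv_cache_bytes(context_size, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Rough KV-cache size: K and V tensors per layer, one entry per token.

    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return 2 * n_layers * context_size * n_kv_heads * head_dim * bytes_per_elem


# Illustrative numbers for a small model (assumed, not measured):
for ctx in (2048, 4096, 8192):
    mib = kv_cache_bytes(ctx, n_layers=16, n_kv_heads=8, head_dim=64) / 2**20
    print(f"CONTEXT_SIZE={ctx}: ~{mib:.0f} MiB KV cache")
```
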
## 🧪 Testing and Validation

### Run Integration Tests

```bash
# Test basic connectivity
docker-compose exec docker-model-runner curl -f http://localhost:11434/api/tags

# Test model loading
docker-compose exec docker-model-runner /app/model-runner run ai/smollm2:135M-Q4_K_M "test"

# Test parallel requests (wait collects the backgrounded curls)
for i in {1..5}; do
  curl -X POST http://localhost:11434/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model": "ai/smollm2:135M-Q4_K_M", "prompt": "test '$i'"}' &
done
wait
```

### Benchmarking

```bash
# Simple benchmark
time curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/smollm2:135M-Q4_K_M", "prompt": "Write a detailed analysis of market trends"}'
```

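For more than a single timing, a small client-side benchmark can report latency statistics over repeated requests. A sketch against the same generate endpoint; the sample size and prompt are arbitrary:

```python
import statistics
import time

import requests

URL = "http://localhost:11434/api/generate"
PAYLOAD = {"model": "ai/smollm2:135M-Q4_K_M", "prompt": "Analyze market trends"}

latencies = []
for _ in range(10):  # arbitrary sample size
    start = time.perf_counter()
    response = requests.post(URL, json=PAYLOAD, timeout=120)
    response.raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"mean:   {statistics.mean(latencies):.2f}s")
print(f"median: {statistics.median(latencies):.2f}s")
print(f"max:    {max(latencies):.2f}s")
```
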
## 🛡️ Security Considerations

### Network Security

```yaml
# Restrict network access
services:
  docker-model-runner:
    networks:
      - internal-network
    # No external ports for internal-only services

networks:
  internal-network:
    internal: true
```

### API Security

```bash
# Use API keys (if supported)
MODEL_RUNNER_API_KEY=your-secret-key

# Enable authentication
MODEL_RUNNER_AUTH_ENABLED=true
```

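If your build enforces the toggles above, clients need to send the key with each request. A sketch under the assumption of bearer-token auth; the header convention is a guess, not a documented contract of the runner, so check your build's docs:

```python
import os

import requests

# MODEL_RUNNER_API_KEY comes from model-runner.env or the service environment.
API_KEY = os.environ["MODEL_RUNNER_API_KEY"]

# Assumed convention: bearer-token auth. Adjust the header to whatever your
# runner build actually expects.
response = requests.post(
    "http://localhost:11434/api/generate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "ai/smollm2:135M-Q4_K_M", "prompt": "auth check"},
)
response.raise_for_status()
```
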
## 📈 Scaling and Production

### Multiple GPU Support

```yaml
# Use multiple GPUs
environment:
  - CUDA_VISIBLE_DEVICES=0,1   # Use GPU 0 and 1
  - GPU_LAYERS=35              # Layers per GPU
```

### Load Balancing

```yaml
# Multiple model runner instances
services:
  model-runner-1:
    # ... config
    deploy:
      placement:
        constraints:
          - node.labels.gpu==true

  model-runner-2:
    # ... config
    deploy:
      placement:
        constraints:
          - node.labels.gpu==true
```

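On the client side, a simple way to spread load across the replicas above is round-robin with failover. A minimal sketch; the per-instance URLs are assumptions, since the compose fragment does not show how each replica publishes its port:

```python
import itertools

import requests

# Assumed per-instance URLs; adjust to however your replicas expose ports.
INSTANCES = ["http://model-runner-1:11434", "http://model-runner-2:11434"]
_rotation = itertools.cycle(INSTANCES)


def generate(prompt, model="ai/smollm2:135M-Q4_K_M"):
    """Try each instance in round-robin order, failing over on errors."""
    last_error = None
    for _ in range(len(INSTANCES)):
        base_url = next(_rotation)
        try:
            response = requests.post(
                f"{base_url}/api/generate",
                json={"model": model, "prompt": prompt},
                timeout=60,
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException as exc:
            last_error = exc  # instance down or erroring; try the next one
    raise RuntimeError("all model runner instances failed") from last_error
```
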
## 🔧 Troubleshooting

### Common Issues

1. **GPU not detected**

   ```bash
   # Check NVIDIA drivers
   nvidia-smi

   # Check Docker GPU support
   docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
   ```

2. **Port conflicts**

   ```bash
   # Check port usage
   netstat -tulpn | grep :11434

   # Change ports in model-runner.env
   MODEL_RUNNER_PORT=11435
   ```

3. **Model loading failures**

   ```bash
   # Check available disk space
   df -h

   # Check model file permissions
   ls -la models/
   ```

### Debug Commands

```bash
# Full service logs
docker-compose logs

# Container resource usage
docker stats

# Model runner debug info
docker-compose exec docker-model-runner /app/model-runner --help

# Test internal connectivity
docker-compose exec trading-dashboard curl http://docker-model-runner:11434/api/tags
```

## 📚 Advanced Features

### Custom Model Loading

```bash
# Load a custom GGUF model
docker-compose exec docker-model-runner /app/model-runner pull /models/custom-model.gguf

# Use a specific model file
docker-compose exec docker-model-runner /app/model-runner run /models/my-model.gguf "prompt"
```

### Batch Processing

```bash
# Process multiple prompts
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": ["prompt1", "prompt2", "prompt3"],
    "batch_size": 3
  }'
```

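If your runner build does not accept an array `prompt`, a similar effect can be approximated client-side by issuing the requests concurrently. A sketch using a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:11434/api/generate"
MODEL = "ai/smollm2:135M-Q4_K_M"


def generate(prompt):
    response = requests.post(URL, json={"model": MODEL, "prompt": prompt}, timeout=120)
    response.raise_for_status()
    return response.json()


prompts = ["prompt1", "prompt2", "prompt3"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(generate, prompts))
```
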
### Streaming Responses

```bash
# Enable streaming
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "long analysis request",
    "stream": true
  }'
```

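With `"stream": true`, Ollama-style endpoints typically return one JSON object per line. A minimal consumer sketch under that assumption; the `response` and `done` field names follow the Ollama convention and may differ in your build:

```python
import json

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "ai/smollm2:135M-Q4_K_M",
        "prompt": "long analysis request",
        "stream": True,
    },
    stream=True,  # let requests yield the body incrementally
    timeout=300,
)
resp.raise_for_status()

for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    # Assumed Ollama-style fields: "response" carries the generated text,
    # "done" marks the final chunk.
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break
print()
```
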
This integration provides a complete AI model-serving environment that plugs into your existing trading infrastructure, with parallel request handling and GPU acceleration built in.