# Docker Model Runner Integration

This guide shows how to integrate Docker Model Runner with your existing Docker stack for AI-powered trading applications.

## 📁 Files Overview

| File | Purpose |
|------|---------|
| `docker-compose.yml` | Main compose file with model runner services |
| `docker-compose.model-runner.yml` | Standalone model runner configuration |
| `model-runner.env` | Environment variables for configuration |
| `integrate_model_runner.sh` | Integration script for existing stacks |
| `docker-compose.integration-example.yml` | Example integration with trading services |

## 🚀 Quick Start

### Option 1: Use with Existing Stack

```bash
# Run integration script
./integrate_model_runner.sh

# Start services
docker-compose up -d

# Test API
curl http://localhost:11434/api/tags
```

### Option 2: Standalone Model Runner

```bash
# Use dedicated compose file
docker-compose -f docker-compose.model-runner.yml up -d

# Test with specific profile
docker-compose -f docker-compose.model-runner.yml --profile llama-cpp up -d
```

## 🔧 Configuration

### Environment Variables (`model-runner.env`)

```bash
# AMD GPU Configuration
HSA_OVERRIDE_GFX_VERSION=11.0.0  # AMD GPU version override
GPU_LAYERS=35                    # Layers to offload to GPU
THREADS=8                        # CPU threads
BATCH_SIZE=512                   # Batch processing size
CONTEXT_SIZE=4096                # Context window size

# API Configuration
MODEL_RUNNER_PORT=11434          # Main API port
LLAMA_CPP_PORT=8000              # Llama.cpp server port
METRICS_PORT=9090                # Metrics endpoint
```

### Ports Exposed

| Port | Service | Purpose |
|------|---------|---------|
| 11434 | Docker Model Runner | Ollama-compatible API |
| 8083 | Docker Model Runner | Alternative API port |
| 8000 | Llama.cpp Server | Advanced llama.cpp features |
| 9090 | Metrics | Prometheus metrics |
| 8050 | Trading Dashboard | Example dashboard |
| 9091 | Model Monitor | Performance monitoring |

## 🛠️ Usage Examples

### Basic Model Operations

```bash
# List available models
curl http://localhost:11434/api/tags

# Pull a model
docker-compose exec docker-model-runner /app/model-runner pull ai/smollm2:135M-Q4_K_M

# Run a model
docker-compose exec docker-model-runner /app/model-runner run ai/smollm2:135M-Q4_K_M "Hello!"

# Pull Hugging Face model
docker-compose exec docker-model-runner /app/model-runner pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```
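If you prefer to script this check, here is a minimal Python sketch that confirms the runner is reachable and prints the models already pulled. It assumes the `/api/tags` endpoint used above returns Ollama-style JSON with a top-level `models` list; adjust the parsing if your runner's response differs.

```python
import requests

MODEL_RUNNER_URL = "http://localhost:11434"  # matches MODEL_RUNNER_PORT above

def list_models(base_url: str = MODEL_RUNNER_URL) -> list[str]:
    """Return the names of locally available models, or raise if the API is down."""
    response = requests.get(f"{base_url}/api/tags", timeout=10)
    response.raise_for_status()
    # Assumed Ollama-style payload: {"models": [{"name": "..."}, ...]}
    return [m.get("name", "") for m in response.json().get("models", [])]

if __name__ == "__main__":
    models = list_models()
    print(f"Model runner is up, {len(models)} model(s) available:")
    for name in models:
        print(f"  - {name}")
```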
### API Usage

```bash
# Generate text (Ollama-compatible)
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "Analyze market trends",
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Chat completion
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "messages": [{"role": "user", "content": "What is your analysis?"}]
  }'
```

### Integration with Your Services

```python
# Example: Python integration
import requests

class AIModelClient:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url

    def generate(self, prompt, model="ai/smollm2:135M-Q4_K_M"):
        response = requests.post(
            f"{self.base_url}/api/generate",
            json={"model": model, "prompt": prompt}
        )
        return response.json()

    def chat(self, messages, model="ai/smollm2:135M-Q4_K_M"):
        response = requests.post(
            f"{self.base_url}/api/chat",
            json={"model": model, "messages": messages}
        )
        return response.json()

# Usage
client = AIModelClient()
analysis = client.generate("Analyze BTC/USDT market")
```

## 🔗 Service Integration

### With Existing Trading Dashboard

```yaml
# Add to your existing docker-compose.yml
services:
  your-trading-service:
    # ... your existing config
    environment:
      - MODEL_RUNNER_URL=http://docker-model-runner:11434
    depends_on:
      - docker-model-runner
    networks:
      - model-runner-network
```

### Internal Networking

Services communicate using Docker networks:

- `http://docker-model-runner:11434` - Internal API calls
- `http://llama-cpp-server:8000` - Advanced features
- `http://model-manager:8001` - Management API

## 📊 Monitoring and Health Checks

### Health Endpoints

```bash
# Main service health
curl http://localhost:11434/api/tags

# Metrics endpoint
curl http://localhost:9090/metrics

# Model monitor (if enabled)
curl http://localhost:9091/health
curl http://localhost:9091/models
curl http://localhost:9091/performance
```

### Logs

```bash
# View all logs
docker-compose logs -f

# Specific service logs
docker-compose logs -f docker-model-runner
docker-compose logs -f llama-cpp-server
```

## ⚡ Performance Tuning

### GPU Optimization

```bash
# Adjust GPU layers based on VRAM
GPU_LAYERS=35   # For 8GB VRAM
GPU_LAYERS=50   # For 12GB VRAM
GPU_LAYERS=65   # For 16GB+ VRAM

# CPU threading
THREADS=8       # Match CPU cores
BATCH_SIZE=512  # Increase for better throughput
```

### Memory Management

```bash
# Context size affects memory usage
CONTEXT_SIZE=4096  # Standard context
CONTEXT_SIZE=8192  # Larger context (more memory)
CONTEXT_SIZE=2048  # Smaller context (less memory)
```

## 🧪 Testing and Validation

### Run Integration Tests

```bash
# Test basic connectivity
docker-compose exec docker-model-runner curl -f http://localhost:11434/api/tags

# Test model loading
docker-compose exec docker-model-runner /app/model-runner run ai/smollm2:135M-Q4_K_M "test"

# Test parallel requests
for i in {1..5}; do
  curl -X POST http://localhost:11434/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model": "ai/smollm2:135M-Q4_K_M", "prompt": "test '$i'"}' &
done
```

### Benchmarking

```bash
# Simple benchmark
time curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/smollm2:135M-Q4_K_M", "prompt": "Write a detailed analysis of market trends"}'
```
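For a slightly more informative benchmark than a single `time curl`, the sketch below fires several `/api/generate` requests in parallel from Python and reports per-request latency. It is a minimal sketch under the same assumptions as the client above (JSON responses from `/api/generate`, with `"stream": false` assumed to disable streaming); the model name and concurrency level are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

MODEL_RUNNER_URL = "http://localhost:11434"
MODEL = "ai/smollm2:135M-Q4_K_M"  # placeholder; use any model you have pulled
CONCURRENCY = 5                   # matches the shell loop above

def timed_generate(prompt: str) -> float:
    """Send one generate request and return its latency in seconds."""
    start = time.perf_counter()
    response = requests.post(
        f"{MODEL_RUNNER_URL}/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False},  # assumed non-streaming mode
        timeout=120,
    )
    response.raise_for_status()
    return time.perf_counter() - start

if __name__ == "__main__":
    prompts = [f"test {i}" for i in range(CONCURRENCY)]
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = list(pool.map(timed_generate, prompts))
    for i, latency in enumerate(latencies):
        print(f"request {i}: {latency:.2f}s")
    print(f"average: {sum(latencies) / len(latencies):.2f}s")
```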
## 🛡️ Security Considerations

### Network Security

```yaml
# Restrict network access
services:
  docker-model-runner:
    networks:
      - internal-network
    # No external ports for internal-only services

networks:
  internal-network:
    internal: true
```

### API Security

```bash
# Use API keys (if supported)
MODEL_RUNNER_API_KEY=your-secret-key

# Enable authentication
MODEL_RUNNER_AUTH_ENABLED=true
```

## 📈 Scaling and Production

### Multiple GPU Support

```yaml
# Use multiple GPUs
environment:
  - CUDA_VISIBLE_DEVICES=0,1  # Use GPU 0 and 1
  - GPU_LAYERS=35             # Layers per GPU
```

### Load Balancing

```yaml
# Multiple model runner instances
services:
  model-runner-1:
    # ... config
    deploy:
      placement:
        constraints:
          - node.labels.gpu==true
  model-runner-2:
    # ... config
    deploy:
      placement:
        constraints:
          - node.labels.gpu==true
```

## 🔧 Troubleshooting

### Common Issues

1. **GPU not detected**

   ```bash
   # Check NVIDIA drivers
   nvidia-smi

   # Check Docker GPU support
   docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
   ```

2. **Port conflicts**

   ```bash
   # Check port usage
   netstat -tulpn | grep :11434

   # Change ports in model-runner.env
   MODEL_RUNNER_PORT=11435
   ```

3. **Model loading failures**

   ```bash
   # Check available disk space
   df -h

   # Check model file permissions
   ls -la models/
   ```

### Debug Commands

```bash
# Full service logs
docker-compose logs

# Container resource usage
docker stats

# Model runner debug info
docker-compose exec docker-model-runner /app/model-runner --help

# Test internal connectivity
docker-compose exec trading-dashboard curl http://docker-model-runner:11434/api/tags
```

## 📚 Advanced Features

### Custom Model Loading

```bash
# Load custom GGUF model
docker-compose exec docker-model-runner /app/model-runner pull /models/custom-model.gguf

# Use specific model file
docker-compose exec docker-model-runner /app/model-runner run /models/my-model.gguf "prompt"
```

### Batch Processing

```bash
# Process multiple prompts
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": ["prompt1", "prompt2", "prompt3"],
    "batch_size": 3
  }'
```

### Streaming Responses

```bash
# Enable streaming
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "long analysis request",
    "stream": true
  }'
```

This integration provides a complete AI model running environment that seamlessly integrates with your existing trading infrastructure while providing advanced parallelism and GPU acceleration capabilities.
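For reference, here is a minimal Python sketch of consuming the streaming endpoint shown above from application code. It assumes the Ollama-style convention of newline-delimited JSON chunks, each carrying a `response` fragment and a final `done` flag; adapt the field names if your runner emits a different chunk format.

```python
import json

import requests

MODEL_RUNNER_URL = "http://localhost:11434"

def stream_generate(prompt: str, model: str = "ai/smollm2:135M-Q4_K_M") -> str:
    """Stream a completion chunk-by-chunk and return the assembled text."""
    full_text = []
    with requests.post(
        f"{MODEL_RUNNER_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)            # assumed: one JSON object per line
            fragment = chunk.get("response", "")
            print(fragment, end="", flush=True)
            full_text.append(fragment)
            if chunk.get("done"):               # assumed: final chunk sets "done": true
                break
    print()
    return "".join(full_text)

if __name__ == "__main__":
    stream_generate("long analysis request")
```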