# Docker Model Runner Integration

This guide shows how to integrate Docker Model Runner with your existing Docker stack for AI-powered trading applications.

## 📁 Files Overview

| File | Purpose |
|------|---------|
| `docker-compose.yml` | Main compose file with model runner services |
| `docker-compose.model-runner.yml` | Standalone model runner configuration |
| `model-runner.env` | Environment variables for configuration |
| `integrate_model_runner.sh` | Integration script for existing stacks |
| `docker-compose.integration-example.yml` | Example integration with trading services |

## 🚀 Quick Start

### Option 1: Use with Existing Stack

```bash
# Run integration script
./integrate_model_runner.sh

# Start services
docker-compose up -d

# Test API
curl http://localhost:11434/api/tags
```

### Option 2: Standalone Model Runner

```bash
# Use dedicated compose file
docker-compose -f docker-compose.model-runner.yml up -d

# Test with specific profile
docker-compose -f docker-compose.model-runner.yml --profile llama-cpp up -d
```

## 🔧 Configuration

### Environment Variables (`model-runner.env`)

```bash
# AMD GPU Configuration
HSA_OVERRIDE_GFX_VERSION=11.0.0   # AMD GPU version override
GPU_LAYERS=35                     # Layers to offload to GPU
THREADS=8                         # CPU threads
BATCH_SIZE=512                    # Batch processing size
CONTEXT_SIZE=4096                 # Context window size

# API Configuration
MODEL_RUNNER_PORT=11434           # Main API port
LLAMA_CPP_PORT=8000               # Llama.cpp server port
METRICS_PORT=9090                 # Metrics endpoint
```
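
Depending on how the compose files consume these variables (via `env_file:` entries or shell substitution), you may need to pass the file to Compose explicitly, e.g. `docker-compose --env-file model-runner.env up -d`.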

### Ports Exposed

| Port | Service | Purpose |
|------|---------|---------|
| 11434 | Docker Model Runner | Ollama-compatible API |
| 8083 | Docker Model Runner | Alternative API port |
| 8000 | Llama.cpp Server | Advanced llama.cpp features |
| 9090 | Metrics | Prometheus metrics |
| 8050 | Trading Dashboard | Example dashboard |
| 9091 | Model Monitor | Performance monitoring |

## 🛠️ Usage Examples

### Basic Model Operations

```bash
# List available models
curl http://localhost:11434/api/tags

# Pull a model
docker-compose exec docker-model-runner /app/model-runner pull ai/smollm2:135M-Q4_K_M

# Run a model
docker-compose exec docker-model-runner /app/model-runner run ai/smollm2:135M-Q4_K_M "Hello!"

# Pull Hugging Face model
docker-compose exec docker-model-runner /app/model-runner pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```

### API Usage

```bash
# Generate text (Ollama-compatible endpoint)
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "Analyze market trends",
    "temperature": 0.7,
    "max_tokens": 100
  }'

# Chat completion
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "messages": [{"role": "user", "content": "What is your analysis?"}]
  }'
```

### Integration with Your Services

```python
# Example: Python integration
import requests


class AIModelClient:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url

    def generate(self, prompt, model="ai/smollm2:135M-Q4_K_M"):
        """Single-prompt completion via the /api/generate endpoint."""
        response = requests.post(
            f"{self.base_url}/api/generate",
            json={"model": model, "prompt": prompt},
            timeout=60,  # inference can be slow; don't hang forever
        )
        response.raise_for_status()
        return response.json()

    def chat(self, messages, model="ai/smollm2:135M-Q4_K_M"):
        """Multi-turn conversation via the /api/chat endpoint."""
        response = requests.post(
            f"{self.base_url}/api/chat",
            json={"model": model, "messages": messages},
            timeout=60,
        )
        response.raise_for_status()
        return response.json()


# Usage
client = AIModelClient()
analysis = client.generate("Analyze BTC/USDT market")
```

## 🔗 Service Integration

### With Existing Trading Dashboard

```yaml
# Add to your existing docker-compose.yml
services:
  your-trading-service:
    # ... your existing config
    environment:
      - MODEL_RUNNER_URL=http://docker-model-runner:11434
    depends_on:
      - docker-model-runner
    networks:
      - model-runner-network
```

### Internal Networking

Services communicate using Docker networks (see the connectivity check below):

- `http://docker-model-runner:11434` - Internal API calls
- `http://llama-cpp-server:8000` - Advanced features
- `http://model-manager:8001` - Management API
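
From another container on the same network, these hostnames resolve directly. A minimal connectivity check, assuming the `MODEL_RUNNER_URL` variable from the integration example above:

```python
import os

import requests

# MODEL_RUNNER_URL is set in the compose file; fall back to the internal hostname.
base_url = os.environ.get("MODEL_RUNNER_URL", "http://docker-model-runner:11434")

# /api/tags doubles as a cheap liveness probe and lists the available models.
print(requests.get(f"{base_url}/api/tags", timeout=5).json())
```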

## 📊 Monitoring and Health Checks

### Health Endpoints

```bash
# Main service health
curl http://localhost:11434/api/tags

# Metrics endpoint
curl http://localhost:9090/metrics

# Model monitor (if enabled)
curl http://localhost:9091/health
curl http://localhost:9091/models
curl http://localhost:9091/performance
```
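
If a dependent service should wait for the model runner before serving traffic, polling the tags endpoint is enough. A minimal sketch (the retry count and interval are arbitrary choices):

```python
import time

import requests


def wait_for_model_runner(url="http://localhost:11434/api/tags", retries=30, delay=2.0):
    """Block until the model runner responds, or raise after the last retry."""
    for _ in range(retries):
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return
        except requests.RequestException:
            pass  # not up yet; keep polling
        time.sleep(delay)
    raise RuntimeError(f"model runner not reachable at {url}")


wait_for_model_runner()
```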

### Logs

```bash
# View all logs
docker-compose logs -f

# Specific service logs
docker-compose logs -f docker-model-runner
docker-compose logs -f llama-cpp-server
```

## ⚡ Performance Tuning

### GPU Optimization

```bash
# Adjust GPU layers based on VRAM
GPU_LAYERS=35   # For 8GB VRAM
GPU_LAYERS=50   # For 12GB VRAM
GPU_LAYERS=65   # For 16GB+ VRAM

# CPU threading
THREADS=8       # Match CPU cores
BATCH_SIZE=512  # Increase for better throughput
```

### Memory Management

```bash
# Context size affects memory usage
CONTEXT_SIZE=4096  # Standard context
CONTEXT_SIZE=8192  # Larger context (more memory)
CONTEXT_SIZE=2048  # Smaller context (less memory)
```
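
As a rough rule of thumb, the KV cache grows linearly with context length, so doubling `CONTEXT_SIZE` roughly doubles that component of memory use; the memory needed for the model weights themselves is unaffected.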

## 🧪 Testing and Validation

### Run Integration Tests

```bash
# Test basic connectivity
docker-compose exec docker-model-runner curl -f http://localhost:11434/api/tags

# Test model loading
docker-compose exec docker-model-runner /app/model-runner run ai/smollm2:135M-Q4_K_M "test"

# Test parallel requests
for i in {1..5}; do
  curl -X POST http://localhost:11434/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model": "ai/smollm2:135M-Q4_K_M", "prompt": "test '$i'"}' &
done
wait  # block until all background requests complete
```

### Benchmarking

```bash
# Simple benchmark
time curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/smollm2:135M-Q4_K_M", "prompt": "Write a detailed analysis of market trends"}'
```
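
For something slightly more informative than `time`, the sketch below issues a few sequential requests and reports latency statistics (the request count and prompt are arbitrary):

```python
import statistics
import time

import requests

URL = "http://localhost:11434/api/generate"
PAYLOAD = {
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "Write a detailed analysis of market trends",
}

latencies = []
for _ in range(5):
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=120).raise_for_status()
    latencies.append(time.perf_counter() - start)

print(f"mean={statistics.mean(latencies):.2f}s min={min(latencies):.2f}s max={max(latencies):.2f}s")
```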

## 🛡️ Security Considerations

### Network Security

```yaml
# Restrict network access
services:
  docker-model-runner:
    networks:
      - internal-network
    # No external ports for internal-only services

networks:
  internal-network:
    internal: true
```

### API Security

```bash
# Use API keys (if supported)
MODEL_RUNNER_API_KEY=your-secret-key

# Enable authentication
MODEL_RUNNER_AUTH_ENABLED=true
```
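
How clients present the key depends on the model runner build; a common convention is a bearer token, so a client call might look like the sketch below (hypothetical, assuming bearer auth is what `MODEL_RUNNER_AUTH_ENABLED` turns on):

```python
import os

import requests

# Hypothetical: send the configured API key as a bearer token on every request.
headers = {"Authorization": f"Bearer {os.environ['MODEL_RUNNER_API_KEY']}"}
response = requests.get("http://localhost:11434/api/tags", headers=headers, timeout=5)
response.raise_for_status()
```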

## 📈 Scaling and Production

### Multiple GPU Support

```yaml
# Use multiple GPUs
environment:
  - CUDA_VISIBLE_DEVICES=0,1  # Use GPU 0 and 1
  - GPU_LAYERS=35             # Layers per GPU
```

### Load Balancing

```yaml
# Multiple model runner instances
services:
  model-runner-1:
    # ... config
    deploy:
      placement:
        constraints:
          - node.labels.gpu==true

  model-runner-2:
    # ... config
    deploy:
      placement:
        constraints:
          - node.labels.gpu==true
```

## 🔧 Troubleshooting

### Common Issues

1. **GPU not detected**
   ```bash
   # Check NVIDIA drivers
   nvidia-smi

   # Check Docker GPU support
   docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
   ```

2. **Port conflicts**
   ```bash
   # Check port usage
   netstat -tulpn | grep :11434

   # Change ports in model-runner.env
   MODEL_RUNNER_PORT=11435
   ```

3. **Model loading failures**
   ```bash
   # Check available disk space
   df -h

   # Check model file permissions
   ls -la models/
   ```

### Debug Commands

```bash
# Full service logs
docker-compose logs

# Container resource usage
docker stats

# Model runner debug info
docker-compose exec docker-model-runner /app/model-runner --help

# Test internal connectivity
docker-compose exec trading-dashboard curl http://docker-model-runner:11434/api/tags
```

## 📚 Advanced Features

### Custom Model Loading

```bash
# Load custom GGUF model
docker-compose exec docker-model-runner /app/model-runner pull /models/custom-model.gguf

# Use specific model file
docker-compose exec docker-model-runner /app/model-runner run /models/my-model.gguf "prompt"
```

### Batch Processing

```bash
# Process multiple prompts
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": ["prompt1", "prompt2", "prompt3"],
    "batch_size": 3
  }'
```

### Streaming Responses

```bash
# Enable streaming
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "long analysis request",
    "stream": true
  }'
```
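
With `"stream": true`, Ollama-style endpoints return newline-delimited JSON chunks. A minimal consumer, assuming that format:

```python
import json

import requests

payload = {
    "model": "ai/smollm2:135M-Q4_K_M",
    "prompt": "long analysis request",
    "stream": True,
}

# stream=True tells requests to yield the body incrementally instead of buffering it.
with requests.post("http://localhost:11434/api/generate", json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```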

This integration provides a complete AI model-serving environment that plugs into your existing trading infrastructure, with parallel request handling and GPU acceleration built in.