AMD Strix Halo Toolboxes Docker Compose
This Docker Compose setup provides pre-built containers for running LLMs on AMD Ryzen AI Max "Strix Halo" integrated GPUs.
Prerequisites
- AMD Ryzen AI Max "Strix Halo" system (e.g., Ryzen AI MAX+ 395)
- Docker and Docker Compose installed
- At least 128GB RAM recommended for larger models
- Proper kernel configuration for unified memory
Kernel Configuration
Add these boot parameters to /etc/default/grub:
amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
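For example, the kernel command line variable in /etc/default/grub would end up looking something like the following (whether it is GRUB_CMDLINE_LINUX or GRUB_CMDLINE_LINUX_DEFAULT depends on your distribution, and "rhgb quiet" here just stands in for whatever parameters are already there):
GRUB_CMDLINE_LINUX="rhgb quiet amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"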
Then apply:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
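After the reboot, you can confirm the parameters are active:
# Output should include amd_iommu=off, amdgpu.gttsize and ttm.pages_limit
cat /proc/cmdline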
Usage
Start all services
docker-compose up -d
Start specific backend
# ROCm backend (best for prompt processing)
docker-compose up -d amd-strix-halo-llama-rocm
# Vulkan RADV backend (fastest token generation)
docker-compose up -d amd-strix-halo-llama-vulkan-radv
# Vulkan AMDVLK backend
docker-compose up -d amd-strix-halo-llama-vulkan-amdvlk
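To check that a backend came up cleanly before using it, inspect its status and logs, for example:
# Show running services
docker-compose ps
# Follow the logs of the ROCm backend
docker-compose logs -f amd-strix-halo-llama-rocm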
Access containers
# Enter ROCm container
docker exec -it amd-strix-halo-llama-rocm bash
# Enter Vulkan RADV container
docker exec -it amd-strix-halo-llama-vulkan-radv bash
# Enter Vulkan AMDVLK container
docker exec -it amd-strix-halo-llama-vulkan-amdvlk bash
Directory Structure
amd-strix-halo-toolboxes/
├── models/ # Mount point for GGUF models
├── data/ # Mount point for data
└── amd-strix-halo-toolboxes.yml
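The compose file mounts models/ and data/ into each container and passes the GPU devices through. As a rough sketch only (the real image names, container-side paths, and options come from amd-strix-halo-toolboxes.yml), a service entry looks something like:
services:
  amd-strix-halo-llama-rocm:
    image: <toolbox-image>        # placeholder; use the image defined in amd-strix-halo-toolboxes.yml
    devices:
      - /dev/kfd                  # ROCm compute interface
      - /dev/dri                  # GPU render nodes
    volumes:
      - ./models:/models          # assumed container-side mount points
      - ./data:/data
    ports:
      - "8080:8080"               # ROCm backend port, see Ports below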
Download Models
Inside the container, download GGUF models:
# Example: Download Llama-2-7B
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
# Run the model
./llama.cpp/main -m llama-2-7b-chat.Q4_K_M.gguf -n 128 --repeat_penalty 1.1
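To expose the model over HTTP on the container's published port instead of running it interactively (see Ports below), llama.cpp also ships a server; depending on the build inside the image the binary may be named llama-server or server, so adjust the path to match:
# Serve the model on port 8080 (ROCm backend example)
./llama.cpp/llama-server -m llama-2-7b-chat.Q4_K_M.gguf --host 0.0.0.0 --port 8080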
Backend Performance
Based on benchmarks:
- ROCm 6.4.3 + ROCWMMA (hipBLASLt): Best for prompt processing
- Vulkan RADV: Fastest for token generation
- Vulkan AMDVLK: Good balance
Memory Planning
Use the VRAM estimator inside containers:
python3 gguf-vram-estimator.py your-model.gguf --contexts 4096 32768 1048576
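As a rough sanity check on what the estimator reports: KV-cache memory grows linearly with context length. For a hypothetical model with 32 layers, 8 KV heads of dimension 128, and an FP16 cache, each token costs 2 (K and V) × 32 × 8 × 128 × 2 bytes = 128 KiB, so a 32768-token context needs about 4 GiB for the KV cache alone, on top of the model weights.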
Ports
- ROCm backend: 8080
- Vulkan RADV backend: 8081
- Vulkan AMDVLK backend: 8082
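These ports are only useful once a server is actually listening inside the corresponding container (for example the llama.cpp server started as shown above). Assuming that, you can query it from the host:
# Simple completion request against the ROCm backend
curl http://localhost:8080/completion -d '{"prompt": "Hello", "n_predict": 64}'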
Troubleshooting
- Permission issues: Ensure your user is in the video group
- GPU not detected: Check kernel parameters and reboot
- Out of memory: Use the VRAM estimator to plan model sizes
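A couple of quick host-side checks that help narrow these down:
# Confirm group membership (some setups also require the render group)
groups $USER
# Confirm the GPU device nodes exist and are accessible
ls -l /dev/kfd /dev/dri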