# AMD Strix Halo Toolboxes Docker Compose

This Docker Compose setup provides pre-built containers for running LLMs on AMD Ryzen AI Max "Strix Halo" integrated GPUs.

## Prerequisites

- AMD Ryzen AI Max "Strix Halo" system (e.g., Ryzen AI MAX+ 395)
- Docker and Docker Compose installed
- At least 128 GB of RAM recommended for larger models
- Kernel boot parameters configured for unified memory (see below)

## Kernel Configuration

Add these boot parameters to the `GRUB_CMDLINE_LINUX` line in `/etc/default/grub`:

```bash
amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
```

These parameters disable the AMD IOMMU and raise the GPU's GTT size and TTM page limit to 128 GiB, so the integrated GPU can address most of system RAM as unified memory.

Then regenerate the GRUB configuration and reboot:

```bash
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
```

## Usage

### Start all services

```bash
docker-compose up -d
```

### Start a specific backend

```bash
# ROCm backend (best for prompt processing)
docker-compose up -d amd-strix-halo-llama-rocm

# Vulkan RADV backend (fastest token generation)
docker-compose up -d amd-strix-halo-llama-vulkan-radv

# Vulkan AMDVLK backend
docker-compose up -d amd-strix-halo-llama-vulkan-amdvlk
```

### Access containers

```bash
# Enter the ROCm container
docker exec -it amd-strix-halo-llama-rocm bash

# Enter the Vulkan RADV container
docker exec -it amd-strix-halo-llama-vulkan-radv bash

# Enter the Vulkan AMDVLK container
docker exec -it amd-strix-halo-llama-vulkan-amdvlk bash
```

## Directory Structure

```
amd-strix-halo-toolboxes/
├── models/                        # Mount point for GGUF models
├── data/                          # Mount point for data
└── amd-strix-halo-toolboxes.yml   # Compose file defining the backend services
```

## Download Models

Inside a container, download GGUF models and run them with llama.cpp:

```bash
# Example: download Llama-2-7B-Chat (Q4_K_M quantization)
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf

# Run the model
./llama.cpp/main -m llama-2-7b-chat.Q4_K_M.gguf -n 128 --repeat_penalty 1.1
```

## Backend Performance

Based on benchmarks (see the appendix for a way to reproduce these measurements):

- **ROCm 6.4.3 + ROCWMMA (hipBLASLt)**: best prompt-processing throughput
- **Vulkan RADV**: fastest token generation
- **Vulkan AMDVLK**: a good balance of the two

## Memory Planning

Use the VRAM estimator inside the containers to check whether a given model and context size will fit in memory:

```bash
python3 gguf-vram-estimator.py your-model.gguf --contexts 4096 32768 1048576
```

## Ports

A smoke test for these ports appears in the appendix.

- ROCm backend: `8080`
- Vulkan RADV backend: `8081`
- Vulkan AMDVLK backend: `8082`

## Troubleshooting

1. **Permission issues**: ensure your user is in the `video` group (see the commands in the appendix)
2. **GPU not detected**: check the kernel parameters above and reboot
3. **Out of memory**: use the VRAM estimator to plan model and context sizes

## References

- [Original Repository](https://github.com/kyuz0/amd-strix-halo-toolboxes)
- [Strix Halo Hardware Database](https://strixhalo-homelab.d7.wtf/)
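
## Appendix: Examples

### Benchmarking the backends

To compare prompt processing and token generation across the backends yourself, llama.cpp ships a `llama-bench` tool. A minimal sketch, assuming a llama.cpp build is available inside the container and the models directory is mounted at `/models` (both are assumptions; the binary path and mount point may differ per image):

```bash
# -p 512: prompt-processing test with a 512-token prompt
# -n 128: token-generation test producing 128 tokens
# -ngl 99: offload all layers to the GPU
llama-bench -m /models/llama-2-7b-chat.Q4_K_M.gguf -p 512 -n 128 -ngl 99
```

Run the same command in each container and compare the `pp` (prompt processing) and `tg` (token generation) rows of the output.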
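
### Smoke-testing the HTTP ports

Assuming each compose service runs llama.cpp's `llama-server` bound to the port listed under **Ports** (an assumption about the compose file, which is not shown here), you can verify a backend from the host:

```bash
# Check that the ROCm backend's server is up (llama-server exposes /health)
curl http://localhost:8080/health

# Send one OpenAI-compatible chat request to the Vulkan RADV backend
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}], "max_tokens": 32}'
```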
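
### Fixing GPU permissions and detection

For the permission and detection issues listed under **Troubleshooting**, these standard checks apply on most distributions:

```bash
# Add your user to the video and render groups, then log out and back in
sudo usermod -aG video,render "$USER"

# Verify the GPU device nodes the containers need are present
ls -l /dev/kfd /dev/dri

# Confirm the kernel picked up the boot parameters from the Kernel Configuration section
cat /proc/cmdline
```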