AMD Strix Halo Toolboxes Docker Compose

This Docker Compose setup provides pre-built containers for running LLMs on AMD Ryzen AI Max "Strix Halo" integrated GPUs.

Prerequisites

  • AMD Ryzen AI Max "Strix Halo" system (e.g., Ryzen AI MAX+ 395)
  • Docker and Docker Compose installed
  • At least 128 GB of RAM recommended for larger models
  • Kernel configured for unified memory (see Kernel Configuration below)

Kernel Configuration

Add these boot parameters to /etc/default/grub:

amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
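
In /etc/default/grub this typically means appending the parameters to the existing GRUB_CMDLINE_LINUX_DEFAULT entry; a sketch, with your existing options shown as ...:

GRUB_CMDLINE_LINUX_DEFAULT="... amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"

Since amdgpu.gttsize is given in MiB and TTM pages are 4 KiB, both values correspond to a 128 GiB GTT limit for the integrated GPU.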

Then apply:

# On Fedora/openSUSE-style layouts; on Debian/Ubuntu run "sudo update-grub" instead
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

Usage

Start all services

docker-compose up -d
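
To verify that the services started, check container status and follow the logs (service names as defined in the compose file in this directory):

docker-compose ps
docker-compose logs -f amd-strix-halo-llama-rocm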

Start specific backend

# ROCm backend (best for prompt processing)
docker-compose up -d amd-strix-halo-llama-rocm

# Vulkan RADV backend (fastest token generation)
docker-compose up -d amd-strix-halo-llama-vulkan-radv

# Vulkan AMDVLK backend
docker-compose up -d amd-strix-halo-llama-vulkan-amdvlk

Access containers

# Enter ROCm container
docker exec -it amd-strix-halo-llama-rocm bash

# Enter Vulkan RADV container
docker exec -it amd-strix-halo-llama-vulkan-radv bash

# Enter Vulkan AMDVLK container
docker exec -it amd-strix-halo-llama-vulkan-amdvlk bash
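
Once inside a container, a quick sanity check that the GPU is visible; this assumes the usual ROCm and Vulkan diagnostic tools are bundled in the respective images:

# ROCm container: list detected GPU agents
rocminfo | grep -i gfx

# Vulkan containers: show the detected Vulkan devices
vulkaninfo --summary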

Directory Structure

amd-strix-halo-toolboxes/
├── models/          # Mount point for GGUF models
├── data/            # Mount point for data
└── amd-strix-halo-toolboxes.yml
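
If the mount-point directories do not exist yet, create them next to the compose file before bringing the stack up:

mkdir -p models data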

Download Models

Inside the container, download GGUF models:

# Example: Download Llama-2-7B
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf

# Run the model
./llama.cpp/main -m llama-2-7b-chat.Q4_K_M.gguf -n 128 --repeat_penalty 1.1
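
To serve a model over HTTP on the port mapped for the backend (see Ports below), you can start the llama.cpp server instead of the CLI. A sketch, noting that the server binary's name and path depend on the llama.cpp build inside the image (newer builds call it llama-server):

# Bind to all interfaces so the published port is reachable from the host
./llama.cpp/llama-server -m llama-2-7b-chat.Q4_K_M.gguf --host 0.0.0.0 --port 8080 -c 4096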

Backend Performance

Based on benchmarks of the three backends:

  • ROCm 6.4.3 + rocWMMA (hipBLASLt): Best for prompt processing
  • Vulkan RADV: Fastest for token generation
  • Vulkan AMDVLK: Good balance between the two

Memory Planning

Use the VRAM estimator inside containers:

python3 gguf-vram-estimator.py your-model.gguf --contexts 4096 32768 1048576

Ports

  • ROCm backend: 8080
  • Vulkan RADV backend: 8081
  • Vulkan AMDVLK backend: 8082
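
Assuming each container exposes the llama.cpp HTTP server on its port, a quick smoke test from the host might look like this:

# Liveness check
curl http://localhost:8080/health

# Short completion request
curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Hello", "n_predict": 16}'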

Troubleshooting

  1. Permission issues: Ensure your user is in the video group (see the command below)
  2. GPU not detected: Check kernel parameters and reboot
  3. Out of memory: Use the VRAM estimator to plan model sizes
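
A minimal way to fix group membership, assuming your distribution also uses the render group for GPU device access (log out and back in afterwards for it to take effect):

# Add the current user to the video and render groups
sudo usermod -aG video,render $USER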

References