AMD Strix Halo Toolboxes Docker Compose
This Docker Compose setup provides pre-built containers for running LLMs on AMD Ryzen AI Max "Strix Halo" integrated GPUs.
Prerequisites
- AMD Ryzen AI Max "Strix Halo" system (e.g., Ryzen AI MAX+ 395)
- Docker and Docker Compose installed
- At least 128GB RAM recommended for larger models
- Proper kernel configuration for unified memory
Kernel Configuration
Add these boot parameters to /etc/default/grub:
amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
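For example, the kernel command line variable in /etc/default/grub would end up looking something like the following (whether it is GRUB_CMDLINE_LINUX or GRUB_CMDLINE_LINUX_DEFAULT depends on your distribution, and "rhgb quiet" here just stands in for whatever parameters are already there):
GRUB_CMDLINE_LINUX="rhgb quiet amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"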
Then apply:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
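After the reboot, you can confirm the parameters are active:
# Output should include amd_iommu=off, amdgpu.gttsize and ttm.pages_limit
cat /proc/cmdline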
Usage
Start all services
docker-compose up -d
Start specific backend
# ROCm backend (best for prompt processing)
docker-compose up -d amd-strix-halo-llama-rocm
# Vulkan RADV backend (fastest token generation)
docker-compose up -d amd-strix-halo-llama-vulkan-radv
# Vulkan AMDVLK backend
docker-compose up -d amd-strix-halo-llama-vulkan-amdvlk
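To check that a backend came up cleanly before using it, inspect its status and logs, for example:
# Show running services
docker-compose ps
# Follow the logs of the ROCm backend
docker-compose logs -f amd-strix-halo-llama-rocm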
Access containers
# Enter ROCm container
docker exec -it amd-strix-halo-llama-rocm bash
# Enter Vulkan RADV container
docker exec -it amd-strix-halo-llama-vulkan-radv bash
# Enter Vulkan AMDVLK container
docker exec -it amd-strix-halo-llama-vulkan-amdvlk bash
Directory Structure
amd-strix-halo-toolboxes/
├── models/ # Mount point for GGUF models
├── data/ # Mount point for data
└── amd-strix-halo-toolboxes.yml
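The compose file mounts models/ and data/ into each container and passes the GPU devices through. As a rough sketch only (the real image names, container-side paths, and options come from amd-strix-halo-toolboxes.yml), a service entry looks something like:
services:
  amd-strix-halo-llama-rocm:
    image: <toolbox-image>        # placeholder; use the image defined in amd-strix-halo-toolboxes.yml
    devices:
      - /dev/kfd                  # ROCm compute interface
      - /dev/dri                  # GPU render nodes
    volumes:
      - ./models:/models          # assumed container-side mount points
      - ./data:/data
    ports:
      - "8080:8080"               # ROCm backend port, see Ports below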
Download Models
Inside the container, download GGUF models:
# Example: Download Llama-2-7B
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
# Run the model
./llama.cpp/main -m llama-2-7b-chat.Q4_K_M.gguf -n 128 --repeat_penalty 1.1
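To expose the model over HTTP on the container's published port instead of running it interactively (see Ports below), llama.cpp also ships a server; depending on the build inside the image the binary may be named llama-server or server, so adjust the path to match:
# Serve the model on port 8080 (ROCm backend example)
./llama.cpp/llama-server -m llama-2-7b-chat.Q4_K_M.gguf --host 0.0.0.0 --port 8080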
Backend Performance
Based on benchmarks:
- ROCm 6.4.3 + ROCWMMA (hipBLASLt): Best for prompt processing
- Vulkan RADV: Fastest for token generation
- Vulkan AMDVLK: Good balance
Memory Planning
Use the VRAM estimator inside containers:
python3 gguf-vram-estimator.py your-model.gguf --contexts 4096 32768 1048576
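As a rough sanity check on what the estimator reports: KV-cache memory grows linearly with context length. For a hypothetical model with 32 layers, 8 KV heads of dimension 128, and an FP16 cache, each token costs 2 (K and V) × 32 × 8 × 128 × 2 bytes = 128 KiB, so a 32768-token context needs about 4 GiB for the KV cache alone, on top of the model weights.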
Ports
- ROCm backend: 8080
- Vulkan RADV backend: 8081
- Vulkan AMDVLK backend: 8082
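These ports are only useful once a server is actually listening inside the corresponding container (for example the llama.cpp server started as shown above). Assuming that, you can query it from the host:
# Simple completion request against the ROCm backend
curl http://localhost:8080/completion -d '{"prompt": "Hello", "n_predict": 64}'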
Troubleshooting
- Permission issues: Ensure your user is in the video group
- GPU not detected: Check kernel parameters and reboot
- Out of memory: Use the VRAM estimator to plan model sizes
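A couple of quick host-side checks that help narrow these down:
# Confirm group membership (some setups also require the render group)
groups $USER
# Confirm the GPU device nodes exist and are accessible
ls -l /dev/kfd /dev/dri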