scripts/portainer-compose-stacks/amd-strix-halo-toolboxes/README.md

# AMD Strix Halo Toolboxes Docker Compose

This Docker Compose setup provides pre-built containers for running LLMs on AMD Ryzen AI Max "Strix Halo" integrated GPUs.

## Prerequisites

- AMD Ryzen AI Max "Strix Halo" system (e.g., Ryzen AI MAX+ 395)
- Docker and Docker Compose installed
- At least 128GB RAM recommended for larger models
- Proper kernel configuration for unified memory

## Kernel Configuration

Add these boot parameters to `/etc/default/grub`:

```bash
amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
```

Then apply:
```bash
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
```

## Usage

### Start all services
```bash
docker-compose up -d
```

### Start specific backend
```bash
# ROCm backend (best for prompt processing)
docker-compose up -d amd-strix-halo-llama-rocm

# Vulkan RADV backend (fastest token generation)
docker-compose up -d amd-strix-halo-llama-vulkan-radv

# Vulkan AMDVLK backend
docker-compose up -d amd-strix-halo-llama-vulkan-amdvlk
```

### Access containers
```bash
# Enter ROCm container
docker exec -it amd-strix-halo-llama-rocm bash

# Enter Vulkan RADV container
docker exec -it amd-strix-halo-llama-vulkan-radv bash

# Enter Vulkan AMDVLK container
docker exec -it amd-strix-halo-llama-vulkan-amdvlk bash
```

## Directory Structure

```
amd-strix-halo-toolboxes/
├── models/          # Mount point for GGUF models
├── data/            # Mount point for data
└── amd-strix-halo-toolboxes.yml
```

## Download Models

Inside the container, download GGUF models:

```bash
# Example: Download Llama-2-7B
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf

# Run the model
./llama.cpp/main -m llama-2-7b-chat.Q4_K_M.gguf -n 128 --repeat_penalty 1.1
```

## Backend Performance

Based on benchmarks:
- **ROCm 6.4.3 + ROCWMMA (hipBLASLt)**: Best for prompt processing
- **Vulkan RADV**: Fastest for token generation
- **Vulkan AMDVLK**: Good balance

## Memory Planning

Use the VRAM estimator inside containers:
```bash
python3 gguf-vram-estimator.py your-model.gguf --contexts 4096 32768 1048576
```

## Ports

- ROCm backend: `8080`
- Vulkan RADV backend: `8081`
- Vulkan AMDVLK backend: `8082`

## Troubleshooting

1. **Permission issues**: Ensure your user is in the `video` group
2. **GPU not detected**: Check kernel parameters and reboot
3. **Out of memory**: Use the VRAM estimator to plan model sizes

## References

- [Original Repository](https://github.com/kyuz0/amd-strix-halo-toolboxes)
- [Strix Halo Hardware Database](https://strixhalo-homelab.d7.wtf/)