# AMD Strix Halo Toolboxes Docker Compose

This Docker Compose setup provides pre-built containers for running LLMs on AMD Ryzen AI Max "Strix Halo" integrated GPUs.
## Prerequisites

- AMD Ryzen AI Max "Strix Halo" system (e.g., Ryzen AI MAX+ 395)
- Docker and Docker Compose installed
- At least 128 GB of RAM recommended for larger models
- Proper kernel configuration for unified memory (see the next section); a quick sanity check for these prerequisites is sketched below
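A minimal sketch for verifying the prerequisites before starting the stack, assuming the standard amdgpu/ROCm device nodes `/dev/kfd` and `/dev/dri`:

```bash
# Confirm Docker and Compose are available
docker --version
docker-compose --version

# Confirm the amdgpu/ROCm device nodes exist (needed for GPU passthrough)
ls -l /dev/kfd /dev/dri

# Confirm how much system memory is available
free -h
```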
## Kernel Configuration

Add these boot parameters to the `GRUB_CMDLINE_LINUX` line in `/etc/default/grub`:

```bash
amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
```
Then apply:

```bash
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
```
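After the reboot, you can confirm the parameters took effect by inspecting the running kernel's command line (exposed at `/proc/cmdline` on any Linux system):

```bash
# Both values should show up in the output
grep -E 'amdgpu\.gttsize|ttm\.pages_limit' /proc/cmdline
```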
## Usage

### Start all services

```bash
docker-compose up -d
```
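To confirm the services came up and to watch their output, the usual Docker Compose commands apply (the service name below matches the ROCm backend defined in this stack):

```bash
# List the services and their current state
docker-compose ps

# Follow the logs of a single backend, e.g. the ROCm one
docker-compose logs -f amd-strix-halo-llama-rocm
```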
### Start specific backend

```bash
# ROCm backend (best for prompt processing)
docker-compose up -d amd-strix-halo-llama-rocm

# Vulkan RADV backend (fastest token generation)
docker-compose up -d amd-strix-halo-llama-vulkan-radv

# Vulkan AMDVLK backend
docker-compose up -d amd-strix-halo-llama-vulkan-amdvlk
```
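When switching between backends, you can stop the one you no longer need with the standard Compose lifecycle commands:

```bash
# Stop a single backend without removing its container
docker-compose stop amd-strix-halo-llama-rocm

# Stop and remove every container defined in the stack
docker-compose down
```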
### Access containers

```bash
# Enter ROCm container
docker exec -it amd-strix-halo-llama-rocm bash

# Enter Vulkan RADV container
docker exec -it amd-strix-halo-llama-vulkan-radv bash

# Enter Vulkan AMDVLK container
docker exec -it amd-strix-halo-llama-vulkan-amdvlk bash
```
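Once inside a container, a quick way to confirm the GPU is actually visible is to query the runtime directly. This assumes the ROCm image ships `rocminfo` and the Vulkan images ship `vulkaninfo`, which is typical for these toolchains but not guaranteed:

```bash
# Inside the ROCm container: the Strix Halo GPU should appear as a gfx agent
rocminfo | grep -i gfx

# Inside a Vulkan container: the device should appear in the summary
vulkaninfo --summary
```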
## Directory Structure

```
amd-strix-halo-toolboxes/
├── models/    # Mount point for GGUF models
├── data/      # Mount point for data
└── amd-strix-halo-toolboxes.yml
```
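If the mount points do not exist yet, create them next to the compose file before starting the stack (a minimal sketch mirroring the layout above):

```bash
cd amd-strix-halo-toolboxes
mkdir -p models data
```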
## Download Models

Inside the container, download GGUF models:

```bash
# Example: Download Llama-2-7B
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf

# Run the model
./llama.cpp/main -m llama-2-7b-chat.Q4_K_M.gguf -n 128 --repeat_penalty 1.1
```
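Alternatively, you can download to the host's `models/` directory (the mount point from the directory structure above) so the file persists across container restarts; the exact path it appears at inside the container depends on the volume mapping in the compose file:

```bash
# Download on the host into the mounted models/ directory
wget -P models/ https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```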
## Backend Performance

Based on benchmarks:

- **ROCm 6.4.3 + ROCWMMA (hipBLASLt)**: Best for prompt processing
- **Vulkan RADV**: Fastest for token generation
- **Vulkan AMDVLK**: Good balance
## Memory Planning

Use the VRAM estimator inside containers:

```bash
python3 gguf-vram-estimator.py your-model.gguf --contexts 4096 32768 1048576
```
## Ports

- ROCm backend: `8080`
- Vulkan RADV backend: `8081`
- Vulkan AMDVLK backend: `8082`
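If a backend runs llama.cpp's HTTP server on its port (an assumption; the compose file determines the actual entrypoint), you can test it from the host via the server's OpenAI-compatible endpoint:

```bash
# Minimal chat-completion request against the ROCm backend on port 8080
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'
```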
## Troubleshooting

1. **Permission issues**: Ensure your user is in the `video` group (see the command below)
2. **GPU not detected**: Check the kernel parameters and reboot
3. **Out of memory**: Use the VRAM estimator to plan model sizes
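To add your user to the `video` group (on many distributions the `render` group also gates access to `/dev/dri/renderD*`, so adding both is a common choice, though whether it is needed here is an assumption):

```bash
# Add the current user to the video (and optionally render) group,
# then log out and back in for the change to take effect
sudo usermod -aG video,render "$USER"

# Verify membership in a new session
groups
```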
## References

- [Original Repository](https://github.com/kyuz0/amd-strix-halo-toolboxes)
- [Strix Halo Hardware Database](https://strixhalo-homelab.d7.wtf/)