# AMD Strix Halo Toolboxes Docker Compose

This Docker Compose setup provides pre-built containers for running LLMs on AMD Ryzen AI Max "Strix Halo" integrated GPUs.
## Prerequisites

- AMD Ryzen AI Max "Strix Halo" system (e.g., Ryzen AI MAX+ 395)
- Docker and Docker Compose installed
- At least 128 GB of RAM recommended for larger models
- Proper kernel configuration for unified memory (see the next section); a quick sanity check for these prerequisites is sketched below
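A minimal sketch for verifying the prerequisites before starting the stack, assuming the standard amdgpu/ROCm device nodes `/dev/kfd` and `/dev/dri`:

```bash
# Confirm Docker and Compose are available
docker --version
docker-compose --version

# Confirm the amdgpu/ROCm device nodes exist (needed for GPU passthrough)
ls -l /dev/kfd /dev/dri

# Confirm how much system memory is available
free -h
```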
## Kernel Configuration

Add these boot parameters to the `GRUB_CMDLINE_LINUX` line in `/etc/default/grub`:

```bash
amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
```
Then apply:

```bash
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
```
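After the reboot, you can confirm the parameters took effect by inspecting the running kernel's command line (exposed at `/proc/cmdline` on any Linux system):

```bash
# Both values should show up in the output
grep -E 'amdgpu\.gttsize|ttm\.pages_limit' /proc/cmdline
```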
## Usage

### Start all services

```bash
docker-compose up -d
```
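To confirm the services came up and to watch their output, the usual Docker Compose commands apply (the service name below matches the ROCm backend defined in this stack):

```bash
# List the services and their current state
docker-compose ps

# Follow the logs of a single backend, e.g. the ROCm one
docker-compose logs -f amd-strix-halo-llama-rocm
```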
### Start specific backend

```bash
# ROCm backend (best for prompt processing)
docker-compose up -d amd-strix-halo-llama-rocm

# Vulkan RADV backend (fastest token generation)
docker-compose up -d amd-strix-halo-llama-vulkan-radv

# Vulkan AMDVLK backend
docker-compose up -d amd-strix-halo-llama-vulkan-amdvlk
```
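When switching between backends, you can stop the one you no longer need with the standard Compose lifecycle commands:

```bash
# Stop a single backend without removing its container
docker-compose stop amd-strix-halo-llama-rocm

# Stop and remove every container defined in the stack
docker-compose down
```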
### Access containers

```bash
# Enter ROCm container
docker exec -it amd-strix-halo-llama-rocm bash

# Enter Vulkan RADV container
docker exec -it amd-strix-halo-llama-vulkan-radv bash

# Enter Vulkan AMDVLK container
docker exec -it amd-strix-halo-llama-vulkan-amdvlk bash
```
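Once inside a container, a quick way to confirm the GPU is actually visible is to query the runtime directly. This assumes the ROCm image ships `rocminfo` and the Vulkan images ship `vulkaninfo`, which is typical for these toolchains but not guaranteed:

```bash
# Inside the ROCm container: the Strix Halo GPU should appear as a gfx agent
rocminfo | grep -i gfx

# Inside a Vulkan container: the device should appear in the summary
vulkaninfo --summary
```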
## Directory Structure

```
amd-strix-halo-toolboxes/
├── models/    # Mount point for GGUF models
├── data/      # Mount point for data
└── amd-strix-halo-toolboxes.yml
```
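If the mount points do not exist yet, create them next to the compose file before starting the stack (a minimal sketch mirroring the layout above):

```bash
cd amd-strix-halo-toolboxes
mkdir -p models data
```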
## Download Models

Inside the container, download GGUF models:

```bash
# Example: Download Llama-2-7B
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf

# Run the model
./llama.cpp/main -m llama-2-7b-chat.Q4_K_M.gguf -n 128 --repeat_penalty 1.1
```
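Alternatively, you can download to the host's `models/` directory (the mount point from the directory structure above) so the file persists across container restarts; the exact path it appears at inside the container depends on the volume mapping in the compose file:

```bash
# Download on the host into the mounted models/ directory
wget -P models/ https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```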
## Backend Performance

Based on benchmarks:

- **ROCm 6.4.3 + ROCWMMA (hipBLASLt)**: Best for prompt processing
- **Vulkan RADV**: Fastest for token generation
- **Vulkan AMDVLK**: Good balance
## Memory Planning

Use the VRAM estimator inside containers:

```bash
python3 gguf-vram-estimator.py your-model.gguf --contexts 4096 32768 1048576
```
## Ports

- ROCm backend: `8080`
- Vulkan RADV backend: `8081`
- Vulkan AMDVLK backend: `8082`
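If a backend runs llama.cpp's HTTP server on its port (an assumption; the compose file determines the actual entrypoint), you can test it from the host via the server's OpenAI-compatible endpoint:

```bash
# Minimal chat-completion request against the ROCm backend on port 8080
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'
```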
## Troubleshooting

1. **Permission issues**: Ensure your user is in the `video` group (see the command below)
2. **GPU not detected**: Check the kernel parameters and reboot
3. **Out of memory**: Use the VRAM estimator to plan model sizes
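To add your user to the `video` group (on many distributions the `render` group also gates access to `/dev/dri/renderD*`, so adding both is a common choice, though whether it is needed here is an assumption):

```bash
# Add the current user to the video (and optionally render) group,
# then log out and back in for the change to take effect
sudo usermod -aG video,render "$USER"

# Verify membership in a new session
groups
```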
## References

- [Original Repository](https://github.com/kyuz0/amd-strix-halo-toolboxes)
- [Strix Halo Hardware Database](https://strixhalo-homelab.d7.wtf/)