diff --git a/portainer-compose-stacks/windows/GPU-PASSTHROUGH.md b/portainer-compose-stacks/windows/GPU-PASSTHROUGH.md
index c16d83d..a71adc6 100644
--- a/portainer-compose-stacks/windows/GPU-PASSTHROUGH.md
+++ b/portainer-compose-stacks/windows/GPU-PASSTHROUGH.md
@@ -2,12 +2,17 @@
 
 This guide configures PCI passthrough for the AMD Strix Halo integrated GPU to the Windows Docker container, enabling GPU-accelerated applications.
 
+## Important Note: Manual Binding Approach
+
+This setup uses **manual GPU binding** to avoid host display issues. The GPU remains available to the host by default, and you manually bind it to VFIO only when starting the Windows container. This prevents system freezing at boot on newer kernels.
+
 ## Problem
 
 The Windows container was showing "Red Hat VirtIO GPU DOD Controller" instead of the AMD GPU because:
 - IOMMU was disabled (`amd_iommu=off`)
 - GPU was not passed through at PCI level
 - Only `/dev/dri` devices were exposed (insufficient for Windows)
+- Early VFIO binding caused host display to freeze
 
 ## Solution Overview
 
@@ -23,50 +28,56 @@ The Windows container was showing "Red Hat VirtIO GPU DOD Controller" instead of
 
 ## Setup Instructions
 
-### Step 1: Run Setup Script
+### Step 1: Fix GRUB Configuration (If You Had Freezing Issues)
+
+If your system was freezing at login on newer kernels:
 
 ```bash
 cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
-sudo ./setup-gpu-passthrough.sh
+sudo ./fix-grub-remove-vfio-ids.sh
 ```
 
-This script will:
-- Enable IOMMU in GRUB (`amd_iommu=on iommu=pt`)
-- Configure VFIO to claim the GPU devices
-- Update initramfs with VFIO modules
-- Create necessary configuration files
+This removes early VFIO binding that causes the host to lose GPU access.
 
-### Step 2: Reboot
+### Step 2: Update GRUB and Reboot
 
 ```bash
+sudo update-grub
 sudo reboot
 ```
 
-**IMPORTANT**: The system MUST be rebooted for IOMMU and VFIO changes to take effect.
+**IMPORTANT**: After reboot, IOMMU will be enabled but the GPU remains available to the host.
 
-### Step 3: Verify Setup (After Reboot)
+### Step 3: Before Starting Windows Container - Bind GPU
+
+Every time you want to use GPU passthrough, run:
 
 ```bash
 cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
-./verify-gpu-passthrough.sh
+sudo ./bind-gpu-to-vfio.sh
 ```
 
-Expected output:
-- ✓ IOMMU is enabled
-- ✓ vfio_pci module loaded
-- ✓ GPU bound to vfio-pci
-- ✓ Audio bound to vfio-pci
-- ✓ /dev/vfio/vfio exists
+This temporarily binds the GPU to VFIO (host display will stop working).
 
 ### Step 4: Start Windows Container
 
 ```bash
 cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
-docker-compose down  # Stop if running
-docker-compose up -d
+docker compose -f docker-compose.gpu-passthrough.yml up -d
 ```
 
-### Step 5: Install AMD Drivers in Windows
+### Step 5: When Done - Restore GPU to Host
+
+After stopping the Windows container, restore the GPU to the host:
+
+```bash
+cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
+sudo ./unbind-gpu-from-vfio.sh
+```
+
+This rebinds the GPU to the amdgpu driver for host use.
+
+### Step 6: Install AMD Drivers in Windows
 
 1. Connect to Windows via RDP: `localhost:3389`
 2. Open Device Manager
diff --git a/portainer-compose-stacks/windows/GPU-SHARING-EXPLAINED.md b/portainer-compose-stacks/windows/GPU-SHARING-EXPLAINED.md
new file mode 100644
index 0000000..d4c4939
--- /dev/null
+++ b/portainer-compose-stacks/windows/GPU-SHARING-EXPLAINED.md
@@ -0,0 +1,185 @@
+# GPU Sharing: Windows vs Linux Explained
+
+## Can You Share GPU Between Host and Container?
+
+**Short Answer**: No, not with VFIO PCI passthrough on Linux.
+
+## Why Windows Can Do It (Hyper-V GPU-PV)
+
+### Windows Hyper-V GPU Paravirtualization
+- **Technology**: Microsoft's proprietary GPU virtualization
+- **How it works**: GPU stays with Windows host, VMs get "virtual GPU slices"
+- **Requirements**:
+  - Windows host (Server or Pro with Hyper-V)
+  - Windows guests
+  - Specific GPU support (mostly newer Intel/AMD/NVIDIA)
+- **Benefits**:
+  - ✓ Multiple VMs share one GPU
+  - ✓ Host keeps display working
+  - ✓ Decent performance for most workloads
+- **Limitations**:
+  - Windows only (host + guest)
+  - Not full GPU performance
+  - Limited GPU features
+
+## Why Linux QEMU/VFIO Can't Share
+
+### VFIO PCI Passthrough
+- **Technology**: Hardware-level device passthrough (Linux kernel feature)
+- **How it works**: Entire GPU is "unplugged" from host and given to guest
+- **Benefits**:
+  - ✓ Near-native performance
+  - ✓ Full GPU features
+  - ✓ Works cross-platform (any guest OS)
+- **Limitations**:
+  - ✗ Exclusive access only (either host OR guest)
+  - ✗ Host loses display when GPU passed through
+  - ✗ Cannot share between multiple VMs
+
+## Your Options for AMD Strix Halo iGPU
+
+### Option 1: Shared Software Rendering (Recommended)
+**Configuration**: No GPU passthrough
+
+**How it works**:
+- Host uses GPU normally (amdgpu driver)
+- Windows container gets VirtIO virtual GPU
+- Both host and container work simultaneously
+- Software rendering in container (accelerated by host GPU)
+
+**Pros**:
+- ✓ Host display works
+- ✓ Container auto-starts
+- ✓ Both usable at same time
+- ✓ Simple, no binding scripts
+
+**Cons**:
+- ✗ No native GPU in Windows
+- ✗ Limited GPU performance in Windows
+- ✗ No GPU-Z, no AMD drivers in Windows
+
+**Best for**:
+- General Windows usage
+- When you need host display
+- Development/testing
+- Light workloads in Windows
+
+**Current docker-compose setup**: This is now configured (I just updated it)
+
+---
+
+### Option 2: Exclusive GPU Passthrough
+**Configuration**: VFIO PCI passthrough (manual binding)
+
+**How it works**:
+1. Bind GPU to VFIO (host display freezes)
+2. Start Windows container
+3. Windows gets real AMD GPU
+4. Stop container and unbind to restore host
+
+**Pros**:
+- ✓ Full AMD GPU in Windows
+- ✓ Native performance
+- ✓ GPU-accelerated apps work
+- ✓ AMD drivers install
+
+**Cons**:
+- ✗ Host display frozen (no GUI)
+- ✗ Exclusive - can't use both
+- ✗ Manual binding required
+- ✗ Access host via SSH only
+
+**Best for**:
+- GPU-intensive Windows apps
+- Machine learning in Windows
+- Gaming (if that's possible)
+- When maximum GPU performance needed
+
+**Workflow**:
+```bash
+# Start Windows with GPU
+sudo ./bind-gpu-to-vfio.sh  # Host display goes black!
+docker compose -f docker-compose.gpu-passthrough.yml up -d
+
+# Stop and restore
+docker compose -f docker-compose.gpu-passthrough.yml down
+sudo ./unbind-gpu-from-vfio.sh
+```
+
+---
+
+## Technologies That DON'T Work Here
+
+### SR-IOV (Single Root I/O Virtualization)
+- Requires GPU hardware support
+- Consumer GPUs (like Strix Halo) don't have it
+- Mostly enterprise data center GPUs
+
+### AMD MxGPU / NVIDIA vGPU
+- Enterprise GPU virtualization
+- Requires special drivers + licensed enterprise GPUs
+- Not available for consumer iGPUs
+
+### GVT-g (Intel GPU Virtualization)
+- Intel only
+- Not available for AMD GPUs
+
+### Looking Glass
+- Allows viewing GPU output from guest
+- Still exclusive passthrough (guest owns GPU)
+- Just a viewer, not sharing
+
+## What About DRI/DRM Passthrough?
+
+You might think: "Can we pass `/dev/dri` to share?"
+
+**Tried this already** - it doesn't work for Windows because:
+- Windows needs PCI-level GPU access
+- `/dev/dri` is Linux-specific (won't work in Windows)
+- Windows drivers expect real PCI GPU device
+
+## Comparison Table
+
+| Feature | Shared (VirtIO) | Exclusive (VFIO) | Windows Hyper-V GPU-PV |
+|---------|----------------|------------------|------------------------|
+| Host display works | ✓ | ✗ | ✓ |
+| Container auto-start | ✓ | ✗ | ✓ |
+| Both usable together | ✓ | ✗ | ✓ |
+| Native GPU in Windows | ✗ | ✓ | ~ (virtual) |
+| GPU performance | Low | High | Medium |
+| Setup complexity | Easy | Complex | Medium |
+| Requires manual binding | ✗ | ✓ | ✗ |
+
+## Recommendation
+
+**For your use case**, I recommend:
+
+### Start with Option 1 (Shared - No Passthrough)
+- Container works
+- Host works
+- Both at same time
+- Simple setup
+
+**If Windows GPU performance is too slow**, then consider:
+- Adding a second GPU to host (dedicate one to passthrough)
+- Running Windows on bare metal for GPU workloads
+- Using cloud GPU instances for heavy GPU tasks
+
+## Current Configuration
+
+I've just updated your `docker-compose.yml` to **Option 1 (Shared)**:
+- Removed GPU passthrough
+- Removed VFIO devices
+- Container can auto-start
+- Host display continues working
+
+**Want to test it?**
+```bash
+cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
+docker compose up -d
+```
+
+Windows will start with VirtIO display. Both host and container will work simultaneously.
+
+**Need GPU passthrough later?** I can create a separate docker-compose file for that use case.
+
diff --git a/portainer-compose-stacks/windows/bind-gpu-to-vfio.sh b/portainer-compose-stacks/windows/bind-gpu-to-vfio.sh
new file mode 100644
index 0000000..866cb96
--- /dev/null
+++ b/portainer-compose-stacks/windows/bind-gpu-to-vfio.sh
@@ -0,0 +1,79 @@
+#!/bin/bash
+
+# Manually bind the AMD GPU to VFIO for Windows container passthrough
+# Run this BEFORE starting the Windows container
+
+set -e
+
+if [ "$EUID" -ne 0 ]; then
+    echo "Please run as root: sudo $0"
+    exit 1
+fi
+
+echo "=== Binding AMD GPU to VFIO ==="
+echo ""
+
+GPU_PCI="0000:c5:00.0"
+AUDIO_PCI="0000:c5:00.1"
+
+# Check if vfio-pci module is loaded
+if ! lsmod | grep -q vfio_pci; then
+    echo "Loading vfio-pci module..."
+    modprobe vfio-pci
+fi
+
+# Unbind GPU from amdgpu
+echo "Unbinding GPU from amdgpu..."
+if [ -e /sys/bus/pci/devices/$GPU_PCI/driver ]; then
+    echo "$GPU_PCI" > /sys/bus/pci/devices/$GPU_PCI/driver/unbind
+    echo "✓ GPU unbound from amdgpu"
+else
+    echo "GPU not bound to any driver"
+fi
+
+# Unbind audio from snd_hda_intel
+echo "Unbinding audio from snd_hda_intel..."
+if [ -e /sys/bus/pci/devices/$AUDIO_PCI/driver ]; then
+    echo "$AUDIO_PCI" > /sys/bus/pci/devices/$AUDIO_PCI/driver/unbind
+    echo "✓ Audio unbound"
+else
+    echo "Audio not bound to any driver"
+fi
+
+# Bind to vfio-pci
+echo ""
+echo "Binding to vfio-pci..."
+echo "1002 1586" > /sys/bus/pci/drivers/vfio-pci/new_id 2>/dev/null || echo "GPU ID already registered"
+echo "1002 1640" > /sys/bus/pci/drivers/vfio-pci/new_id 2>/dev/null || echo "Audio ID already registered"
+
+sleep 1
+
+# Verify
+GPU_DRIVER=$(lspci -nnk -s c5:00.0 | grep "Kernel driver in use" | awk '{print $5}')
+AUDIO_DRIVER=$(lspci -nnk -s c5:00.1 | grep "Kernel driver in use" | awk '{print $5}')
+
+echo ""
+echo "=== Status ==="
+if [ "$GPU_DRIVER" = "vfio-pci" ]; then
+    echo "✓ GPU bound to vfio-pci"
+else
+    echo "✗ GPU bound to: ${GPU_DRIVER:-none}"
+fi
+
+if [ "$AUDIO_DRIVER" = "vfio-pci" ]; then
+    echo "✓ Audio bound to vfio-pci"
+else
+    echo "✗ Audio bound to: ${AUDIO_DRIVER:-none}"
+fi
+
+echo ""
+if [ "$GPU_DRIVER" = "vfio-pci" ] && [ "$AUDIO_DRIVER" = "vfio-pci" ]; then
+    echo "✓ Ready for GPU passthrough!"
+    echo ""
+    echo "Now start the Windows container:"
+    echo "  cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows"
+    echo "  docker compose -f docker-compose.gpu-passthrough.yml up -d"
+else
+    echo "✗ Binding failed. Check errors above."
+fi
+
diff --git a/portainer-compose-stacks/windows/docker-compose.gpu-passthrough.yml b/portainer-compose-stacks/windows/docker-compose.gpu-passthrough.yml
new file mode 100644
index 0000000..b042452
--- /dev/null
+++ b/portainer-compose-stacks/windows/docker-compose.gpu-passthrough.yml
@@ -0,0 +1,33 @@
+services:
+  windows:
+    image: dockurr/windows # https://github.com/dockur/windows
+    container_name: windows-gpu
+    environment:
+      VERSION: "11"
+      RAM_SIZE: "8G"
+      CPU_CORES: "4"
+      GPU: "Y"
+      ARGUMENTS: "-device vfio-pci,host=c5:00.0,addr=0x05,multifunction=on,rombar=0 -device vfio-pci,host=c5:00.1,addr=0x05.1"
+    devices:
+      - /dev/kvm
+      - /dev/net/tun
+      - /dev/nvme0n1p8:/disk2
+      - /dev/vfio/vfio
+      - /dev/vfio/20
+      - /dev/vfio/21
+    cap_add:
+      - NET_ADMIN
+    privileged: true
+    ports:
+      - 445:445
+      - 1433:1433
+      - 8006:8006
+      - 3389:3389/tcp
+      - 3389:3389/udp
+    volumes:
+      # - /dev/nvme0n1p7:/disk1 # bind mount - not working for now
+      - /mnt/data/docker_vol/windows:/storage # storage (img file) location
+      - /mnt/shared:/data
+    restart: "no" # Manual start only - requires GPU binding first
+    stop_grace_period: 2m
+
diff --git a/portainer-compose-stacks/windows/docker-compose.yml b/portainer-compose-stacks/windows/docker-compose.yml
index 906e117..e405ebe 100644
--- a/portainer-compose-stacks/windows/docker-compose.yml
+++ b/portainer-compose-stacks/windows/docker-compose.yml
@@ -1,3 +1,7 @@
+# Windows Container - Shared Mode (No GPU Passthrough)
+# Host and container can both run simultaneously
+# Windows gets VirtIO display, host keeps AMD GPU
+# For GPU passthrough, use: docker-compose.gpu-passthrough.yml
 services:
   windows:
     image: dockurr/windows # https://github.com/dockur/windows
@@ -6,15 +10,10 @@ services:
       VERSION: "11"
       RAM_SIZE: "8G"
       CPU_CORES: "4"
-      GPU: "Y"
-      ARGUMENTS: "-device vfio-pci,host=c5:00.0,addr=0x05,multifunction=on -device vfio-pci,host=c5:00.1,addr=0x05.1"
     devices:
       - /dev/kvm
       - /dev/net/tun
       - /dev/nvme0n1p8:/disk2
-      - /dev/vfio/vfio
-      - /dev/vfio/20
-      - /dev/vfio/21
     cap_add:
       - NET_ADMIN
     privileged: true
diff --git a/portainer-compose-stacks/windows/unbind-gpu-from-vfio.sh b/portainer-compose-stacks/windows/unbind-gpu-from-vfio.sh
new file mode 100644
index 0000000..53b1b6c
--- /dev/null
+++ b/portainer-compose-stacks/windows/unbind-gpu-from-vfio.sh
@@ -0,0 +1,82 @@
+#!/bin/bash
+
+# Unbind the AMD GPU from VFIO and restore it to the host (amdgpu driver)
+# Run this AFTER stopping the Windows container to restore host GPU access
+
+set -e
+
+if [ "$EUID" -ne 0 ]; then
+    echo "Please run as root: sudo $0"
+    exit 1
+fi
+
+echo "=== Unbinding AMD GPU from VFIO ==="
+echo ""
+
+GPU_PCI="0000:c5:00.0"
+AUDIO_PCI="0000:c5:00.1"
+
+# Stop Windows container first
+echo "Checking if Windows container is running..."
+if docker ps | grep -q windows-gpu; then
+    echo "Stopping Windows container..."
+    docker stop windows-gpu
+    echo "✓ Container stopped"
+fi
+
+# Unbind from vfio-pci
+echo ""
+echo "Unbinding from vfio-pci..."
+if [ -e /sys/bus/pci/devices/$GPU_PCI/driver ]; then
+    echo "$GPU_PCI" > /sys/bus/pci/devices/$GPU_PCI/driver/unbind
+    echo "✓ GPU unbound from vfio-pci"
+fi
+
+if [ -e /sys/bus/pci/devices/$AUDIO_PCI/driver ]; then
+    echo "$AUDIO_PCI" > /sys/bus/pci/devices/$AUDIO_PCI/driver/unbind
+    echo "✓ Audio unbound from vfio-pci"
+fi
+
+# Remove device IDs from vfio-pci
+echo "1002 1586" > /sys/bus/pci/drivers/vfio-pci/remove_id 2>/dev/null || true
+echo "1002 1640" > /sys/bus/pci/drivers/vfio-pci/remove_id 2>/dev/null || true
+
+sleep 1
+
+# Rebind to host drivers
+echo ""
+echo "Binding back to host drivers..."
+echo "$GPU_PCI" > /sys/bus/pci/drivers_probe
+echo "$AUDIO_PCI" > /sys/bus/pci/drivers_probe
+
+sleep 2
+
+# Verify
+GPU_DRIVER=$(lspci -nnk -s c5:00.0 | grep "Kernel driver in use" | awk '{print $5}')
+AUDIO_DRIVER=$(lspci -nnk -s c5:00.1 | grep "Kernel driver in use" | awk '{print $5}')
+
+echo ""
+echo "=== Status ==="
+if [ "$GPU_DRIVER" = "amdgpu" ]; then
+    echo "✓ GPU restored to amdgpu"
+else
+    echo "⚠ GPU bound to: ${GPU_DRIVER:-none}"
+fi
+
+if [ "$AUDIO_DRIVER" = "snd_hda_intel" ]; then
+    echo "✓ Audio restored to snd_hda_intel"
+else
+    echo "⚠ Audio bound to: ${AUDIO_DRIVER:-none}"
+fi
+
+echo ""
+if [ "$GPU_DRIVER" = "amdgpu" ]; then
+    echo "✓ GPU restored to host!"
+    echo ""
+    echo "You may need to restart your display manager:"
+    echo "  sudo systemctl restart gdm3    # for GNOME"
+    echo "  sudo systemctl restart lightdm # for XFCE/other"
+else
+    echo "⚠ GPU not fully restored. You may need to reboot."
+fi
+
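
`docker-compose.gpu-passthrough.yml` hard-codes `/dev/vfio/20` and `/dev/vfio/21`, and IOMMU group numbers may differ between machines or kernel versions. Below is a minimal sketch of a pre-flight check, assuming the PCI addresses used by the scripts in this change. The `iommu_group_of` helper and the `SYSFS_ROOT` override are hypothetical names introduced here for illustration and are not part of the patch.

```bash
#!/bin/bash
# Sketch: report which /dev/vfio/<group> node each passthrough device needs.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) so the lookup can
# be exercised against a fake directory tree.

iommu_group_of() {
    local pci_addr="$1"
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$pci_addr/iommu_group"
    # No iommu_group entry means the IOMMU is off or the device is absent
    [ -e "$link" ] || return 1
    # The entry is a symlink whose target directory is named after the group
    basename "$(readlink -f "$link")"
}

for dev in 0000:c5:00.0 0000:c5:00.1; do
    if group=$(iommu_group_of "$dev"); then
        echo "$dev -> /dev/vfio/$group"
    else
        echo "$dev -> no IOMMU group found"
    fi
done
```

If the reported group numbers differ from 20 and 21, the `devices:` entries in the passthrough Compose file would presumably need to be adjusted to match before starting the container.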