GPU passthrough moved to separate file

This commit is contained in:
Dobromir Popov
2025-11-22 14:48:58 +02:00
parent 23b4e1a8ee
commit 185d4f520b
6 changed files with 414 additions and 25 deletions


@@ -2,12 +2,17 @@
This guide configures PCI passthrough for the AMD Strix Halo integrated GPU to the Windows Docker container, enabling GPU-accelerated applications.
## Important Note: Manual Binding Approach
This setup uses **manual GPU binding** to avoid host display issues. The GPU remains available to the host by default, and you manually bind it to VFIO only when starting the Windows container. This prevents system freezing at boot on newer kernels.
## Problem
The Windows container was showing "Red Hat VirtIO GPU DOD Controller" instead of the AMD GPU because:
- IOMMU was disabled (`amd_iommu=off`)
- GPU was not passed through at PCI level
- Only `/dev/dri` devices were exposed (insufficient for Windows)
- Early VFIO binding caused host display to freeze
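To see which of these conditions currently applies on your machine, a small diagnostic helps. This is a sketch; the PCI address `c5:00.0` is the one used by the scripts in this guide and may differ on your system:

```bash
# Extract the "Kernel driver in use" value from `lspci -nnk` output.
driver_in_use() {
    awk -F': ' '/Kernel driver in use/ { print $2 }'
}

# On the live system (requires pciutils):
#   lspci -nnk -s c5:00.0 | driver_in_use
# "amdgpu" means the host owns the GPU, "vfio-pci" means it is reserved
# for passthrough, and no output means no driver is bound at all.
```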
## Solution Overview
@@ -23,50 +28,56 @@ The Windows container was showing "Red Hat VirtIO GPU DOD Controller" instead of
## Setup Instructions
### Step 1: Fix GRUB Configuration (If You Had Freezing Issues)
If your system was freezing at login on newer kernels:
```bash
cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
sudo ./fix-grub-remove-vfio-ids.sh
```
This removes the early VFIO binding that causes the host to lose GPU access.
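After the fix, the kernel command line in `/etc/default/grub` should keep IOMMU on but carry no `vfio-pci.ids=` early binding. A rough sketch; your other boot options may differ:

```
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt"
```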
### Step 2: Update GRUB and Reboot
```bash
sudo update-grub
sudo reboot
```
**IMPORTANT**: After reboot, IOMMU will be enabled, but the GPU remains available to the host.
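To confirm that the IOMMU actually came up after the reboot, counting the groups under sysfs is a quick check (a sketch; `dmesg | grep -i amd-vi` gives more detail):

```bash
# A nonzero group count means the IOMMU is enabled and active.
iommu_group_count() {
    ls /sys/kernel/iommu_groups 2>/dev/null | wc -l
}

echo "IOMMU groups: $(iommu_group_count)"
```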
### Step 3: Before Starting the Windows Container - Bind the GPU
Every time you want to use GPU passthrough, run:
```bash
cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
sudo ./bind-gpu-to-vfio.sh
```
This temporarily binds the GPU to VFIO (the host display will stop working).
### Step 4: Start Windows Container
```bash
cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
docker compose up -d
```
### Step 5: When Done - Restore GPU to Host
After stopping the Windows container, restore GPU to host:
```bash
cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
sudo ./unbind-gpu-from-vfio.sh
```
This rebinds the GPU to amdgpu driver for host use.
### Step 6: Install AMD Drivers in Windows
1. Connect to Windows via RDP: `localhost:3389`
2. Open Device Manager


@@ -0,0 +1,185 @@
# GPU Sharing: Windows vs Linux Explained
## Can You Share GPU Between Host and Container?
**Short Answer**: No, not with VFIO PCI passthrough on Linux.
## Why Windows Can Do It (Hyper-V GPU-PV)
### Windows Hyper-V GPU Paravirtualization
- **Technology**: Microsoft's proprietary GPU virtualization
- **How it works**: GPU stays with Windows host, VMs get "virtual GPU slices"
- **Requirements**:
- Windows host (Server or Pro with Hyper-V)
- Windows guests
- Specific GPU support (mostly newer Intel/AMD/NVIDIA)
- **Benefits**:
- ✓ Multiple VMs share one GPU
- ✓ Host keeps display working
- ✓ Decent performance for most workloads
- **Limitations**:
- Windows only (host + guest)
- Not full GPU performance
- Limited GPU features
## Why Linux QEMU/VFIO Can't Share
### VFIO PCI Passthrough
- **Technology**: Hardware-level device passthrough (Linux kernel feature)
- **How it works**: Entire GPU is "unplugged" from host and given to guest
- **Benefits**:
- ✓ Near-native performance
- ✓ Full GPU features
- ✓ Works cross-platform (any guest OS)
- **Limitations**:
- ✗ Exclusive access only (either host OR guest)
- ✗ Host loses display when GPU passed through
- ✗ Cannot share between multiple VMs
## Your Options for AMD Strix Halo iGPU
### Option 1: Shared Software Rendering (Recommended)
**Configuration**: No GPU passthrough
**How it works**:
- Host uses GPU normally (amdgpu driver)
- Windows container gets VirtIO virtual GPU
- Both host and container work simultaneously
- Software rendering in container (accelerated by host GPU)
**Pros**:
- ✓ Host display works
- ✓ Container auto-starts
- ✓ Both usable at same time
- ✓ Simple, no binding scripts
**Cons**:
- ✗ No native GPU in Windows
- ✗ Limited GPU performance in Windows
- ✗ No GPU-Z, no AMD drivers in Windows
**Best for**:
- General Windows usage
- When you need host display
- Development/testing
- Light workloads in Windows
**Current docker-compose setup**: This is now configured (I just updated it)
---
### Option 2: Exclusive GPU Passthrough
**Configuration**: VFIO PCI passthrough (manual binding)
**How it works**:
1. Bind GPU to VFIO (host display freezes)
2. Start Windows container
3. Windows gets real AMD GPU
4. Stop container and unbind to restore host
**Pros**:
- ✓ Full AMD GPU in Windows
- ✓ Native performance
- ✓ GPU-accelerated apps work
- ✓ AMD drivers install
**Cons**:
- ✗ Host display frozen (no GUI)
- ✗ Exclusive - can't use both
- ✗ Manual binding required
- ✗ Access host via SSH only
**Best for**:
- GPU-intensive Windows apps
- Machine learning in Windows
- Gaming (if that's possible)
- When maximum GPU performance needed
**Workflow**:
```bash
# Start Windows with GPU
sudo ./bind-gpu-to-vfio.sh # Host display goes black!
docker compose -f docker-compose.gpu-passthrough.yml up -d
# Stop and restore
docker compose -f docker-compose.gpu-passthrough.yml down
sudo ./unbind-gpu-from-vfio.sh
```
---
## Technologies That DON'T Work Here
### SR-IOV (Single Root I/O Virtualization)
- Requires GPU hardware support
- Consumer GPUs (like Strix Halo) don't have it
- Mostly enterprise data center GPUs
### AMD MxGPU / NVIDIA vGPU
- Enterprise GPU virtualization
- Requires special drivers + licensed enterprise GPUs
- Not available for consumer iGPUs
### GVT-g (Intel GPU Virtualization)
- Intel only
- Not available for AMD GPUs
### Looking Glass
- Allows viewing GPU output from guest
- Still exclusive passthrough (guest owns GPU)
- Just a viewer, not sharing
## What About DRI/DRM Passthrough?
You might think: "Can we pass `/dev/dri` to share?"
**Tried this already** - it doesn't work for Windows because:
- Windows needs PCI-level GPU access
- `/dev/dri` is Linux-specific (won't work in Windows)
- Windows drivers expect real PCI GPU device
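For contrast, this is roughly what `/dev/dri` sharing looks like for a *Linux* container (a sketch with a hypothetical image name; it is exactly this Linux-only interface that Windows guests cannot use):

```yaml
services:
  linux-app:
    image: some-linux-image   # hypothetical placeholder
    devices:
      - /dev/dri:/dev/dri     # host render nodes; only Linux userspace can drive them
```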
## Comparison Table
| Feature | Shared (VirtIO) | Exclusive (VFIO) | Windows Hyper-V GPU-PV |
|---------|----------------|------------------|------------------------|
| Host display works | ✓ | ✗ | ✓ |
| Container auto-start | ✓ | ✗ | ✓ |
| Both usable together | ✓ | ✗ | ✓ |
| Native GPU in Windows | ✗ | ✓ | ~ (virtual) |
| GPU performance | Low | High | Medium |
| Setup complexity | Easy | Complex | Medium |
| Requires manual binding | ✗ | ✓ | ✗ |
## Recommendation
**For your use case**, I recommend:
### Start with Option 1 (Shared - No Passthrough)
- Container works
- Host works
- Both at same time
- Simple setup
**If Windows GPU performance is too slow**, then consider:
- Adding a second GPU to host (dedicate one to passthrough)
- Running Windows on bare metal for GPU workloads
- Using cloud GPU instances for heavy GPU tasks
## Current Configuration
I've just updated your `docker-compose.yml` to **Option 1 (Shared)**:
- Removed GPU passthrough
- Removed VFIO devices
- Container can auto-start
- Host display continues working
**Want to test it?**
```bash
cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows
docker compose up -d
```
Windows will start with VirtIO display. Both host and container will work simultaneously.
**Need GPU passthrough later?** I can create a separate docker-compose file for that use case.


@@ -0,0 +1,79 @@
#!/bin/bash
# Manually bind the AMD GPU to VFIO for Windows container passthrough
# Run this BEFORE starting the Windows container
set -e
if [ "$EUID" -ne 0 ]; then
    echo "Please run as root: sudo $0"
    exit 1
fi

echo "=== Binding AMD GPU to VFIO ==="
echo ""

GPU_PCI="0000:c5:00.0"
AUDIO_PCI="0000:c5:00.1"

# Check if the vfio-pci module is loaded
if ! lsmod | grep -q vfio_pci; then
    echo "Loading vfio-pci module..."
    modprobe vfio-pci
fi

# Unbind the GPU from amdgpu
echo "Unbinding GPU from amdgpu..."
if [ -e /sys/bus/pci/devices/$GPU_PCI/driver ]; then
    echo "$GPU_PCI" > /sys/bus/pci/devices/$GPU_PCI/driver/unbind
    echo "✓ GPU unbound from amdgpu"
else
    echo "GPU not bound to any driver"
fi

# Unbind the audio function from snd_hda_intel
echo "Unbinding audio from snd_hda_intel..."
if [ -e /sys/bus/pci/devices/$AUDIO_PCI/driver ]; then
    echo "$AUDIO_PCI" > /sys/bus/pci/devices/$AUDIO_PCI/driver/unbind
    echo "✓ Audio unbound"
else
    echo "Audio not bound to any driver"
fi

# Bind both functions to vfio-pci by registering their vendor:device IDs
echo ""
echo "Binding to vfio-pci..."
echo "1002 1586" > /sys/bus/pci/drivers/vfio-pci/new_id 2>/dev/null || echo "GPU ID already registered"
echo "1002 1640" > /sys/bus/pci/drivers/vfio-pci/new_id 2>/dev/null || echo "Audio ID already registered"
sleep 1

# Verify which driver each function now uses
# ("|| true" keeps set -e from aborting when no driver line is present)
GPU_DRIVER=$(lspci -nnk -s c5:00.0 | grep "Kernel driver in use" | awk '{print $5}' || true)
AUDIO_DRIVER=$(lspci -nnk -s c5:00.1 | grep "Kernel driver in use" | awk '{print $5}' || true)
echo ""
echo "=== Status ==="
if [ "$GPU_DRIVER" = "vfio-pci" ]; then
    echo "✓ GPU bound to vfio-pci"
else
    echo "✗ GPU bound to: ${GPU_DRIVER:-none}"
fi
if [ "$AUDIO_DRIVER" = "vfio-pci" ]; then
    echo "✓ Audio bound to vfio-pci"
else
    echo "✗ Audio bound to: ${AUDIO_DRIVER:-none}"
fi
echo ""
if [ "$GPU_DRIVER" = "vfio-pci" ] && [ "$AUDIO_DRIVER" = "vfio-pci" ]; then
    echo "✓ Ready for GPU passthrough!"
    echo ""
    echo "Now start the Windows container:"
    echo "  cd /mnt/shared/DEV/repos/d-popov.com/scripts/portainer-compose-stacks/windows"
    echo "  docker compose up -d"
else
    echo "✗ Binding failed. Check errors above."
fi
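An alternative worth knowing about: instead of registering vendor:device IDs globally via `new_id` (which would also grab any other device sharing those IDs), a single PCI function can be pinned with `driver_override`. A sketch of the idea, not what the script above does:

```bash
# Build the sysfs override path for one PCI function (pure string helper).
override_path() {
    echo "/sys/bus/pci/devices/$1/driver_override"
}

# On the live system (requires root; addresses taken from this guide):
#   echo vfio-pci > "$(override_path 0000:c5:00.0)"
#   echo 0000:c5:00.0 > /sys/bus/pci/drivers_probe
```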


@@ -0,0 +1,33 @@
services:
  windows:
    image: dockurr/windows # https://github.com/dockur/windows
    container_name: windows-gpu
    environment:
      VERSION: "11"
      RAM_SIZE: "8G"
      CPU_CORES: "4"
      GPU: "Y"
      ARGUMENTS: "-device vfio-pci,host=c5:00.0,addr=0x05,multifunction=on,rombar=0 -device vfio-pci,host=c5:00.1,addr=0x05.1"
    devices:
      - /dev/kvm
      - /dev/net/tun
      - /dev/nvme0n1p8:/disk2
      - /dev/vfio/vfio
      - /dev/vfio/20
      - /dev/vfio/21
    cap_add:
      - NET_ADMIN
    privileged: true
    ports:
      - 445:445
      - 1433:1433
      - 8006:8006
      - 3389:3389/tcp
      - 3389:3389/udp
    volumes:
      # - /dev/nvme0n1p7:/disk1 # bind mount - not working for now
      - /mnt/data/docker_vol/windows:/storage # storage (img file) location
      - /mnt/shared:/data
    restart: "no" # manual start only - requires GPU binding first
    stop_grace_period: 2m
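The `/dev/vfio/20` and `/dev/vfio/21` entries above are IOMMU group numbers, which differ from machine to machine. A sketch for looking up the group of each passed-through function:

```bash
# Resolve the IOMMU group number of a PCI function ("unknown" if unavailable).
group_of() {
    local link
    link=$(readlink "/sys/bus/pci/devices/$1/iommu_group" 2>/dev/null) || link=unknown
    basename "$link"
}

echo "GPU group:   $(group_of 0000:c5:00.0)"
echo "Audio group: $(group_of 0000:c5:00.1)"
```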


@@ -1,3 +1,7 @@
# Windows Container - Shared Mode (No GPU Passthrough)
# Host and container can both run simultaneously
# Windows gets VirtIO display, host keeps AMD GPU
# For GPU passthrough, use: docker-compose.gpu-passthrough.yml
services:
  windows:
    image: dockurr/windows # https://github.com/dockur/windows
@@ -6,15 +10,10 @@ services:
      VERSION: "11"
      RAM_SIZE: "8G"
      CPU_CORES: "4"
    devices:
      - /dev/kvm
      - /dev/net/tun
      - /dev/nvme0n1p8:/disk2
    cap_add:
      - NET_ADMIN
    privileged: true


@@ -0,0 +1,82 @@
#!/bin/bash
# Unbind the AMD GPU from VFIO and restore it to the host (amdgpu driver)
# Run this AFTER stopping the Windows container to restore host GPU access
set -e
if [ "$EUID" -ne 0 ]; then
    echo "Please run as root: sudo $0"
    exit 1
fi

echo "=== Unbinding AMD GPU from VFIO ==="
echo ""

GPU_PCI="0000:c5:00.0"
AUDIO_PCI="0000:c5:00.1"

# Stop the Windows container first (container_name from docker-compose.gpu-passthrough.yml)
echo "Checking if Windows container is running..."
if docker ps --format '{{.Names}}' | grep -q '^windows-gpu$'; then
    echo "Stopping Windows container..."
    docker stop windows-gpu
    echo "✓ Container stopped"
fi

# Unbind from vfio-pci
echo ""
echo "Unbinding from vfio-pci..."
if [ -e /sys/bus/pci/devices/$GPU_PCI/driver ]; then
    echo "$GPU_PCI" > /sys/bus/pci/devices/$GPU_PCI/driver/unbind
    echo "✓ GPU unbound from vfio-pci"
fi
if [ -e /sys/bus/pci/devices/$AUDIO_PCI/driver ]; then
    echo "$AUDIO_PCI" > /sys/bus/pci/devices/$AUDIO_PCI/driver/unbind
    echo "✓ Audio unbound from vfio-pci"
fi

# Remove the device IDs from vfio-pci so it no longer claims them
echo "1002 1586" > /sys/bus/pci/drivers/vfio-pci/remove_id 2>/dev/null || true
echo "1002 1640" > /sys/bus/pci/drivers/vfio-pci/remove_id 2>/dev/null || true
sleep 1

# Ask the kernel to re-probe host drivers for both functions
echo ""
echo "Binding back to host drivers..."
echo "$GPU_PCI" > /sys/bus/pci/drivers_probe
echo "$AUDIO_PCI" > /sys/bus/pci/drivers_probe
sleep 2

# Verify which driver each function now uses
# ("|| true" keeps set -e from aborting when no driver line is present)
GPU_DRIVER=$(lspci -nnk -s c5:00.0 | grep "Kernel driver in use" | awk '{print $5}' || true)
AUDIO_DRIVER=$(lspci -nnk -s c5:00.1 | grep "Kernel driver in use" | awk '{print $5}' || true)
echo ""
echo "=== Status ==="
if [ "$GPU_DRIVER" = "amdgpu" ]; then
    echo "✓ GPU restored to amdgpu"
else
    echo "⚠ GPU bound to: ${GPU_DRIVER:-none}"
fi
if [ "$AUDIO_DRIVER" = "snd_hda_intel" ]; then
    echo "✓ Audio restored to snd_hda_intel"
else
    echo "⚠ Audio bound to: ${AUDIO_DRIVER:-none}"
fi
echo ""
if [ "$GPU_DRIVER" = "amdgpu" ]; then
    echo "✓ GPU restored to host!"
    echo ""
    echo "You may need to restart your display manager:"
    echo "  sudo systemctl restart gdm3     # for GNOME"
    echo "  sudo systemctl restart lightdm  # for XFCE/other"
else
    echo "⚠ GPU not fully restored. You may need to reboot."
fi