try fixing GPU (torch)

Author: Dobromir Popov
Date: 2025-11-17 13:06:37 +02:00
parent 4fcadcdbff
commit 43a7d75daf
9 changed files with 1393 additions and 11 deletions


@@ -0,0 +1,186 @@
# Using Existing ROCm Container for Development
## Current Status
**You already have ROCm PyTorch working on the host!**
```bash
PyTorch: 2.5.1+rocm6.2
CUDA available: True
Device: AMD Radeon Graphics (Strix Halo)
Memory: 47.0 GB
```
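The figures above can be reproduced from the host venv with a short check like this (a minimal sketch; paths are taken from this document, and on ROCm builds `torch.cuda.*` is backed by HIP):
```bash
source /mnt/shared/DEV/repos/d-popov.com/gogo2/venv/bin/activate
python - <<'EOF'
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")  # True on ROCm builds (HIP backend)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {props.total_memory / 1024**3:.1f} GB")
EOF
```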
## Recommendation: Use Host Environment
**Since your host venv already has ROCm support working, this is the simplest option:**
```bash
cd /mnt/shared/DEV/repos/d-popov.com/gogo2
source venv/bin/activate
python ANNOTATE/web/app.py
```
**Benefits:**
- ✅ Already configured
- ✅ No container overhead
- ✅ Direct file access
- ✅ GPU works perfectly
## Alternative: Use Existing Container
You have these containers running:
- `amd-strix-halo-llama-rocm` - ROCm 7rc (port 8080)
- `amd-strix-halo-llama-vulkan-radv` - Vulkan RADV (port 8081)
- `amd-strix-halo-llama-vulkan-amdvlk` - Vulkan AMDVLK (port 8082)
### Option 1: Quick Attach Script
```bash
./scripts/attach-to-rocm-container.sh
```
This script will:
1. Check if project is accessible in container
2. Offer to copy project if needed
3. Check/install Python if needed
4. Check/install PyTorch if needed
5. Attach you to a bash shell
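A rough sketch of what such a helper does (this is **not** the actual script in the repo; the container name, paths, and the Fedora/dnf assumption are taken from this document):
```bash
#!/usr/bin/env bash
# Sketch of an attach helper; adjust names to match your setup.
set -euo pipefail

CONTAINER="amd-strix-halo-llama-rocm"
HOST_PROJECT="/mnt/shared/DEV/repos/d-popov.com/gogo2"
TARGET="/workspace/gogo2"

# 1. Copy the project into the container if it is not there yet
if ! docker exec "$CONTAINER" test -d "$TARGET"; then
    docker exec "$CONTAINER" mkdir -p /workspace
    docker cp "$HOST_PROJECT" "$CONTAINER":/workspace/
fi

# 2. Install Python if the container does not have it
if ! docker exec "$CONTAINER" python3 --version >/dev/null 2>&1; then
    docker exec "$CONTAINER" dnf install -y python3.12 python3-pip
fi

# 3. Install PyTorch with ROCm support if it is missing
if ! docker exec "$CONTAINER" python3 -c "import torch" >/dev/null 2>&1; then
    docker exec "$CONTAINER" pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.2
fi

# 4. Drop into a shell in the project directory
docker exec -it -w "$TARGET" "$CONTAINER" bash
```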
### Option 2: Manual Setup
#### A. Copy Project to Container
```bash
# Create workspace in container
docker exec amd-strix-halo-llama-rocm mkdir -p /workspace
# Copy project
docker cp /mnt/shared/DEV/repos/d-popov.com/gogo2 amd-strix-halo-llama-rocm:/workspace/
# Enter container
docker exec -it amd-strix-halo-llama-rocm bash
```
#### B. Install Python (if needed)
Inside container:
```bash
# Fedora-based container
dnf install -y python3.12 python3-pip python3-devel git
# Create symlinks
ln -sf /usr/bin/python3.12 /usr/bin/python3
ln -sf /usr/bin/python3.12 /usr/bin/python
```
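A quick check inside the container confirms the symlinks resolve as expected:
```bash
python3 --version   # should report Python 3.12.x
pip3 --version
```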
#### C. Install Dependencies
Inside container:
```bash
cd /workspace/gogo2
# Install PyTorch with ROCm
pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.2
# Install project dependencies
pip3 install -r requirements.txt
```
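A one-liner to confirm that the ROCm build of PyTorch sees the GPU from inside the container (a sketch; expect `True` and the Strix Halo device name if `/dev/kfd` and `/dev/dri` are passed through):
```bash
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU visible')"
```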
#### D. Run Application
```bash
# Run ANNOTATE dashboard
python3 ANNOTATE/web/app.py
# Or run training
python3 training_runner.py --mode realtime --duration 4
```
### Option 3: Mount Project on Container Restart
Add volume mount to your docker-compose:
```yaml
services:
amd-strix-halo-llama-rocm:
volumes:
- /mnt/shared/DEV/repos/d-popov.com/gogo2:/workspace/gogo2:rw
```
Then restart:
```bash
docker-compose down
docker-compose up -d
```
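If the container ever has to be recreated from scratch, remember that ROCm also needs the GPU devices passed through. A sketch of a fuller service definition (the image name and group names are placeholders, not taken from your compose file):
```yaml
services:
  amd-strix-halo-llama-rocm:
    image: your-rocm-image          # placeholder - keep the image the container already uses
    devices:
      - /dev/kfd                    # ROCm compute interface
      - /dev/dri                    # GPU render nodes
    group_add:
      - video
      - render
    volumes:
      - /mnt/shared/DEV/repos/d-popov.com/gogo2:/workspace/gogo2:rw
```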
## Port Conflicts
Your ROCm container uses port 8080, which conflicts with the COBY API.
**Solutions:**
1. **Use host environment** (no conflict)
2. **Change ANNOTATE port** in container:
```bash
python3 ANNOTATE/web/app.py --port 8051
```
3. **Expose a different port** when starting the container (see the sketch below)
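For option 3, the port mapping is chosen when (re)creating the container, roughly like this (a sketch; the image name is a placeholder):
```bash
docker run -d --name amd-strix-halo-llama-rocm \
  --device=/dev/kfd --device=/dev/dri \
  -p 8051:8051 \
  -v /mnt/shared/DEV/repos/d-popov.com/gogo2:/workspace/gogo2:rw \
  your-rocm-image
```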
## Comparison
| Aspect | Host (venv) | Container |
|--------|-------------|-----------|
| Setup | ✅ Already done | ⚠️ Needs Python install |
| GPU | ✅ Working | ✅ Should work |
| Files | ✅ Direct access | ⚠️ Need to copy/mount |
| Performance | ✅ Native | ⚠️ Small overhead |
| Isolation | ⚠️ Shares host | ✅ Isolated |
| Simplicity | ✅ Just works | ⚠️ Extra steps |
## Quick Commands
### Host Development (Recommended)
```bash
cd /mnt/shared/DEV/repos/d-popov.com/gogo2
source venv/bin/activate
python ANNOTATE/web/app.py
```
### Container Development
```bash
# Method 1: Use helper script
./scripts/attach-to-rocm-container.sh
# Method 2: Manual attach
docker exec -it amd-strix-halo-llama-rocm bash
cd /workspace/gogo2
python3 ANNOTATE/web/app.py
```
### Check GPU in Container
```bash
docker exec amd-strix-halo-llama-rocm rocm-smi
docker exec amd-strix-halo-llama-rocm python3 -c "import torch; print(torch.cuda.is_available())"
```
## Summary
**For your use case (avoiding heavy downloads): use the host environment.** Your venv already has ROCm PyTorch working.
**Only use container if you need:**
- Complete isolation from host
- Specific ROCm version testing
- Multiple parallel environments
---
**Last Updated:** 2025-11-12
**Status:** Host venv with ROCm 6.2 is ready to use