# Using Existing ROCm Container for Development

## Current Status

✅ **You already have ROCm PyTorch working on the host!**

```bash
PyTorch: 2.5.1+rocm6.2
CUDA available: True
Device: AMD Radeon Graphics (Strix Halo)
Memory: 47.0 GB
```
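To reproduce this check yourself (a minimal sketch, assuming the project venv from the commands below; ROCm builds of PyTorch report the GPU through the CUDA API):

```bash
source venv/bin/activate
python - <<'EOF'
import torch

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {props.total_memory / 1024**3:.1f} GB")
EOF
```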
## Recommendation: Use Host Environment

**Since your host venv already has ROCm support working, this is the simplest option:**

```bash
cd /mnt/shared/DEV/repos/d-popov.com/gogo2
source venv/bin/activate
python ANNOTATE/web/app.py
```

**Benefits:**
- ✅ Already configured
- ✅ No container overhead
- ✅ Direct file access
- ✅ GPU works perfectly
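If you want to confirm the venv's PyTorch is actually a ROCm build before launching, ROCm wheels expose the HIP version via `torch.version.hip` (a quick sanity check, not part of the app):

```bash
# Prints a HIP version string on ROCm builds, None on CPU/CUDA builds
python -c "import torch; print(torch.version.hip)"
```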
## Alternative: Use Existing Container

You have these containers running:
- `amd-strix-halo-llama-rocm` - ROCm 7rc (port 8080)
- `amd-strix-halo-llama-vulkan-radv` - Vulkan RADV (port 8081)
- `amd-strix-halo-llama-vulkan-amdvlk` - Vulkan AMDVLK (port 8082)
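To confirm they are up and see their port mappings (standard `docker ps` filtering; the name prefix is assumed from the list above):

```bash
docker ps --filter "name=amd-strix-halo" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```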
### Option 1: Quick Attach Script

```bash
./scripts/attach-to-rocm-container.sh
```

This script will:
1. Check if the project is accessible in the container
2. Offer to copy the project if needed
3. Check/install Python if needed
4. Check/install PyTorch if needed
5. Attach you to a bash shell
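For reference, here is a rough sketch of what a script like this does (the real script in `scripts/` is authoritative; container name and paths are taken from this doc):

```bash
#!/usr/bin/env bash
set -euo pipefail

CONTAINER=amd-strix-halo-llama-rocm
PROJECT=/workspace/gogo2

# 1. Copy the project in if it is not there yet
if ! docker exec "$CONTAINER" test -d "$PROJECT"; then
    docker exec "$CONTAINER" mkdir -p /workspace
    docker cp /mnt/shared/DEV/repos/d-popov.com/gogo2 "$CONTAINER":/workspace/
fi

# 2. Make sure python3 exists (Fedora-based container, see Option 2B)
docker exec "$CONTAINER" bash -c 'command -v python3 || dnf install -y python3.12 python3-pip'

# 3. Make sure PyTorch is importable, install the ROCm wheel if not
docker exec "$CONTAINER" python3 -c 'import torch' 2>/dev/null || \
    docker exec "$CONTAINER" pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.2

# 4. Drop into a shell in the project directory
docker exec -it -w "$PROJECT" "$CONTAINER" bash
```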
### Option 2: Manual Setup

#### A. Copy Project to Container

```bash
# Create workspace in container
docker exec amd-strix-halo-llama-rocm mkdir -p /workspace

# Copy project
docker cp /mnt/shared/DEV/repos/d-popov.com/gogo2 amd-strix-halo-llama-rocm:/workspace/

# Enter container
docker exec -it amd-strix-halo-llama-rocm bash
```
#### B. Install Python (if needed)

Inside the container:
```bash
# Fedora-based container
dnf install -y python3.12 python3-pip python3-devel git

# Create symlinks
ln -sf /usr/bin/python3.12 /usr/bin/python3
ln -sf /usr/bin/python3.12 /usr/bin/python
```
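A quick way to confirm the interpreter and pip are wired up before moving on (plain version checks, nothing project-specific):

```bash
python3 --version   # expect Python 3.12.x
pip3 --version
```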
#### C. Install Dependencies

Inside the container:
```bash
cd /workspace/gogo2

# Install PyTorch with ROCm
pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.2

# Install project dependencies
pip3 install -r requirements.txt
```
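After installing, it is worth a smoke test that the ROCm build actually sees the GPU from inside the container (same CUDA-API convention as on the host; this assumes the container was started with GPU device access, e.g. `/dev/kfd` and `/dev/dri` passed through):

```bash
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
```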
#### D. Run Application

```bash
# Run ANNOTATE dashboard
python3 ANNOTATE/web/app.py

# Or run training
python3 training_runner.py --mode realtime --duration 4
```
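If the machine ever exposes more than one GPU, ROCm respects `HIP_VISIBLE_DEVICES` for pinning a process to a device (a generic ROCm environment variable, not something this project defines):

```bash
# Pin the training run to the first GPU
HIP_VISIBLE_DEVICES=0 python3 training_runner.py --mode realtime --duration 4
```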
### Option 3: Mount Project on Container Restart

Add a volume mount to your docker-compose:

```yaml
services:
  amd-strix-halo-llama-rocm:
    volumes:
      - /mnt/shared/DEV/repos/d-popov.com/gogo2:/workspace/gogo2:rw
```

Then restart:
```bash
docker-compose down
docker-compose up -d
```
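A one-line check that the mount landed where expected (just listing the bind-mounted directory):

```bash
docker exec amd-strix-halo-llama-rocm ls /workspace/gogo2
```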
## Port Conflicts

Your ROCm container uses port 8080, which conflicts with the COBY API.

**Solutions:**

1. **Use host environment** (no conflict)
2. **Change ANNOTATE port** in the container:
   ```bash
   python3 ANNOTATE/web/app.py --port 8051
   ```
3. **Expose a different port** when starting the container (see the sketch below)
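For option 3, the host-side port in the compose file is the thing to change; 8090 below is an arbitrary free port, and the container-side 8080 stays as-is:

```yaml
services:
  amd-strix-halo-llama-rocm:
    ports:
      - "8090:8080"
```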
## Comparison

| Aspect | Host (venv) | Container |
|--------|-------------|-----------|
| Setup | ✅ Already done | ⚠️ Needs Python install |
| GPU | ✅ Working | ✅ Should work |
| Files | ✅ Direct access | ⚠️ Need to copy/mount |
| Performance | ✅ Native | ⚠️ Small overhead |
| Isolation | ⚠️ Shares host | ✅ Isolated |
| Simplicity | ✅ Just works | ⚠️ Extra steps |
## Quick Commands

### Host Development (Recommended)

```bash
cd /mnt/shared/DEV/repos/d-popov.com/gogo2
source venv/bin/activate
python ANNOTATE/web/app.py
```

### Container Development

```bash
# Method 1: Use helper script
./scripts/attach-to-rocm-container.sh

# Method 2: Manual attach
docker exec -it amd-strix-halo-llama-rocm bash
cd /workspace/gogo2
python3 ANNOTATE/web/app.py
```
### Check GPU in Container

```bash
docker exec amd-strix-halo-llama-rocm rocm-smi
docker exec amd-strix-halo-llama-rocm python3 -c "import torch; print(torch.cuda.is_available())"
```
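If `is_available()` comes back `False` while `rocm-smi` sees the GPU, the usual suspects are a CPU-only torch wheel or missing `/dev/kfd`/`/dev/dri` access for the container (both assumptions worth checking first):

```bash
# Which torch build is installed? ROCm wheels report a HIP version here.
docker exec amd-strix-halo-llama-rocm python3 -c "import torch; print(torch.__version__, torch.version.hip)"
```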
## Summary

**For your use case (avoiding heavy downloads):**

→ **Use the host environment** - your venv already has everything working perfectly!

**Only use a container if you need:**
- Complete isolation from the host
- Specific ROCm version testing
- Multiple parallel environments

---

**Last Updated:** 2025-11-12
**Status:** Host venv with ROCm 6.2 is ready to use