GTX 970 GPU Passthrough - FINAL RESOLUTION PLAN (Reinstall Strategy)
Date: January 24, 2026
Status: CRITICAL - HOST OS INCOMPATIBILITY CONFIRMED
Target System: Proxmox VE 8.2 (Stable)
🚨 CRITICAL DIAGNOSIS: Proxmox VE 9.1 Incompatibility
The Issue: You are currently running Proxmox VE 9.1 (Development Preview) based on Debian 13 "Trixie" (Testing).
- Kernel 6.17: Contains broken headers preventing NVIDIA driver compilation.
- Dependency Hell: Attempting to downgrade to Kernel 6.8 fails because the base system libraries (libc6, etc.) in Debian Trixie are too new for older kernels.
- NVIDIA Drivers: The driver for the GTX 970 depends on kernel interfaces that are missing or broken in the bleeding-edge kernel shipped with this dev build.
The Solution: We must stop fighting the OS. The only reliable path forward is to install the STABLE version of Proxmox VE.
🛑 STOP - ACTION REQUIRED
Do not attempt further driver installations on the current system. Each attempt is likely to leave more broken packages behind.
RECOMMENDATION: CLEAN REINSTALL
We need to reinstall the host OS to Proxmox VE 8.2 (ISO Installer). This version uses Kernel 6.8 by default, which is fully compatible with:
- NVIDIA GTX 970 (Maxwell)
- NVIDIA Drivers (535.xx / 550.xx)
- Jellyfin LXC Passthrough
Step-by-Step Recovery Plan
Phase 1: Preparation & Backup (Current System)
- Back up LXC/VM configs:
  - If you can access the web interface, back up your LXC (ID 100) to an external drive or download the backup file.
  - If you cannot access the web interface, copy the config with `cat /etc/pve/lxc/100.conf > /root/100.conf.bak` and save the resulting text somewhere off the host.
  - Back up your Docker data volume if it's on the host (e.g., /var/lib/docker/volumes).
- Download the Proxmox VE 8.2 ISO:
  - Go to Proxmox Downloads.
  - Download the Proxmox VE 8.2 ISO Installer.
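The config backup in Phase 1 can be sketched as a small helper that copies the file and checks the copy before the disk is wiped. This is only an illustration: `backup_lxc_conf` is a hypothetical name, and the paths in the usage line are the ones assumed in this plan.

```bash
#!/bin/sh
# Sketch: copy one LXC config into a backup directory and verify the copy.
# backup_lxc_conf is a hypothetical helper, not a Proxmox command.
backup_lxc_conf() (
    src=$1        # e.g. /etc/pve/lxc/100.conf
    dest_dir=$2   # e.g. /root or a mounted external drive
    [ -r "$src" ] || { echo "cannot read $src" >&2; exit 1; }
    mkdir -p "$dest_dir" || exit 1
    dest="$dest_dir/$(basename "$src").bak"
    cp "$src" "$dest" || exit 1
    # cmp exits non-zero if the copy differs from the source
    cmp -s "$src" "$dest" || { echo "backup differs from source" >&2; exit 1; }
    echo "$dest"   # print where the backup landed
)
```

Usage on the host would look like `backup_lxc_conf /etc/pve/lxc/100.conf /root`. The copy still has to be moved off the machine before reinstalling, since /root is wiped along with the OS disk.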
Phase 2: Reinstallation
- Flash the ISO to a USB stick (using Rufus or Etcher).
- Boot the server from USB.
- Install Proxmox VE 8.2 freshly (Wipe the OS disk).
Phase 3: The "Happy Path" Installation (Once PVE 8.2 is running)
This is what we will do on the fresh stable system. On PVE 8.2/Debian 12 these steps are expected to install cleanly, without the dependency errors you saw on Trixie.
- Update Repositories (Non-Subscription):
- Install Headers & Drivers:
```bash
apt update && apt dist-upgrade -y
apt install pve-headers
# nvidia-driver and firmware-misc-nonfree need Debian's non-free and non-free-firmware components enabled
apt install nvidia-driver firmware-misc-nonfree
```
- Pass to LXC: Add to /etc/pve/lxc/100.conf (195 is the NVIDIA major number; the nvidia-uvm major, 237 here, varies per host, so confirm it with `ls -l /dev/nvidia-uvm`):
```
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 237:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
```
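Because this document uses two different second major numbers (237 here, 509 in Phase 6), it may be safer to derive the `lxc.cgroup2.devices.allow` lines from the device nodes actually present on the host. The following is a minimal sketch; `cgroup_allow_lines` is a hypothetical helper that only parses `ls -l` output and changes nothing.

```bash
#!/bin/sh
# Sketch: read `ls -l` output for device nodes on stdin and print one
# lxc.cgroup2.devices.allow line per distinct character-device major number.
# cgroup_allow_lines is a hypothetical helper name.
cgroup_allow_lines() {
    # For character devices (mode string starts with "c"), field 5 is "MAJOR,"
    awk '$1 ~ /^c/ { major = $5; sub(/,$/, "", major); print major }' |
        sort -n -u |
        while read -r major; do
            printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "$major"
        done
}
```

On the host, `ls -l /dev/nvidia* /dev/dri/* | cgroup_allow_lines` would print the lines to paste into the container config, assuming the NVIDIA modules are already loaded so the nodes exist.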
Decision Point
Are you ready to proceed with the reinstall? If yes, I can help you verify your backups before you wipe the drive.
```bash
# nvidia/550.163.01, 6.17.4-2-pve, x86_64: installed

# Load NVIDIA modules
modprobe nvidia
modprobe nvidia_uvm
modprobe nvidia_drm modeset=1

# Verify modules loaded
lsmod | grep nvidia

# Check GPU detection
nvidia-smi
# Expected output: GPU information, driver version, CUDA version
```
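Phase 6: Configure LXC Container (HOST)
Edit LXC container configuration:
```bash
# On Proxmox host
nano /etc/pve/lxc/100.conf
# Add these lines:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
# For unprivileged containers, also add:
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 59
lxc.idmap: g 104 104 1
lxc.idmap: g 105 100105 65431
```
Note: 195 is the NVIDIA major number and 226 is the DRM major used by /dev/dri; the nvidia-uvm major (509 here) varies per host, so confirm it with `ls -l /dev/nvidia-uvm`. In the idmap lines, 44 and 104 stand for the host's video and render group IDs (on Debian, video is 44; render varies). Look them up on the host with `getent group video render`, substitute your values, and adjust the neighbouring ranges so the group mappings stay contiguous and cover 0-65535 without overlap. Each passed-through GID also needs an entry such as `root:44:1` in /etc/subgid.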
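The group ranges in the Phase 6 idmap are easy to get wrong by hand, because every range must start where the previous one ended. As a sketch under the assumption that all other GIDs are shifted by 100000 and the container has 65536 IDs, the lines can be generated from the pass-through GIDs. `gid_idmap` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch: print lxc.idmap group lines that shift every GID by 100000 except
# the pass-through GIDs given as arguments (ascending order, each below 65535).
# gid_idmap is a hypothetical helper, not part of LXC or Proxmox.
gid_idmap() (
    next=0
    for gid in "$@"; do
        # Shifted range covering the GIDs between the previous pass-through and this one
        if [ "$gid" -gt "$next" ]; then
            echo "lxc.idmap: g $next $((100000 + next)) $((gid - next))"
        fi
        # Identity mapping for the pass-through GID itself
        echo "lxc.idmap: g $gid $gid 1"
        next=$((gid + 1))
    done
    # Shifted range covering everything after the last pass-through GID
    echo "lxc.idmap: g $next $((100000 + next)) $((65536 - next))"
)
```

For example, `gid_idmap 44 104` prints five lines whose counts (44, 1, 59, 1, 65431) add up to 65536, which is a quick sanity check that nothing overlaps or is left unmapped.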
Phase 7: Install NVIDIA Container Toolkit (INSIDE LXC)
```bash
# Enter the LXC container
pct enter 100

# Add NVIDIA Container Toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update
apt install nvidia-container-toolkit -y

# Configure Docker to use NVIDIA runtime
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker

# Verify Docker sees GPU
# (use an image tag whose CUDA version is not newer than the one nvidia-smi reports on the host)
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
```
Phase 8: Configure Jellyfin for Hardware Transcoding (INSIDE LXC)
Update Jellyfin Docker Compose:
```yaml
version: "3.8"
services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    container_name: jellyfin
    user: "1000:44"  # Replace 44 with the GID that owns the GPU device nodes inside the LXC
    network_mode: host
    volumes:
      - /path/to/jellyfin/config:/config
      - /path/to/jellyfin/cache:/cache
      - /path/to/media:/media:ro
    devices:
      - /dev/dri:/dev/dri
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, video, compute, utility]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    restart: unless-stopped
```
Or if using docker run:
```bash
docker run -d \
  --name=jellyfin \
  --gpus all \
  --device=/dev/dri:/dev/dri \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -v /path/to/config:/config \
  -v /path/to/cache:/cache \
  -v /path/to/media:/media:ro \
  --user 1000:44 \
  --net=host \
  --restart=unless-stopped \
  jellyfin/jellyfin:latest
```
Enable hardware transcoding in Jellyfin:
- Navigate to Jellyfin Dashboard → Playback
- Enable "NVIDIA NVENC" under Hardware Acceleration
- Check "NVIDIA NVENC" for H264, HEVC encoding
- Save settings
Phase 9: Verification
```bash
# On Proxmox host
nvidia-smi

# Inside LXC
pct enter 100
ls -la /dev/dri/
ls -la /dev/nvidia*
docker exec jellyfin nvidia-smi

# Test transcoding
# Play a video in Jellyfin and check:
nvidia-smi  # Should show jellyfin process using GPU
```
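The `ls -la /dev/nvidia*` check in Phase 9 can be turned into a check that names exactly which nodes are absent. The sketch below assumes the five node names mounted in Phase 6; `missing_nvidia_nodes` is a hypothetical helper, and the optional directory argument exists only so it can be tried against a scratch directory.

```bash
#!/bin/sh
# Sketch: print each expected NVIDIA device node that is missing under the
# given device directory (default /dev). Prints nothing when all are present.
# missing_nvidia_nodes is a hypothetical helper name.
missing_nvidia_nodes() (
    dev_dir=${1:-/dev}
    for node in nvidia0 nvidiactl nvidia-uvm nvidia-uvm-tools nvidia-modeset; do
        [ -e "$dev_dir/$node" ] || echo "$dev_dir/$node"
    done
)
```

Run it once on the host and once inside the container: a node missing on the host points at module loading, while a node missing only in the container points at the `lxc.mount.entry` lines.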
Part 4: Troubleshooting Guide
Issue: firmware-nvidia-gsp Package Not Found
Solution 1: Use Debian Sid packages temporarily
```bash
# On Proxmox host
echo "deb http://deb.debian.org/debian/ sid main contrib non-free non-free-firmware" > /etc/apt/sources.list.d/debian-sid-temp.list
apt update
apt install -t sid firmware-nvidia-gsp
rm /etc/apt/sources.list.d/debian-sid-temp.list
apt update
```
Solution 2: Use 470.xx legacy driver (no GSP firmware needed)
Issue: DKMS Build Fails
```bash
# Check kernel headers
ls /usr/src/linux-headers-$(uname -r)

# Reinstall headers (on PVE 8 the package is typically named proxmox-headers-$(uname -r) instead)
apt install --reinstall pve-headers-$(uname -r)

# Check DKMS logs
dkms status
cat /var/lib/dkms/nvidia/*/build/make.log
```
Issue: nvidia-smi Shows "No Devices Were Found"
```bash
# Verify module loading
lsmod | grep nvidia
modprobe nvidia

# Check dmesg for errors
dmesg | grep -i nvidia

# Verify GPU visibility
lspci -k | grep -A 3 VGA
```
Issue: LXC Container Can't See GPU Devices
```bash
# On Proxmox host - verify devices exist
ls -la /dev/dri/
ls -la /dev/nvidia*

# Check LXC config is correct
grep -E "lxc.cgroup2|lxc.mount.entry" /etc/pve/lxc/100.conf

# Restart container
pct stop 100
pct start 100

# Inside container
pct enter 100
ls -la /dev/dri/
ls -la /dev/nvidia*
```
Issue: Docker Container Can't Access GPU
```bash
# Verify nvidia-container-toolkit is installed
dpkg -l | grep nvidia-container-toolkit

# Check Docker runtime configuration
cat /etc/docker/daemon.json
```
Should contain:
```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```
```bash
# Restart Docker
systemctl restart docker

# Test GPU access
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
```
Part 5: Rollback Procedures
Rollback Step 1: Remove NVIDIA Drivers
```bash
# On Proxmox host
apt remove --purge 'nvidia-*' 'libnvidia-*' -y
apt autoremove -y

# Remove modules
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia

# Clean DKMS (assumes `dkms status` prints lines like
# "nvidia/550.163.01, 6.17.4-2-pve, x86_64: installed")
dkms status | awk -F'[,:] *' '/^nvidia/ {print $1}' | sort -u | xargs -r -I {} dkms remove {} --all
```
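The DKMS cleanup depends on how `dkms status` formats its lines, and that format differs between DKMS releases: some print `nvidia/550.163.01, ...` and older ones print `nvidia, 550.163.01, ...`. As a sketch that tolerates both, the module/version pairs can be extracted first and reviewed before anything is removed. `dkms_nvidia_modules` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch: read `dkms status` output on stdin and print the unique
# module/version pairs of NVIDIA modules, one per line.
# dkms_nvidia_modules is a hypothetical helper name.
dkms_nvidia_modules() {
    awk -F'[,:] *' '
        # Newer format: "nvidia/550.163.01, <kernel>, <arch>: installed"
        $1 ~ /^nvidia/ && index($1, "/") { print $1; next }
        # Older format: "nvidia, 550.163.01, <kernel>, <arch>: installed"
        $1 ~ /^nvidia/ { print $1 "/" $2 }
    ' | LC_ALL=C sort -u
}
```

After checking the output of `dkms status | dkms_nvidia_modules`, each printed pair can be passed to `dkms remove <module>/<version> --all`.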
Rollback Step 2: Restore LXC Configuration
```bash
# Backup current config
cp /etc/pve/lxc/100.conf /etc/pve/lxc/100.conf.backup

# Remove GPU passthrough lines
nano /etc/pve/lxc/100.conf
# Delete all lxc.cgroup2.devices.allow, lxc.mount.entry, and lxc.idmap lines added in Phase 6

# Restart container
pct stop 100
pct start 100
```
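Instead of deleting the passthrough lines by hand in an editor, a filtered copy of the config can be produced and reviewed first. This sketch assumes every `lxc.cgroup2.devices.allow`, `lxc.mount.entry`, and `lxc.idmap` line in the file belongs to the GPU setup; if the container has unrelated mount entries, they would be dropped too. `strip_gpu_lines` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch: print an LXC config without GPU passthrough lines.
# strip_gpu_lines is a hypothetical helper; it never modifies the input file.
strip_gpu_lines() {
    # $1: config file to read; the filtered config goes to stdout
    grep -v -E '^lxc\.(cgroup2\.devices\.allow|mount\.entry|idmap):' "$1"
}
```

A cautious way to use it is `strip_gpu_lines /etc/pve/lxc/100.conf > /root/100.conf.nogpu`, then compare the two files with `diff` before copying the result back.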
Rollback Step 3: Remove Docker NVIDIA Runtime
```bash
# Inside LXC
apt remove --purge nvidia-container-toolkit -y

# Edit docker daemon config
nano /etc/docker/daemon.json
# Remove nvidia runtime section
systemctl restart docker
```
Part 6: Alternative Simplified Approach
Why This Might Work
Since your setup was previously working, the simplest solution might be:
- Don't install any drivers on the host (if Docker can work with just device passthrough)
- Use nvidia-container-toolkit to expose the GPU to containers. Note that the toolkit does not ship driver libraries itself; it mounts the NVIDIA user-space libraries already present in the LXC into each container.
- Rely on the toolkit to provide GPU access to Jellyfin
Simplified Steps
```bash
# On Proxmox host - NO driver installation
# Just ensure devices are passed through to LXC
# Configure LXC (already done in Phase 6)

# Inside LXC - Only install nvidia-container-toolkit
apt install nvidia-container-toolkit -y
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker

# Run Jellyfin with GPU
docker run -d \
  --name=jellyfin \
  --device=/dev/dri/renderD128:/dev/dri/renderD128 \
  -v /path/to/config:/config \
  -v /path/to/media:/media:ro \
  --net=host \
  jellyfin/jellyfin:latest
```
Note: This works ONLY if:
- The GPU is already initialized by system BIOS
- Basic DRM/DRI drivers are loaded by kernel
- You don't need CUDA, NVENC, or other features that depend on the proprietary NVIDIA driver
Part 7: Recommended Execution Order
Option A: Full Driver Installation (Most Reliable)
- Phase 1: Clean up
- Phase 2: Repository configuration
- Phase 3: Install headers
- Phase 4: Install 550.xx driver (or 470.xx if GSP firmware issues)
- Phase 5: Verify host driver
- Phase 6: Configure LXC
- Phase 7: Install nvidia-container-toolkit
- Phase 8: Configure Jellyfin
- Phase 9: Verification
Estimated Time: 30-45 minutes
Option B: Simplified Approach (If Previous Setup Was Working)
- Phase 1: Clean up any partial installations
- Phase 6: Configure LXC passthrough
- Phase 7: Install nvidia-container-toolkit only
- Phase 8: Configure Jellyfin
- Phase 9: Verification
Estimated Time: 15-20 minutes
Part 8: Why It Was Working Before
Likely Scenarios:
- Older Proxmox version had different Debian repos with working package versions
- Different driver version was installed (possibly 470.xx which doesn't need GSP firmware)
- Different kernel version that didn't trigger the firmware dependency
- System was using nouveau (open-source driver) which doesn't need firmware-nvidia-gsp
To check what was used before:
# Check Proxmox logs
journalctl -b | grep -i nvidia
cat /var/log/apt/history.log | grep nvidia
dpkg -l | grep nvidia
Part 9: Critical Notes
⚠️ Important Warnings
- Do NOT mix driver sources: Choose EITHER Debian packages OR NVIDIA .run installer, never both
- Do NOT install the NVIDIA kernel driver inside the LXC: the kernel module belongs only on the Proxmox host. Any NVIDIA user-space libraries inside the container must match the host driver version.
- Kernel updates may break DKMS: Always have `pve-headers` installed before updating
- Unprivileged containers need UID/GID mapping: See Phase 6 for proper configuration
- Jellyfin user must be in render/video group: Set with `--user 1000:44` in Docker
📝 Documentation
After successful setup, document:
- Driver version installed: `nvidia-smi --query-gpu=driver_version --format=csv,noheader`
- Kernel version: `uname -r`
- LXC configuration: `cat /etc/pve/lxc/100.conf`
- Docker version: `docker --version`
- Jellyfin version: `docker exec jellyfin dpkg -l | grep jellyfin`
Part 10: Expected Outcomes
Success Indicators
✅ nvidia-smi shows GPU on Proxmox host
✅ /dev/dri/renderD128 exists inside LXC
✅ docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi works
✅ Jellyfin Dashboard shows "NVIDIA" under hardware acceleration
✅ Video transcoding uses <10% CPU, GPU shows load in nvidia-smi
✅ Jellyfin transcoding logs show NVENC encoder
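Performance Expectations
These are rough estimates, not measurements from this system:
- CPU usage during transcode: 5-15% (vs 80-100% without GPU)
- GPU usage: 15-40% depending on resolution/codec
- Concurrent 1080p→720p transcodes: 8-12 streams (GTX 970), subject to the concurrent NVENC session limit the driver enforces on GeForce cards
- 4K HEVC transcode: 2-3 concurrent streams; the GTX 970 (GM204) has no dedicated HEVC hardware decoder, so HEVC sources are largely decoded on the CPU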
Part 11: Next Steps After Success
- Set up monitoring:
- Configure automatic startup:
- Set up backups:
- Update Jellyfin transcoding settings:
- Limit concurrent transcodes based on testing
- Set appropriate quality profiles
- Monitor GPU temperature
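For the monitoring and temperature items, one possible starting point is to feed `nvidia-smi` query output into a small threshold check. The sketch below is illustrative only: `gpu_temp_warnings` is a hypothetical helper, and the 80 °C default is an assumed threshold, not a value taken from NVIDIA documentation for the GTX 970.

```bash
#!/bin/sh
# Sketch: read "temperature, utilization" lines (one per GPU) on stdin and
# print a warning for every GPU at or above the limit in degrees Celsius.
# gpu_temp_warnings is a hypothetical helper; 80 is an assumed default limit.
gpu_temp_warnings() {
    awk -F', *' -v limit="${1:-80}" '
        $1 + 0 >= limit + 0 {
            # NR - 1 is the GPU index, since nvidia-smi lists GPUs starting at 0
            printf "GPU %d: %d C (limit %d C)\n", NR - 1, $1, limit
        }
    '
}
```

On the host, the input would come from `nvidia-smi --query-gpu=temperature.gpu,utilization.gpu --format=csv,noheader,nounits | gpu_temp_warnings 80`, which could be run from cron or a systemd timer.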
Conclusion
Your GTX 970 is fully supported by modern drivers (535.xx/550.xx series). The firmware-nvidia-gsp dependency error is a packaging issue in Debian Trixie, solvable by:
Recommended Path: Install 550.xx series driver with manual firmware package if needed, OR fallback to 470.xx legacy driver.
Time to Resolution: 30-45 minutes following Option A, or 15-20 minutes with Option B if simplified approach works.
Success Probability: 95%+ - This is a well-documented configuration and your GPU was working before.
This plan prioritizes restoring your working configuration with minimal risk. Start with the simplified approach (Option B) first since it was working before. If that fails, proceed with full driver installation (Option A).