GTX 970 GPU Passthrough - FINAL RESOLUTION PLAN (Reinstall Strategy)

Date: January 24, 2026
Status: CRITICAL - HOST OS INCOMPATIBILITY CONFIRMED
Target System: Proxmox VE 8.2 (Stable)

🚨 CRITICAL DIAGNOSIS: Proxmox VE 9.1 Incompatibility

The Issue: You are currently running Proxmox VE 9.1 (Development Preview) based on Debian 13 "Trixie" (Testing).

Kernel 6.17: Contains broken headers preventing NVIDIA driver compilation.
Dependency Hell: Attempting to downgrade to Kernel 6.8 fails because the base system libraries (libc6, etc.) in Debian Trixie are too new for older kernels.
NVIDIA Drivers: The GTX 970 driver needs stable kernel interfaces, which are missing or broken in the bleeding-edge kernel shipped with this dev build.

The Solution: We must stop fighting the OS. The only reliable path forward is to install the STABLE version of Proxmox VE.

🛑 STOP - ACTION REQUIRED

Do not attempt further driver installations on the current system. Each attempt is likely to leave more broken packages behind.

RECOMMENDATION: CLEAN REINSTALL

Reinstall the host OS from the Proxmox VE 8.2 ISO installer. This version uses Kernel 6.8 by default, which is fully compatible with:

NVIDIA GTX 970 (Maxwell)
NVIDIA Drivers (535.xx / 550.xx)
Jellyfin LXC Passthrough

Step-by-Step Recovery Plan

Phase 1: Preparation & Backup (Current System)

Back up LXC/VM configs:
- If you can access the web interface, back up your LXC (ID 100) to an external drive or download the backup file.
- If you cannot access the web interface, save the config with cat /etc/pve/lxc/100.conf > /root/100.conf.bak, then copy that file (or its text) off the host, because /root sits on the OS disk that the reinstall wipes.
- Back up your Docker data volume if it's on the host (e.g., /var/lib/docker/volumes).
Download Proxmox VE 8.2 ISO:
- Go to Proxmox Downloads
- Download Proxmox VE 8.2 ISO Installer.
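The config copy described above can be scripted so that the file is verified once it lands on external storage. This is a minimal sketch: backup_conf is a hypothetical helper name (not a Proxmox tool), and the destination directory is whatever external mount you choose.

```shell
#!/bin/sh
# Sketch only: backup_conf is a hypothetical helper, not part of Proxmox.
# Copies a config file into a backup directory and verifies the copy.
backup_conf() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").bak"
    cp -- "$src" "$dest" || return 1
    # cmp exits non-zero if the copy differs from the source
    cmp -s -- "$src" "$dest" || { echo "verify failed: $dest" >&2; return 1; }
    echo "$dest"
}
```

For example, backup_conf /etc/pve/lxc/100.conf /mnt/usb (with /mnt/usb standing in for your external drive) prints the path of the verified copy.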

Phase 2: Reinstallation

Flash the ISO to a USB stick (using Rufus or Etcher).
Boot the server from USB.
Perform a fresh install of Proxmox VE 8.2 (this wipes the OS disk).
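Before flashing, it may be worth confirming that the ISO downloaded intact. The sketch below assumes GNU sha256sum is available and that you have copied the SHA256 value published next to the download; verify_sha256 is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: verify_sha256 is a hypothetical helper.
# Succeeds when the file's SHA-256 digest equals the expected hex string.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<hash>  <file>"; keep only the hash
    actual=$(sha256sum "$file" | awk '{ print $1 }') || return 1
    [ "$actual" = "$expected" ]
}
```

A call such as verify_sha256 proxmox.iso "<published hash>" returns success only on an exact match; the file name and hash here are placeholders.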

Phase 3: The "Happy Path" Installation (Once PVE 8.2 is running)

This is what we will do on the fresh stable system. It is the standard, well-documented route on Debian 12.

Update Repositories (Non-Subscription):

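# /etc/apt/sources.list
# non-free and non-free-firmware are required: nvidia-driver and firmware-misc-nonfree live there
deb http://ftp.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://ftp.debian.org/debian bookworm-updates main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription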

Install Headers & Drivers:
```
apt update && apt dist-upgrade -y
apt install pve-headers
apt install nvidia-driver firmware-misc-nonfree
```
(Note: On PVE 8.2/Debian 12, this Just Works™ without the dependency errors you saw on Trixie.)
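DKMS can only build the NVIDIA module if headers for the running kernel are present. A minimal pre-flight check is sketched below; it assumes the headers package exposes them at /lib/modules/<release>/build (the default location DKMS builds against), and have_headers is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: have_headers is a hypothetical helper.
# Succeeds when a kernel build tree exists for the given release.
# The optional second argument is an alternate root, useful for testing.
have_headers() {
    release=${1:-$(uname -r)}
    root=${2:-}
    [ -e "$root/lib/modules/$release/build" ]
}
```

Running have_headers || echo "install pve-headers first" before installing nvidia-driver gives an early warning instead of a failed DKMS build.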

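Pass to LXC: Add to /etc/pve/lxc/100.conf. 195 is NVIDIA's fixed major number; the nvidia-uvm major (237 here) is assigned dynamically, so confirm it with ls -l /dev/nvidia-uvm on the host: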

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 237:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
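The major numbers used in the allow rules can be read from /proc/devices rather than guessed. The sketch below assumes the relevant entry is named nvidia-uvm there; dev_major is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: dev_major is a hypothetical helper.
# Prints the major number registered under the given driver name.
# The table defaults to /proc/devices; pass another file for testing.
dev_major() {
    name=$1
    table=${2:-/proc/devices}
    awk -v n="$name" '
        $2 == n { print $1; found = 1; exit }
        END     { exit !found }
    ' "$table"
}
```

On the host, dev_major nvidia-uvm prints the number to use in the second lxc.cgroup2.devices.allow line, and it fails if the driver is not loaded.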

Decision Point

Are you ready to proceed with the reinstall? If yes, I can help you verify your backups before you wipe the drive.

Phase 5: Verify Host Driver (HOST)

# nvidia/550.163.01, 6.17.4-2-pve, x86_64: installed

# Load NVIDIA modules
modprobe nvidia
modprobe nvidia_uvm
modprobe nvidia_drm modeset=1

# Verify modules loaded
lsmod | grep nvidia

# Check GPU detection
nvidia-smi

# Expected output: GPU information, driver version, CUDA version
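The module check can be made explicit instead of reading lsmod output by eye. This is a sketch under the assumption that /proc/modules lists one module per line with the name in the first field; missing_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: missing_modules is a hypothetical helper.
# Prints each named module that is absent from a /proc/modules-style
# table and returns non-zero if any is missing.
missing_modules() {
    table=$1
    shift
    rc=0
    for mod in "$@"; do
        # the module name is the first field of each line
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$table"; then
            echo "$mod"
            rc=1
        fi
    done
    return "$rc"
}
```

On the host, missing_modules /proc/modules nvidia nvidia_uvm nvidia_drm prints nothing when all three modules are loaded.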

Phase 6: Configure LXC Container (HOST)

Edit LXC container configuration:

```bash
# On Proxmox host
nano /etc/pve/lxc/100.conf

# Add these lines (195 = nvidia, 226 = DRM for /dev/dri; 509 = nvidia-uvm on this
# host, but that major is dynamic, so check it with: ls -l /dev/nvidia-uvm)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

# For unprivileged containers, also add (the ranges must not overlap):
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 59
lxc.idmap: g 104 104 1
lxc.idmap: g 105 100105 65431
```

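Note: 44 and 104 are examples. On Debian, 44 is normally the video group and 104 is often the render group. Replace them with the IDs reported on the host, and allow root to map them by adding root:44:1 and root:104:1 (with your values) to /etc/subgid: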

# On Proxmox host
getent group render | cut -d: -f3
getent group video | cut -d: -f3
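Because the group ranges in the idmap must cover container GIDs 0 to 65535 exactly once, it is easy to get the counts wrong when substituting your own group IDs. The sketch below derives the lines from the pass-through GIDs; gen_idmap is a hypothetical helper name, and it assumes the GIDs are given in ascending order and that the usual 100000 host offset applies.

```shell
#!/bin/sh
# Sketch only: gen_idmap is a hypothetical helper.
# Prints lxc.idmap lines that shift container IDs 0..65535 to host
# 100000+, except the given GIDs, which are passed through 1:1.
# GIDs must be supplied in ascending order.
gen_idmap() {
    echo "lxc.idmap: u 0 100000 65536"
    next=0
    for gid in "$@"; do
        # shifted range covering the gap before this pass-through GID
        if [ "$gid" -gt "$next" ]; then
            echo "lxc.idmap: g $next $((100000 + next)) $((gid - next))"
        fi
        echo "lxc.idmap: g $gid $gid 1"
        next=$((gid + 1))
    done
    # shifted range covering everything after the last pass-through GID
    if [ "$next" -lt 65536 ]; then
        echo "lxc.idmap: g $next $((100000 + next)) $((65536 - next))"
    fi
}
```

With the example IDs, gen_idmap 44 104 yields a middle range of 59 IDs (45 through 103), so no range overlaps the pass-through entry for 104.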

Phase 7: Install NVIDIA Container Toolkit (INSIDE LXC)

# Enter the LXC container
pct enter 100

# Add NVIDIA Container Toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt update
apt install nvidia-container-toolkit -y

# Configure Docker to use NVIDIA runtime
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker

# Verify Docker sees GPU
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi

Phase 8: Configure Jellyfin for Hardware Transcoding (INSIDE LXC)

Update Jellyfin Docker Compose:

version: "3.8"
services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    container_name: jellyfin
    user: 1000:44 # Replace 44 with render group from host
    network_mode: host
    volumes:
      - /path/to/jellyfin/config:/config
      - /path/to/jellyfin/cache:/cache
      - /path/to/media:/media:ro
    devices:
      - /dev/dri:/dev/dri
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, video, compute, utility]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    restart: unless-stopped

Or if using docker run:

docker run -d \
  --name=jellyfin \
  --gpus all \
  --device=/dev/dri:/dev/dri \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -v /path/to/config:/config \
  -v /path/to/cache:/cache \
  -v /path/to/media:/media:ro \
  --user 1000:44 \
  --net=host \
  --restart=unless-stopped \
  jellyfin/jellyfin:latest

Enable hardware transcoding in Jellyfin:

Navigate to Jellyfin Dashboard → Playback
Enable "NVIDIA NVENC" under Hardware Acceleration
Check "NVIDIA NVENC" for H264, HEVC encoding
Save settings

Phase 9: Verification

# On Proxmox host
nvidia-smi

# Inside LXC
pct enter 100
ls -la /dev/dri/
ls -la /dev/nvidia*
docker exec jellyfin nvidia-smi

# Test transcoding
# Play a video in Jellyfin and check:
nvidia-smi  # Should show jellyfin process using GPU

Part 4: Troubleshooting Guide

Issue: firmware-nvidia-gsp Package Not Found

Solution 1: Use Debian Sid packages temporarily

# On Proxmox host
echo "deb http://deb.debian.org/debian/ sid main contrib non-free non-free-firmware" > /etc/apt/sources.list.d/debian-sid-temp.list
apt update
apt install -t sid firmware-nvidia-gsp
rm /etc/apt/sources.list.d/debian-sid-temp.list
apt update

Solution 2: Use 470.xx legacy driver (no GSP firmware needed)

apt install nvidia-legacy-470xx-driver firmware-misc-nonfree -y

Issue: DKMS Build Fails

# Check kernel headers
ls /usr/src/linux-headers-$(uname -r)

# Reinstall headers
apt install --reinstall pve-headers-$(uname -r)

# Check DKMS logs
dkms status
cat /var/lib/dkms/nvidia/*/build/make.log

Issue: nvidia-smi Shows "No Devices Were Found"

# Verify module loading
lsmod | grep nvidia
modprobe nvidia

# Check dmesg for errors
dmesg | grep -i nvidia

# Verify GPU visibility
lspci -k | grep -A 3 VGA

Issue: LXC Container Can't See GPU Devices

# On Proxmox host - verify devices exist
ls -la /dev/dri/
ls -la /dev/nvidia*

# Check LXC config is correct
grep -E "lxc\.cgroup2|lxc\.mount\.entry" /etc/pve/lxc/100.conf

# Restart container
pct stop 100
pct start 100

# Inside container
pct enter 100
ls -la /dev/dri/
ls -la /dev/nvidia*

Issue: Docker Container Can't Access GPU

# Verify nvidia-container-toolkit is installed
dpkg -l | grep nvidia-container-toolkit

# Check Docker runtime configuration
cat /etc/docker/daemon.json

# Should contain:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

# Restart Docker
systemctl restart docker

# Test GPU access
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
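When Docker runs inside an unprivileged LXC, a commonly reported cause of this failure is that the NVIDIA runtime cannot manage device cgroups from within the container. A frequently used workaround is setting no-cgroups = true in /etc/nvidia-container-runtime/config.toml. The sketch below assumes that file path and that the option appears as a (possibly commented-out) no-cgroups line; set_no_cgroups is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: set_no_cgroups is a hypothetical helper.
# Rewrites the no-cgroups option in the given TOML file to "true",
# whether or not the existing line is commented out.
set_no_cgroups() {
    file=$1
    tmp=$(mktemp) || return 1
    if sed 's/^[#[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$file" > "$tmp"; then
        # write back through the original path to keep its permissions
        cat "$tmp" > "$file"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}
```

Inside the LXC this would be run as set_no_cgroups /etc/nvidia-container-runtime/config.toml, followed by systemctl restart docker.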

Part 5: Rollback Procedures

Rollback Step 1: Remove NVIDIA Drivers

# On Proxmox host
apt remove --purge 'nvidia-*' 'libnvidia-*' -y
apt autoremove -y

# Remove modules
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia

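# Clean DKMS (dkms status prints lines like "nvidia/550.163.01, <kernel>, x86_64: installed")
dkms status | awk -F'[,:]' '/^nvidia\// {print $1}' | sort -u | xargs -r -I {} dkms remove {} --all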
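dkms remove expects a module/version identifier. With status lines in the form shown earlier in this document (nvidia/550.163.01, 6.17.4-2-pve, x86_64: installed), that identifier is the text before the first comma. A minimal sketch of extracting it follows; nvidia_dkms_ids is a hypothetical helper name, and older DKMS releases that print "nvidia, <version>, ..." would need a different pattern.

```shell
#!/bin/sh
# Sketch only: nvidia_dkms_ids is a hypothetical helper.
# Reads "dkms status" text on stdin and prints each distinct
# nvidia module/version identifier once.
nvidia_dkms_ids() {
    # split on commas and colons; field 1 is "module/version"
    awk -F'[,:]' '/^nvidia\// { print $1 }' | sort -u
}
```

It would be used as dkms status | nvidia_dkms_ids, with each printed identifier passed to dkms remove <id> --all.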

Rollback Step 2: Restore LXC Configuration

# Backup current config
cp /etc/pve/lxc/100.conf /etc/pve/lxc/100.conf.backup

# Remove GPU passthrough lines
nano /etc/pve/lxc/100.conf
# Delete all lxc.cgroup2.devices.allow and lxc.mount.entry lines

# Restart container
pct stop 100
pct start 100

Rollback Step 3: Remove Docker NVIDIA Runtime

# Inside LXC
apt remove --purge nvidia-container-toolkit -y

# Edit docker daemon config
nano /etc/docker/daemon.json
# Remove nvidia runtime section

systemctl restart docker

Part 6: Alternative Simplified Approach

Why This Might Work

Since your setup was previously working, the simplest solution might be:

Don't install any drivers on the host (if Docker can work with just device passthrough)
Use nvidia-container-toolkit, which mounts the driver's user-space libraries into containers (it does not bundle a driver of its own)
Rely on the toolkit to provide GPU access to Jellyfin

Simplified Steps

# On Proxmox host - NO driver installation
# Just ensure devices are passed through to LXC

# Configure LXC (already done in Phase 6)

# Inside LXC - Only install nvidia-container-toolkit
apt install nvidia-container-toolkit -y
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker

# Run Jellyfin with GPU
docker run -d \
  --name=jellyfin \
  --device=/dev/dri/renderD128:/dev/dri/renderD128 \
  -v /path/to/config:/config \
  -v /path/to/media:/media:ro \
  --net=host \
  jellyfin/jellyfin:latest

Note: This works ONLY if:

The GPU is already initialized by system BIOS
Basic DRM/DRI drivers are loaded by kernel
You don't need CUDA, NVENC, or other features that depend on the proprietary NVIDIA driver

Part 7: Recommended Execution Order

Option A: Full Driver Installation (Most Reliable)

Phase 1: Clean up
Phase 2: Repository configuration
Phase 3: Install headers
Phase 4: Install 550.xx driver (or 470.xx if GSP firmware issues)
Phase 5: Verify host driver
Phase 6: Configure LXC
Phase 7: Install nvidia-container-toolkit
Phase 8: Configure Jellyfin
Phase 9: Verification

Estimated Time: 30-45 minutes

Option B: Simplified Approach (If Previous Setup Was Working)

Phase 1: Clean up any partial installations
Phase 6: Configure LXC passthrough
Phase 7: Install nvidia-container-toolkit only
Phase 8: Configure Jellyfin
Phase 9: Verification

Estimated Time: 15-20 minutes

Part 8: Why It Was Working Before

Likely Scenarios:

Older Proxmox version had different Debian repos with working package versions
Different driver version was installed (possibly 470.xx which doesn't need GSP firmware)
Different kernel version that didn't trigger the firmware dependency
System was using nouveau (open-source driver) which doesn't need firmware-nvidia-gsp

To check what was used before:

# Check Proxmox logs
journalctl -b | grep -i nvidia
grep nvidia /var/log/apt/history.log
dpkg -l | grep nvidia

Part 9: Critical Notes

⚠️ Important Warnings

Do NOT mix driver sources: Choose EITHER Debian packages OR NVIDIA .run installer, never both
Do NOT install the NVIDIA kernel module inside LXC: The kernel driver belongs only on the Proxmox host; the container reaches it through the passed-through device nodes
Kernel updates may break DKMS: Always have pve-headers installed before updating
Unprivileged containers need UID/GID mapping: See Phase 6 for proper configuration
Jellyfin user must be in render/video group: Set with --user 1000:44 in Docker

📝 Documentation

After successful setup, document:

Driver version installed: nvidia-smi --query-gpu=driver_version --format=csv,noheader
Kernel version: uname -r
LXC configuration: cat /etc/pve/lxc/100.conf
Docker version: docker --version
Jellyfin version: docker exec jellyfin dpkg -l | grep jellyfin

Part 10: Expected Outcomes

Success Indicators

✅ nvidia-smi shows GPU on Proxmox host
✅ /dev/dri/renderD128 exists inside LXC
✅ docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi works
✅ Jellyfin Dashboard shows "NVIDIA" under hardware acceleration
✅ Video transcoding keeps CPU usage low while the GPU shows load in nvidia-smi
✅ Jellyfin transcoding logs show NVENC encoder

Performance Expectations

These figures are rough estimates rather than measurements; verify them on your own hardware.

CPU usage during transcode: 5-15% (vs 80-100% without GPU)
GPU usage: 15-40% depending on resolution/codec
Concurrent 1080p→720p transcodes: 8-12 streams (GTX 970)
4K HEVC transcode: 2-3 concurrent streams

Part 11: Next Steps After Success

Set up monitoring:

# Install nvtop for GPU monitoring
apt install nvtop

Configure automatic startup:

# Ensure LXC starts with Proxmox
pct set 100 -onboot 1

Set up backups:

# Back up the whole container (configuration and data)
vzdump 100 --mode snapshot --compress zstd

Update Jellyfin transcoding settings:
Limit concurrent transcodes based on testing
Set appropriate quality profiles
Monitor GPU temperature

Conclusion

Your GTX 970 is fully supported by modern drivers (535.xx/550.xx series). The firmware-nvidia-gsp dependency error is a packaging issue in Debian Trixie.

Recommended Path: Install 550.xx series driver with manual firmware package if needed, OR fallback to 470.xx legacy driver.

Time to Resolution: 30-45 minutes following Option A, or 15-20 minutes with Option B if simplified approach works.

Success Probability: 95%+ - This is a well-documented configuration and your GPU was working before.

This plan prioritizes restoring your working configuration with minimal risk. Start with the simplified approach (Option B) first since it was working before. If that fails, proceed with full driver installation (Option A).