[Vulkan] dGPU + iGPU is slower in LLMs than 8GB dGPU with CPU overflow? #2252

@NintendoManiac64

Describe the Issue
Following the advice in #2243, I tried the dGPU + iGPU combination in hopes it'd perform better than the default mode of dGPU with CPU overflow, since iGPU-only is definitely faster than CPU-only.

However, despite iGPU by itself being substantially faster than CPU by itself, the default dGPU by itself but with CPU overflow was still decently faster than dGPU + iGPU? Changing the value for "Main GPU" made no performance difference (I tried values of 0, 1, and 2), nor did changing the "SplitMode" setting (though setting it to "tensor" straight up crashed). I also used nvtop to confirm that both the iGPU and dGPU were being used (results were considerably faster than the iGPU by itself anyway).

All stock settings were used, with two exceptions: CPU threads were set to 8 (because 8 was around 10% faster than the default 7 threads in CPU-only benchmarking), and the Vulkan devices were set to "all" and then manually specified as Vulkan0,Vulkan1.

Additional Information:

Hardware is a Ryzen 5800H with Radeon RX 6600M 8GB (note that it's a mini PC so, much like a desktop PC, the discrete RX 6600M is used as the primary GPU).

OS was a live ISO of openSUSE Tumbleweed Xfce build 2026-05-31 with GRUB boot parameter ttm.pages_limit=3840000 (I also ran sudo zypper install libvulkan_radeon vulkan-tools once booted in order to avoid issue #2102).

LLM model used (10GB); all 41 layers can fit into system (iGPU) RAM but only 26 layers fit into the 8GB dGPU VRAM: https://huggingface.co/XeyonAI/Mistral-Helcyon-Saturn-RP-12b-v1.0-GGUF/blob/main/helcyon-saturn-RP-v1.0-Q6_K.gguf
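
For reference, here is a rough sketch of how a ratio such as the 3,1 tensor split used in one of the runs below might divide the 41 layers between the two GPUs. It assumes layers are handed out strictly in proportion to the ratio, which may not match how the backend actually allocates them, and it ignores context size and per-device overhead. The function name `split_layers` is hypothetical and not part of any tool mentioned here.

```python
def split_layers(total_layers, ratios):
    """Divide layers across devices in proportion to `ratios`.

    Assumption: purely proportional allocation with largest-remainder
    rounding. The real backend may place layers differently.
    """
    total = sum(ratios)
    exact = [total_layers * r / total for r in ratios]
    counts = [int(x) for x in exact]  # round each share down first

    # Give any layers lost to rounding to the devices with the
    # largest fractional share.
    leftover = total_layers - sum(counts)
    order = sorted(
        range(len(ratios)),
        key=lambda i: exact[i] - counts[i],
        reverse=True,
    )
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# 41 layers, dGPU listed first, iGPU second, ratio 3:1
dgpu_layers, igpu_layers = split_layers(41, [3, 1])
print(dgpu_layers, igpu_layers)  # 31 10
```

Whether the dGPU share produced by a given ratio actually fits in 8GB of VRAM depends on factors this estimate leaves out.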

Performance Results via "Run Benchmark"

_______ RX 6600M 8GB + 5800H iGPU (3,1 tensor split; manually-specified 41 GPU layers) _______
ProcessingTime: 52.055s
ProcessingSpeed: 155.45T/s
GenerationTime: 10.717s
GenerationSpeed: 9.33T/s
TotalTime: 62.772s

_______ RX 6600M 8GB with Ryzen CPU overflow _______
ProcessingTime: 70.310s
ProcessingSpeed: 115.09T/s
GenerationTime: 15.211s
GenerationSpeed: 6.57T/s
TotalTime: 85.521s

_______ RX 6600M 8GB + 5800H iGPU _______
ProcessingTime: 126.467s
ProcessingSpeed: 63.99T/s
GenerationTime: 18.323s
GenerationSpeed: 5.46T/s
TotalTime: 144.790s

_______ 5800H iGPU only _______
ProcessingTime: 178.198s
ProcessingSpeed: 45.41T/s
GenerationTime: 36.066s
GenerationSpeed: 2.77T/s
TotalTime: 214.264s

_______ Ryzen CPU only (8 threads) _______
ProcessingTime: 555.117s
ProcessingSpeed: 14.58T/s
GenerationTime: 31.990s
GenerationSpeed: 3.13T/s
TotalTime: 587.107s

_______ Ryzen CPU only (7 threads) _______
ProcessingTime: 620.016s
ProcessingSpeed: 13.05T/s
GenerationTime: 31.193s
GenerationSpeed: 3.21T/s
TotalTime: 651.209s
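
To make the comparison easier to read, the totals above can be expressed relative to the default dGPU + CPU overflow run. This is only a small sketch: the times are copied from the results above, while the short labels and the `relative_speed` helper are my own naming.

```python
# (processing time, generation time) in seconds, copied from the results above.
results = {
    "dGPU + iGPU (3,1 split, 41 layers)": (52.055, 10.717),
    "dGPU + CPU overflow": (70.310, 15.211),
    "dGPU + iGPU (no split specified)": (126.467, 18.323),
    "iGPU only": (178.198, 36.066),
    "CPU only (8 threads)": (555.117, 31.990),
    "CPU only (7 threads)": (620.016, 31.193),
}
BASELINE = "dGPU + CPU overflow"


def relative_speed(results, baseline):
    """Return baseline total time / each run's total time (>1 means faster)."""
    base_total = sum(results[baseline])
    return {name: base_total / sum(times) for name, times in results.items()}


rel = relative_speed(results, BASELINE)
for name, ratio in sorted(rel.items(), key=lambda kv: -kv[1]):
    print(f"{name:40s} {ratio:5.2f}x")
```

A ratio above 1 means the run finished the benchmark faster than the baseline; below 1 means slower.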
