Describe the Issue
Following the advice in #2243 I went and tried the dGPU + iGPU combination in hopes it'd perform better than the default mode of dGPU with CPU overflow since iGPU-only is definitely faster than CPU-only.
However, despite the iGPU by itself being substantially faster than the CPU by itself, the default dGPU with CPU overflow was still decently faster than dGPU + iGPU. Changing the value for "Main GPU" made no performance difference (I tried values of 0, 1, and 2), nor did changing the "SplitMode" setting (though setting it to "tensor" straight up crashed). I also used nvtop to confirm that both the iGPU and dGPU were being used (results were considerably faster than the iGPU by itself anyway).
All stock settings were used, with two exceptions: CPU threads were set to 8 (because 8 was around 10% faster than the default 7 threads in CPU-only benchmarking), and the Vulkan devices were set to "all" and then manually specified as Vulkan0,Vulkan1.
Additional Information:
Hardware is a Ryzen 5800H with Radeon RX 6600M 8GB (note that it's a mini PC so, much like a desktop PC, the discrete RX 6600M is used as the primary GPU).
OS was a live ISO of openSUSE Tumbleweed Xfce build 2026-05-31 with GRUB boot parameter ttm.pages_limit=3840000 (I also ran sudo zypper install libvulkan_radeon vulkan-tools once booted in order to avoid issue #2102).
LLM model used (10GB); all 41 layers can fit into system (iGPU) RAM but only 26 layers fit into the 8GB dGPU VRAM: https://huggingface.co/XeyonAI/Mistral-Helcyon-Saturn-RP-12b-v1.0-GGUF/blob/main/helcyon-saturn-RP-v1.0-Q6_K.gguf
Performance Results via "Run Benchmark"
_______ RX 6600M 8GB + 5800H iGPU (3,1 tensor split; manually-specified 41 GPU layers) _______
ProcessingTime: 52.055s
ProcessingSpeed: 155.45T/s
GenerationTime: 10.717s
GenerationSpeed: 9.33T/s
TotalTime: 62.772s
_______ RX 6600M 8GB with Ryzen CPU overflow _______
ProcessingTime: 70.310s
ProcessingSpeed: 115.09T/s
GenerationTime: 15.211s
GenerationSpeed: 6.57T/s
TotalTime: 85.521s
_______ RX 6600M 8GB + 5800H iGPU _______
ProcessingTime: 126.467s
ProcessingSpeed: 63.99T/s
GenerationTime: 18.323s
GenerationSpeed: 5.46T/s
TotalTime: 144.790s
_______ 5800H iGPU only _______
ProcessingTime: 178.198s
ProcessingSpeed: 45.41T/s
GenerationTime: 36.066s
GenerationSpeed: 2.77T/s
TotalTime: 214.264s
_______ Ryzen CPU only (8 threads) _______
ProcessingTime: 555.117s
ProcessingSpeed: 14.58T/s
GenerationTime: 31.990s
GenerationSpeed: 3.13T/s
TotalTime: 587.107s
_______ Ryzen CPU only (7 threads) _______
ProcessingTime: 620.016s
ProcessingSpeed: 13.05T/s
GenerationTime: 31.193s
GenerationSpeed: 3.21T/s
TotalTime: 651.209s
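To make the comparison easier to read, here is a small sketch that expresses each run relative to the default "dGPU with CPU overflow" run. The numbers are copied from the benchmark output above; the short labels, the `relative_to_baseline` helper, and the choice of baseline are my own for illustration and are not part of the benchmark tool.

```python
# Figures copied from the "Run Benchmark" results above:
# (processing speed in T/s, generation speed in T/s, total time in s)
results = {
    "dGPU + iGPU (3,1 split, 41 layers)": (155.45, 9.33, 62.772),
    "dGPU with CPU overflow": (115.09, 6.57, 85.521),
    "dGPU + iGPU": (63.99, 5.46, 144.790),
    "iGPU only": (45.41, 2.77, 214.264),
    "CPU only (8 threads)": (14.58, 3.13, 587.107),
    "CPU only (7 threads)": (13.05, 3.21, 651.209),
}

# Hypothetical choice of baseline: the default mode the issue compares against.
BASELINE = "dGPU with CPU overflow"


def relative_to_baseline(results, baseline):
    """Return speedup factors versus the baseline run (values > 1 are faster)."""
    base_pp, base_tg, base_total = results[baseline]
    ratios = {}
    for name, (pp, tg, total) in results.items():
        # Speeds: higher is better, so divide by the baseline speed.
        # Total time: lower is better, so divide the baseline time by it.
        ratios[name] = (pp / base_pp, tg / base_tg, base_total / total)
    return ratios


ratios = relative_to_baseline(results, BASELINE)

# Print runs from fastest to slowest overall.
for name, (pp, tg, overall) in sorted(ratios.items(), key=lambda kv: -kv[1][2]):
    print(f"{name:36s} processing x{pp:.2f}  generation x{tg:.2f}  overall x{overall:.2f}")
```

Read this way, the plain dGPU + iGPU run comes out slower than the baseline on all three figures, while the run with the 3,1 tensor split and 41 manually specified GPU layers comes out faster on all three.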