View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001452 | channel: kernel/el9 | kernel-ml | public | 2024-05-13 14:16 | 2024-05-25 22:26 |
Reporter | jcsiblesei | Assigned To | toracat | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Platform | x86_64 | OS | Red Hat Enterprise Linux | OS Version | 9.4 |
Summary | 0001452: amdgpu firmware fails to load on Linux 6.9 | ||||
Description | I just upgraded my workstation from kernel-ml 6.8.9-1.el9.elrepo.x86_64 to 6.9.0-1.el9.elrepo.x86_64, and now GDM won't work. I notice this in my dmesg output now: [ 3.000274] amdgpu 0000:01:00.0: Direct firmware load for amdgpu/polaris12_sdma.bin failed with error -2 [ 3.000275] amdgpu: sdma_v3_0: Failed to load firmware "amdgpu/polaris12_sdma.bin" [ 3.000290] [drm:amdgpu_device_ip_early_init [amdgpu]] *ERROR* early_init of IP block <sdma_v3_0> failed -19 [ 3.000428] [drm] UVD is enabled in VM mode [ 3.000429] [drm] UVD ENC is enabled in VM mode [ 3.000430] [drm] VCE enabled in VM mode [ 3.000430] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init Things look basically the same between "lsinitrd -k 6.8.9-1.el9.elrepo.x86_64" and "lsinitrd -k 6.9.0-1.el9.elrepo.x86_64" to me. In particular, both contain this: -rw-r--r-- 1 root root 4456 Jan 3 20:56 usr/lib/firmware/amdgpu/polaris12_sdma.bin.xz | ||||
Tags | No tags attached. | ||||
|
It turns out it's not just amdgpu that's failing to load, but rather all firmware. On another RHEL9 host, Wi-Fi, Bluetooth, and Intel graphics had the same problem:
[ 2.059651] i915 0000:00:02.0: Direct firmware load for i915/adlp_dmc.bin failed with error -2
[ 2.059657] i915 0000:00:02.0: Direct firmware load for i915/adlp_dmc_ver2_16.bin failed with error -2
[ 2.059658] i915 0000:00:02.0: [drm] Failed to load DMC firmware i915/adlp_dmc.bin. Disabling runtime power management.
[ 2.059659] i915 0000:00:02.0: [drm] DMC firmware homepage: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[ 2.123822] i915 0000:00:02.0: [drm] *ERROR* GT0: GuC firmware i915/adlp_guc_70.bin: fetch failed -ENOENT
[ 2.123828] i915 0000:00:02.0: [drm] GT0: GuC firmware(s) can be downloaded from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[ 2.124622] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 0.0.0
[ 2.124695] i915 0000:00:02.0: [drm] *ERROR* GT0: GuC initialization failed -ENOENT
[ 2.124697] i915 0000:00:02.0: [drm] *ERROR* GT0: Enabling uc failed (-5)
[ 2.124698] i915 0000:00:02.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
[ 15.189823] iwlwifi 0000:00:14.3: enabling device (0000 -> 0002)
[ 15.199021] iwlwifi 0000:00:14.3: Detected crf-id 0x400410, cnv-id 0x80400 wfpm id 0x80000020
[ 15.199125] iwlwifi 0000:00:14.3: PCI dev 51f0/4090, rev=0x370, rfid=0x2010d000
[ 15.199451] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-89.ucode failed with error -2
[ 15.199464] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-88.ucode failed with error -2
[ 15.199470] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-87.ucode failed with error -2
[ 15.199477] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-86.ucode failed with error -2
[ 15.199483] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-85.ucode failed with error -2
[ 15.199490] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-84.ucode failed with error -2
[ 15.199496] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-83.ucode failed with error -2
[ 15.199502] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-82.ucode failed with error -2
[ 15.199514] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-81.ucode failed with error -2
[ 15.199523] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-80.ucode failed with error -2
[ 15.199530] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-79.ucode failed with error -2
[ 15.199536] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-78.ucode failed with error -2
[ 15.199543] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-77.ucode failed with error -2
[ 15.199549] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-76.ucode failed with error -2
[ 15.199559] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-75.ucode failed with error -2
[ 15.199570] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-74.ucode failed with error -2
[ 15.199579] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-73.ucode failed with error -2
[ 15.199588] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-72.ucode failed with error -2
[ 15.199596] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-71.ucode failed with error -2
[ 15.199603] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-70.ucode failed with error -2
[ 15.199609] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-69.ucode failed with error -2
[ 15.199616] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-68.ucode failed with error -2
[ 15.199622] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-67.ucode failed with error -2
[ 15.199629] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-66.ucode failed with error -2
[ 15.199635] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-65.ucode failed with error -2
[ 15.199641] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-64.ucode failed with error -2
[ 15.199650] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-63.ucode failed with error -2
[ 15.199659] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-62.ucode failed with error -2
[ 15.199665] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-61.ucode failed with error -2
[ 15.199674] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-60.ucode failed with error -2
[ 15.199683] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-59.ucode failed with error -2
[ 15.199685] iwlwifi 0000:00:14.3: no suitable firmware found!
[ 15.199688] iwlwifi 0000:00:14.3: minimum version required: iwlwifi-so-a0-gf-a0-59
[ 15.199689] iwlwifi 0000:00:14.3: maximum version supported: iwlwifi-so-a0-gf-a0-89
[ 15.199691] iwlwifi 0000:00:14.3: check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
[ 15.481470] Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-0040-0041.sfi (-2)
[ 15.481730] Bluetooth: hci0: Reading supported features failed (-56) |
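To see at a glance how many drivers are affected, the `Direct firmware load ... failed` messages can be counted per driver. The following is only a rough sketch: the function name `fw_failures_by_driver` is made up for this example, and it assumes the bracketed-timestamp log format shown above. It reads kernel log text on stdin rather than calling `dmesg` itself.

```shell
# fw_failures_by_driver: hypothetical helper, not part of any package.
# Reads kernel log text on stdin and prints "<driver> <count>" for every
# driver that logged a "Direct firmware load for ..." failure.
fw_failures_by_driver() {
    # Keep only the firmware-loader failure lines, strip the "[ timestamp]"
    # prefix, keep the first word (the driver name), then count per driver.
    grep -F 'Direct firmware load for' \
        | sed -E 's/^\[[^]]*\] *([^ ]+) .*/\1/' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

It could be fed with `dmesg | fw_failures_by_driver` or with a saved log file. Messages worded differently, such as the Bluetooth `Failed to load Intel firmware file` line, are not counted by this sketch.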
|
I just tried 6.9.1 too, and it still has the same problem. |
|
We will update the linux-firmware package and let you know when it's ready. |
|
I did some more digging, and I think I found the problem: all of RHEL's firmware is compressed, and CONFIG_FW_LOADER_COMPRESS and CONFIG_FW_LOADER_COMPRESS_XZ were enabled in kernel-ml 6.8.9 and older on RHEL9, but they're disabled in 6.9.0 and 6.9.1. Since they're enabled in the official RHEL9 kernel, should they just be re-enabled in kernel-ml too? That seems a lot easier than having to ship uncompressed replacements not just for linux-firmware, but also for alsa-sof-firmware, iwl*-firmware, etc. |
|
Other notes: 1. CONFIG_FW_LOADER_COMPRESS_ZSTD is enabled too in RHEL9's official kernel, so we might want to bring it back, even though I don't immediately see anything that needs it. 2. The reason this wasn't an issue on RHEL8 too is that these configurations are all disabled in the official RHEL8 kernel, so all of the firmware for it is already uncompressed. |
|
Thanks for the notes. We will enable these options in the next build of our kernels for EL9: CONFIG_FW_LOADER_COMPRESS=y CONFIG_FW_LOADER_COMPRESS_XZ=y CONFIG_FW_LOADER_COMPRESS_ZSTD=y |
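For anyone who wants to verify a given build, a small check along these lines may help. It is a sketch under assumptions: the function name `check_fw_compress` is invented here, and it expects the path to a kernel config file (on EL systems this is usually `/boot/config-<version>`, but that location is an assumption, not something stated in this report).

```shell
# check_fw_compress CONFIG_FILE
# Hypothetical helper: reports whether the three firmware-compression
# options listed above are built in (=y) in the given kernel config file.
# Returns 0 only if all three are enabled.
check_fw_compress() {
    local config="$1"
    local status=0
    local opt
    for opt in CONFIG_FW_LOADER_COMPRESS \
               CONFIG_FW_LOADER_COMPRESS_XZ \
               CONFIG_FW_LOADER_COMPRESS_ZSTD; do
        # -x matches the whole line, so CONFIG_FW_LOADER_COMPRESS=y does not
        # accidentally match the _XZ or _ZSTD variants.
        if grep -qxF "${opt}=y" "$config"; then
            echo "${opt}=y"
        else
            # Covers both "# ... is not set" and the option being absent.
            echo "${opt} not enabled"
            status=1
        fi
    done
    return "$status"
}
```

A typical call would be `check_fw_compress "/boot/config-$(uname -r)"`. A non-zero exit status means at least one option is missing, in which case `.xz` or `.zst` compressed firmware may fail to load with error -2.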
|
Will there be a 6.9.1-2 for this, or will it have to wait until upstream releases kernel 6.9.2? |
|
I've done a test build and it went alright. So I can build a 6.9.1-2 for you to test. |
|
I think I found the root cause of this:
$ diff config-6.8.9-1.el8.elrepo.x86_64 config-6.8.9-1.el9.elrepo.x86_64 | wc -l
2638
$ diff config-6.9.1-1.el8.elrepo.x86_64 config-6.8.9-1.el9.elrepo.x86_64 | wc -l
2798
$ diff config-6.9.1-1.el9.elrepo.x86_64 config-6.8.9-1.el9.elrepo.x86_64 | wc -l
2727
$ diff config-6.8.9-1.el8.elrepo.x86_64 config-6.9.1-1.el8.elrepo.x86_64 | wc -l
199
$ diff config-6.8.9-1.el8.elrepo.x86_64 config-6.9.1-1.el9.elrepo.x86_64 | wc -l
306
$ diff config-6.9.1-1.el8.elrepo.x86_64 config-6.9.1-1.el9.elrepo.x86_64 | wc -l
146
It looks like as of kernel 6.9, the RHEL9 builds somehow started using the RHEL8 kernel configuration. I expect that a bunch of other things are going to be broken as a result of that too. |
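The line counts above show how far apart the configs are overall. To look at one group of options directly, the comparison can be narrowed to matching lines only. This is a minimal sketch: `config_option_diff` is a hypothetical name, and the default pattern `CONFIG_FW_LOADER` is simply the option family discussed in this report.

```shell
# config_option_diff OLD_CONFIG NEW_CONFIG [PATTERN]
# Hypothetical helper: diffs only the config lines matching PATTERN
# (default: CONFIG_FW_LOADER), so unrelated differences are hidden.
# Exit status follows diff: 0 = identical, 1 = differences found.
config_option_diff() {
    local old="$1"
    local new="$2"
    local pattern="${3:-CONFIG_FW_LOADER}"
    local tmpdir
    local rc
    tmpdir="$(mktemp -d)" || return 2
    # Sort both extracts so that option ordering does not produce noise.
    grep -e "$pattern" "$old" | sort > "$tmpdir/old"
    grep -e "$pattern" "$new" | sort > "$tmpdir/new"
    diff "$tmpdir/old" "$tmpdir/new"
    rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}
```

For instance, `config_option_diff config-6.8.9-1.el9.elrepo.x86_64 config-6.9.1-1.el9.elrepo.x86_64` should surface any change in the firmware loader options between those two builds, with lines prefixed `<` coming from the first file and `>` from the second.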
|
Thank you for pointing that out. I have no idea how this mix-up happened. There should be no el8-related files in the directories where the el9 files reside. Anyway, I have built kernel-ml-6.9.1-2.el9.elrepo and pushed it to the elrepo-kernel repository. |
|
Okay, I just installed that and everything is working again now. Thanks! |
|
Thanks for letting us know that it now works. And thank you again for all the help. We are still investigating why and how this happened. |
|
I'm now closing this report as 'resolved'. If/when we ever figure out what happened, we will add a note here. |
Date Modified | Username | Field | Change |
---|---|---|---|
2024-05-13 14:16 | jcsiblesei | New Issue | |
2024-05-13 14:16 | jcsiblesei | Status | new => assigned |
2024-05-13 14:16 | jcsiblesei | Assigned To | => toracat |
2024-05-14 13:27 | jcsiblesei | Note Added: 0009738 | |
2024-05-17 15:25 | jcsiblesei | Note Added: 0009754 | |
2024-05-17 21:32 | toracat | Note Added: 0009761 | |
2024-05-18 15:48 | jcsiblesei | Note Added: 0009763 | |
2024-05-18 15:53 | jcsiblesei | Note Added: 0009764 | |
2024-05-18 19:22 | toracat | Note Added: 0009765 | |
2024-05-18 23:54 | jcsiblesei | Note Added: 0009767 | |
2024-05-19 00:41 | toracat | Note Added: 0009768 | |
2024-05-19 01:25 | jcsiblesei | Note Added: 0009769 | |
2024-05-19 04:24 | toracat | Note Added: 0009770 | |
2024-05-19 12:09 | jcsiblesei | Note Added: 0009771 | |
2024-05-19 12:17 | toracat | Note Added: 0009772 | |
2024-05-21 15:26 | toracat | Status | assigned => resolved |
2024-05-21 15:26 | toracat | Resolution | open => fixed |
2024-05-21 15:26 | toracat | Note Added: 0009789 |