View Issue Details

ID: 0001452
Project: channel: kernel/el9
Category: kernel-ml
View Status: public
Last Update: 2024-05-25 22:26
Reporter: jcsiblesei
Assigned To: toracat
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: x86_64
OS: Red Hat Enterprise Linux
OS Version: 9.4
Summary: 0001452: amdgpu firmware fails to load on Linux 6.9
Description: I just upgraded my workstation from kernel-ml 6.8.9-1.el9.elrepo.x86_64 to 6.9.0-1.el9.elrepo.x86_64, and now GDM won't work. I notice this in my dmesg output now:

[ 3.000274] amdgpu 0000:01:00.0: Direct firmware load for amdgpu/polaris12_sdma.bin failed with error -2
[ 3.000275] amdgpu: sdma_v3_0: Failed to load firmware "amdgpu/polaris12_sdma.bin"
[ 3.000290] [drm:amdgpu_device_ip_early_init [amdgpu]] *ERROR* early_init of IP block <sdma_v3_0> failed -19
[ 3.000428] [drm] UVD is enabled in VM mode
[ 3.000429] [drm] UVD ENC is enabled in VM mode
[ 3.000430] [drm] VCE enabled in VM mode
[ 3.000430] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init

Things look basically the same between "lsinitrd -k 6.8.9-1.el9.elrepo.x86_64" and "lsinitrd -k 6.9.0-1.el9.elrepo.x86_64" to me. In particular, both contain this:

-rw-r--r-- 1 root root 4456 Jan 3 20:56 usr/lib/firmware/amdgpu/polaris12_sdma.bin.xz
Tags: No tags attached.

Activities

jcsiblesei

2024-05-14 13:27

reporter   ~0009738

It turns out it's not just amdgpu that's failing to load, but rather all firmware. On another RHEL9 host, Wi-Fi, Bluetooth, and Intel graphics had the same problem:

[ 2.059651] i915 0000:00:02.0: Direct firmware load for i915/adlp_dmc.bin failed with error -2
[ 2.059657] i915 0000:00:02.0: Direct firmware load for i915/adlp_dmc_ver2_16.bin failed with error -2
[ 2.059658] i915 0000:00:02.0: [drm] Failed to load DMC firmware i915/adlp_dmc.bin. Disabling runtime power management.
[ 2.059659] i915 0000:00:02.0: [drm] DMC firmware homepage: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[ 2.123822] i915 0000:00:02.0: [drm] *ERROR* GT0: GuC firmware i915/adlp_guc_70.bin: fetch failed -ENOENT
[ 2.123828] i915 0000:00:02.0: [drm] GT0: GuC firmware(s) can be downloaded from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[ 2.124622] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 0.0.0
[ 2.124695] i915 0000:00:02.0: [drm] *ERROR* GT0: GuC initialization failed -ENOENT
[ 2.124697] i915 0000:00:02.0: [drm] *ERROR* GT0: Enabling uc failed (-5)
[ 2.124698] i915 0000:00:02.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!

[ 15.189823] iwlwifi 0000:00:14.3: enabling device (0000 -> 0002)
[ 15.199021] iwlwifi 0000:00:14.3: Detected crf-id 0x400410, cnv-id 0x80400 wfpm id 0x80000020
[ 15.199125] iwlwifi 0000:00:14.3: PCI dev 51f0/4090, rev=0x370, rfid=0x2010d000
[ 15.199451] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-89.ucode failed with error -2
[ 15.199464] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-88.ucode failed with error -2
[ 15.199470] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-87.ucode failed with error -2
[ 15.199477] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-86.ucode failed with error -2
[ 15.199483] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-85.ucode failed with error -2
[ 15.199490] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-84.ucode failed with error -2
[ 15.199496] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-83.ucode failed with error -2
[ 15.199502] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-82.ucode failed with error -2
[ 15.199514] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-81.ucode failed with error -2
[ 15.199523] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-80.ucode failed with error -2
[ 15.199530] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-79.ucode failed with error -2
[ 15.199536] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-78.ucode failed with error -2
[ 15.199543] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-77.ucode failed with error -2
[ 15.199549] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-76.ucode failed with error -2
[ 15.199559] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-75.ucode failed with error -2
[ 15.199570] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-74.ucode failed with error -2
[ 15.199579] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-73.ucode failed with error -2
[ 15.199588] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-72.ucode failed with error -2
[ 15.199596] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-71.ucode failed with error -2
[ 15.199603] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-70.ucode failed with error -2
[ 15.199609] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-69.ucode failed with error -2
[ 15.199616] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-68.ucode failed with error -2
[ 15.199622] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-67.ucode failed with error -2
[ 15.199629] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-66.ucode failed with error -2
[ 15.199635] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-65.ucode failed with error -2
[ 15.199641] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-64.ucode failed with error -2
[ 15.199650] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-63.ucode failed with error -2
[ 15.199659] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-62.ucode failed with error -2
[ 15.199665] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-61.ucode failed with error -2
[ 15.199674] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-60.ucode failed with error -2
[ 15.199683] iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-so-a0-gf-a0-59.ucode failed with error -2
[ 15.199685] iwlwifi 0000:00:14.3: no suitable firmware found!
[ 15.199688] iwlwifi 0000:00:14.3: minimum version required: iwlwifi-so-a0-gf-a0-59
[ 15.199689] iwlwifi 0000:00:14.3: maximum version supported: iwlwifi-so-a0-gf-a0-89
[ 15.199691] iwlwifi 0000:00:14.3: check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

[ 15.481470] Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-0040-0041.sfi (-2)
[ 15.481730] Bluetooth: hci0: Reading supported features failed (-56)
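
For anyone triaging a similar log: error -2 is -ENOENT, so each "Direct firmware load ... failed" line names a file the kernel could not find. Below is a minimal sketch for pulling the distinct file names out of a log, assuming the message format shown above. The helper name `failed_firmware` is made up for illustration, and the Bluetooth message uses a different format that this sketch does not match.

```shell
# List the distinct firmware files that a kernel log reports as not found.
# Reads log text on stdin. "failed_firmware" is a hypothetical helper name.
failed_firmware() {
    # Keep only the file name from each "Direct firmware load" failure line,
    # then collapse repeats.
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error -2.*/\1/p' | sort -u
}

# Usage on a live system (reading the kernel log may require root):
#   dmesg | failed_firmware
```

If the resulting list spans several unrelated drivers, as it does here, that points at the firmware loader itself rather than at any one driver's files.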

jcsiblesei

2024-05-17 15:25

reporter   ~0009754

I just tried 6.9.1 too, and it still has the same problem.

toracat

2024-05-17 21:32

administrator   ~0009761

We will update the linux-firmware package and let you know when it's ready.

jcsiblesei

2024-05-18 15:48

reporter   ~0009763

I did some more digging, and I think I found the problem: all of RHEL's firmware is compressed, and CONFIG_FW_LOADER_COMPRESS and CONFIG_FW_LOADER_COMPRESS_XZ were enabled in kernel-ml 6.8.9 and older on RHEL9, but they're disabled in 6.9.0 and 6.9.1. Since they're enabled in the official RHEL9 kernel, should they just be re-enabled in kernel-ml too? That seems a lot easier than having to ship uncompressed replacements not just for linux-firmware, but also for alsa-sof-firmware, iwl*-firmware, etc.
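
A quick way to check these options on a given kernel is sketched below. It assumes the build configuration is installed as /boot/config-<release>, and `fw_compress_options` is a made-up helper name. Disabled options show up as "# CONFIG_... is not set" lines, and the _XZ and _ZSTD sub-options may be absent entirely when the parent option is off.

```shell
# Print the state of the firmware-compression options in a kernel config file.
# "fw_compress_options" is a hypothetical helper name; the default path assumes
# the config is installed as /boot/config-<release>.
fw_compress_options() {
    config_file=${1:-/boot/config-$(uname -r)}
    # Match both "CONFIG_X=y" and "# CONFIG_X is not set" forms.
    grep -E '^(# )?CONFIG_FW_LOADER_COMPRESS(_XZ|_ZSTD)?[= ]' "$config_file"
}

# Usage:
#   fw_compress_options                                   # running kernel
#   fw_compress_options /boot/config-6.9.0-1.el9.elrepo.x86_64
```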

jcsiblesei

2024-05-18 15:53

reporter   ~0009764

Other notes:

1. CONFIG_FW_LOADER_COMPRESS_ZSTD is enabled too in RHEL9's official kernel, so we might want to bring it back, even though I don't immediately see anything that needs it.
2. The reason this wasn't an issue on RHEL8 too is that these configurations are all disabled in the official RHEL8 kernel, so all of the firmware for it is already uncompressed.
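
To see whether a particular firmware file would be affected, one can check which form of it is on disk. This is only a sketch: `firmware_form` is a made-up helper name, /lib/firmware is assumed as the search directory, and the .xz and .zst suffixes are the ones a kernel with the matching options tries in addition to the plain name.

```shell
# Report which form of a firmware file exists under a firmware directory.
# "firmware_form" is a hypothetical helper; /lib/firmware is an assumed default.
firmware_form() {
    name=$1
    dir=${2:-/lib/firmware}
    if [ -e "$dir/$name" ]; then
        echo "uncompressed"      # loadable regardless of the compression options
    elif [ -e "$dir/$name.xz" ]; then
        echo "xz only"           # needs CONFIG_FW_LOADER_COMPRESS_XZ
    elif [ -e "$dir/$name.zst" ]; then
        echo "zstd only"         # needs CONFIG_FW_LOADER_COMPRESS_ZSTD
    else
        echo "missing"
    fi
}

# Usage:
#   firmware_form amdgpu/polaris12_sdma.bin
```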

toracat

2024-05-18 19:22

administrator   ~0009765

Thanks for the notes. We will enable these options in the next build of our kernels for EL9:

CONFIG_FW_LOADER_COMPRESS=y
CONFIG_FW_LOADER_COMPRESS_XZ=y
CONFIG_FW_LOADER_COMPRESS_ZSTD=y

jcsiblesei

2024-05-18 23:54

reporter   ~0009767

Will there be a 6.9.1-2 for this, or will it have to wait until upstream releases kernel 6.9.2?

toracat

2024-05-19 00:41

administrator   ~0009768

I've done a test build and it went alright. So I can build a 6.9.1-2 for you to test.

jcsiblesei

2024-05-19 01:25

reporter   ~0009769

I think I found the root cause of this:

$ diff config-6.8.9-1.el8.elrepo.x86_64 config-6.8.9-1.el9.elrepo.x86_64 | wc -l
2638
$ diff config-6.9.1-1.el8.elrepo.x86_64 config-6.8.9-1.el9.elrepo.x86_64 | wc -l
2798
$ diff config-6.9.1-1.el9.elrepo.x86_64 config-6.8.9-1.el9.elrepo.x86_64 | wc -l
2727
$ diff config-6.8.9-1.el8.elrepo.x86_64 config-6.9.1-1.el8.elrepo.x86_64 | wc -l
199
$ diff config-6.8.9-1.el8.elrepo.x86_64 config-6.9.1-1.el9.elrepo.x86_64 | wc -l
306
$ diff config-6.9.1-1.el8.elrepo.x86_64 config-6.9.1-1.el9.elrepo.x86_64 | wc -l
146
$

It looks like as of kernel 6.9, the RHEL9 builds somehow started using the RHEL8 kernel configuration. I expect that a bunch of other things are going to be broken as a result of that too.
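
The line counts above only show how far apart two configurations are. To see which options actually changed, something like the following sketch could be used. `config_changes` is a made-up helper name, and it only looks at CONFIG_ lines, ignoring comments and ordering.

```shell
# Show the CONFIG_ lines that differ between two kernel config files.
# Lines unique to the first file are printed flush left; lines unique to the
# second file are indented with a tab. "config_changes" is a hypothetical name.
config_changes() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # Keep "CONFIG_X=..." and "# CONFIG_X is not set" lines; sort in the C
    # locale so that sort and comm agree on ordering.
    grep -E '^(# )?CONFIG_' "$1" | LC_ALL=C sort > "$old_sorted"
    grep -E '^(# )?CONFIG_' "$2" | LC_ALL=C sort > "$new_sorted"
    LC_ALL=C comm -3 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Usage:
#   config_changes config-6.8.9-1.el9.elrepo.x86_64 config-6.9.1-1.el9.elrepo.x86_64 | grep FW_LOADER
```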

toracat

2024-05-19 04:24

administrator   ~0009770

Thank you for pointing that out. I have no idea how this mix-up happened. There should be no el8-related files in the directories where the el9 files reside. Anyway, I have built kernel-ml-6.9.1-2.el9.elrepo and pushed it to the elrepo-kernel repository.

jcsiblesei

2024-05-19 12:09

reporter   ~0009771

Okay, I just installed that and everything is working again now. Thanks!

toracat

2024-05-19 12:17

administrator   ~0009772

Thanks for letting us know that it now works. And thank you again for all the help. We are still investigating why and how this happened.

toracat

2024-05-21 15:26

administrator   ~0009789

I'm now closing this report as 'resolved'. If/when we ever figure out what happened, we will add a note here.

Issue History

Date Modified Username Field Change
2024-05-13 14:16 jcsiblesei New Issue
2024-05-13 14:16 jcsiblesei Status new => assigned
2024-05-13 14:16 jcsiblesei Assigned To => toracat
2024-05-14 13:27 jcsiblesei Note Added: 0009738
2024-05-17 15:25 jcsiblesei Note Added: 0009754
2024-05-17 21:32 toracat Note Added: 0009761
2024-05-18 15:48 jcsiblesei Note Added: 0009763
2024-05-18 15:53 jcsiblesei Note Added: 0009764
2024-05-18 19:22 toracat Note Added: 0009765
2024-05-18 23:54 jcsiblesei Note Added: 0009767
2024-05-19 00:41 toracat Note Added: 0009768
2024-05-19 01:25 jcsiblesei Note Added: 0009769
2024-05-19 04:24 toracat Note Added: 0009770
2024-05-19 12:09 jcsiblesei Note Added: 0009771
2024-05-19 12:17 toracat Note Added: 0009772
2024-05-21 15:26 toracat Status assigned => resolved
2024-05-21 15:26 toracat Resolution open => fixed
2024-05-21 15:26 toracat Note Added: 0009789