View Issue Details

ID: 0001212
Project: channel: elrepo/el7
Category: kmod-nvidia
View Status: public
Last Update: 2022-03-23 07:19
Reporter: jlehtone
Assigned To: pperry
Priority: normal
Severity: major
Reproducibility: always
Status: assigned
Resolution: open
Summary: 0001212: kmod-nvidia fails module key when secure boot is off
Description: Boot halts soon after these messages:
> Request for unknown module key 'The ELRepo Project (http://elrepo.org): ELRepo.org Secure Boot Key: f365ad3481a7b20e3427b61b2a26635b83fe427b' err -11
> ...
> nvidia: module verification failed: signature and/or required ...

Hard halt, must force poweroff to proceed

Workstation is in legacy mode. Secure Boot is Off.

(Dual boot with Windows 10. Windows is default. Linux less used.)
Linux booted OK last fall.

Found new BIOS and applied, but no change.
Additional Information: The latest couple of kernel and nvidia versions are affected. Current:
kernel-3.10.0-1160.59.1.el7.x86_64
kmod-nvidia-510.54-1.el7_9.elrepo.x86_64

Dell 3060 Tower, late 2015 model
NVidia Quadro M4000
Tags: No tags attached.
Reported upstream

Activities

toracat

2022-03-21 13:24

administrator   ~0008266

> Request for unknown module key 'The ELRepo Project (http://elrepo.org): ELRepo.org Secure Boot Key: f365ad3481a7b20e3427b61b2a26635b83fe427b' err -11

This message shows up on a system where secure boot is disabled and is information only.

> nvidia: module verification failed: signature and/or required ...

Not sure about this. Does the system boot in a text-only mode?

pperry

2022-03-21 16:28

administrator   ~0008267

I'm on the older 470 driver on legacy hardware, but this is what I see in dmesg as the nvidia driver loads:

[ 1.345520] Request for unknown module key 'The ELRepo Project (http://elrepo.org): ELRepo.org Secure Boot Key: f365ad3481a7b20e3427b61b2a26635b83fe427b' err -11
[ 1.345530] nvidia: loading out-of-tree module taints kernel.
[ 1.345536] nvidia: module license 'NVIDIA' taints kernel.
[ 1.345537] Disabling lock debugging due to kernel taint
[ 1.429717] nvidia: module verification failed: signature and/or required key missing - tainting kernel

So those messages are perfectly normal on a legacy BIOS with SB disabled; they are understandably caused by the ELRepo SB signing key being missing from the system (why would it be present on a system that is not using SB?).

I would suggest booting from a rescue CD and uninstalling / reinstalling the nvidia driver packages if you think they are the issue. Uninstalling them first and checking whether the system can then boot would confirm whether they are the cause.
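
The premise in this note (legacy BIOS boot, so Secure Boot cannot be active) can be checked from a running system. The following is only a sketch: it assumes that Linux exposes /sys/firmware/efi only when booted through UEFI and that efivarfs is mounted at its usual location. The function name and the returned labels are invented for illustration and are not part of any ELRepo or NVIDIA tooling.

```python
from pathlib import Path

# GUID of the EFI global variable namespace, as used in efivarfs file names.
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"


def firmware_state(sys_root="/sys"):
    """Return 'legacy', 'uefi-sb-on', 'uefi-sb-off' or 'uefi-sb-unknown'.

    sys_root is a parameter only so the function can be pointed at a
    fake directory tree; on a real system leave it at "/sys".
    """
    efi_dir = Path(sys_root) / "firmware" / "efi"
    if not efi_dir.is_dir():
        # No EFI directory: the kernel was not booted through UEFI.
        return "legacy"
    var = efi_dir / "efivars" / f"SecureBoot-{EFI_GLOBAL_GUID}"
    try:
        data = var.read_bytes()
    except OSError:
        # Variable missing or unreadable (e.g. efivarfs not mounted).
        return "uefi-sb-unknown"
    # efivarfs prefixes the value with 4 attribute bytes; the fifth
    # byte is the SecureBoot flag itself (1 = enabled).
    if len(data) < 5:
        return "uefi-sb-unknown"
    return "uefi-sb-on" if data[4] == 1 else "uefi-sb-off"


if __name__ == "__main__":
    print(firmware_state())
```

A result of "legacy" or "uefi-sb-off" would match the situation described here, in which the key and signature messages are informational only.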

toracat

2022-03-21 18:18

administrator   ~0008270

FYI, my system has the same kernel and kmod-nvidia as the submitter:

kernel-3.10.0-1160.59.1.el7.x86_64
kmod-nvidia-510.54-1.el7_9.elrepo.x86_64

Boot-related lines in /var/log/messages:

Feb 25 15:23:58 mikan kernel: Request for unknown module key 'The ELRepo Project (http://elrepo.org): ELRepo.org Secure Boot Key: f365ad3481a7b20e3427b61b2a26635b83fe427b' err -11
Feb 25 15:23:58 mikan kernel: nvidia: module license 'NVIDIA' taints kernel.
Feb 25 15:23:58 mikan kernel: Disabling lock debugging due to kernel taint
Feb 25 15:23:58 mikan kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 240

I don't see in my log (including dmesg):

nvidia: module verification failed: signature and/or required key missing - tainting kernel
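
To make the comparison between the two log excerpts above explicit, here is a small sketch that reduces each excerpt to the kinds of message it contains and reports what appears in one but not the other. The substrings come from the messages quoted in this issue; the kind names and helper functions are hypothetical, and the sample strings are shortened versions of the quoted lines (the key text is left out).

```python
# Substrings taken from the kernel messages quoted in this issue.
# The dictionary keys are labels made up for this sketch.
PATTERNS = {
    "unknown_key": "Request for unknown module key",
    "taint": "taints kernel",
    "verification_failed": "module verification failed",
}


def message_kinds(log_text):
    """Return the set of known message kinds found in a log excerpt."""
    return {kind for kind, needle in PATTERNS.items() if needle in log_text}


def only_in_first(log_a, log_b):
    """Message kinds present in log_a but absent from log_b."""
    return message_kinds(log_a) - message_kinds(log_b)


# Shortened from the dmesg excerpt of the 470 driver system.
dmesg_470 = (
    "Request for unknown module key '(key text omitted)' err -11\n"
    "nvidia: loading out-of-tree module taints kernel.\n"
    "nvidia: module verification failed: signature and/or required key "
    "missing - tainting kernel\n"
)

# Shortened from the /var/log/messages excerpt of the 510.54 system.
messages_510 = (
    "Request for unknown module key '(key text omitted)' err -11\n"
    "nvidia: module license 'NVIDIA' taints kernel.\n"
    "nvidia-nvlink: Nvlink Core is being initialized, major device number 240\n"
)

print(sorted(only_in_first(dmesg_470, messages_510)))  # ['verification_failed']
```

Both excerpts contain the unknown-key request; only the "module verification failed" line differs between them.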

jlehtone

2022-03-22 05:04

reporter   ~0008271

(Frak, wrote much, submit ate it.)
Drill:
* reboot rescue
* set-default multi-user.target
* remove kmod-nvidia, nvidia-x11-drv-libs, and latest kernel
* reboot successfully to 1160.45.1
* reinstall latest kernel
* reboot successfully to 1160.59.1
* reinstall kmod-nvidia
* reboot. Halts at: https://drive.google.com/file/d/1VzHXuKZhAnExIirUvNtNi3TuVHc1uMcV/view?invite=CPq4wb4P&ts=623989f5
* poweroff and reboot with "single". Halts similarly.
* poweroff and reboot with 1160.45.1. Halts similarly.
* poweroff and reboot rescue
* remove kmod-nvidia, nvidia-x11-drv-libs
* reboot successfully
* reboot successfully to graphical.target

So, no, the system does not boot even in text-mode while kmod-nvidia is installed.
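
The reasoning behind that conclusion can be laid out as a small table of the drill's outcomes. This is just a sketch that transcribes the steps above; the record layout and helper names are invented here, and kernel is None wherever the drill does not state which kernel was booted.

```python
from collections import namedtuple

Trial = namedtuple("Trial", ["kernel", "kmod_nvidia", "booted"])

# Outcomes transcribed from the drill. kernel=None means the step
# does not say which kernel was used.
TRIALS = [
    Trial("1160.45.1", False, True),   # after removing kmod-nvidia
    Trial("1160.59.1", False, True),   # latest kernel reinstalled
    Trial("1160.59.1", True, False),   # kmod-nvidia reinstalled: halts
    Trial(None, True, False),          # boot with "single": halts
    Trial("1160.45.1", True, False),   # older kernel: halts
    Trial(None, False, True),          # kmod-nvidia removed again
]


def outcomes_by(trials, key):
    """Group the observed boot outcomes by key(trial)."""
    groups = {}
    for trial in trials:
        groups.setdefault(key(trial), set()).add(trial.booted)
    return groups


def predicts(trials, key):
    """True if every value of key(trial) always led to the same outcome."""
    return all(len(seen) == 1 for seen in outcomes_by(trials, key).values())


print(predicts(TRIALS, lambda t: t.kmod_nvidia))  # True
print(predicts(TRIALS, lambda t: t.kernel))       # False
```

In these trials the presence of kmod-nvidia lines up with every failure, while the same kernel both boots and halts depending on whether the driver is installed.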

$ nvidia-detect
kmod-nvidia
An Intel display controller was also detected

The Dell BIOS offers Auto|Intel|NVidia for primary video, with a note that the non-Auto options enable the Intel. It is set to Auto. It behaves the same with NVidia.


A different system with SB off does not show the
> nvidia: module verification failed: signature and/or required key missing - tainting kernel

This system showed it during its latest successful boot (2022-01-25).


The "unknown module" is a red herring, but what data should we look at?

pperry

2022-03-22 07:25

administrator   ~0008272

Is this something that broke with the upgrade to nvidia 510.xx drivers or is this the first attempt to get nvidia drivers working on this system? If working previously, what was the last combination that worked?

I'm out of my depth on machines with dual Intel/nvidia hardware so am unable to offer anything there.

toracat

2022-03-22 12:59

administrator   ~0008273

@jlehtone

Since my el7 box works with the same kernel and kmod-nvidia combination, there must be something unique to your system. One notable difference is that you have an integrated graphics device. I assume things would work if you set the BIOS to non-Auto.

I now wonder if the nvidia driver offered by rpmfusion works. I read somewhere it provides "full Optimus support". But then I don't understand the fact it worked fine last fall and fails now.

jlehtone

2022-03-23 07:19

reporter   ~0008274

First, I wish the Intel IGP were not there. Mostly we have machines (Xeon) that have no IGP. Of the desktops that have an IGP (earlier on the motherboard, lately in the CPU), most neatly disable it, or allow disabling it, when a discrete GPU is installed; the OS sees only the GPU. Then there are always some freak outliers where that does not happen. Most of the time that is harmless; only the discrete GPU is used. Laptops are a different story. There the wiring is different and it is preferable to use both IGP and GPU. I'd rather avoid laptops.

This machine has had CentOS 7 + ELRepo kmod-nvidia since 2016. Only with the latest updates did the system fail to come back up in time. (I apply updates with Ansible. The "reboot" step hung.)
Kernel 3.10.0-1160.45.1 definitely worked, and 3.10.0-1160.53.1 too. The problem was observed when booting 3.10.0-1160.59.1.
kmod-nvidia-470.74-1.el7_9 definitely worked, and 470.94-1.el7_9 too. The problem seemed to coincide with the 510 series.

Now, I did change Primary Video to "NVidia" in BIOS and install oldest available: kmod-nvidia-470.103.01-1.el7_9.elrepo
Result: system gets into black screen, with both 3.10.0-1160.45.1 and 3.10.0-1160.59.1 kernels. However, now Ctrl-Alt-Del initiates reboot.

I did revert Primary Video to "Auto" in BIOS. No change. Black screen.

I did update nvidia drivers to 510.54-1.el7_9.elrepo. Again, black screen.

Change to graphical.target. The boot displays the initial texts to where it earlier did hang, then black screen quite a while, and finally, X11 and lightdm are up.

There are no virtual consoles.

Doesn't the boot process at some point switch from text mode (a bit differently for nouveau and nvidia) to "frame buffer"?
(I don't have "rhgb quiet" as I like to see text and change of font occurs at some point.)

On this machine that latter part of output is now black. I can't recall whether it always used to do that. I definitely have had some.

Basically, I have changed nothing (except BIOS firmware update that had no observable effect), yet now machine boots to graphical.target.
Bizarre. Documented. Knock on wood and act calm?

Issue History

Date Modified Username Field Change
2022-03-21 12:23 jlehtone New Issue
2022-03-21 12:23 jlehtone Status new => assigned
2022-03-21 12:23 jlehtone Assigned To => pperry
2022-03-21 13:24 toracat Note Added: 0008266
2022-03-21 16:28 pperry Note Added: 0008267
2022-03-21 18:18 toracat Note Added: 0008270
2022-03-22 05:04 jlehtone Note Added: 0008271
2022-03-22 07:25 pperry Note Added: 0008272
2022-03-22 12:59 toracat Note Added: 0008273
2022-03-23 07:19 jlehtone Note Added: 0008274