View Issue Details

ID: 0000779
Project: channel: elrepo/el7
Category: kmod-nvidia
View Status: public
Last Update: 2017-09-11 13:55
Reporter: pbergene
Assigned To: pperry
Priority: normal
Severity: block
Reproducibility: always
Status: assigned
Resolution: open
Summary: 0000779: Kernel crash with kmod-nvidia 7.4 on M1000M
Description: With the kmod-nvidia drivers on the latest CR (7.4 branch), the driver fails to load and crashes with the kernel message: [ 202.205309] NVRM: failed to copy vbios to system memory.

I suspect this is kernel related; see also what could be the same issue upstream with an M2000M [1]. Modules built with drivers directly from Nvidia also show this behaviour.

[1] https://access.redhat.com/discussions/3176211#comment-1217571
Additional Information: I'm on a Lenovo P50 and getting the following kernel messages (config generated by nvidia-xconfig):

Lots of these:
[ 198.178279] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)

And finally

[ 202.205309] NVRM: failed to copy vbios to system memory.
[ 202.205691] NVRM: RmInitAdapter failed! (0x30:0xffff:659)
[ 202.205762] NVRM: rm_init_adapter failed for device bearing minor number 0
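
To pick these messages out of a long kernel log, a small filter can help. The following is a minimal sketch, assuming a POSIX shell with grep; the function names nvrm_lines and acpi_warning_count are hypothetical helpers made up for illustration and are not part of any NVIDIA or ELRepo tooling.

```shell
#!/bin/sh
# Hypothetical helpers: both read a kernel log on stdin.

# Print only the lines logged by the NVIDIA kernel module (tagged "NVRM:").
nvrm_lines() {
    grep 'NVRM:'
}

# Print how many ACPI warnings the log contains.
acpi_warning_count() {
    grep -c 'ACPI Warning:'
}
```

For example, `dmesg | nvrm_lines` would show only the NVRM lines, and `dmesg | acpi_warning_count` would show how many ACPI warnings were logged. Reading the kernel log may require root on some systems.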

I have attempted many different BIOS settings, both docked and undocked. The following settings used to work on 7.3:

Boot Display Device: [Display on dock]

Shared Display Priority: [Display on dock]

Total Graphics Memory: [512MB]

Graphics Device [Discrete Graphics] (disable intel, nvidia only)

[paul@leda ~]$ lspci | grep NVIDIA
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)

[paul@leda ~]$ nvidia-detect
kmod-nvidia

Have also tried kmod-nvidia-drv-340xx and kmod-nvidia-drv-304xx for good measure
No luck with nouveau either, nor anything with intel (in hybrid mode).

[paul@leda ~]$ uname -a
Linux leda.snowcrashed.net 3.10.0-693.2.1.el7.x86_64 #1 SMP Wed Sep 6 20:06:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[paul@leda ~]$ sudo grep Command /var/log/dmesg
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.2.1.el7.x86_64 root=/dev/mapper/cl-root ro crashkernel=auto rd.lvm.lv=cl/root rd.luks.uuid=luks-65355cc9-ccdd-41e0-8e60-cc9175f015fe rd.lvm.lv=cl/swap rhgb quiet audit=1 LANG=en_GB.UTF-8 nouveau.modeset=0 rd.driver.blacklist=nouveau plymouth.ignore-udev

[paul@leda ~]$ yum repolist
Loaded plugins: fastestmirror, langpacks, nvidia
Loading mirror speeds from cached hostfile
 * base: centos.uib.no
 * elrepo: ftp.nluug.nl
 * extras: centos.uib.no
 * updates: centos.uib.no
repo id                  repo name                                         status
base/7/x86_64            CentOS-7 - Base                                   9,363
centos-openshift-origin  CentOS OpenShift Origin                           261
cr/7/x86_64              CentOS-7 - cr                                     3,844
elrepo                   ELRepo.org Community Enterprise Linux Repository  216
extras/7/x86_64          CentOS-7 - Extras                                 451
updates/7/x86_64         CentOS-7 - Updates                                2,146
repolist: 16,281
Tags: No tags attached.
Reported upstream: https://access.redhat.com/discussions/3176211#comment-1217571 (?)

Activities

toracat

2017-09-11 12:23

administrator   ~0005475

As suggested by Jamie in the RH discussion thread, and as you suspect the kernel could be a culprit, it may be a good idea to test a recent kernel. You can test-install either kernel-lt (4.4.87) or kernel-ml (4.13.1). If either one fixes the issue, we can try identifying the bug (although this is not really an elrepo issue).
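
A test install of one of these kernels might look like the sketch below. It only prints the yum command instead of running it. It assumes the kernels are served from a repository with the id elrepo-kernel that is disabled by default, so check the local elrepo.repo file first. The helper name elrepo_kernel_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the yum command for a test kernel.
# Argument: "lt" for kernel-lt (long-term) or "ml" for kernel-ml (mainline).
# Assumption: the repository id is "elrepo-kernel".
elrepo_kernel_cmd() {
    case "$1" in
        lt|ml)
            printf 'yum --enablerepo=elrepo-kernel install kernel-%s\n' "$1"
            ;;
        *)
            echo "usage: elrepo_kernel_cmd lt|ml" >&2
            return 1
            ;;
    esac
}
```

Running `elrepo_kernel_cmd ml` prints the command for the mainline kernel; it would then be run as root, followed by a reboot into the new kernel from the boot menu.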

pbergene

2017-09-11 12:28

reporter   ~0005476

Yeah, I agree it is not really an elrepo issue, nvidia/kernel/bios all need to cooperate. Had a short email thread with Akemi where we concluded that I should open a bug for you to track it.

I rolled back to non-CR over the weekend because I needed a working system for the weekdays, but I will keep the issue up to date once I start pulling in 7.4 packages.

pperry

2017-09-11 13:37

administrator   ~0005479

Last edited: 2017-09-11 13:40

I am using our nvidia packages on RHEL7.4 with latest kernel and have no issues here, so I am unable to replicate your issue:

$ uname -a
Linux quad 3.10.0-693.2.1.el7.x86_64 #1 SMP Fri Aug 11 04:58:43 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

$ rpm -qa | grep nvidia
nvidia-detect-384.59-1.el7.elrepo.x86_64
nvidia-x11-drv-384.69-2.el7.elrepo.x86_64
yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch
kmod-nvidia-384.69-1.el7_4.elrepo.x86_64

So it looks like something specific to your setup rather than a more generic issue.

pperry

2017-09-11 13:42

administrator   ~0005480

One similarity I note is that you have a Quadro M1000M and the poster in the Red Hat thread who is similarly affected has a Quadro M2000M card. Related perhaps?

pbergene

2017-09-11 13:55

reporter   ~0005481

That is the same kernel I was attempting, so that's a data point. Of course, linking the issue to the M2000M/M1000M could be cognitive bias on my part, and something may simply have gone wrong with my installation when upgrading to CR.

This is, however, a fairly plain installation, barely a month old (and a very recent P50 with a 2017 BIOS). I'll give it another spin when CentOS 7.4 is released shortly.

Issue History

Date Modified Username Field Change
2017-09-11 05:52 pbergene New Issue
2017-09-11 05:52 pbergene Status new => assigned
2017-09-11 05:52 pbergene Assigned To => pperry
2017-09-11 05:52 pbergene Reported upstream => https://access.redhat.com/discussions/3176211#comment-1217571 (?)
2017-09-11 12:23 toracat Note Added: 0005475
2017-09-11 12:28 pbergene Note Added: 0005476
2017-09-11 13:37 pperry Note Added: 0005479
2017-09-11 13:40 pperry Note Edited: 0005479
2017-09-11 13:42 pperry Note Added: 0005480
2017-09-11 13:55 pbergene Note Added: 0005481