View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000779 | channel: elrepo/el7 | kmod-nvidia | public | 2017-09-11 05:52 | 2017-09-11 13:55 |
Reporter | pbergene | Assigned To | pperry | ||
Priority | normal | Severity | block | Reproducibility | always |
Status | assigned | Resolution | open | ||
Summary | 0000779: Kernel crash with kmod-nvidia 7.4 on M1000M | ||||

Description

With the kmod-nvidia drivers on the latest CR (7.4 branch), the driver fails to load and crashes with this kernel message:

```
[ 202.205309] NVRM: failed to copy vbios to system memory.
```

I suspect this is kernel related. See also what could be the same issue reported upstream on an M2000M [1]. Modules built with the drivers downloaded directly from NVIDIA show the same behaviour.

[1] https://access.redhat.com/discussions/3176211#comment-1217571

Additional Information

I'm on a Lenovo P50 and get the following kernel messages (X configuration generated by nvidia-xconfig). There are lots of these:

```
[ 198.178279] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
```

And finally:

```
[ 202.205309] NVRM: failed to copy vbios to system memory.
[ 202.205691] NVRM: RmInitAdapter failed! (0x30:0xffff:659)
[ 202.205762] NVRM: rm_init_adapter failed for device bearing minor number 0
```

I have tried many different BIOS settings, both docked and undocked. The following settings used to work on 7.3:

- Boot Display Device: [Display on dock]
- Shared Display Priority: [Display on dock]
- Total Graphics Memory: [512MB]
- Graphics Device: [Discrete Graphics] (Intel disabled, NVIDIA only)

```
[paul@leda ~]$ lspci | grep NVIDIA
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
[paul@leda ~]$ nvidia-detect
kmod-nvidia
```

I have also tried kmod-nvidia-drv-340xx and kmod-nvidia-drv-304xx for good measure. No luck with nouveau either, nor with Intel graphics (in hybrid mode).

```
[paul@leda ~]$ uname -a
Linux leda.snowcrashed.net 3.10.0-693.2.1.el7.x86_64 #1 SMP Wed Sep 6 20:06:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[paul@leda ~]$ sudo grep Command /var/log/dmesg
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.2.1.el7.x86_64 root=/dev/mapper/cl-root ro crashkernel=auto rd.lvm.lv=cl/root rd.luks.uuid=luks-65355cc9-ccdd-41e0-8e60-cc9175f015fe rd.lvm.lv=cl/swap rhgb quiet audit=1 LANG=en_GB.UTF-8 nouveau.modeset=0 rd.driver.blacklist=nouveau plymouth.ignore-udev
[paul@leda ~]$ yum repolist
Loaded plugins: fastestmirror, langpacks, nvidia
Loading mirror speeds from cached hostfile
 * base: centos.uib.no
 * elrepo: ftp.nluug.nl
 * extras: centos.uib.no
 * updates: centos.uib.no
repo id                   repo name                                           status
base/7/x86_64             CentOS-7 - Base                                      9,363
centos-openshift-origin   CentOS OpenShift Origin                                261
cr/7/x86_64               CentOS-7 - cr                                        3,844
elrepo                    ELRepo.org Community Enterprise Linux Repository       216
extras/7/x86_64           CentOS-7 - Extras                                      451
updates/7/x86_64          CentOS-7 - Updates                                   2,146
repolist: 16,281
```

Tags: No tags attached.

Reported upstream: https://access.redhat.com/discussions/3176211#comment-1217571

(0005475) toracat, 2017-09-11 12:23

As suggested by Jamie in the RH discussion thread, and since you suspect the kernel could be the culprit, it may be a good idea to test a recent kernel. You can test-install either kernel-lt (4.4.87) or kernel-ml (4.13.1). If either one fixes the issue, we can try to identify the bug (although this is not really an ELRepo issue).

(0005476) pbergene, 2017-09-11 12:28

Yeah, I agree it is not really an ELRepo issue; the NVIDIA driver, the kernel and the BIOS all need to cooperate. I had a short email exchange with Akemi where we concluded that I should open a bug here so you could track it. I rolled back to non-CR packages over the weekend because I needed a working system for the week, but I will keep this issue up to date when I start pulling in 7.4 packages.

(0005479) pperry, 2017-09-11 13:37

I am using our nvidia packages on RHEL 7.4 with the latest kernel and have no issues here, so I am unable to replicate your problem:

```
$ uname -a
Linux quad 3.10.0-693.2.1.el7.x86_64 #1 SMP Fri Aug 11 04:58:43 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
$ rpm -qa | grep nvidia
nvidia-detect-384.59-1.el7.elrepo.x86_64
nvidia-x11-drv-384.69-2.el7.elrepo.x86_64
yum-plugin-nvidia-1.0.2-1.el7.elrepo.noarch
kmod-nvidia-384.69-1.el7_4.elrepo.x86_64
```

So it looks like something specific to your setup rather than a more generic issue.

(0005480) pperry, 2017-09-11 13:42

One similarity I note is that you have a Quadro M1000M and the poster in the Red Hat thread who is similarly affected has a Quadro M2000M card. Related perhaps? |

(0005481) pbergene, 2017-09-11 13:55

That is the same kernel I was trying, so that's a useful data point. Of course, seeing the issue on both the M2000M and the M1000M could just be cognitive bias on my part, and something may simply have gone wrong with my installation when upgrading to CR. However, this is a fairly plain installation, barely a month old (on a very recent P50 with a 2017 BIOS). I'll give it another spin when CentOS 7.4 is released shortly.

Date Modified | Username | Field | Change |
---|---|---|---|
2017-09-11 05:52 | pbergene | New Issue | |
2017-09-11 05:52 | pbergene | Status | new => assigned |
2017-09-11 05:52 | pbergene | Assigned To | => pperry |
2017-09-11 05:52 | pbergene | Reported upstream | => https://access.redhat.com/discussions/3176211#comment-1217571 |
2017-09-11 12:23 | toracat | Note Added: 0005475 | |
2017-09-11 12:28 | pbergene | Note Added: 0005476 | |
2017-09-11 13:37 | pperry | Note Added: 0005479 | |
2017-09-11 13:40 | pperry | Note Edited: 0005479 | |
2017-09-11 13:42 | pperry | Note Added: 0005480 | |
2017-09-11 13:55 | pbergene | Note Added: 0005481 |