View Issue Details

ID: 0000768
Project: channel: elrepo/el7
Category: kmod-nvidia-340xx
View Status: public
Last Update: 2017-09-07 11:33
Reporter: jcf
Assigned To: pperry
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Summary: 0000768: kmod-nvidia-340xx does not install under the RHEL 7.4 kernel
Description: The kmod-nvidia-340xx package does not install modules for kernel-3.10.0-693.el7.x86_64.
Tags: No tags attached.
Reported upstream

Activities

pperry

2017-08-16 23:48

administrator   ~0005395

Thank you for your report.

I have received reports that the 340xx module for RHEL7.4 is not working (which is why it is in the testing repository), but I do not have the hardware to test it.

If you have any clue as to why, or a fix, please feel free to share. Some more details would be useful.

The first thing I would ask you to do is uninstall the elrepo driver packages and try Nvidia's installer (.run) package to see whether that works on RHEL7.4, so we can narrow down where the issue may be.

Thanks

pperry

2017-08-16 23:51

administrator   ~0005396

Last edited: 2017-08-16 23:51

Just to confirm, as you haven't provided any details, you did actually try the driver built for the RHEL7.4 kernel from the testing repository?

http://elrepo.org/linux/testing/el7/x86_64/RPMS/kmod-nvidia-340xx-340.102-3.el7_4.elrepo.x86_64.rpm

jcf

2017-08-17 20:34

reporter   ~0005400

The video card in question is a Dell OEM Geforce GTX 745. The ELRepo nvidia drivers were initially installed via "yum install $(nvidia-detect)", I don't recall how long ago. The kmod-nvidia-340xx-340.102-2.el7.elrepo.x86_64 and nvidia-x11-drv-340xx-340.102-1.el7.elrepo.x86_64 packages were currently installed.

I didn't know about the testing repository, so thanks for the pointer.

After running "yum upgrade http://elrepo.org/linux/testing/el7/x86_64/RPMS/kmod-nvidia-340xx-340.102-3.el7_4.elrepo.x86_64.rpm" I did have modules under the /lib/modules/3.10.0-693.el7.x86_64/extra directory; however, the system did not boot to a GUI. The following errors were in dmesg:

[ 1219.855265] proc_dir_entry 'driver/nvidia' already registered
[ 1219.855266] Modules linked in: nvidia(POE+) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xt_limit ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_hdmi lrw snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel gf128mul snd_hda_codec
[ 1219.855347] [<ffffffffc14aeb6c>] nv_register_procfs+0x4c/0x1d0 [nvidia]
[ 1219.855367] [<ffffffffc08092a6>] nvidia_init_module+0x2a6/0x7c5 [nvidia]
[ 1219.855387] [<ffffffffc08097da>] ? nv_drm_init+0x15/0x15 [nvidia]
[ 1219.855427] [<ffffffffc080982b>] nvidia_frontend_init_module+0x51/0x826 [nvidia]
[ 1219.855484] nvidia: probe of 0000:01:00.0 failed with error -1
[ 1219.855500] Error: Driver 'nvidia' is already registered, aborting...
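
For anyone checking whether their own system shows the same failure, the two "already registered" messages in the log above can be searched for directly. The following is only a minimal sketch: the function name is hypothetical, and it assumes the kernel log text is supplied on standard input.

```shell
#!/bin/sh
# Sketch only: count kernel log lines that carry either of the two
# duplicate-registration messages quoted above.
# The function name is hypothetical; the log text is read from stdin.
nvidia_double_registration() {
    # -E enables alternation with '|', -c prints the number of matching lines.
    grep -E -c "proc_dir_entry 'driver/nvidia' already registered|Driver 'nvidia' is already registered"
}

# Possible use on a live system (reading the kernel log may need root):
#   dmesg | nvidia_double_registration
```

A non-zero count only shows that the messages are present in the log; it does not say which package or installer loaded the module.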

When I went to download the driver from geforce.com, I was pointed to the 384.59 release. I thought that was odd, so I ran nvidia-detect again and got "kmod-nvidia". So I removed the existing packages via "rpm -e --nodeps kmod-nvidia-340xx nvidia-x11-drv-340xx". After running "yum install $(nvidia-detect)" I had a working system with the 384.59-1 ELRepo packages installed. This fixed my problem.
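
The mismatch described above (nvidia-detect recommending "kmod-nvidia" while kmod-nvidia-340xx was installed) can be checked with a small helper. This is a sketch under assumptions: the function name is hypothetical, and the installed package names are expected on standard input, one per line.

```shell
#!/bin/sh
# Sketch only: succeed (exit 0) if the recommended package name appears
# verbatim in the list of installed package names read from stdin.
# The function name is hypothetical.
recommended_is_installed() {
    # -F: fixed string, -x: match the whole line, -q: quiet.
    # Whole-line matching keeps "kmod-nvidia" from matching "kmod-nvidia-340xx".
    grep -F -x -q -- "$1"
}

# Possible use on an ELRepo system (assumes nvidia-detect prints one package name):
#   rpm -qa --qf '%{NAME}\n' 'kmod-nvidia*' | recommended_is_installed "$(nvidia-detect)" \
#       || echo "installed kmod does not match the nvidia-detect recommendation"
```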

Since you mentioned running the NVIDIA installer, I downloaded the NVIDIA-Linux-x86_64-340.102.run file from geforce.com. I removed the ELRepo 384.59-1 packages and ran the installer. It failed, with the following in the /var/log/nvidia-installer.log file:

ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 2969.639792] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.644905] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.669888] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.674981] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.699937] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.705035] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.730013] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.735124] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.760072] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.765182] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.790139] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.795285] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.820214] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.825360] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.850290] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.855438] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.880370] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2969.885515] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 2 ep 4 with no TDs queued?
[ 2995.556330] nvidia-modeset: Freed GPU:0 (GPU-ce1c6ace-df76-4dab-4134-1fabad4e88f0) @ PCI:0000:01:00.0
[ 3005.435532] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 3005.441253] nvidia-modeset: Unloading
[ 3005.446592] nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
[ 3026.353767] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=io+mem
[ 3026.353918] Error: Driver 'nvidia' is already registered, aborting...
[ 3026.353919] NVRM: DRM init failed
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

So, as far as I can tell, the kmod-nvidia-340xx-340.102-3.el7_4.elrepo.x86_64 package produces the same error as the NVIDIA installer, which suggests the issue is with the NVIDIA driver and not the ELRepo package.

toracat

2017-08-17 20:56

administrator   ~0005401

Thanks for the testing. Turns out kmod-nvidia-340xx-340.102-3.el7_4.elrepo.x86_64.rpm is defective. A patch to fix the problem is now available. We will be publishing a fixed version soon.

toracat

2017-08-17 23:46

administrator   ~0005402

The patched version of kmod-nvidia-340xx has been released to the elrepo-testing repository:

kmod-nvidia-340xx-340.102-4.el7_4.elrepo.x86_64

(Note that it is 340.102-4, not 340.102-3)
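
Since the fix depends on having release -4 rather than -3, the release number can be pulled out of the package string before testing. A minimal sketch, assuming the name-version-release.arch layout shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the leading number of the release field from a
# name-version-release.arch string such as
#   kmod-nvidia-340xx-340.102-4.el7_4.elrepo.x86_64
# The function name is hypothetical.
package_release() {
    rel=${1##*-}                 # text after the last hyphen: "4.el7_4.elrepo.x86_64"
    printf '%s\n' "${rel%%.*}"   # text before the first dot: "4"
}

# Possible use (assumes "rpm -q" prints name-version-release.arch):
#   package_release "$(rpm -q kmod-nvidia-340xx)"
```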

It will show up in our mirror sites shortly. Please test and let us know if this one works for you.

jcf

2017-08-18 16:31

reporter   ~0005403

I installed the kmod-nvidia-340xx-340.102-4.el7_4.elrepo.x86_64 package and it works with my setup. Thanks.

toracat

2017-08-18 17:42

administrator   ~0005404

Thanks for reporting back. The patched version is now in the main repository. Closing as resolved.

Issue History

Date Modified     Username  Field                Change
2017-08-16 15:43  jcf       New Issue
2017-08-16 15:43  jcf       Status               new => assigned
2017-08-16 15:43  jcf       Assigned To          => pperry
2017-08-16 23:48  pperry    Note Added: 0005395
2017-08-16 23:51  pperry    Note Added: 0005396
2017-08-16 23:51  pperry    Note Edited: 0005396
2017-08-17 20:34  jcf       Note Added: 0005400
2017-08-17 20:56  toracat   Note Added: 0005401
2017-08-17 23:46  toracat   Note Added: 0005402
2017-08-18 16:31  jcf       Note Added: 0005403
2017-08-18 17:42  toracat   Note Added: 0005404
2017-08-18 17:42  toracat   Status               assigned => resolved
2017-08-18 17:42  toracat   Resolution           open => fixed