View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001022 | channel: elrepo/el8 | kmod-nvidia | public | 2020-07-12 15:36 | 2020-09-09 20:56 |
Reporter | mroche | Assigned To | pperry | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Summary | 0001022: Upgrade to 450.57 from 440.100 breaks gnome-session | ||||
Description | In upgrading from 440.100 on RHEL 8.2 (latest), gdm no longer starts appropriately and will completely fail. In one set of boots I was able to capture some selinux catches, but a second attempt at upgrading after a rollback doesn't output anything. I've provided below the audit.log contents specific to the issue as well as the output from journal on the first attempt.
The actual visual problem is GDM tries to load and then provides an all white screen saying something has gone wrong with just a button to log out. Rolling back to 440.100 as it's still stable on my system.
audit.log
type=AVC msg=audit(1594585929.767:76): avc: denied { map } for pid=2248 comm="gnome-session-c" path=2F6D656D66643A2F2E6E76696469615F6472762E585858585858202864656C6574656429 dev="tmpfs" ino=37368 scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:xserver_tmpfs_t:s0 tclass=file permissive=0
type=AVC msg=audit(1594585929.855:77): avc: denied { map } for pid=2251 comm="gnome-session-c" path=2F6D656D66643A2F2E6E76696469615F6472762E585858585858202864656C6574656429 dev="tmpfs" ino=37368 scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:xserver_tmpfs_t:s0 tclass=file permissive=0
type=AVC msg=audit(1594585934.899:78): avc: denied { map } for pid=2263 comm="gnome-session-c" path=2F6D656D66643A2F2E6E76696469615F6472762E585858585858202864656C6574656429 dev="tmpfs" ino=37368 scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:xserver_tmpfs_t:s0 tclass=file permissive=0
type=AVC msg=audit(1594585934.923:79): avc: denied { map } for pid=2264 comm="gnome-session-c" path=2F6D656D66643A2F2E6E76696469615F6472762E585858585858202864656C6574656429 dev="tmpfs" ino=37368 scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:xserver_tmpfs_t:s0 tclass=file permissive=0 | ||||
Tags | No tags attached. | ||||
Attached Files | gnome-session-failure.txt (14,082 bytes)
Jul 12 16:32:09 workstation gnome-session[2234]: X Error of failed request: BadValue (integer parameter out of range for operation)
Jul 12 16:32:09 workstation gnome-session[2234]: Major opcode of failed request: 150 (GLX)
Jul 12 16:32:09 workstation gnome-session[2234]: Minor opcode of failed request: 3 (X_GLXCreateContext)
Jul 12 16:32:09 workstation gnome-session[2234]: Value in failed request: 0x0
Jul 12 16:32:09 workstation gnome-session[2234]: Serial number of failed request: 34
Jul 12 16:32:09 workstation gnome-session[2234]: Current serial number in output stream: 35
Jul 12 16:32:09 workstation gnome-session[2234]: gnome-session-check-accelerated: GL Helper exited with code 256
Jul 12 16:32:09 workstation gnome-session-c[2251]: Failed to create EGL surface
Jul 12 16:32:09 workstation gnome-session[2234]: gnome-session-check-accelerated: GLES Helper exited with code 256
Jul 12 16:32:12 workstation dbus-daemon[1170]: [system] Activating service name='org.fedoraproject.Setroubleshootd' requested by ':1.204' (uid=0 pid=1113 comm="/usr/sbin/sedispatch " label="system_u:system_r:auditd_t:s0") (using servicehelper)
Jul 12 16:32:12 workstation dbus-daemon[2254]: [system] Failed to reset fd limit before activating service: org.freedesktop.DBus.Error.AccessDenied: Failed to restore old fd limit: Operation not permitted
Jul 12 16:32:13 workstation dbus-daemon[1170]: [system] Successfully activated service 'org.fedoraproject.Setroubleshootd'
Jul 12 16:32:14 workstation setroubleshoot[2254]: SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted). For complete SELinux messages run: sealert -l 91d94d03-1d8e-4460-a031-b6aed025614b
Jul 12 16:32:14 workstation platform-python[2254]: SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted).
***** Plugin restorecon (92.2 confidence) suggests ************************
If you want to fix the label.
/memfd:/.nvidia_drv.XXXXXX (deleted) default label should be default_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /memfd:/.nvidia_drv.XXXXXX (deleted)
***** Plugin catchall_boolean (7.83 confidence) suggests ******************
If you want to allow domain to can mmap files
Then you must tell SELinux about this by enabling the 'domain_can_mmap_files' boolean.
Do
setsebool -P domain_can_mmap_files 1
***** Plugin catchall (1.41 confidence) suggests **************************
If you believe that gnome-session-c should be allowed map access on the .nvidia_drv.XXXXXX (deleted) file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'gnome-session-c' --raw | audit2allow -M my-gnomesessionc
# semodule -X 300 -i my-gnomesessionc.pp
Jul 12 16:32:14 workstation setroubleshoot[2254]: SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted). For complete SELinux messages run: sealert -l 91d94d03-1d8e-4460-a031-b6aed025614b
Jul 12 16:32:14 workstation platform-python[2254]: SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted).
***** Plugin restorecon (92.2 confidence) suggests ************************
If you want to fix the label.
/memfd:/.nvidia_drv.XXXXXX (deleted) default label should be default_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /memfd:/.nvidia_drv.XXXXXX (deleted)
***** Plugin catchall_boolean (7.83 confidence) suggests ******************
If you want to allow domain to can mmap files
Then you must tell SELinux about this by enabling the 'domain_can_mmap_files' boolean.
Do
setsebool -P domain_can_mmap_files 1
***** Plugin catchall (1.41 confidence) suggests **************************
If you believe that gnome-session-c should be allowed map access on the .nvidia_drv.XXXXXX (deleted) file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'gnome-session-c' --raw | audit2allow -M my-gnomesessionc
# semodule -X 300 -i my-gnomesessionc.pp
Jul 12 16:32:14 workstation gnome-session[2234]: X Error of failed request: BadValue (integer parameter out of range for operation)
Jul 12 16:32:14 workstation gnome-session[2234]: Major opcode of failed request: 150 (GLX)
Jul 12 16:32:14 workstation gnome-session[2234]: Minor opcode of failed request: 3 (X_GLXCreateContext)
Jul 12 16:32:14 workstation gnome-session[2234]: Value in failed request: 0x0
Jul 12 16:32:14 workstation gnome-session[2234]: Serial number of failed request: 34
Jul 12 16:32:14 workstation gnome-session[2234]: Current serial number in output stream: 35
Jul 12 16:32:14 workstation gnome-session[2234]: gnome-session-check-accelerated: GL Helper exited with code 256
Jul 12 16:32:14 workstation gnome-session-c[2264]: Failed to create EGL surface
Jul 12 16:32:14 workstation gnome-session[2234]: gnome-session-check-accelerated: GLES Helper exited with code 256
Jul 12 16:32:14 workstation gnome-session[2234]: gnome-session-binary[2234]: WARNING: software acceleration check failed: Child process exited with code 1
Jul 12 16:32:14 workstation gnome-session-binary[2234]: WARNING: software acceleration check failed: Child process exited with code 1
Jul 12 16:32:18 workstation setroubleshoot[2254]: SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted). For complete SELinux messages run: sealert -l 91d94d03-1d8e-4460-a031-b6aed025614b
Jul 12 16:32:18 workstation platform-python[2254]: SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted).
***** Plugin restorecon (92.2 confidence) suggests ************************
If you want to fix the label.
/memfd:/.nvidia_drv.XXXXXX (deleted) default label should be default_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /memfd:/.nvidia_drv.XXXXXX (deleted)
***** Plugin catchall_boolean (7.83 confidence) suggests ******************
If you want to allow domain to can mmap files
Then you must tell SELinux about this by enabling the 'domain_can_mmap_files' boolean.
Do
setsebool -P domain_can_mmap_files 1
***** Plugin catchall (1.41 confidence) suggests **************************
If you believe that gnome-session-c should be allowed map access on the .nvidia_drv.XXXXXX (deleted) file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'gnome-session-c' --raw | audit2allow -M my-gnomesessionc
# semodule -X 300 -i my-gnomesessionc.pp
Jul 12 16:32:18 workstation setroubleshoot[2254]: SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted). For complete SELinux messages run: sealert -l 91d94d03-1d8e-4460-a031-b6aed025614b
Jul 12 16:32:18 workstation platform-python[2254]: SELinux is preventing gnome-session-c from map access on the file /memfd:/.nvidia_drv.XXXXXX (deleted).
***** Plugin restorecon (92.2 confidence) suggests ************************
If you want to fix the label.
/memfd:/.nvidia_drv.XXXXXX (deleted) default label should be default_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /memfd:/.nvidia_drv.XXXXXX (deleted)
***** Plugin catchall_boolean (7.83 confidence) suggests ******************
If you want to allow domain to can mmap files
Then you must tell SELinux about this by enabling the 'domain_can_mmap_files' boolean.
Do
setsebool -P domain_can_mmap_files 1
***** Plugin catchall (1.41 confidence) suggests **************************
If you believe that gnome-session-c should be allowed map access on the .nvidia_drv.XXXXXX (deleted) file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'gnome-session-c' --raw | audit2allow -M my-gnomesessionc
# semodule -X 300 -i my-gnomesessionc.pp | ||||
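The `path=` field in the audit.log records quoted in the description is hex-encoded, which auditd does when a path contains characters such as spaces. The following is a minimal Python sketch for decoding it by hand; the helper name `decode_avc_path` is made up for illustration and is not part of any audit tooling. On a live system, `ausearch -i` should give the same interpretation.

```python
import re


def decode_avc_path(record: str) -> str:
    """Return the path= value of an AVC record, decoding it if it is hex-encoded.

    Hypothetical helper for illustration only. auditd writes path= either
    as a quoted string or, when the path contains awkward characters, as
    a bare hex string.
    """
    match = re.search(r'\bpath=(?:"([^"]*)"|([0-9A-Fa-f]+))', record)
    if match is None:
        raise ValueError("no path= field in record")
    quoted, hexed = match.group(1), match.group(2)
    if quoted is not None:
        return quoted
    return bytes.fromhex(hexed).decode("utf-8", errors="replace")


# Fragment of the first AVC record from this report.
record = (
    'type=AVC msg=audit(1594585929.767:76): avc: denied { map } for pid=2248 '
    'comm="gnome-session-c" '
    'path=2F6D656D66643A2F2E6E76696469615F6472762E585858585858202864656C6574656429 '
    'dev="tmpfs" ino=37368'
)
print(decode_avc_path(record))  # /memfd:/.nvidia_drv.XXXXXX (deleted)
```

The decoded value matches the file named in the setroubleshoot messages in the attached journal, so the audit.log and journal entries describe the same denial.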
|
Are you able to test in permissive mode or with SELinux temporarily disabled to establish if it's purely an SELinux issue or if there are other issues at play? I'm not sure where to start troubleshooting this. |
|
Didn't realize uploading a file would wipe the text area... whoops. Essentially boiled down to this:
- Rebooted into multi-user and updated driver.
- Set default to graphical and set selinux to permissive. Rebooted
- Everything, GDM, etc loads up fine and I can log in and things seem to work.
- Set selinux to enforcing and reboot
- Error as previously described
- Reboot into permissive and things work
nvupload.tar.xz contains the audit.log snippets, journal snippets, and Xorg logs if that helps. |
|
Thanks for that. So it looks like it's an SELinux issue, with gnome-session-c trying to perform a memory map on a temporary file on tmpfs. Have you tried the suggested SELinux mitigations? I doubt it's a relabeling issue, but worth trying setting the following boolean: setsebool -P domain_can_mmap_files 1 to see if this fixes the issue. I guess we need to establish if (a) this is a universal issue or something unique to your system, and (b) if it's a universal issue, identify the correct course of action to fix it. |
|
A little bit more background info on this. On RHEL7, domain_can_mmap_files is enabled by default, meaning that memory map permissions are not checked. However, this option is not enabled by default on RHEL8, meaning memory map permissions are now enforced by SELinux:
RHEL7:
$ getsebool domain_can_mmap_files
domain_can_mmap_files --> on
RHEL8:
$ getsebool domain_can_mmap_files
domain_can_mmap_files --> off
Enabling domain_can_mmap_files should fix the issue (I hope). Creating a custom policy as suggested in your attached journal.txt may provide a more specific/targeted solution if you are concerned about enabling globally. |
|
I can duplicate the 'Something went wrong' white screen issue with 450.57. Used rescue mode to downgrade to 440.100. I will try the change in the boolean listed above and report back. |
|
UPDATE: Setting the domain_can_mmap_files boolean resolved the issue for me; updated to 450.57. Here's some data:
[lowen@localhost ~]$ nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:11be] NVIDIA Corporation GK104GLM [Quadro K3000M]
This device requires the current 440.64 NVIDIA driver kmod-nvidia
[lowen@localhost ~]$ lspci -v -d 10de:11be
01:00.0 VGA compatible controller: NVIDIA Corporation GK104GLM [Quadro K3000M] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Dell Device 053f
    Flags: bus master, fast devsel, latency 0, IRQ 36
    Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: nvidia
    Kernel modules: nouveau, nvidia_drm, nvidia
[lowen@localhost ~]$ |
|
Brilliant, thank you. That would also explain why I see no issues on RHEL7. So next question - is setting the domain_can_mmap_files boolean the best solution to this issue, or can we craft a more targeted response? I would be somewhat reluctant to have our nvidia package make global changes to the default setting of the domain_can_mmap_files boolean, but equally don't want to ship broken packages. I will see if I can compile a custom SELinux policy module to allow mmap access that we could ship with our nvidia package. |
|
Generating a local SELinux policy from the originally attached audit entries:
# cat audit.txt | audit2allow -M nvidialocal
gives the following type enforcement policy file:
# cat nvidialocal.te
module nvidialocal 1.0;
require {
    type xserver_tmpfs_t;
    type xdm_t;
    class file map;
}
#============= xdm_t ==============
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow xdm_t xserver_tmpfs_t:file map;
Would anyone be able to test whether the above policy file fixes the issue once the domain_can_mmap_files boolean has been reset to its original state? If so, I would propose we install this custom SELinux policy to allow mmap access from within the nvidia package. Any thoughts? |
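To show where the `allow xdm_t xserver_tmpfs_t:file map;` rule comes from, here is a simplified Python sketch that derives allow rules from AVC denial records by reading the source type, target type, class, and permission. It only illustrates the mapping and is not a substitute for audit2allow, which also handles booleans, dontaudit rules, and module packaging; the function names are hypothetical.

```python
import re

# Matches the parts of an AVC denial that determine the allow rule.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)"
)


def selinux_type(context: str) -> str:
    """Extract the type field from a user:role:type:level context string."""
    return context.split(":")[2]


def allow_rules(lines):
    """Return the sorted, de-duplicated allow rules implied by AVC denial lines."""
    rules = set()
    for line in lines:
        m = AVC_RE.search(line)
        if m is None:
            continue  # not an AVC denial
        perms = m["perms"].split()
        perm_text = perms[0] if len(perms) == 1 else "{ " + " ".join(sorted(perms)) + " }"
        rules.add(
            f"allow {selinux_type(m['scontext'])} "
            f"{selinux_type(m['tcontext'])}:{m['tclass']} {perm_text};"
        )
    return sorted(rules)


# Two of the denials from this report, with the path= value shortened here.
sample = [
    'type=AVC msg=audit(1594585929.767:76): avc: denied { map } for pid=2248 '
    'comm="gnome-session-c" path=2F6D dev="tmpfs" ino=37368 '
    'scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 '
    'tcontext=system_u:object_r:xserver_tmpfs_t:s0 tclass=file permissive=0',
    'type=AVC msg=audit(1594585929.855:77): avc: denied { map } for pid=2251 '
    'comm="gnome-session-c" path=2F6D dev="tmpfs" ino=37368 '
    'scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 '
    'tcontext=system_u:object_r:xserver_tmpfs_t:s0 tclass=file permissive=0',
]
for rule in allow_rules(sample):
    print(rule)  # allow xdm_t xserver_tmpfs_t:file map;
```

All four denials in the report share the same source type, target type, class, and permission, which is why they collapse into a single rule.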
|
I can take a stab at that tonight after work. For the uninitiated and lazy, where do policies go? Or do they need to be loaded by command which will then save the rule? Thanks for all the work, Perry! This is also on the RH Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1851448 |
|
From your audit.log, do:
# grep gnome-session-c /path/to/audit.log | audit2allow -M nvidialocal
This should give you nvidialocal.te and nvidialocal.pp in your current directory. You can check nvidialocal.te, which should look like:
# cat nvidialocal.te
module nvidialocal 1.0;
require {
    type xserver_tmpfs_t;
    type xdm_t;
    class file map;
}
#============= xdm_t ==============
allow xdm_t xserver_tmpfs_t:file map;
Now load the .pp file using:
semodule -i nvidialocal.pp
and reset the domain_can_mmap_files boolean fix we previously did:
setsebool -P domain_can_mmap_files 0
and reboot to test. To uninstall the SELinux module, use:
semodule -r nvidialocal
Assuming that works, I will implement the above SELinux policy as part of the package. |
|
Booted into multi-user, upgraded to 450.57, applied selinux module, set-default to graphical, and rebooted. Everything works. My module had the same output as you:
+++++++++++++++
module nvidialocal 1.0;
require {
    type xserver_tmpfs_t;
    type xdm_t;
    class file map;
}
#============= xdm_t ==============
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow xdm_t xserver_tmpfs_t:file map;
+++++++++++++
Once into the system, I verified that enforcing was still active and the module loaded successfully.
# nvidia-settings --version
nvidia-settings: version 450.57
# getenforce
Enforcing
# semodule -l | grep nvidia
nvidialocal |
|
Brilliant, thanks Michael. Can you just confirm you reset the domain_can_mmap_files boolean first:
setsebool -P domain_can_mmap_files 0
$ getsebool domain_can_mmap_files
domain_can_mmap_files --> off
I will take that .te file and add it to the nvidia package source, and allow the package to compile and install the module on el8. I don't see any point separating it out into a separate selinux sub-package. |
|
Sorry, thought I had included that in the above report! I never turned it on, but I made sure that I had it off when doing the prior test.
# getsebool domain_can_mmap_files
domain_can_mmap_files --> off |
|
Thanks for the confirmation Michael. From Aaron's (nvidia) response to the Red Hat bug report, it seems this is something that should be in selinux-policy and maybe not something we should be looking to fix with an ad-hoc policy. Therefore, I'm tempted to take no action given that affected users can switch on domain_can_mmap_files as a workaround for now. Hopefully RH will fix this sooner rather than later as it's a patch that is already in fedora. |
|
Upstream have released an updated selinux-policy package today which hopefully fixes this bug: https://access.redhat.com/errata/RHBA-2020:3655 selinux-policy-3.14.3-41.el8_2.6.noarch.rpm If anyone can confirm this fixes the issue, I'll close the bug here. |
|
Hey Perry, about to test this, I'll let you know how it goes! Mike |
|
Alrighty, everything seems to check out! I'm using 450.66:
* Rebooted into multi-user target
* Unloaded nvidialocal module :: semodule -r nvidialocal
* Rebooted system
* Double checked boolean was disabled :: getsebool domain_can_mmap_files -> off
* Ran upgrade
* Rebooted into graphical target
Everything seems to work with selinux-policy-3.14.3-41.el8_2.6
If there's anything else you need from me, just ask!
Cheers, Mike |
|
Small addition:
* Double checked boolean was disabled :: getsebool domain_can_mmap_files -> off
AND
* Double checked selinux module didn't exist :: semodule --list | grep nvidia
Cheers, Mike |
|
Fantastic, thanks for the feedback. Closing as fixed. |
Date Modified | Username | Field | Change |
---|---|---|---|
2020-07-12 15:36 | mroche | New Issue | |
2020-07-12 15:36 | mroche | Status | new => assigned |
2020-07-12 15:36 | mroche | Assigned To | => pperry |
2020-07-12 15:36 | mroche | File Added: gnome-session-failure.txt | |
2020-07-12 16:34 | pperry | Note Added: 0007037 | |
2020-07-13 20:23 | mroche | File Added: nvupload.tar.xz | |
2020-07-13 20:26 | mroche | Note Added: 0007038 | |
2020-07-14 03:30 | pperry | Note Added: 0007039 | |
2020-07-14 03:43 | pperry | Note Added: 0007040 | |
2020-07-14 03:44 | pperry | Note Edited: 0007040 | |
2020-07-14 11:08 | lowen | Note Added: 0007041 | |
2020-07-14 11:17 | lowen | Note Added: 0007042 | |
2020-07-14 12:56 | pperry | Note Added: 0007043 | |
2020-07-14 13:19 | pperry | Note Added: 0007044 | |
2020-07-14 14:20 | mroche | Note Added: 0007045 | |
2020-07-14 15:26 | pperry | Note Added: 0007046 | |
2020-07-14 16:54 | mroche | Note Added: 0007047 | |
2020-07-15 01:56 | pperry | Note Added: 0007048 | |
2020-07-15 07:10 | mroche | Note Added: 0007049 | |
2020-07-16 06:17 | pperry | Note Added: 0007052 | |
2020-09-08 10:25 | pperry | Note Added: 0007194 | |
2020-09-08 10:25 | pperry | Status | assigned => feedback |
2020-09-09 20:21 | mroche | Note Added: 0007195 | |
2020-09-09 20:36 | mroche | Note Added: 0007196 | |
2020-09-09 20:37 | mroche | Note Added: 0007197 | |
2020-09-09 20:56 | pperry | Note Added: 0007198 | |
2020-09-09 20:56 | pperry | Status | feedback => resolved |
2020-09-09 20:56 | pperry | Resolution | open => fixed |