View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001334 | channel: kernel/el8 | kernel-lt | public | 2023-03-19 14:30 | 2023-03-22 17:46 |
Reporter | thomas.w.simmons@nasa.gov | Assigned To | toracat | ||
Priority | normal | Severity | crash | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Summary | 0001334: 5.4.237-1.el8.elrepo.x86_64 locks up with black screen during boot | ||||
Description | I have three el8 servers that fail to boot the latest kernel (5.4.237-1.el8.elrepo.x86_64). These systems booted, and continue to boot, fine with the previous kernel (5.4.236-1.el8.elrepo.x86_64). All systems exhibit the same behavior: the system starts to boot, but at the point where the resolution normally changes, it instead displays a black screen and locks up. At this point the system responds neither to the keyboard nor to an ACPI shutdown via the power button, and must be hard reset. Two of these servers have IvyBridge CPUs and one has a Haswell CPU. All use Intel i915 graphics. I found that passing nomodeset or i915.modeset=0 as a kernel parameter allows the system to boot with the new kernel. I also found that if I pass vga=1440x900x16 as a kernel parameter, the system still hangs, not at a black screen but immediately after the output "fb0: switching to inteldrmfb from simple". | ||||
Steps To Reproduce | Upgrade kernel to 5.4.237-1.el8.elrepo.x86_64 and reboot. | ||||
Tags | No tags attached. | ||||
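To confirm whether one of the workarounds from the description is active on a booted system, the kernel command line can be inspected. The following is a minimal Python sketch, not something taken from the report: the function names `has_i915_workaround` and `read_cmdline` are hypothetical, and it assumes the command line is available at `/proc/cmdline`, as on typical Linux systems.

```python
# Minimal sketch: detect the workaround parameters named in the description.
# Function names are hypothetical; only the parameter strings come from the report.

def has_i915_workaround(cmdline: str) -> bool:
    """Return True if the command line contains nomodeset or i915.modeset=0."""
    params = cmdline.split()
    return "nomodeset" in params or "i915.modeset=0" in params


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line (assumes a Linux /proc)."""
    with open(path) as handle:
        return handle.read().strip()
```

On an affected machine, `has_i915_workaround(read_cmdline())` would be expected to return `True` only after booting with one of the two parameters; `vga=1440x900x16` alone is not treated as a workaround, since the report says the system still hangs with it.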
|
Related: http://lists.elrepo.org/pipermail/elrepo/2023-March/006429.html |
|
@thomas.w.simmons@nasa.gov In the changelog of Linux 5.4.237, there is a patch [1]: "drm/i915: Don't use BAR mappings for ring buffers with LLC". To test whether this is causing the current issue, I've reverted it and rebuilt a kernel-lt set. It is available here: https://toracat.org/test/kernel/bug1334/ Could you give it a try and see if it fixes the problem? Please note that the packages are for testing purposes only and are not signed. [1] https://www.spinics.net/lists/stable/msg637749.html |
|
Thank you. I have verified the provided packages fix the lock-up issue I was experiencing. |
|
@thomas.w.simmons@nasa.gov That is great news. Thank you for reporting back. Now we need to report this upstream (kernel.org). One option is to use bugzilla.kernel.org; another is to write to the stable mailing list. |
|
For those experiencing this problem on el7 (RHEL7 / CentOS7 / clone) systems, a rebuilt kernel-lt package set has been created with the identified patch reverted. The package set can be identified by the -2.bcat.el7.elrepo section of its name. It is available from the following location: https://elrepo.org/people/ajb/tmp/ Testing would be appreciated. @JPilk would you be able to assist? |
|
Upstream bug report has been submitted: https://bugzilla.kernel.org/show_bug.cgi?id=217222 |
|
@thomas.w.simmons@nasa.gov Thank you. I'm on the CC list now. |
|
See https://lkml.org/lkml/2023/3/21/68 and https://lkml.org/lkml/2023/3/21/202
[quote]
Date Tue, 21 Mar 2023 08:42:29 +0100
From Greg KH <>
Subject Re: PROBLEM: Linux 5.4.237 i915 driver crashes on boot (-longterm regression)

On Tue, Mar 21, 2023 at 01:36:05AM -0400, Nick Bowler wrote:
> Hi,
>
> Linux 5.4.237 crashes immediately on my machine every time when the i915
> driver is loaded, with an error like the one below. Previous versions are OK.
>
> I bisected it to commit 1aed78cfda7f ("drm/i915: Don't use BAR mappings
> for ring buffers with LLC"). I can revert this on top of 5.4.237 and
> this seems sufficient to make the machine boot and work again.
>
> Let me know if you need any more info.

This should be fixed in the 5.4.238-rc1 release that is out for testing right now, if not, please let me know.

thanks,

greg k-h
[/quote] |
|
As far as I can see, what happened was some "miscommunication" between the patch submitter and GKH:

Subject: Re: [PATCH 5.4.y] drm/i915: Don't use BAR mappings for ring buffers with LLC
From: John Harrison <john.c.harrison@intel.com>

(snip)

>> The original patch series was two patches -
>> https://patchwork.freedesktop.org/series/114080/. One to not use stolen
>> memory and the other to not use BAR mappings. If the anti-BAR patch is
>> applied without the anti-stolen patch then the i915 driver will attempt to
>> access stolen memory directly which will fail. So both patches must be
>> applied and in the correct order to fix the problem of cache aliasing when
>> using BAR accesses on LLC systems.
>>
>> As above, I am working my way through the bunch of 'FAILED patch' emails.
>> The what-to-do instructions in those emails explicitly say to send the patch
>> individually in reply to the 'FAILED' message rather than as part of any
>> original series.
> So what commits exactly in Linus's tree should be in these stable
> branches? Sorry, I still do not understand if we are missing one or if
> we need to revert something.
>
> confused,
>
> greg k-h

As far as I can tell, I have replied to all the "FAILED: patch" emails now. There should be a versions of these two patches available for all trees (being 4.14, 4.19, 5.4, 5.10 and 5.15):

690e0ec8e63d drm/i915: Don't use stolen memory for ring buffers with LLC
85636167e320 drm/i915: Don't use BAR mappings for ring buffers with LLC

They should be applied in the order of 'stolen memory' first and 'BAR mappings' second.

Thanks,
John.

========================================================

It seems that only the 2nd patch made it to linux 5.4.237 thus causing the issue.

[EDIT] The first (missing) patch has been added to 5.4.238. |
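The dependency described in this note (the 'BAR mappings' patch is only safe when the 'stolen memory' patch is also present) can be expressed as a small check over a list of commit subjects. This is an illustrative Python sketch only: the function name `bar_patch_without_prerequisite` is hypothetical, and it assumes the caller has already collected the subjects of the commits in a given tree, for example from a changelog.

```python
# Illustrative sketch of the patch dependency quoted above.
# The two subject strings come from the quoted email; the function is hypothetical.

STOLEN_PATCH = "drm/i915: Don't use stolen memory for ring buffers with LLC"
BAR_PATCH = "drm/i915: Don't use BAR mappings for ring buffers with LLC"


def bar_patch_without_prerequisite(subjects) -> bool:
    """Return True if the BAR patch is present but the stolen-memory patch is not.

    Per the note above, this is the combination that ended up in 5.4.237.
    """
    present = set(subjects)
    return BAR_PATCH in present and STOLEN_PATCH not in present
```

Under these assumptions, a tree containing only the second patch is flagged, while a tree containing both patches, or neither, is not.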
|
Re el7 clones, in my case SL7. I installed the -2.bcat.el7.elrepo packages from note 0009076. Thanks, but the system locked up as before. Under 236-1, "dmesg | grep i915" shows Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0 |
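When comparing systems, the fields in the `dmesg` line above (driver version, driver date, PCI address, DRM minor) can be extracted with a regular expression. The sketch below is an assumption-laden illustration in Python: the pattern is modelled only on the single line quoted in this note, and the name `parse_i915_init` is hypothetical.

```python
import re

# Pattern modelled on the one dmesg line quoted in the note above;
# other kernels may format this message differently.
I915_INIT_RE = re.compile(
    r"Initialized i915 (?P<version>\d+\.\d+\.\d+) (?P<date>\d{8}) "
    r"for (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) "
    r"on minor (?P<minor>\d+)"
)


def parse_i915_init(line: str):
    """Return a dict of the fields in an i915 init message, or None if absent."""
    match = I915_INIT_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["minor"] = int(info["minor"])
    return info
```

Because `search` is used rather than `match`, any timestamp or `[drm]` prefix that `dmesg` places before the message is ignored.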
|
Unfortunately, installing kernel-lt-5.4.238-1.el7.elrepo.x86_64 has not solved this. Back with 236. |
|
@JPilk Hmm, that indicates your issue is not the same as the one being tracked here. |
|
I can verify the issue is fixed in 5.4.238-1.el8.elrepo.x86_64. Thank you. |
|
@thomas.w.simmons@nasa.gov Thank you for letting us know. @JPilk Now that it is confirmed that the original issue has been fixed, I'm going to close the ticket as resolved. Would you mind opening a new ticket? |
|
With the release of kernel-lt-5.4.238-1.el8.elrepo.x86_64, the problem has been resolved. |
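For administrators auditing several machines, the affected release can be identified from the kernel version string. The following Python sketch assumes, based solely on this ticket, that 5.4.237 is the only affected kernel-lt version; the names `kernel_version` and `is_affected` are hypothetical, and the check deliberately ignores the release suffix, so test rebuilds of 5.4.237 with the patch reverted would still be reported as affected.

```python
import re

# Matches strings such as "5.4.237-1.el8.elrepo.x86_64", with an optional
# "kernel-lt-" package-name prefix.
KERNEL_RE = re.compile(r"^(?:kernel-lt-)?(\d+)\.(\d+)\.(\d+)-")

# Assumption taken from this ticket: only 5.4.237 carries the regression.
AFFECTED_VERSION = (5, 4, 237)


def kernel_version(name: str):
    """Parse the upstream version triple from a kernel or package string."""
    match = KERNEL_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised kernel string: {name!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(name: str) -> bool:
    """Return True if the string names the kernel version reported here."""
    return kernel_version(name) == AFFECTED_VERSION
```

The input could be the output of `uname -r` or a package name taken from the package manager.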
Date Modified | Username | Field | Change |
---|---|---|---|
2023-03-19 14:30 | thomas.w.simmons@nasa.gov | New Issue | |
2023-03-19 14:30 | thomas.w.simmons@nasa.gov | Status | new => assigned |
2023-03-19 14:30 | thomas.w.simmons@nasa.gov | Assigned To | => burakkucat |
2023-03-19 14:34 | toracat | Note Added: 0009067 | |
2023-03-19 15:35 | burakkucat | Status | assigned => acknowledged |
2023-03-20 02:38 | toracat | Note Added: 0009073 | |
2023-03-20 02:38 | toracat | Status | acknowledged => feedback |
2023-03-20 12:55 | thomas.w.simmons@nasa.gov | Note Added: 0009074 | |
2023-03-20 12:55 | thomas.w.simmons@nasa.gov | Status | feedback => assigned |
2023-03-20 13:13 | burakkucat | Assigned To | burakkucat => toracat |
2023-03-20 13:25 | toracat | Note Added: 0009075 | |
2023-03-20 13:38 | burakkucat | Note Added: 0009076 | |
2023-03-20 13:40 | burakkucat | Note View State: 0009076: public | |
2023-03-20 13:43 | burakkucat | Note Edited: 0009076 | |
2023-03-20 15:22 | thomas.w.simmons@nasa.gov | Note Added: 0009077 | |
2023-03-20 15:26 | toracat | Note Added: 0009078 | |
2023-03-21 08:50 | burakkucat | Note Added: 0009079 | |
2023-03-21 08:51 | burakkucat | Note Edited: 0009079 | |
2023-03-21 12:40 | toracat | Note Added: 0009080 | |
2023-03-21 12:54 | toracat | Note Edited: 0009080 | |
2023-03-22 07:03 | JPilk | Note Added: 0009081 | |
2023-03-22 13:12 | JPilk | Note Added: 0009082 | |
2023-03-22 14:00 | toracat | Note Added: 0009083 | |
2023-03-22 17:32 | thomas.w.simmons@nasa.gov | Note Added: 0009088 | |
2023-03-22 17:44 | toracat | Note Added: 0009089 | |
2023-03-22 17:46 | toracat | Status | assigned => resolved |
2023-03-22 17:46 | toracat | Resolution | open => fixed |
2023-03-22 17:46 | toracat | Note Added: 0009090 |