View Issue Details

ID: 0001334
Project: channel: kernel/el8
Category: kernel-lt
View Status: public
Last Update: 2023-03-22 17:46
Reporter: thomas.w.simmons@nasa.gov
Assigned To: toracat
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Summary: 0001334: 5.4.237-1.el8.elrepo.x86_64 locks up with black screen during boot
Description: I have three el8 servers that fail to boot the latest kernel (5.4.237-1.el8.elrepo.x86_64). These systems booted fine, and continue to boot fine, with the previous kernel (5.4.236-1.el8.elrepo.x86_64). All systems exhibit the same behavior: the system starts to boot, but at the point where the resolution normally changes, it instead displays a black screen and locks up. At this point the system responds neither to the keyboard nor to an ACPI shutdown via the power button, and must be hard reset.

Two of these servers have IvyBridge CPUs and one has a Haswell CPU. All use Intel i915 graphics. I found that passing nomodeset or i915.modeset=0 as a kernel parameter allows the system to boot with the new kernel. I also found that if I pass vga=1440x900x16 as a kernel parameter, the system still hangs, not at a black screen but immediately after the output "fb0: switching to inteldrmfb from simple".
Steps To Reproduce: Upgrade kernel to 5.4.237-1.el8.elrepo.x86_64 and reboot.
Tags: No tags attached.
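
When testing the workaround from the description, it can help to confirm that the running kernel was actually booted with one of the two parameters. The following is a minimal sketch, not part of the original report; the function name active_workarounds is made up for illustration, and it only looks for the two parameters the reporter named.

```python
from typing import Set

# The two kernel parameters the reporter found to avoid the hang.
WORKAROUND_PARAMS = {"nomodeset", "i915.modeset=0"}


def active_workarounds(cmdline: str) -> Set[str]:
    """Return the workaround parameters present in a kernel command line.

    `cmdline` is a whitespace-separated parameter string, such as the
    contents of /proc/cmdline on a running Linux system.
    """
    return WORKAROUND_PARAMS & set(cmdline.split())
```

On an affected machine this could be called as active_workarounds(open("/proc/cmdline").read()); an empty set means neither parameter was passed at boot.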

Activities

toracat

2023-03-19 14:34

administrator   ~0009067

Related:

http://lists.elrepo.org/pipermail/elrepo/2023-March/006429.html

toracat

2023-03-20 02:38

administrator   ~0009073

@thomas.w.simmons@nasa.gov

In the changelog of linux 5.4.237, there is a patch [1]:

"drm/i915: Don't use BAR mappings for ring buffers with LLC"

To test if this is causing the current issue, I've reverted it and rebuilt a kernel-lt set. It is available here:

https://toracat.org/test/kernel/bug1334/

Could you give it a try and see if it fixes the problem? Please note that the packages are for testing purposes only and are not signed.

[1] https://www.spinics.net/lists/stable/msg637749.html

thomas.w.simmons@nasa.gov

2023-03-20 12:55

reporter   ~0009074

Thank you. I have verified the provided packages fix the lock-up issue I was experiencing.

toracat

2023-03-20 13:25

administrator   ~0009075

@thomas.w.simmons@nasa.gov

That is great news. Thank you for reporting back.

Now we need to report this upstream (kernel.org). One option is to use bugzilla.kernel.org; another is to write to the stable mailing list.

burakkucat

2023-03-20 13:38

administrator   ~0009076

Last edited: 2023-03-20 13:43

For those experiencing this problem on el7 (RHEL7 / CentOS7 / clone) systems, a rebuilt kernel-lt package set has been created with the identified patch reverted.

The package set can be identified by the -2.bcat.el7.elrepo section of its name. It is available from the following location:

https://elrepo.org/people/ajb/tmp/

Testing would be appreciated.

@JPilk would you be able to assist?

thomas.w.simmons@nasa.gov

2023-03-20 15:22

reporter   ~0009077

Upstream bug report has been submitted:
https://bugzilla.kernel.org/show_bug.cgi?id=217222

toracat

2023-03-20 15:26

administrator   ~0009078

@thomas.w.simmons@nasa.gov

Thank you. I'm on the CC list now.

burakkucat

2023-03-21 08:50

administrator   ~0009079

Last edited: 2023-03-21 08:51

See https://lkml.org/lkml/2023/3/21/68 and https://lkml.org/lkml/2023/3/21/202

[quote]
Date Tue, 21 Mar 2023 08:42:29 +0100
From Greg KH <>
Subject Re: PROBLEM: Linux 5.4.237 i915 driver crashes on boot (-longterm regression)

On Tue, Mar 21, 2023 at 01:36:05AM -0400, Nick Bowler wrote:
> Hi,
>
> Linux 5.4.237 crashes immediately on my machine every time when the i915
> driver is loaded, with an error like the one below. Previous versions are OK.
>
> I bisected it to commit 1aed78cfda7f ("drm/i915: Don't use BAR mappings
> for ring buffers with LLC"). I can revert this on top of 5.4.237 and
> this seems sufficient to make the machine boot and work again.
>
> Let me know if you need any more info.

This should be fixed in the 5.4.238-rc1 release that is out for testing
right now, if not, please let me know.

thanks,

greg k-h
[/quote]

toracat

2023-03-21 12:40

administrator   ~0009080

Last edited: 2023-03-21 12:54

As far as I can see, what happened was some "miscommunication" between the patch submitter and GKH:

Subject: Re: [PATCH 5.4.y] drm/i915: Don't use BAR mappings for ring buffers with LLC
From: John Harrison <john.c.harrison@intel.com>

(snip)
>> The original patch series was two patches -
>> https://patchwork.freedesktop.org/series/114080/. One to not use stolen
>> memory and the other to not use BAR mappings. If the anti-BAR patch is
>> applied without the anti-stolen patch then the i915 driver will attempt to
>> access stolen memory directly which will fail. So both patches must be
>> applied and in the correct order to fix the problem of cache aliasing when
>> using BAR accesses on LLC systems.
>>
>> As above, I am working my way through the bunch of 'FAILED patch' emails.
>> The what-to-do instructions in those emails explicitly say to send the patch
>> individually in reply to the 'FAILED' message rather than as part of any
>> original series.
> So what commits exactly in Linus's tree should be in these stable
> branches? Sorry, I still do not understand if we are missing one or if
> we need to revert something.
>
> confused,
>
> greg k-h

As far as I can tell, I have replied to all the "FAILED: patch" emails
now. There should be a versions of these two patches available for all
trees (being 4.14, 4.19, 5.4, 5.10 and 5.15):
     690e0ec8e63d drm/i915: Don't use stolen memory for ring buffers with LLC
     85636167e320 drm/i915: Don't use BAR mappings for ring buffers with LLC

They should be applied in the order of 'stolen memory' first and 'BAR
mappings' second.

Thanks,
John.
========================================================

It seems that only the second patch ('BAR mappings') made it into Linux 5.4.237, without the first ('stolen memory') patch it depends on, thus causing the issue.

[EDIT] The first (missing) patch has been added to 5.4.238.
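
The ordering requirement John Harrison describes can be written down as a small check. This is only a sketch: it assumes the subjects of the patches applied to a stable release are available as a list of strings in application order (how to obtain that list is not covered here), and the names PREREQUISITES and unmet_prerequisites are illustrative, not taken from any kernel tooling.

```python
from typing import Dict, List

# Patch subjects as quoted in John Harrison's message.
STOLEN = "drm/i915: Don't use stolen memory for ring buffers with LLC"
BAR = "drm/i915: Don't use BAR mappings for ring buffers with LLC"

# Per that message, 'stolen memory' must be applied before 'BAR mappings'.
PREREQUISITES: Dict[str, str] = {BAR: STOLEN}


def unmet_prerequisites(applied: List[str]) -> List[str]:
    """Return patches whose prerequisite is missing or was applied later.

    `applied` lists patch subjects in the order they were applied.
    """
    problems = []
    for index, subject in enumerate(applied):
        required = PREREQUISITES.get(subject)
        # The prerequisite must appear somewhere before this patch.
        if required is not None and required not in applied[:index]:
            problems.append(subject)
    return problems
```

With only the 'BAR mappings' patch in the list, which is the situation inferred above for 5.4.237, the function reports that patch; with both patches in the stated order it returns an empty list.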

JPilk

2023-03-22 07:03

reporter   ~0009081

Re el7 clones, in my case SL7. I installed the -2.bcat.el7.elrepo packages from note 0009076. Thanks, but the system locked up as before.

Under 236-1, "dmesg | grep i915" shows Initialized i915 1.6.0 20190822 for 0000:00:02.0 on minor 0

JPilk

2023-03-22 13:12

reporter   ~0009082

Unfortunately, installing kernel-lt-5.4.238-1.el7.elrepo.x86_64 has not solved this. Back with 236.

toracat

2023-03-22 14:00

administrator   ~0009083

@JPilk

Hmm, that indicates your issue is not the same as the one being tracked here.

thomas.w.simmons@nasa.gov

2023-03-22 17:32

reporter   ~0009088

I can verify the issue is fixed in 5.4.238-1.el8.elrepo.x86_64. Thank you.

toracat

2023-03-22 17:44

administrator   ~0009089

@thomas.w.simmons@nasa.gov

Thank you for letting us know.

@JPilk

Now that it is confirmed that the original issue has been fixed, I'm going to close the ticket as resolved. Would you mind opening a new ticket?

toracat

2023-03-22 17:46

administrator   ~0009090

With the release of kernel-lt-5.4.238-1.el8.elrepo.x86_64, the problem has been resolved.
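
For anyone auditing several machines, the outcome of this ticket can be turned into a simple check on the kernel release string. This is a hedged sketch that covers only the ELRepo kernel-lt builds discussed here (5.4.237-1 reported as failing, 5.4.236-1 and 5.4.238-1 reported as booting); the function names are illustrative, and rebuilt test packages with other release suffixes are deliberately not matched.

```python
import platform

# Release prefix of the kernel-lt build reported as failing in this ticket.
AFFECTED_PREFIX = "5.4.237-1."


def is_affected(release: str) -> bool:
    """Return True if a `uname -r` style string names the failing build."""
    return release.startswith(AFFECTED_PREFIX)


def running_kernel_affected() -> bool:
    """Apply the same check to the kernel this script is running on."""
    return is_affected(platform.release())
```

Note that this says nothing about other causes of a hang on i915 systems, such as the separate el7 problem reported in notes 0009081 and 0009082.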

Issue History

Date Modified Username Field Change
2023-03-19 14:30 thomas.w.simmons@nasa.gov New Issue
2023-03-19 14:30 thomas.w.simmons@nasa.gov Status new => assigned
2023-03-19 14:30 thomas.w.simmons@nasa.gov Assigned To => burakkucat
2023-03-19 14:34 toracat Note Added: 0009067
2023-03-19 15:35 burakkucat Status assigned => acknowledged
2023-03-20 02:38 toracat Note Added: 0009073
2023-03-20 02:38 toracat Status acknowledged => feedback
2023-03-20 12:55 thomas.w.simmons@nasa.gov Note Added: 0009074
2023-03-20 12:55 thomas.w.simmons@nasa.gov Status feedback => assigned
2023-03-20 13:13 burakkucat Assigned To burakkucat => toracat
2023-03-20 13:25 toracat Note Added: 0009075
2023-03-20 13:38 burakkucat Note Added: 0009076
2023-03-20 13:40 burakkucat Note View State: 0009076: public
2023-03-20 13:43 burakkucat Note Edited: 0009076
2023-03-20 15:22 thomas.w.simmons@nasa.gov Note Added: 0009077
2023-03-20 15:26 toracat Note Added: 0009078
2023-03-21 08:50 burakkucat Note Added: 0009079
2023-03-21 08:51 burakkucat Note Edited: 0009079
2023-03-21 12:40 toracat Note Added: 0009080
2023-03-21 12:54 toracat Note Edited: 0009080
2023-03-22 07:03 JPilk Note Added: 0009081
2023-03-22 13:12 JPilk Note Added: 0009082
2023-03-22 14:00 toracat Note Added: 0009083
2023-03-22 17:32 thomas.w.simmons@nasa.gov Note Added: 0009088
2023-03-22 17:44 toracat Note Added: 0009089
2023-03-22 17:46 toracat Status assigned => resolved
2023-03-22 17:46 toracat Resolution open => fixed
2023-03-22 17:46 toracat Note Added: 0009090