View Issue Details

ID: 0001365
Project: channel: kernel/el7
Category: kernel-ml
View Status: public
Last Update: 2023-07-12 14:24
Reporter: OnnoZweers
Assigned To: toracat
Priority: normal
Severity: major
Reproducibility: random
Status: assigned
Resolution: open
Platform: Dell PowerEdge
OS: CentOS
OS Version: 7 & 8
Summary: 0001365: Kernel 6.4.0-1.el7.elrepo.x86_64 unstable, freezes & IPv6 issues
Description: We have been running 6.4.0-1.el7.elrepo.x86_64 for two weeks on a cluster of about 250 dual-stack (IPv4 & IPv6) nodes. Since then we have had so many network issues that our service was basically out of production. At random, a node can no longer ping6 certain other nodes, while IPv4 ping to those same nodes still works and ping6 to the remaining nodes also still works. Sometimes the network stops working entirely until the node is rebooted. When we boot the previous kernel we had, 6.2.6, the issues disappear. We see this on both Intel and Mellanox network cards. All our servers are Dell PowerEdge machines.
Additionally, two nodes froze during this period. There was no logging and nothing on the console besides a frozen login prompt. They simply stopped responding. After a reboot they worked again.
All this made us scream and run back to the previous kernel we used, 6.2.6.
I'm afraid I'm too busy to help troubleshoot this issue; I just wanted to let people know. Stay away from 6.4.0-1.el7.elrepo.x86_64.
Steps To Reproduce: Run 6.4.0-1.el7.elrepo.x86_64 on a large cluster with IPv6.
Tags: No tags attached.
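A minimal sketch for spotting the asymmetric failure described above (ping6 to a peer fails while IPv4 ping to the same peer still works), assuming dual-stack peer hostnames that resolve to both A and AAAA records, and iputils-style `ping` and `ping6` commands that accept `-c` (count) and `-W` (timeout in seconds). The helpers `probe`, `classify` and `check_peers` are hypothetical, not part of ELRepo or kernel tooling.

```python
"""Hypothetical dual-stack reachability check (a sketch, not ELRepo tooling)."""
import subprocess
import sys
from typing import Callable, Dict, Iterable


# Assumption: iputils-style `ping` (IPv4) and `ping6` (IPv6), where
# -c sets the number of echo requests and -W the reply timeout in seconds.
def probe(host: str, ipv6: bool, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request; return True if a reply came back."""
    cmd = ["ping6" if ipv6 else "ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def classify(v4_ok: bool, v6_ok: bool) -> str:
    """Map the pair of probe results to a symptom label."""
    if v4_ok and v6_ok:
        return "ok"
    if v4_ok:
        return "ipv6-only-failure"  # the pattern reported in this issue
    if v6_ok:
        return "ipv4-only-failure"
    return "unreachable"  # e.g. the "network stops working entirely" case


def check_peers(
    peers: Iterable[str],
    probe_fn: Callable[[str, bool], bool] = probe,
) -> Dict[str, str]:
    """Probe every peer over IPv4 and IPv6 and label the result per host."""
    return {host: classify(probe_fn(host, False), probe_fn(host, True)) for host in peers}


if __name__ == "__main__":
    # Print only the peers that are not fully reachable over both stacks.
    for host, status in check_peers(sys.argv[1:]).items():
        if status != "ok":
            print(f"{host}: {status}")
```

Run periodically from each node (for example `python3 check_dualstack.py node001 node002`, with a hypothetical file name), it prints only peers whose status is not "ok". Passing `probe_fn` as a parameter keeps the classification logic testable without touching the network.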

Activities

OnnoZweers

2023-07-12 08:59

reporter   ~0009270

Oh, by the way, the reason we started using the ML kernels was performance. We have 10 Gbit/s and 25 Gbit/s interfaces and we need to get the maximum network performance out of them. The CentOS stock kernels did not provide that.

toracat

2023-07-12 13:24

administrator   ~0009271

@OnnoZweers

As noted in our announcement mail as well as on our website: "If a bug is found when using these kernels, the end user is encouraged to report it upstream to the Linux Kernel Bug Tracker (http://bugzilla.kernel.org/)."

We do not modify the source code. We can only handle issues associated with the packaging process.

OnnoZweers

2023-07-12 14:24

reporter   ~0009272

@toracat Thanks, I will probably do that.

Issue History

Date Modified | Username | Field | Change
2023-07-12 08:46 | OnnoZweers | New Issue |
2023-07-12 08:46 | OnnoZweers | Status | new => assigned
2023-07-12 08:46 | OnnoZweers | Assigned To | => toracat
2023-07-12 08:59 | OnnoZweers | Note Added: 0009270 |
2023-07-12 13:24 | toracat | Note Added: 0009271 |
2023-07-12 14:24 | OnnoZweers | Note Added: 0009272 |