View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001477 | channel: kernel/el9 | kernel-ml | public | 2024-08-27 13:40 | 2024-09-01 09:52 |
Reporter | iocc | Assigned To | toracat | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | assigned | Resolution | open | ||
Platform | x86_64 | OS | AlmaLinux | OS Version | 9.4 |
Summary | 0001477: igb clearing Tx timestamp hang loop | ||||
Description | Previously I was using 6.2.8 for over a year without any issues. I have pause set to true for both rx and tx in NetworkManager:
[root@anyhostname system-connections]# grep pause *
eth0.nmconnection:pause-autoneg=false
eth0.nmconnection:pause-rx=true
eth0.nmconnection:pause-tx=true
eth1.nmconnection:pause-autoneg=false
eth1.nmconnection:pause-rx=true
eth1.nmconnection:pause-tx=true
eth2.nmconnection:pause-autoneg=false
eth2.nmconnection:pause-rx=true
eth2.nmconnection:pause-tx=true
eth3.nmconnection:pause-autoneg=false
eth3.nmconnection:pause-rx=true
eth3.nmconnection:pause-tx=true
eth4.nmconnection:pause-autoneg=false
eth4.nmconnection:pause-rx=true
eth4.nmconnection:pause-tx=true
eth5.nmconnection:pause-autoneg=false
eth5.nmconnection:pause-rx=true
eth5.nmconnection:pause-tx=true
eth6.nmconnection:pause-autoneg=false
eth6.nmconnection:pause-rx=true
eth6.nmconnection:pause-tx=true
eth7.nmconnection:pause-autoneg=false
eth7.nmconnection:pause-rx=true
eth7.nmconnection:pause-tx=true
However, it didn't fully work on the tg3 card, so I have this in rc.local:
# flowcontrol eth4->eth7 rx on (NetworkManager doesnt quite work)
ethtool -A eth4 tx on
sleep 1
ethtool -A eth5 tx on
sleep 1
ethtool -A eth6 tx on
sleep 1
ethtool -A eth7 tx on
But 6.10.6 isn't very happy about tx:
[ 70.421611] igb 0000:05:00.2: clearing Tx timestamp hang
[ 72.469626] igb 0000:05:00.3: clearing Tx timestamp hang
[ 92.437869] igb 0000:05:00.2: clearing Tx timestamp hang
[ 100.438906] igb 0000:05:00.3: clearing Tx timestamp hang
(...)
[ 2380.458841] igb 0000:05:00.2: clearing Tx timestamp hang
[ 2398.506904] igb 0000:05:00.2: clearing Tx timestamp hang
[ 2412.459922] igb 0000:05:00.3: clearing Tx timestamp hang
[ 2414.507005] igb 0000:05:00.2: clearing Tx timestamp hang
(...)
[ 3608.495712] igb 0000:05:00.3: clearing Tx timestamp hang
[ 3610.479784] igb 0000:05:00.2: clearing Tx timestamp hang
[ 3624.496830] igb 0000:05:00.3: clearing Tx timestamp hang
[ 3642.480044] igb 0000:05:00.2: clearing Tx timestamp hang
But why is it only on 2 of 4 ethernet ports?
[ 45.688614] igb 0000:05:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 46.088046] igb 0000:05:00.1 eth5: igb: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 46.875900] igb 0000:05:00.3 eth7: igb: eth7 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 46.911638] igb 0000:05:00.2 eth6: igb: eth6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
FYI tg3 ports:
[ 44.987584] tg3 0000:02:00.0 eth0: Link is up at 1000 Mbps, full duplex
[ 44.987954] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[ 44.988255] tg3 0000:02:00.0 eth0: EEE is disabled
[ 45.210848] tg3 0000:02:00.1 eth1: Link is up at 1000 Mbps, full duplex
[ 45.211215] tg3 0000:02:00.1 eth1: Flow control is on for TX and on for RX
[ 45.211517] tg3 0000:02:00.1 eth1: EEE is disabled
[ 45.508038] tg3 0000:02:00.2 eth3: Link is up at 1000 Mbps, full duplex
[ 45.508411] tg3 0000:02:00.2 eth3: Flow control is on for TX and on for RX
[ 45.508723] tg3 0000:02:00.2 eth3: EEE is disabled
[ 45.848260] tg3 0000:02:00.3 eth4: Link is up at 1000 Mbps, full duplex
[ 45.848634] tg3 0000:02:00.3 eth4: Flow control is on for TX and on for RX
[ 45.849574] tg3 0000:02:00.3 eth4: EEE is disabled
Also, the interfaces come up in a quite strange order. The kernel boot arguments related to this are: net.ifnames=0 biosdevname=0 nopat | ||||
Tags | 6.10.6, igb | ||||
|
To see if the fix is already in the queue for the next version of kernel-ml, can you do a test-install of kernel-ml-6.11.0-0.rc5.el9.elrepo.x86_64.rpm? The kernel set is available from: https://elrepo.org/people/toracat/devel/kernel-ml/el9/x86_64/RPMS/ |
|
The problem still exists, but MUCH less.
[ 74.847542] igb 0000:05:00.2: clearing Tx timestamp hang
[ 77.854515] igb 0000:05:00.3: clearing Tx timestamp hang
[ 96.863688] igb 0000:05:00.2: clearing Tx timestamp hang
[ 105.886727] igb 0000:05:00.3: clearing Tx timestamp hang
[ 114.847854] igb 0000:05:00.2: clearing Tx timestamp hang
[ 136.862950] igb 0000:05:00.2: clearing Tx timestamp hang
[ 137.887960] igb 0000:05:00.3: clearing Tx timestamp hang
[ 152.864067] igb 0000:05:00.2: clearing Tx timestamp hang
[ 153.887075] igb 0000:05:00.3: clearing Tx timestamp hang
[ 169.887204] igb 0000:05:00.3: clearing Tx timestamp hang
[ 170.848271] igb 0000:05:00.2: clearing Tx timestamp hang
[ 185.887374] igb 0000:05:00.3: clearing Tx timestamp hang
[ 186.847375] igb 0000:05:00.2: clearing Tx timestamp hang
[ 202.847497] igb 0000:05:00.2: clearing Tx timestamp hang
[ 203.871519] igb 0000:05:00.3: clearing Tx timestamp hang
[ 218.847612] igb 0000:05:00.2: clearing Tx timestamp hang
[ 234.847743] igb 0000:05:00.2: clearing Tx timestamp hang
[ 235.872770] igb 0000:05:00.3: clearing Tx timestamp hang
[ 250.848845] igb 0000:05:00.2: clearing Tx timestamp hang
[ 266.848002] igb 0000:05:00.2: clearing Tx timestamp hang
[ 267.871990] igb 0000:05:00.3: clearing Tx timestamp hang
[ 282.848082] igb 0000:05:00.2: clearing Tx timestamp hang
[ 298.848232] igb 0000:05:00.2: clearing Tx timestamp hang
[ 299.872228] igb 0000:05:00.3: clearing Tx timestamp hang
[ 314.848358] igb 0000:05:00.2: clearing Tx timestamp hang
[ 330.848449] igb 0000:05:00.2: clearing Tx timestamp hang
[ 331.873451] igb 0000:05:00.3: clearing Tx timestamp hang
[ 347.872588] igb 0000:05:00.3: clearing Tx timestamp hang
[ 348.832532] igb 0000:05:00.2: clearing Tx timestamp hang
[ 363.873701] igb 0000:05:00.3: clearing Tx timestamp hang
[ 364.832716] igb 0000:05:00.2: clearing Tx timestamp hang
[ 379.872822] igb 0000:05:00.3: clearing Tx timestamp hang
[ 382.880838] igb 0000:05:00.2: clearing Tx timestamp hang
[ 395.872926] igb 0000:05:00.3: clearing Tx timestamp hang
[ 412.833003] igb 0000:05:00.2: clearing Tx timestamp hang
[ 413.857014] igb 0000:05:00.3: clearing Tx timestamp hang
[ 430.881236] igb 0000:05:00.2: clearing Tx timestamp hang
[ 445.857271] igb 0000:05:00.3: clearing Tx timestamp hang
[ 460.833362] igb 0000:05:00.2: clearing Tx timestamp hang
[ 477.857491] igb 0000:05:00.3: clearing Tx timestamp hang
[ 478.881596] igb 0000:05:00.2: clearing Tx timestamp hang
[ 493.857616] igb 0000:05:00.3: clearing Tx timestamp hang
[ 494.881715] igb 0000:05:00.2: clearing Tx timestamp hang
[ 509.857756] igb 0000:05:00.3: clearing Tx timestamp hang
[ 510.882791] igb 0000:05:00.2: clearing Tx timestamp hang
[ 525.857893] igb 0000:05:00.3: clearing Tx timestamp hang
[ 526.883020] igb 0000:05:00.2: clearing Tx timestamp hang
[ 541.858966] igb 0000:05:00.3: clearing Tx timestamp hang
[ 544.865996] igb 0000:05:00.2: clearing Tx timestamp hang
[ 557.859083] igb 0000:05:00.3: clearing Tx timestamp hang
[ 573.858237] igb 0000:05:00.3: clearing Tx timestamp hang
[ 574.882314] igb 0000:05:00.2: clearing Tx timestamp hang
[ 589.858321] igb 0000:05:00.3: clearing Tx timestamp hang
[ 592.866341] igb 0000:05:00.2: clearing Tx timestamp hang
[ 607.842513] igb 0000:05:00.3: clearing Tx timestamp hang
[ 622.882620] igb 0000:05:00.2: clearing Tx timestamp hang
[ 638.882762] igb 0000:05:00.2: clearing Tx timestamp hang
[ 639.842743] igb 0000:05:00.3: clearing Tx timestamp hang
[ 654.882903] igb 0000:05:00.2: clearing Tx timestamp hang
[ 655.842901] igb 0000:05:00.3: clearing Tx timestamp hang
[ 670.882991] igb 0000:05:00.2: clearing Tx timestamp hang
[ 671.843011] igb 0000:05:00.3: clearing Tx timestamp hang
[ 686.883098] igb 0000:05:00.2: clearing Tx timestamp hang
[ 687.843104] igb 0000:05:00.3: clearing Tx timestamp hang
[ 702.883244] igb 0000:05:00.2: clearing Tx timestamp hang
[ 703.843230] igb 0000:05:00.3: clearing Tx timestamp hang
And on this reboot the order of the interfaces is like this:
[root@host ~]# ether-today.sh
[ 43.043958] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[ 43.225144] tg3 0000:02:00.1 eth1: Flow control is on for TX and on for RX
[ 43.754791] tg3 0000:02:00.2 eth3: Flow control is on for TX and on for RX
[ 43.845568] igb 0000:05:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 43.920034] tg3 0000:02:00.3 eth4: Flow control is on for TX and on for RX
[ 44.096569] igb 0000:05:00.1 eth5: igb: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 44.367864] igb 0000:05:00.2 eth6: igb: eth6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 45.131562] igb 0000:05:00.3 eth7: igb: eth7 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[root@host ~]# |
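To compare kernels on more than an impression, the hang messages can be tallied per PCI function. The following is only a minimal sketch: it assumes dmesg-style input with one message per line, and the function name `count_tx_hangs` is made up for illustration (it is not part of igb, ethtool, or any existing tool).

```shell
# Hypothetical helper: count igb "clearing Tx timestamp hang" messages
# per PCI function. Reads dmesg-style text on stdin, one message per line.
count_tx_hangs() {
    awk '
        /clearing Tx timestamp hang/ {
            # The device address is the field right after the driver name,
            # wherever the timestamp prefix happens to end.
            for (i = 1; i < NF; i++) {
                if ($i == "igb") {
                    dev = $(i + 1)
                    sub(/:$/, "", dev)   # drop the trailing colon
                    count[dev]++
                    break
                }
            }
        }
        END {
            for (dev in count) print dev, count[dev]
        }
    ' | sort
}

# Possible usage on the affected host (needs permission to read the kernel log):
#   dmesg | count_tx_hangs
```

Dividing each count by the uptime covered by the log gives a rate that can be compared between 6.10.6, 6.11.0-rc5 and a patched build.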
|
A patch that seems to be related to the issue reported here is on the way to the mainline kernel. I've rebuilt kernel-ml by applying this patch and made the set available here: https://elrepo.org/people/toracat/devel/bug1477/ Can you give this kernel a try? |
|
Can try tomorrow, downtime isn't great right now.
btw, can elrepo use a mailserver that doesn't have 63 spam hits?
Aug 30 00:36:05 mail postfix/smtpd[5976]: NOQUEUE: reject: RCPT from omta037.useast.a.cloudfilter.net[44.202.169.36]: 554 5.7.1 Service unavailable; Client host [44.202.169.36] blocked using spam.dnsbl.anonmails.de; Spam received on 2024-08-06 23:16:35. See https://anonmails.de/dnsbl.php?ip=44.202.169.36 Spam hits: 63; from=<bugtracker@elrepo.com> to=<x@y.z> proto=ESMTP helo=<omta037.useast.a.cloudfilter.net> |
|
@iocc
> btw, can elrepo use a mailserver that doesnt have 63 spam hits?
> Aug 30 00:36:05 mail postfix/smtpd[5976]: NOQUEUE: reject: RCPT from omta037.useast.a.cloudfilter.net[44.202.169.36]: 554 5.7.1 Service unavailable; Client host [44.202.169.36] blocked using spam.dnsbl.anonmails.de; Spam received on 2024-08-06 23:16:35. See https://anonmails.de/dnsbl.php?ip=44.202.169.36 Spam hits: 63; from=<bugtracker@elrepo.com> to=<x@y.z> proto=ESMTP helo=<omta037.useast.a.cloudfilter.net>
Apologies, we have no control over the sending mail server that our hosting company uses to send out email notifications from this bug tracker, nor do we have any control over the blacklists your mail host chooses to use. You should report this as a false positive to your mail provider. Can you ask your mail provider to not use blacklists that block legitimate mail (or at least use them in a scoring system such as SpamAssassin), and/or ask them to whitelist the sending elrepo.com domain to ensure your mail is not blocked? |
|
@toracat I tried, but it was impossible to type the correct LUKS password over iLO. It made up chars and removed chars. I gave up. So I guess it failed and made things worse. And today's bingo of ethernet numbering is:
[ 52.932550] tg3 0000:02:00.1 eth1: Flow control is on for TX and on for RX
[ 52.997915] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[ 53.323062] tg3 0000:02:00.2 eth2: Flow control is on for TX and on for RX
[ 53.484671] igb 0000:05:00.0 eth3: igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 53.558889] tg3 0000:02:00.3 eth4: Flow control is on for TX and on for RX
[ 53.819620] igb 0000:05:00.1 eth5: igb: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 53.851636] igb 0000:05:00.2 eth6: igb: eth6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 54.746609] igb 0000:05:00.3 eth7: igb: eth7 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX |
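Because the ethN names move between boots while the PCI addresses stay put, it can help to print the current name-to-device mapping straight from sysfs before comparing logs. A minimal sketch, assuming a Linux sysfs layout under /sys/class/net; the function name `list_nic_devices` and its optional directory argument are invented for illustration and exist only so the function can be tried against a test tree.

```shell
# Hypothetical helper: print "<interface> <bus address> <driver>" for every
# network interface that is backed by a device.
# $1 (optional): sysfs net directory, defaults to /sys/class/net.
list_nic_devices() (
    netdir=${1:-/sys/class/net}
    for path in "$netdir"/*; do
        # Virtual interfaces such as lo have no "device" link; skip them.
        [ -e "$path/device" ] || continue
        addr=$(basename "$(readlink -f "$path/device")")
        if [ -e "$path/device/driver" ]; then
            driver=$(basename "$(readlink -f "$path/device/driver")")
        else
            driver="-"
        fi
        echo "$(basename "$path") $addr $driver"
    done
)

# Possible usage on the affected host:
#   list_nic_devices
```

With the mapping in hand, the igb messages (which only carry the PCI address) can be matched to whatever ethN name the port received on that boot.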
|
@iocc Sorry to hear the patch did not work. You may want to update the kernel to 6.10.7. I see at least one patch that is related to igb. |
|
@pperry I understand it's not your fault. It has been like this for a long time. Usually things like this fix themselves, but in this case there wasn't any improvement, so I just wanted to inform you about it in case you were not aware of it. You could tell your hosting company that some of the relays they use are blocked in a few lists:
https://multirbl.valli.org/lookup/44.202.169.34.html
https://multirbl.valli.org/lookup/44.202.169.36.html
https://multirbl.valli.org/lookup/35.89.44.38.html
I have whitelisted elrepo.com in my mail server so it's "solved" for me now. |
|
@toracat Yeah. But thanks anyway, worth a try. Yep, will try when I can download 6.10.7. |
|
@iocc - thanks for the heads up re spam block list. We are aware it's been an ongoing issue on and off for a long time - par for the course for shared hosting unfortunately, and it doesn't appear to have improved since the host farmed out mail delivery to cloudfilter.net. We try to keep the SPF record up to date (although the sending IPs do change from time to time), which you can use to help with whitelisting.
@whn: looks like we need to add ip4:35.89.44.32/27 to the SPF record of elrepo.com |
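As a quick cross-check of which relay addresses named in this thread the proposed range would cover, the CIDR arithmetic can be sketched in shell. This is only an illustration under stated assumptions: the helper names `ip_to_int` and `in_cidr` are made up, only plain dotted-quad IPv4 without leading zeros is handled, and nothing here queries the actual SPF record.

```shell
# Hypothetical helpers for IPv4 CIDR membership checks.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# in_cidr ADDRESS NETWORK/PREFIX: succeeds if ADDRESS lies inside the range.
in_cidr() (
    ip=$1
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Relay addresses mentioned in the notes above.
for ip in 44.202.169.34 44.202.169.36 35.89.44.38; do
    if in_cidr "$ip" 35.89.44.32/27; then
        echo "$ip covered"
    else
        echo "$ip not covered"
    fi
done
```

A /27 starting at 35.89.44.32 spans 35.89.44.32 to 35.89.44.63, so of the three addresses only 35.89.44.38 falls inside it; whether the 44.202.169.x relays are already authorized elsewhere in the record is not stated in this thread.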
Date Modified | Username | Field | Change |
---|---|---|---|
2024-08-27 13:40 | iocc | New Issue | |
2024-08-27 13:40 | iocc | Status | new => assigned |
2024-08-27 13:40 | iocc | Assigned To | => toracat |
2024-08-27 13:40 | iocc | Tag Attached: 6.10.6 | |
2024-08-27 13:40 | iocc | Tag Attached: igb | |
2024-08-27 14:23 | toracat | Note Added: 0010063 | |
2024-08-27 15:11 | iocc | Note Added: 0010064 | |
2024-08-28 23:45 | toracat | Note Added: 0010066 | |
2024-08-29 18:35 | toracat | Status | assigned => feedback |
2024-08-30 16:17 | iocc | Note Added: 0010072 | |
2024-08-30 16:17 | iocc | Status | feedback => assigned |
2024-08-31 10:57 | pperry | Note Added: 0010075 | |
2024-08-31 10:59 | pperry | Note Edited: 0010075 | |
2024-08-31 15:38 | iocc | Note Added: 0010076 | |
2024-08-31 15:45 | toracat | Note Added: 0010077 | |
2024-08-31 16:53 | iocc | Note Added: 0010078 | |
2024-08-31 16:57 | iocc | Note Added: 0010079 | |
2024-09-01 08:15 | pperry | Note Added: 0010081 |