View Issue Details

ID: 0001477
Project: channel: kernel/el9
Category: kernel-ml
View Status: public
Last Update: 2024-09-01 09:52
Reporter: iocc
Assigned To: toracat
Priority: normal
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
Platform: x86_64
OS: AlmaLinux
OS Version: 9.4
Summary: 0001477: igb clearing Tx timestamp hang loop
Description: Previously I was using 6.2.8 for over a year without any issues.

I have pause enabled for both rx and tx in NetworkManager.

[root@anyhostname system-connections]# grep pause *
eth0.nmconnection:pause-autoneg=false
eth0.nmconnection:pause-rx=true
eth0.nmconnection:pause-tx=true
eth1.nmconnection:pause-autoneg=false
eth1.nmconnection:pause-rx=true
eth1.nmconnection:pause-tx=true
eth2.nmconnection:pause-autoneg=false
eth2.nmconnection:pause-rx=true
eth2.nmconnection:pause-tx=true
eth3.nmconnection:pause-autoneg=false
eth3.nmconnection:pause-rx=true
eth3.nmconnection:pause-tx=true
eth4.nmconnection:pause-autoneg=false
eth4.nmconnection:pause-rx=true
eth4.nmconnection:pause-tx=true
eth5.nmconnection:pause-autoneg=false
eth5.nmconnection:pause-rx=true
eth5.nmconnection:pause-tx=true
eth6.nmconnection:pause-autoneg=false
eth6.nmconnection:pause-rx=true
eth6.nmconnection:pause-tx=true
eth7.nmconnection:pause-autoneg=false
eth7.nmconnection:pause-rx=true
eth7.nmconnection:pause-tx=true

However, it didn't fully work on the tg3 card, so I have this in rc.local:

# flow control eth4->eth7 tx on (NetworkManager doesn't quite work)
ethtool -A eth4 tx on
sleep 1
ethtool -A eth5 tx on
sleep 1
ethtool -A eth6 tx on
sleep 1
ethtool -A eth7 tx on
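
The same rc.local sequence can be written as a loop. This is only a sketch, assuming the interface names eth4 to eth7 from above; the `set_tx_pause` function name and the `DRY_RUN` variable are hypothetical additions for illustration, so the commands can be printed and checked before they touch the NICs:

```shell
#!/bin/sh
# Enable TX pause frames on each listed interface, pausing one second
# after each call. With DRY_RUN=1 (hypothetical helper variable) the
# commands are only printed, not executed.
set_tx_pause() {
    for ifc in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ethtool -A $ifc tx on"
        else
            ethtool -A "$ifc" tx on || echo "failed to set tx pause on $ifc" >&2
            sleep 1
        fi
    done
}

# Print what would be run for the four ports from the rc.local snippet.
DRY_RUN=1 set_tx_pause eth4 eth5 eth6 eth7
```

Dropping `DRY_RUN=1` from the last line applies the settings for real; a failed `ethtool` call is then reported on stderr instead of passing silently.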

But 6.10.6 isn't very happy about tx.

[ 70.421611] igb 0000:05:00.2: clearing Tx timestamp hang
[ 72.469626] igb 0000:05:00.3: clearing Tx timestamp hang
[ 92.437869] igb 0000:05:00.2: clearing Tx timestamp hang
[ 100.438906] igb 0000:05:00.3: clearing Tx timestamp hang
(...)
[ 2380.458841] igb 0000:05:00.2: clearing Tx timestamp hang
[ 2398.506904] igb 0000:05:00.2: clearing Tx timestamp hang
[ 2412.459922] igb 0000:05:00.3: clearing Tx timestamp hang
[ 2414.507005] igb 0000:05:00.2: clearing Tx timestamp hang
(...)
[ 3608.495712] igb 0000:05:00.3: clearing Tx timestamp hang
[ 3610.479784] igb 0000:05:00.2: clearing Tx timestamp hang
[ 3624.496830] igb 0000:05:00.3: clearing Tx timestamp hang
[ 3642.480044] igb 0000:05:00.2: clearing Tx timestamp hang
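
To compare kernels on more than a visual impression, the messages can be counted per PCI function. A minimal sketch, assuming the log lines look like the ones above; `count_tx_hangs` is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# Count "clearing Tx timestamp hang" messages per PCI function.
# Reads kernel log text on stdin and prints "<pci-address> <count>".
count_tx_hangs() {
    awk '/igb .*clearing Tx timestamp hang/ {
             for (i = 1; i < NF; i++)
                 if ($i == "igb") {
                     dev = $(i + 1)
                     sub(/:$/, "", dev)   # strip the trailing colon
                     n[dev]++
                     break
                 }
         }
         END { for (d in n) print d, n[d] }' | sort
}
```

Typical use would be `dmesg | count_tx_hangs`, run after the same uptime on each kernel so the counts are comparable.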

But why is it only on 2 of 4 ethernet ports?

[ 45.688614] igb 0000:05:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 46.088046] igb 0000:05:00.1 eth5: igb: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 46.875900] igb 0000:05:00.3 eth7: igb: eth7 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 46.911638] igb 0000:05:00.2 eth6: igb: eth6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX

FYI tg3 ports:

[ 44.987584] tg3 0000:02:00.0 eth0: Link is up at 1000 Mbps, full duplex
[ 44.987954] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[ 44.988255] tg3 0000:02:00.0 eth0: EEE is disabled
[ 45.210848] tg3 0000:02:00.1 eth1: Link is up at 1000 Mbps, full duplex
[ 45.211215] tg3 0000:02:00.1 eth1: Flow control is on for TX and on for RX
[ 45.211517] tg3 0000:02:00.1 eth1: EEE is disabled
[ 45.508038] tg3 0000:02:00.2 eth3: Link is up at 1000 Mbps, full duplex
[ 45.508411] tg3 0000:02:00.2 eth3: Flow control is on for TX and on for RX
[ 45.508723] tg3 0000:02:00.2 eth3: EEE is disabled
[ 45.848260] tg3 0000:02:00.3 eth4: Link is up at 1000 Mbps, full duplex
[ 45.848634] tg3 0000:02:00.3 eth4: Flow control is on for TX and on for RX
[ 45.849574] tg3 0000:02:00.3 eth4: EEE is disabled

Also, it's in quite a strange order.

The kernel boot arguments related to this are: net.ifnames=0 biosdevname=0 nopat
Tags: 6.10.6, igb

Activities

toracat

2024-08-27 14:23

administrator   ~0010063

To see if the fix is already in the queue for the next version of kernel-ml, can you do a test-install of kernel-ml-6.11.0-0.rc5.el9.elrepo.x86_64.rpm? The kernel set is available from:

https://elrepo.org/people/toracat/devel/kernel-ml/el9/x86_64/RPMS/

iocc

2024-08-27 15:11

reporter   ~0010064

The problem still exists, but MUCH less.

[ 74.847542] igb 0000:05:00.2: clearing Tx timestamp hang
[ 77.854515] igb 0000:05:00.3: clearing Tx timestamp hang
[ 96.863688] igb 0000:05:00.2: clearing Tx timestamp hang
[ 105.886727] igb 0000:05:00.3: clearing Tx timestamp hang
[ 114.847854] igb 0000:05:00.2: clearing Tx timestamp hang
[ 136.862950] igb 0000:05:00.2: clearing Tx timestamp hang
[ 137.887960] igb 0000:05:00.3: clearing Tx timestamp hang
[ 152.864067] igb 0000:05:00.2: clearing Tx timestamp hang
[ 153.887075] igb 0000:05:00.3: clearing Tx timestamp hang
[ 169.887204] igb 0000:05:00.3: clearing Tx timestamp hang
[ 170.848271] igb 0000:05:00.2: clearing Tx timestamp hang
[ 185.887374] igb 0000:05:00.3: clearing Tx timestamp hang
[ 186.847375] igb 0000:05:00.2: clearing Tx timestamp hang
[ 202.847497] igb 0000:05:00.2: clearing Tx timestamp hang
[ 203.871519] igb 0000:05:00.3: clearing Tx timestamp hang
[ 218.847612] igb 0000:05:00.2: clearing Tx timestamp hang
[ 234.847743] igb 0000:05:00.2: clearing Tx timestamp hang
[ 235.872770] igb 0000:05:00.3: clearing Tx timestamp hang
[ 250.848845] igb 0000:05:00.2: clearing Tx timestamp hang
[ 266.848002] igb 0000:05:00.2: clearing Tx timestamp hang
[ 267.871990] igb 0000:05:00.3: clearing Tx timestamp hang
[ 282.848082] igb 0000:05:00.2: clearing Tx timestamp hang
[ 298.848232] igb 0000:05:00.2: clearing Tx timestamp hang
[ 299.872228] igb 0000:05:00.3: clearing Tx timestamp hang
[ 314.848358] igb 0000:05:00.2: clearing Tx timestamp hang
[ 330.848449] igb 0000:05:00.2: clearing Tx timestamp hang
[ 331.873451] igb 0000:05:00.3: clearing Tx timestamp hang
[ 347.872588] igb 0000:05:00.3: clearing Tx timestamp hang
[ 348.832532] igb 0000:05:00.2: clearing Tx timestamp hang
[ 363.873701] igb 0000:05:00.3: clearing Tx timestamp hang
[ 364.832716] igb 0000:05:00.2: clearing Tx timestamp hang
[ 379.872822] igb 0000:05:00.3: clearing Tx timestamp hang
[ 382.880838] igb 0000:05:00.2: clearing Tx timestamp hang
[ 395.872926] igb 0000:05:00.3: clearing Tx timestamp hang
[ 412.833003] igb 0000:05:00.2: clearing Tx timestamp hang
[ 413.857014] igb 0000:05:00.3: clearing Tx timestamp hang
[ 430.881236] igb 0000:05:00.2: clearing Tx timestamp hang
[ 445.857271] igb 0000:05:00.3: clearing Tx timestamp hang
[ 460.833362] igb 0000:05:00.2: clearing Tx timestamp hang
[ 477.857491] igb 0000:05:00.3: clearing Tx timestamp hang
[ 478.881596] igb 0000:05:00.2: clearing Tx timestamp hang
[ 493.857616] igb 0000:05:00.3: clearing Tx timestamp hang
[ 494.881715] igb 0000:05:00.2: clearing Tx timestamp hang
[ 509.857756] igb 0000:05:00.3: clearing Tx timestamp hang
[ 510.882791] igb 0000:05:00.2: clearing Tx timestamp hang
[ 525.857893] igb 0000:05:00.3: clearing Tx timestamp hang
[ 526.883020] igb 0000:05:00.2: clearing Tx timestamp hang
[ 541.858966] igb 0000:05:00.3: clearing Tx timestamp hang
[ 544.865996] igb 0000:05:00.2: clearing Tx timestamp hang
[ 557.859083] igb 0000:05:00.3: clearing Tx timestamp hang
[ 573.858237] igb 0000:05:00.3: clearing Tx timestamp hang
[ 574.882314] igb 0000:05:00.2: clearing Tx timestamp hang
[ 589.858321] igb 0000:05:00.3: clearing Tx timestamp hang
[ 592.866341] igb 0000:05:00.2: clearing Tx timestamp hang
[ 607.842513] igb 0000:05:00.3: clearing Tx timestamp hang
[ 622.882620] igb 0000:05:00.2: clearing Tx timestamp hang
[ 638.882762] igb 0000:05:00.2: clearing Tx timestamp hang
[ 639.842743] igb 0000:05:00.3: clearing Tx timestamp hang
[ 654.882903] igb 0000:05:00.2: clearing Tx timestamp hang
[ 655.842901] igb 0000:05:00.3: clearing Tx timestamp hang
[ 670.882991] igb 0000:05:00.2: clearing Tx timestamp hang
[ 671.843011] igb 0000:05:00.3: clearing Tx timestamp hang
[ 686.883098] igb 0000:05:00.2: clearing Tx timestamp hang
[ 687.843104] igb 0000:05:00.3: clearing Tx timestamp hang
[ 702.883244] igb 0000:05:00.2: clearing Tx timestamp hang
[ 703.843230] igb 0000:05:00.3: clearing Tx timestamp hang

And after this reboot, the order of the interfaces is like this:

[root@host ~]# ether-today.sh
[ 43.043958] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[ 43.225144] tg3 0000:02:00.1 eth1: Flow control is on for TX and on for RX
[ 43.754791] tg3 0000:02:00.2 eth3: Flow control is on for TX and on for RX
[ 43.845568] igb 0000:05:00.0 eth2: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 43.920034] tg3 0000:02:00.3 eth4: Flow control is on for TX and on for RX
[ 44.096569] igb 0000:05:00.1 eth5: igb: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 44.367864] igb 0000:05:00.2 eth6: igb: eth6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 45.131562] igb 0000:05:00.3 eth7: igb: eth7 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[root@host ~]#

toracat

2024-08-28 23:45

administrator   ~0010066

A patch that seems to be related to the issue reported here is on the way to the mainline kernel. I've rebuilt kernel-ml by applying this patch and made the set available here:

https://elrepo.org/people/toracat/devel/bug1477/

Can you give this kernel a try?

iocc

2024-08-30 16:17

reporter   ~0010072

I can try tomorrow; downtime isn't great right now.

btw, can elrepo use a mailserver that doesn't have 63 spam hits?

Aug 30 00:36:05 mail postfix/smtpd[5976]: NOQUEUE: reject: RCPT from omta037.useast.a.cloudfilter.net[44.202.169.36]: 554 5.7.1 Service unavailable; Client host [44.202.169.36] blocked using spam.dnsbl.anonmails.de; Spam received on 2024-08-06 23:16:35. See https://anonmails.de/dnsbl.php?ip=44.202.169.36 Spam hits: 63; from=<bugtracker@elrepo.com> to=<x@y.z> proto=ESMTP helo=<omta037.useast.a.cloudfilter.net>
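
For reference, a DNSBL check like the one in this reject line is a plain DNS lookup: the IPv4 octets are reversed and the list's zone is appended. A small sketch that only builds the query name; `dnsbl_query_name` is a hypothetical helper, and the IP and zone are taken from the log above:

```shell
#!/bin/sh
# Build the DNS name a DNSBL client queries for an IPv4 address:
# octets reversed, then the list's zone appended.
dnsbl_query_name() {
    ip="$1"
    zone="$2"
    echo "$ip" | awk -F. -v z="$zone" \
        'NF == 4 { print $4 "." $3 "." $2 "." $1 "." z }'
}

dnsbl_query_name 44.202.169.36 spam.dnsbl.anonmails.de
```

The printed name can be passed to `host` or `dig +short`; by convention an answer in 127.0.0.0/8 means the address is listed, and NXDOMAIN means it is not.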

pperry

2024-08-31 10:57

administrator   ~0010075

Last edited: 2024-08-31 10:59

@iocc

btw, can elrepo use a mailserver that doesn't have 63 spam hits?

Aug 30 00:36:05 mail postfix/smtpd[5976]: NOQUEUE: reject: RCPT from omta037.useast.a.cloudfilter.net[44.202.169.36]: 554 5.7.1 Service unavailable; Client host [44.202.169.36] blocked using spam.dnsbl.anonmails.de; Spam received on 2024-08-06 23:16:35. See https://anonmails.de/dnsbl.php?ip=44.202.169.36 Spam hits: 63; from=<bugtracker@elrepo.com> to=<x@y.z> proto=ESMTP helo=<omta037.useast.a.cloudfilter.net>

Apologies, we have no control over the sending mail server that our hosting company uses to send out email notifications from this bug tracker, nor do we have any control over the blacklists your mail host chooses to use. You should report this as a false positive to your mail provider. Can you ask your mail provider to not use blacklists that block legitimate mail (or at least use them in a scoring system such as SpamAssassin), and/or ask them to whitelist the sending elrepo.com domain to ensure your mail is not blocked.

iocc

2024-08-31 15:38

reporter   ~0010076

@toracat

I tried, but it was impossible to type the correct LUKS password over iLO.
It made up characters and dropped others, so I gave up. So I guess it failed
and made things worse.

And today's bingo of ethernet numbering is:

[ 52.932550] tg3 0000:02:00.1 eth1: Flow control is on for TX and on for RX
[ 52.997915] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX
[ 53.323062] tg3 0000:02:00.2 eth2: Flow control is on for TX and on for RX
[ 53.484671] igb 0000:05:00.0 eth3: igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 53.558889] tg3 0000:02:00.3 eth4: Flow control is on for TX and on for RX
[ 53.819620] igb 0000:05:00.1 eth5: igb: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 53.851636] igb 0000:05:00.2 eth6: igb: eth6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 54.746609] igb 0000:05:00.3 eth7: igb: eth7 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
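
With net.ifnames=0 the ethN names are handed out in the order the drivers register their ports, which would explain why 0000:05:00.0 is eth2 on one boot and eth3 on another. A minimal sketch for printing the current mapping from sysfs instead of grepping dmesg; `list_nic_mapping` is a hypothetical helper name, and its optional directory argument exists only so it can be pointed at a test tree:

```shell
#!/bin/sh
# Print "<pci-address> <driver> <interface>" for every NIC that has a
# backing device. $1 is the sysfs net directory (default /sys/class/net).
list_nic_mapping() {
    netdir="${1:-/sys/class/net}"
    for path in "$netdir"/*; do
        [ -e "$path/device" ] || continue   # skip lo and other virtual devices
        pci=$(basename "$(readlink -f "$path/device")")
        drv=$(basename "$(readlink -f "$path/device/driver")")
        echo "$pci $drv $(basename "$path")"
    done | sort
}

list_nic_mapping
```

Sorting by PCI address keeps the output stable across boots, so a changed ethN name shows up as a difference in the last column only.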

toracat

2024-08-31 15:45

administrator   ~0010077

@iocc

Sorry to hear the patch did not work.

You may want to update the kernel to 6.10.7. I see at least one patch that is related to igb.

iocc

2024-08-31 16:53

reporter   ~0010078

@pperry

I understand it's not your fault.

It has been like this for a long time. Usually things like this fix themselves,
but in this case there wasn't any improvement, so I just wanted to inform you
about it. Maybe you are not aware of it.

You could tell your hosting company that some of the relays they use are
blocked in a few lists:

https://multirbl.valli.org/lookup/44.202.169.34.html
https://multirbl.valli.org/lookup/44.202.169.36.html
https://multirbl.valli.org/lookup/35.89.44.38.html

I have whitelisted elrepo.com in my mail server so it's "solved" for me now.

iocc

2024-08-31 16:57

reporter   ~0010079

@toracat

Yeah. But thanks anyway, worth a try.

Yep, will try when I can download 6.10.7.

pperry

2024-09-01 08:15

administrator   ~0010081

@iocc - thanks for the heads up re spam block list. We are aware it's been an ongoing issue on and off for a long time - par for the course for shared hosting unfortunately and it doesn't appear to have improved since the host has farmed out mail delivery to cloudfilter.net

We try to keep the SPF record up to date (although the sending IPs do change from time to time) which you can use to help with whitelisting.

@whn: looks like we need to add ip4:35.89.44.32/27 to the SPF record of elrepo.com
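
As a quick sanity check of that block: a /27 keeps the top 27 bits, so 35.89.44.32/27 spans 35.89.44.32 to 35.89.44.63. The sketch below does the same arithmetic in shell; `ip_to_int` and `in_cidr` are hypothetical helper names, and the addresses are the ones mentioned in this thread:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside CIDR block $2 (e.g. 35.89.44.32/27).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # 4294967295 is 0xFFFFFFFF; keep only the top $bits bits.
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

for ip in 35.89.44.38 44.202.169.36; do
    if in_cidr "$ip" 35.89.44.32/27; then
        echo "$ip: covered"
    else
        echo "$ip: not covered"
    fi
done
```

This reports 35.89.44.38 as covered and 44.202.169.36 as not covered by this particular block; whether the 44.202.169.x relays are already listed elsewhere in the SPF record is not something this check can tell.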

Issue History

Date Modified Username Field Change
2024-08-27 13:40 iocc New Issue
2024-08-27 13:40 iocc Status new => assigned
2024-08-27 13:40 iocc Assigned To => toracat
2024-08-27 13:40 iocc Tag Attached: 6.10.6
2024-08-27 13:40 iocc Tag Attached: igb
2024-08-27 14:23 toracat Note Added: 0010063
2024-08-27 15:11 iocc Note Added: 0010064
2024-08-28 23:45 toracat Note Added: 0010066
2024-08-29 18:35 toracat Status assigned => feedback
2024-08-30 16:17 iocc Note Added: 0010072
2024-08-30 16:17 iocc Status feedback => assigned
2024-08-31 10:57 pperry Note Added: 0010075
2024-08-31 10:59 pperry Note Edited: 0010075
2024-08-31 15:38 iocc Note Added: 0010076
2024-08-31 15:45 toracat Note Added: 0010077
2024-08-31 16:53 iocc Note Added: 0010078
2024-08-31 16:57 iocc Note Added: 0010079
2024-09-01 08:15 pperry Note Added: 0010081