ELRepo Bugtracker

ID: 0000069
Category: [channel: elrepo/el5] kmod-e1000e
Severity: major
Reproducibility: always
Date Submitted: 2010-06-14 09:51
Last Update: 2010-09-08 09:57
Reporter: dma
View Status: public
Assigned To: burakkucat
Priority: normal
Resolution: open
Status: feedback
Summary 0000069: Intel 82574L NIC failure (e1000e module) when running from LiveCD or USB install
Description This is the follow-up of a CentOS ticket that toracat is familiar with :
http://bugs.centos.org/view.php?id=4371

After a random (?) amount of time / traffic, the Intel 82574L-based onboard NIC (e1000e module) will fail catastrophically. The only way to "fix" the fault is to reboot.

An example of ifconfig output of a failed NIC is as follows :

eth1 Link encap:Ethernet HWaddr 00:25:90:01:0C:BD
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:3318 errors:4294955080 dropped:4294965260 overruns:0 frame:4294959152
          TX packets:371 errors:4294963224 dropped:0 overruns:0 carrier:4294963224
          collisions:4294965260 txqueuelen:1000
          RX bytes:792726 (774.1 KiB) TX bytes:29200 (28.5 KiB)
          Memory:fb6e0000-fb700000
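The error counters in this output all sit just below 2^32 (4294967296), which is how a small negative number appears when it is stored in an unsigned 32-bit field. The following is a minimal Python sketch of that reinterpretation. The helper name as_signed32 is made up for illustration, and reading the values as wrapped negatives is an interpretation of the numbers, not something confirmed in this ticket.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit counter as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

# Error counters copied from the ifconfig output above.
counters = {
    "rx_errors": 4294955080,
    "rx_dropped": 4294965260,
    "rx_frame": 4294959152,
    "tx_errors": 4294963224,
    "tx_carrier": 4294963224,
    "collisions": 4294965260,
}

decoded = {name: as_signed32(value) for name, value in counters.items()}
for name, value in decoded.items():
    print(f"{name:11s} {counters[name]:>10d} -> {value}")
```

The decoded values are -12216, -2036, -8144 and -4072, each a whole multiple of 2036. That regularity suggests the counters are being corrupted in fixed steps rather than counting real errors.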

I have tested and can reproduce this condition 100% of the time on both the stock 2.6.18-194.el5 kernel and the updated 2.6.18-194.3.1.el5 kernel. Furthermore, I have tested this on two separate (but identical) machines, and on three physically different networks with different switching equipment, cabling, and traffic profiles.

An interesting (and confusing) aspect of the behaviour is that the error condition appears more rapidly and consistently when booted from a LiveCD or USB stick. For example, when booted from a USB stick, and connected to a low-traffic test network, the error condition will occur relatively rapidly ; however, when booted from the hard drive on the same low-traffic network the error _never_ happens. On a high-traffic network the error condition will occur rapidly regardless of the boot medium.

The problem, therefore, has (at least) two components :

1. Amount / type of traffic on LAN / segment.
2. Boot media.

The latest kmod-e1000e package from ELRepo actually makes the problem _worse_ : from a hard drive boot, while connected to a low-traffic network, the error condition occurs immediately at ifup (whereas previously this boot medium / network combination would function properly). Subsequent reboots show exactly the same behaviour every time.

Further details in the "Additional information" area.
Additional Information [root@177 ~]# cat /etc/redhat-release
CentOS release 5.5 (Final)

[root@177 ~]# uname -a
Linux 177.install.pxe 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:09:10 EDT 2010 i686 i686 i386 GNU/Linux

[root@177 ~]# modinfo e1000e | grep ver
filename: /lib/modules/2.6.18-194.3.1.el5/kernel/drivers/net/e1000e/e1000e.ko
version: 1.0.2-k3
description: Intel(R) PRO/1000 Network Driver
srcversion: 36D9C555BAD072CEBA25825
vermagic: 2.6.18-194.3.1.el5 SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.1

[root@177 ~]# lspci | grep Ethernet
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)

[root@177 ~]# lspci -n | egrep "(04:00.0|05:00.0)"
04:00.0 0200: 8086:10d3
05:00.0 0200: 8086:10d3 (rev ff)

[root@177 ~]# cat /proc/cpuinfo | head -n 5
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 37
model name : Intel(R) Core(TM) i3 CPU 530 @ 2.93GHz

[root@177 ~]# ethtool eth1
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: pumbag
    Wake-on: g
    Current message level: 0x00000001 (1)
    Link detected: yes

[root@177 ~]# ethtool -i eth1
driver: e1000e
version: 1.0.2-k3
firmware-version: 1.9-0
bus-info: 0000:05:00.0


[root@177 ~]# ethtool -t eth1
The test result is FAIL
The test extra info:
Register test (offline) 40
Eeprom test (offline) 2
Interrupt test (offline) 4
Loopback test (offline) 0
Link test (on/offline) 0
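For readers scanning the self-test result, a small Python sketch can pull out the failing entries. It assumes that a non-zero result code marks a failed test, which is the usual ethtool convention but is not stated in this ticket. The function name failed_tests is hypothetical, and the input is the output exactly as pasted above.

```python
SELF_TEST_OUTPUT = """\
Register test (offline) 40
Eeprom test (offline) 2
Interrupt test (offline) 4
Loopback test (offline) 0
Link test (on/offline) 0
"""

def failed_tests(text):
    """Map test name -> result code for every line with a non-zero code."""
    failures = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split off the trailing result code
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank lines and header lines
        name = " ".join(parts[0].split())  # normalise internal whitespace
        code = int(parts[1])
        if code != 0:  # assumption: non-zero means the test failed
            failures[name] = code
    return failures

failures = failed_tests(SELF_TEST_OUTPUT)
for name, code in failures.items():
    print(f"FAILED: {name} (code {code})")
```

On the output above this reports the register, EEPROM and interrupt tests as failed, while the loopback and link tests return 0.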


[root@177 ~]# ethtool -S eth1
NIC statistics:
     rx_packets: 8744553415920
     tx_packets: 8744553412991
     rx_bytes: 8744554217466
     tx_bytes: 8744553443574
     rx_broadcast: 8744553415452
     tx_broadcast: 8744553412623
     rx_multicast: 8744553412621
     tx_multicast: 8744553412626
     rx_errors: 4294955080
     tx_errors: 4294963224
     tx_dropped: 0
     multicast: 8744553412621
     collisions: 8744553412620
     rx_length_errors: 4294963224
     rx_over_errors: 0
     rx_crc_errors: 8744553412620
     rx_frame_errors: 4294965260
     rx_no_buffer_count: 8744553412620
     rx_missed_errors: 8744553412620
     tx_aborted_errors: 8744553412620
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 8744553412620
     tx_abort_late_coll: 8744553412620
     tx_deferred_ok: 8744553412620
     tx_single_coll_ok: 8744553412620
     tx_multi_coll_ok: 8744553412620
     tx_timeout_count: 1
     tx_restart_queue: 0
     rx_long_length_errors: 8744553412620
     rx_short_length_errors: 8744553412620
     rx_align_errors: 8744553412620
     tx_tcp_seg_good: 8744553412620
     tx_tcp_seg_failed: 8744553412620
     rx_flow_control_xon: 8744553412620
     rx_flow_control_xoff: 8744553412620
     tx_flow_control_xon: 8744553412620
     tx_flow_control_xoff: 8744553412620
     rx_long_byte_count: 8744554217466
     rx_csum_offload_good: 684
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 8744553412620
     rx_smbus: 8744553412620
     dropped_smbus: 8744553412620
     rx_dma_failed: 0
     tx_dma_failed: 0
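Many of the statistics above share the value 8744553412620, which is exactly 2036 x 0xFFFFFFFF. The Python sketch below splits each counter into a multiple of 0xFFFFFFFF plus a remainder. One possible reading, offered here as an unconfirmed hypothesis, is that the driver added an all-ones 32-bit register read to each running total 2036 times, as could happen if the device stopped responding on the bus. The lspci line for 05:00.0 (eth1, per ethtool -i) showing "(rev ff)" would be consistent with that. The name split_counter is made up for illustration.

```python
ALL_ONES = 0xFFFFFFFF  # value of a 32-bit register read with every bit set

# A subset of the `ethtool -S eth1` values shown above.
stats = {
    "rx_packets": 8744553415920,
    "tx_packets": 8744553412991,
    "rx_bytes": 8744554217466,
    "tx_bytes": 8744553443574,
    "collisions": 8744553412620,
    "rx_crc_errors": 8744553412620,
}

def split_counter(value):
    """Return (n, remainder) such that value == n * ALL_ONES + remainder."""
    return divmod(value, ALL_ONES)

for name, value in stats.items():
    n, remainder = split_counter(value)
    print(f"{name:14s} = {n} * 0xFFFFFFFF + {remainder}")
```

Every counter in this subset comes out as 2036 x 0xFFFFFFFF plus a small remainder. For tx_packets the remainder is 371, the same TX packet count that ifconfig reports, and the factor 2036 also matches the ifconfig value dropped:4294965260, which is 2^32 - 2036.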


[root@177 ~]# cat /proc/interrupts
           CPU0 CPU1 CPU2 CPU3
  0: 93004 17613 17588 17575 IO-APIC-edge timer
  1: 3 68 201 5 IO-APIC-edge i8042
  8: 1 2 0 0 IO-APIC-edge rtc
  9: 0 0 0 0 IO-APIC-level acpi
 12: 0 3 1 0 IO-APIC-edge i8042
 66: 143 0 506 0 PCI-MSI ahci
 74: 28 0 122 0 PCI-MSI-X eth0-rx-0
 82: 41 0 0 0 PCI-MSI-X eth0-tx-0
 90: 2 0 0 0 PCI-MSI-X eth0
 98: 446 0 0 1933 PCI-MSI-X eth1-rx-0
106: 29 0 20 0 PCI-MSI-X eth1-tx-0
114: 2 0 0 0 PCI-MSI-X eth1
225: 18 12 10 17 IO-APIC-level ehci_hcd:usb1
233: 2788 2779 7776 6204 IO-APIC-level ehci_hcd:usb2
NMI: 0 0 0 0
LOC: 145536 145535 145534 145533
ERR: 0
MIS: 0

-  Notes
(0000307)
toracat (administrator)
2010-06-14 10:05
edited on: 2010-06-14 10:16

I confirmed that the driver ELRepo offers is the latest at this moment. You may want to read through the Intel site:

http://www.intel.com/support/network/sb/cs-009209.htm

and see if there is any hint that might mitigate the problem you are experiencing.

(0000308)
toracat (administrator)
2010-06-14 10:24

Someone using the 82573V chips is reporting a problem with the e1000e driver:

http://communities.intel.com/message/89602#89602

Yours might be related ...
(0000309)
burakkucat (administrator)
2010-06-14 11:40

Re. note 307. Thank you, toracat, for checking on my behalf.

I wonder if it would be worthwhile for Dan, the reporter, to try each of the other three earlier versions of the Intel driver that we have available from our repository --

kmod-e1000e-1.0.15_NAPI-1.el5.elrepo.i686.rpm 20-Oct-2009 05:26
kmod-e1000e-1.1.2_NAPI-1.el5.elrepo.i686.rpm 26-Nov-2009 06:46
kmod-e1000e-1.1.2.1a_NAPI-1.el5.elrepo.i686.rpm 21-Feb-2010 09:57
kmod-e1000e-1.1.19_NAPI-1.el5.elrepo.i686.rpm 06-Jun-2010 13:58

It does seem as if it is due to an Intel issue that has been carried over into the distro driver by Red Hat's porting of it to the 2.6.18-x.y.z.el5 kernel series. That porting appears to have fortuitously minimised the issue's impact somewhat when compared with the result of using the current Intel version of the driver, which we provide.
(0000319)
burakkucat (administrator)
2010-07-12 07:13

A newer version of the Intel source code (1.2.8) has now been packaged to provide kmod-e1000e-1.2.8_NAPI-1.el5.elrepo.*.rpm

Please test to see if this resolves your issue.
(0000348)
dma (reporter)
2010-09-08 09:57

Hello,

I can confirm that the e1000e 1.2.10-NAPI driver available from SourceForge works perfectly when compiled against both CentOS 5.2 and CentOS 5.5. The problems described above do not occur with this version; any previous version results in failure.

http://sourceforge.net/projects/e1000/files/e1000e%20stable/1.2.10/e1000e-1.2.10.tar.gz/download
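To check whether a given machine is running a driver older than the one reported to work, the version string from `ethtool -i` or `modinfo` can be compared against 1.2.10. The following Python sketch does that under stated assumptions: the 1.2.10 threshold comes only from the reporter's note above, the function names are hypothetical, and the comparison looks at the leading dotted numbers only. In-kernel versions with a "-k" suffix (such as 1.0.2-k3) are distribution backports, so their numbers may not be directly comparable.

```python
import re

# Threshold taken from the reporter's note; not an official Intel statement.
FIXED_VERSION = (1, 2, 10)

def numeric_prefix(version):
    """Turn '1.1.2.1a_NAPI' into (1, 1, 2, 1) by reading the leading dotted digits."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version prefix in {version!r}")
    return tuple(int(part) for part in match.group().split("."))

def predates_fix(version):
    """True if the version sorts before the release reported to work."""
    return numeric_prefix(version) < FIXED_VERSION

# Version strings that appear in this ticket.
for version in ["1.0.2-k3", "1.0.15_NAPI", "1.1.2_NAPI", "1.1.2.1a_NAPI",
                "1.1.19_NAPI", "1.2.8_NAPI", "1.2.10-NAPI"]:
    status = "older than 1.2.10" if predates_fix(version) else "1.2.10 or newer"
    print(f"{version:15s} {status}")
```

Comparing tuples of integers rather than raw strings avoids the trap where "1.2.8" sorts after "1.2.10" in a plain text comparison.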

- Issue History
Date Modified Username Field Change
2010-06-14 09:51 dma New Issue
2010-06-14 09:51 dma Status new => assigned
2010-06-14 09:51 dma Assigned To => burakkucat
2010-06-14 10:05 toracat Note Added: 0000307
2010-06-14 10:16 toracat Note Edited: 0000307
2010-06-14 10:24 toracat Note Added: 0000308
2010-06-14 11:40 burakkucat Note Added: 0000309
2010-07-12 07:13 burakkucat Note Added: 0000319
2010-07-12 07:14 burakkucat Status assigned => feedback
2010-09-08 09:57 dma Note Added: 0000348

