View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001275 | channel: elrepo/el9 | kmod-mlx4 | public | 2022-10-05 14:36 | 2022-11-15 05:52 |
Reporter | torkil | Assigned To | pperry | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Summary | 0001275: Kmod-mlx4 for RHEL 9 / Mellanox Technologies MT25408A0-FCC-QI ConnectX | ||||
Description | Hi I got the following from Red Hat: " Unfortunately the "Mellanox Technologies MT25408A0-FCC-QI ConnectX" card is no longer supported as of RHEL8: # egrep Mellanox lspci 08:00.0 Network controller [0280]: Mellanox Technologies MT25408A0-FCC-QI ConnectX, Dual Port 40Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s In... (rev b0) Subsystem: Mellanox Technologies HP InfiniBand 4X QDR CX-2 PCI-e G2 Dual Port HCA [15b3:0021] # egrep :08: sos_commands/kernel/dmesg | head -n1 [ 0.351192] pci 0000:08:00.0: [15b3:673c] type 00 class 0x028000 It's supported on RHEL7: [root@rhel7 ~]# modinfo mlx4_core | grep -i 15b3d | grep -i 673c alias: pci:v000015B3d0000673Csv*sd*bc*sc*i* [root@rhel7 ~]# uname -a Linux rhel7 3.10.0-1160.76.1.el7.x86_64 #1 SMP Tue Jul 26 14:15:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux But starting from RHEL8 the adapter has been removed from the mlx4_core driver: [root@rhel8 ~]# modinfo mlx4_core | grep -i 15b3 | grep -i 673c [root@rhel8 ~]# uname -a Linux rhel8 4.18.0-372.26.1.el8_6.x86_64 #1 SMP Sat Aug 27 02:44:20 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux [root@rhel9 ~]# modinfo mlx4_core | grep -i 15b3 | grep -i 673c [root@rhel9 ~]# uname -a Linux rhel9 5.14.0-70.26.1.el9_0.x86_64 #1 SMP PREEMPT Fri Sep 2 16:07:40 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux " I've used kmod-mlx4 on RHEL 8 for the cards, can that be built for RHEL 9 also? Thanks | ||||
Tags | No tags attached. | ||||
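The `modinfo ... | grep` check quoted in the description can be wrapped in a small helper. This is only a sketch: `has_pci_alias` is a hypothetical name, and the vendor/device pair `15b3:673c` is taken from the dmesg line in the report.

```shell
# Sketch: succeed if the alias list on stdin covers a PCI vendor:device pair.
# has_pci_alias is a hypothetical helper, not part of any package.
has_pci_alias() {
    # PCI modaliases look like: pci:v000015B3d0000673Csv*sd*bc*sc*i*
    # $1 = vendor ID, $2 = device ID (4 hex digits each, any case)
    grep -qi "pci:v0000${1}d0000${2}sv"
}

# Example using the alias line quoted from the RHEL 7 host:
if printf 'alias: pci:v000015B3d0000673Csv*sd*bc*sc*i*\n' | has_pci_alias 15b3 673c; then
    echo "supported"
else
    echo "not supported"
fi
```

On a live system the same helper could be fed from the driver itself, for example `modinfo mlx4_core | has_pci_alias 15b3 673c`.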
|
Acknowledged. In the meantime, you may want to (test-)install kernel-ml for el9 which has the mlx4 driver enabled. |
|
Thanks, works like a charm with kernel-ml |
|
That's great news. We will get to the kmod package as soon as we are able. |
|
The following package has been built for rhel9 and uploaded to the main elrepo repository: kmod-mlx4-4.0-1.el9_0.elrepo.x86_64.rpm It should be available on our mirror sites to test shortly. Please note - our kmod packages are only compatible with the RHEL distro kernel. They do not work with our own kernel-ml packages. To test, please install the kmod package and reboot to a RHEL9 distro kernel (not our kernel-ml) and test. The device should now work as expected with the distro kernel(s). Thanks |
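Since the kmod package only works with the distro kernel, it may help to confirm which kernel is running before testing. A minimal sketch, assuming that kernel-ml release strings contain `.elrepo.` and distro kernel release strings do not; `is_elrepo_kernel` is a hypothetical helper name.

```shell
# Sketch: succeed if a kernel release string looks like an ELRepo kernel-ml build.
# Assumption: kernel-ml releases contain ".elrepo."; RHEL distro kernels do not.
is_elrepo_kernel() {
    case "$1" in
        *.elrepo.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Release string of the distro kernel quoted in this report:
release="5.14.0-70.26.1.el9_0.x86_64"
if is_elrepo_kernel "$release"; then
    echo "kernel-ml detected: kmod packages are not compatible with it"
else
    echo "distro kernel: the kmod package should apply"
fi
```

On the host itself, `is_elrepo_kernel "$(uname -r)"` would check the running kernel after the reboot.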
|
Hi
Wow, that was fast =) It doesn't quite work though. I have this in dmesg:
"
[ 10.854188] mlx4_core 0000:08:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 11.167136] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[ 11.168241] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
[ 11.168244] <mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
[ 11.202932] infiniband mlx4_0: Couldn't register device with driver model
[ 11.228829] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
"
[root@g79 ~]# uname -a
Linux g79.drcmr 5.14.0-70.26.1.el9_0.x86_64 #1 SMP PREEMPT Fri Sep 2 16:07:40 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
Host is freshly rebuilt with distro kernel. |
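The failure quoted above can be detected by scanning the kernel log for the registration message. A minimal sketch; `mlx4_ib_registration_failed` is a hypothetical helper name, and the pattern is taken from the single log line in this report, so other kernels may word it differently.

```shell
# Sketch: read kernel log text on stdin and succeed if the mlx4_ib
# registration failure message from this report is present.
mlx4_ib_registration_failed() {
    grep -q "infiniband mlx4_.*: Couldn't register device with driver model"
}

# Example using the line quoted from the reporter's dmesg output:
sample="[ 11.202932] infiniband mlx4_0: Couldn't register device with driver model"
if printf '%s\n' "$sample" | mlx4_ib_registration_failed; then
    echo "mlx4_ib registration failed"
fi
```

On the affected host this could be run as `dmesg | mlx4_ib_registration_failed`.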
|
Thanks for the feedback. We are going to have to do a bit more work to fix this, I think. Originally, on RHEL8, Red Hat simply didn't enable support for older hardware, so all we had to do was rebuild the RHEL drivers/net/ethernet/mellanox/mlx4 source code with -DCONFIG_MLX4_CORE_GEN2 to switch support back on for Gen2 cards, and that is what the first version above did. This bug looks like it may be the issue you have reported: https://bugzilla.redhat.com/show_bug.cgi?id=2014094 and the patch is here: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/infiniband/hw/mlx4?h=v5.15.72&id=0bccc44a54e8d68be5ed02b7985a869cc2df6444 The driver is actually split into two parts, an InfiniBand part and an Ethernet part, and the issue we have here is that the bug and patch apply to the InfiniBand module: drivers/infiniband/hw/mlx4/main.c. I think we will need to backport the patch to the /drivers/infiniband/hw/mlx4/mlx4_ib.ko module, and also build and ship this module in our kmod package (I didn't originally realise this driver was split into two separate modules). It's late now, but I can take a look at that tomorrow for you, and hopefully get a v2 package out for you to test. I will update here as soon as I have something available for you. |
|
Sounds good, thanks. Regards, Torkil |
|
An updated package has been released to the main repository: kmod-mlx4-4.0-2.el9_0.elrepo.x86_64.rpm As discussed above, I have also built the mlx4_ib infiniband module, and have backported the upstream patch: mlx4: Do not fail the registration on port stats Hoping that will now have fixed the issues for you. If you could please test and provide feedback, that would be great. Many thanks. |
|
Seems to work, if a little noisy:
"
[Fri Oct 7 21:44:22 2022] mlx4_core: Mellanox ConnectX core driver v4.0-0
[Fri Oct 7 21:44:22 2022] ------------[ cut here ]------------
[Fri Oct 7 21:44:22 2022] WARNING: CPU: 0 PID: 289 at net/core/devlink.c:10134 devlink_param_register+0x1b3/0x1d0
[Fri Oct 7 21:44:22 2022] Modules linked in: mlx4_core(OE+) tls rfkill ib_uverbs ib_core sunrpc intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support ipmi_ssif mgag200 sb_edac drm_kms_helper x86_pkg_temp_thermal intel_powerclamp syscopyarea sysfillrect coretemp sysimgblt fb_sys_fops rapl intel_cstate cec intel_uncore drm pcspkr acpi_ipmi ipmi_si ipmi_devintf lpc_ich acpi_power_meter ipmi_msghandler hpilo ioatdma fuse xfs libcrc32c sd_mod t10_pi sg crct10dif_pclmul ahci crc32_pclmul crc32c_intel libahci mpt3sas ghash_clmulni_intel libata igb serio_raw i2c_algo_bit hpwdt dca raid_class scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mlx4_core]
[Fri Oct 7 21:44:22 2022] CPU: 0 PID: 289 Comm: kworker/0:8 Kdump: loaded Tainted: G W OE --------- --- 5.14.0-70.26.1.el9_0.x86_64 #1
[Fri Oct 7 21:44:22 2022] Hardware name: HP ProLiant SL230s Gen8 /, BIOS P75 05/24/2019
[Fri Oct 7 21:44:22 2022] Workqueue: events work_for_cpu_fn
[Fri Oct 7 21:44:22 2022] RIP: 0010:devlink_param_register+0x1b3/0x1d0
[Fri Oct 7 21:44:22 2022] Code: ff ff ff 0f 0b 49 8b 6c 24 08 e9 05 ff ff ff 0f 0b e9 54 ff ff ff 0f 0b e9 2b ff ff ff 49 83 7c 24 28 00 75 a4 e9 40 ff ff ff <0f> 0b 49 8b 6c 24 08 e9 de fe ff ff 0f 0b e9 68 fe ff ff b8 f4 ff
[Fri Oct 7 21:44:22 2022] RSP: 0018:ffffa34847cf7d98 EFLAGS: 00010246
[Fri Oct 7 21:44:22 2022] RAX: 000000000000000e RBX: ffffffffc08f3968 RCX: 0000000000000001
[Fri Oct 7 21:44:22 2022] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8b7c86b91c00
[Fri Oct 7 21:44:22 2022] RBP: ffffffffc090ec91 R08: 0000000000000230 R09: 0000000004000000
[Fri Oct 7 21:44:22 2022] R10: 0000000000000000 R11: 0000000000000010 R12: ffffffffc08f3968
[Fri Oct 7 21:44:22 2022] R13: ffff8b7c86730000 R14: 0000000000000005 R15: ffff8b8b7f62e90d
[Fri Oct 7 21:44:22 2022] FS: 0000000000000000(0000) GS:ffff8b8b7f600000(0000) knlGS:0000000000000000
[Fri Oct 7 21:44:22 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Oct 7 21:44:22 2022] CR2: 000055a3ff2f84a0 CR3: 000000118166c001 CR4: 00000000000606f0
[Fri Oct 7 21:44:22 2022] Call Trace:
[Fri Oct 7 21:44:22 2022] devlink_params_register+0x50/0xb0
[Fri Oct 7 21:44:22 2022] mlx4_init_one+0x111/0x2a0 [mlx4_core]
[Fri Oct 7 21:44:22 2022] local_pci_probe+0x45/0x80
[Fri Oct 7 21:44:22 2022] work_for_cpu_fn+0x16/0x20
[Fri Oct 7 21:44:22 2022] process_one_work+0x1e8/0x3c0
[Fri Oct 7 21:44:22 2022] worker_thread+0x1da/0x3b0
[Fri Oct 7 21:44:22 2022] ? rescuer_thread+0x370/0x370
[Fri Oct 7 21:44:22 2022] kthread+0x149/0x170
[Fri Oct 7 21:44:22 2022] ? set_kthread_struct+0x40/0x40
[Fri Oct 7 21:44:22 2022] ret_from_fork+0x22/0x30
[Fri Oct 7 21:44:22 2022] ---[ end trace 10dcc546735bafc5 ]---
[Fri Oct 7 21:44:22 2022] mlx4_core: Initializing 0000:08:00.0
[Fri Oct 7 21:44:25 2022] mlx4_core 0000:08:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[Fri Oct 7 21:44:25 2022] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[Fri Oct 7 21:44:25 2022] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
[Fri Oct 7 21:44:25 2022] <mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
[Fri Oct 7 21:44:25 2022] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[Fri Oct 7 21:44:25 2022] mlx4_core 0000:08:00.0 ib1: "NetworkManager" wants to know my dev_id. Should it look at dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more info.
[Fri Oct 7 21:44:25 2022] mlx4_core 0000:08:00.0 ibs2d1: renamed from ib1
[Fri Oct 7 21:44:25 2022] Loading iSCSI transport class v2.0-870.
[Fri Oct 7 21:44:25 2022] mlx4_core 0000:08:00.0 ibs2: renamed from ib0
[Fri Oct 7 21:44:25 2022] iscsi: registered transport (iser)
[Fri Oct 7 21:44:25 2022] Rounding down aligned max_sectors from 4294967295 to 4294967288
[Fri Oct 7 21:44:25 2022] db_root: cannot open: /etc/target
[Fri Oct 7 21:44:25 2022] RPC: Registered rdma transport module.
[Fri Oct 7 21:44:25 2022] RPC: Registered rdma backchannel transport module.
[Fri Oct 7 21:44:44 2022] IPv6: ADDRCONF(NETDEV_CHANGE): ibs2: link becomes ready
"
Thanks a lot =)
Regards, Torkil |
|
Thanks for the feedback. I'll mark as resolved for now - if you get any issues or anything actionable we can improve, please do not hesitate to let us know. |
|
The patch in #8695 has now been applied in RHEL 9.1. |
Date Modified | Username | Field | Change |
---|---|---|---|
2022-10-05 14:36 | torkil | New Issue | |
2022-10-05 14:36 | torkil | Status | new => assigned |
2022-10-05 14:36 | torkil | Assigned To | => toracat |
2022-10-05 14:43 | toracat | Note Added: 0008687 | |
2022-10-05 14:44 | toracat | Assigned To | toracat => pperry |
2022-10-05 14:48 | burakkucat | Project | channel: kernel/el9 => channel: elrepo/el9 |
2022-10-05 14:48 | burakkucat | Category | --kernel--request-for-enhancement-- => General |
2022-10-05 14:50 | burakkucat | Category | General => --elrepo--request-for-enhancement-- |
2022-10-05 16:03 | torkil | Note Added: 0008688 | |
2022-10-05 18:09 | toracat | Note Added: 0008689 | |
2022-10-05 18:48 | burakkucat | Category | --elrepo--request-for-enhancement-- => kmod-mlx4 |
2022-10-06 08:47 | pperry | Note Added: 0008692 | |
2022-10-06 08:47 | pperry | Status | assigned => feedback |
2022-10-06 08:48 | pperry | Note Edited: 0008692 | |
2022-10-06 14:50 | torkil | Note Added: 0008693 | |
2022-10-06 14:50 | torkil | Status | feedback => assigned |
2022-10-06 18:14 | pperry | Note Added: 0008695 | |
2022-10-07 04:38 | torkil | Note Added: 0008696 | |
2022-10-07 12:10 | pperry | Note Added: 0008697 | |
2022-10-07 12:10 | pperry | Status | assigned => feedback |
2022-10-07 15:49 | torkil | Note Added: 0008700 | |
2022-10-07 15:49 | torkil | Status | feedback => assigned |
2022-10-07 17:01 | pperry | Note Added: 0008701 | |
2022-10-07 17:01 | pperry | Status | assigned => resolved |
2022-10-07 17:01 | pperry | Resolution | open => fixed |
2022-11-15 05:52 | pperry | Note Added: 0008738 |