View Issue Details

| Field | Value |
|---|---|
| ID | 0001253 |
| Project | channel: elrepo/el7 |
| Category | kmod-drbd90 |
| View Status | public |
| Date Submitted | 2022-08-17 10:58 |
| Last Update | 2023-07-18 09:07 |
| Reporter | Jaybus |
| Assigned To | toracat |
| Priority | high |
| Severity | major |
| Reproducibility | have not tried |
| Status | assigned |
| Resolution | open |
| Platform | x86_64 |
| OS | CentOS |
| OS Version | 7.9.2009 |

Summary: 0001253: I/O error in 9.1.5 through 9.1.8 when using md RAID10 storage
Description:

Sorry for the late report. I reported this on the Linbit drbd-user list months ago, but the situation is unchanged with the latest 9.1.8 build.

CentOS 7.9.2009, kernel 3.10.0-1160.71.1.el7.x86_64
kmod-drbd90-9.1.5-1.el7_9.elrepo.x86_64 package from elrepo
drbd90-utils-9.21.1-1.el7.elrepo.x86_64 from elrepo

On both cluster nodes, the DRBD device uses an LVM logical volume as backing storage, where the volume group has a single PV. On one of the nodes, cnode1, the PV is an mdraid RAID 1 device. The other node, cnode3, has the exact same setup except that the PV is an mdraid RAID10 device. The node using the RAID10 PV logs the following errors with 9.1.5 or later:

Jan 19 09:36:33 cnode3 kernel: md/raid10:md125: make_request bug: can't convert block across chunks or bigger than 512k 939791352 48
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: disk( UpToDate -> Failed )
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: Local IO failed in drbd_request_endio. Detaching...
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: local WRITE IO error sector 310643704+96 on dm-1
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: sending new current UUID: 5A8A36BB73AD216D
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: disk( Failed -> Diskless )
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: Should have called drbd_al_complete_io(, 310642696, 516096), but my Disk seems to have failed

Reverting to 9.1.4 using the elrepo package kmod-drbd90-9.1.4-1.el7_9.elrepo.x86_64 restores normal functionality (after recovery) with the same kernel and the same mdraid and LVM modules.
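A note on the md/raid10 message above: md refuses a request whose sectors would span a RAID10 chunk boundary, and the two trailing numbers in the log line appear to be the bio's start sector and its length in 512-byte sectors. A minimal sketch of that boundary check (illustrative Python, not kernel code, assuming the 512 KiB chunk size named in the message):

```python
# Sketch of a RAID10 chunk-boundary check, assuming 512 KiB chunks
# (the size named in the kernel message). Illustrative, not kernel code.
CHUNK_SECTORS = 512 * 1024 // 512  # 512 KiB chunk = 1024 sectors of 512 bytes

def crosses_chunk(sector: int, nr_sectors: int) -> bool:
    """True if [sector, sector + nr_sectors) spans a chunk boundary."""
    return sector % CHUNK_SECTORS + nr_sectors > CHUNK_SECTORS

# The logged request "939791352 48" starts 1016 sectors into a chunk,
# so 1016 + 48 = 1064 > 1024 and md rejects it.
print(crosses_chunk(939791352, 48))  # True
```

If the logged numbers fit this pattern, it suggests that 9.1.5 and later submit bios that exceed the RAID10 chunk-boundary limit, which an old kernel like 3.10 cannot split on its own; that is a hypothesis, not a confirmed diagnosis.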
Steps To Reproduce: Install kmod-drbd90-9.1.5 with an LVM LV as the DRBD device's backing storage, where a PV in the volume group is an mdraid RAID10 device.
Tags: No tags attached.
Reported upstream:

---

toracat (2022-08-17 11:12):
@Jaybus Thank you for reporting this issue. Could you give us a link to your post on the drbd list? Does the same problem exist in el8 or el9 as well?

---

toracat (2022-08-17 12:35):
I see it now: https://github.com/LINBIT/drbd/issues/26

---

toracat (2022-11-22 19:50):
@Jaybus kmod-drbd90 is now available in two versions. Can you please test either (or both) package(s)?

elrepo main repo: kmod-drbd90-9.1.12-1.el7_9.elrepo.x86_64.rpm
elrepo-testing repo: kmod-drbd90-9.2.1-1.el7_9.elrepo.x86_64.rpm

To install the v9.2.1 package, run:

sudo yum --enablerepo=elrepo-testing install kmod-drbd90

---

Jaybus (2022-12-20 14:00):
I tried kmod-drbd90-9.1.12-1.el7_9.elrepo.x86_64.rpm and get the same error; see the log excerpt below. Again, this happens only when the backing device is an LVM2 PV that is an md RAID10 device. An LVM2 PV that is an md RAID1 device works just fine. I'm not set up to test any other RAID levels.

Dec 20 11:32:52 cnode2 kernel: md/raid10:md127: make_request bug: can't convert block across chunks or bigger than 256k 448794880 132
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( UpToDate -> Failed )
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: Local IO failed in drbd_request_endio. Detaching...
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: local READ IO error sector 29362432+264 on ffff927179e81040
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: sending new current UUID: AC56426AC1A85BE7
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( Failed -> Diskless )
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: sending new current UUID: 762510B9B75C5695

---

toracat (2022-12-20 19:40):
@Jaybus Thanks for reporting back. Perhaps you could report the issue to Linbit? Hopefully the drbd developers will be able to see what the problem is. When an updated version (or a test version) is out, we'd be willing to build a kmod package for it.

---

Jaybus (2022-12-21 11:13):
There is already a bug report with Linbit/drbd on GitHub at https://github.com/LINBIT/drbd/issues/26: issue #26, "Bug in drbd 9.1.5 on CentOS 7", from February 2022. I added an update to that issue noting that the problem persists in 9.1.12 and giving device info.

---

toracat (2022-12-22 13:44):
Thanks for the update. Let's keep an eye on the development there.

---

toracat (2023-01-30 13:41):
@Jaybus The final release of drbd-9.1.13 has just been announced. I have built kmod-drbd90-9.1.13-1.el7_9.elrepo.x86_64.rpm. It will be available from our mirror sites shortly. Please test if you are able.

---

Jaybus (2023-02-28 13:22):
Sorry for the delay. I installed kmod-drbd90-9.1.13-1.el7_9 on one server that has a RAID10 md device as backing storage, and it appears to be syncing normally with no kernel log errors. I believe the issue has been fixed in 9.1.13. I will report back if something comes up, but the previous versions since 9.1.5 all failed immediately and would not sync. Thank you!

---

toracat (2023-02-28 13:42):
@Jaybus Thank you for reporting back with your observations. Good to hear 9.1.13 is working. Yes, please do let us know if you find any issue.

---

Jaybus (2023-03-01 08:58):
Spoke too soon. In versions 9.1.5 through 9.1.12, a DRBD device using an md RAID10 device would fail immediately when brought up. With 9.1.13 it now works as secondary and will sync at startup, but as soon as it is made primary it fails in the same manner as before. Reverting to version 9.1.4 still restores normal functionality.

Mar 1 08:43:39 cnode2 kernel: md/raid10:md127: make_request bug: can't convert block across chunks or bigger than 256k 448794880 132
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( UpToDate -> Failed )
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: Local IO failed in drbd_request_endio. Detaching...
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: local READ IO error sector 29362432+264 on ffff9fcff9a389c0
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: sending new current UUID: 9C66E258C0F9F361
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( Failed -> Diskless )
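One plausible reading of the secondary-vs-primary difference: resync traffic tends to be issued as aligned requests, while a primary serving filesystem I/O submits bios at arbitrary offsets, such as the 264-sector read in the log above. The toy check below (illustrative Python using this array's 256 KiB chunk size; not DRBD or md code) shows the same request size passing or failing on alignment alone:

```python
# Illustration: against 256 KiB chunks (512 sectors), a 264-sector (132 KiB)
# request fits in one chunk when chunk-aligned but crosses a boundary when
# it starts mid-chunk. Not DRBD or md code.
CHUNK_SECTORS = 256 * 1024 // 512  # 256 KiB chunk = 512 sectors

def crosses_chunk(sector: int, nr_sectors: int) -> bool:
    """True if [sector, sector + nr_sectors) spans a chunk boundary."""
    return sector % CHUNK_SECTORS + nr_sectors > CHUNK_SECTORS

print(crosses_chunk(0, 264))    # chunk-aligned: 0 + 264 <= 512 -> False
print(crosses_chunk(256, 264))  # mid-chunk: 256 + 264 = 520 > 512 -> True
```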

---

toracat (2023-03-01 13:20):
Hmm, that's unfortunate.

---

toracat (2023-04-11 13:19):
@Jaybus drbd 9.1.14 is out. Can you give it a try?

---

Jaybus (2023-07-18 09:07):
Just now cycling back to this issue. I did not try 9.1.14, but recently tried 9.1.15 with the CentOS 7 kernel 3.10.0-1160.92.1 and had the exact same errors when the PV is an md RAID10 device. Reverting to the 9.1.4 kmod works as expected. It is definitely something that changed between 9.1.4 and 9.1.5.

---

Issue History

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2022-08-17 10:58 | Jaybus | New Issue | |
| 2022-08-17 10:58 | Jaybus | Status | new => assigned |
| 2022-08-17 10:58 | Jaybus | Assigned To | => toracat |
| 2022-08-17 11:12 | toracat | Note Added: 0008528 | |
| 2022-08-17 12:35 | toracat | Note Added: 0008530 | |
| 2022-11-22 19:50 | toracat | Note Added: 0008773 | |
| 2022-11-22 19:50 | toracat | Status | assigned => feedback |
| 2022-12-20 14:00 | Jaybus | Note Added: 0008866 | |
| 2022-12-20 14:00 | Jaybus | Status | feedback => assigned |
| 2022-12-20 19:40 | toracat | Note Added: 0008869 | |
| 2022-12-21 11:13 | Jaybus | Note Added: 0008870 | |
| 2022-12-22 13:44 | toracat | Note Added: 0008873 | |
| 2023-01-30 13:41 | toracat | Note Added: 0008942 | |
| 2023-02-18 13:47 | toracat | Status | assigned => feedback |
| 2023-02-28 13:22 | Jaybus | Note Added: 0008998 | |
| 2023-02-28 13:22 | Jaybus | Status | feedback => assigned |
| 2023-02-28 13:42 | toracat | Note Added: 0009001 | |
| 2023-03-01 08:58 | Jaybus | Note Added: 0009012 | |
| 2023-03-01 13:20 | toracat | Note Added: 0009014 | |
| 2023-04-11 13:19 | toracat | Note Added: 0009131 | |
| 2023-05-01 16:08 | toracat | Status | assigned => feedback |
| 2023-07-18 09:07 | Jaybus | Note Added: 0009292 | |
| 2023-07-18 09:07 | Jaybus | Status | feedback => assigned |