View Issue Details

ID: 0001253
Project: channel: elrepo/el7
Category: kmod-drbd90
View Status: public
Last Update: 2023-07-18 09:07
Reporter: Jaybus
Assigned To: toracat
Priority: high
Severity: major
Reproducibility: have not tried
Status: assigned
Resolution: open
Platform: x86_64
OS: Centos
OS Version: 7.9.2009
Summary: 0001253: I/O error in 9.1.5 through 9.1.8 when using md RAID10 storage
Description: Sorry for the late report. I reported this on the Linbit drbd_users list months ago, but the situation is still the same with the latest 9.1.8 build.

CentOS 7.9.2009, kernel 3.10.0-1160.71.1.el7.x86_64
kmod-drbd90-9.1.5-1.el7_9.elrepo.x86_64 package from elrepo
drbd90-utils-9.21.1-1.el7.elrepo.x86_64 from elrepo

On both cluster nodes, the drbd device uses an LVM logical volume as backing storage, and the volume group has a single PV. On one of the nodes, cnode1, the PV is an mdraid RAID1 device. The other node, cnode3, has exactly the same setup except that the PV is an mdraid RAID10 device. The node using a RAID10 PV logs the following errors with 9.1.5 or above:

Jan 19 09:36:33 cnode3 kernel: md/raid10:md125: make_request bug: can't convert block across chunks or bigger than 512k 939791352 48
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: disk( UpToDate -> Failed )
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: Local IO failed in drbd_request_endio. Detaching...
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: local WRITE IO error sector 310643704+96 on dm-1
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: sending new current UUID: 5A8A36BB73AD216D
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: disk( Failed -> Diskless )
Jan 19 09:36:33 cnode3 kernel: drbd drbd_mail2_home/0 drbd5: Should have called drbd_al_complete_io(, 310642696, 516096), but my Disk seems to have failed

Reverting to 9.1.4 using elrepo package kmod-drbd90-9.1.4-1.el7_9.elrepo.x86_64 restores normal functionality (after recovery) when using the same kernel and same mdraid and LVM modules.
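
The md/raid10 message in the log above can be checked by hand. The following is a minimal sketch, assuming the three numbers in the "make_request bug" line are the chunk size in KiB, the start sector, and the request length in KiB, with 512-byte sectors. That reading is an assumption, not something stated in this report, although the 48 KiB length does match the "+96" sector count in the DRBD "local WRITE IO error" line. The function name `crosses_chunk` is hypothetical.

```shell
# Report whether a request spans an md chunk boundary.
# ASSUMPTION: arguments are chunk size (KiB), start sector, length (KiB),
# as read from the "make_request bug" log line; 1 KiB = 2 sectors of 512 bytes.
crosses_chunk() (
    chunk_sectors=$(( $1 * 2 ))
    start_sector=$2
    len_sectors=$(( $3 * 2 ))
    # Position of the first sector inside its chunk
    offset=$(( start_sector % chunk_sectors ))
    if [ $(( offset + len_sectors )) -gt "$chunk_sectors" ]; then
        echo "crosses: offset ${offset} + ${len_sectors} sectors > chunk of ${chunk_sectors}"
    else
        echo "fits: offset ${offset} + ${len_sectors} sectors <= chunk of ${chunk_sectors}"
    fi
)

# Figures from the cnode3 log line: 512k chunk, sector 939791352, 48 KiB
crosses_chunk 512 939791352 48
# prints: crosses: offset 1016 + 96 sectors > chunk of 1024
```

Under that reading, the failing write starts 1016 sectors into a 1024-sector chunk and is 96 sectors long, so it runs past the end of the chunk, which is the condition the raid10 message complains about.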

Steps To Reproduce: Install kmod-drbd90-9.1.5 when using an LVM LV as the DRBD device backing storage, where a PV in the volume group is an mdraid RAID10 device.
Tags: No tags attached.
Reported upstream

Activities

toracat

2022-08-17 11:12

administrator   ~0008528

@Jaybus

Thank you for reporting this issue. Could you give us a link to your post on the drbd list?

Does the same problem exist in el8 or el9 as well?

toracat

2022-08-17 12:35

administrator   ~0008530

I see it now:

https://github.com/LINBIT/drbd/issues/26

toracat

2022-11-22 19:50

administrator   ~0008773

@Jaybus

kmod-drbd90 is now available in two versions. Can you please test either (or both) of the packages?

elrepo main repo:
   kmod-drbd90-9.1.12-1.el7_9.elrepo.x86_64.rpm

elrepo-testing repo:
   kmod-drbd90-9.2.1-1.el7_9.elrepo.x86_64.rpm

To install the v9.2.1 package, run:

sudo yum --enablerepo=elrepo-testing install kmod-drbd90

Jaybus

2022-12-20 14:00

reporter   ~0008866

I tried kmod-drbd90-9.1.12-1.el7_9.elrepo.x86_64.rpm and got the same error; see the log excerpt below. Again, this happens only when the backing device is an LVM2 PV that is an md raid10 device. An LVM2 PV that is an md raid1 device works just fine. I'm not set up to test any other raid levels.

Dec 20 11:32:52 cnode2 kernel: md/raid10:md127: make_request bug: can't convert block across chunks or bigger than 256k 448794880 132
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( UpToDate -> Failed )
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: Local IO failed in drbd_request_endio. Detaching...
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: local READ IO error sector 29362432+264 on ffff927179e81040
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: sending new current UUID: AC56426AC1A85BE7
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( Failed -> Diskless )
Dec 20 11:32:52 cnode2 kernel: drbd drbd_access_home/0 drbd13: sending new current UUID: 762510B9B75C5695

toracat

2022-12-20 19:40

administrator   ~0008869

@Jaybus

Thanks for reporting back.

Perhaps you could report the issue to Linbit? Hopefully the drbd developers will be able to see what the problem is. When an updated version (or a test version) is out, we'd be willing to build a kmod package for it.

Jaybus

2022-12-21 11:13

reporter   ~0008870

There is already a bug report with LINBIT/drbd on GitHub: issue #26, titled "Bug in drbd 9.1.5 on CentOS 7", from Feb. 2022, at https://github.com/LINBIT/drbd/issues/26. I added an update to that issue noting that the problem persists in 9.1.12 and giving device info.

toracat

2022-12-22 13:44

administrator   ~0008873

Thanks for the update. Let's keep an eye on the development there.

toracat

2023-01-30 13:41

administrator   ~0008942

@Jaybus

The final release of drbd-9.1.13 has just been announced. I have built kmod-drbd90-9.1.13-1.el7_9.elrepo.x86_64.rpm. It will be available from our mirror sites shortly. Please test if you are able.

Jaybus

2023-02-28 13:22

reporter   ~0008998

Sorry for the delay. I installed kmod-drbd90-9.1.13-1.el7_9 on one server having a raid10 md device as backing storage and it appears to be syncing normally with no kernel log errors. I believe the issue has been fixed in 9.1.13. I will report back if something comes up, but the previous versions since 9.1.5 all failed immediately and would not sync. Thank you!

toracat

2023-02-28 13:42

administrator   ~0009001

@Jaybus

Thank you for reporting back with your observations. Good to hear 9.1.13 is working. Yes, please do let us know if you find any issue.

Jaybus

2023-03-01 08:58

reporter   ~0009012

Spoke too soon. In versions 9.1.5 through 9.1.12 a DRBD device using a MD raid10 device would fail immediately when brought up. With 9.1.13 it now works when secondary and will sync at startup, but as soon as it is made primary it fails in the same manner as before. Reverting to version 9.1.4 still restores normal functionality.

Mar 1 08:43:39 cnode2 kernel: md/raid10:md127: make_request bug: can't convert block across chunks or bigger than 256k 448794880 132
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( UpToDate -> Failed )
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: Local IO failed in drbd_request_endio. Detaching...
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: local READ IO error sector 29362432+264 on ffff9fcff9a389c0
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: sending new current UUID: 9C66E258C0F9F361
Mar 1 08:43:39 cnode2 kernel: drbd drbd_access_home/0 drbd13: disk( Failed -> Diskless )

toracat

2023-03-01 13:20

administrator   ~0009014

Hmm, that's unfortunate.

toracat

2023-04-11 13:19

administrator   ~0009131

@Jaybus

drbd 9.1.14 is out. Can you give it a try?

Jaybus

2023-07-18 09:07

reporter   ~0009292

Just now cycling back to this issue. I did not try 9.1.14, but I recently tried 9.1.15 with CentOS 7 kernel 3.10.0-1160.92.1 and got exactly the same errors when the PV is an md raid10 device. Reverting to the 9.1.4 kmod works as expected. It is definitely something that changed between 9.1.4 and 9.1.5.
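
The version history reported in this thread can be condensed into a small check. This is only a sketch built from the notes above: 9.1.4 is reported working, and 9.1.5 through 9.1.15 are treated as one failing range even though not every release in it was tested (9.1.14, for example, was not). Anything outside that range was not reported on here. The function names `ver_le` and `drbd_report_status` are hypothetical, and the comparison relies on GNU `sort -V`.

```shell
# True if $1 <= $2 in version order (uses GNU sort -V).
ver_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Classify a drbd kmod version according to the reports in this thread.
# ASSUMPTION: 9.1.5..9.1.15 is treated as a contiguous failing range.
drbd_report_status() {
    v=$1
    if ver_le 9.1.5 "$v" && ver_le "$v" 9.1.15; then
        echo "reported-failing"
    elif [ "$v" = "9.1.4" ]; then
        echo "reported-working"
    else
        echo "not-covered"
    fi
}

# Use the installed kmod-drbd90 version when rpm knows the package;
# otherwise fall back to an example value.
if command -v rpm >/dev/null 2>&1 && rpm -q kmod-drbd90 >/dev/null 2>&1; then
    installed=$(rpm -q --qf '%{VERSION}\n' kmod-drbd90 | head -n 1)
else
    installed=9.1.15   # example value only
fi
drbd_report_status "$installed"
```

A result of "not-covered" means only that the thread says nothing about that version, not that it is safe with an md raid10 PV.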

Issue History

Date Modified Username Field Change
2022-08-17 10:58 Jaybus New Issue
2022-08-17 10:58 Jaybus Status new => assigned
2022-08-17 10:58 Jaybus Assigned To => toracat
2022-08-17 11:12 toracat Note Added: 0008528
2022-08-17 12:35 toracat Note Added: 0008530
2022-11-22 19:50 toracat Note Added: 0008773
2022-11-22 19:50 toracat Status assigned => feedback
2022-12-20 14:00 Jaybus Note Added: 0008866
2022-12-20 14:00 Jaybus Status feedback => assigned
2022-12-20 19:40 toracat Note Added: 0008869
2022-12-21 11:13 Jaybus Note Added: 0008870
2022-12-22 13:44 toracat Note Added: 0008873
2023-01-30 13:41 toracat Note Added: 0008942
2023-02-18 13:47 toracat Status assigned => feedback
2023-02-28 13:22 Jaybus Note Added: 0008998
2023-02-28 13:22 Jaybus Status feedback => assigned
2023-02-28 13:42 toracat Note Added: 0009001
2023-03-01 08:58 Jaybus Note Added: 0009012
2023-03-01 13:20 toracat Note Added: 0009014
2023-04-11 13:19 toracat Note Added: 0009131
2023-05-01 16:08 toracat Status assigned => feedback
2023-07-18 09:07 Jaybus Note Added: 0009292
2023-07-18 09:07 Jaybus Status feedback => assigned