View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001250 | channel: elrepo/el7 | kmod-drbd90 | public | 2022-08-09 15:02 | 2022-10-10 05:18 |
Reporter | richarson | Assigned To | toracat | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Summary | 0001250: Revert kmod-drbd90 to 9.1.7 - 9.1.8 can't see metadata from earlier versions | ||||
Description | I had a couple of CentOS 7.9 servers running drbd 9.1.7. I upgraded one to 9.1.8, and after a reboot it stayed in Diskless mode. dmesg showed lines like these: [Wed Aug 3 14:22:56 2022] drbd disk1/0 drbd1: drbd_md_sync_page_io(,1875385000s,READ) failed with error -5 [Wed Aug 3 14:22:56 2022] drbd disk1/0 drbd1: Error while reading metadata. After downgrading to 9.1.7, everything went back to normal. This happened again yesterday on another pair of servers, this time under AlmaLinux 8.6. | ||||
Steps To Reproduce | - Install a couple of machines with CentOS 7.9 or Alma/Rocky/etc. 8.6 and kmod-drbd90 < 9.1.8 - Create a resource and let it sync - Upgrade one machine and reboot it - Check drbd status with `drbdadm status` | ||||
Additional Information | I haven't yet reported it upstream, nor have I tried creating a resource with 9.1.8 installed on both machines. | ||||
Tags | No tags attached. | ||||
Reported upstream | |||||
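The check in the last reproduction step can be scripted. Below is a minimal Python sketch, not part of the original report, that flags resources whose local disk is Diskless. It assumes the usual indented layout of `drbdadm status` output (resource name unindented, local `disk:` state indented below it). The function name `diskless_resources` and the `SAMPLE` text are hypothetical and only illustrate the idea.

```python
def diskless_resources(status_text):
    """Return the names of resources whose local disk state is Diskless.

    Assumes `drbdadm status` style text: an unindented
    "<resource> role:<Role>" line, followed by indented detail lines.
    """
    diskless = []
    current = None
    for line in status_text.splitlines():
        if not line.strip():
            continue  # blank line between resources
        tokens = line.split()
        if not line[0].isspace():
            # Unindented line starts a new resource block.
            current = tokens[0]
        elif current is not None and "disk:Diskless" in tokens:
            # Exact token match: peer lines use "peer-disk:" and are skipped.
            if current not in diskless:
                diskless.append(current)
    return diskless


# Hypothetical sample modelled on the status output format.
SAMPLE = """\
home1 role:Secondary
  disk:Diskless
  lab-b role:Primary
    peer-disk:UpToDate

home2 role:Secondary
  disk:UpToDate
  lab-b role:Primary
    peer-disk:UpToDate
"""

print(diskless_resources(SAMPLE))
```

In practice the text would come from running `drbdadm status` on the upgraded node after the reboot. A non-empty result would match the symptom described in this ticket.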
|
Thanks for the report. We are going to remove kmod-drbd90 version 9.1.8 until this issue is resolved. |
|
Also, kmod-drbd9x-9.1.8 for el9 has been removed. |
|
And el8 as well. |
|
Wow, thanks for such a fast response! |
|
@richarson Does 9.1.7 or older work for you? I ask because the following bug has just been filed: https://elrepo.org/bugs/view.php?id=1253 |
|
Found the original report: https://github.com/LINBIT/drbd/issues/26 |
|
Hi, I've had no issues with different DRBD versions up to 9.1.7, but I'm using regular partitions, with no LVM or RAID below DRBD (actually, LVM on top of DRBD). The other bug report seems to indicate something related to the block size of either LVM or RAID, right? `make_request bug: can't convert block across chunks or bigger than 512k 2755544 32` Anyway, I believe that reverting to 9.1.4 (if needed) is not going to cause any problems for us. |
|
Oh, my upstream report: https://github.com/LINBIT/drbd/issues/45 |
|
@richarson According to the upstream report you filed, they came up with a patch that supposedly fixes the issue. I've rebuilt kmod-drbd90 for el8.6 with the patch applied and released it to the elrepo-testing repository: kmod-drbd90-9.1.8-2.el8_6.elrepo.x86_64.rpm Can you give it a try on your AlmaLinux 8 box? |
|
Thanks, those are production machines but I'll see if I can test it somewhere. Actually, if you can rebuild it for el 7.9 I have 2 test servers available. |
|
I will build the patched version for el7 shortly. |
|
kmod-drbd90-9.1.8-2.el7_9.elrepo.x86_64.rpm has just been released to the elrepo-testing repository. Will show up on our mirror sites soon. |
|
Same issue with this version:

```
[root@lab-b ~] # grep -s version /proc/drbd
version: 9.1.7 (api:2/proto:110-121)
```

```
[root@lab-a ~] # grep -s version /proc/drbd
version: 9.1.8 (api:2/proto:86-121)
```

```
[root@lab-a ~] # rpm -q kmod-drbd90
kmod-drbd90-9.1.8-2.el7_9.elrepo.x86_64
```

```
[root@lab-a ~] # drbdadm status
home1 role:Secondary
  disk:Diskless
  lab-b role:Primary
    peer-disk:UpToDate

home2 role:Secondary
  disk:Diskless
  lab-b role:Primary
    peer-disk:UpToDate
```

```
[mar ago 23 18:15:48 2022] drbd home1: Starting worker thread (from drbdsetup [19656])
[mar ago 23 18:15:48 2022] drbd home2: Starting worker thread (from drbdsetup [19658])
[mar ago 23 18:15:48 2022] drbd home1 lab-b: Starting sender thread (from drbdsetup [19686])
[mar ago 23 18:15:48 2022] drbd home2 lab-b: Starting sender thread (from drbdsetup [19691])
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: meta-data IO uses: blk-bio
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: drbd_md_sync_page_io(,1953525160s,READ) failed with error -5
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: Error while reading metadata.
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: meta-data IO uses: blk-bio
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: drbd_md_sync_page_io(,1953525160s,READ) failed with error -5
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: Error while reading metadata.
[mar ago 23 18:15:48 2022] drbd home1 lab-b: conn( StandAlone -> Unconnected )
[mar ago 23 18:15:48 2022] drbd home1 lab-b: Starting receiver thread (from drbd_w_home1 [19657])
[mar ago 23 18:15:48 2022] drbd home1 lab-b: conn( Unconnected -> Connecting )
[mar ago 23 18:15:48 2022] drbd home2 lab-b: conn( StandAlone -> Unconnected )
[mar ago 23 18:15:48 2022] drbd home2 lab-b: Starting receiver thread (from drbd_w_home2 [19659])
[mar ago 23 18:15:48 2022] drbd home2 lab-b: conn( Unconnected -> Connecting )
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Handshake to peer 1 successful: Agreed network protocol version 121
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Starting ack_recv thread (from drbd_r_home1 [19733])
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Handshake to peer 1 successful: Agreed network protocol version 121
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Starting ack_recv thread (from drbd_r_home2 [19735])
[mar ago 23 18:15:49 2022] drbd home1: Preparing cluster-wide state change 3641581203 (0->1 499/146)
[mar ago 23 18:15:49 2022] drbd home2: Preparing cluster-wide state change 3975294569 (0->1 499/146)
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3: disabling discards due to peer capabilities
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2: disabling discards due to peer capabilities
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3: size = 927 GB (971649028 KB)
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2: size = 927 GB (971649028 KB)
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: my exposed UUID: 0000000000000000
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: peer 3B29AC44594D67A0:0000000000000000:69CC366B62245032:E3DD42074C878166 bits:0 flags:120
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: my exposed UUID: 0000000000000000
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: peer 0C1A198B548AD30C:0000000000000000:4E94AFC73414BC9A:2A8E3B7E31F66CD8 bits:0 flags:120
[mar ago 23 18:15:49 2022] drbd home1: State change 3641581203: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[mar ago 23 18:15:49 2022] drbd home1: Committing cluster-wide state change 3641581203 (343ms)
[mar ago 23 18:15:49 2022] drbd home1 lab-b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[mar ago 23 18:15:49 2022] drbd home2: State change 3975294569: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[mar ago 23 18:15:49 2022] drbd home2: Committing cluster-wide state change 3975294569 (362ms)
[mar ago 23 18:15:49 2022] drbd home2 lab-b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
```
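The telling lines in a log like this are the `drbd_md_sync_page_io(...) failed with error -5` entries (on Linux, -5 is `-EIO`). As a rough aid for spotting them in a long dmesg dump, here is a small Python sketch. The regular expression is only modelled on the two message lines quoted in this ticket, so other DRBD versions may word the message differently. The names `MD_READ_ERROR` and `find_metadata_read_errors` are hypothetical.

```python
import re

# Pattern modelled on the kernel messages quoted in this ticket (an assumption,
# not an official format): "drbd <res>/<vol> <dev>: drbd_md_sync_page_io(...)".
MD_READ_ERROR = re.compile(
    r"drbd (?P<resource>\S+)/(?P<volume>\d+) (?P<device>drbd\d+): "
    r"drbd_md_sync_page_io\(,(?P<sector>\d+)s,READ\) "
    r"failed with error (?P<errno>-?\d+)"
)


def find_metadata_read_errors(log_text):
    """Return (resource, volume, device, sector, errno) for each failed metadata read."""
    return [
        (m["resource"], int(m["volume"]), m["device"],
         int(m["sector"]), int(m["errno"]))
        for m in MD_READ_ERROR.finditer(log_text)
    ]


# Two lines copied from the log in this note, plus one unrelated line.
LOG = """\
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: meta-data IO uses: blk-bio
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: drbd_md_sync_page_io(,1953525160s,READ) failed with error -5
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: drbd_md_sync_page_io(,1953525160s,READ) failed with error -5
"""

for hit in find_metadata_read_errors(LOG):
    print(hit)
```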
|
Hmmm, that means the proposed patch did not fix the issue. Let's wait for the next update and see how it goes. |
|
It appears so :/ Already reported upstream. |
|
From upstream: "Seems like something went wrong with the rebuild there. I suspect the kernel compatibility patch cache was not updated correctly. We have internally verified that this issue is fixed with >9.1.9. Please try with 9.1.10 and see if that works for you." |
|
OK, I see that 9.1.10 has just been released. Will package it shortly. |
|
kmod-drbd90-9.1.10-1.el7_9.elrepo.x86_64.rpm has been built and set to sync to our mirrors. |
|
Great, thanks! I'll be sure to test it tomorrow. |
|
Sorry for not replying sooner, I didn't have time to test it until now. Version 9.1.10 seems to work fine with 9.1.7:

```
[root@lab-a ~] # grep -s version /proc/drbd
version: 9.1.7 (api:2/proto:110-121)
[root@lab-a ~] # drbdadm status
home1 role:Primary
  disk:UpToDate
  lab-b.dattaweb.com role:Secondary
    peer-disk:UpToDate

home2 role:Primary
  disk:UpToDate
  lab-b.dattaweb.com role:Secondary
    peer-disk:UpToDate
```

```
[root@lab-b ~] # grep -s version /proc/drbd
version: 9.1.10 (api:2/proto:86-121)
[root@lab-b ~] # drbdadm status
home1 role:Secondary
  disk:UpToDate
  lab-a.dattaweb.com role:Primary
    peer-disk:UpToDate

home2 role:Secondary
  disk:UpToDate
  lab-a.dattaweb.com role:Primary
    peer-disk:UpToDate
```

Thanks!
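When checking which nodes still run the affected module, note that "9.1.10" sorts before "9.1.8" as a plain string, so versions should be compared numerically. The Python sketch below parses the `version:` line shown by `grep -s version /proc/drbd` into a tuple. The names `parse_drbd_version` and `REPORTED_BAD` are hypothetical, and treating only 9.1.8 as affected is an assumption based on what was tested in this ticket.

```python
import re

# Assumption: only 9.1.8 was observed failing in this ticket.
REPORTED_BAD = (9, 1, 8)


def parse_drbd_version(proc_drbd_text):
    """Extract (major, minor, patch) from a '/proc/drbd' style 'version:' line."""
    m = re.search(r"^version:\s*(\d+)\.(\d+)\.(\d+)", proc_drbd_text, re.MULTILINE)
    if m is None:
        raise ValueError("no 'version: X.Y.Z' line found")
    return tuple(int(part) for part in m.groups())


# Version lines copied from the output in this ticket.
old = parse_drbd_version("version: 9.1.7 (api:2/proto:110-121)")
new = parse_drbd_version("version: 9.1.10 (api:2/proto:86-121)")

print(old, new)
print("new is later than the reported bad version:", new > REPORTED_BAD)
print("naive string comparison gets it wrong:", "9.1.10" < "9.1.8")
```

Tuples compare element by element, which is why `(9, 1, 10)` correctly ranks after `(9, 1, 8)`.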
|
@richarson Thank you for reporting back. kmod-drbd90-9.1.10-1.el7_9.elrepo has been moved to the main repository. There is now an updated version, drbd90-kmod-9.1.11-1.el7_9.elrepo, in the testing repository. It would be great if you could give it a try. |
|
I've promoted -9.1.11 to the main repo. |
|
Closing as 'resolved'. If you find any issue, please open a new ticket. |
Date Modified | Username | Field | Change |
---|---|---|---|
2022-08-09 15:02 | richarson | New Issue | |
2022-08-09 15:02 | richarson | Status | new => assigned |
2022-08-09 15:02 | richarson | Assigned To | => toracat |
2022-08-09 15:06 | toracat | Note Added: 0008510 | |
2022-08-09 15:10 | toracat | Note Added: 0008511 | |
2022-08-09 15:14 | toracat | Note Added: 0008512 | |
2022-08-09 15:25 | richarson | Note Added: 0008513 | |
2022-08-17 12:28 | toracat | Note Added: 0008529 | |
2022-08-17 12:38 | toracat | Note Added: 0008531 | |
2022-08-17 17:16 | richarson | Note Added: 0008532 | |
2022-08-17 17:17 | richarson | Note Added: 0008533 | |
2022-08-20 14:58 | toracat | Note Added: 0008538 | |
2022-08-22 16:22 | richarson | Note Added: 0008539 | |
2022-08-22 16:54 | toracat | Note Added: 0008540 | |
2022-08-22 17:16 | toracat | Note Added: 0008541 | |
2022-08-22 17:17 | toracat | Status | assigned => feedback |
2022-08-23 17:23 | richarson | Note Added: 0008542 | |
2022-08-23 17:23 | richarson | Status | feedback => assigned |
2022-08-23 19:03 | toracat | Note Added: 0008543 | |
2022-08-23 19:29 | richarson | Note Added: 0008544 | |
2022-09-01 12:30 | richarson | Note Added: 0008562 | |
2022-09-01 12:42 | toracat | Note Added: 0008564 | |
2022-09-01 13:11 | toracat | Note Added: 0008565 | |
2022-09-01 19:44 | richarson | Note Added: 0008566 | |
2022-09-05 18:52 | richarson | Note Added: 0008575 | |
2022-10-04 19:05 | toracat | Note Added: 0008686 | |
2022-10-10 05:16 | toracat | Note Added: 0008709 | |
2022-10-10 05:18 | toracat | Status | assigned => resolved |
2022-10-10 05:18 | toracat | Resolution | open => fixed |
2022-10-10 05:18 | toracat | Note Added: 0008710 |