View Issue Details

ID: 0001250
Project: channel: elrepo/el7
Category: kmod-drbd90
View Status: public
Last Update: 2022-10-10 05:18
Reporter: richarson
Assigned To: toracat
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Summary: 0001250: Revert kmod-drbd90 to 9.1.7 - 9.1.8 can't see metadata from earlier versions
Description: I had a couple of CentOS 7.9 servers running DRBD 9.1.7. I upgraded one to 9.1.8 and after the reboot it stayed in Diskless mode. dmesg showed lines like these:

[Wed Aug 3 14:22:56 2022] drbd disk1/0 drbd1: drbd_md_sync_page_io(,1875385000s,READ) failed with error -5
[Wed Aug 3 14:22:56 2022] drbd disk1/0 drbd1: Error while reading metadata.

After downgrading to 9.1.7 everything went back to normal.

This happened again yesterday on another pair of servers, this time under AlmaLinux 8.6
Steps To Reproduce:
- Install a couple of machines with CentOS 7.9 or Alma/Rocky/etc. 8.6 and kmod-drbd90 < 9.1.8
- Create a resource and let it sync
- Upgrade kmod-drbd90 to 9.1.8 on one machine and reboot it
- Check drbd status with `drbdadm status`
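
The last step can be scripted. What follows is only a minimal sketch, assuming `drbdadm status` prints one unindented line per resource with the local `disk:` state indented below it. The function name `diskless_resources`, the sample text and the peer name `peer` are hypothetical and used for illustration only.

```python
def diskless_resources(status_text):
    """Return the resources whose local disk state is Diskless."""
    diskless = []
    current = None  # name of the resource block currently being read
    for line in status_text.splitlines():
        if not line.strip():
            continue  # blank line between resource blocks
        if not line[0].isspace():
            # Unindented line, e.g. "disk1 role:Secondary"
            current = line.split()[0]
        elif current is not None and "disk:Diskless" in line.split():
            # Whole-token match, so "peer-disk:Diskless" is not counted
            diskless.append(current)
    return diskless


# Hypothetical sample in the assumed layout
SAMPLE = """\
disk1 role:Secondary
  disk:Diskless
  peer role:Primary
    peer-disk:UpToDate
"""

print(diskless_resources(SAMPLE))  # prints ['disk1']
```

In practice the text could come from `subprocess.run(["drbdadm", "status"], capture_output=True, text=True).stdout`. A non-empty result after the reboot would indicate the problem described here.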
Additional Information: I haven't yet reported it upstream, nor have I tried creating a resource with 9.1.8 installed on both machines.
Tags: No tags attached.
Reported upstream

Activities

toracat

2022-08-09 15:06

administrator   ~0008510

Thanks for the report. We are going to remove kmod-drbd90 version 9.1.8 until this issue is resolved.

toracat

2022-08-09 15:10

administrator   ~0008511

Also, kmod-drbd9x-9.1.8 for el9 has been removed.

toracat

2022-08-09 15:14

administrator   ~0008512

And el8 as well.

richarson

2022-08-09 15:25

reporter   ~0008513

Wow, thanks for such a fast response!

toracat

2022-08-17 12:28

administrator   ~0008529

@richarson

Does 9.1.7 or older work for you? I ask because the following bug has just been filed:

https://elrepo.org/bugs/view.php?id=1253

toracat

2022-08-17 12:38

administrator   ~0008531

Found the original report:

https://github.com/LINBIT/drbd/issues/26

richarson

2022-08-17 17:16

reporter   ~0008532

Hi,

I've had no issues with different DRBD versions up to 9.1.7, but I'm using regular partitions, with no LVM or RAID below DRBD (actually, LVM on top of DRBD).

The other bug report seems to indicate something related to the block size of either LVM or RAID, right?:

make_request bug: can't convert block across chunks or bigger than 512k 2755544 32

Anyway, I believe that reverting to 9.1.4 (if needed) is not going to cause any problems for us.

richarson

2022-08-17 17:17

reporter   ~0008533

Oh, my upstream report:

https://github.com/LINBIT/drbd/issues/45

toracat

2022-08-20 14:58

administrator   ~0008538

@richarson

According to the upstream report you filed, they came up with a patch that supposedly fixes the issue.

I've rebuilt kmod-drbd90 for el8.6 with the patch applied and released it to the elrepo-testing repository:

kmod-drbd90-9.1.8-2.el8_6.elrepo.x86_64.rpm

Can you give it a try on your AlmaLinux 8 box?

richarson

2022-08-22 16:22

reporter   ~0008539

Thanks, those are production machines but I'll see if I can test it somewhere.

Actually, if you can rebuild it for el 7.9 I have 2 test servers available.

toracat

2022-08-22 16:54

administrator   ~0008540

I will build the patched version for el7 shortly.

toracat

2022-08-22 17:16

administrator   ~0008541

kmod-drbd90-9.1.8-2.el7_9.elrepo.x86_64.rpm has just been released to the elrepo-testing repository. Will show up on our mirror sites soon.

richarson

2022-08-23 17:23

reporter   ~0008542

Same issue with this version:

```
[root@lab-b ~] # grep -s version /proc/drbd
version: 9.1.7 (api:2/proto:110-121)
```

```
[root@lab-a ~] # grep -s version /proc/drbd
version: 9.1.8 (api:2/proto:86-121)
```

```
[root@lab-a ~] # rpm -q kmod-drbd90
kmod-drbd90-9.1.8-2.el7_9.elrepo.x86_64
```

```
[root@lab-a ~] # drbdadm status
home1 role:Secondary
  disk:Diskless
  lab-b role:Primary
    peer-disk:UpToDate

home2 role:Secondary
  disk:Diskless
  lab-b role:Primary
    peer-disk:UpToDate
```

```
[mar ago 23 18:15:48 2022] drbd home1: Starting worker thread (from drbdsetup [19656])
[mar ago 23 18:15:48 2022] drbd home2: Starting worker thread (from drbdsetup [19658])
[mar ago 23 18:15:48 2022] drbd home1 lab-b: Starting sender thread (from drbdsetup [19686])
[mar ago 23 18:15:48 2022] drbd home2 lab-b: Starting sender thread (from drbdsetup [19691])
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: meta-data IO uses: blk-bio
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: drbd_md_sync_page_io(,1953525160s,READ) failed with error -5
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: Error while reading metadata.
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: meta-data IO uses: blk-bio
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: drbd_md_sync_page_io(,1953525160s,READ) failed with error -5
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: Error while reading metadata.
[mar ago 23 18:15:48 2022] drbd home1 lab-b: conn( StandAlone -> Unconnected )
[mar ago 23 18:15:48 2022] drbd home1 lab-b: Starting receiver thread (from drbd_w_home1 [19657])
[mar ago 23 18:15:48 2022] drbd home1 lab-b: conn( Unconnected -> Connecting )
[mar ago 23 18:15:48 2022] drbd home2 lab-b: conn( StandAlone -> Unconnected )
[mar ago 23 18:15:48 2022] drbd home2 lab-b: Starting receiver thread (from drbd_w_home2 [19659])
[mar ago 23 18:15:48 2022] drbd home2 lab-b: conn( Unconnected -> Connecting )
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Handshake to peer 1 successful: Agreed network protocol version 121
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Starting ack_recv thread (from drbd_r_home1 [19733])
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Handshake to peer 1 successful: Agreed network protocol version 121
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Starting ack_recv thread (from drbd_r_home2 [19735])
[mar ago 23 18:15:49 2022] drbd home1: Preparing cluster-wide state change 3641581203 (0->1 499/146)
[mar ago 23 18:15:49 2022] drbd home2: Preparing cluster-wide state change 3975294569 (0->1 499/146)
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3: disabling discards due to peer capabilities
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2: disabling discards due to peer capabilities
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3: size = 927 GB (971649028 KB)
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2: size = 927 GB (971649028 KB)
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: my exposed UUID: 0000000000000000
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: peer 3B29AC44594D67A0:0000000000000000:69CC366B62245032:E3DD42074C878166 bits:0 flags:120
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: my exposed UUID: 0000000000000000
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: peer 0C1A198B548AD30C:0000000000000000:4E94AFC73414BC9A:2A8E3B7E31F66CD8 bits:0 flags:120
[mar ago 23 18:15:49 2022] drbd home1: State change 3641581203: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[mar ago 23 18:15:49 2022] drbd home1: Committing cluster-wide state change 3641581203 (343ms)
[mar ago 23 18:15:49 2022] drbd home1 lab-b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[mar ago 23 18:15:49 2022] drbd home2: State change 3975294569: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[mar ago 23 18:15:49 2022] drbd home2: Committing cluster-wide state change 3975294569 (362ms)
[mar ago 23 18:15:49 2022] drbd home2 lab-b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
```
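
The failing lines in a log like the one above can be pulled out mechanically. This is a rough sketch that assumes the message format seen in this ticket. The regular expression and the function name `metadata_io_failures` are illustrative assumptions, not part of DRBD. Error `-5` corresponds to `EIO` (I/O error) on Linux.

```python
import errno
import re

# Pattern modelled on the messages quoted in this ticket (an assumption,
# not an official DRBD log format specification).
MD_IO_FAILURE = re.compile(
    r"drbd (?P<resource>\S+)/(?P<volume>\d+) (?P<device>drbd\d+): "
    r"drbd_md_sync_page_io\(,(?P<sector>\d+)s,(?P<op>READ|WRITE)\) "
    r"failed with error (?P<err>-?\d+)"
)


def metadata_io_failures(log_text):
    """Return one dict per failed metadata I/O found in the log text."""
    failures = []
    for m in MD_IO_FAILURE.finditer(log_text):
        code = abs(int(m.group("err")))
        failures.append({
            "resource": m.group("resource"),
            "volume": int(m.group("volume")),
            "device": m.group("device"),
            "sector": int(m.group("sector")),
            "op": m.group("op"),
            # Map the numeric code to its symbolic name, e.g. 5 -> "EIO"
            "errno": errno.errorcode.get(code, "unknown"),
        })
    return failures


line = ("[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: "
        "drbd_md_sync_page_io(,1953525160s,READ) failed with error -5")
for failure in metadata_io_failures(line):
    print(failure["resource"], failure["device"], failure["sector"], failure["errno"])
```

Feeding it the output of `dmesg` would list every resource whose metadata read failed, which is what leaves the local disk `Diskless` in the status output.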

toracat

2022-08-23 19:03

administrator   ~0008543

Hmmm, that means the proposed patch did not fix the issue.

Let's wait for the next update and see how it goes.

richarson

2022-08-23 19:29

reporter   ~0008544

It appears so :/

Already reported upstream.

richarson

2022-09-01 12:30

reporter   ~0008562

From upstream:

"Seems like something went wrong with the rebuild there. I suspect the kernel compatibility patch cache was not updated correctly.

We have internally verified that this issue is fixed with >9.1.9. Please try with 9.1.10 and see if that works for you."

toracat

2022-09-01 12:42

administrator   ~0008564

OK, I see that 9.1.10 has just been released. Will package it shortly.

toracat

2022-09-01 13:11

administrator   ~0008565

kmod-drbd90-9.1.10-1.el7_9.elrepo.x86_64.rpm has been built and set to sync to our mirrors.

richarson

2022-09-01 19:44

reporter   ~0008566

Great, thanks!

I'll be sure to test it tomorrow.

richarson

2022-09-05 18:52

reporter   ~0008575

Sorry for not replying sooner, I didn't have time to test it until now.

Version 9.1.10 seems to work fine with 9.1.7:

```
[root@lab-a ~] # grep -s version /proc/drbd
version: 9.1.7 (api:2/proto:110-121)

[root@lab-a ~] # drbdadm status
home1 role:Primary
  disk:UpToDate
  lab-b.dattaweb.com role:Secondary
    peer-disk:UpToDate

home2 role:Primary
  disk:UpToDate
  lab-b.dattaweb.com role:Secondary
    peer-disk:UpToDate
```

```
[root@lab-b ~] # grep -s version /proc/drbd
version: 9.1.10 (api:2/proto:86-121)

[root@lab-b ~] # drbdadm status
home1 role:Secondary
  disk:UpToDate
  lab-a.dattaweb.com role:Primary
    peer-disk:UpToDate

home2 role:Secondary
  disk:UpToDate
  lab-a.dattaweb.com role:Primary
    peer-disk:UpToDate
```
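
For scripted checks of mixed-version pairs like this one, the `/proc/drbd` version lines can be parsed as below. It is a sketch only: the helper names are hypothetical, and treating the agreed protocol as the highest version both ranges share is an assumption. That assumption matches the "Agreed network protocol version 121" message in the earlier kernel log but is not confirmed here as DRBD's rule.

```python
import re

# Format taken from the /proc/drbd lines quoted in this ticket
VERSION_LINE = re.compile(
    r"version: (?P<version>\d+(?:\.\d+)*) "
    r"\(api:(?P<api>\d+)/proto:(?P<lo>\d+)-(?P<hi>\d+)\)"
)


def parse_version(line):
    """Return ((major, minor, patch), (proto_min, proto_max)) from a version line."""
    m = VERSION_LINE.search(line)
    if m is None:
        raise ValueError("not a /proc/drbd version line: %r" % line)
    version = tuple(int(part) for part in m.group("version").split("."))
    return version, (int(m.group("lo")), int(m.group("hi")))


def highest_common_protocol(range_a, range_b):
    """Return the highest protocol version in both ranges, or None if disjoint."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return high if low <= high else None


ver_a, proto_a = parse_version("version: 9.1.7 (api:2/proto:110-121)")
ver_b, proto_b = parse_version("version: 9.1.10 (api:2/proto:86-121)")
print(ver_b > ver_a)                              # prints True
print(highest_common_protocol(proto_a, proto_b))  # prints 121
```

Versions are compared as integer tuples because a plain string comparison would rank "9.1.10" below "9.1.7".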

Thanks!

toracat

2022-10-04 19:05

administrator   ~0008686

@richarson

Thank you for reporting back.

kmod-drbd90-9.1.10-1.el7_9.elrepo has been moved to the main repository.

There is now an updated version, drbd90-kmod-9.1.11-1.el7_9.elrepo, in the testing repository. It would be great if you could give it a try.

toracat

2022-10-10 05:16

administrator   ~0008709

I've promoted -9.1.11 to the main repo.

toracat

2022-10-10 05:18

administrator   ~0008710

Closing as 'resolved'. If you find any issue, please open a new ticket.

Issue History

Date Modified Username Field Change
2022-08-09 15:02 richarson New Issue
2022-08-09 15:02 richarson Status new => assigned
2022-08-09 15:02 richarson Assigned To => toracat
2022-08-09 15:06 toracat Note Added: 0008510
2022-08-09 15:10 toracat Note Added: 0008511
2022-08-09 15:14 toracat Note Added: 0008512
2022-08-09 15:25 richarson Note Added: 0008513
2022-08-17 12:28 toracat Note Added: 0008529
2022-08-17 12:38 toracat Note Added: 0008531
2022-08-17 17:16 richarson Note Added: 0008532
2022-08-17 17:17 richarson Note Added: 0008533
2022-08-20 14:58 toracat Note Added: 0008538
2022-08-22 16:22 richarson Note Added: 0008539
2022-08-22 16:54 toracat Note Added: 0008540
2022-08-22 17:16 toracat Note Added: 0008541
2022-08-22 17:17 toracat Status assigned => feedback
2022-08-23 17:23 richarson Note Added: 0008542
2022-08-23 17:23 richarson Status feedback => assigned
2022-08-23 19:03 toracat Note Added: 0008543
2022-08-23 19:29 richarson Note Added: 0008544
2022-09-01 12:30 richarson Note Added: 0008562
2022-09-01 12:42 toracat Note Added: 0008564
2022-09-01 13:11 toracat Note Added: 0008565
2022-09-01 19:44 richarson Note Added: 0008566
2022-09-05 18:52 richarson Note Added: 0008575
2022-10-04 19:05 toracat Note Added: 0008686
2022-10-10 05:16 toracat Note Added: 0008709
2022-10-10 05:18 toracat Status assigned => resolved
2022-10-10 05:18 toracat Resolution open => fixed
2022-10-10 05:18 toracat Note Added: 0008710