View Issue Details

ID: 0001488
Project: channel: elrepo/el9
Category: drbd9x-utils
View Status: public
Last Update: 2024-10-31 10:28
Reporter: anenni
Assigned To: toracat
Priority: normal
Severity: major
Reproducibility: always
Status: acknowledged
Resolution: open
Platform: Linux
OS: rhel
OS Version: 9.4
Summary: 0001488: pacemaker-schedulerd warning: Unexpected result (error: Resource agent did not complete within 1m40s)
Description: When restarting the DRBD standby node, the Pacemaker stop procedure always fails and leaves the node unusable for a long time, with this message repeated many times in the logs:

pacemaker-schedulerd warning: Unexpected result (error: Resource agent did not complete within 1m40s)

After many minutes the node finally reboots, unless, of course, you configure fencing to kill the node; but that is not acceptable as a routine measure.

Steps To Reproduce: Just reboot the DRBD unpromoted (slave) server.

By contrast, stopping Pacemaker first and then rebooting the node works flawlessly.
Additional Information: Using drbd9x-utils-9.28.0-1.el9.elrepo.x86_64.
Tried kmod-drbd9x 9.1.22, 9.1.21 and 9.1.20.

SELinux is permissive.

The resource configuration follows the latest LINBIT RHEL 9 / DRBD 9 documentation:

Clone: users_drbd-clone
  Meta Attributes: users_drbd-clone-meta_attributes
    clone-max=2
    clone-node-max=1
    notify=true
    promotable=true
    promoted-max=1
    promoted-node-max=1
  Resource: users_drbd (class=ocf provider=linbit type=drbd)
    Attributes: users_drbd-instance_attributes
      drbd_resource=users
    Operations:
      demote: users_drbd-demote-interval-0s
        interval=0s timeout=90
      monitor: users_drbd-monitor-interval-29s
        interval=29s timeout=20s role=Promoted
      monitor: users_drbd-monitor-interval-31s
        interval=31s timeout=20s role=Unpromoted
      notify: users_drbd-notify-interval-0s
        interval=0s timeout=90
      promote: users_drbd-promote-interval-0s
        interval=0s timeout=90
      reload: users_drbd-reload-interval-0s
        interval=0s timeout=30
      start: users_drbd-start-interval-0s
        interval=0s timeout=240
      stop: users_drbd-stop-interval-0s
        interval=0s timeout=100
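The "1m40s" in the error message appears to equal the 100-second stop timeout configured above (Pacemaker treats a timeout with no unit, such as timeout=100, as seconds), which suggests it is the stop operation that runs past its limit. A minimal Python sketch of that arithmetic, assuming a simplified duration syntax of whole h/m/s units; `to_seconds` is a hypothetical helper, not part of Pacemaker or pcs:

```python
import re

# Seconds per unit for the simplified duration syntax assumed here.
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(value: str) -> int:
    """Convert '100', '20s' or '1m40s' to seconds (hypothetical helper)."""
    if value.isdigit():
        # A bare number is treated as seconds.
        return int(value)
    parts = re.findall(r"(\d+)([hms])", value)
    # Reject input containing anything besides <digits><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


# Non-monitor operation timeouts from the users_drbd configuration above.
op_timeouts = {"demote": "90", "notify": "90", "promote": "90",
               "reload": "30", "start": "240", "stop": "100"}

reported = to_seconds("1m40s")  # from the pacemaker-schedulerd warning
matches = [op for op, t in op_timeouts.items() if to_seconds(t) == reported]
print(reported, matches)  # prints: 100 ['stop']
```

The match only shows that the numbers agree; it does not explain why the stop action hangs on 9.28.0.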
Tags: No tags attached.

Activities

anenni

2024-10-28 15:10

reporter   ~0010165

These are the logs from the system console while trying to reboot:
drbdreboot.jpg (222,900 bytes)   

anenni

2024-10-28 15:13

reporter   ~0010166

We have a very similar setup on a RHEL 8.9 system, with slightly older versions, that has no problems at all:

drbd90-utils-9.27.0-1.el8.elrepo.x86_64
kmod-drbd90-9.1.19-1.el8_9.elrepo.x86_64

toracat

2024-10-28 15:19

administrator   ~0010167

Acknowledged.

anenni

2024-10-28 15:34

reporter   ~0010168

In fact, after downgrading to this almost identical combination, everything started working as expected:

kmod-drbd9x-9.1.19-2.el9_4.elrepo.x86_64
drbd9x-utils-9.27.0-1.el9.elrepo.x86_64

anenni

2024-10-28 15:39

reporter   ~0010169

A lot has changed in the latest utils:

https://github.com/LINBIT/drbd-utils/blob/master/ChangeLog

9.28.0
-----------
 * events2: set may_promote:no promotion_score:0 while
   force-io-failure:yes
 * drbdsetup,v9: show TLS in connection status
 * drbdsetup,v9: add udev command
 * 8.3: remove
 * crm-fence-peer.9.sh: fixes for pacemaker 2.1.7
 * events2: improved out of order message handling

anenni

2024-10-28 15:56

reporter   ~0010170

I ended up picking a kmod-drbd9x version older than drbd9x-utils-9.28.0-1.el9.elrepo.x86_64, just to be on the safe side with a reasonably common combination:

    drbd9x-utils-9.27.0-1.el9.elrepo.x86_64.rpm     2023-12-23 12:41  1.0M
    drbd9x-utils-9.28.0-1.el9.elrepo.x86_64.rpm     2024-05-11 20:36  886K

    kmod-drbd9x-9.1.19-2.el9_4.elrepo.x86_64.rpm    2024-05-01 17:03  400K
    kmod-drbd9x-9.1.20-1.el9_4.elrepo.x86_64.rpm    2024-05-13 18:59  402K
    kmod-drbd9x-9.1.21-1.el9_4.elrepo.x86_64.rpm    2024-06-08 18:33  402K
    kmod-drbd9x-9.1.22-1.el9_4.elrepo.x86_64.rpm    2024-08-12 18:48  403K

anenni

2024-10-28 16:09

reporter   ~0010171

It seems a lot has changed in 9.28.0; the release announcement says:

https://lists.linbit.com/pipermail/drbd-announce/2024-May/000728.html

"In contrast to the recent releases this one contains a bit more exciting
news:"

toracat

2024-10-28 16:24

administrator   ~0010172

Thank you. I was about to suggest trying drbd9x-utils-9.27.0.

By the way, 9.29.0 is on its way. The changelog is:

9.29.0-rc.1
-----------
 * drbdmeta: fix initialization for external md
 * build: allow disabling keyutils
 * tests: export sanitized environment
 * drbdmon: various improvements
 * build: add cyclonedx
 * drbsetup,v9: fix multiple paths drbdsetup show --json
   strictly speaking breaking change, but maily used internally
 * events2: expose if device is open
 * drbdadm: fix undefined behavior that triggered on amd64
 * shared: fix out-of-bounds access in parsing
 * drbsetup,v9: event consistency with peer devices
 * drbdadm: fix parsing of v8.4 configs for compatibility
 * drbdmeta: fix segfault for check-resize on intentionally diskless
 * drbd-promote@.service: check if ExecCondition is available

anenni

2024-10-28 16:32

reporter   ~0010173

Thank you, but I have to go into production, so if this proves stable I'll stop here for a while.

It could also be something RHEL 9-specific.

Regards.

toracat

2024-10-28 17:56

administrator   ~0010174

Understood. Best to stay with what is proven to work.

anenni

2024-10-29 13:50

reporter   ~0010176

I'll leave testing for the next cluster.
In the meantime, maybe it's better to retire this version.

anenni

2024-10-29 14:37

reporter   ~0010177

Also, RHEL 9.5 should be out soon, so some testing will be necessary.

toracat

2024-10-31 10:28

administrator   ~0010178

drbd9x-utils-9.29.0-1.el9.elrepo.x86_64.rpm is out.

Issue History

Date Modified Username Field Change
2024-10-28 15:06 anenni New Issue
2024-10-28 15:06 anenni Status new => assigned
2024-10-28 15:06 anenni Assigned To => toracat
2024-10-28 15:10 anenni Note Added: 0010165
2024-10-28 15:10 anenni File Added: drbdreboot.jpg
2024-10-28 15:13 anenni Note Added: 0010166
2024-10-28 15:19 toracat Status assigned => acknowledged
2024-10-28 15:19 toracat Note Added: 0010167
2024-10-28 15:34 anenni Note Added: 0010168
2024-10-28 15:39 anenni Note Added: 0010169
2024-10-28 15:56 anenni Note Added: 0010170
2024-10-28 16:09 anenni Note Added: 0010171
2024-10-28 16:24 toracat Note Added: 0010172
2024-10-28 16:32 anenni Note Added: 0010173
2024-10-28 17:56 toracat Note Added: 0010174
2024-10-29 13:50 anenni Note Added: 0010176
2024-10-29 14:37 anenni Note Added: 0010177
2024-10-31 10:28 toracat Note Added: 0010178