View Issue Details

ID: 0000829
Project: channel: kernel/el7
Category: regressions
View Status: public
Last Update: 2018-02-23 13:53
Reporter: chiluk
Assigned To: burakkucat
Priority: high
Severity: crash
Reproducibility: random
Status: resolved
Resolution: fixed
Summary: 0000829: XFS data corruption due to incorrect xfs_agfl_t size after upgrade to kernel-ml or kernel-lt
Description: We eventually hit a crash in XFS after formatting the filesystem using xfsprogs-4.5.0-12.el7.x86_64 on the stock 3.10 kernel and then upgrading to the elrepo kernels. Both kernel-ml and kernel-lt are affected. We have root-caused this to the fact that kernels from 4.5 onward changed the size of the xfs_agfl_t structure (commit 96f859d52bcb, "libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct"). The solution is therefore either to revert that commit in the elrepo kernels or to ship a version of xfsprogs with elrepo that includes the packed structure. Either way, the userspace xfsprogs and the in-kernel structure size must match, which they currently do not.

Personally, I think the correct thing to do is to revert the structure size change in the kernel, because changing the structure size in xfsprogs instead would require xfs_repair to be run against every XFS volume formatted with the old mkfs.xfs. Either way, I think it is prudent to run a size-matching xfs_repair on any machine that may have had mismatched structure sizes.

Here's a representative stack trace we were getting.
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
kernel: XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file
fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs]
kernel: CPU: 18 PID: 9896 Comm: mesos-slave Not tainted
4.10.10-1.el7.elrepo.x86_64 #1
kernel: Hardware name: Supermicro PIO-618U-TR4T+-ST031/X10DRU-i+, BIOS 2.0
12/17/2015
kernel: Call Trace:
kernel: dump_stack+0x63/0x87
kernel: xfs_error_report+0x3b/0x40 [xfs]
kernel: ? xfs_free_ag_extent+0x35d/0x7a0 [xfs]
kernel: xfs_btree_insert+0x1b0/0x1c0 [xfs]
kernel: xfs_free_ag_extent+0x35d/0x7a0 [xfs]
kernel: xfs_free_extent+0xbb/0x150 [xfs]
kernel: xfs_trans_free_extent+0x4f/0x110 [xfs]
kernel: ? xfs_trans_add_item+0x5d/0x90 [xfs]
kernel: xfs_extent_free_finish_item+0x26/0x40 [xfs]
kernel: xfs_defer_finish+0x149/0x410 [xfs]
kernel: xfs_remove+0x281/0x330 [xfs]
kernel: xfs_vn_unlink+0x55/0xa0 [xfs]
kernel: vfs_rmdir+0xb6/0x130
kernel: do_rmdir+0x1b3/0x1d0
kernel: SyS_rmdir+0x16/0x20
kernel: do_syscall_64+0x67/0x180
kernel: entry_SYSCALL64_slow_path+0x25/0x25
kernel: RIP: 0033:0x7f85d8d92397
kernel: RSP: 002b:00007f85cef9b758 EFLAGS: 00000246 ORIG_RAX:
0000000000000054
kernel: RAX: ffffffffffffffda RBX: 00007f858c00b4c0 RCX: 00007f85d8d92397
kernel: RDX: 00007f858c09ad70 RSI: 0000000000000000 RDI: 00007f858c09ad70
kernel: RBP: 00007f85cef9bc30 R08: 0000000000000001 R09: 0000000000000002
kernel: R10: 0000006f74656c67 R11: 0000000000000246 R12: 00007f85cef9c640
kernel: R13: 00007f85cef9bc50 R14: 00007f85cef9bcc0 R15: 00007f85cef9bc40
kernel: XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 236 of file
fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffa028f087
kernel: XFS (dm-4): Corruption of in-memory data detected. Shutting down
filesystem
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Additional Information: Reproduction is fairly difficult, but we are hitting this in our 50-node cluster about once a week under our workloads.
Tags: No tags attached.
Attached Files
kernel-xfs.diff (5,503 bytes)   
Binary files rpmbuild.org/SOURCES/kernel-ml/linux-4.15.4.tar.xz and rpmbuild/SOURCES/kernel-ml/linux-4.15.4.tar.xz differ
diff -purN rpmbuild.org/SOURCES/kernel-ml/Revert-libxfs-pack-the-agfl-header-structure-so-XFS_.patch rpmbuild/SOURCES/kernel-ml/Revert-libxfs-pack-the-agfl-header-structure-so-XFS_.patch
--- rpmbuild.org/SOURCES/kernel-ml/Revert-libxfs-pack-the-agfl-header-structure-so-XFS_.patch	1969-12-31 18:00:00.000000000 -0600
+++ rpmbuild/SOURCES/kernel-ml/Revert-libxfs-pack-the-agfl-header-structure-so-XFS_.patch	2018-02-21 14:39:22.166117574 -0600
@@ -0,0 +1,88 @@
+From 53f25d944436500846f8bcdde587edf2654c93f2 Mon Sep 17 00:00:00 2001
+From: Dave Chiluk <dchiluk@indeed.com>
+Date: Wed, 21 Feb 2018 10:56:40 -0600
+Subject: [PATCH] Revert "libxfs: pack the agfl header structure so
+ XFS_AGFL_SIZE is correct"
+
+This reverts commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30.  This
+is necessary on RHEL kernels since the userspace xfsprogs and stock kernels
+have this structure unpacked.  Not reverting this fix can lead to corruption
+and the logs showing stack traces similar to the below.  The corruption
+specifically stems from the userspace xfsprogs which is used to format
+the volume using a different sized xfs_agfl_t structure than the 4.5+
+kernels.
+
+vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
+XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file
+fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs]
+kernel: CPU: 18 PID: 9896 Comm: host Not tainted
+4.10.10-1.el7.elrepo.x86_64 #1
+kernel: Hardware name: Supermicro PIO-618U-TR4T+-ST031/X10DRU-i+, BIOS 2.0
+12/17/2015
+kernel: Call Trace:
+kernel: dump_stack+0x63/0x87
+kernel: xfs_error_report+0x3b/0x40 [xfs]
+kernel: ? xfs_free_ag_extent+0x35d/0x7a0 [xfs]
+kernel: xfs_btree_insert+0x1b0/0x1c0 [xfs]
+kernel: xfs_free_ag_extent+0x35d/0x7a0 [xfs]
+kernel: xfs_free_extent+0xbb/0x150 [xfs]
+kernel: xfs_trans_free_extent+0x4f/0x110 [xfs]
+kernel: ? xfs_trans_add_item+0x5d/0x90 [xfs]
+kernel: xfs_extent_free_finish_item+0x26/0x40 [xfs]
+kernel: xfs_defer_finish+0x149/0x410 [xfs]
+kernel: xfs_remove+0x281/0x330 [xfs]
+kernel: xfs_vn_unlink+0x55/0xa0 [xfs]
+kernel: vfs_rmdir+0xb6/0x130
+kernel: do_rmdir+0x1b3/0x1d0
+kernel: SyS_rmdir+0x16/0x20
+kernel: do_syscall_64+0x67/0x180
+kernel: entry_SYSCALL64_slow_path+0x25/0x25
+kernel: RIP: 0033:0x7f85d8d92397
+kernel: RSP: 002b:00007f85cef9b758 EFLAGS: 00000246 ORIG_RAX:
+0000000000000054
+kernel: RAX: ffffffffffffffda RBX: 00007f858c00b4c0 RCX: 00007f85d8d92397
+kernel: RDX: 00007f858c09ad70 RSI: 0000000000000000 RDI: 00007f858c09ad70
+kernel: RBP: 00007f85cef9bc30 R08: 0000000000000001 R09: 0000000000000002
+kernel: R10: 0000006f74656c67 R11: 0000000000000246 R12: 00007f85cef9c640
+kernel: R13: 00007f85cef9bc50 R14: 00007f85cef9bcc0 R15: 00007f85cef9bc40
+kernel: XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 236 of file
+fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffa028f087
+kernel: XFS (dm-4): Corruption of in-memory data detected. Shutting down
+filesystem
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Signed-off-by: Dave Chiluk <dchiluk@indeed.com>
+---
+ fs/xfs/libxfs/xfs_format.h | 2 +-
+ fs/xfs/xfs_ondisk.h        | 2 +-
+ 2 files changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
+index 1acb584fc5f7..e65a9ac8c294 100644
+--- a/fs/xfs/libxfs/xfs_format.h
++++ b/fs/xfs/libxfs/xfs_format.h
+@@ -821,7 +821,7 @@ typedef struct xfs_agfl {
+ 	__be64		agfl_lsn;
+ 	__be32		agfl_crc;
+ 	__be32		agfl_bno[];	/* actually XFS_AGFL_SIZE(mp) */
+-} __attribute__((packed)) xfs_agfl_t;
++} xfs_agfl_t;
+ 
+ #define XFS_AGFL_CRC_OFF	offsetof(struct xfs_agfl, agfl_crc)
+ 
+diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
+index 0492436a053f..4379e642f6c6 100644
+--- a/fs/xfs/xfs_ondisk.h
++++ b/fs/xfs/xfs_ondisk.h
+@@ -34,7 +34,7 @@ xfs_check_ondisk_structs(void)
+ 	XFS_CHECK_STRUCT_SIZE(struct xfs_acl,			4);
+ 	XFS_CHECK_STRUCT_SIZE(struct xfs_acl_entry,		12);
+ 	XFS_CHECK_STRUCT_SIZE(struct xfs_agf,			224);
+-	XFS_CHECK_STRUCT_SIZE(struct xfs_agfl,			36);
++	// XFS_CHECK_STRUCT_SIZE(struct xfs_agfl,			36);
+ 	XFS_CHECK_STRUCT_SIZE(struct xfs_agi,			336);
+ 	XFS_CHECK_STRUCT_SIZE(struct xfs_bmbt_key,		8);
+ 	XFS_CHECK_STRUCT_SIZE(struct xfs_bmbt_rec,		16);
+-- 
+2.16.2
+
Binary files rpmbuild.org/SPECS/.haproxy.spec.swp and rpmbuild/SPECS/.haproxy.spec.swp differ
diff -purN rpmbuild.org/SPECS/kernel-ml-4.15.spec rpmbuild/SPECS/kernel-ml-4.15.spec
--- rpmbuild.org/SPECS/kernel-ml-4.15.spec	2018-02-21 11:47:36.363017912 -0600
+++ rpmbuild/SPECS/kernel-ml-4.15.spec	2018-02-21 14:50:08.506174862 -0600
@@ -147,6 +147,12 @@ Source1: config-%{version}-x86_64
 Source2: cpupower.service
 Source3: cpupower.config
 
+# Patches
+# It is necessary to revert this to maintain compatibility with the RHEL/Centos provided
+# xfsprogs, and any kernels that were formatted with the stock kernel and then moved to
+# the ELREPO kernel.
+Patch0: Revert-libxfs-pack-the-agfl-header-structure-so-XFS_.patch
+
 # Do not package the source tarball.
 NoSource: 0
 
@@ -270,6 +276,9 @@ libraries, derived from the kernel sourc
 
 %prep
 %setup -q -n %{name}-%{version} -c
+pushd linux-%{LKAver} > /dev/null
+%patch0 -p1
+popd > /dev/null
 %{__mv} linux-%{LKAver} linux-%{version}-%{release}.%{_target_cpu}
 
 pushd linux-%{version}-%{release}.%{_target_cpu} > /dev/null
irc-#xfs.txt (4,788 bytes)   
<chiluk> hey dchinner... I'm following up on https://www.spinics.net/lists/linux-xfs/msg13544.html
<chiluk> We were finally able to run xfs_repair on one of the bad volumes and got this .... http://paste.ubuntu.com/26315462/
<chiluk> I'll follow up in the thread as well.
<chiluk> Is there an xfs off by one bug that was fixed in the mainline/stable kernels since 4.10.10  that resolves freeblk count 3 != flcount 4 in ag 1012 ?
<chiluk> What typically causes those errors.
<chiluk> fyi... I've looked but haven't seen anything obvious.. I was hoping you guys had an idea.
<djwong> v5 fs?
<djwong> possibly created before 4.5?
<djwong> if so, then possibly
<chiluk> possible ... can you elaborate?
<-- yocum (~yocum@70.59.184.10) has quit (Quit: Remote host closed the connection)
<chiluk> it was running 4.10.10 when we hit this.
<dchinner> chiluk: where did the filesystem come from?
<chiluk> how do I check fs version?
<dchinner> what xfsprogs version are you using?
<chiluk> dchinner I'm not sure what you are asking... when the machine was deployed it was deployed on a 2.5 gb lvm volume, and then grown..
<chiluk> whatever comes with cent7
<dchinner> yeah, there's your problem
<chiluk> dchinner is there something you can point me to that outlines "my problem", commit id ...etc ?
<dchinner> if you are running centos, you need to use centos kernels and xfsprogs
<dchinner> if you are running a mainline kernel, you need to use mainline xfsprogs
<dchinner> this is the problematic issue: commit 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
<dchinner> that commit went into the 4.5 kernels
<djwong> (fubar'd structure padding means the agfl size changed in 4.5 on .... 64bit kernels?)
<chiluk> alright ... so that explains all the xfs_repair complaints
<chiluk> but that doesn't explain the 
<chiluk> XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file
<chiluk> fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs]
<dchinner> it hasn't been backported to RHEL/centos kernels for compatibility reasons
<dchinner> that could be anything
<chiluk> so the thought is that the filesystem gets created with default 3.10 cent7 kernel + cent7 mkfs... then upgraded to 4.10... and we start hitting this issue?  Am I understanding this?
<dchinner> that's the vector that can cause it
<dchinner> you need to run xfs_repair from xfsprogs >= 4.7.0 to fix it up
<chiluk> dchinner we are hitting this in our production cluster once a week or so.
<djwong> well... mounted with the default 3.10 kernel, then later remounted on 4.5+
<djwong> and, trickily, only if the active part of the agfl goes near the end
<chiluk> ok let me check to see if we are formatting using 3.10 first...
<chiluk> we might be deploying with 4.10 out of the gate.
<djwong> (just in case you're making xfs images with a system having a 3.10 kernel and then deploying them to machines that boot 4.10)
<chiluk> we are not... deploys are done using puppet + chef + kickstart.
<dchinner> yeah, it has nothing to do with the deployed kernel - it's about where the filesystem image being deployed was made in the first place
-*- dchinner needs to resurrect the old patches he had that automatically detected this condition and fixed it.
<dchinner> djwong: I suspect this is a good case for agfl scrub + repair at mount time :P
<chiluk> yeah I'd second that.
<sandeen> dchinner, urk sorry missed this, now need to run an errand, I'll try to catch you tonight.  mostly just wanted to coordinate on changes to your small mkfs series
<dchinner> (i.e. after the second phase of journal recovery, before EFIs and intents are processed)
<dchinner> sandeen: no worries
<chiluk> Alright so the recommendation would be to upgrade xfs_progs, and run xfs_repair after the kernel upgrade before the mount under the new kernel.
--> yocum (~yocum@2607:fb90:4b14:fba0:10b:5576:706e:1fa4) has joined #xfs
<dchinner> you don't need to upgrade the kernel to run the newer xfs_repair
<dchinner> just don't mount it on an older kernel after running the newer repair.
--> navidr (uid112413@gateway/web/irccloud.com/x-wxhyyksnbqkzqeia) has joined #xfs
<dchinner> djwong: I really need to go back and update and test those AGFL patches again :/
<djwong> yeah
<dchinner> I'm not even sure the approach I took in that patch set is the right way to do it anymore, either....
<djwong> at this point i suspect it might be easier to stuff it in scrub/agheader.c as one of the repair functions
<djwong> tbh as i read the other functions i started wondering if i ought to just shove it in the online repair patch set for 4.17
<djwong> then the only problem is, do we read/fix every agfl on every mount?
<djwong> (i guess at this point we dig through every AG's refcountbt on mount...)
irc-#xfs.2.txt (8,557 bytes)   
<chiluk>  Hey I've been looking at the agfl issues I mentioned a few weeks ago (got distracted by meltdown/spectre), and back then I was told I needed to run xfs_repair 4.7+ against all our xfs volumes that were formated pre-96f859d52bcb and are now running post-96f859d52bcb kernels.  I've reviewed xfs_repair, but It's not immediately obvious why xfs_repair 4.7+ was required. Is there a specific commit in xfs_repair that is necessary?  Also what's the best way to run programattically xfs_repair against boot volumes?  I checked dracut but didn't find any xfs_repair module.
<sandeen> what goes into the initramfs depends on your distro I suppose
<chiluk> Also a side note...  formatting a disk with a centos 7 3.10 kernel and then upgrading to a 4.14 kernel still shows the kernel as a v4 filesystem. ([  194.700597] XFS (vdb2): Mounting V4 Filesystem)   What is required to upgrade a v4 filesystem to v5?
<sandeen> on my centos box, xfs_repair is there
<chiluk> in initramfs?  
<sandeen> yes
<chiluk> how is it supposed to be run.
<sandeen> well
<chiluk> i guess i expected it to be a dracut module.
<sandeen> we really need to make this better
<chiluk> but the xfs bits of the fs module are castrated.
<chiluk> yeah... 
<sandeen> if you can boot to single user and have the root fs mounted ro you can repair it
<chiluk> that'd be fine for one server.
<chiluk> we have a farm of roughly 200 that probably need this done to.
<sandeen> ok, so another option, please test it carefully first
<sandeen> modify the fsck.xfs script to detect the "-f" option
<sandeen> if it's present, call xfs_repair on the device
<sandeen> then set fsck.mode=force on the kernel commandline
<chiluk> lol centos 7 fsck.xfs is castrated as well.
<sandeen> as for upgrading from v4 to v5, you can't
<sandeen> fsck.xfs is like this upstream
<sandeen> it's intentional
<chiluk> probably because xfs_repair likely requires interaction.
<sandeen> no
<sandeen> because initscripts think they need to call fsck if the system wasn't cleanly shut down, and this is not the case for journaling filesystems
<chiluk> sandeen: is it possible to format a volume as v5 from the default centos 3.10 kernel?
<sandeen> there is no fsck work in general after an xfs crash
<sandeen> chiluk, it's not the kernel that formats it, it's xfsprogs.
<chiluk> really.. huh.
<sandeen> and yes, you can format an xfs volume with recent centos7 xfsprogs
<chiluk> I would've expected the kernel to take care of the actual heavy lifting of the format.
<sandeen> it's simply bits to a disk, that's  userspace task.
<chiluk> yep... I think ext4 does some parts in the kernel so it can do lazy inode creation, and return from the mkfs sooner.
<chiluk> alright so back to the problem at hand.
<chiluk> what makes v4.7 of xfsprogs special in order to solve our agfl issues..
<sandeen> so, i'll start by saying that this problem was introduced when you went off the reservation and ran upstream on a distro box ;)
<sandeen> and it'll get re-introduced if you go back to the distro kernel
<sandeen> chiluk, yes, mkfs.ext4 plays silly games like that ;)
<chiluk> yeah ... the realities of cloud life dictated the new kernel for overlayfs bits
<chiluk> and yes I agree we went off the res a bit.
<chiluk> although even though we're one of the first... many will eventually be hitting this.
<chiluk> i.e. when people start doing centos 7 -> 8 upgrades
<sandeen> well, they won't be supported by rhel or centos :)
<sandeen> no, we'll handle the problem gracefully by then
--> psychicist__ (~psychicis@ip127-8-212-87.adsl2.static.versatel.nl) has joined #xfs
<sandeen> the issue is a disk structure that was coming out to different sizes on different architectures
<chiluk> Yeah ... the decision to go elrepo kernel-ml was before my time.
<sandeen> so djwong packed it to make it consistent, but then that made it different sizes across different kernels on teh same architecture :(
<chiluk> yep.. I'm aware.
<sandeen> so crossing that boundary of when the packing was added causes issues.  on rhel7 we simply don't pack it so it doesn't change
<chiluk> yep..
<chiluk> I'm tempted to remove the change in our upstream kernel.. or possibly push the fix into the elrepo kernels
<chiluk> actually that's a great idea.
<chiluk> I may do just that.
<djwong> or review the series i sent for 4.17 to autofix all that malarky? :)
-*- sandeen blinks ... is there really no v4.6.0 tag in xfsprogs?
<chiluk> djwong:  linkage?
<djwong> "[RFC 0/5] xfs-4.17: fix v5 AGFL wrapping"
<sandeen> if elrepo carries patches that might not be a bad idea.
<chiluk> yeah they do carry the patch.. I'll pursue that option.
<chiluk> that's because it was pushed down through the linux-stable process afaik.
<sandeen> patching it out of rhel7 was a semi-unfortunate hack, but since we shipped 7.0GA w/o packing, it was ... most expedient.
<chiluk> ubuntu v4.4 kernels have it as well.
<chiluk> although I'm not sure if unstall medua started with the patch or acquired it later
<-- rwareing_ (~rwareing@199.201.64.130) has quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
-*- sandeen shrugs
<sandeen> you build your own distro, you get to keep all the pieces! :)
<chiluk> fun fun..
<chiluk> so sandeen djwong... do either of you know why running xfsprogs v4.7+ was the demarkation tag for resolving this?
<chiluk> assuming I don't pursue one of the other crazy solutions.
<sandeen> I did a quick spot check of commits and didn't see anything that rang a bell  - is this the thing where we print that version when we hit the corruption?  I don't remember.
<djwong> it was 4.5
<djwong> not 4.7
<chiluk> the commit went into the 4.5 kernel..
<sandeen> (we have no 4.4 tag either?!)
<sandeen> ok, so, that's when we simply packed the structure in xfs_repair as well
<chiluk> yeah which is why I was confused when dchinner told me to use xfsprogs v4.7+ 
<chiluk> "<dchinner> you need to run xfs_repair from xfsprogs >= 4.7.0 to fix it up"
<chiluk> that's back from way back.
<sandeen> dchinner doesn't know what he's talking about, he's a wanna-be hanger-on ;)
<chiluk> lol..
<chiluk> ok so now the question becomes ... does the centos v4.5 xfsprogs have the correct bits?
-*- chiluk goes to download the srpm
<chiluk> thanks for the help again guys... dchinner... if you figure out why you mentioned v4.7+ ... I'd love to hear it.
<sandeen> no, if we packed it in our repair, it would not handle the unpacked disk format the kernel writes
-*- sandeen wonders if it's ok w/ dchinner the tag-master if I go back and add those missing tags
<djwong> sometimes distros skip xfsprogs releases, so maybe rh never built a 4.5/4.6 package?
<djwong> speaking of which
<djwong> debian's xfsprogs package is old (4.9.0)
<chiluk> now that I can fix.
<chiluk> how important is it that the kernel match the version of xfsprogs?
<chiluk> I'm almost wondering if xfsprogs shouldn't be included in linux-tools directly..
<sandeen> chiluk, it's generally supposed to be compatible except at very clear feature/compatibility breakpoints, and then it's still safe, just not functional :)
<sandeen> i.e. old xfsprogs won't touch a v5 filesystem
<chiluk> first patch in centos xfsprogs v4.5 package "xfsprogs-4.5.0-revert-AGFL-pack.patch"
<chiluk> at least they are consistent.
<-- psychicist__ (~psychicis@ip127-8-212-87.adsl2.static.versatel.nl) has quit (Quit: leaving)
<sandeen> * Mon Jun 06 2016 Eric Sandeen <sandeen@redhat.com> 4.5.0-2
<sandeen> - Revert AGFL header packing (#1336920)
<sandeen> yeah
<sandeen> those guys are pretty smart ;)
--> rwareing_ (~rwareing@199.201.64.130) has joined #xfs
<dchinner> sandeen: we never released a xfsprogs 4.4.0 or 4.6.0
<sandeen> oh!
<sandeen> hah
<sandeen> ok
<sandeen> I forgot about back in the day :)
-*- sandeen didn't actually go look
<dchinner> chiluk: I said "need xfsprogs >= 4.7.0" because I misread the git describe output
<sandeen> :)
<dchinner> `git describe <commit id>` indexes it's output from the head tag in teh tree at the time it was committed
<dchinner> `git describe --contains <commit id>` indexes it's output from the next tag added to the tree /after/ the commit was made
<dchinner> $ git describe --contains 9fccb9f6deaa
<dchinner> v4.5.0-rc1~47
<dchinner> means it was in the 4.5.0-rc1 release, but I had a brain-fart and though it was committed after 4.5.0....
<chiluk> Ok thanks dchinner
<chiluk> That's what I was hoping
<dchinner> because I can't remember what I typed a second after typing .... what was I going to say?
<dchinner> :P
<chiluk> My faith in you has been reevaluated...;)

Activities

chiluk

2018-02-21 14:10

reporter   ~0005712

Please see the spec file from centos xfsprogs for more evidence that this is necessary (http://vault.centos.org/7.4.1708/os/Source/SPackages/xfsprogs-4.5.0-12.el7.src.rpm). Specifically Patch0 xfsprogs-4.5.0-revert-AGFL-pack.patch.

You'll also notice that the CentOS kernels do not have this patch either, even though it was explicitly pushed into the linux-stable trees.

chiluk

2018-02-21 14:18

reporter   ~0005713

As I did not explicitly state it, I attached a patch that should fix this problem.

toracat

2018-02-21 18:45

administrator   ~0005714

Last edited: 2018-02-21 18:48

We normally do not patch the kernel code. However it may make sense to apply the fix that addresses RHEL-specific issues. We will discuss this problem and get back to you.

chiluk

2018-02-22 10:38

reporter   ~0005717

Let me know if you have any questions that I might be able to answer. I'd love to be included in the conversation. I also joined the elrepo freenode channel in case it's easier to get in touch with me there.

chiluk

2018-02-22 10:59

reporter   ~0005718

Attaching two irc conversations with xfs developers to provide more context as to why I think this is the correct solution.

toracat

2018-02-22 11:10

administrator   ~0005719

Thank you. That is informative.

burakkucat

2018-02-22 11:15

administrator   ~0005720

Acknowledging both: (1) your report & (2) your attached diff.

This clearly needs to be addressed for both the EL7 kernel-lt & kernel-ml packages.

My feeling is that it should also be applied for both the EL6 kernel-lt & kernel-ml packages and will do so . . . unless I can be convinced otherwise.

I am expecting the release of linux-4.4.117 & linux-4.15.5 sources sometime tomorrow and intend to incorporate the fix in the respective kernel builds.

TrevorH

2018-02-22 11:24

reporter   ~0005721

The IRC transcripts attached say this change went into kernel 4.5, so the 4.4 series probably doesn't want or need this patch.

toracat

2018-02-22 11:36

administrator   ~0005722

Well, GregKH is very good at backporting the patches ...

chiluk

2018-02-22 12:35

reporter   ~0005726

@burakkucat. Yes you will want this for the EL6 repos as well.

@TrevorH the fix was pushed into the linux-stable streams iiuc, and yes it has been integrated into both the 4.4 linux-stable stream and the Ubuntu 4.4 kernels.

The interesting thing about Ubuntu is that it has the opposite problem: the packing change is in the kernel, but not in the userspace xfsprogs.

toracat

2018-02-22 13:05

administrator   ~0005728

We are all set to go, it seems.

burakkucat

2018-02-22 14:22

administrator   ~0005733

Earlier than expected, both linux-4.4.117.tar.xz and linux-4.15.5.tar.xz have been released upstream.

chiluk

2018-02-22 15:13

reporter   ~0005735

This may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1314605 , which is linked from http://elrepo.org/tiki/kernel-ml. I will likely take a look at that tomorrow.

burakkucat

2018-02-23 13:53

administrator   ~0005737

With the recent release of the kernel-lt-4.4.117-1.el{6|7}.elrepo [1][2] and the kernel-ml-4.15.5-1.el{6|7}.elrepo [3][4] package sets, which incorporate the recommended patch, this request has been resolved.

Now closing as "resolved/fixed".

[1] http://lists.elrepo.org/pipermail/elrepo/2018-February/004126.html
[2] http://lists.elrepo.org/pipermail/elrepo/2018-February/004127.html
[3] http://lists.elrepo.org/pipermail/elrepo/2018-February/004128.html
[4] http://lists.elrepo.org/pipermail/elrepo/2018-February/004129.html

Issue History

Date Modified Username Field Change
2018-02-21 14:01 chiluk New Issue
2018-02-21 14:01 chiluk Status new => assigned
2018-02-21 14:01 chiluk Assigned To => dag
2018-02-21 14:01 chiluk File Added: kernel-xfs.diff
2018-02-21 14:10 chiluk Note Added: 0005712
2018-02-21 14:18 chiluk Note Added: 0005713
2018-02-21 18:38 toracat Assigned To dag => burakkucat
2018-02-21 18:45 toracat Note Added: 0005714
2018-02-21 18:48 toracat Note Edited: 0005714
2018-02-22 00:31 pperry Priority normal => high
2018-02-22 10:38 chiluk Note Added: 0005717
2018-02-22 10:59 chiluk Note Added: 0005718
2018-02-22 11:00 chiluk File Added: irc-#xfs.txt
2018-02-22 11:01 chiluk File Added: irc-#xfs.2.txt
2018-02-22 11:10 toracat Note Added: 0005719
2018-02-22 11:15 burakkucat Note Added: 0005720
2018-02-22 11:24 TrevorH Note Added: 0005721
2018-02-22 11:36 toracat Note Added: 0005722
2018-02-22 12:35 chiluk Note Added: 0005726
2018-02-22 13:05 toracat Note Added: 0005728
2018-02-22 14:22 burakkucat Note Added: 0005733
2018-02-22 15:13 chiluk Note Added: 0005735
2018-02-23 13:53 burakkucat Note Added: 0005737
2018-02-23 13:53 burakkucat Status assigned => resolved
2018-02-23 13:53 burakkucat Resolution open => fixed