linux/fs
Dave Chinner 7bc0dc271e xfs: rework remote attr CRCs
Note: this changes the on-disk remote attribute format. I assert
that this is OK to do as CRCs are marked experimental and the first
kernel it is included in has not yet reached release yet. Further,
the userspace utilities are still evolving and so anyone using this
stuff right now is a developer or tester using volatile filesystems
for testing this feature. Hence changing the format right now to
save longer term pain is the right thing to do.

The fundamental change is to move from a header per extent in the
attribute to a header per filesytem block in the attribute. This
means there are more header blocks and the parsing of the attribute
data is slightly more complex, but it has the advantage that we
always know the size of the attribute on disk based on the length of
the data it contains.

This is where the header-per-extent method has problems. We don't
know the size of the attribute on disk without first knowing how
many extents are used to hold it. And we can't tell from a
mapping lookup, either, because remote attributes can be allocated
contiguously with other attribute blocks and so there is no obvious
way of determining the actual size of the atribute on disk short of
walking and mapping buffers.

The problem with this approach is that if we map a buffer
incorrectly (e.g. we make the last buffer for the attribute data too
long), we then get buffer cache lookup failure when we map it
correctly. i.e. we get a size mismatch on lookup. This is not
necessarily fatal, but it's a cache coherency problem that can lead
to returning the wrong data to userspace or writing the wrong data
to disk. And debug kernels will assert fail if this occurs.

I found lots of niggly little problems trying to fix this issue on a
4k block size filesystem, finally getting it to pass with lots of
fixes. The thing is, 1024 byte filesystems still failed, and it was
getting really complex handling all the corner cases that were
showing up. And there were clearly more that I hadn't found yet.

It is complex, fragile code, and if we don't fix it now, it will be
complex, fragile code forever more.

Hence the simple fix is to add a header to each filesystem block.
This gives us the same relationship between the attribute data
length and the number of blocks on disk as we have without CRCs -
it's a linear mapping and doesn't require us to guess anything. It
is simple to implement, too - the remote block count calculated at
lookup time can be used by the remote attribute set/get/remove code
without modification for both CRC and non-CRC filesystems. The world
becomes sane again.

Because the copy-in and copy-out now need to iterate over each
filesystem block, I moved them into helper functions so we separate
the block mapping and buffer manupulations from the attribute data
and CRC header manipulations. The code becomes much clearer as a
result, and it is a lot easier to understand and debug. It also
appears to be much more robust - once it worked on 4k block size
filesystems, it has worked without failure on 1k block size
filesystems, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit ad1858d777)
2013-05-30 17:26:31 -05:00
..
9p aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
adfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
affs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
afs aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
autofs4 autofs - remove autofs dentry mount check 2013-05-06 13:06:59 -07:00
befs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-04-30 09:36:50 -07:00
bfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-05-09 13:07:40 -07:00
cachefiles lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
ceph aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
cifs cifs: small variable name cleanup 2013-05-04 22:18:10 -05:00
coda lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
configfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
cramfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
debugfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
devpts fs: Limit sys_mount to only request filesystem modules (Part 2). 2013-03-07 01:08:55 -08:00
dlm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
ecryptfs Improve performance when AES-NI (and most likely other crypto accelerators) is 2013-05-10 09:20:01 -07:00
efivarfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
efs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
exofs block: Add bio_for_each_segment_all() 2013-03-23 14:26:28 -07:00
exportfs hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ext2 aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
ext3 Merge branch 'akpm' (incoming from Andrew) 2013-05-07 20:49:51 -07:00
ext4 Merge branch 'akpm' (incoming from Andrew) 2013-05-07 20:49:51 -07:00
f2fs f2fs updates for v3.10 2013-05-08 15:11:48 -07:00
fat aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
freevxfs fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
fscache fs/fscache/stats.c: fix memory leak 2013-04-29 15:54:27 -07:00
fuse Merge branch 'akpm' (incoming from Andrew) 2013-05-07 20:49:51 -07:00
gfs2 Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
hfs Merge branch 'akpm' (incoming from Andrew) 2013-05-07 20:49:51 -07:00
hfsplus aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
hostfs hostfs: use kmalloc instead of kzalloc 2013-05-04 15:48:45 -04:00
hpfs hpfs: move setting hpfs-private i_dirty to ->write_end() 2013-04-09 14:12:55 -04:00
hppfs hppfs: get rid of ->fsync() 2013-04-29 15:41:42 -04:00
hugetlbfs hugetlbfs: fix mmap failure in unaligned size request 2013-05-07 18:38:27 -07:00
isofs fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
jbd Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2013-05-03 09:56:25 -07:00
jbd2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
jffs2 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
jfs Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
lockd LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot 2013-04-21 18:08:42 -04:00
logfs block: Remove bi_idx references 2013-03-23 14:15:31 -07:00
minix fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
ncpfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
nfs More NFS client bugfixes for 3.10 2013-05-09 10:24:54 -07:00
nfs_common nfs_common: Update the translation between nfsv3 acls linux posix acls 2013-02-13 06:15:14 -08:00
nfsd Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux 2013-05-10 09:28:55 -07:00
nilfs2 aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
nls
notify unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
ntfs aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
ocfs2 aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
omfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
openpromfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
proc Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-05-07 08:42:20 -07:00
pstore Couple of pstore cleanups 2013-05-09 16:42:10 -07:00
qnx4 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
qnx6 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
quota quota: add missing use of dq_data_lock in __dquot_initialize 2013-03-11 22:05:56 +01:00
ramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
reiserfs Merge branch 'akpm' (incoming from Andrew) 2013-05-07 20:49:51 -07:00
romfs romfs: fix nommu map length to keep inside filesystem 2013-04-29 09:17:57 +10:00
squashfs fs: Limit sys_mount to only request filesystem modules. (Part 3) 2013-03-11 07:09:48 -07:00
sysfs sysfs: check if one entry has been removed before freeing 2013-04-05 15:35:52 -07:00
sysv fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
ubifs aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
udf aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
ufs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-04-30 09:36:50 -07:00
xfs xfs: rework remote attr CRCs 2013-05-30 17:26:31 -05:00
aio.c aio: kill ki_retry 2013-05-07 19:46:02 -07:00
anon_inodes.c get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero 2013-02-26 02:46:11 -05:00
attr.c
bad_inode.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
binfmt_aout.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
binfmt_elf_fdpic.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-05-02 10:16:16 -07:00
binfmt_elf.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-05-02 10:16:16 -07:00
binfmt_em86.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_flat.c new helper: read_code() 2013-04-29 15:40:23 -04:00
binfmt_misc.c binfmt_misc: reuse string_unescape_inplace() 2013-04-30 17:04:03 -07:00
binfmt_script.c exec: do not leave bprm->interp on stack 2012-12-20 17:40:19 -08:00
binfmt_som.c
bio-integrity.c bio-integrity: Add explicit field for owner of bip_buf 2013-03-23 14:26:34 -07:00
bio.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
block_dev.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
buffer.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c Removed unused typedef to avoid "unused local typedef" warnings. 2013-05-04 15:03:05 -04:00
compat.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
coredump.c do_coredump(): don't wait for thaw if coredump has already been interrupted 2013-05-04 14:45:54 -04:00
coredump.h
dcache.c vfs: use list_move instead of list_del/list_add 2013-05-04 15:43:02 -04:00
dcookies.c consolidate compat lookup_dcookie() 2013-03-03 23:00:23 -05:00
direct-io.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
drop_caches.c
eventfd.c fs, eventfd: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
eventpoll.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
exec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
fcntl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
fhandle.c Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux 2012-12-20 14:04:11 -08:00
file_table.c cache the value of file_inode() in struct file 2013-03-01 19:48:30 -05:00
file.c don't bother with deferred freeing of fdtables 2013-05-01 17:31:42 -04:00
filesystems.c fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
fs_struct.c constify path_get/path_put and fs_struct.c stuff 2013-03-01 23:51:07 -05:00
fs-writeback.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
generic_acl.c
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
internal.h pipe: fold file_operations instances in one 2013-04-09 14:12:58 -04:00
ioctl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
ioprio.c
Kconfig efivarfs: Move to fs/efivarfs 2013-04-17 13:25:09 +01:00
Kconfig.binfmt fs: make binfmt support for #! scripts modular and removable 2013-04-30 17:04:04 -07:00
libfs.c vfs: drop vmtruncate 2012-12-20 18:46:29 -05:00
locks.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
mbcache.c
mount.h get rid of full-hash scan on detaching vfsmounts 2013-04-09 14:12:52 -04:00
mpage.c
namei.c Merge git://git.infradead.org/users/eparis/audit 2013-05-11 14:29:11 -07:00
namespace.c create_mnt_ns: unidiomatic use of list_add() 2013-05-04 15:18:53 -04:00
no-block.c
open.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
pipe.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
pnode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
pnode.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
posix_acl.c
proc_namespace.c
read_write.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
readdir.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
select.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
seq_file.c new helper: single_open_size() 2013-04-09 14:13:29 -04:00
signalfd.c switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE 2013-03-03 22:58:46 -05:00
splice.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
stack.c
stat.c switch vfs_getattr() to struct path 2013-02-26 02:46:08 -05:00
statfs.c vfs: fix user_statfs to retry once on ESTALE errors 2012-12-20 18:50:07 -05:00
super.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
sync.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
timerfd.c compat: restore timerfd settime and gettime compat syscalls 2013-03-02 09:35:13 -05:00
utimes.c vfs: allow utimensat() calls to retry once on an ESTALE error 2012-12-20 18:50:08 -05:00
xattr_acl.c
xattr.c vfs: make lremovexattr retry once on ESTALE error 2012-12-20 18:50:11 -05:00