linux/fs
David Chinner 59a33f9f77 [XFS] Ensure a btree insert returns a valid cursor.
When writing into preallocated regions there is a case where XFS can oops
or hang doing the unwritten extent conversion on I/O completion. It turns
out that the problem is related to the btree cursor being invalid.

When we do an insert into the tree, we may need to split blocks in the
tree. When we only split at the leaf level (i.e. level 0), everything
works just fine. However, if we have a multi-level split in the btreee,
the cursor passed to the insert function is no longer valid once the
insert is complete.

The leaf level split is handled correctly because all the operations at
level 0 are done using the original cursor, hence it is updated correctly.
However, when we need to update the next level up the tree, we don't use
that cursor - we use a cloned cursor that points to the index in the next
level up where we need to do the insert.

Hence if we need to split a second level, the changes to the tree are
reflected in the cloned cursor and not the original cursor. This
clone-and-move-up-a-level-on-split behaviour recurses all the way to the
top of the tree.

The complexity here is that these cloned cursors do not point to the
original index that was inserted - they point to the newly allocated block
(the right block) and the original cursor pointer to that level may still
point to the left block. Hence, without deep examination of the cloned
cursor and buffers, we cannot update the original cursor with the new path
from the cloned cursor.

In these cases the original cursor could be pointing to the wrong block(s)
and hence a subsequent modification to the tree using that cursor will
lead to corruption of the tree.

The crash case occurs when the tree changes height - we insert a new level
in the tree, and the cursor does not have a buffer in it's path for that
level. Hence any attempt to walk back up the cursor to the root block will
result in a null pointer dereference.

To make matters even more complex, the BMAP BT is rooted in an inode, so
we can have a change of height in the btree *without a root split*. That
is, if the root block in the inode is full when we split a leaf node, we
cannot fit the pointer to the new block in the root, so we allocate a new
block, migrate all the ptrs out of the inode into the new block and point
the inode root block at the newly allocated block. This changes the height
of the tree without a root split having occurred and hence invalidates the
path in the original cursor.

The patch below prevents xfs_bmbt_insert() from returning with an invalid
cursor by detecting the cases that invalidate the original cursor and
refresh it by do a lookup into the btree for the original index we were
inserting at.

Note that the INOBT, AGFBNO and AGFCNT btree implementations also have
this bug, but the cursor is currently always destroyed or revalidated
after an insert for those trees. Hence this patch only address the problem
in the BMBT code.

SGI-PV: 979339
SGI-Modid: xfs-linux-melb:xfs-kern:30701a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18 11:42:21 +10:00
..
9p Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p) 2008-02-07 08:42:26 -08:00
adfs mount options: fix adfs 2008-02-08 09:22:39 -08:00
affs mount options: fix affs 2008-02-08 09:22:39 -08:00
afs AFS: Do not describe debug parameters with their value 2008-04-16 07:43:48 -07:00
autofs mount options: fix autofs 2008-02-08 09:22:40 -08:00
autofs4 Introduce path_put() 2008-02-14 21:13:33 -08:00
befs mount options: fix befs 2008-02-08 09:22:40 -08:00
bfs iget: stop BFS from using iget() and read_inode() 2008-02-07 08:42:27 -08:00
cifs cifs: fix misannotations 2008-03-30 14:20:23 -07:00
coda Introduce path_put() 2008-02-14 21:13:33 -08:00
configfs Introduce path_put() 2008-02-14 21:13:33 -08:00
cramfs fs/cramfs/inode.c: replace hardcoded value with preprocessor constant 2007-10-18 14:37:29 -07:00
debugfs debugfs: fix sparse warnings 2008-03-04 14:47:06 -08:00
devpts mount options: fix devpts 2008-02-08 09:22:40 -08:00
dlm dlm: fix rcom_names message to self 2008-02-21 15:19:54 -06:00
ecryptfs eCryptfs: Swap dput() and mntput() 2008-03-19 18:53:36 -07:00
efs efs: update error msg to not refer to deleted read_inode() 2008-04-02 15:28:19 -07:00
exportfs exportfs: update documentation 2007-10-22 08:13:21 -07:00
ext2 vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs 2008-04-15 19:35:41 -07:00
ext3 vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs 2008-04-15 19:35:41 -07:00
ext4 vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs 2008-04-15 19:35:41 -07:00
fat mount options: fix fat 2008-02-08 09:22:40 -08:00
freevxfs iget: stop FreeVXFS from using iget() and read_inode() 2008-02-07 08:42:28 -08:00
fuse fuse: fix permission checking 2008-02-23 17:12:13 -08:00
gfs2 Introduce path_put() 2008-02-14 21:13:33 -08:00
hfs hfs_bnode_find() can fail, resulting in hfs_bnode_split() breakage 2008-03-17 09:46:55 -07:00
hfsplus HFS+: fix unlink of links 2008-04-10 13:37:51 -07:00
hostfs uml: fix hostfs tv_usec calculations 2008-02-05 09:44:30 -08:00
hpfs mount options: fix hpfs 2008-02-08 09:22:40 -08:00
hppfs [PATCH] sanitize hppfs 2008-03-19 06:42:18 -04:00
hugetlbfs [PATCH] double iput() on failure exit in hugetlb 2008-03-19 06:55:01 -04:00
isofs zisofs: fix readpage() outside i_size 2008-03-19 18:53:36 -07:00
jbd jbd/jbd2 NULL noise 2008-03-30 14:18:41 -07:00
jbd2 jbd/jbd2 NULL noise 2008-03-30 14:18:41 -07:00
jffs2 JFFS2 Fix of panics caused by wrong condition for hole frag creation in write_begin 2008-04-14 15:43:14 -07:00
jfs BKL-removal: Implement a compat_ioctl handler for JFS 2008-02-07 13:45:29 -06:00
lockd Wrap buffers used for rpc debug printks into RPC_IFDEBUG 2008-02-21 18:42:29 -05:00
minix iget: stop the MINIX filesystem from using iget() and read_inode() 2008-02-07 08:42:28 -08:00
msdos
ncpfs mount options: fix ncpfs 2008-02-08 09:22:40 -08:00
nfs fix bug - executing FDPIC ELF on NFS mount triggers BUG() at mm/nommu.c:862:/do_mmap_private() 2008-04-08 21:06:56 -04:00
nfs_common
nfsd nfsd: fix oops on access from high-numbered ports 2008-03-14 16:49:15 -07:00
nls sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
ntfs is_vmalloc_addr(): Check if an address is within the vmalloc boundaries 2008-02-05 09:44:14 -08:00
ocfs2 ocfs2: Fix NULL pointer dereferences in o2net 2008-03-10 15:14:19 -07:00
openpromfs iget: stop OPENPROMFS from using iget() and read_inode() 2008-02-07 08:42:29 -08:00
partitions Enhanced partition statistics: remove old partition statistics 2008-02-08 12:42:01 +01:00
proc Change pagemap output format to allow for future reporting of huge pages 2008-03-22 17:03:10 -07:00
qnx4 iget: stop QNX4 from using iget() and read_inode() 2008-02-07 08:42:28 -08:00
ramfs Remove valueless definition of hard-selected RAMFS option 2007-10-17 08:42:56 -07:00
reiserfs NULL noise: fs/*, mm/*, kernel/* 2008-03-30 14:18:41 -07:00
romfs ROMFS: Fix up an error in iget removal 2008-03-19 18:53:36 -07:00
smbfs NULL noise: fs/*, mm/*, kernel/* 2008-03-30 14:18:41 -07:00
sysfs driver core: debug for bad dev_attr_show() return value. 2008-03-24 22:33:49 -07:00
sysv iget: stop the SYSV filesystem from using iget() and read_inode() 2008-02-07 08:42:29 -08:00
udf udf: fix udf_add_free_space 2008-02-13 16:21:20 -08:00
ufs fs/ufs/balloc.c: fix sparc64 printk warning 2008-03-19 18:53:37 -07:00
vfat Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p) 2008-02-07 08:42:26 -08:00
xfs [XFS] Ensure a btree insert returns a valid cursor. 2008-04-18 11:42:21 +10:00
aio.c eventfd/kaio integration fix 2008-04-11 08:06:43 -07:00
anon_inodes.c [PATCH] fix up new filp allocators 2008-03-19 06:54:05 -04:00
attr.c VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations 2007-10-18 14:37:22 -07:00
bad_inode.c iget: introduce a function to register iget failure 2008-02-07 08:42:26 -08:00
binfmt_aout.c aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT 2008-02-08 09:22:30 -08:00
binfmt_elf_fdpic.c pid namespaces: changes to show virtual ids to user 2007-10-19 11:53:40 -07:00
binfmt_elf.c core dump: user_regset writeback 2008-03-04 16:35:10 -08:00
binfmt_em86.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
binfmt_flat.c FLAT binaries: drop BINFMT_FLAT bad header magic warning 2008-02-14 20:58:05 -08:00
binfmt_misc.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
binfmt_script.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
binfmt_som.c aout: remove unnecessary inclusions of {asm, linux}/a.out.h 2008-02-08 09:22:30 -08:00
bio.c Revert "unexport bio_{,un}map_user" 2008-03-17 21:14:40 +01:00
block_dev.c fs/block_dev.c: remove #if 0'ed code 2008-02-19 10:04:00 +01:00
buffer.c Be more careful about marking buffers dirty 2008-04-04 14:38:17 -07:00
char_dev.c fs/char_dev.c: chrdev_open marked static and removed from fs.h 2008-02-08 09:22:42 -08:00
compat_binfmt_elf.c x86: compat_binfmt_elf 2008-01-30 13:31:46 +01:00
compat_ioctl.c d_path: Make d_path() use a struct path 2008-02-14 21:17:09 -08:00
compat.c Merge branch 'linus_origin' into hotfixes 2008-02-15 13:36:30 -05:00
dcache.c dentries: Extract common code to remove dentry from lru 2008-02-14 21:17:09 -08:00
dcookies.c d_path: Make d_path() use a struct path 2008-02-14 21:17:09 -08:00
direct-io.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
dnotify.c mm: Remove slab destructors from kmem_cache_create(). 2007-07-20 10:11:58 +09:00
dquot.c quota: add possibly missing iput() when quotaon and quotaoff races 2008-03-19 18:53:35 -07:00
drop_caches.c invalidate_mapping_pages(): add cond_resched 2007-07-16 09:05:36 -07:00
eventfd.c fs/eventfd.c should #include <linux/syscalls.h> 2008-02-06 10:41:03 -08:00
eventpoll.c lockdep: annotate epoll 2008-02-05 09:44:07 -08:00
exec.c Allow ARG_MAX execve string space even with a small stack limit 2008-03-03 10:12:14 -08:00
fcntl.c fs: remove fastcall, it is always empty 2008-02-08 09:22:31 -08:00
fifo.c
file_table.c [PATCH] fix up new filp allocators 2008-03-19 06:54:05 -04:00
file.c get rid of NR_OPEN and introduce a sysctl_nr_open 2008-02-06 10:41:06 -08:00
filesystems.c
fs-writeback.c fs: fix kernel-doc notation warnings 2008-03-19 18:53:36 -07:00
generic_acl.c Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check 2007-07-17 12:00:03 -07:00
inode.c iget: remove iget() and the read_inode() super op as being obsolete 2008-02-07 08:42:29 -08:00
inotify_user.c Introduce path_put() 2008-02-14 21:13:33 -08:00
inotify.c inotify: remove debug code 2008-02-06 10:41:07 -08:00
internal.h
ioctl.c fix up kerneldoc in fs/ioctl.c a little bit 2008-02-09 11:08:33 -08:00
ioprio.c cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions 2008-01-28 11:38:15 +01:00
Kconfig Documentation: move nfsroot.txt to filesystems/ 2008-04-11 13:18:01 -06:00
Kconfig.binfmt aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT 2008-02-08 09:22:30 -08:00
libfs.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
locks.c locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs 2008-04-14 12:22:14 -07:00
Makefile x86: compat_binfmt_elf Kconfig 2008-01-30 13:31:46 +01:00
mbcache.c vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs 2008-04-15 19:35:41 -07:00
mpage.c docbook: fix filesystems.tmpl source files 2008-03-03 10:47:13 -08:00
namei.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
namespace.c [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock 2008-03-27 20:48:04 -04:00
nfsctl.c Introduce path_put() 2008-02-14 21:13:33 -08:00
no-block.c
open.c asmlinkage_protect replaces prevent_tail_call 2008-04-10 17:28:26 -07:00
pipe.c [PATCH] fix up new filp allocators 2008-03-19 06:54:05 -04:00
pnode.c [PATCH] count ghost references to vfsmounts 2008-03-27 20:47:46 -04:00
pnode.h [PATCH] new helpers - collect_mounts() and release_collected_mounts() 2007-10-21 02:37:25 -04:00
posix_acl.c
quota_v1.c
quota_v2.c
quota.c Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p) 2008-02-07 08:42:26 -08:00
read_write.c remove the unused exports of sys_open/sys_read 2008-02-08 09:22:36 -08:00
read_write.h
readdir.c Use mutex_lock_killable in vfs_readdir 2007-12-06 17:39:54 -05:00
select.c make sys_poll() wait at least timeout ms 2008-02-06 10:41:09 -08:00
seq_file.c d_path: Make d_path() use a struct path 2008-02-14 21:17:09 -08:00
signalfd.c signalfd: fix for incorrect SI_QUEUE user data reporting 2008-04-11 08:06:44 -07:00
splice.c splice: fix infinite loop in generic_file_splice_read() 2008-04-10 08:24:25 +02:00
stack.c
stat.c Introduce path_put() 2008-02-14 21:13:33 -08:00
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
sync.c
timerfd.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
utimes.c Introduce path_put() 2008-02-14 21:13:33 -08:00
xattr_acl.c
xattr.c Introduce path_put() 2008-02-14 21:13:33 -08:00