linux/fs
Jiaying Zhang 3889fd57ea ext4: flush the i_completed_io_list during ext4_truncate
Ted first found the bug when running 2.6.36 kernel with dioread_nolock
mount option that xfstests #13 complained about wrong file size during fsck.
However, the bug exists in the older kernels as well although it is
somehow harder to trigger.

The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls 
ext4_convert_unwritten_extents(), we may reallocate some blocks that have 
been truncated, so the inode size becomes inconsistent with the allocated
blocks. 

The following patch flushes the i_completed_io_list during truncate to reduce 
the risk that some pending end_io requests are executed later and convert 
already truncated blocks to initialized. 

Note that although the fix helps reduce the problem a lot there may still 
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.

Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:

a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold 
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems a disagreement among the callers of that function.

b) We change the ext4 write path so that we change the extent tree to contain 
the newly allocated blocks and update i_size both at the same time --- when 
the write of the data blocks is completed.

c) We add some additional locking to synchronize vmtruncate() and 
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.

All of the above proposals may require more substantial changes, so
we may consider to take the following patch as a bandaid.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-01-10 12:47:05 -05:00
..
9p convert v9fs 2010-10-29 04:16:38 -04:00
adfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
affs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
afs convert afs 2010-10-29 04:17:13 -04:00
autofs4 autofs4 - remove ioctl mutex (bz23142) 2010-12-07 07:45:44 -08:00
befs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
bfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
btrfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2010-12-14 11:08:13 -08:00
cachefiles llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
ceph ceph: fix ioctl magic 2010-12-06 09:45:22 -08:00
cifs cifs: remove bogus remapping of error in cifs_filldir() 2010-12-08 18:47:54 +00:00
coda convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
configfs convert get_sb_single() users 2010-10-29 04:16:28 -04:00
cramfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
debugfs convert get_sb_single() users 2010-10-29 04:16:28 -04:00
devpts convert get_sb_single() users 2010-10-29 04:16:28 -04:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm 2010-10-22 17:33:16 -07:00
ecryptfs BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
efs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
exofs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
exportfs exportfs: use dget_parent 2010-10-25 21:26:13 -04:00
ext2 ext2: remove dead code in ext2_xattr_get 2011-01-10 12:10:37 -05:00
ext3 ext2,ext3,ext4: clarify comment for extN_xattr_set_handle 2011-01-10 12:10:30 -05:00
ext4 ext4: flush the i_completed_io_list during ext4_truncate 2011-01-10 12:47:05 -05:00
fat new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
freevxfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
fscache Add a dummy printk function for the maintenance of unused printks 2010-08-12 09:51:35 -07:00
fuse fuse: verify ioctl retries 2010-11-30 16:39:27 +01:00
gfs2 GFS2: Userland expects quota limit/warn/usage in 512b blocks 2010-11-19 11:20:29 +00:00
hfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
hfsplus new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
hostfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
hpfs Merge branches 'irq-core-for-linus' and 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-31 20:40:24 -04:00
hppfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
hugetlbfs hugetlbfs: lessen the impact of a deprecation warning 2010-11-12 07:55:32 -08:00
isofs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
jbd Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-10-27 20:13:18 -07:00
jbd2 ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary 2011-01-10 12:29:43 -05:00
jffs2 Merge git://git.infradead.org/mtd-2.6 2010-10-30 08:31:35 -07:00
jfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
lockd BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
logfs fs: logfs: Fix up MTD=y build. 2010-11-01 16:34:56 -04:00
minix new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ncpfs BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
nfs NFS: Fix panic after nfs_umount() 2010-12-10 13:01:50 -05:00
nfs_common
nfsd nfsd: Fix possible BUG_ON firing in set_change_info 2010-12-08 11:44:04 -05:00
nilfs2 nilfs2: fix typo in comment of nilfs_dat_move function 2010-11-24 12:51:48 +09:00
nls
notify make fanotify_read() restartable across signals 2010-10-30 14:07:35 -04:00
ntfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ocfs2 ocfs2_connection_find() returns pointer to bad structure 2010-11-18 15:41:41 -08:00
omfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
openpromfs sparc: fix openpromfs compile 2010-11-08 14:29:39 -08:00
partitions Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2010-10-25 07:45:10 -07:00
proc Revert "vfs: show unreachable paths in getcwd and proc" 2010-12-05 16:39:45 -08:00
qnx4 new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
quota quota: Fix possible oops in __dquot_initialize() 2010-10-28 01:30:06 +02:00
ramfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
reiserfs reiserfs: don't acquire lock recursively in reiserfs_acl_chmod 2010-12-02 14:51:15 -08:00
romfs convert get_sb_mtd() users to ->mount() 2010-10-29 04:16:26 -04:00
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus 2010-10-29 08:48:58 -07:00
sysfs convert sysfs 2010-10-29 04:17:08 -04:00
sysv new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ubifs convert ubifs 2010-10-29 04:16:36 -04:00
udf new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
ufs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
xfs xfs: log timestamp changes to the source inode in rename 2010-12-09 17:07:02 -06:00
aio.c new helper: ihold() 2010-10-25 21:26:11 -04:00
anon_inodes.c convert get_sb_pseudo() users 2010-10-29 04:16:33 -04:00
attr.c check ATTR_SIZE contraints in inode_change_ok 2010-08-09 16:47:39 -04:00
bad_inode.c bkl: Remove locked .ioctl file operation 2010-08-14 00:24:24 +02:00
binfmt_aout.c Don't dump task struct in a.out core-dumps 2010-10-14 10:57:40 -07:00
binfmt_elf_fdpic.c
binfmt_elf.c ARM: 6342/1: fix ASLR of PIE executables 2010-10-08 10:02:53 +01:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c convert get_sb_single() users 2010-10-29 04:16:28 -04:00
binfmt_script.c Make do_execve() take a const filename pointer 2010-08-17 18:07:43 -07:00
binfmt_som.c
bio-integrity.c fs/bio-integrity.c: return -ENOMEM on kmalloc failure 2010-08-23 13:36:59 +02:00
bio.c bio: take care not overflow page count when mapping/copying user data 2010-11-10 14:40:43 +01:00
block_dev.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
buffer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
char_dev.c Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl 2010-10-22 10:52:56 -07:00
compat_binfmt_elf.c
compat_ioctl.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
compat.c exec: copy-and-paste the fixes into compat_do_execve() paths 2010-11-30 17:56:38 -08:00
dcache.c fs: use RCU read side protection in d_validate 2010-10-25 21:26:13 -04:00
dcookies.c
direct-io.c fs/direct-io.c: fix truncation error in dio_complete() return 2010-10-26 16:52:13 -07:00
drop_caches.c simplify checks for I_CLEAR/I_FREEING 2010-08-09 16:47:44 -04:00
eventfd.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
eventpoll.c epoll: make epoll_wait() use the hrtimer range feature 2010-10-27 18:03:18 -07:00
exec.c install_special_mapping skips security_file_mmap check. 2010-12-15 12:30:36 -08:00
fcntl.c fasync: Fix placement of FASYNC flag comment 2010-10-27 18:17:02 -07:00
fifo.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
file_table.c fs: allow for more than 2^31 files 2010-10-26 16:52:15 -07:00
file.c vfs: use kmalloc() to allocate fdmem if possible 2010-08-11 08:59:02 -07:00
filesystems.c
fs_struct.c fs: fs_struct rwlock to spinlock 2010-08-18 08:35:46 -04:00
fs-writeback.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2010-10-30 09:05:48 -07:00
generic_acl.c vfs: update ctime when changing the file's permission by setfacl 2010-08-18 01:04:22 -04:00
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
internal.h braino in internal.h 2010-10-29 05:49:13 -04:00
ioctl.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2010-11-19 19:46:45 -08:00
ioprio.c ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() 2010-11-15 10:23:31 +01:00
Kconfig Merge 'staging-next' to Linus's tree 2010-10-28 09:44:56 -07:00
Kconfig.binfmt coredump: default CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y 2010-10-27 18:03:12 -07:00
libfs.c convert get_sb_pseudo() users 2010-10-29 04:16:33 -04:00
locks.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
Makefile Merge 'staging-next' to Linus's tree 2010-10-28 09:44:56 -07:00
mbcache.c mbcache: Limit the maximum number of cache entries 2010-08-18 06:24:41 -04:00
mpage.c
namei.c fix open/umount race 2010-10-29 04:14:56 -04:00
namespace.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
nfsctl.c
no-block.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
open.c fix open/umount race 2010-10-29 04:14:56 -04:00
pipe.c Un-inline get_pipe_info() helper function 2010-11-28 16:27:19 -08:00
pnode.c fs: brlock vfsmount_lock 2010-08-18 08:35:48 -04:00
pnode.h
posix_acl.c
read_write.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
read_write.h
readdir.c vfs: fix warning: 'dirent' is used uninitialized in this function 2010-08-09 20:45:05 -07:00
select.c epoll: make epoll_wait() use the hrtimer range feature 2010-10-27 18:03:18 -07:00
seq_file.c fs: take dcache_lock inside __d_path 2010-10-25 21:26:12 -04:00
signalfd.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2010-10-26 10:13:10 -07:00
splice.c Export 'get_pipe_info()' to other users 2010-11-28 14:09:57 -08:00
stack.c
stat.c Mark arguments to certain syscalls as being const 2010-08-13 16:53:13 -07:00
statfs.c add f_flags to struct statfs(64) 2010-08-09 16:48:44 -04:00
super.c switch get_sb_ns() users 2010-10-29 04:17:03 -04:00
sync.c get rid of file_fsync() 2010-08-09 16:47:43 -04:00
timerfd.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
utimes.c Mark arguments to certain syscalls as being const 2010-08-13 16:53:13 -07:00
xattr_acl.c
xattr.c