linux/fs/ext4
Jiaying Zhang 3889fd57ea ext4: flush the i_completed_io_list during ext4_truncate
Ted first found the bug when running 2.6.36 kernel with dioread_nolock
mount option that xfstests #13 complained about wrong file size during fsck.
However, the bug exists in the older kernels as well although it is
somehow harder to trigger.

The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls 
ext4_convert_unwritten_extents(), we may reallocate some blocks that have 
been truncated, so the inode size becomes inconsistent with the allocated
blocks. 

The following patch flushes the i_completed_io_list during truncate to reduce 
the risk that some pending end_io requests are executed later and convert 
already truncated blocks to initialized. 

Note that although the fix helps reduce the problem a lot there may still 
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.

Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:

a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold 
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems a disagreement among the callers of that function.

b) We change the ext4 write path so that we change the extent tree to contain 
the newly allocated blocks and update i_size both at the same time --- when 
the write of the data blocks is completed.

c) We add some additional locking to synchronize vmtruncate() and 
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.

All of the above proposals may require more substantial changes, so
we may consider to take the following patch as a bandaid.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-01-10 12:47:05 -05:00
..
acl.c ext4: update ctime when changing the file's permission by setfacl 2010-06-15 12:19:59 -04:00
acl.h ext[234]: move over to 'check_acl' permission model 2009-09-08 11:09:04 -07:00
balloc.c ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED 2011-01-10 12:12:36 -05:00
bitmap.c ext4: Change unsigned long to unsigned int 2008-11-05 00:14:04 -05:00
block_validity.c ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*() 2010-10-27 21:30:14 -04:00
dir.c ext4: Use ext4_error_file() to print the pathname to the corrupted inode 2011-01-10 12:10:55 -05:00
ext4_extents.h ext4: drop ec_type from the ext4_ext_cache structure 2011-01-10 12:13:26 -05:00
ext4_jbd2.c ext4: Pass line numbers to ext4_error() and friends 2010-07-27 11:56:40 -04:00
ext4_jbd2.h ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary 2011-01-10 12:29:43 -05:00
ext4.h ext4: flush the i_completed_io_list during ext4_truncate 2011-01-10 12:47:05 -05:00
extents.c ext4: flush the i_completed_io_list during ext4_truncate 2011-01-10 12:47:05 -05:00
file.c ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary 2011-01-10 12:29:43 -05:00
fsync.c ext4: flush the i_completed_io_list during ext4_truncate 2011-01-10 12:47:05 -05:00
hash.c ext4: Add support for non-native signed/unsigned htree hash algorithms 2008-10-28 13:21:44 -04:00
ialloc.c ext4: drop i_state_flags on architectures with 64-bit longs 2011-01-10 12:18:25 -05:00
inode.c ext4: add error checking to calls to ext4_handle_dirty_metadata() 2011-01-10 12:46:59 -05:00
ioctl.c ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard 2010-11-19 21:47:07 -05:00
Kconfig ext4: Don't ask about supporting ext2/3 in ext4 if ext4 is not configured 2009-12-21 10:54:09 -05:00
Makefile ext4: use bio layer instead of buffer layer in mpage_da_submit_io 2010-10-27 21:30:10 -04:00
mballoc.c ext4: fix trimming of a single group 2011-01-10 12:30:39 -05:00
mballoc.h ext4: consolidate in_range() definitions 2010-03-03 23:55:01 -05:00
migrate.c ext4: ext4_ext_migrate should use NULL not 0 2011-01-10 12:11:00 -05:00
move_extent.c ext4: rename {ext,idx}_pblock and inline small extent functions 2010-10-27 21:30:14 -04:00
namei.c ext4: add error checking to calls to ext4_handle_dirty_metadata() 2011-01-10 12:46:59 -05:00
page-io.c ext4: test the correct variable in ext4_init_pageio() 2011-01-10 12:10:44 -05:00
resize.c ext4: add error checking to calls to ext4_handle_dirty_metadata() 2011-01-10 12:46:59 -05:00
super.c ext4: fix uninitialized variable in ext4_register_li_request 2011-01-10 12:30:17 -05:00
symlink.c ext4: symlink must be handled via filesystem specific operation 2010-05-16 02:00:00 -04:00
xattr_security.c ext4: constify xattr_handler 2010-05-21 18:31:19 -04:00
xattr_trusted.c ext4: constify xattr_handler 2010-05-21 18:31:19 -04:00
xattr_user.c ext4: constify xattr_handler 2010-05-21 18:31:19 -04:00
xattr.c ext2,ext3,ext4: clarify comment for extN_xattr_set_handle 2011-01-10 12:10:30 -05:00
xattr.h ext4: fix compile with CONFIG_EXT4_FS_XATTR disabled 2010-10-28 09:29:17 -07:00