linux

Author	SHA1	Message	Date
Eric Whitney	ad6599ab3a	ext4: fix premature freeing of partial clusters split across leaf blocks Xfstests generic/311 and shared/298 fail when run on a bigalloc file system. Kernel error messages produced during the tests report that blocks to be freed are already on the to-be-freed list. When e2fsck is run at the end of the tests, it typically reports bad i_blocks and bad free blocks counts. The bug that causes these failures is located in ext4_ext_rm_leaf(). Code at the end of the function frees a partial cluster if it's not shared with an extent remaining in the leaf. However, if all the extents in the leaf have been removed, the code dereferences an invalid extent pointer (off the front of the leaf) when the check for sharing is made. This generally has the effect of unconditionally freeing the partial cluster, which leads to the observed failures when the partial cluster is shared with the last extent in the next leaf. Fix this by attempting to free the cluster only if extents remain in the leaf. Any remaining partial cluster will be freed if possible when the next leaf is processed or when leaf removal is complete. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-04-01 19:49:30 -04:00
Lukas Czerner	e5b30416f3	ext4: remove unneeded test of ret variable Currently in ext4_fallocate() and ext4_zero_range() we're testing ret variable along with new_size. However in ext4_fallocate() we just tested ret before and in ext4_zero_range() if will always be zero when we get there so there is no need to test it in both cases. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-01 00:59:21 -04:00
Matthew Wilcox	e04027e887	ext4: fix comment typo Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 15:15:07 -04:00
Matthew Wilcox	94350ab5c3	ext4: make ext4_block_zero_page_range static It's only called within inode.c, so make it static, remove its prototype from ext4.h and move it above all of its callers so it doesn't need a prototype within inode.c. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 15:09:16 -04:00
Theodore Ts'o	5f16f3225b	ext4: atomically set inode->i_flags in ext4_set_inode_flags() Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2014-03-24 14:43:12 -04:00
Theodore Ts'o	ed3654eb98	ext4: optimize Hurd tests when reading/writing inodes Set a in-memory superblock flag to indicate whether the file system is designed to support the Hurd. Also, add a sanity check to make sure the 64-bit feature is not set for Hurd file systems, since i_file_acl_high conflicts with a Hurd-specific field. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 14:09:06 -04:00
Theodore Ts'o	c4f6570605	ext4: kill i_version support for Hurd-castrated file systems The Hurd file system uses uses the inode field which is now used for i_version for its translator block. This means that ext2 file systems that are formatted for GNU Hurd can't be used to support NFSv4. Given that Hurd file systems don't support extents, and a huge number of modern file system features, this is no great loss. If we don't do this, the attempt to update the i_version field will stomp over the translator block field, which will cause file system corruption for Hurd file systems. This can be replicated via: mke2fs -t ext2 -o hurd /dev/vdc mount -t ext4 /dev/vdc /vdc touch /vdc/bug0000 umount /dev/vdc e2fsck -f /dev/vdc Addresses-Debian-Bug: #738758 Reported-By: Gabriele Giacone <1o5g4r8o@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-20 00:32:57 -04:00
T Makphaibulchoke	9c191f701c	ext4: each filesystem creates and uses its own mb_cache This patch adds new interfaces to create and destory cache, ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and remove the cache creation and destory calls from ex4_init_xattr() and ext4_exitxattr() in fs/ext4/xattr.c. fs/ext4/super.c has been changed so that when a filesystem is mounted a cache is allocated and attched to its ext4_sb_info structure. fs/mbcache.c has been changed so that only one slab allocator is allocated and used by all mbcache structures. Signed-off-by: T. Makphaibulchoke <tmac@hp.com>	2014-03-18 19:24:49 -04:00
T Makphaibulchoke	1f3e55fe02	fs/mbcache.c: doucple the locking of local from global data The patch increases the parallelism of mbcache by using the built-in lock in the hlist_bl_node to protect the mb_cache's local block and index hash chains. The global data mb_cache_lru_list and mb_cache_list continue to be protected by the global mb_cache_spinlock. New block group spinlock, mb_cache_bg_lock is also added to serialize accesses to mb_cache_entry's local data. A new member e_refcnt is added to the mb_cache_entry structure to help preventing an mb_cache_entry from being deallocated by a free while it is being referenced by either mb_cache_entry_get() or mb_cache_entry_find(). Signed-off-by: T. Makphaibulchoke <tmac@hp.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 19:23:20 -04:00
T Makphaibulchoke	3e037e5211	fs/mbcache.c: change block and index hash chain to hlist_bl_node This patch changes each mb_cache's both block and index hash chains to use a hlist_bl_node, which contains a built-in lock. This is the first step in decoupling of locks serializing accesses to mb_cache global data and each mb_cache_entry local data. Signed-off-by: T. Makphaibulchoke <tmac@hp.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 19:19:41 -04:00
Lukas Czerner	b8a8684502	ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of file to zeros preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferable converted to unwritten extents This can be also used to preallocate blocks past EOF in the same way as with fallocate. Flag FALLOC_FL_KEEP_SIZE which should cause the inode size to remain the same. Also add appropriate tracepoints. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 18:05:35 -04:00
Lukas Czerner	0e8b6879f3	ext4: refactor ext4_fallocate code Move block allocation out of the ext4_fallocate into separate function called ext4_alloc_file_blocks(). This will allow us to use the same allocation code for other allocation operations such as zero range which is commit in the next patch. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 18:03:51 -04:00
Lukas Czerner	f282ac19d8	ext4: Update inode i_size after the preallocation Currently in ext4_fallocate we would update inode size, c_time and sync the file with every partial allocation which is entirely unnecessary. It is true that if the crash happens in the middle of truncate we might end up with unchanged i size, or c_time which I do not think is really a problem - it does not mean file system corruption in any way. Note that xfs is doing things the same way e.g. update all of the mentioned after the allocation is done. This commit moves all the updates after the allocation is done. In addition we also need to change m_time as not only inode has been change bot also data regions might have changed (unwritten extents). However m_time will be only updated when i_size changed. Also we do not need to be paranoid about changing the c_time only if the actual allocation have happened, we can change it even if we try to allocate only to find out that there are already block allocated. It's not really a big deal and it will save us some additional complexity. Also use ext4_debug, instead of ext4_warning in #ifdef EXT4FS_DEBUG section. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>- -- v3: Do not remove the code to set EXT4_INODE_EOFBLOCKS flag fs/ext4/extents.c \| 96 ++++++++++++++++++++++++------------------------------- 1 file changed, 42 insertions(+), 54 deletions(-)	2014-03-18 17:44:35 -04:00
Eric Whitney	c063449394	ext4: fix partial cluster handling for bigalloc file systems Commit `9cb00419fa`, which enables hole punching for bigalloc file systems, exposed a bug introduced by commit `6ae06ff51e` in an earlier release. When run on a bigalloc file system, xfstests generic/013, 068, 075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors or cause kernel error messages indicating that previously freed blocks are being freed again. The latter commit optimizes the selection of the starting extent in ext4_ext_rm_leaf() when hole punching by beginning with the extent supplied in the path argument rather than with the last extent in the leaf node (as is still done when truncating). However, the code in rm_leaf that initially sets partial_cluster to track cluster sharing on extent boundaries is only guaranteed to run if rm_leaf starts with the last node in the leaf. Consequently, partial_cluster is not correctly initialized when hole punching, and a cluster on the boundary of a punched region that should be retained may instead be deallocated. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-03-13 23:34:16 -04:00
Eric Whitney	31cf0f2c31	ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents Code deallocating the extent path referenced by an argument to ext4_ext_handle_uninitialized_extents was made redundant with identical code in its one caller, ext4_ext_map_blocks, by commit `3779473246`. Allocating and deallocating the path in the same function also makes the code clearer. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-13 23:14:46 -04:00
Theodore Ts'o	38c03b3439	ext4: only call sync_filesystm() when remounting read-only This is the only time it is required for ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-13 22:49:42 -04:00
Theodore Ts'o	02b9984d64	fs: push sync_filesystem() down to the file system's remount_fs() Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org	2014-03-13 10:14:33 -04:00
Theodore Ts'o	66a4cb187b	jbd2: improve error messages for inconsistent journal heads Fix up error messages printed when the transaction pointers in a journal head are inconsistent. This improves the error messages which are printed when running xfstests generic/068 in data=journal mode. See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-12 16:38:03 -04:00
Theodore Ts'o	0bfea8118d	jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() It's not needed until we start trying to modifying fields in the journal_head which are protected by j_list_lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-09 00:56:58 -05:00
Theodore Ts'o	6e4862a5bb	jbd2: minimize region locked by j_list_lock in journal_get_create_access() It's not needed until we start trying to modifying fields in the journal_head which are protected by j_list_lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-09 00:46:23 -05:00
Theodore Ts'o	d2eb0b9989	jbd2: check jh->b_transaction without taking j_list_lock jh->b_transaction is adequately protected for reading by the jbd_lock_bh_state(bh), so we don't need to take j_list_lock in __journal_try_to_free_buffer(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-09 00:07:19 -05:00
Theodore Ts'o	d4e839d4a9	jbd2: add transaction to checkpoint list earlier We don't otherwise need j_list_lock during the rest of commit phase #7, so add the transaction to the checkpoint list at the very end of commit phase #6. This allows us to drop j_list_lock earlier, which is a good thing since it is a super hot lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-08 22:34:10 -05:00
Theodore Ts'o	42cf3452d5	jbd2: calculate statistics without holding j_state_lock and j_list_lock The two hottest locks, and thus the biggest scalability bottlenecks, in the jbd2 layer, are the j_list_lock and j_state_lock. This has inspired some people to do some truly unnatural things[1]. [1] https://www.usenix.org/system/files/conference/fast14/fast14-paper_kang.pdf We don't need to be holding both j_state_lock and j_list_lock while calculating the journal statistics, so move those calculations to the very end of jbd2_journal_commit_transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-08 19:51:16 -05:00
Theodore Ts'o	3469a32a1e	jbd2: don't hold j_state_lock while calling wake_up() The j_state_lock is one of the hottest locks in the jbd2 layer and thus one of its scalability bottlenecks. We don't need to be holding the j_state_lock while we are calling wake_up(&journal->j_wait_commit), so release the lock a little bit earlier. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-08 19:11:36 -05:00
Theodore Ts'o	df3c1e9a05	jbd2: don't unplug after writing revoke records During commit process, keep the block device plugged after we are done writing the revoke records, until we are finished writing the rest of the commit records in the journal. This will allow most of the journal blocks to be written in a single I/O operation, instead of separating the the revoke blocks from the rest of the journal blocks. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-08 18:13:52 -05:00
Jan Kara	10542c229a	ext4: Speedup WB_SYNC_ALL pass called from sync(2) When doing filesystem wide sync, there's no need to force transaction commit (or synchronously write inode buffer) separately for each inode because ext4_sync_fs() takes care of forcing commit at the end (VFS takes care of flushing buffer cache, respectively). Most of the time this slowness doesn't manifest because previous WB_SYNC_NONE writeback doesn't leave much to write but when there are processes aggressively creating new files and several filesystems to sync, the sync slowness can be noticeable. In the following test script sync(1) takes around 6 minutes when there are two ext4 filesystems mounted on a standard SATA drive. After this patch sync takes a couple of seconds so we have about two orders of magnitude improvement. function run_writers { for (( i = 0; i < 10; i++ )); do mkdir $1/dir$i for (( j = 0; j < 40000; j++ )); do dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null done & done } for dir in "$@"; do run_writers $dir done sleep 40 time sync Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-04 10:50:50 -05:00
Namjae Jeon	9eb79482a9	ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4. The semantics of this flag are following: 1) It collapses the range lying between offset and length by removing any data blocks which are present in this range and than updates all the logical offsets of extents beyond "offset + len" to nullify the hole created by removing blocks. In short, it does not leave a hole. 2) It should be used exclusively. No other fallocate flag in combination. 3) Offset and length supplied to fallocate should be fs block size aligned in case of xfs and ext4. 4) Collaspe range does not work beyond i_size. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Tested-by: Dongsu Park <dongsu.park@profitbricks.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-23 15:18:59 -05:00
Lukas Czerner	a633f5a319	ext4: translate fallocate mode bits to strings Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-22 06:18:17 -05:00
Darrick J. Wong	a9b8241594	ext4: merge uninitialized extents Allow for merging uninitialized extents. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 21:17:35 -05:00
Maxim Patlasov	e251f9bca9	ext4: avoid exposure of stale data in ext4_punch_hole() While handling punch-hole fallocate, it's useless to truncate page cache before removing the range from extent tree (or block map in indirect case) because page cache can be re-populated (by read-ahead or read(2) or mmap-ed read) immediately after truncating page cache, but before updating extent tree (or block map). In that case the user will see stale data even after fallocate is completed. Until the problem of data corruption resulting from pages backed by already freed blocks is fully resolved, the simple thing we can do now is to add another truncation of pagecache after punch hole is done. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-02-20 16:58:05 -05:00
Eric Whitney	ce140cdd9c	ext4: silence warnings in extent status tree debugging code Adjust the conversion specifications in a few optionally compiled debug messages to match the return type of ext4_es_status(). Also, make a couple of minor grammatical message edits while we're at it. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 16:09:12 -05:00
Eric Sandeen	dc9ddd984d	ext4: remove unused ac_ex_scanned When looking at a bug report with: > kernel: EXT4-fs: 0 scanned, 0 found I thought wow, 0 scanned, that's odd? But it's not odd; it's printing a variable that is initialized to 0 and never touched again. It's never been used since the original merge, so I don't really even know what the original intent was, either. If anyone knows how to hook it up, speak now via patch, otherwise just yank it so it's not making a confusing situation more confusing in kernel logs. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 13:32:10 -05:00
Theodore Ts'o	e861b5e9a4	ext4: avoid possible overflow in ext4_map_blocks() The ext4_map_blocks() function returns the number of blocks which satisfying the caller's request. This number of blocks requested by the caller is specified by an unsigned integer, but the return value of ext4_map_blocks() is a signed integer (to accomodate error codes per the kernel's standard error signalling convention). Historically, overflows could never happen since mballoc() will refuse to allocate more than 2048 blocks at a time (which is something we should fix), and if the blocks were already allocated, the fact that there would be some number of intervening metadata blocks pretty much guaranteed that there could never be a contiguous region of data blocks that was greater than 231 blocks. However, this is now possible if there is a file system which is a bit bigger than 8TB, and is created using the new mke2fs hugeblock feature, which can create a perfectly contiguous file. In that case, if a userspace program attempted to call fallocate() on this already fully allocated file, it's possible that ext4_map_blocks() could return a number large enough that it would overflow a signed integer, resulting in a ext4 thinking that the ext4_map_blocks() call had failed with some strange error code. Since ext4_map_blocks() is always free to return a smaller number of blocks than what was requested by the caller, fix this by capping the number of blocks that ext4_map_blocks() will ever try to map to 231 - 1. In practice this should never get hit, except by someone deliberately trying to provke the above-described bug. Thanks to the PaX team for asking whethre this could possibly happen in some off-line discussions about using some static code checking technology they are developing to find bugs in kernel code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 12:54:05 -05:00
Theodore Ts'o	ab0c00fccf	ext4: make sure ex.fe_logical is initialized The lowest levels of mballoc set all of the fields of struct ext4_free_extent except for fe_logical, since they are just trying to find the requested free set of blocks, and the logical block hasn't been set yet. This makes some static code checkers sad. Set it to various different debug values, which would be useful when debugging mballoc if these values were to ever show up due to the parts of mballoc triyng to use ac->ac_b_ex.fe_logical before it is properly upper layers of mballoc failing to properly set, usually by ext4_mb_use_best_found(). Addresses-Coverity-Id: #139697 Addresses-Coverity-Id: #139698 Addresses-Coverity-Id: #139699 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 00:36:41 -05:00
Theodore Ts'o	7b1b2c1b9c	ext4: don't calculate total xattr header size unless needed The function ext4_expand_extra_isize_ea() doesn't need the size of all of the extended attribute headers. So if we don't calculate it when it is unneeded, it we can skip some undeeded memory references, and as a bonus, we eliminate some kvetching by static code analysis tools. Addresses-Coverity-Id: #741291 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-19 20:15:21 -05:00
Theodore Ts'o	9a6633b1a3	ext4: add ext4_es_store_pblock_status() Avoid false positives by static code analysis tools such as sparse and coverity caused by the fact that we set the physical block, and then the status in the extent_status structure. It is also more efficient to set both of these values at once. Addresses-Coverity-Id: #989077 Addresses-Coverity-Id: #989078 Addresses-Coverity-Id: #1080722 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>	2014-02-19 20:15:15 -05:00
Eric Whitney	ce37c42919	ext4: fix error return from ext4_ext_handle_uninitialized_extents() Commit `3779473246` breaks the return of error codes from ext4_ext_handle_uninitialized_extents() in ext4_ext_map_blocks(). A portion of the patch assigns that function's signed integer return value to an unsigned int. Consequently, negatively valued error codes are lost and can be treated as a bogus allocated block count. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-02-19 18:52:39 -05:00
Patrick Palka	024949ec8f	ext4: address a benign compiler warning When !defined(CONFIG_EXT4_DEBUG), mb_debug() should be defined as a no_printk() statement instead of an empty statement in order to suppress the following compiler warning: fs/ext4/mballoc.c: In function ‘ext4_mb_cleanup_pa’: fs/ext4/mballoc.c:2659:47: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] mb_debug(1, "mballoc: %u PAs left\n", count); Signed-off-by: Patrick Palka <patrick@parcs.ath.cx> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-17 20:50:59 -05:00
Rashika Kheria	7747e6d028	jbd2: mark file-local functions as static Mark functions as static in jbd2/journal.c because they are not used outside this file. This eliminates the following warning in jbd2/journal.c: fs/jbd2/journal.c:125:5: warning: no previous prototype for ‘jbd2_verify_csum_type’ [-Wmissing-prototypes] fs/jbd2/journal.c:146:5: warning: no previous prototype for ‘jbd2_superblock_csum_verify’ [-Wmissing-prototypes] fs/jbd2/journal.c:154:6: warning: no previous prototype for ‘jbd2_superblock_csum_set’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2014-02-17 20:49:04 -05:00
Dan Carpenter	df3a98b086	ext4: remove an unneeded check in mext_page_mkuptodate() "err" is zero here, there is no need to check again. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-17 20:46:40 -05:00
Theodore Ts'o	d8558a2978	ext4: clean up error handling in swap_inode_boot_loader() Tighten up the code to make the code easier to read and maintain. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-17 20:44:36 -05:00
Fabian Frederick	e67bc2b359	ext4: Add __init marking to init_inodecache init_inodecache is only called by __init init_ext4_fs. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-17 20:34:53 -05:00
Dan Carpenter	92e3b40537	jbd2: fix use after free in jbd2_journal_start_reserved() If start_this_handle() fails then it leads to a use after free of "handle". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-02-17 20:33:01 -05:00
Theodore Ts'o	19ea806037	ext4: don't leave i_crtime.tv_sec uninitialized If the i_crtime field is not present in the inode, don't leave the field uninitialized. Fixes: `ef7f38359` ("ext4: Add nanosecond timestamps") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-02-16 19:29:32 -05:00
Theodore Ts'o	3d2660d0c9	ext4: fix online resize with a non-standard blocks per group setting The set_flexbg_block_bitmap() function assumed that the number of blocks in a blockgroup was sb->blocksize * 8, which is normally true, but not always! Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block bitmap corruption after: mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G mount -t ext4 /dev/vdd /vdd resize2fs /dev/vdd 8G Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Jon Bernard <jbernard@tuxion.com> Cc: stable@vger.kernel.org	2014-02-15 22:42:25 -05:00
Theodore Ts'o	b93c953534	ext4: fix online resize with very large inode tables If a file system has a large number of inodes per block group, all of the metadata blocks in a flex_bg may be larger than what can fit in a single block group. Unfortunately, ext4_alloc_group_tables() in resize.c was never tested to see if it would handle this case correctly, and there were a large number of bugs which caused the following sequence to result in a BUG_ON: kernel bug at fs/ext4/resize.c:409! ... call trace: [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830 [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80 [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00 [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0 [<ffffffff811b9df2>] ? final_putname+0x22/0x50 [<ffffffff811c1371>] sys_ioctl+0x81/0xa0 [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0 rip [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180 This can be reproduced with the following command sequence: mke2fs -t ext4 -i 4096 /dev/vdd 1G mount -t ext4 /dev/vdd /vdd resize2fs /dev/vdd 8G To fix this, we need to make sure the right thing happens when a block group's inode table straddles two block groups, which means the following bugs had to be fixed: 1) Not clearing the BLOCK_UNINIT flag in the second block group in ext4_alloc_group_tables --- the was proximate cause of the BUG_ON. 2) Incorrectly determining how many block groups contained contiguous free blocks in ext4_alloc_group_tables(). 3) Incorrectly setting the start of the next block range to be marked in use after a discontinuity in setup_new_flex_group_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-02-15 21:33:13 -05:00
Theodore Ts'o	2330141097	ext4: don't try to modify s_flags if the the file system is read-only If an ext4 file system is created by some tool other than mke2fs (perhaps by someone who has a pathalogical fear of the GPL) that doesn't set one or the other of the EXT2_FLAGS_{UN}SIGNED_HASH flags, and that file system is then mounted read-only, don't try to modify the s_flags field. Otherwise, if dm_verity is in use, the superblock will change, causing an dm_verity failure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-02-12 12:16:04 -05:00
Zheng Liu	30d29b119e	ext4: fix error paths in swap_inode_boot_loader() In swap_inode_boot_loader() we forgot to release ->i_mutex and resume unlocked dio for inode and inode_bl if there is an error starting the journal handle. This commit fixes this issue. Reported-by: Ahmed Tamrawi <ahmedtamrawi@gmail.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dr. Tilmann Bubeck <t.bubeck@reinform.de> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # v3.10+	2014-02-12 11:48:31 -05:00
Eric Whitney	15cc176785	ext4: fix xfstest generic/299 block validity failures Commit `a115f749c1` (ext4: remove wait for unwritten extent conversion from ext4_truncate) exposed a bug in ext4_ext_handle_uninitialized_extents(). It can be triggered by xfstest generic/299 when run on a test file system created without a journal. This test continuously fallocates and truncates files to which random dio/aio writes are simultaneously performed by a separate process. The test completes successfully, but if the test filesystem is mounted with the block_validity option, a warning message stating that a logical block has been mapped to an illegal physical block is posted in the kernel log. The bug occurs when an extent is being converted to the written state by ext4_end_io_dio() and ext4_ext_handle_uninitialized_extents() discovers a mapping for an existing uninitialized extent. Although it sets EXT4_MAP_MAPPED in map->m_flags, it fails to set map->m_pblk to the discovered physical block number. Because map->m_pblk is not otherwise initialized or set by this function or its callers, its uninitialized value is returned to ext4_map_blocks(), where it is stored as a bogus mapping in the extent status tree. Since map->m_pblk can accidentally contain illegal values that are larger than the physical size of the file system, calls to check_block_validity() in ext4_map_blocks() that are enabled if the block_validity mount option is used can fail, resulting in the logged warning message. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # 3.11+	2014-02-12 10:42:45 -05:00
Linus Torvalds	f94aa7c7f1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of fixes, both -stable fodder. The O_SYNC bug is fairly old..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix a kmap leak in virtio_console fix O_SYNC\|O_APPEND syncing the wrong range on write()	2014-02-09 18:12:07 -08:00

1 2 3 4 5 ...

35092 Commits