Commit Graph

95035 Commits

Author SHA1 Message Date
Solofo Ramangalahy
60bd63d192 ext4: cleanup for compiling mballoc with verification and debugging #defines
This patch allows compiling mballoc with:
#define AGGRESSIVE_CHECK
#define DOUBLE_CHECK
#define MB_DEBUG

It fixes:
Compilation errors:
fs/ext4/mballoc.c: In function '__mb_check_buddy':
fs/ext4/mballoc.c:605: error: 'struct ext4_prealloc_space' has no member named 'group_list'
fs/ext4/mballoc.c:606: error: 'struct ext4_prealloc_space' has no member named 'pstart'
fs/ext4/mballoc.c:608: error: 'struct ext4_prealloc_space' has no member named 'len'

Compilation warnings:
fs/ext4/mballoc.c: In function 'ext4_mb_normalize_group_request':
fs/ext4/mballoc.c:2863: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'
fs/ext4/mballoc.c: In function 'ext4_mb_use_inode_pa':
fs/ext4/mballoc.c:3103: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'

Sparse check:
fs/ext4/mballoc.c:3818:2: warning: context imbalance in 'ext4_mb_show_ac' - different lock contexts for basic block

Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 21:59:59 -04:00
Josef Bacik
c19204b0ae ext4: don't use ext4_error in ext4_check_descriptors
Because ext4_check_descriptors is called at mount time you can't use ext4_error
as it calls ext4_commit_sb, which since the sb isn't all the way initialized
causes bad things to happen (ie a panic).  This patch changes the ext4_error's
to printk's to keep this problem from happening.  Thanks much,

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:00:28 -04:00
Aneesh Kumar K.V
8753e88f1b ext4: mark inode dirty after initializing the extent tree
We should mark the inode dirty only after initializing the extent
tree.  Also if we fail during extent initialization we need
to call DQUOT_FREE_INODE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:00:36 -04:00
Solofo Ramangalahy
ef7377289a ext4: update ctime and mtime for truncate with extents.
The recently announced "Linux POSIX file system test suite"
caught a truncate issue when using extents:
mtime and ctime are not updated when truncate is successful.

This is the single issue caught with "default" ext4 (mkfs and mount
with minimal options).
The testsuite does not report failure with -o noextents.

With the following patch, all tests of the testsuite pass.

Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com> 
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:00:41 -04:00
Aneesh Kumar K.V
c83617db76 ext4: Don't do GFP_NOFS allocations after taking ext4_lock_group
We can't do GFP_NOFS allocation after taking ext4_lock_group

BUG: sleeping function called from invalid context at mm/slab.c:3054
in_atomic():1, irqs_disabled():0
1 lock held by vi/2426:
#0:  (&ei->i_data_sem){----}, at: [<c01cf665>] ext4_release_file+0x23/0x66
Pid: 2426, comm: vi Not tainted 2.6.25-rc7 #24
[<c011a3dc>] __might_sleep+0xbe/0xc5
[<c01620c9>] kmem_cache_alloc+0x22/0xa6
[<c01e382a>] ext4_mb_release_inode_pa+0x73/0x1b3
[<c01e6adf>] ext4_mb_discard_inode_preallocations+0x22d/0x2d4
[<c013000a>] ? param_set_ushort+0x32/0x39
[<c01ceba1>] ext4_discard_reservation+0x27/0x6a
[<c01cf66c>] ext4_release_file+0x2a/0x66
[<c0165bd6>] __fput+0xae/0x155
[<c0165e46>] fput+0x17/0x19
[<c0163756>] filp_close+0x50/0x5a
[<c01647c0>] sys_close+0x71/0xad
[<c0104aba>] sysenter_past_esp+0x5f/0xa5

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:00:47 -04:00
Christoph Hellwig
3dcf54515a ext4: move headers out of include/linux
Move ext4 headers out of include/linux.  This is just the trivial move,
there's some more thing that could be done later. 

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 18:13:32 -04:00
Josef Bacik
216553c4b7 ext4: fix wrong gfp type under transaction
This fixes the allocations with GFP_KERNEL while under a transaction problems
in ext4.  This patch is the same as its ext3 counterpart, just switches these
to GFP_NOFS.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:02:02 -04:00
Jan Kara
2887df139c ext4: Fix hang on umount with quotas when journal is aborted
Call dquot_drop() from ext4_dquot_drop() even if we fail to start a
transaction. Otherwise we never get to dropping references to quota structures
from the inode and umount will hang indefinitely.  Thanks to Payphone LIOU for
spotting the problem.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
CC: Payphone LIOU <lioupayphone@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:02:07 -04:00
Jan Kara
53b7e9f680 ext4: Fix update of mtime and ctime on rename
The patch below makes ext4 update mtime and ctime of the directory
into which we move file even if the directory entry already exists.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:02:11 -04:00
Harvey Harrison
329d291f50 jdb2: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Harvey Harrison
46e665e9d2 ext4: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Duane Griffin
620de4e198 jbd2: only create debugfs and stats entries if init is successful
jbd2 debugfs and stats entries should only be created if cache initialisation
is successful.  At the moment they are being created unconditionally which
will leave them dangling if cache (and hence module) initialisation fails.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:02:47 -04:00
Randy Dunlap
5648ba5b2d jbd2: fix kernel-doc notation
Fix kernel-doc notation in jbd2.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Duane Griffin
8a9362eb40 jbd2: replace potentially false assertion with if block
If an error occurs during jbd2 cache initialisation it is possible for the
journal_head_cache to be NULL when jbd2_journal_destroy_journal_head_cache is
called.  Replace the J_ASSERT with an if block to handle the situation
correctly.

Note that even with this fix things will break badly if jbd2 is statically
compiled in and cache initialisation fails.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Duane Griffin
83c49523c9 jbd2: eliminate duplicated code in revocation table init/destroy functions
The revocation table initialisation/destruction code is repeated for each of
the two revocation tables stored in the journal. Refactoring the duplicated
code into functions is tidier, simplifies the logic in initialisation in
particular, and slightly reduces the code size.

There should not be any functional change.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Duane Griffin
9fa27c85de jbd2: tidy up revoke cache initialisation and destruction
Make revocation cache destruction safe to call if initialisation fails
partially or entirely.  This allows it to be used to cleanup in the case of
initialisation failure, simplifying the code slightly.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2008-04-28 09:40:00 -04:00
Joe Perches
418f6e9e5b ext4: remove duplicate include of ext4_fs_i.h header file
include/linux/ext4_fs_i.h is included in include/linux/ext_fs.h twice
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Mingming Cao
d3a95d477d ext4: make ext4_xattr_list() static
This patch makes the needlessly global ext4_xattr_list() static.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Mingming Cao
14499f3592 ext4: remove extra define of ext4_new_blocks_old from mballoc.c
The function prototype of ext4_new_blocks_old() is defined in ext4_fs.h,
so we don't need the extra function prototype in mballoc.c

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Akinobu Mita
a871611b47 ext4: check ext4_journal_get_write_access() errors
Check ext4_journal_get_write_access() errors.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: adilger@clusterfs.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mingming Cao <cmm@us.ibm.com>
2008-04-17 10:38:59 -04:00
Akinobu Mita
c0a4ef38ac ext4: use ext4_get_group_desc()
Use ext4_get_group_desc() in ext4_get_inode_block() instead of open
coding the functionality.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: adilger@clusterfs.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mingming Cao <cmm@us.ibm.com>
2008-04-17 10:38:59 -04:00
Akinobu Mita
d00a6d7b40 ext4: use ext4_group_first_block_no()
Use ext4_group_first_block_no() and assign the return values to
ext4_fsblk_t variables.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: adilger@clusterfs.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mingming Cao <cmm@us.ibm.com>
2008-04-17 10:38:59 -04:00
Marcin Slusarz
216c34b2b8 ext4: convert byte order of constant instead of variable
Convert byte order of constant instead of variable which can be done at
compile time (vs run time).

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Marcin Slusarz
e8546d0615 ext4: le*_add_cpu conversion
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
					expression_in_cpu_byteorder);
with:
	leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Cc: sct@redhat.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: adilger@clusterfs.com
Cc: Mingming Cao <cmm@us.ibm.com>
2008-04-17 10:38:59 -04:00
Aneesh Kumar K.V
9a0762c5af ext4: Convert list_for_each_rcu() to list_for_each_entry_rcu()
The list_for_each_entry_rcu() primitive should be used instead of
list_for_each_rcu(), as the former is easier to use and provides
better type safety.

http://groups.google.com/group/linux.kernel/browse_thread/thread/45749c83451cebeb/0633a65759ce7713?lnk=raot

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Eric Sandeen
4ddfef7b41 ext4: reduce mballoc stack usage with noinline_for_stack
mballoc.c is a whole lot of static functions, which gcc seems to
really like to inline.

With the changes below, on x86, I can at least get from:

432 ext4_mb_new_blocks
240 ext4_mb_free_blocks
208 ext4_mb_discard_group_preallocations
188 ext4_mb_seq_groups_show
164 ext4_mb_init_cache
152 ext4_mb_release_inode_pa
136 ext4_mb_seq_history_show
...

to

220 ext4_mb_free_blocks
188 ext4_mb_seq_groups_show
176 ext4_mb_regular_allocator
164 ext4_mb_init_cache
156 ext4_mb_new_blocks
152 ext4_mb_release_inode_pa
136 ext4_mb_seq_history_show
124 ext4_mb_release_group_pa
...

which still has some big functions in there, but not 432 bytes!

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Andi Kleen
5cdd7b2d77 Convert ext4 to use unlocked_ioctl
I checked ext4_ioctl and it looked largely safe to not be used
without BKL.  So convert it over to unlocked_ioctl.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-04-29 22:03:54 -04:00
Aneesh Kumar K.V
161e7b7c1d ext4: Cache the correct extent length for uninit extents
When we convert an uninitialized extent to an initialized extent
we need to make sure we return the number of blocks in the
extent from the file system block corresponding to logical
file block.  Otherwise we cache wrong extent details and this
results in file system corruption.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:03:59 -04:00
Aneesh Kumar K.V
1a89734d40 ext4: Return unwritten buffer head when trying to read from prealloc space.
ext4_ext_get_blocks() returns the number of blocks allocated with buffer
head unmapped for a read from prealloc space.  This is needed so that
delayed allocation doesn't do block reservation for prealloc space
since the blocks are already reserved on disk.  Mark the buffer head
unwritten.  Some code paths try to read the block if the buffer_head is
not new and no uptodate.  Marking the buffer head unwritten avoids this
reading.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
e067ba0078 ext4: make ext4_ext_get_blocks always return <= max_blocks
ext4_ext_get_blocks() returns number of blocks allocated with buffer
heads unmapped for a read from prealloc space.  This is needed so that
delayed allocation doesn't do block reservation for prealloc space since
the blocks are already resevred on disk.  Fix ext4_ext_get_blocks to not
return greater than max_blocks, since some of the code paths cannot
handle such a return value.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
fd28784adc ext4: Fix fallocate to update the file size in each transaction
ext4_fallocate needs to update file size in each transaction.  Otherwise
if we crash the file size won't be seen.  We were also not marking
the inode dirty after updating file size before.  Also when we try to
retry allocation due to ENOSPC, make sure we reset the variable ret so
that we actually do a retry.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
267e4db9ac ext4: Fix race between migration and mmap write
Fail migrate if we allocated new blocks via mmap write.

If we write to holes in the file via mmap, we end up allocating
new blocks. This block allocation happens without taking inode->i_mutex.
Since migrate is protected by i_mutex and migrate expects that no
new blocks get allocated during migrate, fail migrate if new blocks
get allocated.

We can't take inode->i_mutex in the mmap write path because that
would result in a locking order violation between i_mutex and mmap_sem.
Also adding a separate rw_sempahore for protection is really high overhead
for a rare operation such as migrate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
3977c965ec ext4: zero out small extents when writing to prealloc area.
If the preallocated area is small zero out the full extent
instead of splitting them. This should avoid the "write
every alternate block" problem that could grow the number
of extents dramatically.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Aneesh Kumar K.V
093a088b76 ext4: ENOSPC error handling for writing to an uninitialized extent
This patch handles possible ENOSPC errors when writing to an
uninitialized extent in case the filesystem is full.

A write to a prealloc area causes the split of an unititalized extent
into initialized and uninitialized extents.  If we don't have
space to add new extent information, instead of returning error,
convert the existing uninitialized extent to initialized one.  We
need to zero out the blocks corresponding to the entire extent to
prevent uninitialized data reaching userspace.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
35802c0b2b sparc: Export symbols for ZERO_PAGE usage in modules.
ext4 uses ZERO_PAGE(0) to zero out blocks.  We need to export
different symbols in different arches for the usage of ZERO_PAGE
in modules.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
5fcf430303 m68k: Export empty_zero_page for ZERO_PAGE usage in modules.
ext4 uses ZERO_PAGE(0) to zero out blocks.  We need to export
different symbols in different arches for the usage of ZERO_PAGE
in modules.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
3653f3abe3 arm: Export empty_zero_page for ZERO_PAGE usage in modules.
ext4 uses ZERO_PAGE(0) to zero out blocks.  We need to export
different symbols in different arches for the usage of ZERO_PAGE
in modules.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
e65187e6d0 ext4: Enable extent format for symlinks.
This patch enables extent-formatted normal symlinks.  Using extents
format allows a symlink to refer to a block number larger than 2^32
on large filesystems.  We still don't enable extent format for fast
symlinks, which are contained in the inode itself.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 08:11:12 -04:00
Aneesh Kumar K.V
95c3889cb8 ext4: Fix fallocate error path
Put the old extent details back if we fail to split the
uninitialized extent.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Josef Bacik
f3f12faa74 ext4: fix mount option parsing
The "resize" option won't be noticed as it comes after the NULL option,
so if you try to mount (or in this case remount) with that option it
won't be recognized.

Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:05:28 -04:00
Hisashi Hifumi
53c550e975 ext4: fdatasync should skip metadata writeout when overwriting
Currently fdatasync is identical to fsync in ext3.

I think fdatasync should skip journal flush in data=ordered and
data=writeback mode when it overwrites to already-instantiated blocks on
HDD.  When I_DIRTY_DATASYNC flag is not set, fdatasync should skip journal
writeout because this indicates only atime or/and mtime updates.

Following patch is the same approach of ext2's fsync code(ext2_sync_file).

I did a performance test using the sysbench.

#sysbench --num-threads=128 --max-requests=50000 --test=fileio --file-total-size=128G
--file-test-mode=rndwr --file-fsync-mode=fdatasync run

The result on ext3 was:

	-2.6.24
	Operations performed:  0 Read, 50080 Write, 59600 Other = 109680 Total
	Read 0b  Written 782.5Mb  Total transferred 782.5Mb  (12.116Mb/sec)
	  775.45 Requests/sec executed

	Test execution summary:
	    total time:                          64.5814s
	    total number of events:              50080
	    total time taken by event execution: 3713.9836
	    per-request statistics:
	         min:                            0.0000s
	         avg:                            0.0742s
	         max:                            0.9375s
	         approx.  95 percentile:         0.2901s

	Threads fairness:
	    events (avg/stddev):           391.2500/23.26
	    execution time (avg/stddev):   29.0155/1.99

	-2.6.24-patched
	Operations performed:  0 Read, 50009 Write, 61596 Other = 111605 Total
	Read 0b  Written 781.39Mb  Total transferred 781.39Mb  (16.419Mb/sec)
	1050.83 Requests/sec executed

	Test execution summary:
	    total time:                          47.5900s
	    total number of events:              50009
	    total time taken by event execution: 2934.5768
	    per-request statistics:
 	         min:                            0.0000s
	         avg:                            0.0587s
 	         max:                            0.8938s
	         approx.  95 percentile:         0.1993s

	Threads fairness:
	    events (avg/stddev):           390.6953/22.64
	    execution time (avg/stddev):   22.9264/1.17

Filesystem I/O throughput was improved.

Signed-off-by :Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Josef Bacik
97bd42b9c8 ext4: check return of ext4_orphan_get properly
This patch fix a panic while running fsfuzzer. 
We are improperly checking the return of ext4_orphan_get.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:04:56 -04:00
Josef Bacik
1dfc3220d9 jbd2: fix possible journal overflow issues
There are several cases where the running transaction can get buffers
added to its BJ_Metadata list which it never dirtied, which makes its
t_nr_buffers counter end up larger than its t_outstanding_credits
counter.

This will cause issues when starting new transactions as while we are
logging buffers we decrement t_outstanding_buffers, so when
t_outstanding_buffers goes negative, we will report that we need less
space in the journal than we actually need, so transactions will be
started even though there may not be enough room for them.  In the worst
case scenario (which admittedly is almost impossible to reproduce) this
will result in the journal running out of space.

The fix is to only refile buffers from the committing transaction to the
running transactions BJ_Modified list when b_modified is set on that
journal, which is the only way to be sure if the running transaction has
modified that buffer.

This patch also fixes an accounting error in journal_forget, it is
possible that we can call journal_forget on a buffer without having
modified it, only gotten write access to it, so instead of freeing a
credit, we only do so if the buffer was modified.  The assert will help
catch if this problem occurs.  Without these two patches I could hit
this assert within minutes of running postmark, with them this issue no
longer arises.

Cc: <linux-ext4@vger.kernel.org>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Josef Bacik
9fc7c63a1d jbd2: fix the way the b_modified flag is cleared
Currently at the start of a journal commit we loop through all of the buffers
on the committing transaction and clear the b_modified flag (the flag that is
set when a transaction modifies the buffer) under the j_list_lock.

The problem is that everywhere else this flag is modified only under the jbd2
lock buffer flag, so it will race with a running transaction who could
potentially set it, and have it unset by the committing transaction.

This is also a big waste, you can have several thousands of buffers that you
are clearing the modified flag on when you may not need to.  This patch
removes this code and instead clears the b_modified flag upon entering
do_get_write_access/journal_get_create_access, so if that transaction does
indeed use the buffer then it will be accounted for properly, and if it does
not then we know we didn't use it.

That will be important for the next patch in this series.  Tested thoroughly
by myself using postmark/iozone/bonnie++.

Cc: <linux-ext4@vger.kernel.org>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 10:38:59 -04:00
Linus Torvalds
33ae0cdd3e Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Provide ACPI fixup for /proc/cpuinfo/physical_id
  [IA64] Remove printk noise on unimplemented SAL_PHYSICAL_ID_INFO
  [IA64] allocate multiple contiguous pages via uncached allocator
  [IA64] bugfix: nptcg breaks cpu-hotadd
2008-04-29 16:50:49 -07:00
Linus Torvalds
f5ba0cf3cb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  Smack: Integrate Smack with Audit
  Security: Typecast CAP_*_SET macros
  Security: Make secctx_to_secid() take const secdata
2008-04-29 15:58:24 -07:00
Ahmed S. Darwish
d20bdda6d4 Smack: Integrate Smack with Audit
Setup the new Audit hooks for Smack. SELinux Audit rule fields are recycled
to avoid `auditd' userspace modifications. Currently only equality testing
is supported on labels acting as a subject (AUDIT_SUBJ_USER) or as an object
(AUDIT_OBJ_USER).

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
2008-04-30 08:34:10 +10:00
David Howells
780db6c104 Security: Typecast CAP_*_SET macros
Cast the CAP_*_SET macros to be of kernel_cap_t type to avoid compiler
warnings.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-04-30 08:24:19 +10:00
David Howells
e52c1764f1 Security: Make secctx_to_secid() take const secdata
Make secctx_to_secid() take constant secdata.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-04-30 08:23:51 +10:00
Linus Torvalds
c65a3500b2 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata] linux/libata.h: reorganize ata_device struct members a bit
  ahci: SB600 ahci can't do MSI, blacklist that capability
  libata: More TSSTcorp pain, keep in sync with legacy IDE
  pata_via: Fix 6410 misdetect
  [libata] pata_atiixp: fix PIO timing data misprogramming
2008-04-29 15:19:09 -07:00