The EXT4_MB_BITMAP and EXT4_MB_BUDDY macros obfuscate more than they
provide any abstraction. So remove them.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The per-commit callback was used by mballoc code to manage free space
bitmaps after deleted blocks have been released. This patch expands
it to support multiple different callbacks, to allow other things to
be done after the commit has been completed.
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The ext4_free_blocks() function now has two new flags that indicate
whether a partial cluster at the beginning or the end of the block
extents should be freed or not. That will be up the caller (i.e.,
truncate), who can figure out whether partial clusters at the
beginning or the end of a block range can be freed.
We also have to update the ext4_mb_free_metadata() and
release_blocks_on_commit() machinery to be cluster-based, since it is
used by ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In most of mballoc.c, we do everything in units of clusters, since the
block allocation bitmaps and buddy bitmaps are all denominated in
clusters. The one place where we do deal with absolute block numbers
is in the code that handles the preallocation regions, since in the
case of inode-based preallocation regions, the start of the
preallocation region can't be relative to the beginning of the group.
So this adds a bit of complexity, where pa_pstart and pa_lstart are
block numbers, while pa_free, pa_len, and fe_len are denominated in
units of clusters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Certain parts of the ext4 code base, primarily in mballoc.c, use a
block group number and offset from the beginning of the block group.
This offset is invariably used to index into the allocation bitmap, so
change the offset to be denominated in units of clusters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ac_repeats isn't referenced in the mballoc code. So remove it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
After taking care of all group init races, all that remains is to
remove alloc_semp from ext4_allocation_context and ext4_buddy structs.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There are duplicate macro definitions of in_range() in mballoc.h and
balloc.c. This consolidates these two definitions into ext4.h, and
changes extents.c to use in_range() as well.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
This is a cleanup and simplification patch which takes some open-coded
calculations to calculate the first block number of a group and
converts them to use the (already defined) ext4_group_first_block_no()
function.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@sun.com>
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.
By ripping out the mb_history code and replacing it with ftrace
tracepoints, and we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There are a number of kernel printk's which are printed when an ext4
filesystem is mounted and unmounted. Disable them to economize space
in the system logs. In addition, disabling the mballoc stats by
default saves a number of unneeded atomic operations for every block
allocation or deallocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
unsigned short is potentially too small to track blocks within
a group; today it is safe due to restrictions in e2fsprogs but
we have _lo / _hi bits for group blocks with the intent to go
up to 32 bits, so clean this up now.
There are many more places where we use unsigned/int/unsigned int
to contain a group block but this should at least fix all the
short types.
I added a few comments to the struct ext4_group_info definition
as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Impact: code cleanup
This patch rename pa_linear to pa_type and add MB_INODE_PA
and MB_GROUP_PA to indicate inode and group prealloc space.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Remove some leftovers from when the old block allocator was removed
(c2ea3fde). ext4_sb_info is now a bit lighter. Also remove a dangling
read_block_bitmap() prototype.
Signed-off-by: Mike Snitzer <snitzer@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
After we mark the blocks in the buddy cache as allocated,
we need to ensure that we don't reinit the buddy cache until
the block bitmap is updated. This commit achieves this by holding
the group_info alloc_semaphore till ext4_mb_release_context
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
The mballoc code likes to call ext4_error while it is holding locked
block groups. This can causes a scheduling in atomic context BUG. We
can't just unlock the block group and relock it after/if ext4_error
returns since that might result in race conditions in the case where
the filesystem is set to continue after finding errors.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move some of the forward declaration of the static functions
to mballoc.c where they are used. This enables us to include
mballoc.h in other .c files. Also correct the buddy cache
documentation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The new groups added during resize are flagged as
need_init group. Make sure we properly initialize these
groups. When we have block size < page size and we are adding
new groups the page may still be marked uptodate even though
we haven't initialized the group. While forcing the init
of buddy cache we need to make sure other groups part of the
same page of buddy cache is not using the cache.
group_info->alloc_sem is added to ensure the same.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
cc: stable@kernel.org
The multiblock allocator needs to be able to release blocks (and issue
a blkdev discard request) when the transaction which freed those
blocks is committed. Previously this was done via a polling mechanism
when blocks are allocated or freed. A much better way of doing things
is to create a jbd2 callback function and attaching the list of blocks
to be freed directly to the transaction structure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Let the block device know when unused blocks can be discarded, using
the new sb_issue_discard() interface.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
With this patch we track the block freed during a transaction using
red-black tree. We also make sure contiguous blocks freed are collected
in one node in the tree.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
...and into the core setup/teardown code in fs/ext4/super.c so that
other parts of ext4 can define tuning parameters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently, the locality group prealloc list is freed only when there
is a block allocation failure. This can result in large number of
entries in the preallocation list making ext4_mb_use_preallocated()
expensive.
To fix this, we convert the locality group prealloc list to a hash
list. The hash index is the order of number of blocks in the prealloc
space with a max order of 9. When adding prealloc space to the list we
make sure total entries for each order does not exceed 8. If it is
more than 8 we discard few entries and make sure the we have only <= 5
entries.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Move function and structure definiations out of mballoc.c and put it under
a new header file mballoc.h
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>