mirror of
https://github.com/torvalds/linux.git
synced 2024-11-22 20:22:09 +00:00
6a518afcc2
Merge tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull VFS acl updates from Christian Brauner:

"This contains the work that builds a dedicated vfs posix acl api. The origins of this work trace back to v5.19 but it took quite a while to understand the various filesystem specific implementations in sufficient detail and also come up with an acceptable solution.

As we have discussed and seen multiple times, the current state of how posix acls are handled isn't nice and comes with a lot of problems: The current way of handling posix acls via the generic xattr api is error prone, hard to maintain, and type unsafe for the vfs until we call into the filesystem's dedicated get and set inode operations.

It is already the case that posix acls are special-cased to death all the way through the vfs. There are an uncounted number of hacks that operate on the uapi posix acl struct instead of the dedicated vfs struct posix_acl. And the vfs must be involved in order to interpret and fixup posix acls before storing them to the backing store, caching them, reporting them to userspace, or for permission checking.

Currently a range of hacks and duct tape exist to make this work. As with most things this is really no one's fault; it's just something that happened over time. But the code is hard to understand and difficult to maintain and one is constantly at risk of introducing bugs and regressions when having to touch it.

Instead of continuing to hack posix acls through the xattr handlers this series builds a dedicated posix acl api solely around the get and set inode operations.

Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl() helpers must be used in order to interact with posix acls. They operate directly on the vfs internal struct posix_acl instead of abusing the uapi posix acl struct as we currently do. In the end this removes all of the hackiness, makes the codepaths easier to maintain, and gets us type safety.

This series passes the LTP and xfstests suites without any regressions. For xfstests the following combinations were tested:

 - xfs
 - ext4
 - btrfs
 - overlayfs
 - overlayfs on top of idmapped mounts
 - orangefs
 - (limited) cifs

There are more simplifications for posix acls that we can make in the future if the basic api has made it.

A few implementation details:

 - The series makes sure to retain exactly the same security and integrity module permission checks. Especially for the integrity modules this api is a win because right now they convert the uapi posix acl struct passed to them via a void pointer into the vfs struct posix_acl format to perform permission checking on the mode.

   There's a new dedicated security hook for setting posix acls which passes the vfs struct posix_acl not a void pointer. Basing checking on the posix acl stored in the uapi format is really unreliable. The vfs currently hacks around directly in the uapi struct storing values that frankly the security and integrity modules can't correctly interpret as evidenced by bugs we reported and fixed in this area. It's not necessarily even their fault; it's just that the format we provide to them is suboptimal.

 - Some filesystems like 9p and cifs need access to the dentry in order to get and set posix acls which is why they either only partially or not even at all implement get and set inode operations. For example, cifs allows setxattr() and getxattr() operations but doesn't allow permission checking based on posix acls because it can't implement a get acl inode operation.

   Thus, this patch series updates the set acl inode operation to take a dentry instead of an inode argument. However, for the get acl inode operation we can't do this as the old get acl method is called in e.g., generic_permission() and inode_permission(). These helpers in turn are called in various filesystems' permission inode operations. So passing a dentry argument to the old get acl inode operation would amount to passing a dentry to the permission inode operation which we shouldn't and probably can't do.

   So instead of extending the existing inode operation Christoph suggested adding a new one. He also requested to ensure that the get and set acl inode operations taking a dentry are consistently named. So for this version the old get acl operation is renamed to ->get_inode_acl() and a new ->get_acl() inode operation taking a dentry is added. With this we can give both 9p and cifs get and set acl inode operations and in turn remove their complex custom posix xattr handlers.

   In the future I hope to get rid of the inode method duplication but it isn't like we have never had this situation. Readdir is just one example. And frankly, the overall gain in type safety and the more pleasant api are simply too big of a benefit to not accept this duplication for a while.

 - We've done a full audit of every codepath using a variant of the current generic xattr api to get and set posix acls and surprisingly it isn't that many places. There's of course always a chance that we might have missed some and if so I'm sure we'll find them soon enough.

   The crucial codepaths to be converted are obviously stacking filesystems such as ecryptfs and overlayfs.

   For a list of all callers currently using generic xattr api helpers see [2] including comments whether they support posix acls or not.

 - The old vfs generic posix acl infrastructure doesn't obey the create and replace semantics promised on the setxattr(2) manpage. This patch series doesn't address this. It really is something we should revisit later though.

The patches are roughly organized as follows:

 (1) Change existing set acl inode operation to take a dentry argument (Intended to be a non-functional change)

 (2) Rename existing get acl method (Intended to be a non-functional change)

 (3) Implement get and set acl inode operations for filesystems that couldn't implement one before because of the missing dentry. That's mostly 9p and cifs (Intended to be a non-functional change)

 (4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl() including security and integrity hooks (Intended to be a non-functional change)

 (5) Implement get and set acl inode operations for stacking filesystems (Intended to be a non-functional change)

 (6) Switch posix acl handling in stacking filesystems to new posix acl api now that all filesystems it can stack upon support it.

 (7) Switch vfs to new posix acl api (semantical change)

 (8) Remove all now unused helpers

 (9) Additional regression fixes reported after we merged this into linux-next

Thanks to Seth for a lot of good discussion around this and encouragement and input from Christoph"

* tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits)
  posix_acl: Fix the type of sentinel in get_acl
  orangefs: fix mode handling
  ovl: call posix_acl_release() after error checking
  evm: remove dead code in evm_inode_set_acl()
  cifs: check whether acl is valid early
  acl: make vfs_posix_acl_to_xattr() static
  acl: remove a slew of now unused helpers
  9p: use stub posix acl handlers
  cifs: use stub posix acl handlers
  ovl: use stub posix acl handlers
  ecryptfs: use stub posix acl handlers
  evm: remove evm_xattr_acl_change()
  xattr: use posix acl api
  ovl: use posix acl api
  ovl: implement set acl method
  ovl: implement get acl method
  ecryptfs: implement set acl method
  ecryptfs: implement get acl method
  ksmbd: use vfs_remove_acl()
  acl: add vfs_remove_acl()
  ...
1628 lines
45 KiB
C
// SPDX-License-Identifier: GPL-2.0
/*
 * linux/fs/ext4/ialloc.c
 *
 * Copyright (C) 1992, 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 *
 * BSD ufs-inspired inode and directory allocation by
 * Stephen Tweedie (sct@redhat.com), 1993
 * Big-endian to little-endian byte-swapping/bitmaps by
 * David S. Miller (davem@caip.rutgers.edu), 1995
 */

#include <linux/time.h>
#include <linux/fs.h>
#include <linux/stat.h>
#include <linux/string.h>
#include <linux/quotaops.h>
#include <linux/buffer_head.h>
#include <linux/random.h>
#include <linux/bitops.h>
#include <linux/blkdev.h>
#include <linux/cred.h>

#include <asm/byteorder.h>

#include "ext4.h"
#include "ext4_jbd2.h"
#include "xattr.h"
#include "acl.h"

#include <trace/events/ext4.h>

/*
 * ialloc.c contains the inode allocation and deallocation routines
 */

/*
 * The free inodes are managed by bitmaps. A file system contains several
 * block groups. Each group contains 1 bitmap block for blocks, 1 bitmap
 * block for inodes, N blocks for the inode table and data blocks.
 *
 * The file system contains group descriptors which are located after the
 * super block. Each descriptor contains the number of the bitmap block and
 * the free blocks count in the group.
 */

/*
 * To avoid calling the atomic setbit hundreds or thousands of times, we only
 * need to use it within a single byte (to ensure we get endianness right).
 * We can use memset for the rest of the bitmap as there are no other users.
 */
void ext4_mark_bitmap_end(int start_bit, int end_bit, char *bitmap)
{
	int i;

	if (start_bit >= end_bit)
		return;

	ext4_debug("mark end bits +%d through +%d used\n", start_bit, end_bit);
	for (i = start_bit; i < ((start_bit + 7) & ~7UL); i++)
		ext4_set_bit(i, bitmap);
	if (i < end_bit)
		memset(bitmap + (i >> 3), 0xff, (end_bit - i) >> 3);
}

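/*
 * Illustration only, not part of ialloc.c: a minimal userspace sketch of the
 * padding technique used by ext4_mark_bitmap_end(). Bits are set one at a
 * time up to the next byte boundary, then memset() fills the remaining whole
 * bytes. The names sketch_set_bit() and sketch_mark_bitmap_end() are
 * hypothetical stand-ins; the sketch assumes little-endian bit order within
 * each byte and an end_bit that is a multiple of 8, as is the case when
 * end_bit is a block size in bits.
 */

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for ext4_set_bit(): bit nr lives in byte nr / 8. */
static void sketch_set_bit(int nr, unsigned char *bitmap)
{
	bitmap[nr >> 3] |= (unsigned char)(1u << (nr & 7));
}

/* Mark bits [start_bit, end_bit) as used; end_bit must be byte aligned. */
static void sketch_mark_bitmap_end(int start_bit, int end_bit,
				   unsigned char *bitmap)
{
	int i;

	assert((end_bit & 7) == 0);
	if (start_bit >= end_bit)
		return;

	/* Bit-by-bit only until the next byte boundary is reached. */
	for (i = start_bit; i < ((start_bit + 7) & ~7); i++)
		sketch_set_bit(i, bitmap);
	/* The rest is whole bytes, so a single memset() is enough. */
	if (i < end_bit)
		memset(bitmap + (i >> 3), 0xff, (size_t)(end_bit - i) >> 3);
}
```

/*
 * For a 32-bit bitmap with 13 valid bits, sketch_mark_bitmap_end(13, 32, bm)
 * sets bits 13..15 individually and fills bytes 2 and 3 with 0xff.
 */
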
void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate)
{
	if (uptodate) {
		set_buffer_uptodate(bh);
		set_bitmap_uptodate(bh);
	}
	unlock_buffer(bh);
	put_bh(bh);
}

static int ext4_validate_inode_bitmap(struct super_block *sb,
				      struct ext4_group_desc *desc,
				      ext4_group_t block_group,
				      struct buffer_head *bh)
{
	ext4_fsblk_t blk;
	struct ext4_group_info *grp;

	if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
		return 0;

	grp = ext4_get_group_info(sb, block_group);

	if (buffer_verified(bh))
		return 0;
	if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
		return -EFSCORRUPTED;

	ext4_lock_group(sb, block_group);
	if (buffer_verified(bh))
		goto verified;
	blk = ext4_inode_bitmap(sb, desc);
	if (!ext4_inode_bitmap_csum_verify(sb, block_group, desc, bh,
					   EXT4_INODES_PER_GROUP(sb) / 8) ||
	    ext4_simulate_fail(sb, EXT4_SIM_IBITMAP_CRC)) {
		ext4_unlock_group(sb, block_group);
		ext4_error(sb, "Corrupt inode bitmap - block_group = %u, "
			   "inode_bitmap = %llu", block_group, blk);
		ext4_mark_group_bitmap_corrupted(sb, block_group,
					EXT4_GROUP_INFO_IBITMAP_CORRUPT);
		return -EFSBADCRC;
	}
	set_buffer_verified(bh);
verified:
	ext4_unlock_group(sb, block_group);
	return 0;
}

/*
 * Read the inode allocation bitmap for a given block_group, reading
 * into the specified slot in the superblock's bitmap cache.
 *
 * Return buffer_head of bitmap on success, or an ERR_PTR on error.
 */
static struct buffer_head *
ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
{
	struct ext4_group_desc *desc;
	struct ext4_sb_info *sbi = EXT4_SB(sb);
	struct buffer_head *bh = NULL;
	ext4_fsblk_t bitmap_blk;
	int err;

	desc = ext4_get_group_desc(sb, block_group, NULL);
	if (!desc)
		return ERR_PTR(-EFSCORRUPTED);

	bitmap_blk = ext4_inode_bitmap(sb, desc);
	if ((bitmap_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) ||
	    (bitmap_blk >= ext4_blocks_count(sbi->s_es))) {
		ext4_error(sb, "Invalid inode bitmap blk %llu in "
			   "block_group %u", bitmap_blk, block_group);
		ext4_mark_group_bitmap_corrupted(sb, block_group,
					EXT4_GROUP_INFO_IBITMAP_CORRUPT);
		return ERR_PTR(-EFSCORRUPTED);
	}
	bh = sb_getblk(sb, bitmap_blk);
	if (unlikely(!bh)) {
		ext4_warning(sb, "Cannot read inode bitmap - "
			     "block_group = %u, inode_bitmap = %llu",
			     block_group, bitmap_blk);
		return ERR_PTR(-ENOMEM);
	}
	if (bitmap_uptodate(bh))
		goto verify;

	lock_buffer(bh);
	if (bitmap_uptodate(bh)) {
		unlock_buffer(bh);
		goto verify;
	}

	ext4_lock_group(sb, block_group);
	if (ext4_has_group_desc_csum(sb) &&
	    (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT))) {
		if (block_group == 0) {
			ext4_unlock_group(sb, block_group);
			unlock_buffer(bh);
			ext4_error(sb, "Inode bitmap for bg 0 marked "
				   "uninitialized");
			err = -EFSCORRUPTED;
			goto out;
		}
		memset(bh->b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8);
		ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb),
				     sb->s_blocksize * 8, bh->b_data);
		set_bitmap_uptodate(bh);
		set_buffer_uptodate(bh);
		set_buffer_verified(bh);
		ext4_unlock_group(sb, block_group);
		unlock_buffer(bh);
		return bh;
	}
	ext4_unlock_group(sb, block_group);

	if (buffer_uptodate(bh)) {
		/*
		 * If the group is not uninit and bh is uptodate,
		 * the bitmap is also uptodate.
		 */
		set_bitmap_uptodate(bh);
		unlock_buffer(bh);
		goto verify;
	}
	/*
	 * submit the buffer_head for reading
	 */
	trace_ext4_load_inode_bitmap(sb, block_group);
	ext4_read_bh(bh, REQ_META | REQ_PRIO, ext4_end_bitmap_read);
	ext4_simulate_fail_bh(sb, bh, EXT4_SIM_IBITMAP_EIO);
	if (!buffer_uptodate(bh)) {
		put_bh(bh);
		ext4_error_err(sb, EIO, "Cannot read inode bitmap - "
			       "block_group = %u, inode_bitmap = %llu",
			       block_group, bitmap_blk);
		ext4_mark_group_bitmap_corrupted(sb, block_group,
				EXT4_GROUP_INFO_IBITMAP_CORRUPT);
		return ERR_PTR(-EIO);
	}

verify:
	err = ext4_validate_inode_bitmap(sb, desc, block_group, bh);
	if (err)
		goto out;
	return bh;
out:
	put_bh(bh);
	return ERR_PTR(err);
}

/*
 * NOTE! When we get the inode, we're the only people
 * that have access to it, and as such there are no
 * race conditions we have to worry about. The inode
 * is not on the hash-lists, and it cannot be reached
 * through the filesystem because the directory entry
 * has been deleted earlier.
 *
 * HOWEVER: we must make sure that we get no aliases,
 * which means that we have to call "clear_inode()"
 * _before_ we mark the inode not in use in the inode
 * bitmaps. Otherwise a newly created file might use
 * the same inode number (not actually the same pointer
 * though), and then we'd have two inodes sharing the
 * same inode number and space on the harddisk.
 */
void ext4_free_inode(handle_t *handle, struct inode *inode)
{
	struct super_block *sb = inode->i_sb;
	int is_directory;
	unsigned long ino;
	struct buffer_head *bitmap_bh = NULL;
	struct buffer_head *bh2;
	ext4_group_t block_group;
	unsigned long bit;
	struct ext4_group_desc *gdp;
	struct ext4_super_block *es;
	struct ext4_sb_info *sbi;
	int fatal = 0, err, count, cleared;
	struct ext4_group_info *grp;

	if (!sb) {
		printk(KERN_ERR "EXT4-fs: %s:%d: inode on "
		       "nonexistent device\n", __func__, __LINE__);
		return;
	}
	if (atomic_read(&inode->i_count) > 1) {
		ext4_msg(sb, KERN_ERR, "%s:%d: inode #%lu: count=%d",
			 __func__, __LINE__, inode->i_ino,
			 atomic_read(&inode->i_count));
		return;
	}
	if (inode->i_nlink) {
		ext4_msg(sb, KERN_ERR, "%s:%d: inode #%lu: nlink=%d\n",
			 __func__, __LINE__, inode->i_ino, inode->i_nlink);
		return;
	}
	sbi = EXT4_SB(sb);

	ino = inode->i_ino;
	ext4_debug("freeing inode %lu\n", ino);
	trace_ext4_free_inode(inode);

	dquot_initialize(inode);
	dquot_free_inode(inode);

	is_directory = S_ISDIR(inode->i_mode);

	/* Do this BEFORE marking the inode not in use or returning an error */
	ext4_clear_inode(inode);

	es = sbi->s_es;
	if (ino < EXT4_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
		ext4_error(sb, "reserved or nonexistent inode %lu", ino);
		goto error_return;
	}
	block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
	bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
	bitmap_bh = ext4_read_inode_bitmap(sb, block_group);
	/* Don't bother if the inode bitmap is corrupt. */
	if (IS_ERR(bitmap_bh)) {
		fatal = PTR_ERR(bitmap_bh);
		bitmap_bh = NULL;
		goto error_return;
	}
	if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
		grp = ext4_get_group_info(sb, block_group);
		if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
			fatal = -EFSCORRUPTED;
			goto error_return;
		}
	}

	BUFFER_TRACE(bitmap_bh, "get_write_access");
	fatal = ext4_journal_get_write_access(handle, sb, bitmap_bh,
					      EXT4_JTR_NONE);
	if (fatal)
		goto error_return;

	fatal = -ESRCH;
	gdp = ext4_get_group_desc(sb, block_group, &bh2);
	if (gdp) {
		BUFFER_TRACE(bh2, "get_write_access");
		fatal = ext4_journal_get_write_access(handle, sb, bh2,
						      EXT4_JTR_NONE);
	}
	ext4_lock_group(sb, block_group);
	cleared = ext4_test_and_clear_bit(bit, bitmap_bh->b_data);
	if (fatal || !cleared) {
		ext4_unlock_group(sb, block_group);
		goto out;
	}

	count = ext4_free_inodes_count(sb, gdp) + 1;
	ext4_free_inodes_set(sb, gdp, count);
	if (is_directory) {
		count = ext4_used_dirs_count(sb, gdp) - 1;
		ext4_used_dirs_set(sb, gdp, count);
		if (percpu_counter_initialized(&sbi->s_dirs_counter))
			percpu_counter_dec(&sbi->s_dirs_counter);
	}
	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
				   EXT4_INODES_PER_GROUP(sb) / 8);
	ext4_group_desc_csum_set(sb, block_group, gdp);
	ext4_unlock_group(sb, block_group);

	if (percpu_counter_initialized(&sbi->s_freeinodes_counter))
		percpu_counter_inc(&sbi->s_freeinodes_counter);
	if (sbi->s_log_groups_per_flex) {
		struct flex_groups *fg;

		fg = sbi_array_rcu_deref(sbi, s_flex_groups,
					 ext4_flex_group(sbi, block_group));
		atomic_inc(&fg->free_inodes);
		if (is_directory)
			atomic_dec(&fg->used_dirs);
	}
	BUFFER_TRACE(bh2, "call ext4_handle_dirty_metadata");
	fatal = ext4_handle_dirty_metadata(handle, NULL, bh2);
out:
	if (cleared) {
		BUFFER_TRACE(bitmap_bh, "call ext4_handle_dirty_metadata");
		err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
		if (!fatal)
			fatal = err;
	} else {
		ext4_error(sb, "bit already cleared for inode %lu", ino);
		ext4_mark_group_bitmap_corrupted(sb, block_group,
					EXT4_GROUP_INFO_IBITMAP_CORRUPT);
	}

error_return:
	brelse(bitmap_bh);
	ext4_std_error(sb, fatal);
}

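/*
 * Illustration only, not part of ialloc.c: a minimal userspace sketch of how
 * ext4_free_inode() maps an inode number to a block group and a bit index in
 * that group's inode bitmap. Inode numbers start at 1, hence the "ino - 1".
 * struct sketch_ino_pos and sketch_ino_to_pos() are hypothetical names, and
 * first_ino, inodes_count and inodes_per_group are plain parameters standing
 * in for EXT4_FIRST_INO(sb), s_inodes_count and EXT4_INODES_PER_GROUP(sb).
 */

```c
#include <assert.h>

struct sketch_ino_pos {
	unsigned long group;	/* block group holding the inode */
	unsigned long bit;	/* bit index inside that group's inode bitmap */
};

/* Returns 0 and fills *pos, or -1 for a reserved or nonexistent inode. */
static int sketch_ino_to_pos(unsigned long ino, unsigned long first_ino,
			     unsigned long inodes_count,
			     unsigned long inodes_per_group,
			     struct sketch_ino_pos *pos)
{
	assert(inodes_per_group > 0);
	if (ino < first_ino || ino > inodes_count)
		return -1;

	pos->group = (ino - 1) / inodes_per_group;
	pos->bit = (ino - 1) % inodes_per_group;
	return 0;
}
```

/*
 * With 8192 inodes per group, inode 8193 is the first inode of group 1
 * (bit 0), while inode 8192 is the last inode of group 0 (bit 8191).
 */
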
struct orlov_stats {
	__u64 free_clusters;
	__u32 free_inodes;
	__u32 used_dirs;
};

/*
 * Helper function for Orlov's allocator; returns critical information
 * for a particular block group or flex_bg. If flex_size is 1, then g
 * is a block group number; otherwise it is flex_bg number.
 */
static void get_orlov_stats(struct super_block *sb, ext4_group_t g,
			    int flex_size, struct orlov_stats *stats)
{
	struct ext4_group_desc *desc;

	if (flex_size > 1) {
		struct flex_groups *fg = sbi_array_rcu_deref(EXT4_SB(sb),
							     s_flex_groups, g);
		stats->free_inodes = atomic_read(&fg->free_inodes);
		stats->free_clusters = atomic64_read(&fg->free_clusters);
		stats->used_dirs = atomic_read(&fg->used_dirs);
		return;
	}

	desc = ext4_get_group_desc(sb, g, NULL);
	if (desc) {
		stats->free_inodes = ext4_free_inodes_count(sb, desc);
		stats->free_clusters = ext4_free_group_clusters(sb, desc);
		stats->used_dirs = ext4_used_dirs_count(sb, desc);
	} else {
		stats->free_inodes = 0;
		stats->free_clusters = 0;
		stats->used_dirs = 0;
	}
}

/*
 * Orlov's allocator for directories.
 *
 * We always try to spread first-level directories.
 *
 * If there are blockgroups with both free inodes and free clusters counts
 * not worse than average we return one with smallest directory count.
 * Otherwise we simply return a random group.
 *
 * For the rest, the rules are as follows:
 *
 * It's OK to put a directory into a group unless
 * it has too many directories already (max_dirs) or
 * it has too few free inodes left (min_inodes) or
 * it has too few free clusters left (min_clusters).
 * The parent's group is preferred; if it doesn't satisfy these
 * conditions we search cyclically through the rest. If none
 * of the groups look good we just look for a group with more
 * free inodes than average (starting at parent's group).
 */

static int find_group_orlov(struct super_block *sb, struct inode *parent,
			    ext4_group_t *group, umode_t mode,
			    const struct qstr *qstr)
{
	ext4_group_t parent_group = EXT4_I(parent)->i_block_group;
	struct ext4_sb_info *sbi = EXT4_SB(sb);
	ext4_group_t real_ngroups = ext4_get_groups_count(sb);
	int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
	unsigned int freei, avefreei, grp_free;
	ext4_fsblk_t freec, avefreec;
	unsigned int ndirs;
	int max_dirs, min_inodes;
	ext4_grpblk_t min_clusters;
	ext4_group_t i, grp, g, ngroups;
	struct ext4_group_desc *desc;
	struct orlov_stats stats;
	int flex_size = ext4_flex_bg_size(sbi);
	struct dx_hash_info hinfo;

	ngroups = real_ngroups;
	if (flex_size > 1) {
		ngroups = (real_ngroups + flex_size - 1) >>
			sbi->s_log_groups_per_flex;
		parent_group >>= sbi->s_log_groups_per_flex;
	}

	freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
	avefreei = freei / ngroups;
	freec = percpu_counter_read_positive(&sbi->s_freeclusters_counter);
	avefreec = freec;
	do_div(avefreec, ngroups);
	ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);

	if (S_ISDIR(mode) &&
	    ((parent == d_inode(sb->s_root)) ||
	     (ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
		int best_ndir = inodes_per_group;
		int ret = -1;

		if (qstr) {
			hinfo.hash_version = DX_HASH_HALF_MD4;
			hinfo.seed = sbi->s_hash_seed;
			ext4fs_dirhash(parent, qstr->name, qstr->len, &hinfo);
			parent_group = hinfo.hash % ngroups;
		} else
			parent_group = get_random_u32_below(ngroups);
		for (i = 0; i < ngroups; i++) {
			g = (parent_group + i) % ngroups;
			get_orlov_stats(sb, g, flex_size, &stats);
			if (!stats.free_inodes)
				continue;
			if (stats.used_dirs >= best_ndir)
				continue;
			if (stats.free_inodes < avefreei)
				continue;
			if (stats.free_clusters < avefreec)
				continue;
			grp = g;
			ret = 0;
			best_ndir = stats.used_dirs;
		}
		if (ret)
			goto fallback;
	found_flex_bg:
		if (flex_size == 1) {
			*group = grp;
			return 0;
		}

		/*
		 * We pack inodes at the beginning of the flexgroup's
		 * inode tables. Block allocation decisions will do
		 * something similar, although regular files will
		 * start at 2nd block group of the flexgroup. See
		 * ext4_ext_find_goal() and ext4_find_near().
		 */
		grp *= flex_size;
		for (i = 0; i < flex_size; i++) {
			if (grp+i >= real_ngroups)
				break;
			desc = ext4_get_group_desc(sb, grp+i, NULL);
			if (desc && ext4_free_inodes_count(sb, desc)) {
				*group = grp+i;
				return 0;
			}
		}
		goto fallback;
	}

	max_dirs = ndirs / ngroups + inodes_per_group*flex_size / 16;
	min_inodes = avefreei - inodes_per_group*flex_size / 4;
	if (min_inodes < 1)
		min_inodes = 1;
	min_clusters = avefreec - EXT4_CLUSTERS_PER_GROUP(sb)*flex_size / 4;

	/*
	 * Start looking in the flex group where we last allocated an
	 * inode for this parent directory
	 */
	if (EXT4_I(parent)->i_last_alloc_group != ~0) {
		parent_group = EXT4_I(parent)->i_last_alloc_group;
		if (flex_size > 1)
			parent_group >>= sbi->s_log_groups_per_flex;
	}

	for (i = 0; i < ngroups; i++) {
		grp = (parent_group + i) % ngroups;
		get_orlov_stats(sb, grp, flex_size, &stats);
		if (stats.used_dirs >= max_dirs)
			continue;
		if (stats.free_inodes < min_inodes)
			continue;
		if (stats.free_clusters < min_clusters)
			continue;
		goto found_flex_bg;
	}

fallback:
	ngroups = real_ngroups;
	avefreei = freei / ngroups;
fallback_retry:
	parent_group = EXT4_I(parent)->i_block_group;
	for (i = 0; i < ngroups; i++) {
		grp = (parent_group + i) % ngroups;
		desc = ext4_get_group_desc(sb, grp, NULL);
		if (desc) {
			grp_free = ext4_free_inodes_count(sb, desc);
			if (grp_free && grp_free >= avefreei) {
				*group = grp;
				return 0;
			}
		}
	}

	if (avefreei) {
		/*
		 * The free-inodes counter is approximate, and for really small
		 * filesystems the above test can fail to find any blockgroups
		 */
		avefreei = 0;
		goto fallback_retry;
	}

	return -1;
}

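/*
 * Illustration only, not part of ialloc.c: a minimal userspace sketch of the
 * three thresholds that find_group_orlov() applies to non-top-level
 * directories (max_dirs, min_inodes, min_clusters) and of the acceptance test
 * built on them. struct sketch_orlov_limits, sketch_orlov_limits() and
 * sketch_group_ok() are hypothetical names. The sketch uses long long
 * throughout for simplicity, whereas the kernel uses int and ext4_grpblk_t,
 * so overflow and signedness behaviour is not modelled.
 */

```c
#include <assert.h>

struct sketch_orlov_limits {
	long long max_dirs;	/* reject groups with this many dirs or more */
	long long min_inodes;	/* reject groups with fewer free inodes */
	long long min_clusters;	/* reject groups with fewer free clusters */
};

static struct sketch_orlov_limits
sketch_orlov_limits(long long ndirs, long long ngroups,
		    long long avefreei, long long avefreec,
		    long long inodes_per_group, long long clusters_per_group,
		    long long flex_size)
{
	struct sketch_orlov_limits lim;

	assert(ngroups > 0);
	lim.max_dirs = ndirs / ngroups + inodes_per_group * flex_size / 16;
	lim.min_inodes = avefreei - inodes_per_group * flex_size / 4;
	if (lim.min_inodes < 1)
		lim.min_inodes = 1;	/* always require one free inode */
	lim.min_clusters = avefreec - clusters_per_group * flex_size / 4;
	return lim;
}

/* Returns 1 if a group with these statistics passes all three limits. */
static int sketch_group_ok(const struct sketch_orlov_limits *lim,
			   long long used_dirs, long long free_inodes,
			   long long free_clusters)
{
	if (used_dirs >= lim->max_dirs)
		return 0;
	if (free_inodes < lim->min_inodes)
		return 0;
	if (free_clusters < lim->min_clusters)
		return 0;
	return 1;
}
```

/*
 * Example: with 80 directories over 8 groups, 1024 inodes and 32768 clusters
 * per group, flex_size 1, and averages of 900 free inodes and 20000 free
 * clusters, the limits are max_dirs = 74, min_inodes = 644 and
 * min_clusters = 11808.
 */
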
static int find_group_other(struct super_block *sb, struct inode *parent,
			    ext4_group_t *group, umode_t mode)
{
	ext4_group_t parent_group = EXT4_I(parent)->i_block_group;
	ext4_group_t i, last, ngroups = ext4_get_groups_count(sb);
	struct ext4_group_desc *desc;
	int flex_size = ext4_flex_bg_size(EXT4_SB(sb));

	/*
	 * Try to place the inode in the same flex group as its
	 * parent. If we can't find space, use the Orlov algorithm to
	 * find another flex group, and store that information in the
	 * parent directory's inode information so that we use that flex
	 * group for future allocations.
	 */
	if (flex_size > 1) {
		int retry = 0;

	try_again:
		parent_group &= ~(flex_size-1);
		last = parent_group + flex_size;
		if (last > ngroups)
			last = ngroups;
		for (i = parent_group; i < last; i++) {
			desc = ext4_get_group_desc(sb, i, NULL);
			if (desc && ext4_free_inodes_count(sb, desc)) {
				*group = i;
				return 0;
			}
		}
		if (!retry && EXT4_I(parent)->i_last_alloc_group != ~0) {
			retry = 1;
			parent_group = EXT4_I(parent)->i_last_alloc_group;
			goto try_again;
		}
		/*
		 * If this didn't work, use the Orlov search algorithm
		 * to find a new flex group; we pass in the mode to
		 * avoid the topdir algorithms.
		 */
		*group = parent_group + flex_size;
		if (*group > ngroups)
			*group = 0;
		return find_group_orlov(sb, parent, group, mode, NULL);
	}

	/*
	 * Try to place the inode in its parent directory
	 */
	*group = parent_group;
	desc = ext4_get_group_desc(sb, *group, NULL);
	if (desc && ext4_free_inodes_count(sb, desc) &&
	    ext4_free_group_clusters(sb, desc))
		return 0;

	/*
	 * We're going to place this inode in a different blockgroup from its
	 * parent. We want to cause files in a common directory to all land in
	 * the same blockgroup. But we want files which are in a different
	 * directory which shares a blockgroup with our parent to land in a
	 * different blockgroup.
	 *
	 * So add our directory's i_ino into the starting point for the hash.
	 */
	*group = (*group + parent->i_ino) % ngroups;

	/*
	 * Use a quadratic hash to find a group with a free inode and some free
	 * blocks.
	 */
	for (i = 1; i < ngroups; i <<= 1) {
		*group += i;
		if (*group >= ngroups)
			*group -= ngroups;
		desc = ext4_get_group_desc(sb, *group, NULL);
		if (desc && ext4_free_inodes_count(sb, desc) &&
		    ext4_free_group_clusters(sb, desc))
			return 0;
	}

	/*
	 * That failed: try linear search for a free inode, even if that group
	 * has no free blocks.
	 */
	*group = parent_group;
	for (i = 0; i < ngroups; i++) {
		if (++*group >= ngroups)
			*group = 0;
		desc = ext4_get_group_desc(sb, *group, NULL);
		if (desc && ext4_free_inodes_count(sb, desc))
			return 0;
	}

	return -1;
}

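/*
 * Illustration only, not part of ialloc.c: a minimal userspace sketch of the
 * probe order used by the hash step in find_group_other(). The start point is
 * the parent's group plus the parent's inode number, and the stride doubles
 * on every step (1, 2, 4, ...), wrapping around at ngroups.
 * sketch_probe_sequence() is a hypothetical helper that only records which
 * groups would be examined; it does not check for free inodes or clusters.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Writes the probed group numbers to out[] and returns how many there are. */
static size_t sketch_probe_sequence(unsigned long parent_group,
				    unsigned long parent_ino,
				    unsigned long ngroups,
				    unsigned long *out, size_t out_len)
{
	unsigned long group, i;
	size_t n = 0;

	assert(ngroups > 0);
	group = (parent_group + parent_ino) % ngroups;

	for (i = 1; i < ngroups && n < out_len; i <<= 1) {
		group += i;
		/* group and i are both below ngroups, so one wrap suffices. */
		if (group >= ngroups)
			group -= ngroups;
		out[n++] = group;
	}
	return n;
}
```

/*
 * With 16 groups, parent group 3 and parent inode 5, the start point is 8 and
 * the probed groups are 9, 11, 15 and 7. Only about log2(ngroups) groups are
 * visited, which is why the function falls back to a linear scan afterwards.
 */
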
/*
 * In no journal mode, if an inode has recently been deleted, we want
 * to avoid reusing it until we're reasonably sure the inode table
 * block has been written back to disk. (Yes, these values are
 * somewhat arbitrary...)
 */
#define RECENTCY_MIN	60
#define RECENTCY_DIRTY	300

static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
{
	struct ext4_group_desc *gdp;
	struct ext4_inode *raw_inode;
	struct buffer_head *bh;
	int inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
	int offset, ret = 0;
	int recentcy = RECENTCY_MIN;
	u32 dtime, now;

	gdp = ext4_get_group_desc(sb, group, NULL);
	if (unlikely(!gdp))
		return 0;

	bh = sb_find_get_block(sb, ext4_inode_table(sb, gdp) +
		       (ino / inodes_per_block));
	if (!bh || !buffer_uptodate(bh))
		/*
		 * If the block is not in the buffer cache, then it
		 * must have been written out.
		 */
		goto out;

	offset = (ino % inodes_per_block) * EXT4_INODE_SIZE(sb);
	raw_inode = (struct ext4_inode *) (bh->b_data + offset);

	/* i_dtime is only 32 bits on disk, but we only care about relative
	 * times in the range of a few minutes (i.e. long enough to sync a
	 * recently-deleted inode to disk), so using the low 32 bits of the
	 * clock (a 68 year range) is enough, see time_before32() */
	dtime = le32_to_cpu(raw_inode->i_dtime);
	now = ktime_get_real_seconds();
	if (buffer_dirty(bh))
		recentcy += RECENTCY_DIRTY;

	if (dtime && time_before32(dtime, now) &&
	    time_before32(now, dtime + recentcy))
		ret = 1;
out:
	brelse(bh);
	return ret;
}

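/*
 * Illustration only, not part of ialloc.c: a minimal userspace sketch of the
 * wraparound-safe window test at the end of recently_deleted(). The helper
 * sketch_time_before32() is a hypothetical stand-in that assumes the usual
 * signed-difference idiom for 32-bit timestamp comparison, and it relies on
 * the conversion of an out-of-range value to int32_t wrapping, which is
 * implementation-defined in C but holds on mainstream compilers. The 60 and
 * 300 second constants mirror RECENTCY_MIN and RECENTCY_DIRTY.
 */

```c
#include <assert.h>
#include <stdint.h>

/* True if a is before b, treating both as points on a 32-bit clock. */
static int sketch_time_before32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* True if dtime falls inside the "recently deleted" window ending at now. */
static int sketch_recently_deleted(uint32_t dtime, uint32_t now,
				   int buffer_is_dirty)
{
	uint32_t recentcy = 60;		/* RECENTCY_MIN */

	if (buffer_is_dirty)
		recentcy += 300;	/* RECENTCY_DIRTY */

	/* dtime == 0 means the inode was never marked deleted. */
	return dtime && sketch_time_before32(dtime, now) &&
	       sketch_time_before32(now, dtime + recentcy);
}
```

/*
 * An inode deleted 30 seconds ago counts as recent; one deleted 70 seconds
 * ago only counts as recent while its inode table block is still dirty. The
 * comparison keeps working when the 32-bit clock wraps between dtime and now.
 */
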
static int find_inode_bit(struct super_block *sb, ext4_group_t group,
			  struct buffer_head *bitmap, unsigned long *ino)
{
	bool check_recently_deleted = EXT4_SB(sb)->s_journal == NULL;
	unsigned long recently_deleted_ino = EXT4_INODES_PER_GROUP(sb);

next:
	*ino = ext4_find_next_zero_bit((unsigned long *)
				       bitmap->b_data,
				       EXT4_INODES_PER_GROUP(sb), *ino);
	if (*ino >= EXT4_INODES_PER_GROUP(sb))
		goto not_found;

	if (check_recently_deleted && recently_deleted(sb, group, *ino)) {
		recently_deleted_ino = *ino;
		*ino = *ino + 1;
		if (*ino < EXT4_INODES_PER_GROUP(sb))
			goto next;
		goto not_found;
	}
	return 1;
not_found:
	if (recently_deleted_ino >= EXT4_INODES_PER_GROUP(sb))
		return 0;
	/*
	 * Not reusing recently deleted inodes is mostly a preference. We don't
	 * want to report ENOSPC or skew allocation patterns because of that.
	 * So return even a recently deleted inode if we couldn't find a
	 * better one in the given range.
	 */
	*ino = recently_deleted_ino;
	return 1;
}

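/*
 * Illustration only, not part of ialloc.c: a minimal userspace sketch of the
 * search policy in find_inode_bit(). It scans for a free slot, skips slots
 * flagged as recently deleted, and falls back to the last skipped slot if
 * nothing better exists. sketch_find_inode_bit() is a hypothetical name, and
 * the sketch assumes one byte per slot in used[] and recent[] instead of a
 * real bitmap and the recently_deleted() lookup. Unlike the kernel function,
 * it leaves *ino untouched when nothing is found.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 and stores the chosen slot in *ino, or 0 if the range is full. */
static int sketch_find_inode_bit(const unsigned char *used,
				 const unsigned char *recent,
				 unsigned long nslots, unsigned long *ino)
{
	unsigned long fallback = nslots;	/* nslots means "none seen" */
	unsigned long i;

	for (i = *ino; i < nslots; i++) {
		if (used[i])
			continue;
		if (recent[i]) {
			/* Free but recently deleted: remember, keep going. */
			fallback = i;
			continue;
		}
		*ino = i;
		return 1;
	}

	if (fallback >= nslots)
		return 0;
	/* Avoiding recent slots is a preference, not a hard rule. */
	*ino = fallback;
	return 1;
}
```

/*
 * The fallback keeps allocation from failing just because every free slot in
 * the range happens to have been deleted recently.
 */
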
int ext4_mark_inode_used(struct super_block *sb, int ino)
|
|
{
|
|
unsigned long max_ino = le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count);
|
|
struct buffer_head *inode_bitmap_bh = NULL, *group_desc_bh = NULL;
|
|
struct ext4_group_desc *gdp;
|
|
ext4_group_t group;
|
|
int bit;
|
|
int err = -EFSCORRUPTED;
|
|
|
|
if (ino < EXT4_FIRST_INO(sb) || ino > max_ino)
|
|
goto out;
|
|
|
|
group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
|
|
bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
|
|
inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
|
|
if (IS_ERR(inode_bitmap_bh))
|
|
return PTR_ERR(inode_bitmap_bh);
|
|
|
|
if (ext4_test_bit(bit, inode_bitmap_bh->b_data)) {
|
|
err = 0;
|
|
goto out;
|
|
}
|
|
|
|
gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
|
|
if (!gdp || !group_desc_bh) {
|
|
err = -EINVAL;
|
|
goto out;
|
|
}
|
|
|
|
ext4_set_bit(bit, inode_bitmap_bh->b_data);
|
|
|
|
BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
|
|
err = ext4_handle_dirty_metadata(NULL, NULL, inode_bitmap_bh);
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
err = sync_dirty_buffer(inode_bitmap_bh);
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
|
|
/* We may have to initialize the block bitmap if it isn't already */
|
|
if (ext4_has_group_desc_csum(sb) &&
|
|
gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
|
|
struct buffer_head *block_bitmap_bh;
|
|
|
|
block_bitmap_bh = ext4_read_block_bitmap(sb, group);
|
|
if (IS_ERR(block_bitmap_bh)) {
|
|
err = PTR_ERR(block_bitmap_bh);
|
|
goto out;
|
|
}
|
|
|
|
BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
|
|
err = ext4_handle_dirty_metadata(NULL, NULL, block_bitmap_bh);
|
|
sync_dirty_buffer(block_bitmap_bh);
|
|
|
|
/* recheck and clear flag under lock if we still need to */
|
|
ext4_lock_group(sb, group);
|
|
if (ext4_has_group_desc_csum(sb) &&
|
|
(gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
|
|
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
|
|
ext4_free_group_clusters_set(sb, gdp,
|
|
ext4_free_clusters_after_init(sb, group, gdp));
|
|
ext4_block_bitmap_csum_set(sb, group, gdp,
|
|
block_bitmap_bh);
|
|
ext4_group_desc_csum_set(sb, group, gdp);
|
|
}
|
|
ext4_unlock_group(sb, group);
|
|
brelse(block_bitmap_bh);
|
|
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
}
|
|
|
|
/* Update the relevant bg descriptor fields */
|
|
if (ext4_has_group_desc_csum(sb)) {
|
|
int free;
|
|
|
|
ext4_lock_group(sb, group); /* while we modify the bg desc */
|
|
free = EXT4_INODES_PER_GROUP(sb) -
|
|
ext4_itable_unused_count(sb, gdp);
|
|
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
|
|
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
|
|
free = 0;
|
|
}
|
|
|
|
/*
* Check the relative inode number against the last used
* relative inode number in this group. If it is greater,
* we need to update the bg_itable_unused count.
*/
|
|
if (bit >= free)
|
|
ext4_itable_unused_set(sb, gdp,
|
|
(EXT4_INODES_PER_GROUP(sb) - bit - 1));
|
|
} else {
|
|
ext4_lock_group(sb, group);
|
|
}
|
|
|
|
ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
|
|
if (ext4_has_group_desc_csum(sb)) {
|
|
ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
|
|
EXT4_INODES_PER_GROUP(sb) / 8);
|
|
ext4_group_desc_csum_set(sb, group, gdp);
|
|
}
|
|
|
|
ext4_unlock_group(sb, group);
|
|
err = ext4_handle_dirty_metadata(NULL, NULL, group_desc_bh);
|
|
sync_dirty_buffer(group_desc_bh);
|
|
out:
|
|
return err;
|
|
}
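/*
 * Illustration only, not part of the kernel source: a small sketch of the
 * inode-number arithmetic used above. Inode numbers are 1-based, while the
 * per-group bitmap is 0-based, hence the "ino - 1" when splitting and the
 * "+ 1" when recombining. The names demo_ino_pos, demo_ino_to_pos and
 * demo_pos_to_ino are hypothetical; callers are assumed to pass ino >= 1
 * and a non-zero inodes_per_group.
 */

```c
#include <stdint.h>

/* Position of an inode inside the filesystem: group index and bit index. */
struct demo_ino_pos {
	uint32_t group; /* block group that owns the inode */
	uint32_t bit;   /* zero-based bit in that group's inode bitmap */
};

/* Split a 1-based inode number into (group, bit). */
struct demo_ino_pos demo_ino_to_pos(uint32_t ino, uint32_t inodes_per_group)
{
	struct demo_ino_pos pos = {
		.group = (ino - 1) / inodes_per_group,
		.bit = (ino - 1) % inodes_per_group,
	};

	return pos;
}

/* Inverse mapping: rebuild the 1-based inode number from (group, bit). */
uint32_t demo_pos_to_ino(struct demo_ino_pos pos, uint32_t inodes_per_group)
{
	return pos.group * inodes_per_group + pos.bit + 1;
}
```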
|
|
|
|
static int ext4_xattr_credits_for_new_inode(struct inode *dir, mode_t mode,
|
|
bool encrypt)
|
|
{
|
|
struct super_block *sb = dir->i_sb;
|
|
int nblocks = 0;
|
|
#ifdef CONFIG_EXT4_FS_POSIX_ACL
|
|
struct posix_acl *p = get_inode_acl(dir, ACL_TYPE_DEFAULT);
|
|
|
|
if (IS_ERR(p))
|
|
return PTR_ERR(p);
|
|
if (p) {
|
|
int acl_size = p->a_count * sizeof(ext4_acl_entry);
|
|
|
|
nblocks += (S_ISDIR(mode) ? 2 : 1) *
|
|
__ext4_xattr_set_credits(sb, NULL /* inode */,
|
|
NULL /* block_bh */, acl_size,
|
|
true /* is_create */);
|
|
posix_acl_release(p);
|
|
}
|
|
#endif
|
|
|
|
#ifdef CONFIG_SECURITY
|
|
{
|
|
int num_security_xattrs = 1;
|
|
|
|
#ifdef CONFIG_INTEGRITY
|
|
num_security_xattrs++;
|
|
#endif
|
|
/*
|
|
* We assume that security xattrs are never more than 1k.
|
|
* In practice they are under 128 bytes.
|
|
*/
|
|
nblocks += num_security_xattrs *
|
|
__ext4_xattr_set_credits(sb, NULL /* inode */,
|
|
NULL /* block_bh */, 1024,
|
|
true /* is_create */);
|
|
}
|
|
#endif
|
|
if (encrypt)
|
|
nblocks += __ext4_xattr_set_credits(sb,
|
|
NULL /* inode */,
|
|
NULL /* block_bh */,
|
|
FSCRYPT_SET_CONTEXT_MAX_SIZE,
|
|
true /* is_create */);
|
|
return nblocks;
|
|
}
|
|
|
|
/*
|
|
* There are two policies for allocating an inode. If the new inode is
|
|
* a directory, then a forward search is made for a block group with both
|
|
* free space and a low directory-to-inode ratio; if that fails, then of
|
|
* the groups with above-average free space, that group with the fewest
|
|
* directories already is chosen.
|
|
*
|
|
* For other inodes, search forward from the parent directory's block
|
|
* group to find a free inode.
|
|
*/
|
|
struct inode *__ext4_new_inode(struct user_namespace *mnt_userns,
|
|
handle_t *handle, struct inode *dir,
|
|
umode_t mode, const struct qstr *qstr,
|
|
__u32 goal, uid_t *owner, __u32 i_flags,
|
|
int handle_type, unsigned int line_no,
|
|
int nblocks)
|
|
{
|
|
struct super_block *sb;
|
|
struct buffer_head *inode_bitmap_bh = NULL;
|
|
struct buffer_head *group_desc_bh;
|
|
ext4_group_t ngroups, group = 0;
|
|
unsigned long ino = 0;
|
|
struct inode *inode;
|
|
struct ext4_group_desc *gdp = NULL;
|
|
struct ext4_inode_info *ei;
|
|
struct ext4_sb_info *sbi;
|
|
int ret2, err;
|
|
struct inode *ret;
|
|
ext4_group_t i;
|
|
ext4_group_t flex_group;
|
|
struct ext4_group_info *grp = NULL;
|
|
bool encrypt = false;
|
|
|
|
/* Cannot create files in a deleted directory */
|
|
if (!dir || !dir->i_nlink)
|
|
return ERR_PTR(-EPERM);
|
|
|
|
sb = dir->i_sb;
|
|
sbi = EXT4_SB(sb);
|
|
|
|
if (unlikely(ext4_forced_shutdown(sbi)))
|
|
return ERR_PTR(-EIO);
|
|
|
|
ngroups = ext4_get_groups_count(sb);
|
|
trace_ext4_request_inode(dir, mode);
|
|
inode = new_inode(sb);
|
|
if (!inode)
|
|
return ERR_PTR(-ENOMEM);
|
|
ei = EXT4_I(inode);
|
|
|
|
/*
|
|
* Initialize owners and quota early so that we don't have to account
|
|
* for quota initialization worst case in standard inode creating
|
|
* transaction
|
|
*/
|
|
if (owner) {
|
|
inode->i_mode = mode;
|
|
i_uid_write(inode, owner[0]);
|
|
i_gid_write(inode, owner[1]);
|
|
} else if (test_opt(sb, GRPID)) {
|
|
inode->i_mode = mode;
|
|
inode_fsuid_set(inode, mnt_userns);
|
|
inode->i_gid = dir->i_gid;
|
|
} else
|
|
inode_init_owner(mnt_userns, inode, dir, mode);
|
|
|
|
if (ext4_has_feature_project(sb) &&
|
|
ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT))
|
|
ei->i_projid = EXT4_I(dir)->i_projid;
|
|
else
|
|
ei->i_projid = make_kprojid(&init_user_ns, EXT4_DEF_PROJID);
|
|
|
|
if (!(i_flags & EXT4_EA_INODE_FL)) {
|
|
err = fscrypt_prepare_new_inode(dir, inode, &encrypt);
|
|
if (err)
|
|
goto out;
|
|
}
|
|
|
|
err = dquot_initialize(inode);
|
|
if (err)
|
|
goto out;
|
|
|
|
if (!handle && sbi->s_journal && !(i_flags & EXT4_EA_INODE_FL)) {
|
|
ret2 = ext4_xattr_credits_for_new_inode(dir, mode, encrypt);
|
|
if (ret2 < 0) {
|
|
err = ret2;
|
|
goto out;
|
|
}
|
|
nblocks += ret2;
|
|
}
|
|
|
|
if (!goal)
|
|
goal = sbi->s_inode_goal;
|
|
|
|
if (goal && goal <= le32_to_cpu(sbi->s_es->s_inodes_count)) {
|
|
group = (goal - 1) / EXT4_INODES_PER_GROUP(sb);
|
|
ino = (goal - 1) % EXT4_INODES_PER_GROUP(sb);
|
|
ret2 = 0;
|
|
goto got_group;
|
|
}
|
|
|
|
if (S_ISDIR(mode))
|
|
ret2 = find_group_orlov(sb, dir, &group, mode, qstr);
|
|
else
|
|
ret2 = find_group_other(sb, dir, &group, mode);
|
|
|
|
got_group:
|
|
EXT4_I(dir)->i_last_alloc_group = group;
|
|
err = -ENOSPC;
|
|
if (ret2 == -1)
|
|
goto out;
|
|
|
|
/*
|
|
* Normally we will only go through one pass of this loop,
|
|
* unless we get unlucky and it turns out the group we selected
|
|
* had its last inode grabbed by someone else.
|
|
*/
|
|
for (i = 0; i < ngroups; i++, ino = 0) {
|
|
err = -EIO;
|
|
|
|
gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
|
|
if (!gdp)
|
|
goto out;
|
|
|
|
/*
|
|
* Check free inodes count before loading bitmap.
|
|
*/
|
|
if (ext4_free_inodes_count(sb, gdp) == 0)
|
|
goto next_group;
|
|
|
|
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
|
|
grp = ext4_get_group_info(sb, group);
|
|
/*
|
|
* Skip groups with already-known suspicious inode
|
|
* tables
|
|
*/
|
|
if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
|
|
goto next_group;
|
|
}
|
|
|
|
brelse(inode_bitmap_bh);
|
|
inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
|
|
/* Skip groups with suspicious inode tables */
|
|
if (((!(sbi->s_mount_state & EXT4_FC_REPLAY))
|
|
&& EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) ||
|
|
IS_ERR(inode_bitmap_bh)) {
|
|
inode_bitmap_bh = NULL;
|
|
goto next_group;
|
|
}
|
|
|
|
repeat_in_this_group:
|
|
ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
|
|
if (!ret2)
|
|
goto next_group;
|
|
|
|
if (group == 0 && (ino + 1) < EXT4_FIRST_INO(sb)) {
|
|
ext4_error(sb, "reserved inode found cleared - "
|
|
"inode=%lu", ino + 1);
|
|
ext4_mark_group_bitmap_corrupted(sb, group,
|
|
EXT4_GROUP_INFO_IBITMAP_CORRUPT);
|
|
goto next_group;
|
|
}
|
|
|
|
if ((!(sbi->s_mount_state & EXT4_FC_REPLAY)) && !handle) {
|
|
BUG_ON(nblocks <= 0);
|
|
handle = __ext4_journal_start_sb(dir->i_sb, line_no,
|
|
handle_type, nblocks, 0,
|
|
ext4_trans_default_revoke_credits(sb));
|
|
if (IS_ERR(handle)) {
|
|
err = PTR_ERR(handle);
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
}
|
|
BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
|
|
err = ext4_journal_get_write_access(handle, sb, inode_bitmap_bh,
|
|
EXT4_JTR_NONE);
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
ext4_lock_group(sb, group);
|
|
ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
|
|
if (ret2) {
|
|
/* Someone already took the bit. Repeat the search
|
|
* with lock held.
|
|
*/
|
|
ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
|
|
if (ret2) {
|
|
ext4_set_bit(ino, inode_bitmap_bh->b_data);
|
|
ret2 = 0;
|
|
} else {
|
|
ret2 = 1; /* we didn't grab the inode */
|
|
}
|
|
}
|
|
ext4_unlock_group(sb, group);
|
|
ino++; /* the inode bitmap is zero-based */
|
|
if (!ret2)
|
|
goto got; /* we grabbed the inode! */
|
|
|
|
if (ino < EXT4_INODES_PER_GROUP(sb))
|
|
goto repeat_in_this_group;
|
|
next_group:
|
|
if (++group == ngroups)
|
|
group = 0;
|
|
}
|
|
err = -ENOSPC;
|
|
goto out;
|
|
|
|
got:
|
|
BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
|
|
err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
|
|
BUFFER_TRACE(group_desc_bh, "get_write_access");
|
|
err = ext4_journal_get_write_access(handle, sb, group_desc_bh,
|
|
EXT4_JTR_NONE);
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
|
|
/* We may have to initialize the block bitmap if it isn't already */
|
|
if (ext4_has_group_desc_csum(sb) &&
|
|
gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
|
|
struct buffer_head *block_bitmap_bh;
|
|
|
|
block_bitmap_bh = ext4_read_block_bitmap(sb, group);
|
|
if (IS_ERR(block_bitmap_bh)) {
|
|
err = PTR_ERR(block_bitmap_bh);
|
|
goto out;
|
|
}
|
|
BUFFER_TRACE(block_bitmap_bh, "get block bitmap access");
|
|
err = ext4_journal_get_write_access(handle, sb, block_bitmap_bh,
|
|
EXT4_JTR_NONE);
|
|
if (err) {
|
|
brelse(block_bitmap_bh);
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
|
|
BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
|
|
err = ext4_handle_dirty_metadata(handle, NULL, block_bitmap_bh);
|
|
|
|
/* recheck and clear flag under lock if we still need to */
|
|
ext4_lock_group(sb, group);
|
|
if (ext4_has_group_desc_csum(sb) &&
|
|
(gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
|
|
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
|
|
ext4_free_group_clusters_set(sb, gdp,
|
|
ext4_free_clusters_after_init(sb, group, gdp));
|
|
ext4_block_bitmap_csum_set(sb, group, gdp,
|
|
block_bitmap_bh);
|
|
ext4_group_desc_csum_set(sb, group, gdp);
|
|
}
|
|
ext4_unlock_group(sb, group);
|
|
brelse(block_bitmap_bh);
|
|
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
}
|
|
|
|
/* Update the relevant bg descriptor fields */
|
|
if (ext4_has_group_desc_csum(sb)) {
|
|
int free;
|
|
struct ext4_group_info *grp = NULL;
|
|
|
|
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
|
|
grp = ext4_get_group_info(sb, group);
|
|
/* protect vs itable lazyinit */
down_read(&grp->alloc_sem);
|
|
}
|
|
ext4_lock_group(sb, group); /* while we modify the bg desc */
|
|
free = EXT4_INODES_PER_GROUP(sb) -
|
|
ext4_itable_unused_count(sb, gdp);
|
|
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
|
|
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
|
|
free = 0;
|
|
}
|
|
/*
* Check the relative inode number against the last used
* relative inode number in this group. If it is greater,
* we need to update the bg_itable_unused count.
*/
|
|
if (ino > free)
|
|
ext4_itable_unused_set(sb, gdp,
|
|
(EXT4_INODES_PER_GROUP(sb) - ino));
|
|
if (!(sbi->s_mount_state & EXT4_FC_REPLAY))
|
|
up_read(&grp->alloc_sem);
|
|
} else {
|
|
ext4_lock_group(sb, group);
|
|
}
|
|
|
|
ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
|
|
if (S_ISDIR(mode)) {
|
|
ext4_used_dirs_set(sb, gdp, ext4_used_dirs_count(sb, gdp) + 1);
|
|
if (sbi->s_log_groups_per_flex) {
|
|
ext4_group_t f = ext4_flex_group(sbi, group);
|
|
|
|
atomic_inc(&sbi_array_rcu_deref(sbi, s_flex_groups,
|
|
f)->used_dirs);
|
|
}
|
|
}
|
|
if (ext4_has_group_desc_csum(sb)) {
|
|
ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
|
|
EXT4_INODES_PER_GROUP(sb) / 8);
|
|
ext4_group_desc_csum_set(sb, group, gdp);
|
|
}
|
|
ext4_unlock_group(sb, group);
|
|
|
|
BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
|
|
err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto out;
|
|
}
|
|
|
|
percpu_counter_dec(&sbi->s_freeinodes_counter);
|
|
if (S_ISDIR(mode))
|
|
percpu_counter_inc(&sbi->s_dirs_counter);
|
|
|
|
if (sbi->s_log_groups_per_flex) {
|
|
flex_group = ext4_flex_group(sbi, group);
|
|
atomic_dec(&sbi_array_rcu_deref(sbi, s_flex_groups,
|
|
flex_group)->free_inodes);
|
|
}
|
|
|
|
inode->i_ino = ino + group * EXT4_INODES_PER_GROUP(sb);
|
|
/* This is the optimal IO size (for stat), not the fs block size */
|
|
inode->i_blocks = 0;
|
|
inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
|
|
ei->i_crtime = inode->i_mtime;
|
|
|
|
memset(ei->i_data, 0, sizeof(ei->i_data));
|
|
ei->i_dir_start_lookup = 0;
|
|
ei->i_disksize = 0;
|
|
|
|
/* Don't inherit extent flag from directory, amongst others. */
|
|
ei->i_flags =
|
|
ext4_mask_flags(mode, EXT4_I(dir)->i_flags & EXT4_FL_INHERITED);
|
|
ei->i_flags |= i_flags;
|
|
ei->i_file_acl = 0;
|
|
ei->i_dtime = 0;
|
|
ei->i_block_group = group;
|
|
ei->i_last_alloc_group = ~0;
|
|
|
|
ext4_set_inode_flags(inode, true);
|
|
if (IS_DIRSYNC(inode))
|
|
ext4_handle_sync(handle);
|
|
if (insert_inode_locked(inode) < 0) {
|
|
/*
|
|
* Likely a bitmap corruption causing inode to be allocated
|
|
* twice.
|
|
*/
|
|
err = -EIO;
|
|
ext4_error(sb, "failed to insert inode %lu: doubly allocated?",
|
|
inode->i_ino);
|
|
ext4_mark_group_bitmap_corrupted(sb, group,
|
|
EXT4_GROUP_INFO_IBITMAP_CORRUPT);
|
|
goto out;
|
|
}
|
|
inode->i_generation = get_random_u32();
|
|
|
|
/* Precompute checksum seed for inode metadata */
|
|
if (ext4_has_metadata_csum(sb)) {
|
|
__u32 csum;
|
|
__le32 inum = cpu_to_le32(inode->i_ino);
|
|
__le32 gen = cpu_to_le32(inode->i_generation);
|
|
csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&inum,
|
|
sizeof(inum));
|
|
ei->i_csum_seed = ext4_chksum(sbi, csum, (__u8 *)&gen,
|
|
sizeof(gen));
|
|
}
|
|
|
|
ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */
|
|
ext4_set_inode_state(inode, EXT4_STATE_NEW);
|
|
|
|
ei->i_extra_isize = sbi->s_want_extra_isize;
|
|
ei->i_inline_off = 0;
|
|
if (ext4_has_feature_inline_data(sb) &&
|
|
(!(ei->i_flags & EXT4_DAX_FL) || S_ISDIR(mode)))
|
|
ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
|
|
ret = inode;
|
|
err = dquot_alloc_inode(inode);
|
|
if (err)
|
|
goto fail_drop;
|
|
|
|
/*
|
|
* Since the encryption xattr will always be unique, create it first so
|
|
* that it's less likely to end up in an external xattr block and
|
|
* prevent its deduplication.
|
|
*/
|
|
if (encrypt) {
|
|
err = fscrypt_set_context(inode, handle);
|
|
if (err)
|
|
goto fail_free_drop;
|
|
}
|
|
|
|
if (!(ei->i_flags & EXT4_EA_INODE_FL)) {
|
|
err = ext4_init_acl(handle, inode, dir);
|
|
if (err)
|
|
goto fail_free_drop;
|
|
|
|
err = ext4_init_security(handle, inode, dir, qstr);
|
|
if (err)
|
|
goto fail_free_drop;
|
|
}
|
|
|
|
if (ext4_has_feature_extents(sb)) {
|
|
/* Set the extent flag only for directories, regular files and normal symlinks */
|
|
if (S_ISDIR(mode) || S_ISREG(mode) || S_ISLNK(mode)) {
|
|
ext4_set_inode_flag(inode, EXT4_INODE_EXTENTS);
|
|
ext4_ext_tree_init(handle, inode);
|
|
}
|
|
}
|
|
|
|
if (ext4_handle_valid(handle)) {
|
|
ei->i_sync_tid = handle->h_transaction->t_tid;
|
|
ei->i_datasync_tid = handle->h_transaction->t_tid;
|
|
}
|
|
|
|
err = ext4_mark_inode_dirty(handle, inode);
|
|
if (err) {
|
|
ext4_std_error(sb, err);
|
|
goto fail_free_drop;
|
|
}
|
|
|
|
ext4_debug("allocating inode %lu\n", inode->i_ino);
|
|
trace_ext4_allocate_inode(inode, dir, mode);
|
|
brelse(inode_bitmap_bh);
|
|
return ret;
|
|
|
|
fail_free_drop:
|
|
dquot_free_inode(inode);
|
|
fail_drop:
|
|
clear_nlink(inode);
|
|
unlock_new_inode(inode);
|
|
out:
|
|
dquot_drop(inode);
|
|
inode->i_flags |= S_NOQUOTA;
|
|
iput(inode);
|
|
brelse(inode_bitmap_bh);
|
|
return ERR_PTR(err);
|
|
}
|
|
|
|
/* Verify that we are loading a valid orphan from disk */
|
|
struct inode *ext4_orphan_get(struct super_block *sb, unsigned long ino)
|
|
{
|
|
unsigned long max_ino = le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count);
|
|
ext4_group_t block_group;
|
|
int bit;
|
|
struct buffer_head *bitmap_bh = NULL;
|
|
struct inode *inode = NULL;
|
|
int err = -EFSCORRUPTED;
|
|
|
|
if (ino < EXT4_FIRST_INO(sb) || ino > max_ino)
|
|
goto bad_orphan;
|
|
|
|
block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
|
|
bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
|
|
bitmap_bh = ext4_read_inode_bitmap(sb, block_group);
|
|
if (IS_ERR(bitmap_bh))
|
|
return ERR_CAST(bitmap_bh);
|
|
|
|
/* Having the inode bit set should be a 100% indicator that this
|
|
* is a valid orphan (no e2fsck run on fs). Orphans also include
|
|
* inodes that were being truncated, so we can't check i_nlink==0.
|
|
*/
|
|
if (!ext4_test_bit(bit, bitmap_bh->b_data))
|
|
goto bad_orphan;
|
|
|
|
inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL);
|
|
if (IS_ERR(inode)) {
|
|
err = PTR_ERR(inode);
|
|
ext4_error_err(sb, -err,
|
|
"couldn't read orphan inode %lu (err %d)",
|
|
ino, err);
|
|
brelse(bitmap_bh);
|
|
return inode;
|
|
}
|
|
|
|
/*
* If the orphan has i_nlink > 0 then it should be able to
* be truncated, otherwise it won't be removed from the orphan
* list during processing and an infinite loop will result.
* Similarly, it must not be a bad inode.
*/
|
|
if ((inode->i_nlink && !ext4_can_truncate(inode)) ||
|
|
is_bad_inode(inode))
|
|
goto bad_orphan;
|
|
|
|
if (NEXT_ORPHAN(inode) > max_ino)
|
|
goto bad_orphan;
|
|
brelse(bitmap_bh);
|
|
return inode;
|
|
|
|
bad_orphan:
|
|
ext4_error(sb, "bad orphan inode %lu", ino);
|
|
if (bitmap_bh)
|
|
printk(KERN_ERR "ext4_test_bit(bit=%d, block=%llu) = %d\n",
|
|
bit, (unsigned long long)bitmap_bh->b_blocknr,
|
|
ext4_test_bit(bit, bitmap_bh->b_data));
|
|
if (inode) {
|
|
printk(KERN_ERR "is_bad_inode(inode)=%d\n",
|
|
is_bad_inode(inode));
|
|
printk(KERN_ERR "NEXT_ORPHAN(inode)=%u\n",
|
|
NEXT_ORPHAN(inode));
|
|
printk(KERN_ERR "max_ino=%lu\n", max_ino);
|
|
printk(KERN_ERR "i_nlink=%u\n", inode->i_nlink);
|
|
/* Avoid freeing blocks if we got a bad deleted inode */
|
|
if (inode->i_nlink == 0)
|
|
inode->i_blocks = 0;
|
|
iput(inode);
|
|
}
|
|
brelse(bitmap_bh);
|
|
return ERR_PTR(err);
|
|
}
|
|
|
|
unsigned long ext4_count_free_inodes(struct super_block *sb)
|
|
{
|
|
unsigned long desc_count;
|
|
struct ext4_group_desc *gdp;
|
|
ext4_group_t i, ngroups = ext4_get_groups_count(sb);
|
|
#ifdef EXT4FS_DEBUG
|
|
struct ext4_super_block *es;
|
|
unsigned long bitmap_count, x;
|
|
struct buffer_head *bitmap_bh = NULL;
|
|
|
|
es = EXT4_SB(sb)->s_es;
|
|
desc_count = 0;
|
|
bitmap_count = 0;
|
|
gdp = NULL;
|
|
for (i = 0; i < ngroups; i++) {
|
|
gdp = ext4_get_group_desc(sb, i, NULL);
|
|
if (!gdp)
|
|
continue;
|
|
desc_count += ext4_free_inodes_count(sb, gdp);
|
|
brelse(bitmap_bh);
|
|
bitmap_bh = ext4_read_inode_bitmap(sb, i);
|
|
if (IS_ERR(bitmap_bh)) {
|
|
bitmap_bh = NULL;
|
|
continue;
|
|
}
|
|
|
|
x = ext4_count_free(bitmap_bh->b_data,
|
|
EXT4_INODES_PER_GROUP(sb) / 8);
|
|
printk(KERN_DEBUG "group %lu: stored = %d, counted = %lu\n",
|
|
(unsigned long) i, ext4_free_inodes_count(sb, gdp), x);
|
|
bitmap_count += x;
|
|
}
|
|
brelse(bitmap_bh);
|
|
printk(KERN_DEBUG "ext4_count_free_inodes: "
|
|
"stored = %u, computed = %lu, %lu\n",
|
|
le32_to_cpu(es->s_free_inodes_count), desc_count, bitmap_count);
|
|
return desc_count;
|
|
#else
|
|
desc_count = 0;
|
|
for (i = 0; i < ngroups; i++) {
|
|
gdp = ext4_get_group_desc(sb, i, NULL);
|
|
if (!gdp)
|
|
continue;
|
|
desc_count += ext4_free_inodes_count(sb, gdp);
|
|
cond_resched();
|
|
}
|
|
return desc_count;
|
|
#endif
|
|
}
|
|
|
|
/* Called at mount-time, super-block is locked */
|
|
unsigned long ext4_count_dirs(struct super_block *sb)
|
|
{
|
|
unsigned long count = 0;
|
|
ext4_group_t i, ngroups = ext4_get_groups_count(sb);
|
|
|
|
for (i = 0; i < ngroups; i++) {
|
|
struct ext4_group_desc *gdp = ext4_get_group_desc(sb, i, NULL);
|
|
if (!gdp)
|
|
continue;
|
|
count += ext4_used_dirs_count(sb, gdp);
|
|
}
|
|
return count;
|
|
}
|
|
|
|
/*
* Zero out an inode table that has not been zeroed yet by writing zeroes
* through the whole inode table. Must be called without any spinlock held.
* On an active filesystem the only caller is the ext4lazyinit thread, so
* no special locking is needed. However, we have to prevent inode
* allocation from the current group, so we take the alloc_sem lock to
* block ext4_new_inode() until we are finished.
*/
|
|
int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
|
|
int barrier)
|
|
{
|
|
struct ext4_group_info *grp = ext4_get_group_info(sb, group);
|
|
struct ext4_sb_info *sbi = EXT4_SB(sb);
|
|
struct ext4_group_desc *gdp = NULL;
|
|
struct buffer_head *group_desc_bh;
|
|
handle_t *handle;
|
|
ext4_fsblk_t blk;
|
|
int num, ret = 0, used_blks = 0;
|
|
unsigned long used_inos = 0;
|
|
|
|
/* This should not happen, but just to be sure check this */
|
|
if (sb_rdonly(sb)) {
|
|
ret = 1;
|
|
goto out;
|
|
}
|
|
|
|
gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
|
|
if (!gdp)
|
|
goto out;
|
|
|
|
/*
|
|
* We do not need to lock this, because we are the only one
|
|
* handling this flag.
|
|
*/
|
|
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))
|
|
goto out;
|
|
|
|
handle = ext4_journal_start_sb(sb, EXT4_HT_MISC, 1);
|
|
if (IS_ERR(handle)) {
|
|
ret = PTR_ERR(handle);
|
|
goto out;
|
|
}
|
|
|
|
down_write(&grp->alloc_sem);
|
|
/*
|
|
* If inode bitmap was already initialized there may be some
|
|
* used inodes so we need to skip blocks with used inodes in
|
|
* inode table.
|
|
*/
|
|
if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT))) {
|
|
used_inos = EXT4_INODES_PER_GROUP(sb) -
|
|
ext4_itable_unused_count(sb, gdp);
|
|
used_blks = DIV_ROUND_UP(used_inos, sbi->s_inodes_per_block);
|
|
|
|
/* Bogus inode unused count? */
|
|
if (used_blks < 0 || used_blks > sbi->s_itb_per_group) {
|
|
ext4_error(sb, "Something is wrong with group %u: "
|
|
"used itable blocks: %d; "
|
|
"itable unused count: %u",
|
|
group, used_blks,
|
|
ext4_itable_unused_count(sb, gdp));
|
|
ret = 1;
|
|
goto err_out;
|
|
}
|
|
|
|
used_inos += group * EXT4_INODES_PER_GROUP(sb);
|
|
/*
|
|
* Are there some uninitialized inodes in the inode table
|
|
* before the first normal inode?
|
|
*/
|
|
if ((used_blks != sbi->s_itb_per_group) &&
|
|
(used_inos < EXT4_FIRST_INO(sb))) {
|
|
ext4_error(sb, "Something is wrong with group %u: "
|
|
"itable unused count: %u; "
|
|
"itables initialized count: %ld",
|
|
group, ext4_itable_unused_count(sb, gdp),
|
|
used_inos);
|
|
ret = 1;
|
|
goto err_out;
|
|
}
|
|
}
|
|
|
|
blk = ext4_inode_table(sb, gdp) + used_blks;
|
|
num = sbi->s_itb_per_group - used_blks;
|
|
|
|
BUFFER_TRACE(group_desc_bh, "get_write_access");
|
|
ret = ext4_journal_get_write_access(handle, sb, group_desc_bh,
|
|
EXT4_JTR_NONE);
|
|
if (ret)
|
|
goto err_out;
|
|
|
|
/*
|
|
* Skip zeroout if the inode table is full. But we set the ZEROED
|
|
* flag anyway, because obviously, when it is full it does not need
|
|
* further zeroing.
|
|
*/
|
|
if (unlikely(num == 0))
|
|
goto skip_zeroout;
|
|
|
|
ext4_debug("going to zero out inode table in group %d\n",
|
|
group);
|
|
ret = sb_issue_zeroout(sb, blk, num, GFP_NOFS);
|
|
if (ret < 0)
|
|
goto err_out;
|
|
if (barrier)
|
|
blkdev_issue_flush(sb->s_bdev);
|
|
|
|
skip_zeroout:
|
|
ext4_lock_group(sb, group);
|
|
gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
|
|
ext4_group_desc_csum_set(sb, group, gdp);
|
|
ext4_unlock_group(sb, group);
|
|
|
|
BUFFER_TRACE(group_desc_bh,
|
|
"call ext4_handle_dirty_metadata");
|
|
ret = ext4_handle_dirty_metadata(handle, NULL,
|
|
group_desc_bh);
|
|
|
|
err_out:
|
|
up_write(&grp->alloc_sem);
|
|
ext4_journal_stop(handle);
|
|
out:
|
|
return ret;
|
|
}
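/*
 * Illustration only, not part of the kernel source: a sketch of the block
 * arithmetic ext4_init_inode_table() performs when the inode bitmap is
 * already initialized (EXT4_BG_INODE_UNINIT clear). Inode-table blocks that
 * may hold used inodes are skipped and the rest are zeroed. The names
 * DEMO_DIV_ROUND_UP and demo_itable_blocks_to_zero are hypothetical, and
 * returning -1 for an inconsistent unused count is an assumption of this
 * sketch; the kernel reports the problem via ext4_error() and returns 1.
 */

```c
/* Integer division rounding up, as the kernel's DIV_ROUND_UP() does. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Returns the number of inode-table blocks still to be zeroed and stores the
 * first such block, relative to the start of the table, in *first_rel_blk.
 * Returns -1 when the counts are inconsistent.
 */
int demo_itable_blocks_to_zero(unsigned int inodes_per_group,
			       unsigned int itable_unused,
			       unsigned int inodes_per_block,
			       unsigned int itb_per_group,
			       unsigned int *first_rel_blk)
{
	unsigned int used_inos, used_blks;

	if (inodes_per_block == 0 || itable_unused > inodes_per_group)
		return -1;

	/* Inodes at the front of the table that may be in use. */
	used_inos = inodes_per_group - itable_unused;
	/* Blocks touched by those inodes, rounding up a partial block. */
	used_blks = DEMO_DIV_ROUND_UP(used_inos, inodes_per_block);
	if (used_blks > itb_per_group)
		return -1;

	*first_rel_blk = used_blks;
	return (int)(itb_per_group - used_blks);
}
```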
|