2018-09-12 01:16:07 +00:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
2012-11-29 04:28:09 +00:00
|
|
|
/*
|
2012-11-02 08:13:32 +00:00
|
|
|
* fs/f2fs/recovery.c
|
|
|
|
*
|
|
|
|
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
|
|
|
|
* http://www.samsung.com/
|
|
|
|
*/
|
2020-11-19 06:09:04 +00:00
|
|
|
#include <asm/unaligned.h>
|
2012-11-02 08:13:32 +00:00
|
|
|
#include <linux/fs.h>
|
|
|
|
#include <linux/f2fs_fs.h>
|
mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait() with takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops uses.
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-14 22:07:14 +00:00
|
|
|
#include <linux/sched/mm.h>
|
2012-11-02 08:13:32 +00:00
|
|
|
#include "f2fs.h"
|
|
|
|
#include "node.h"
|
|
|
|
#include "segment.h"
|
|
|
|
|
2014-09-15 23:46:08 +00:00
|
|
|
/*
|
|
|
|
* Roll forward recovery scenarios.
|
|
|
|
*
|
|
|
|
* [Term] F: fsync_mark, D: dentry_mark
|
|
|
|
*
|
|
|
|
* 1. inode(x) | CP | inode(x) | dnode(F)
|
|
|
|
* -> Update the latest inode(x).
|
|
|
|
*
|
|
|
|
* 2. inode(x) | CP | inode(F) | dnode(F)
|
|
|
|
* -> No problem.
|
|
|
|
*
|
|
|
|
* 3. inode(x) | CP | dnode(F) | inode(x)
|
|
|
|
* -> Recover to the latest dnode(F), and drop the last inode(x)
|
|
|
|
*
|
|
|
|
* 4. inode(x) | CP | dnode(F) | inode(F)
|
|
|
|
* -> No problem.
|
|
|
|
*
|
|
|
|
* 5. CP | inode(x) | dnode(F)
|
|
|
|
* -> The inode(DF) was missing. Should drop this dnode(F).
|
|
|
|
*
|
|
|
|
* 6. CP | inode(DF) | dnode(F)
|
|
|
|
* -> No problem.
|
|
|
|
*
|
|
|
|
* 7. CP | dnode(F) | inode(DF)
|
|
|
|
* -> If f2fs_iget fails, then goto next to find inode(DF).
|
|
|
|
*
|
|
|
|
* 8. CP | dnode(F) | inode(x)
|
|
|
|
* -> If f2fs_iget fails, then goto next to find inode(DF).
|
|
|
|
* But it will fail due to no inode(DF).
|
|
|
|
*/
|
|
|
|
|
2012-11-02 08:13:32 +00:00
|
|
|
static struct kmem_cache *fsync_entry_slab;
|
|
|
|
|
2022-01-18 06:56:14 +00:00
|
|
|
#if IS_ENABLED(CONFIG_UNICODE)
|
2021-06-10 23:46:30 +00:00
|
|
|
extern struct kmem_cache *f2fs_cf_name_slab;
|
|
|
|
#endif
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
2016-05-16 18:06:50 +00:00
|
|
|
s64 nalloc = percpu_counter_sum_positive(&sbi->alloc_valid_block_count);
|
|
|
|
|
|
|
|
if (sbi->last_valid_block_count + nalloc > sbi->user_block_count)
|
2012-11-02 08:13:32 +00:00
|
|
|
return false;
|
2022-01-27 21:31:43 +00:00
|
|
|
if (NM_I(sbi)->max_rf_node_blocks &&
|
|
|
|
percpu_counter_sum_positive(&sbi->rf_node_block_count) >=
|
|
|
|
NM_I(sbi)->max_rf_node_blocks)
|
|
|
|
return false;
|
2012-11-02 08:13:32 +00:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct fsync_inode_entry *get_fsync_inode(struct list_head *head,
|
|
|
|
nid_t ino)
|
|
|
|
{
|
|
|
|
struct fsync_inode_entry *entry;
|
|
|
|
|
2014-03-29 03:33:17 +00:00
|
|
|
list_for_each_entry(entry, head, list)
|
2012-11-02 08:13:32 +00:00
|
|
|
if (entry->inode->i_ino == ino)
|
|
|
|
return entry;
|
2014-03-29 03:33:17 +00:00
|
|
|
|
2012-11-02 08:13:32 +00:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2016-09-09 23:48:15 +00:00
|
|
|
static struct fsync_inode_entry *add_fsync_inode(struct f2fs_sb_info *sbi,
|
2017-08-08 02:54:31 +00:00
|
|
|
struct list_head *head, nid_t ino, bool quota_inode)
|
2016-04-29 12:13:37 +00:00
|
|
|
{
|
2016-09-09 23:59:39 +00:00
|
|
|
struct inode *inode;
|
2016-04-29 12:13:37 +00:00
|
|
|
struct fsync_inode_entry *entry;
|
2017-08-08 02:54:31 +00:00
|
|
|
int err;
|
2016-04-29 12:13:37 +00:00
|
|
|
|
2016-09-09 23:59:39 +00:00
|
|
|
inode = f2fs_iget_retry(sbi->sb, ino);
|
2016-09-09 23:48:15 +00:00
|
|
|
if (IS_ERR(inode))
|
|
|
|
return ERR_CAST(inode);
|
|
|
|
|
2021-10-28 13:03:05 +00:00
|
|
|
err = f2fs_dquot_initialize(inode);
|
2017-08-08 02:54:31 +00:00
|
|
|
if (err)
|
|
|
|
goto err_out;
|
|
|
|
|
|
|
|
if (quota_inode) {
|
|
|
|
err = dquot_alloc_inode(inode);
|
|
|
|
if (err)
|
|
|
|
goto err_out;
|
|
|
|
}
|
|
|
|
|
2021-08-09 00:24:48 +00:00
|
|
|
entry = f2fs_kmem_cache_alloc(fsync_entry_slab,
|
|
|
|
GFP_F2FS_ZERO, true, NULL);
|
2016-04-29 12:13:37 +00:00
|
|
|
entry->inode = inode;
|
|
|
|
list_add_tail(&entry->list, head);
|
|
|
|
|
|
|
|
return entry;
|
2017-08-08 02:54:31 +00:00
|
|
|
err_out:
|
|
|
|
iput(inode);
|
|
|
|
return ERR_PTR(err);
|
2016-04-29 12:13:37 +00:00
|
|
|
}
|
|
|
|
|
2018-10-12 10:49:26 +00:00
|
|
|
static void del_fsync_inode(struct fsync_inode_entry *entry, int drop)
|
2016-04-29 12:13:37 +00:00
|
|
|
{
|
2018-10-12 10:49:26 +00:00
|
|
|
if (drop) {
|
|
|
|
/* inode should not be recovered, drop it */
|
|
|
|
f2fs_inode_synced(entry->inode);
|
|
|
|
}
|
2016-04-29 12:13:37 +00:00
|
|
|
iput(entry->inode);
|
|
|
|
list_del(&entry->list);
|
|
|
|
kmem_cache_free(fsync_entry_slab, entry);
|
|
|
|
}
|
|
|
|
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 07:59:04 +00:00
|
|
|
static int init_recovered_filename(const struct inode *dir,
|
|
|
|
struct f2fs_inode *raw_inode,
|
|
|
|
struct f2fs_filename *fname,
|
|
|
|
struct qstr *usr_fname)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
memset(fname, 0, sizeof(*fname));
|
|
|
|
fname->disk_name.len = le32_to_cpu(raw_inode->i_namelen);
|
|
|
|
fname->disk_name.name = raw_inode->i_name;
|
|
|
|
|
|
|
|
if (WARN_ON(fname->disk_name.len > F2FS_NAME_LEN))
|
|
|
|
return -ENAMETOOLONG;
|
|
|
|
|
|
|
|
if (!IS_ENCRYPTED(dir)) {
|
|
|
|
usr_fname->name = fname->disk_name.name;
|
|
|
|
usr_fname->len = fname->disk_name.len;
|
|
|
|
fname->usr_fname = usr_fname;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Compute the hash of the filename */
|
2020-11-19 06:09:04 +00:00
|
|
|
if (IS_ENCRYPTED(dir) && IS_CASEFOLDED(dir)) {
|
|
|
|
/*
|
|
|
|
* In this case the hash isn't computable without the key, so it
|
|
|
|
* was saved on-disk.
|
|
|
|
*/
|
|
|
|
if (fname->disk_name.len + sizeof(f2fs_hash_t) > F2FS_NAME_LEN)
|
|
|
|
return -EINVAL;
|
|
|
|
fname->hash = get_unaligned((f2fs_hash_t *)
|
|
|
|
&raw_inode->i_name[fname->disk_name.len]);
|
|
|
|
} else if (IS_CASEFOLDED(dir)) {
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 07:59:04 +00:00
|
|
|
err = f2fs_init_casefolded_name(dir, fname);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
f2fs_hash_filename(dir, fname);
|
2022-01-18 06:56:14 +00:00
|
|
|
#if IS_ENABLED(CONFIG_UNICODE)
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 07:59:04 +00:00
|
|
|
/* Case-sensitive match is fine for recovery */
|
2021-06-10 23:46:30 +00:00
|
|
|
kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 07:59:04 +00:00
|
|
|
fname->cf_name.name = NULL;
|
|
|
|
#endif
|
|
|
|
} else {
|
|
|
|
f2fs_hash_filename(dir, fname);
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
static int recover_dentry(struct inode *inode, struct page *ipage,
|
|
|
|
struct list_head *dir_list)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
2013-12-26 07:30:41 +00:00
|
|
|
struct f2fs_inode *raw_inode = F2FS_INODE(ipage);
|
2013-05-15 07:40:02 +00:00
|
|
|
nid_t pino = le32_to_cpu(raw_inode->i_pino);
|
2013-05-28 00:19:22 +00:00
|
|
|
struct f2fs_dir_entry *de;
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 07:59:04 +00:00
|
|
|
struct f2fs_filename fname;
|
|
|
|
struct qstr usr_fname;
|
2012-11-02 08:13:32 +00:00
|
|
|
struct page *page;
|
2013-05-28 00:19:22 +00:00
|
|
|
struct inode *dir, *einode;
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
struct fsync_inode_entry *entry;
|
2012-11-02 08:13:32 +00:00
|
|
|
int err = 0;
|
2016-08-29 03:27:56 +00:00
|
|
|
char *name;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
entry = get_fsync_inode(dir_list, pino);
|
|
|
|
if (!entry) {
|
2017-08-08 02:54:31 +00:00
|
|
|
entry = add_fsync_inode(F2FS_I_SB(inode), dir_list,
|
|
|
|
pino, false);
|
2016-09-09 23:48:15 +00:00
|
|
|
if (IS_ERR(entry)) {
|
|
|
|
dir = ERR_CAST(entry);
|
|
|
|
err = PTR_ERR(entry);
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
goto out;
|
|
|
|
}
|
2014-04-15 02:19:28 +00:00
|
|
|
}
|
|
|
|
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
dir = entry->inode;
|
f2fs: rework filename handling
Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.
For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.
This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.
This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)
The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-07 07:59:04 +00:00
|
|
|
err = init_recovered_filename(dir, raw_inode, &fname, &usr_fname);
|
|
|
|
if (err)
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
goto out;
|
2013-05-28 00:19:22 +00:00
|
|
|
retry:
|
2016-08-29 03:27:56 +00:00
|
|
|
de = __f2fs_find_entry(dir, &fname, &page);
|
2015-04-01 01:03:29 +00:00
|
|
|
if (de && inode->i_ino == le32_to_cpu(de->ino))
|
2018-02-28 12:31:52 +00:00
|
|
|
goto out_put;
|
2015-04-01 01:03:29 +00:00
|
|
|
|
2013-05-28 00:19:22 +00:00
|
|
|
if (de) {
|
2016-09-09 23:59:39 +00:00
|
|
|
einode = f2fs_iget_retry(inode->i_sb, le32_to_cpu(de->ino));
|
2013-05-28 00:19:22 +00:00
|
|
|
if (IS_ERR(einode)) {
|
|
|
|
WARN_ON(1);
|
2014-04-28 09:58:34 +00:00
|
|
|
err = PTR_ERR(einode);
|
|
|
|
if (err == -ENOENT)
|
2013-05-28 00:19:22 +00:00
|
|
|
err = -EEXIST;
|
2018-02-28 12:31:52 +00:00
|
|
|
goto out_put;
|
2013-09-24 14:40:57 +00:00
|
|
|
}
|
2017-08-08 02:54:31 +00:00
|
|
|
|
2021-10-28 13:03:05 +00:00
|
|
|
err = f2fs_dquot_initialize(einode);
|
2017-08-08 02:54:31 +00:00
|
|
|
if (err) {
|
|
|
|
iput(einode);
|
2018-02-28 12:31:52 +00:00
|
|
|
goto out_put;
|
2017-08-08 02:54:31 +00:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
err = f2fs_acquire_orphan_inode(F2FS_I_SB(inode));
|
2013-09-24 14:40:57 +00:00
|
|
|
if (err) {
|
|
|
|
iput(einode);
|
2018-02-28 12:31:52 +00:00
|
|
|
goto out_put;
|
2013-05-28 00:19:22 +00:00
|
|
|
}
|
2014-09-24 10:17:04 +00:00
|
|
|
f2fs_delete_entry(de, page, dir, einode);
|
2013-05-28 00:19:22 +00:00
|
|
|
iput(einode);
|
|
|
|
goto retry;
|
2016-07-19 00:27:47 +00:00
|
|
|
} else if (IS_ERR(page)) {
|
|
|
|
err = PTR_ERR(page);
|
|
|
|
} else {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
err = f2fs_add_dentry(dir, &fname, inode,
|
2016-07-19 00:27:47 +00:00
|
|
|
inode->i_ino, inode->i_mode);
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
2016-09-09 23:59:39 +00:00
|
|
|
if (err == -ENOMEM)
|
|
|
|
goto retry;
|
2013-09-24 14:40:57 +00:00
|
|
|
goto out;
|
|
|
|
|
2018-02-28 12:31:52 +00:00
|
|
|
out_put:
|
2013-09-24 14:40:57 +00:00
|
|
|
f2fs_put_page(page, 0);
|
2012-11-02 08:13:32 +00:00
|
|
|
out:
|
2016-08-29 03:27:56 +00:00
|
|
|
if (file_enc_name(inode))
|
|
|
|
name = "<encrypted>";
|
|
|
|
else
|
|
|
|
name = raw_inode->i_name;
|
2019-06-18 09:48:42 +00:00
|
|
|
f2fs_notice(F2FS_I_SB(inode), "%s: ino = %x, name = %s, dir = %lx, err = %d",
|
|
|
|
__func__, ino_of_node(ipage), name,
|
|
|
|
IS_ERR(dir) ? 0 : dir->i_ino, err);
|
2012-11-02 08:13:32 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
static int recover_quota_data(struct inode *inode, struct page *page)
|
|
|
|
{
|
|
|
|
struct f2fs_inode *raw = F2FS_INODE(page);
|
|
|
|
struct iattr attr;
|
|
|
|
uid_t i_uid = le32_to_cpu(raw->i_uid);
|
|
|
|
gid_t i_gid = le32_to_cpu(raw->i_gid);
|
|
|
|
int err;
|
|
|
|
|
|
|
|
memset(&attr, 0, sizeof(attr));
|
|
|
|
|
2022-06-21 14:14:54 +00:00
|
|
|
attr.ia_vfsuid = VFSUIDT_INIT(make_kuid(inode->i_sb->s_user_ns, i_uid));
|
|
|
|
attr.ia_vfsgid = VFSGIDT_INIT(make_kgid(inode->i_sb->s_user_ns, i_gid));
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
|
2023-01-13 11:49:30 +00:00
|
|
|
if (!vfsuid_eq(attr.ia_vfsuid, i_uid_into_vfsuid(&nop_mnt_idmap, inode)))
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
attr.ia_valid |= ATTR_UID;
|
2023-01-13 11:49:30 +00:00
|
|
|
if (!vfsgid_eq(attr.ia_vfsgid, i_gid_into_vfsgid(&nop_mnt_idmap, inode)))
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
attr.ia_valid |= ATTR_GID;
|
|
|
|
|
|
|
|
if (!attr.ia_valid)
|
|
|
|
return 0;
|
|
|
|
|
2023-01-13 11:49:28 +00:00
|
|
|
err = dquot_transfer(&nop_mnt_idmap, inode, &attr);
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
if (err)
|
|
|
|
set_sbi_flag(F2FS_I_SB(inode), SBI_QUOTA_NEED_REPAIR);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-01-20 04:01:40 +00:00
|
|
|
static void recover_inline_flags(struct inode *inode, struct f2fs_inode *ri)
|
|
|
|
{
|
|
|
|
if (ri->i_inline & F2FS_PIN_FILE)
|
|
|
|
set_inode_flag(inode, FI_PIN_FILE);
|
|
|
|
else
|
|
|
|
clear_inode_flag(inode, FI_PIN_FILE);
|
|
|
|
if (ri->i_inline & F2FS_DATA_EXIST)
|
|
|
|
set_inode_flag(inode, FI_DATA_EXIST);
|
|
|
|
else
|
|
|
|
clear_inode_flag(inode, FI_DATA_EXIST);
|
|
|
|
}
|
|
|
|
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
static int recover_inode(struct inode *inode, struct page *page)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
2014-09-15 23:46:08 +00:00
|
|
|
struct f2fs_inode *raw = F2FS_INODE(page);
|
2015-04-30 00:02:18 +00:00
|
|
|
char *name;
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
int err;
|
2014-09-15 23:46:08 +00:00
|
|
|
|
|
|
|
inode->i_mode = le16_to_cpu(raw->i_mode);
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
|
|
|
|
err = recover_quota_data(inode, page);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2018-09-20 09:41:30 +00:00
|
|
|
i_uid_write(inode, le32_to_cpu(raw->i_uid));
|
|
|
|
i_gid_write(inode, le32_to_cpu(raw->i_gid));
|
2018-09-25 07:35:58 +00:00
|
|
|
|
|
|
|
if (raw->i_inline & F2FS_EXTRA_ATTR) {
|
2018-10-24 10:34:26 +00:00
|
|
|
if (f2fs_sb_has_project_quota(F2FS_I_SB(inode)) &&
|
2018-09-25 07:35:58 +00:00
|
|
|
F2FS_FITS_IN_INODE(raw, le16_to_cpu(raw->i_extra_isize),
|
|
|
|
i_projid)) {
|
|
|
|
projid_t i_projid;
|
2018-09-25 07:36:02 +00:00
|
|
|
kprojid_t kprojid;
|
2018-09-25 07:35:58 +00:00
|
|
|
|
|
|
|
i_projid = (projid_t)le32_to_cpu(raw->i_projid);
|
2018-09-25 07:36:02 +00:00
|
|
|
kprojid = make_kprojid(&init_user_ns, i_projid);
|
|
|
|
|
|
|
|
if (!projid_eq(kprojid, F2FS_I(inode)->i_projid)) {
|
|
|
|
err = f2fs_transfer_project_quota(inode,
|
|
|
|
kprojid);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
F2FS_I(inode)->i_projid = kprojid;
|
|
|
|
}
|
2018-09-25 07:35:58 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-05-20 16:22:03 +00:00
|
|
|
f2fs_i_size_write(inode, le64_to_cpu(raw->i_size));
|
2023-10-04 18:52:21 +00:00
|
|
|
inode_set_atime(inode, le64_to_cpu(raw->i_atime),
|
|
|
|
le32_to_cpu(raw->i_atime_nsec));
|
2023-07-05 19:01:08 +00:00
|
|
|
inode_set_ctime(inode, le64_to_cpu(raw->i_ctime),
|
|
|
|
le32_to_cpu(raw->i_ctime_nsec));
|
2023-10-04 18:52:21 +00:00
|
|
|
inode_set_mtime(inode, le64_to_cpu(raw->i_mtime),
|
|
|
|
le32_to_cpu(raw->i_mtime_nsec));
|
2013-05-16 06:04:49 +00:00
|
|
|
|
2016-11-28 23:33:38 +00:00
|
|
|
F2FS_I(inode)->i_advise = raw->i_advise;
|
2018-09-25 07:35:59 +00:00
|
|
|
F2FS_I(inode)->i_flags = le32_to_cpu(raw->i_flags);
|
2018-10-06 19:03:38 +00:00
|
|
|
f2fs_set_inode_flags(inode);
|
2018-09-25 07:36:00 +00:00
|
|
|
F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN] =
|
|
|
|
le16_to_cpu(raw->i_gc_failures);
|
2016-11-28 23:33:38 +00:00
|
|
|
|
2018-01-20 04:01:40 +00:00
|
|
|
recover_inline_flags(inode, raw);
|
|
|
|
|
2018-09-25 07:36:03 +00:00
|
|
|
f2fs_mark_inode_dirty_sync(inode, true);
|
|
|
|
|
2015-04-30 00:02:18 +00:00
|
|
|
if (file_enc_name(inode))
|
|
|
|
name = "<encrypted>";
|
|
|
|
else
|
|
|
|
name = F2FS_INODE(page)->i_name;
|
|
|
|
|
2019-06-18 09:48:42 +00:00
|
|
|
f2fs_notice(F2FS_I_SB(inode), "recover_inode: ino = %x, name = %s, inline = %x",
|
|
|
|
ino_of_node(page), name, raw->i_inline);
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
return 0;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
|
2022-02-04 00:34:10 +00:00
|
|
|
static unsigned int adjust_por_ra_blocks(struct f2fs_sb_info *sbi,
|
|
|
|
unsigned int ra_blocks, unsigned int blkaddr,
|
|
|
|
unsigned int next_blkaddr)
|
|
|
|
{
|
|
|
|
if (blkaddr + 1 == next_blkaddr)
|
|
|
|
ra_blocks = min_t(unsigned int, RECOVERY_MAX_RA_BLOCKS,
|
|
|
|
ra_blocks * 2);
|
2024-02-06 21:56:27 +00:00
|
|
|
else if (next_blkaddr % BLKS_PER_SEG(sbi))
|
2022-02-04 00:34:10 +00:00
|
|
|
ra_blocks = max_t(unsigned int, RECOVERY_MIN_RA_BLOCKS,
|
|
|
|
ra_blocks / 2);
|
|
|
|
return ra_blocks;
|
|
|
|
}
|
|
|
|
|
2023-05-27 17:06:40 +00:00
|
|
|
/* Detect looped node chain with Floyd's cycle detection algorithm. */
|
|
|
|
static int sanity_check_node_chain(struct f2fs_sb_info *sbi, block_t blkaddr,
|
|
|
|
block_t *blkaddr_fast, bool *is_detecting)
|
|
|
|
{
|
|
|
|
unsigned int ra_blocks = RECOVERY_MAX_RA_BLOCKS;
|
|
|
|
struct page *page = NULL;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!*is_detecting)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for (i = 0; i < 2; i++) {
|
|
|
|
if (!f2fs_is_valid_blkaddr(sbi, *blkaddr_fast, META_POR)) {
|
|
|
|
*is_detecting = false;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
page = f2fs_get_tmp_page(sbi, *blkaddr_fast);
|
|
|
|
if (IS_ERR(page))
|
|
|
|
return PTR_ERR(page);
|
|
|
|
|
|
|
|
if (!is_recoverable_dnode(page)) {
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
*is_detecting = false;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
ra_blocks = adjust_por_ra_blocks(sbi, ra_blocks, *blkaddr_fast,
|
|
|
|
next_blkaddr_of_node(page));
|
|
|
|
|
|
|
|
*blkaddr_fast = next_blkaddr_of_node(page);
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
|
|
|
|
f2fs_ra_meta_pages_cond(sbi, *blkaddr_fast, ra_blocks);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (*blkaddr_fast == blkaddr) {
|
|
|
|
f2fs_notice(sbi, "%s: Detect looped node chain on blkaddr:%u."
|
|
|
|
" Run fsck to fix it.", __func__, blkaddr);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-04-14 22:46:23 +00:00
|
|
|
static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head,
|
|
|
|
bool check_only)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg;
|
2014-09-11 20:49:55 +00:00
|
|
|
struct page *page = NULL;
|
2023-05-27 17:06:40 +00:00
|
|
|
block_t blkaddr, blkaddr_fast;
|
|
|
|
bool is_detecting = true;
|
2012-11-02 08:13:32 +00:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
/* get node pages in the current segment */
|
|
|
|
curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
|
2014-02-27 11:52:21 +00:00
|
|
|
blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
|
2023-05-27 17:06:40 +00:00
|
|
|
blkaddr_fast = blkaddr;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
|
|
|
while (1) {
|
|
|
|
struct fsync_inode_entry *entry;
|
|
|
|
|
2018-06-05 09:44:11 +00:00
|
|
|
if (!f2fs_is_valid_blkaddr(sbi, blkaddr, META_POR))
|
2014-09-11 20:49:55 +00:00
|
|
|
return 0;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
page = f2fs_get_tmp_page(sbi, blkaddr);
|
2018-07-16 16:02:17 +00:00
|
|
|
if (IS_ERR(page)) {
|
|
|
|
err = PTR_ERR(page);
|
|
|
|
break;
|
|
|
|
}
|
2013-03-08 12:29:23 +00:00
|
|
|
|
2019-04-10 10:45:26 +00:00
|
|
|
if (!is_recoverable_dnode(page)) {
|
|
|
|
f2fs_put_page(page, 1);
|
2013-05-16 06:04:49 +00:00
|
|
|
break;
|
2019-04-10 10:45:26 +00:00
|
|
|
}
|
2012-11-02 08:13:32 +00:00
|
|
|
|
|
|
|
if (!is_fsync_dnode(page))
|
|
|
|
goto next;
|
|
|
|
|
|
|
|
entry = get_fsync_inode(head, ino_of_node(page));
|
2016-11-05 03:12:40 +00:00
|
|
|
if (!entry) {
|
2017-08-08 02:54:31 +00:00
|
|
|
bool quota_inode = false;
|
|
|
|
|
2017-04-14 22:46:23 +00:00
|
|
|
if (!check_only &&
|
|
|
|
IS_INODE(page) && is_dent_dnode(page)) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
err = f2fs_recover_inode_page(sbi, page);
|
2019-04-10 10:45:26 +00:00
|
|
|
if (err) {
|
|
|
|
f2fs_put_page(page, 1);
|
2013-05-16 06:04:49 +00:00
|
|
|
break;
|
2019-04-10 10:45:26 +00:00
|
|
|
}
|
2017-08-08 02:54:31 +00:00
|
|
|
quota_inode = true;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
|
2014-09-15 23:46:08 +00:00
|
|
|
/*
|
|
|
|
* CP | dnode(F) | inode(DF)
|
|
|
|
* For this case, we should not give up now.
|
|
|
|
*/
|
2017-08-08 02:54:31 +00:00
|
|
|
entry = add_fsync_inode(sbi, head, ino_of_node(page),
|
|
|
|
quota_inode);
|
2016-09-09 23:48:15 +00:00
|
|
|
if (IS_ERR(entry)) {
|
|
|
|
err = PTR_ERR(entry);
|
2023-06-16 14:20:09 +00:00
|
|
|
if (err == -ENOENT)
|
2014-09-15 23:46:08 +00:00
|
|
|
goto next;
|
2019-04-10 10:45:26 +00:00
|
|
|
f2fs_put_page(page, 1);
|
2013-05-16 06:04:49 +00:00
|
|
|
break;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
}
|
2013-05-15 01:49:13 +00:00
|
|
|
entry->blkaddr = blkaddr;
|
|
|
|
|
2016-04-15 16:43:17 +00:00
|
|
|
if (IS_INODE(page) && is_dent_dnode(page))
|
|
|
|
entry->last_dentry = blkaddr;
|
2012-11-02 08:13:32 +00:00
|
|
|
next:
|
|
|
|
/* check next segment */
|
|
|
|
blkaddr = next_blkaddr_of_node(page);
|
2014-09-11 20:49:55 +00:00
|
|
|
f2fs_put_page(page, 1);
|
2014-12-08 07:02:52 +00:00
|
|
|
|
2023-05-27 17:06:40 +00:00
|
|
|
err = sanity_check_node_chain(sbi, blkaddr, &blkaddr_fast,
|
|
|
|
&is_detecting);
|
|
|
|
if (err)
|
|
|
|
break;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-10-12 10:49:26 +00:00
|
|
|
static void destroy_fsync_dnodes(struct list_head *head, int drop)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
2013-01-20 15:02:58 +00:00
|
|
|
struct fsync_inode_entry *entry, *tmp;
|
|
|
|
|
2016-04-29 12:13:37 +00:00
|
|
|
list_for_each_entry_safe(entry, tmp, head, list)
|
2018-10-12 10:49:26 +00:00
|
|
|
del_fsync_inode(entry, drop);
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
|
2013-05-21 23:20:01 +00:00
|
|
|
static int check_index_in_prev_nodes(struct f2fs_sb_info *sbi,
|
2013-05-21 23:02:02 +00:00
|
|
|
block_t blkaddr, struct dnode_of_data *dn)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
|
|
|
struct seg_entry *sentry;
|
|
|
|
unsigned int segno = GET_SEGNO(sbi, blkaddr);
|
2014-02-04 04:01:10 +00:00
|
|
|
unsigned short blkoff = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);
|
2014-01-28 05:54:07 +00:00
|
|
|
struct f2fs_summary_block *sum_node;
|
2012-11-02 08:13:32 +00:00
|
|
|
struct f2fs_summary sum;
|
2014-01-28 05:54:07 +00:00
|
|
|
struct page *sum_page, *node_page;
|
2015-03-27 01:46:38 +00:00
|
|
|
struct dnode_of_data tdn = *dn;
|
2013-05-21 23:02:02 +00:00
|
|
|
nid_t ino, nid;
|
2012-11-02 08:13:32 +00:00
|
|
|
struct inode *inode;
|
2022-09-14 11:51:51 +00:00
|
|
|
unsigned int offset, ofs_in_node, max_addrs;
|
2012-11-02 08:13:32 +00:00
|
|
|
block_t bidx;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
sentry = get_seg_entry(sbi, segno);
|
|
|
|
if (!f2fs_test_bit(blkoff, sentry->cur_valid_map))
|
2013-05-21 23:20:01 +00:00
|
|
|
return 0;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
|
|
|
/* Get the previous summary */
|
2017-08-13 04:33:23 +00:00
|
|
|
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
|
2012-11-02 08:13:32 +00:00
|
|
|
struct curseg_info *curseg = CURSEG_I(sbi, i);
|
2021-04-06 01:47:35 +00:00
|
|
|
|
2012-11-02 08:13:32 +00:00
|
|
|
if (curseg->segno == segno) {
|
|
|
|
sum = curseg->sum_blk->entries[blkoff];
|
2014-01-28 05:54:07 +00:00
|
|
|
goto got_it;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
sum_page = f2fs_get_sum_page(sbi, segno);
|
2018-09-18 00:36:06 +00:00
|
|
|
if (IS_ERR(sum_page))
|
|
|
|
return PTR_ERR(sum_page);
|
2014-01-28 05:54:07 +00:00
|
|
|
sum_node = (struct f2fs_summary_block *)page_address(sum_page);
|
|
|
|
sum = sum_node->entries[blkoff];
|
|
|
|
f2fs_put_page(sum_page, 1);
|
|
|
|
got_it:
|
2013-05-21 23:02:02 +00:00
|
|
|
/* Use the locked dnode page and inode */
|
|
|
|
nid = le32_to_cpu(sum.nid);
|
2022-09-14 11:51:51 +00:00
|
|
|
ofs_in_node = le16_to_cpu(sum.ofs_in_node);
|
|
|
|
|
|
|
|
max_addrs = ADDRS_PER_PAGE(dn->node_page, dn->inode);
|
|
|
|
if (ofs_in_node >= max_addrs) {
|
|
|
|
f2fs_err(sbi, "Inconsistent ofs_in_node:%u in summary, ino:%lu, nid:%u, max:%u",
|
|
|
|
ofs_in_node, dn->inode->i_ino, nid, max_addrs);
|
2022-09-28 15:38:54 +00:00
|
|
|
f2fs_handle_error(sbi, ERROR_INCONSISTENT_SUMMARY);
|
2022-09-14 11:51:51 +00:00
|
|
|
return -EFSCORRUPTED;
|
|
|
|
}
|
|
|
|
|
2013-05-21 23:02:02 +00:00
|
|
|
if (dn->inode->i_ino == nid) {
|
|
|
|
tdn.nid = nid;
|
2015-03-27 01:46:38 +00:00
|
|
|
if (!dn->inode_page_locked)
|
|
|
|
lock_page(dn->inode_page);
|
2013-05-21 23:02:02 +00:00
|
|
|
tdn.node_page = dn->inode_page;
|
2022-09-14 11:51:51 +00:00
|
|
|
tdn.ofs_in_node = ofs_in_node;
|
2015-03-27 01:46:38 +00:00
|
|
|
goto truncate_out;
|
2013-05-21 23:02:02 +00:00
|
|
|
} else if (dn->nid == nid) {
|
2022-09-14 11:51:51 +00:00
|
|
|
tdn.ofs_in_node = ofs_in_node;
|
2015-03-27 01:46:38 +00:00
|
|
|
goto truncate_out;
|
2013-05-21 23:02:02 +00:00
|
|
|
}
|
|
|
|
|
2012-11-02 08:13:32 +00:00
|
|
|
/* Get the node page */
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
node_page = f2fs_get_node_page(sbi, nid);
|
2013-05-21 23:20:01 +00:00
|
|
|
if (IS_ERR(node_page))
|
|
|
|
return PTR_ERR(node_page);
|
2013-08-12 12:08:03 +00:00
|
|
|
|
|
|
|
offset = ofs_of_node(node_page);
|
2012-11-02 08:13:32 +00:00
|
|
|
ino = ino_of_node(node_page);
|
|
|
|
f2fs_put_page(node_page, 1);
|
|
|
|
|
f2fs: fix double lock for inode page during roll-foward recovery
If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.
Error case is like this.
1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut
-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*
This patch should resolve that bug.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-12 15:35:58 +00:00
|
|
|
if (ino != dn->inode->i_ino) {
|
2017-08-08 02:54:31 +00:00
|
|
|
int ret;
|
|
|
|
|
f2fs: fix double lock for inode page during roll-foward recovery
If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.
Error case is like this.
1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut
-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*
This patch should resolve that bug.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-12 15:35:58 +00:00
|
|
|
/* Deallocate previous index in the node page */
|
2016-09-09 23:59:39 +00:00
|
|
|
inode = f2fs_iget_retry(sbi->sb, ino);
|
f2fs: fix double lock for inode page during roll-foward recovery
If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.
Error case is like this.
1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut
-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*
This patch should resolve that bug.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-12 15:35:58 +00:00
|
|
|
if (IS_ERR(inode))
|
|
|
|
return PTR_ERR(inode);
|
2017-08-08 02:54:31 +00:00
|
|
|
|
2021-10-28 13:03:05 +00:00
|
|
|
ret = f2fs_dquot_initialize(inode);
|
2017-08-08 02:54:31 +00:00
|
|
|
if (ret) {
|
|
|
|
iput(inode);
|
|
|
|
return ret;
|
|
|
|
}
|
f2fs: fix double lock for inode page during roll-foward recovery
If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.
Error case is like this.
1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut
-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*
This patch should resolve that bug.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-12 15:35:58 +00:00
|
|
|
} else {
|
|
|
|
inode = dn->inode;
|
|
|
|
}
|
2012-12-22 03:09:43 +00:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
bidx = f2fs_start_bidx_of_node(offset, inode) +
|
|
|
|
le16_to_cpu(sum.ofs_in_node);
|
2013-08-12 12:08:03 +00:00
|
|
|
|
2015-03-27 01:46:38 +00:00
|
|
|
/*
|
|
|
|
* if inode page is locked, unlock temporarily, but its reference
|
|
|
|
* count keeps alive.
|
|
|
|
*/
|
|
|
|
if (ino == dn->inode->i_ino && dn->inode_page_locked)
|
|
|
|
unlock_page(dn->inode_page);
|
|
|
|
|
|
|
|
set_new_dnode(&tdn, inode, NULL, NULL, 0);
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
if (f2fs_get_dnode_of_data(&tdn, bidx, LOOKUP_NODE))
|
2015-03-27 01:46:38 +00:00
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (tdn.data_blkaddr == blkaddr)
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
f2fs_truncate_data_blocks_range(&tdn, 1);
|
2015-03-27 01:46:38 +00:00
|
|
|
|
|
|
|
f2fs_put_dnode(&tdn);
|
|
|
|
out:
|
|
|
|
if (ino != dn->inode->i_ino)
|
f2fs: fix double lock for inode page during roll-foward recovery
If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.
Error case is like this.
1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut
-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*
This patch should resolve that bug.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-12 15:35:58 +00:00
|
|
|
iput(inode);
|
2015-03-27 01:46:38 +00:00
|
|
|
else if (dn->inode_page_locked)
|
|
|
|
lock_page(dn->inode_page);
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
truncate_out:
|
2020-02-14 09:44:10 +00:00
|
|
|
if (f2fs_data_blkaddr(&tdn) == blkaddr)
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
f2fs_truncate_data_blocks_range(&tdn, 1);
|
2015-03-27 01:46:38 +00:00
|
|
|
if (dn->inode->i_ino == nid && !dn->inode_page_locked)
|
|
|
|
unlock_page(dn->inode_page);
|
2013-05-21 23:20:01 +00:00
|
|
|
return 0;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
|
2024-01-24 14:49:15 +00:00
|
|
|
static int f2fs_reserve_new_block_retry(struct dnode_of_data *dn)
|
|
|
|
{
|
|
|
|
int i, err = 0;
|
|
|
|
|
|
|
|
for (i = DEFAULT_FAILURE_RETRY_COUNT; i > 0; i--) {
|
|
|
|
err = f2fs_reserve_new_block(dn);
|
|
|
|
if (!err)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2013-03-20 10:01:06 +00:00
|
|
|
static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
|
2017-11-22 10:23:40 +00:00
|
|
|
struct page *page)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
|
|
|
struct dnode_of_data dn;
|
|
|
|
struct node_info ni;
|
2016-01-26 07:39:35 +00:00
|
|
|
unsigned int start, end;
|
2013-05-16 06:04:49 +00:00
|
|
|
int err = 0, recovered = 0;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
2014-08-08 06:49:17 +00:00
|
|
|
/* step 1: recover xattr */
|
|
|
|
if (IS_INODE(page)) {
|
2020-07-06 10:23:36 +00:00
|
|
|
err = f2fs_recover_inline_xattr(inode, page);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
2014-08-08 06:49:17 +00:00
|
|
|
} else if (f2fs_has_xattr_block(ofs_of_node(page))) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
err = f2fs_recover_xattr_data(inode, page);
|
f2fs: change recovery policy of xattr node block
Currently, if we call fsync after updating the xattr date belongs to the
file, f2fs needs to trigger checkpoint to keep xattr data consistent. But,
this policy cause low performance as checkpoint will block most foreground
operations and cause unneeded and unrelated IOs around checkpoint.
This patch will reuse regular file recovery policy for xattr node block,
so, we change to write xattr node block tagged with fsync flag to warm
area instead of cold area, and during recovery, we search warm node chain
for fsynced xattr block, and do the recovery.
So, for below application IO pattern, performance can be improved
obviously:
- touch file
- create/update/delete xattr entry in file
- fsync file
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-08 09:39:45 +00:00
|
|
|
if (!err)
|
|
|
|
recovered++;
|
2013-12-26 03:49:48 +00:00
|
|
|
goto out;
|
2014-08-08 06:49:17 +00:00
|
|
|
}
|
2013-12-26 03:49:48 +00:00
|
|
|
|
2014-08-08 06:49:17 +00:00
|
|
|
/* step 2: recover inline data */
|
2020-07-06 10:23:36 +00:00
|
|
|
err = f2fs_recover_inline_data(inode, page);
|
|
|
|
if (err) {
|
|
|
|
if (err == 1)
|
|
|
|
err = 0;
|
2014-01-28 03:25:06 +00:00
|
|
|
goto out;
|
2020-07-06 10:23:36 +00:00
|
|
|
}
|
2014-01-28 03:25:06 +00:00
|
|
|
|
2014-08-08 06:49:17 +00:00
|
|
|
/* step 3: recover data indices */
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
start = f2fs_start_bidx_of_node(ofs_of_node(page), inode);
|
2016-01-26 07:39:35 +00:00
|
|
|
end = start + ADDRS_PER_PAGE(page, inode);
|
2012-11-02 08:13:32 +00:00
|
|
|
|
|
|
|
set_new_dnode(&dn, inode, NULL, NULL, 0);
|
2016-09-09 23:59:39 +00:00
|
|
|
retry_dn:
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
err = f2fs_get_dnode_of_data(&dn, start, ALLOC_NODE);
|
2016-09-09 23:59:39 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == -ENOMEM) {
|
mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait() with takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops uses.
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-14 22:07:14 +00:00
|
|
|
memalloc_retry_wait(GFP_NOFS);
|
2016-09-09 23:59:39 +00:00
|
|
|
goto retry_dn;
|
|
|
|
}
|
2013-12-26 03:49:48 +00:00
|
|
|
goto out;
|
2016-09-09 23:59:39 +00:00
|
|
|
}
|
2012-11-02 08:13:32 +00:00
|
|
|
|
2018-12-25 09:43:42 +00:00
|
|
|
f2fs_wait_on_page_writeback(dn.node_page, NODE, true, true);
|
2012-11-02 08:13:32 +00:00
|
|
|
|
2021-12-13 22:16:32 +00:00
|
|
|
err = f2fs_get_node_info(sbi, dn.nid, &ni, false);
|
2018-07-16 16:02:17 +00:00
|
|
|
if (err)
|
|
|
|
goto err;
|
|
|
|
|
2014-09-02 22:52:58 +00:00
|
|
|
f2fs_bug_on(sbi, ni.ino != ino_of_node(page));
|
2019-04-15 07:28:37 +00:00
|
|
|
|
|
|
|
if (ofs_of_node(dn.node_page) != ofs_of_node(page)) {
|
2019-06-18 09:48:42 +00:00
|
|
|
f2fs_warn(sbi, "Inconsistent ofs_of_node, ino:%lu, ofs:%u, %u",
|
|
|
|
inode->i_ino, ofs_of_node(dn.node_page),
|
|
|
|
ofs_of_node(page));
|
2019-06-20 03:36:14 +00:00
|
|
|
err = -EFSCORRUPTED;
|
2022-09-28 15:38:54 +00:00
|
|
|
f2fs_handle_error(sbi, ERROR_INCONSISTENT_FOOTER);
|
2019-04-15 07:28:37 +00:00
|
|
|
goto err;
|
|
|
|
}
|
2012-11-02 08:13:32 +00:00
|
|
|
|
f2fs: recover invalid/reserved block address for fsynced file
When testing with generic/101 in xfstests, error message outputed as below:
--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)
The test flow is like below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount
After this test, we expect the data of recovered file will have the first
64k of data filling with value 0xaa and the next 61k of data filling with
value 0x00 because we have fsynced it before dropping writes in dm.
In f2fs, during recovering, we will only recover the valid block address
in direct node page if it is marked as a fsynced dnode, but block address
which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
recovered. So, the file recovered shows its incorrect data 0xbb in range of
[61k, 125k].
In this patch, we fix to recover invalid/reserved block during recover flow.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 09:23:54 +00:00
|
|
|
for (; start < end; start++, dn.ofs_in_node++) {
|
2012-11-02 08:13:32 +00:00
|
|
|
block_t src, dest;
|
|
|
|
|
2020-02-14 09:44:10 +00:00
|
|
|
src = f2fs_data_blkaddr(&dn);
|
|
|
|
dest = data_blkaddr(dn.inode, page, dn.ofs_in_node);
|
2012-11-02 08:13:32 +00:00
|
|
|
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 07:26:32 +00:00
|
|
|
if (__is_valid_data_blkaddr(src) &&
|
|
|
|
!f2fs_is_valid_blkaddr(sbi, src, META_POR)) {
|
2019-06-20 03:36:14 +00:00
|
|
|
err = -EFSCORRUPTED;
|
2022-09-28 15:38:54 +00:00
|
|
|
f2fs_handle_error(sbi, ERROR_INVALID_BLKADDR);
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 07:26:32 +00:00
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (__is_valid_data_blkaddr(dest) &&
|
|
|
|
!f2fs_is_valid_blkaddr(sbi, dest, META_POR)) {
|
2019-06-20 03:36:14 +00:00
|
|
|
err = -EFSCORRUPTED;
|
2022-09-28 15:38:54 +00:00
|
|
|
f2fs_handle_error(sbi, ERROR_INVALID_BLKADDR);
|
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-15 07:26:32 +00:00
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
f2fs: recover invalid/reserved block address for fsynced file
When testing with generic/101 in xfstests, error message outputed as below:
--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)
The test flow is like below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount
After this test, we expect the data of recovered file will have the first
64k of data filling with value 0xaa and the next 61k of data filling with
value 0x00 because we have fsynced it before dropping writes in dm.
In f2fs, during recovering, we will only recover the valid block address
in direct node page if it is marked as a fsynced dnode, but block address
which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
recovered. So, the file recovered shows its incorrect data 0xbb in range of
[61k, 125k].
In this patch, we fix to recover invalid/reserved block during recover flow.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 09:23:54 +00:00
|
|
|
/* skip recovering if dest is the same as src */
|
|
|
|
if (src == dest)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* dest is invalid, just invalidate src block */
|
|
|
|
if (dest == NULL_ADDR) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
f2fs_truncate_data_blocks_range(&dn, 1);
|
f2fs: recover invalid/reserved block address for fsynced file
When testing with generic/101 in xfstests, error message outputed as below:
--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)
The test flow is like below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount
After this test, we expect the data of recovered file will have the first
64k of data filling with value 0xaa and the next 61k of data filling with
value 0x00 because we have fsynced it before dropping writes in dm.
In f2fs, during recovering, we will only recover the valid block address
in direct node page if it is marked as a fsynced dnode, but block address
which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
recovered. So, the file recovered shows its incorrect data 0xbb in range of
[61k, 125k].
In this patch, we fix to recover invalid/reserved block during recover flow.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 09:23:54 +00:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2016-11-28 23:33:38 +00:00
|
|
|
if (!file_keep_isize(inode) &&
|
2017-01-25 02:52:39 +00:00
|
|
|
(i_size_read(inode) <= ((loff_t)start << PAGE_SHIFT)))
|
|
|
|
f2fs_i_size_write(inode,
|
|
|
|
(loff_t)(start + 1) << PAGE_SHIFT);
|
2016-05-21 03:42:37 +00:00
|
|
|
|
f2fs: recover invalid/reserved block address for fsynced file
When testing with generic/101 in xfstests, error message outputed as below:
--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)
The test flow is like below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount
After this test, we expect the data of recovered file will have the first
64k of data filling with value 0xaa and the next 61k of data filling with
value 0x00 because we have fsynced it before dropping writes in dm.
In f2fs, during recovering, we will only recover the valid block address
in direct node page if it is marked as a fsynced dnode, but block address
which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
recovered. So, the file recovered shows its incorrect data 0xbb in range of
[61k, 125k].
In this patch, we fix to recover invalid/reserved block during recover flow.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 09:23:54 +00:00
|
|
|
/*
|
|
|
|
* dest is reserved block, invalidate src block
|
|
|
|
* and then reserve one new block in dnode page.
|
|
|
|
*/
|
|
|
|
if (dest == NEW_ADDR) {
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
f2fs_truncate_data_blocks_range(&dn, 1);
|
2024-01-24 14:49:15 +00:00
|
|
|
|
|
|
|
err = f2fs_reserve_new_block_retry(&dn);
|
2023-11-16 06:25:56 +00:00
|
|
|
if (err)
|
|
|
|
goto err;
|
f2fs: recover invalid/reserved block address for fsynced file
When testing with generic/101 in xfstests, error message outputed as below:
--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)
The test flow is like below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount
After this test, we expect the data of recovered file will have the first
64k of data filling with value 0xaa and the next 61k of data filling with
value 0x00 because we have fsynced it before dropping writes in dm.
In f2fs, during recovering, we will only recover the valid block address
in direct node page if it is marked as a fsynced dnode, but block address
which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
recovered. So, the file recovered shows its incorrect data 0xbb in range of
[61k, 125k].
In this patch, we fix to recover invalid/reserved block during recover flow.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 09:23:54 +00:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* dest is valid block, try to recover from src to dest */
|
2018-06-05 09:44:11 +00:00
|
|
|
if (f2fs_is_valid_blkaddr(sbi, dest, META_POR)) {
|
2012-11-02 08:13:32 +00:00
|
|
|
if (src == NULL_ADDR) {
|
2024-01-24 14:49:15 +00:00
|
|
|
err = f2fs_reserve_new_block_retry(&dn);
|
2016-07-20 02:30:06 +00:00
|
|
|
if (err)
|
|
|
|
goto err;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
2016-09-09 23:59:39 +00:00
|
|
|
retry_prev:
|
2012-11-02 08:13:32 +00:00
|
|
|
/* Check the previous node page having this index */
|
2013-05-21 23:20:01 +00:00
|
|
|
err = check_index_in_prev_nodes(sbi, dest, &dn);
|
2016-09-09 23:59:39 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == -ENOMEM) {
|
mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait() with takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops uses.
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-14 22:07:14 +00:00
|
|
|
memalloc_retry_wait(GFP_NOFS);
|
2016-09-09 23:59:39 +00:00
|
|
|
goto retry_prev;
|
|
|
|
}
|
2013-05-21 23:20:01 +00:00
|
|
|
goto err;
|
2016-09-09 23:59:39 +00:00
|
|
|
}
|
2012-11-02 08:13:32 +00:00
|
|
|
|
f2fs: fix to do sanity check on destination blkaddr during recovery
As Wenqing Liu reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=216456
loop5: detected capacity change from 0 to 131072
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): Bitmap was wrongly set, blk:5634
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1013 at fs/f2fs/segment.c:2198
RIP: 0010:update_sit_entry+0xa55/0x10b0 [f2fs]
Call Trace:
<TASK>
f2fs_do_replace_block+0xa98/0x1890 [f2fs]
f2fs_replace_block+0xeb/0x180 [f2fs]
recover_data+0x1a69/0x6ae0 [f2fs]
f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
f2fs_fill_super+0x4665/0x61e0 [f2fs]
mount_bdev+0x2cf/0x3b0
legacy_get_tree+0xed/0x1d0
vfs_get_tree+0x81/0x2b0
path_mount+0x47e/0x19d0
do_mount+0xce/0xf0
__x64_sys_mount+0x12c/0x1a0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
If we enable CONFIG_F2FS_CHECK_FS config, it will trigger a kernel panic
instead of warning.
The root cause is: in fuzzed image, SIT table is inconsistent with inode
mapping table, result in triggering such warning during SIT table update.
This patch introduces a new flag DATA_GENERIC_ENHANCE_UPDATE, w/ this
flag, data block recovery flow can check destination blkaddr's validation
in SIT table, and skip f2fs_replace_block() to avoid inconsistent status.
Cc: stable@vger.kernel.org
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-09-13 02:08:41 +00:00
|
|
|
if (f2fs_is_valid_blkaddr(sbi, dest,
|
|
|
|
DATA_GENERIC_ENHANCE_UPDATE)) {
|
|
|
|
f2fs_err(sbi, "Inconsistent dest blkaddr:%u, ino:%lu, ofs:%u",
|
|
|
|
dest, inode->i_ino, dn.ofs_in_node);
|
|
|
|
err = -EFSCORRUPTED;
|
2022-09-28 15:38:54 +00:00
|
|
|
f2fs_handle_error(sbi,
|
|
|
|
ERROR_INVALID_BLKADDR);
|
f2fs: fix to do sanity check on destination blkaddr during recovery
As Wenqing Liu reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=216456
loop5: detected capacity change from 0 to 131072
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): Bitmap was wrongly set, blk:5634
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1013 at fs/f2fs/segment.c:2198
RIP: 0010:update_sit_entry+0xa55/0x10b0 [f2fs]
Call Trace:
<TASK>
f2fs_do_replace_block+0xa98/0x1890 [f2fs]
f2fs_replace_block+0xeb/0x180 [f2fs]
recover_data+0x1a69/0x6ae0 [f2fs]
f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
f2fs_fill_super+0x4665/0x61e0 [f2fs]
mount_bdev+0x2cf/0x3b0
legacy_get_tree+0xed/0x1d0
vfs_get_tree+0x81/0x2b0
path_mount+0x47e/0x19d0
do_mount+0xce/0xf0
__x64_sys_mount+0x12c/0x1a0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
If we enable CONFIG_F2FS_CHECK_FS config, it will trigger a kernel panic
instead of warning.
The root cause is: in fuzzed image, SIT table is inconsistent with inode
mapping table, result in triggering such warning during SIT table update.
This patch introduces a new flag DATA_GENERIC_ENHANCE_UPDATE, w/ this
flag, data block recovery flow can check destination blkaddr's validation
in SIT table, and skip f2fs_replace_block() to avoid inconsistent status.
Cc: stable@vger.kernel.org
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-09-13 02:08:41 +00:00
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
2012-11-02 08:13:32 +00:00
|
|
|
/* write dummy data page */
|
2015-05-28 11:15:35 +00:00
|
|
|
f2fs_replace_block(sbi, &dn, src, dest,
|
f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.
But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.
So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.
If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-06 06:40:34 +00:00
|
|
|
ni.version, false, false);
|
2013-05-16 06:04:49 +00:00
|
|
|
recovered++;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
copy_node_footer(dn.node_page, page);
|
|
|
|
fill_node_footer(dn.node_page, dn.nid, ni.ino,
|
|
|
|
ofs_of_node(page), false);
|
|
|
|
set_page_dirty(dn.node_page);
|
2013-05-21 23:20:01 +00:00
|
|
|
err:
|
2012-11-02 08:13:32 +00:00
|
|
|
f2fs_put_dnode(&dn);
|
2013-12-26 03:49:48 +00:00
|
|
|
out:
|
2019-06-18 09:48:42 +00:00
|
|
|
f2fs_notice(sbi, "recover_data: ino = %lx (i_size: %s) recovered = %d, err = %d",
|
|
|
|
inode->i_ino, file_keep_isize(inode) ? "keep" : "recover",
|
|
|
|
recovered, err);
|
2013-05-21 23:20:01 +00:00
|
|
|
return err;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list,
|
2018-10-12 10:49:26 +00:00
|
|
|
struct list_head *tmp_inode_list, struct list_head *dir_list)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
|
|
|
struct curseg_info *curseg;
|
2014-09-11 20:49:55 +00:00
|
|
|
struct page *page = NULL;
|
2013-03-20 10:01:06 +00:00
|
|
|
int err = 0;
|
2012-11-02 08:13:32 +00:00
|
|
|
block_t blkaddr;
|
2022-02-04 00:34:10 +00:00
|
|
|
unsigned int ra_blocks = RECOVERY_MAX_RA_BLOCKS;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
|
|
|
/* get node pages in the current segment */
|
2015-12-01 03:43:59 +00:00
|
|
|
curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
|
2012-11-02 08:13:32 +00:00
|
|
|
blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
|
|
|
|
|
|
|
|
while (1) {
|
|
|
|
struct fsync_inode_entry *entry;
|
|
|
|
|
2018-06-05 09:44:11 +00:00
|
|
|
if (!f2fs_is_valid_blkaddr(sbi, blkaddr, META_POR))
|
2014-09-11 20:49:55 +00:00
|
|
|
break;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
page = f2fs_get_tmp_page(sbi, blkaddr);
|
2018-07-16 16:02:17 +00:00
|
|
|
if (IS_ERR(page)) {
|
|
|
|
err = PTR_ERR(page);
|
|
|
|
break;
|
|
|
|
}
|
2013-03-08 12:29:23 +00:00
|
|
|
|
2016-09-20 00:55:10 +00:00
|
|
|
if (!is_recoverable_dnode(page)) {
|
2014-09-11 20:49:55 +00:00
|
|
|
f2fs_put_page(page, 1);
|
2013-05-20 01:26:09 +00:00
|
|
|
break;
|
2014-09-11 20:49:55 +00:00
|
|
|
}
|
2012-11-02 08:13:32 +00:00
|
|
|
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
entry = get_fsync_inode(inode_list, ino_of_node(page));
|
2012-11-02 08:13:32 +00:00
|
|
|
if (!entry)
|
|
|
|
goto next;
|
2014-09-15 23:46:08 +00:00
|
|
|
/*
|
|
|
|
* inode(x) | CP | inode(x) | dnode(F)
|
|
|
|
* In this case, we can lose the latest inode(x).
|
2014-09-11 21:29:06 +00:00
|
|
|
* So, call recover_inode for the inode update.
|
2014-09-15 23:46:08 +00:00
|
|
|
*/
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
if (IS_INODE(page)) {
|
|
|
|
err = recover_inode(entry->inode, page);
|
2019-04-10 10:45:26 +00:00
|
|
|
if (err) {
|
|
|
|
f2fs_put_page(page, 1);
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
break;
|
2019-04-10 10:45:26 +00:00
|
|
|
}
|
f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 12:05:00 +00:00
|
|
|
}
|
2014-09-11 21:29:06 +00:00
|
|
|
if (entry->last_dentry == blkaddr) {
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
err = recover_dentry(entry->inode, page, dir_list);
|
2014-09-11 21:29:06 +00:00
|
|
|
if (err) {
|
|
|
|
f2fs_put_page(page, 1);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2017-11-22 10:23:40 +00:00
|
|
|
err = do_recover_data(sbi, entry->inode, page);
|
2014-09-11 20:49:55 +00:00
|
|
|
if (err) {
|
|
|
|
f2fs_put_page(page, 1);
|
2013-05-20 01:26:09 +00:00
|
|
|
break;
|
2014-09-11 20:49:55 +00:00
|
|
|
}
|
2012-11-02 08:13:32 +00:00
|
|
|
|
2016-04-29 12:13:37 +00:00
|
|
|
if (entry->blkaddr == blkaddr)
|
2018-10-12 10:49:26 +00:00
|
|
|
list_move_tail(&entry->list, tmp_inode_list);
|
2012-11-02 08:13:32 +00:00
|
|
|
next:
|
2022-02-04 00:34:10 +00:00
|
|
|
ra_blocks = adjust_por_ra_blocks(sbi, ra_blocks, blkaddr,
|
|
|
|
next_blkaddr_of_node(page));
|
|
|
|
|
2012-11-02 08:13:32 +00:00
|
|
|
/* check next segment */
|
|
|
|
blkaddr = next_blkaddr_of_node(page);
|
2014-09-11 20:49:55 +00:00
|
|
|
f2fs_put_page(page, 1);
|
2022-02-04 00:34:10 +00:00
|
|
|
|
|
|
|
f2fs_ra_meta_pages_cond(sbi, blkaddr, ra_blocks);
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
2013-03-20 10:01:06 +00:00
|
|
|
if (!err)
|
2020-06-22 09:38:48 +00:00
|
|
|
f2fs_allocate_new_segments(sbi);
|
2013-03-20 10:01:06 +00:00
|
|
|
return err;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
|
|
|
|
f2fs: clean up symbol namespace
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 16:20:41 +00:00
|
|
|
int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
|
2012-11-02 08:13:32 +00:00
|
|
|
{
|
2018-10-12 10:49:26 +00:00
|
|
|
struct list_head inode_list, tmp_inode_list;
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
struct list_head dir_list;
|
2013-03-20 10:01:06 +00:00
|
|
|
int err;
|
2016-03-23 23:12:58 +00:00
|
|
|
int ret = 0;
|
2017-08-08 02:54:31 +00:00
|
|
|
unsigned long s_flags = sbi->sb->s_flags;
|
2013-10-23 04:39:32 +00:00
|
|
|
bool need_writecp = false;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
2023-04-02 11:27:06 +00:00
|
|
|
if (is_sbi_flag_set(sbi, SBI_IS_WRITABLE))
|
2019-06-18 09:48:42 +00:00
|
|
|
f2fs_info(sbi, "recover fsync data on readonly fs");
|
2017-08-08 02:54:31 +00:00
|
|
|
|
2012-11-02 08:13:32 +00:00
|
|
|
INIT_LIST_HEAD(&inode_list);
|
2018-10-12 10:49:26 +00:00
|
|
|
INIT_LIST_HEAD(&tmp_inode_list);
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
INIT_LIST_HEAD(&dir_list);
|
2012-11-02 08:13:32 +00:00
|
|
|
|
2014-08-13 23:30:46 +00:00
|
|
|
/* prevent checkpoint */
|
2022-01-07 20:48:44 +00:00
|
|
|
f2fs_down_write(&sbi->cp_global_sem);
|
2014-08-13 23:30:46 +00:00
|
|
|
|
2015-08-11 19:45:39 +00:00
|
|
|
/* step #1: find fsynced inode numbers */
|
2017-04-14 22:46:23 +00:00
|
|
|
err = find_fsync_dnodes(sbi, &inode_list, check_only);
|
2016-03-23 23:12:58 +00:00
|
|
|
if (err || list_empty(&inode_list))
|
2017-08-08 02:54:31 +00:00
|
|
|
goto skip;
|
2012-11-02 08:13:32 +00:00
|
|
|
|
2016-03-23 23:12:58 +00:00
|
|
|
if (check_only) {
|
|
|
|
ret = 1;
|
2017-08-08 02:54:31 +00:00
|
|
|
goto skip;
|
2016-03-23 23:12:58 +00:00
|
|
|
}
|
2012-11-02 08:13:32 +00:00
|
|
|
|
2013-10-23 04:39:32 +00:00
|
|
|
need_writecp = true;
|
2013-09-24 01:26:24 +00:00
|
|
|
|
2012-11-02 08:13:32 +00:00
|
|
|
/* step #2: recover data */
|
2018-10-12 10:49:26 +00:00
|
|
|
err = recover_data(sbi, &inode_list, &tmp_inode_list, &dir_list);
|
2014-08-08 17:18:43 +00:00
|
|
|
if (!err)
|
2014-09-02 22:52:58 +00:00
|
|
|
f2fs_bug_on(sbi, !list_empty(&inode_list));
|
2021-09-01 08:06:21 +00:00
|
|
|
else
|
|
|
|
f2fs_bug_on(sbi, sbi->sb->s_flags & SB_ACTIVE);
|
2017-08-08 02:54:31 +00:00
|
|
|
skip:
|
2018-10-12 10:49:26 +00:00
|
|
|
destroy_fsync_dnodes(&inode_list, err);
|
|
|
|
destroy_fsync_dnodes(&tmp_inode_list, err);
|
2014-07-25 22:47:25 +00:00
|
|
|
|
2014-09-11 20:49:55 +00:00
|
|
|
/* truncate meta pages to be used by the recovery */
|
|
|
|
truncate_inode_pages_range(META_MAPPING(sbi),
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 12:29:47 +00:00
|
|
|
(loff_t)MAIN_BLKADDR(sbi) << PAGE_SHIFT, -1);
|
2014-09-11 20:49:55 +00:00
|
|
|
|
2014-07-25 22:47:25 +00:00
|
|
|
if (err) {
|
|
|
|
truncate_inode_pages_final(NODE_MAPPING(sbi));
|
|
|
|
truncate_inode_pages_final(META_MAPPING(sbi));
|
|
|
|
}
|
2019-12-09 10:44:44 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If fsync data succeeds or there is no fsync data to recover,
|
|
|
|
* and the f2fs is not read only, check and fix zoned block devices'
|
|
|
|
* write pointer consistency.
|
|
|
|
*/
|
2024-02-16 21:23:30 +00:00
|
|
|
if (f2fs_sb_has_blkzoned(sbi) && !f2fs_readonly(sbi->sb)) {
|
|
|
|
int err2 = f2fs_fix_curseg_write_pointer(sbi);
|
|
|
|
|
|
|
|
if (!err2)
|
|
|
|
err2 = f2fs_check_write_pointer(sbi);
|
|
|
|
if (err2)
|
|
|
|
err = err2;
|
2019-12-09 10:44:44 +00:00
|
|
|
ret = err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!err)
|
|
|
|
clear_sbi_flag(sbi, SBI_POR_DOING);
|
|
|
|
|
2022-01-07 20:48:44 +00:00
|
|
|
f2fs_up_write(&sbi->cp_global_sem);
|
2016-09-20 00:55:10 +00:00
|
|
|
|
2016-09-20 01:13:54 +00:00
|
|
|
/* let's drop all the directory inodes for clean checkpoint */
|
2018-10-12 10:49:26 +00:00
|
|
|
destroy_fsync_dnodes(&dir_list, err);
|
2016-09-20 01:13:54 +00:00
|
|
|
|
f2fs: fix to flush all dirty inodes recovered in readonly fs
generic/417 reported as blow:
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
? _raw_spin_unlock+0x2c/0x50
evict+0xa8/0x170
dispose_list+0x34/0x40
evict_inodes+0x118/0x120
generic_shutdown_super+0x41/0x100
? rcu_read_lock_sched_held+0x97/0xa0
kill_block_super+0x22/0x50
kill_f2fs_super+0x6f/0x80 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
It can simply reproduced with scripts:
Enable quota feature during mkfs.
Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs
Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
a) open /mnt/f2fs/file;
b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs
The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.
Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
there.
To solve this issue, during umount, add remove SB_RDONLY flag in
sb->s_flags, to make sure sync_filesystem() will not be skipped.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-22 09:11:05 +00:00
|
|
|
if (need_writecp) {
|
|
|
|
set_sbi_flag(sbi, SBI_IS_RECOVERED);
|
|
|
|
|
|
|
|
if (!err) {
|
|
|
|
struct cp_control cpc = {
|
|
|
|
.reason = CP_RECOVERY,
|
|
|
|
};
|
2023-08-08 00:59:49 +00:00
|
|
|
stat_inc_cp_call_count(sbi, TOTAL_CALL);
|
f2fs: fix to flush all dirty inodes recovered in readonly fs
generic/417 reported as blow:
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
? _raw_spin_unlock+0x2c/0x50
evict+0xa8/0x170
dispose_list+0x34/0x40
evict_inodes+0x118/0x120
generic_shutdown_super+0x41/0x100
? rcu_read_lock_sched_held+0x97/0xa0
kill_block_super+0x22/0x50
kill_f2fs_super+0x6f/0x80 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
It can simply reproduced with scripts:
Enable quota feature during mkfs.
Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs
Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
a) open /mnt/f2fs/file;
b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs
The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.
Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
there.
To solve this issue, during umount, add remove SB_RDONLY flag in
sb->s_flags, to make sure sync_filesystem() will not be skipped.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-22 09:11:05 +00:00
|
|
|
err = f2fs_write_checkpoint(sbi, &cpc);
|
|
|
|
}
|
2014-07-25 22:47:25 +00:00
|
|
|
}
|
f2fs: fix inode cache leak
When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...
After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.
But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.
We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.
This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 08:15:05 +00:00
|
|
|
|
2017-11-27 21:05:09 +00:00
|
|
|
sbi->sb->s_flags = s_flags; /* Restore SB_RDONLY status */
|
2017-08-08 02:54:31 +00:00
|
|
|
|
2021-04-06 01:47:35 +00:00
|
|
|
return ret ? ret : err;
|
2012-11-02 08:13:32 +00:00
|
|
|
}
|
2021-05-07 10:10:38 +00:00
|
|
|
|
|
|
|
int __init f2fs_create_recovery_cache(void)
|
|
|
|
{
|
|
|
|
fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
|
|
|
|
sizeof(struct fsync_inode_entry));
|
2022-11-25 11:47:36 +00:00
|
|
|
return fsync_entry_slab ? 0 : -ENOMEM;
|
2021-05-07 10:10:38 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void f2fs_destroy_recovery_cache(void)
|
|
|
|
{
|
|
|
|
kmem_cache_destroy(fsync_entry_slab);
|
|
|
|
}
|