Commit Graph

459 Commits

Author SHA1 Message Date
Jaegeuk Kim
70db5b04cb f2fs: give some messages for inline_xattr_size
This patch adds some kernel messages when user sets wrong inline_xattr_size.

Fixes: 500e0b28ec ("f2fs: fix to check inline_xattr_size boundary correctly")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-12 19:02:21 -07:00
Jaegeuk Kim
428e3bcf07 f2fs: give random value to i_generation
This follows to give random number to i_generation along with commit
2325306802 ("ext4: improve smp scalability for inode generation")

This can be used for DUN for UFS HW encryption.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-12 18:59:18 -07:00
Chao Yu
aa2c8c43e4 f2fs: fix to retry fill_super only if recovery failed
With current retry mechanism in f2fs_fill_super, first fill_super
fails due to no memory, then second fill_super runs w/o recovery,
if we succeed, we may lose fsynced data, it doesn't make sense.

Let's retry fill_super only if it occurs non-ENOMEM error during
recovery.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-12 18:59:17 -07:00
Chao Yu
6d52e135c8 f2fs: don't allow negative ->write_io_size_bits
As Dan reported:

"We put an upper bound on ->write_io_size_bits but we don't have a lower
bound."

So let's add lower bound check for ->write_io_size_bits in parse_options().

[We don't allow configuring ->write_io_size_bits to zero, since at least
we need to fill one dummy page for aligned IO.]

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-05 19:58:06 -08:00
Chao Yu
500e0b28ec f2fs: fix to check inline_xattr_size boundary correctly
We use below condition to check inline_xattr_size boundary:

	if (!F2FS_OPTION(sbi).inline_xattr_size ||
		F2FS_OPTION(sbi).inline_xattr_size >=
				DEF_ADDRS_PER_INODE -
				F2FS_TOTAL_EXTRA_ATTR_SIZE -
				DEF_INLINE_RESERVED_SIZE -
				DEF_MIN_INLINE_SIZE)

There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.

.bitmap		1 * 1 = 1
.reserved	1 * 1 = 1
.dentry		11 * 2 = 22
.filename	8 * 2 = 16
total		40

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-05 19:58:06 -08:00
Chengguang Xu
025cdb166c f2fs: jump to label 'free_node_inode' when failing from d_make_root()
When sb->s_root is NULL dput() will do nothing,
so jump to label 'free_node_inode' instead of lable
'free_root_inode' when failing from d_make_root().

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-05 19:58:06 -08:00
zhengliang
a0770e13c8 f2fs: fix to data block override node segment by mistake
v4: Rearrange the previous three versions.

The following scenario could lead to data block override by mistake.

TASK A            |  TASK kworker                                            |     TASK B                                            |       TASK C
                  |                                                          |                                                       |
open              |                                                          |                                                       |
write             |                                                          |                                                       |
close             |                                                          |                                                       |
                  |  f2fs_write_data_pages                                   |                                                       |
                  |    f2fs_write_cache_pages                                |                                                       |
                  |      f2fs_outplace_write_data                            |                                                       |
                  |        f2fs_allocate_data_block (get block in seg S,     |                                                       |
                  |                                  S is full, and only     |                                                       |
                  |                                  have this valid data    |                                                       |
                  |                                  block)                  |                                                       |
                  |          allocate_segment                                |                                                       |
                  |          locate_dirty_segment (mark S as PRE)            |                                                       |
                  |        f2fs_submit_page_write (submit but is not         |                                                       |
                  |                                written on dev)           |                                                       |
unlink            |                                                          |                                                       |
 iput_final       |                                                          |                                                       |
  f2fs_drop_inode |                                                          |                                                       |
    f2fs_truncate |                                                          |                                                       |
 (not evict)      |                                                          |                                                       |
                  |                                                          | write_checkpoint                                      |
                  |                                                          |  flush merged bio but not wait file data writeback    |
                  |                                                          |  set_prefree_as_free (mark S as FREE)                 |
                  |                                                          |                                                       | update NODE/DATA
                  |                                                          |                                                       | allocate_segment (select S)
                  |     writeback done                                       |                                                       |

So we need to guarantee io complete before truncate inode in f2fs_drop_inode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-05 19:57:56 -08:00
Jaegeuk Kim
812a95977f f2fs: sync filesystem after roll-forward recovery
Some works after roll-forward recovery can get an error which will release
all the data structures. Let's flush them in order to make it clean.

One possible corruption came from:

[   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
[   90.675349] Call trace:
[   90.677869]  __list_del_entry_valid+0x94/0xb4
[   90.682351]  remove_dirty_inode+0xac/0x114
[   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
[   90.691302]  f2fs_write_data_pages+0x40/0x4c
[   90.695695]  do_writepages+0x80/0xf0
[   90.699372]  __writeback_single_inode+0xdc/0x4ac
[   90.704113]  writeback_sb_inodes+0x280/0x440
[   90.708501]  wb_writeback+0x1b8/0x3d0
[   90.712267]  wb_workfn+0x1a8/0x4d4
[   90.715765]  process_one_work+0x1c0/0x3d4
[   90.719883]  worker_thread+0x224/0x344
[   90.723739]  kthread+0x120/0x130
[   90.727055]  ret_from_fork+0x10/0x18

Reported-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:45 -08:00
Jaegeuk Kim
0e0667b625 f2fs: flush quota blocks after turnning it off
After quota_off, we'll get some dirty blocks. If put_super don't have a chance
to flush them by checkpoint, it causes NULL pointer exception in end_io after
iput(node_inode). (e.g., by checkpoint=disable)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:45 -08:00
Jaegeuk Kim
db610a640e f2fs: add quick mode of checkpoint=disable for QA
This mode returns mount() quickly with EAGAIN. We can trigger this by
shutdown(F2FS_GOING_DOWN_NEED_FSCK).

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:37 -08:00
Jaegeuk Kim
03f2c02d8b f2fs: run discard jobs when put_super
When we umount f2fs, we need to avoid long delay due to discard commands, which
is actually taking tens of seconds, if storage is very slow on UNMAP. So, this
patch introduces timeout-based work on it.

By default, let me give 5 seconds for discard.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 08:55:34 -08:00
Greg Kroah-Hartman
c20e57b32d f2fs: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 20:41:09 -08:00
Chengguang Xu
f365c6cc85 f2fs: change error code to -ENOMEM from -EINVAL
The error case of failing allocating memory should
return -ENOMEM.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 09:34:27 -08:00
Jaegeuk Kim
7c77bf7de1 f2fs: don't access node/meta inode mapping after iput
This fixes wrong access of address spaces of node and meta inodes after iput.

Fixes: 60aa4d5536 ("f2fs: fix use-after-free issue when accessing sbi->stat_info")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 09:34:27 -08:00
Sahitya Tummala
60aa4d5536 f2fs: fix use-after-free issue when accessing sbi->stat_info
iput() on sbi->node_inode can update sbi->stat_info
in the below context, if the f2fs_write_checkpoint()
has failed with error.

f2fs_balance_fs_bg+0x1ac/0x1ec
f2fs_write_node_pages+0x4c/0x260
do_writepages+0x80/0xbc
__writeback_single_inode+0xdc/0x4ac
writeback_single_inode+0x9c/0x144
write_inode_now+0xc4/0xec
iput+0x194/0x22c
f2fs_put_super+0x11c/0x1e8
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78

Fix this by moving f2fs_destroy_stats() further below iput() in
both f2fs_put_super() and f2fs_fill_super() paths.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:18:10 -08:00
Martin Blumenstingl
88960068f2 f2fs: fix validation of the block count in sanity_check_raw_super
Treat "block_count" from struct f2fs_super_block as 64-bit little endian
value in sanity_check_raw_super() because struct f2fs_super_block
declares "block_count" as "__le64".

This fixes a bug where the superblock validation fails on big endian
devices with the following error:
  F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
  F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
  F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
  F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
As result of this the partition cannot be mounted.

With this patch applied the superblock validation works fine and the
partition can be mounted again:
  F2FS-fs (sda1): Mounted with checkpoint version = 7c84

My little endian x86-64 hardware was able to mount the partition without
this fix.
To confirm that mounting f2fs filesystems works on big endian machines
again I tested this on a 32-bit MIPS big endian (lantiq) device.

Fixes: 0cfe75c5b0 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:55 -08:00
Jaegeuk Kim
8f31b4665c f2fs: fix missing unlock(sbi->gc_mutex)
This fixes missing unlock call.

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:55 -08:00
Jaegeuk Kim
5222595d09 f2fs: use kvmalloc, if kmalloc is failed
One report says memalloc failure during mount.

 (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
 (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
 (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
 (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
 (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
 (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
 (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
 (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
 (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
 (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
 (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
 (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
 (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
 (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:53 -08:00
Alexey Dobriyan
19880e6e5f f2fs: make "f2fs_fault_name[]" const char *
Those strings are immutable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:54:37 -08:00
Chao Yu
e3080b0120 f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.

Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.

So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Chao Yu
7beb01f744 f2fs: clean up f2fs_sb_has_##feature_name
In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to
access .raw_super field, to avoid unneeded pointer conversion, this
patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly.

Just do cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Yunlong Song
089842de57 f2fs: remove codes of unused wio_mutex
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Chao Yu
af033b2aa8 f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
 a) flush dquot metadata into quota file.
 b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
 a) checkpoint will skip syncing dquot metadata.
 b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
    hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Sheng Yong
26b5a07919 f2fs: cleanup dirty pages if recover failed
During recover, we will try to create new dentries for inodes with
dentry_mark. But if the parent is missing (e.g. killed by fsck),
recover will break. But those recovered dirty pages are not cleanup.
This will hit f2fs_bug_on:

[   53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
[   53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
[   53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
[   53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
[   53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
[   53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
[   53.546174] Modules linked in:
[   53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
[   53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
[   53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
[   53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
[   53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
[   53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
[   53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
[   53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
[   53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
[   53.546226] FS:  00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
[   53.546229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
[   53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   53.546242] Call Trace:
[   53.546248]  f2fs_submit_page_bio+0x95/0x740
[   53.546253]  read_node_page+0x161/0x1e0
[   53.546271]  ? truncate_node+0x650/0x650
[   53.546283]  ? add_to_page_cache_lru+0x12c/0x170
[   53.546288]  ? pagecache_get_page+0x262/0x2d0
[   53.546292]  __get_node_page+0x200/0x660
[   53.546302]  f2fs_update_inode_page+0x4a/0x160
[   53.546306]  f2fs_write_inode+0x86/0xb0
[   53.546317]  __writeback_single_inode+0x49c/0x620
[   53.546322]  writeback_single_inode+0xe4/0x1e0
[   53.546326]  sync_inode_metadata+0x93/0xd0
[   53.546330]  ? sync_inode+0x10/0x10
[   53.546342]  ? do_raw_spin_unlock+0xed/0x100
[   53.546347]  f2fs_sync_inode_meta+0xe0/0x130
[   53.546351]  f2fs_fill_super+0x287d/0x2d10
[   53.546367]  ? vsnprintf+0x742/0x7a0
[   53.546372]  ? f2fs_commit_super+0x180/0x180
[   53.546379]  ? up_write+0x20/0x40
[   53.546385]  ? set_blocksize+0x5f/0x140
[   53.546391]  ? f2fs_commit_super+0x180/0x180
[   53.546402]  mount_bdev+0x181/0x200
[   53.546406]  mount_fs+0x94/0x180
[   53.546411]  vfs_kern_mount+0x6c/0x1e0
[   53.546415]  do_mount+0xe5e/0x1510
[   53.546420]  ? fs_reclaim_release+0x9/0x30
[   53.546424]  ? copy_mount_string+0x20/0x20
[   53.546428]  ? fs_reclaim_acquire+0xd/0x30
[   53.546435]  ? __might_sleep+0x2c/0xc0
[   53.546440]  ? ___might_sleep+0x53/0x170
[   53.546453]  ? __might_fault+0x4c/0x60
[   53.546468]  ? _copy_from_user+0x95/0xa0
[   53.546474]  ? memdup_user+0x39/0x60
[   53.546478]  ksys_mount+0x88/0xb0
[   53.546482]  __x64_sys_mount+0x5d/0x70
[   53.546495]  do_syscall_64+0x65/0x130
[   53.546503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   53.547639] ---[ end trace b804d1ea2fec893e ]---

So if recover fails, we need to drop all recovered data.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Chao Yu
9149a5eb60 f2fs: spread f2fs_set_inode_flags()
This patch changes codes as below:
- use f2fs_set_inode_flags() to update i_flags atomically to avoid
potential race.
- synchronize F2FS_I(inode)->i_flags to inode->i_flags in
f2fs_new_inode().
- use f2fs_set_inode_flags() to simply codes in f2fs_quota_{on,off}.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Jaegeuk Kim
730746ce88 f2fs: allow to mount, if quota is failed
Since we can use the filesystem without quotas till next boot.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:37:00 -07:00
Daniel Rosenberg
4354994f09 f2fs: checkpoint disabling
Note that, it requires "f2fs: return correct errno in f2fs_gc".

This adds a lightweight non-persistent snapshotting scheme to f2fs.

To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:39 -07:00
Junling Zheng
d440c52d31 f2fs: support superblock checksum
Now we support crc32 checksum for superblock.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 18:34:18 -07:00
Jaegeuk Kim
095680f24f f2fs: keep lazytime on remount
This patch fixes losing lazytime when remounting f2fs.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 16:50:46 -07:00
Chengguang Xu
c6b1867b1d f2fs: fix remount problem of option io_bits
Currently we show mount option "io_bits=%u" as "io_size=%uKB",
it will cause option parsing problem(unrecognized mount option)
in remount.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-25 18:44:56 -07:00
Sahitya Tummala
a7d10cf3e4 f2fs: add new idle interval timing for discard and gc paths
This helps to control the frequency of submission of discard and
GC requests independently, based on the need.

Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-19 15:55:14 -07:00
Chao Yu
6f5c2ed0a2 f2fs: split IO error injection according to RW
This patch adds to support injecting error for write IO, this can simulate
IO error like fail_make_request or dm_flakey does.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12 13:07:32 -07:00
Chao Yu
7c1a000d46 f2fs: add SPDX license identifiers
Remove the verbose license text from f2fs files and replace them with
SPDX tags.  This does not change the license of any of the code.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12 13:07:10 -07:00
Chengguang Xu
4cb037ec3f f2fs: surround fault_injection related option parsing using CONFIG_F2FS_FAULT_INJECTION
It's a little bit strange when fault_injection related
options fail with -EINVAL which were already disabled
from config, so surround all fault_injection related option
parsing code using CONFIG_F2FS_FAULT_INJECTION. Meanwhile,
slightly change warning message to keep consistency with
option POSIX_ACL and FS_XATTR.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12 13:05:33 -07:00
Chao Yu
042be0f849 f2fs: fix to do sanity check with current segment number
https://bugzilla.kernel.org/show_bug.cgi?id=200219

Reproduction way:
- mount image
- run poc code
- umount image

F2FS-fs (loop1): Bitmap was wrongly set, blk:15364
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/segment.c:2061!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 17686 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: update_sit_entry+0x459/0x4e0 [f2fs]
Code: e8 1c b5 fd ff 0f 0b 0f 0b 8b 45 e4 c7 44 24 08 9c 7a 6c f8 c7 44 24 04 bc 4a 6c f8 89 44 24 0c 8b 06 89 04 24 e8 f7 b4 fd ff <0f> 0b 8b 45 e4 0f b6 d2 89 54 24 10 c7 44 24 08 60 7a 6c f8 c7 44
EAX: 00000032 EBX: 000000f8 ECX: 00000002 EDX: 00000001
ESI: d7177000 EDI: f520fe68 EBP: d6477c6c ESP: d6477c34
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282
CR0: 80050033 CR2: b7fbe000 CR3: 2a99b3c0 CR4: 000406f0
Call Trace:
 f2fs_allocate_data_block+0x124/0x580 [f2fs]
 do_write_page+0x78/0x150 [f2fs]
 f2fs_do_write_node_page+0x25/0xa0 [f2fs]
 __write_node_page+0x2bf/0x550 [f2fs]
 f2fs_sync_node_pages+0x60e/0x6d0 [f2fs]
 ? sync_inode_metadata+0x2f/0x40
 ? f2fs_write_checkpoint+0x28f/0x7d0 [f2fs]
 ? up_write+0x1e/0x80
 f2fs_write_checkpoint+0x2a9/0x7d0 [f2fs]
 ? mark_held_locks+0x5d/0x80
 ? _raw_spin_unlock_irq+0x27/0x50
 kill_f2fs_super+0x68/0x90 [f2fs]
 deactivate_locked_super+0x3d/0x70
 deactivate_super+0x40/0x60
 cleanup_mnt+0x39/0x70
 __cleanup_mnt+0x10/0x20
 task_work_run+0x81/0xa0
 exit_to_usermode_loop+0x59/0xa7
 do_fast_syscall_32+0x1f5/0x22c
 entry_SYSENTER_32+0x53/0x86
EIP: 0xb7f95c51
Code: c1 1e f7 ff ff 89 e5 8b 55 08 85 d2 8b 81 64 cd ff ff 74 02 89 02 5d c3 8b 0c 24 c3 8b 1c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
EAX: 00000000 EBX: 0871ab90 ECX: bfb2cd00 EDX: 00000000
ESI: 00000000 EDI: 0871ab90 EBP: 0871ab90 ESP: bfb2cd7c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
Modules linked in: f2fs(O) crc32_generic bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev aesni_intel snd_seq_device aes_i586 snd_timer crypto_simd snd cryptd soundcore mac_hid serio_raw video i2c_piix4 parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
---[ end trace d423f83982cfcdc5 ]---

The reason is, different log headers using the same segment, once
one log's next block address is used by another log, it will cause
panic as above.

Main area: 24 segs, 24 secs 24 zones
  - COLD  data: 0, 0, 0
  - WARM  data: 1, 1, 1
  - HOT   data: 20, 20, 20
  - Dir   dnode: 22, 22, 22
  - File   dnode: 22, 22, 22
  - Indir nodes: 21, 21, 21

So this patch adds sanity check to detect such condition to avoid
this issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Chao Yu
4a70e25544 f2fs: fix memory leak of percpu counter in fill_super()
In fill_super -> init_percpu_info, we should destroy percpu counter
in error path, otherwise memory allcoated for percpu counter will
leak.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Chao Yu
0b2103e886 f2fs: fix memory leak of write_io in fill_super()
It needs to release memory allocated for sbi->write_io in error path,
otherwise, it will cause memory leak.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Chao Yu
1378752b99 f2fs: fix to flush all dirty inodes recovered in readonly fs
generic/417 reported as blow:

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
 ? _raw_spin_unlock+0x2c/0x50
 evict+0xa8/0x170
 dispose_list+0x34/0x40
 evict_inodes+0x118/0x120
 generic_shutdown_super+0x41/0x100
 ? rcu_read_lock_sched_held+0x97/0xa0
 kill_block_super+0x22/0x50
 kill_f2fs_super+0x6f/0x80 [f2fs]
 deactivate_locked_super+0x3d/0x70
 deactivate_super+0x40/0x60
 cleanup_mnt+0x39/0x70
 __cleanup_mnt+0x10/0x20
 task_work_run+0x81/0xa0
 exit_to_usermode_loop+0x59/0xa7
 do_fast_syscall_32+0x1f5/0x22c
 entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]

It can simply reproduced with scripts:

Enable quota feature during mkfs.

Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs

Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
	a) open /mnt/f2fs/file;
	b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs

The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.

Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
there.

To solve this issue, during umount, add remove SB_RDONLY flag in
sb->s_flags, to make sure sync_filesystem() will not be skipped.

Signed-off-by: Chao Yu <yuchao0@huawei.com>

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 13:16:03 -07:00
Yunlei He
cda9cc595f f2fs: report error if quota off error during umount
Now, we depend on fsck to ensure quota file data is ok,
so we scan whole partition if checkpoint without umount
flag. It's same for quota off error case, which may make
quota file data inconsistent.

generic/019 reports below error:

 __quota_error: 1160 callbacks suppressed
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds.  Have a nice day...

If we failed in below path due to fail to write dquot block, we will miss
to release quota inode, fix it.

- f2fs_put_super
 - f2fs_quota_off_umount
  - f2fs_quota_off
   - f2fs_quota_sync   <-- failed
   - dquot_quota_off   <-- missed to call

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 13:16:03 -07:00
Chao Yu
22d7ea1364 Revert "f2fs: use printk_ratelimited for f2fs_msg"
Don't limit printing log, so that we will not miss any key messages.

This reverts commit a36c106dff.

In addition, we use printk_ratelimited to avoid too many log prints.
- error injection
- discard submission failure

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05 13:41:06 -07:00
Chao Yu
7d20c8abb2 f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951

These is a NULL pointer dereference issue reported in bugzilla:

Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.

The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
 sda1: FAT  /boot
 sda2: F2FS /
 sda3: F2FS /home
 sda4: F2FS

The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
 mounting with "discard" option, but the device does not support discard

The USB host is USB3.0 and UASP capable. It is the one on RK3399.

Given this everything works fine, except there is no TRIM support.

In order to enable TRIM a new UDEV rule is added [1]:
 /etc/udev/rules.d/10-sata-bridge-trim.rules:
 ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
 Unable to handle kernel NULL pointer dereference

Also tested on a x86_64 system: works fine even with TRIM enabled.
 same disc
 same bridge
 different usb host controller
 different cpu architecture
 not root filesystem

Regards,
  Vicenç.

[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280

 Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
 [000000000000003e] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
 CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
 Hardware name: Sapphire-RK3399 Board (DT)
 pstate: 00000005 (nzcv daif -PAN -UAO)
 pc : update_sit_entry+0x304/0x4b0
 lr : update_sit_entry+0x108/0x4b0
 sp : ffff00000ca13bd0
 x29: ffff00000ca13bd0 x28: 000000000000003e
 x27: 0000000000000020 x26: 0000000000080000
 x25: 0000000000000048 x24: ffff8000ebb85cf8
 x23: 0000000000000253 x22: 00000000ffffffff
 x21: 00000000000535f2 x20: 00000000ffffffdf
 x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
 x17: 0000000007ce6926 x16: 000000001c83ffa8
 x15: 0000000000000000 x14: ffff8000f602df90
 x13: 0000000000000006 x12: 0000000000000040
 x11: 0000000000000228 x10: 0000000000000000
 x9 : 0000000000000000 x8 : 0000000000000000
 x7 : 00000000000535f2 x6 : ffff8000ebff3440
 x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
 x3 : 00000000ffffffff x2 : 0000000000000020
 x1 : 0000000000000000 x0 : ffff8000eb9e5800
 Process nvim (pid: 957, stack limit = 0x0000000063a78320)
 Call trace:
  update_sit_entry+0x304/0x4b0
  f2fs_invalidate_blocks+0x98/0x140
  truncate_node+0x90/0x400
  f2fs_remove_inode_page+0xe8/0x340
  f2fs_evict_inode+0x2b0/0x408
  evict+0xe0/0x1e0
  iput+0x160/0x260
  do_unlinkat+0x214/0x298
  __arm64_sys_unlinkat+0x3c/0x68
  el0_svc_handler+0x94/0x118
  el0_svc+0x8/0xc
 Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
 ---[ end trace a0f21a307118c477 ]---

The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.

So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05 13:40:31 -07:00
Linus Torvalds
fe6f0ed0da f2fs-for-4.19-rc1
In this round, we've tuned f2fs to improve general performance by serializing
 block allocation and enhancing discard flows like fstrim which avoids user IO
 contention. And we've added fsync_mode=nobarrier which gives an option to user
 where it skips issuing cache_flush commands to underlying flash storage. And
 there are many bug fixes related to fuzzed images, revoked atomic writes, quota
 ops, and minor direct IO.
 
 Enhancement:
  - add fsync_mode=nobarrier which bypasses cache_flush command
  - enhance the discarding flow which avoids user IOs and issues in LBA order
  - readahead some encrypted blocks during GC
  - enable in-memory inode checksum to verify the blocks if F2FS_CHECK_FS is set
  - enhance nat_bits behavior
  - set -o discard by default
  - set REQ_RAHEAD to bio in ->readpages
 
 Bug fixes:
  - fix a corner case to corrupt atomic_writes revoking flow
  - revisit i_gc_rwsem to fix race conditions
  - fix some dio behaviors captured by xfstests
  - correct handling errors given by quota-related failures
  - add many sanity check flows to avoid fuzz test failures
  - add more error number propagation to their callers
  - fix several corner cases to continue fault injection w/ shutdown loop
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlt82U4ACgkQQBSofoJI
 UNJTLQ/+PhewnNa5tDfUgWdFnUFz3h9/NcC677l0OplOOUNxA8iSa1xamlKf/nf9
 sB5ey0I7oBF8zQGxfndhHQfi6fpfUcNMr14hm+TS/3+d54xLJmiVShD5fjNSV2vB
 Ur0xoozuQDwYF1e3QKdBQjFqaCf78VheTr3aWxyv22/Sg+PYylZJ2K8rHTB7mGPU
 UG0aRnKrP3FPRjL7Q3m0Vm6b6eZ5uNdNrFfjgn/8yuQQ9V197K8vwSbPAsR5/pOh
 miCQXyM708NgEYJRWkWmC/rDSQdU0/h/mGnJWrBrbceW62QefGOgd2jcVfmthHJa
 ZXpj+BEG5bYpCCxGxF6N+u0e28OKonCwO/uvL8YAd5icN7yXtsKzoF1CCuXxOYf1
 9K5SMylCTSyrs/+LV8CJoT2ya8w0l0w+R/txUYn8UT+4AgqU+chS2kJeXqw9tcHB
 WLFs/rnAyofWCI/8frVBmJY+zA1ZZvTqs/lmVYrtJUkiOcMTq34WICBUAEFKV452
 BM5dcu21bSIkapYispEt4Rr7o4P4HHMQ+N1i2yUZMFCz5T0RyzdybeS5THk2yVzd
 L0kxfYU+zHigNX51ez8+Z7DyDLDBp6jkD0e66x73bUK9TGPH+ZbnAL6gwLikmD3M
 +VxYl5nyW/3bxx1HdfK1Xwd4/wYMNBmdtn5NC50oZ+jpB0h8YCE=
 =zgbL
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've tuned f2fs to improve general performance by
  serializing block allocation and enhancing discard flows like fstrim
  which avoids user IO contention. And we've added fsync_mode=nobarrier
  which gives an option to user where it skips issuing cache_flush
  commands to underlying flash storage. And there are many bug fixes
  related to fuzzed images, revoked atomic writes, quota ops, and minor
  direct IO.

  Enhancements:
   - add fsync_mode=nobarrier which bypasses cache_flush command
   - enhance the discarding flow which avoids user IOs and issues in
     LBA order
   - readahead some encrypted blocks during GC
   - enable in-memory inode checksum to verify the blocks if
     F2FS_CHECK_FS is set
   - enhance nat_bits behavior
   - set -o discard by default
   - set REQ_RAHEAD to bio in ->readpages

  Bug fixes:
   - fix a corner case to corrupt atomic_writes revoking flow
   - revisit i_gc_rwsem to fix race conditions
   - fix some dio behaviors captured by xfstests
   - correct handling errors given by quota-related failures
   - add many sanity check flows to avoid fuzz test failures
   - add more error number propagation to their callers
   - fix several corner cases to continue fault injection w/ shutdown
     loop"

* tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (89 commits)
  f2fs: readahead encrypted block during GC
  f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
  f2fs: fix performance issue observed with multi-thread sequential read
  f2fs: fix to skip verifying block address for non-regular inode
  f2fs: rework fault injection handling to avoid a warning
  f2fs: support fault_type mount option
  f2fs: fix to return success when trimming meta area
  f2fs: fix use-after-free of dicard command entry
  f2fs: support discard submission error injection
  f2fs: split discard command in prior to block layer
  f2fs: wake up gc thread immediately when gc_urgent is set
  f2fs: fix incorrect range->len in f2fs_trim_fs()
  f2fs: refresh recent accessed nat entry in lru list
  f2fs: fix avoid race between truncate and background GC
  f2fs: avoid race between zero_range and background GC
  f2fs: fix to do sanity check with block address in main area v2
  f2fs: fix to do sanity check with inline flags
  f2fs: fix to reset i_gc_failures correctly
  f2fs: fix invalid memory access
  f2fs: fix to avoid broken of dnode block list
  ...
2018-08-22 13:29:39 -07:00
Jaegeuk Kim
853137cef4 f2fs: fix performance issue observed with multi-thread sequential read
This reverts the commit - "b93f771 - f2fs: remove writepages lock"
to fix the drop in sequential read throughput.

Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
device: UFS

Before -
read throughput: 185 MB/s
total read requests: 85177 (of these ~80000 are 4KB size requests).
total write requests: 2546 (of these ~2208 requests are written in 512KB).

After -
read throughput: 758 MB/s
total read requests: 2417 (of these ~2042 are 512KB reads).
total write requests: 2701 (of these ~2034 requests are written in 512KB).

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20 23:13:42 -07:00
Chao Yu
d494500a70 f2fs: support fault_type mount option
Previously, once fault injection is on, by default, all kind of faults
will be injected to f2fs, if we want to trigger single or specified
combined type during the test, we need to configure sysfs entry, it will
be a little inconvenient to integrate sysfs configuring into testsuit,
such as xfstest.

So this patch introduces a new mount option 'fault_type' to assist old
option 'fault_injection', with these two mount options, we can specify
any fault rate/type at mount-time.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu
b83dcfe671 f2fs: support discard submission error injection
This patch adds to support discard submission error injection for testing
error handling of __submit_discard_cmd().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu
bcbfbd604d f2fs: fix to do sanity check with inline flags
https://bugzilla.kernel.org/show_bug.cgi?id=200221

- Overview
BUG() in clear_inode() when mounting and un-mounting a corrupted f2fs image

- Reproduce

- Kernel message
[  538.601448] F2FS-fs (loop0): Invalid segment/section count (31, 24 x 1376257)
[  538.601458] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
[  538.724091] F2FS-fs (loop0): Try to recover 2th superblock, ret: 0
[  538.724102] F2FS-fs (loop0): Mounted with checkpoint version = 2
[  540.970834] ------------[ cut here ]------------
[  540.970838] kernel BUG at fs/inode.c:512!
[  540.971750] invalid opcode: 0000 [#1] SMP KASAN PTI
[  540.972755] CPU: 1 PID: 1305 Comm: umount Not tainted 4.18.0-rc1+ #4
[  540.974034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  540.982913] RIP: 0010:clear_inode+0xc0/0xd0
[  540.983774] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55
[  540.987570] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002
[  540.988636] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a
[  540.990063] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8
[  540.991499] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce
[  540.992923] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668
[  540.994360] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530
[  540.995786] FS:  00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[  540.997403] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  540.998571] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0
[  541.000015] Call Trace:
[  541.000554]  f2fs_evict_inode+0x253/0x630
[  541.001381]  evict+0x16f/0x290
[  541.002015]  iput+0x280/0x300
[  541.002654]  dentry_unlink_inode+0x165/0x1e0
[  541.003528]  __dentry_kill+0x16a/0x260
[  541.004300]  dentry_kill+0x70/0x250
[  541.005018]  dput+0x154/0x1d0
[  541.005635]  do_one_tree+0x34/0x40
[  541.006354]  shrink_dcache_for_umount+0x3f/0xa0
[  541.007285]  generic_shutdown_super+0x43/0x1c0
[  541.008192]  kill_block_super+0x52/0x80
[  541.008978]  kill_f2fs_super+0x62/0x70
[  541.009750]  deactivate_locked_super+0x6f/0xa0
[  541.010664]  deactivate_super+0x5e/0x80
[  541.011450]  cleanup_mnt+0x61/0xa0
[  541.012151]  __cleanup_mnt+0x12/0x20
[  541.012893]  task_work_run+0xc8/0xf0
[  541.013635]  exit_to_usermode_loop+0x125/0x130
[  541.014555]  do_syscall_64+0x138/0x170
[  541.015340]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  541.016375] RIP: 0033:0x7f46624bf487
[  541.017104] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48
[  541.020923] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[  541.022452] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487
[  541.023885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0
[  541.025318] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014
[  541.026755] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c
[  541.028186] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30
[  541.029626] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[  541.039445] ---[ end trace 4ce02f25ff7d3df5 ]---
[  541.040392] RIP: 0010:clear_inode+0xc0/0xd0
[  541.041240] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55
[  541.045042] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002
[  541.046099] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a
[  541.047537] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8
[  541.048965] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce
[  541.050402] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668
[  541.051832] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530
[  541.053263] FS:  00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[  541.054891] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  541.056039] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0
[  541.058506] ==================================================================
[  541.059991] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
[  541.061513] Read of size 8 at addr ffff8801e34a7970 by task umount/1305

[  541.063302] CPU: 1 PID: 1305 Comm: umount Tainted: G      D           4.18.0-rc1+ #4
[  541.064838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  541.066778] Call Trace:
[  541.067294]  dump_stack+0x7b/0xb5
[  541.067986]  print_address_description+0x70/0x290
[  541.068941]  kasan_report+0x291/0x390
[  541.069692]  ? update_stack_state+0x38c/0x3e0
[  541.070598]  __asan_load8+0x54/0x90
[  541.071315]  update_stack_state+0x38c/0x3e0
[  541.072172]  ? __read_once_size_nocheck.constprop.7+0x20/0x20
[  541.073340]  ? vprintk_func+0x27/0x60
[  541.074096]  ? printk+0xa3/0xd3
[  541.074762]  ? __save_stack_trace+0x5e/0x100
[  541.075634]  unwind_next_frame.part.5+0x18e/0x490
[  541.076594]  ? unwind_dump+0x290/0x290
[  541.077368]  ? __show_regs+0x2c4/0x330
[  541.078142]  __unwind_start+0x106/0x190
[  541.085422]  __save_stack_trace+0x5e/0x100
[  541.086268]  ? __save_stack_trace+0x5e/0x100
[  541.087161]  ? unlink_anon_vmas+0xba/0x2c0
[  541.087997]  save_stack_trace+0x1f/0x30
[  541.088782]  save_stack+0x46/0xd0
[  541.089475]  ? __alloc_pages_slowpath+0x1420/0x1420
[  541.090477]  ? flush_tlb_mm_range+0x15e/0x220
[  541.091364]  ? __dec_node_state+0x24/0xb0
[  541.092180]  ? lock_page_memcg+0x85/0xf0
[  541.092979]  ? unlock_page_memcg+0x16/0x80
[  541.093812]  ? page_remove_rmap+0x198/0x520
[  541.094674]  ? mark_page_accessed+0x133/0x200
[  541.095559]  ? _cond_resched+0x1a/0x50
[  541.096326]  ? unmap_page_range+0xcd4/0xe50
[  541.097179]  ? rb_next+0x58/0x80
[  541.097845]  ? rb_next+0x58/0x80
[  541.098518]  __kasan_slab_free+0x13c/0x1a0
[  541.099352]  ? unlink_anon_vmas+0xba/0x2c0
[  541.100184]  kasan_slab_free+0xe/0x10
[  541.100934]  kmem_cache_free+0x89/0x1e0
[  541.101724]  unlink_anon_vmas+0xba/0x2c0
[  541.102534]  free_pgtables+0x101/0x1b0
[  541.103299]  exit_mmap+0x146/0x2a0
[  541.103996]  ? __ia32_sys_munmap+0x50/0x50
[  541.104829]  ? kasan_check_read+0x11/0x20
[  541.105649]  ? mm_update_next_owner+0x322/0x380
[  541.106578]  mmput+0x8b/0x1d0
[  541.107191]  do_exit+0x43a/0x1390
[  541.107876]  ? mm_update_next_owner+0x380/0x380
[  541.108791]  ? deactivate_super+0x5e/0x80
[  541.109610]  ? cleanup_mnt+0x61/0xa0
[  541.110351]  ? __cleanup_mnt+0x12/0x20
[  541.111115]  ? task_work_run+0xc8/0xf0
[  541.111879]  ? exit_to_usermode_loop+0x125/0x130
[  541.112817]  rewind_stack_do_exit+0x17/0x20
[  541.113666] RIP: 0033:0x7f46624bf487
[  541.114404] Code: Bad RIP value.
[  541.115094] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[  541.116605] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487
[  541.118034] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0
[  541.119472] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014
[  541.120890] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c
[  541.122321] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30

[  541.124061] The buggy address belongs to the page:
[  541.125042] page:ffffea00078d29c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[  541.126651] flags: 0x2ffff0000000000()
[  541.127418] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
[  541.128963] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[  541.130516] page dumped because: kasan: bad access detected

[  541.131954] Memory state around the buggy address:
[  541.132924]  ffff8801e34a7800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
[  541.134378]  ffff8801e34a7880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  541.135814] >ffff8801e34a7900: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
[  541.137253]                                                              ^
[  541.138637]  ffff8801e34a7980: f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  541.140075]  ffff8801e34a7a00: 00 00 00 00 00 00 00 00 f3 00 00 00 00 00 00 00
[  541.141509] ==================================================================

- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/inode.c#L512
	BUG_ON(inode->i_data.nrpages);

The root cause is root directory inode is corrupted, it has both
inline_data and inline_dentry flag, and its nlink is zero, so in
->evict(), after dropping all page cache, it grabs page #0 for inline
data truncation, result in panic in later clear_inode() where we will
check inode->i_data.nrpages value.

This patch adds inline flags check in sanity_check_inode, in addition,
do sanity check with root inode's nlink.

Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu
50fa53eccf f2fs: fix to avoid broken of dnode block list
f2fs recovery flow is relying on dnode block link list, it means fsynced
file recovery depends on previous dnode's persistence in the list, so
during fsync() we should wait on all regular inode's dnode writebacked
before issuing flush.

By this way, we can avoid dnode block list being broken by out-of-order
IO submission due to IO scheduler or driver.

Sheng Yong helps to do the test with this patch:

Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8

1 / PERSIST / Index

Base:
	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
1	867.82		204.15		41440.03	41370.54	680.8		1025.94		1031.08
2	871.87		205.87		41370.3		40275.2		791.14		1065.84		1101.7
3	866.52		205.69		41795.67	40596.16	694.69		1037.16		1031.48
Avg	868.7366667	205.2366667	41535.33333	40747.3		722.21		1042.98		1054.753333

After:
	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
1	798.81		202.5		41143		40613.87	602.71		838.08		913.83
2	805.79		206.47		40297.2		41291.46	604.44		840.75		924.27
3	814.83		206.17		41209.57	40453.62	602.85		834.66		927.91
Avg	806.4766667	205.0466667	40883.25667	40786.31667	603.3333333	837.83		922.0033333

Patched/Original:
	0.928332713	0.999074239	0.984300676	1.000957528	0.835398753	0.803303994	0.874141189

It looks like atomic write will suffer performance regression.

I suspect that the criminal is that we forcing to wait all dnode being in
storage cache before we issue PREFLUSH+FUA.

BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause the problem: we will lose data of last transaction after SPO, even if
atomic write return no error:

- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
 - writeback data: P1, P2, P3;
 - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with fsync_mark is
writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
last transaction.
 - preflush + fua;
- power-cut

If we don't wait dnode writeback for atomic_write:

	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
1	779.91		206.03		41621.5		40333.16	716.9		1038.21		1034.85
2	848.51		204.35		40082.44	39486.17	791.83		1119.96		1083.77
3	772.12		206.27		41335.25	41599.65	723.29		1055.07		971.92
Avg	800.18		205.55		41013.06333	40472.99333	744.0066667	1071.08		1030.18

Patched/Original:
	0.92108464	1.001526693	0.987425886	0.993268102	1.030180511	1.026942031	0.976702294

SQLite's performance recovers.

Jaegeuk:
"Practically, I don't see db corruption becase of this. We can excuse to lose
the last transaction."

Finally, we decide to keep original implementation of atomic write interface
sematics that we don't wait all dnode writeback before preflush+fua submission.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-10 16:19:05 -07:00
Chao Yu
e494c2f995 f2fs: fix to do sanity check with cp_pack_start_sum
After fuzzing, cp_pack_start_sum could be corrupted, so current log's
summary info should be wrong due to loading incorrect summary block.
Then, if segment's type in current log is exceeded NR_CURSEG_TYPE, it
can lead accessing invalid dirty_i->dirty_segmap bitmap finally.

Add sanity check for cp_pack_start_sum to fix this issue.

https://bugzilla.kernel.org/show_bug.cgi?id=200419

- Reproduce

- Kernel message (f2fs-dev w/ KASAN)
[ 3117.578432] F2FS-fs (loop0): Invalid log blocks per segment (8)

[ 3117.578445] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
[ 3117.581364] F2FS-fs (loop0): invalid crc_offset: 30716
[ 3117.583564] WARNING: CPU: 1 PID: 1225 at fs/f2fs/checkpoint.c:90 __get_meta_page+0x448/0x4b0
[ 3117.583570] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
[ 3117.584014] CPU: 1 PID: 1225 Comm: mount Not tainted 4.17.0+ #1
[ 3117.584017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.584022] RIP: 0010:__get_meta_page+0x448/0x4b0
[ 3117.584023] Code: 00 49 8d bc 24 84 00 00 00 e8 74 54 da ff 41 83 8c 24 84 00 00 00 08 4c 89 f6 4c 89 ef e8 c0 d9 95 00 48 89 ef e8 18 e3 00 00 <0f> 0b f0 80 4d 48 04 e9 0f fe ff ff 0f 0b 48 89 c7 48 89 04 24 e8
[ 3117.584072] RSP: 0018:ffff88018eb678c0 EFLAGS: 00010286
[ 3117.584082] RAX: ffff88018f0a6a78 RBX: ffffea0007a46600 RCX: ffffffff9314d1b2
[ 3117.584085] RDX: ffffffff00000001 RSI: 0000000000000000 RDI: ffff88018f0a6a98
[ 3117.584087] RBP: ffff88018ebe9980 R08: 0000000000000002 R09: 0000000000000001
[ 3117.584090] R10: 0000000000000001 R11: ffffed00326e4450 R12: ffff880193722200
[ 3117.584092] R13: ffff88018ebe9afc R14: 0000000000000206 R15: ffff88018eb67900
[ 3117.584096] FS:  00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3117.584098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3117.584101] CR2: 00000000016f21b8 CR3: 0000000191c22000 CR4: 00000000000006e0
[ 3117.584112] Call Trace:
[ 3117.584121]  ? f2fs_set_meta_page_dirty+0x150/0x150
[ 3117.584127]  ? f2fs_build_segment_manager+0xbf9/0x3190
[ 3117.584133]  ? f2fs_npages_for_summary_flush+0x75/0x120
[ 3117.584145]  f2fs_build_segment_manager+0xda8/0x3190
[ 3117.584151]  ? f2fs_get_valid_checkpoint+0x298/0xa00
[ 3117.584156]  ? f2fs_flush_sit_entries+0x10e0/0x10e0
[ 3117.584184]  ? map_id_range_down+0x17c/0x1b0
[ 3117.584188]  ? __put_user_ns+0x30/0x30
[ 3117.584206]  ? find_next_bit+0x53/0x90
[ 3117.584237]  ? cpumask_next+0x16/0x20
[ 3117.584249]  f2fs_fill_super+0x1948/0x2b40
[ 3117.584258]  ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.584279]  ? sget_userns+0x65e/0x690
[ 3117.584296]  ? set_blocksize+0x88/0x130
[ 3117.584302]  ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.584305]  mount_bdev+0x1c0/0x200
[ 3117.584310]  mount_fs+0x5c/0x190
[ 3117.584320]  vfs_kern_mount+0x64/0x190
[ 3117.584330]  do_mount+0x2e4/0x1450
[ 3117.584343]  ? lockref_put_return+0x130/0x130
[ 3117.584347]  ? copy_mount_string+0x20/0x20
[ 3117.584357]  ? kasan_unpoison_shadow+0x31/0x40
[ 3117.584362]  ? kasan_kmalloc+0xa6/0xd0
[ 3117.584373]  ? memcg_kmem_put_cache+0x16/0x90
[ 3117.584377]  ? __kmalloc_track_caller+0x196/0x210
[ 3117.584383]  ? _copy_from_user+0x61/0x90
[ 3117.584396]  ? memdup_user+0x3e/0x60
[ 3117.584401]  ksys_mount+0x7e/0xd0
[ 3117.584405]  __x64_sys_mount+0x62/0x70
[ 3117.584427]  do_syscall_64+0x73/0x160
[ 3117.584440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.584455] RIP: 0033:0x7f5693f14b9a
[ 3117.584456] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.584505] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.584510] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.584512] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.584514] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.584516] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.584519] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
[ 3117.584523] ---[ end trace a8e0d899985faf31 ]---
[ 3117.685663] F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix.
[ 3117.685673] F2FS-fs (loop0): recover_data: ino = 2 (i_size: recover) recovered = 1, err = 0
[ 3117.685707] ==================================================================
[ 3117.685955] BUG: KASAN: slab-out-of-bounds in __remove_dirty_segment+0xdd/0x1e0
[ 3117.686175] Read of size 8 at addr ffff88018f0a63d0 by task mount/1225

[ 3117.686477] CPU: 0 PID: 1225 Comm: mount Tainted: G        W         4.17.0+ #1
[ 3117.686481] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.686483] Call Trace:
[ 3117.686494]  dump_stack+0x71/0xab
[ 3117.686512]  print_address_description+0x6b/0x290
[ 3117.686517]  kasan_report+0x28e/0x390
[ 3117.686522]  ? __remove_dirty_segment+0xdd/0x1e0
[ 3117.686527]  __remove_dirty_segment+0xdd/0x1e0
[ 3117.686532]  locate_dirty_segment+0x189/0x190
[ 3117.686538]  f2fs_allocate_new_segments+0xa9/0xe0
[ 3117.686543]  recover_data+0x703/0x2c20
[ 3117.686547]  ? f2fs_recover_fsync_data+0x48f/0xd50
[ 3117.686553]  ? ksys_mount+0x7e/0xd0
[ 3117.686564]  ? policy_nodemask+0x1a/0x90
[ 3117.686567]  ? policy_node+0x56/0x70
[ 3117.686571]  ? add_fsync_inode+0xf0/0xf0
[ 3117.686592]  ? blk_finish_plug+0x44/0x60
[ 3117.686597]  ? f2fs_ra_meta_pages+0x38b/0x5e0
[ 3117.686602]  ? find_inode_fast+0xac/0xc0
[ 3117.686606]  ? f2fs_is_valid_blkaddr+0x320/0x320
[ 3117.686618]  ? __radix_tree_lookup+0x150/0x150
[ 3117.686633]  ? dqget+0x670/0x670
[ 3117.686648]  ? pagecache_get_page+0x29/0x410
[ 3117.686656]  ? kmem_cache_alloc+0x176/0x1e0
[ 3117.686660]  ? f2fs_is_valid_blkaddr+0x11d/0x320
[ 3117.686664]  f2fs_recover_fsync_data+0xc23/0xd50
[ 3117.686670]  ? f2fs_space_for_roll_forward+0x60/0x60
[ 3117.686674]  ? rb_insert_color+0x323/0x3d0
[ 3117.686678]  ? f2fs_recover_orphan_inodes+0xa5/0x700
[ 3117.686683]  ? proc_register+0x153/0x1d0
[ 3117.686686]  ? f2fs_remove_orphan_inode+0x10/0x10
[ 3117.686695]  ? f2fs_attr_store+0x50/0x50
[ 3117.686700]  ? proc_create_single_data+0x52/0x60
[ 3117.686707]  f2fs_fill_super+0x1d06/0x2b40
[ 3117.686728]  ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.686735]  ? sget_userns+0x65e/0x690
[ 3117.686740]  ? set_blocksize+0x88/0x130
[ 3117.686745]  ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.686748]  mount_bdev+0x1c0/0x200
[ 3117.686753]  mount_fs+0x5c/0x190
[ 3117.686758]  vfs_kern_mount+0x64/0x190
[ 3117.686762]  do_mount+0x2e4/0x1450
[ 3117.686769]  ? lockref_put_return+0x130/0x130
[ 3117.686773]  ? copy_mount_string+0x20/0x20
[ 3117.686777]  ? kasan_unpoison_shadow+0x31/0x40
[ 3117.686780]  ? kasan_kmalloc+0xa6/0xd0
[ 3117.686786]  ? memcg_kmem_put_cache+0x16/0x90
[ 3117.686790]  ? __kmalloc_track_caller+0x196/0x210
[ 3117.686795]  ? _copy_from_user+0x61/0x90
[ 3117.686801]  ? memdup_user+0x3e/0x60
[ 3117.686804]  ksys_mount+0x7e/0xd0
[ 3117.686809]  __x64_sys_mount+0x62/0x70
[ 3117.686816]  do_syscall_64+0x73/0x160
[ 3117.686824]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.686829] RIP: 0033:0x7f5693f14b9a
[ 3117.686830] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.686887] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.686892] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.686894] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.686896] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.686899] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.686901] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003

[ 3117.687005] Allocated by task 1225:
[ 3117.687152]  kasan_kmalloc+0xa6/0xd0
[ 3117.687157]  kmem_cache_alloc_trace+0xfd/0x200
[ 3117.687161]  f2fs_build_segment_manager+0x2d09/0x3190
[ 3117.687165]  f2fs_fill_super+0x1948/0x2b40
[ 3117.687168]  mount_bdev+0x1c0/0x200
[ 3117.687171]  mount_fs+0x5c/0x190
[ 3117.687174]  vfs_kern_mount+0x64/0x190
[ 3117.687177]  do_mount+0x2e4/0x1450
[ 3117.687180]  ksys_mount+0x7e/0xd0
[ 3117.687182]  __x64_sys_mount+0x62/0x70
[ 3117.687186]  do_syscall_64+0x73/0x160
[ 3117.687190]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 3117.687285] Freed by task 19:
[ 3117.687412]  __kasan_slab_free+0x137/0x190
[ 3117.687416]  kfree+0x8b/0x1b0
[ 3117.687460]  ttm_bo_man_put_node+0x61/0x80 [ttm]
[ 3117.687476]  ttm_bo_cleanup_refs+0x15f/0x250 [ttm]
[ 3117.687492]  ttm_bo_delayed_delete+0x2f0/0x300 [ttm]
[ 3117.687507]  ttm_bo_delayed_workqueue+0x17/0x50 [ttm]
[ 3117.687528]  process_one_work+0x2f9/0x740
[ 3117.687531]  worker_thread+0x78/0x6b0
[ 3117.687541]  kthread+0x177/0x1c0
[ 3117.687545]  ret_from_fork+0x35/0x40

[ 3117.687638] The buggy address belongs to the object at ffff88018f0a6300
                which belongs to the cache kmalloc-192 of size 192
[ 3117.688014] The buggy address is located 16 bytes to the right of
                192-byte region [ffff88018f0a6300, ffff88018f0a63c0)
[ 3117.688382] The buggy address belongs to the page:
[ 3117.688554] page:ffffea00063c2980 count:1 mapcount:0 mapping:ffff8801f3403180 index:0x0
[ 3117.688788] flags: 0x17fff8000000100(slab)
[ 3117.688944] raw: 017fff8000000100 ffffea00063c2840 0000000e0000000e ffff8801f3403180
[ 3117.689166] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
[ 3117.689386] page dumped because: kasan: bad access detected

[ 3117.689653] Memory state around the buggy address:
[ 3117.689816]  ffff88018f0a6280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3117.690027]  ffff88018f0a6300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3117.690239] >ffff88018f0a6380: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3117.690448]                                                  ^
[ 3117.690644]  ffff88018f0a6400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3117.690868]  ffff88018f0a6480: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3117.691077] ==================================================================
[ 3117.691290] Disabling lock debugging due to kernel taint
[ 3117.693893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 3117.694120] PGD 80000001f01bc067 P4D 80000001f01bc067 PUD 1d9638067 PMD 0
[ 3117.694338] Oops: 0002 [#1] SMP KASAN PTI
[ 3117.694490] CPU: 1 PID: 1225 Comm: mount Tainted: G    B   W         4.17.0+ #1
[ 3117.694703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.695073] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0
[ 3117.695246] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7
[ 3117.695793] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292
[ 3117.695969] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000
[ 3117.696182] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297
[ 3117.696391] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb
[ 3117.696604] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019
[ 3117.696813] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0
[ 3117.697032] FS:  00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3117.697280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3117.702357] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0
[ 3117.707235] Call Trace:
[ 3117.712077]  locate_dirty_segment+0x189/0x190
[ 3117.716891]  f2fs_allocate_new_segments+0xa9/0xe0
[ 3117.721617]  recover_data+0x703/0x2c20
[ 3117.726316]  ? f2fs_recover_fsync_data+0x48f/0xd50
[ 3117.730957]  ? ksys_mount+0x7e/0xd0
[ 3117.735573]  ? policy_nodemask+0x1a/0x90
[ 3117.740198]  ? policy_node+0x56/0x70
[ 3117.744829]  ? add_fsync_inode+0xf0/0xf0
[ 3117.749487]  ? blk_finish_plug+0x44/0x60
[ 3117.754152]  ? f2fs_ra_meta_pages+0x38b/0x5e0
[ 3117.758831]  ? find_inode_fast+0xac/0xc0
[ 3117.763448]  ? f2fs_is_valid_blkaddr+0x320/0x320
[ 3117.768046]  ? __radix_tree_lookup+0x150/0x150
[ 3117.772603]  ? dqget+0x670/0x670
[ 3117.777159]  ? pagecache_get_page+0x29/0x410
[ 3117.781648]  ? kmem_cache_alloc+0x176/0x1e0
[ 3117.786067]  ? f2fs_is_valid_blkaddr+0x11d/0x320
[ 3117.790476]  f2fs_recover_fsync_data+0xc23/0xd50
[ 3117.794790]  ? f2fs_space_for_roll_forward+0x60/0x60
[ 3117.799086]  ? rb_insert_color+0x323/0x3d0
[ 3117.803304]  ? f2fs_recover_orphan_inodes+0xa5/0x700
[ 3117.807563]  ? proc_register+0x153/0x1d0
[ 3117.811766]  ? f2fs_remove_orphan_inode+0x10/0x10
[ 3117.815947]  ? f2fs_attr_store+0x50/0x50
[ 3117.820087]  ? proc_create_single_data+0x52/0x60
[ 3117.824262]  f2fs_fill_super+0x1d06/0x2b40
[ 3117.828367]  ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.832432]  ? sget_userns+0x65e/0x690
[ 3117.836500]  ? set_blocksize+0x88/0x130
[ 3117.840501]  ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.844420]  mount_bdev+0x1c0/0x200
[ 3117.848275]  mount_fs+0x5c/0x190
[ 3117.852053]  vfs_kern_mount+0x64/0x190
[ 3117.855810]  do_mount+0x2e4/0x1450
[ 3117.859441]  ? lockref_put_return+0x130/0x130
[ 3117.862996]  ? copy_mount_string+0x20/0x20
[ 3117.866417]  ? kasan_unpoison_shadow+0x31/0x40
[ 3117.869719]  ? kasan_kmalloc+0xa6/0xd0
[ 3117.872948]  ? memcg_kmem_put_cache+0x16/0x90
[ 3117.876121]  ? __kmalloc_track_caller+0x196/0x210
[ 3117.879333]  ? _copy_from_user+0x61/0x90
[ 3117.882467]  ? memdup_user+0x3e/0x60
[ 3117.885604]  ksys_mount+0x7e/0xd0
[ 3117.888700]  __x64_sys_mount+0x62/0x70
[ 3117.891742]  do_syscall_64+0x73/0x160
[ 3117.894692]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.897669] RIP: 0033:0x7f5693f14b9a
[ 3117.900563] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.906922] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.910159] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.913469] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.916764] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.920071] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.923393] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
[ 3117.926680] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
[ 3117.949979] CR2: 0000000000000000
[ 3117.954283] ---[ end trace a8e0d899985faf32 ]---
[ 3117.958575] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0
[ 3117.962810] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7
[ 3117.971789] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292
[ 3117.976333] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000
[ 3117.980926] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297
[ 3117.985497] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb
[ 3117.990098] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019
[ 3117.994761] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0
[ 3117.999392] FS:  00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3118.004096] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3118.008816] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0

- Location
https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/segment.c#L775
		if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t]))
			dirty_i->nr_dirty[t]--;
Here dirty_i->dirty_segmap[t] can be NULL which leads to crash in test_and_clear_bit()

Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-10 16:19:05 -07:00
Chao Yu
4ddc1b28aa f2fs: fix to restrict mount condition when without CONFIG_QUOTA
Like quota_ino feature, we need to reject mounting RDWR with image
which enables project_quota feature when there is no CONFIG_QUOTA
be set in kernel.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00
Sheng Yong
00960c2cd8 f2fs: quota: do not mount as RDWR without QUOTA if quota feature enabled
If quota feature is enabled, quota is on by default. However, if
CONFIG_QUOTA is not built in kernel, dquot entries will not get updated,
which leads to quota inconsistency.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-01 11:52:36 -07:00