Commit Graph

2322 Commits

Author SHA1 Message Date
Jaegeuk Kim
8a29c1260e f2fs: sanity check for total valid node blocks
This patch enhances sanity check for SIT entries.

syzbot hit the following crash on upstream commit
83beed7b2b (Fri Apr 20 17:56:32 2018 +0000)
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=bf9253040425feb155ad

syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5692130282438656
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5095924598571008
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): invalid crc value
F2FS-fs (loop0): Try to recover 1th superblock, ret: 0
F2FS-fs (loop0): Mounted with checkpoint version = d
F2FS-fs (loop0): Bitmap was wrongly cleared, blk:9740
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:1884!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4508 Comm: syz-executor0 Not tainted 4.17.0-rc1+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882
RSP: 0018:ffff8801af526708 EFLAGS: 00010282
RAX: ffffed0035ea4cc0 RBX: ffff8801ad454f90 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff82eeb87e RDI: ffffed0035ea4cb6
RBP: ffff8801af526760 R08: ffff8801ad4a2480 R09: ffffed003b5e4f90
R10: ffffed003b5e4f90 R11: ffff8801daf27c87 R12: ffff8801adb8d380
R13: 0000000000000001 R14: 0000000000000008 R15: 00000000ffffffff
FS:  00000000014af940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f06bc223000 CR3: 00000001adb02000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 allocate_data_block+0x66f/0x2050 fs/f2fs/segment.c:2663
 do_write_page+0x105/0x1b0 fs/f2fs/segment.c:2727
 write_node_page+0x129/0x350 fs/f2fs/segment.c:2770
 __write_node_page+0x7da/0x1370 fs/f2fs/node.c:1398
 sync_node_pages+0x18cf/0x1eb0 fs/f2fs/node.c:1652
 block_operations+0x429/0xa60 fs/f2fs/checkpoint.c:1088
 write_checkpoint+0x3ba/0x5380 fs/f2fs/checkpoint.c:1405
 f2fs_sync_fs+0x2fb/0x6a0 fs/f2fs/super.c:1077
 __sync_filesystem fs/sync.c:39 [inline]
 sync_filesystem+0x265/0x310 fs/sync.c:67
 generic_shutdown_super+0xd7/0x520 fs/super.c:429
 kill_block_super+0xa4/0x100 fs/super.c:1191
 kill_f2fs_super+0x9f/0xd0 fs/f2fs/super.c:3030
 deactivate_locked_super+0x97/0x100 fs/super.c:316
 deactivate_super+0x188/0x1b0 fs/super.c:347
 cleanup_mnt+0xbf/0x160 fs/namespace.c:1174
 __cleanup_mnt+0x16/0x20 fs/namespace.c:1181
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:191 [inline]
 exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457d97
RSP: 002b:00007ffd46f9c8e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000457d97
RDX: 00000000014b09a3 RSI: 0000000000000002 RDI: 00007ffd46f9da50
RBP: 00007ffd46f9da50 R08: 0000000000000000 R09: 0000000000000009
R10: 0000000000000005 R11: 0000000000000246 R12: 00000000014b0940
R13: 0000000000000000 R14: 0000000000000002 R15: 000000000000658e
RIP: update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: ffff8801af526708
---[ end trace f498328bb02610a2 ]---

Reported-and-tested-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+7d6d31d3bc702f566ce3@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+0a725420475916460f12@syzkaller.appspotmail.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:48 -07:00
Jaegeuk Kim
b2ca374f33 f2fs: sanity check on sit entry
syzbot hit the following crash on upstream commit
87ef12027b (Wed Apr 18 19:48:17 2018 +0000)
Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e

C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop0): invalid crc value
BUG: unable to handle kernel paging request at ffffed006b2a50c0
PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0
Oops: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline]
RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852
RSP: 0018:ffff8801b102e5b0 EFLAGS: 00010a06
RAX: 1ffff1006b2a50c0 RBX: 0000000000000004 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801ac74243e
RBP: ffff8801b102f410 R08: ffff8801acbd46c0 R09: fffffbfff14d9af8
R10: fffffbfff14d9af8 R11: ffff8801acbd46c0 R12: ffff8801ac742a80
R13: ffff8801d9519100 R14: dffffc0000000000 R15: ffff880359528600
FS:  0000000001e04880(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffed006b2a50c0 CR3: 00000001ac6ac000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803
 mount_bdev+0x30c/0x3e0 fs/super.c:1165
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
 mount_fs+0xae/0x328 fs/super.c:1268
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2847
 ksys_mount+0x12d/0x140 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x443d6a
RSP: 002b:00007ffd312813c8 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443d6a
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffd312813d0
RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a
R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004
R13: 0000000000402c60 R14: 0000000000000000 R15: 0000000000000000
RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: ffff8801b102e5b0
RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: ffff8801b102e5b0
CR2: ffffed006b2a50c0
---[ end trace a2034989e196ff17 ]---

Reported-and-tested-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:48 -07:00
Jaegeuk Kim
5d64600d4f f2fs: avoid bug_on on corrupted inode
syzbot has tested the proposed patch but the reproducer still triggered crash:
kernel BUG at fs/f2fs/inode.c:LINE!

F2FS-fs (loop1): invalid crc value
F2FS-fs (loop5): Magic Mismatch, valid(0xf2f52010) - read(0x0)
F2FS-fs (loop5): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop5): invalid crc value
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:238!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4886 Comm: syz-executor1 Not tainted 4.17.0-rc1+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:do_read_inode fs/f2fs/inode.c:238 [inline]
RIP: 0010:f2fs_iget+0x3307/0x3ca0 fs/f2fs/inode.c:313
RSP: 0018:ffff8801c44a70e8 EFLAGS: 00010293
RAX: ffff8801ce208040 RBX: ffff8801b3621080 RCX: ffffffff82eace18
F2FS-fs (loop2): Magic Mismatch, valid(0xf2f52010) - read(0x0)
RDX: 0000000000000000 RSI: ffffffff82eaf047 RDI: 0000000000000007
RBP: ffff8801c44a7410 R08: ffff8801ce208040 R09: ffffed0039ee4176
R10: ffffed0039ee4176 R11: ffff8801cf720bb7 R12: ffff8801c0efa000
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f753aa9d700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
------------[ cut here ]------------
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel BUG at fs/f2fs/inode.c:238!
CR2: 0000000001b03018 CR3: 00000001c8b74000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 f2fs_fill_super+0x4377/0x7bf0 fs/f2fs/super.c:2842
 mount_bdev+0x30c/0x3e0 fs/super.c:1165
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
 mount_fs+0xae/0x328 fs/super.c:1268
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2847
 ksys_mount+0x12d/0x140 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457daa
RSP: 002b:00007f753aa9cba8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 0000000000457daa
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007f753aa9cbf0
RBP: 0000000000000064 R08: 0000000020016a00 R09: 0000000020000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000064 R14: 00000000006fcb80 R15: 0000000000000000
RIP: do_read_inode fs/f2fs/inode.c:238 [inline] RSP: ffff8801c44a70e8
RIP: f2fs_iget+0x3307/0x3ca0 fs/f2fs/inode.c:313 RSP: ffff8801c44a70e8
invalid opcode: 0000 [#2] SMP KASAN
---[ end trace 1cbcbec2156680bc ]---

Reported-and-tested-by: syzbot+41a1b341571f0952badb@syzkaller.appspotmail.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:48 -07:00
Jaegeuk Kim
a4f843bd00 f2fs: give message and set need_fsck given broken node id
syzbot hit the following crash on upstream commit
83beed7b2b (Fri Apr 20 17:56:32 2018 +0000)
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=d154ec99402c6f628887

C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5414336294027264
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5471683234234368
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5436660795834368
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop0): invalid crc value
------------[ cut here ]------------
kernel BUG at fs/f2fs/node.c:1185!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4549 Comm: syzkaller704305 Not tainted 4.17.0-rc1+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185
RSP: 0018:ffff8801d960e820 EFLAGS: 00010293
RAX: ffff8801d88205c0 RBX: 0000000000000003 RCX: ffffffff82f6cc06
RDX: 0000000000000000 RSI: ffffffff82f6d5e8 RDI: 0000000000000004
RBP: ffff8801d960ec30 R08: ffff8801d88205c0 R09: ffffed003b5e46c2
R10: 0000000000000003 R11: 0000000000000003 R12: ffff8801a86e00c0
R13: 0000000000000001 R14: ffff8801a86e0530 R15: ffff8801d9745240
FS:  000000000072c880(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3d403209b8 CR3: 00000001d8f3f000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 get_node_page fs/f2fs/node.c:1237 [inline]
 truncate_xattr_node+0x152/0x2e0 fs/f2fs/node.c:1014
 remove_inode_page+0x200/0xaf0 fs/f2fs/node.c:1039
 f2fs_evict_inode+0xe86/0x1710 fs/f2fs/inode.c:547
 evict+0x4a6/0x960 fs/inode.c:557
 iput_final fs/inode.c:1519 [inline]
 iput+0x62d/0xa80 fs/inode.c:1545
 f2fs_fill_super+0x5f4e/0x7bf0 fs/f2fs/super.c:2849
 mount_bdev+0x30c/0x3e0 fs/super.c:1164
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
 mount_fs+0xae/0x328 fs/super.c:1267
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2518 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2848
 ksys_mount+0x12d/0x140 fs/namespace.c:3064
 __do_sys_mount fs/namespace.c:3078 [inline]
 __se_sys_mount fs/namespace.c:3075 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x443dea
RSP: 002b:00007ffcc7882368 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443dea
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffcc7882370
RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a
R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004
R13: 0000000000402ce0 R14: 0000000000000000 R15: 0000000000000000
RIP: __get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: ffff8801d960e820
---[ end trace 4edbeb71f002bb76 ]---

Reported-and-tested-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:48 -07:00
Chao Yu
cf52b27a39 f2fs: clean up commit_inmem_pages()
This patch moves error handling from commit_inmem_pages() into
__commit_inmem_page() for cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:47 -07:00
Sheng Yong
eff15c2aec f2fs: do not check F2FS_INLINE_DOTS in recover
Only dir may have F2FS_INLINE_DOTS flag, so there is no need to check
the flag in recover flow.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:47 -07:00
Sheng Yong
a515d12f1b f2fs: remove duplicated dquot_initialize and fix error handling
This patch removes duplicated dquot_initialize in recover_orphan_inode(),
and fix the error handling if dquot_initialize fails.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:47 -07:00
Chao Yu
c22aecd759 f2fs: fix to detect failure of dquot_initialize
dquot_initialize() can fail due to any exception inside quota subsystem,
f2fs needs to be aware of it, and return correct return value to caller.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:47 -07:00
Yunlei He
d618477473 f2fs: stop issue discard if something wrong with f2fs
v4->v5: move data corruption check to __submit_discard_cmd, in order to
control discard io submitted more accurately, besides, increase async
thread wait time if data corruption detected.

This patch stop async thread and umount process to issue discard
if something wrong with f2fs, which is similar to fstrim.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:47 -07:00
Chao Yu
b169c3c560 f2fs: fix return value in f2fs_ioc_commit_atomic_write
In f2fs_ioc_commit_atomic_write, if file is volatile, return -EINVAL to
indicate that commit failure.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:46 -07:00
Yunlei He
054afda999 f2fs: allocate hot_data for atomic write more strictly
If a file not set type as hot, has dirty pages more than
threshold 64 before starting atomic write, may be lose hot
flag.

v1->v2: move set FI_ATOMIC_FILE flag behind flush dirty pages too,
in case of dirty pages before starting atomic use atomic mode to
write back.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:46 -07:00
Sheng Yong
d0891e84e1 f2fs: check if inmem_pages list is empty correctly
`cur' will never be NULL, we should check inmem_pages list instead.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:46 -07:00
Chao Yu
27319ba404 f2fs: fix race in between GC and atomic open
Thread					GC thread
- f2fs_ioc_start_atomic_write
 - get_dirty_pages
 - filemap_write_and_wait_range
					- f2fs_gc
					 - do_garbage_collect
					  - gc_data_segment
					   - move_data_page
					    - f2fs_is_atomic_file
					    - set_page_dirty
 - set_inode_flag(, FI_ATOMIC_FILE)

Dirty data page can still be generated by GC in race condition as
above call stack.

This patch adds fi->dio_rwsem[WRITE] in f2fs_ioc_start_atomic_write
to avoid such race.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:46 -07:00
Souptick Joarder
ea4d479bb3 fs: f2fs: Adding new return type vm_fault_t
Use new return type vm_fault_t for page_mkwrite
and fault handler.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:46 -07:00
Zhikang Zhang
d6964949e4 f2fs: change le32 to le16 of f2fs_inode->i_extra_size
In the structure of f2fs_inode, i_extra_size's type is __le16,
so we should keep type consistent when using it.

Fixes: 704956ecf5 ("f2fs: support inode checksum")
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:46 -07:00
Zhikang Zhang
56b07e7e65 f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries
We should check valid_map_mir and block count to ensure
the flushed raw_sit is correct.

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:46 -07:00
Chao Yu
3d165dc3ae f2fs: correct return value of f2fs_trim_fs
Correct return value in two cases:
- return EINVAL if end boundary is out-of-range.
- return EIO if fs needs off-line check.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:45 -07:00
Chao Yu
764811581d f2fs: fix to show missing bits in FS_IOC_GETFLAGS
This patch fixes to show missing encrypt/inline_data flag in
FS_IOC_GETFLAGS like ext4 does.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:45 -07:00
Chao Yu
c807a7cb54 f2fs: remove unneeded F2FS_PROJINHERIT_FL
Now F2FS_FL_USER_VISIBLE and F2FS_FL_USER_MODIFIABLE has included
F2FS_PROJINHERIT_FL, so remove unneeded F2FS_PROJINHERIT_FL when
using visible/modifiable flag macro.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:45 -07:00
Chao Yu
81114baa83 f2fs: don't use GFP_ZERO for page caches
Related to https://lkml.org/lkml/2018/4/8/661

Sometimes, we need to write meta data to new allocated block address,
then we will allocate a zeroed page in inner inode's address space, and
fill partial data in it, and leave other place with zero value which means
some fields are initial status.

There are two inner inodes (meta inode and node inode) setting __GFP_ZERO,
I have just checked them, for both of them, we can avoid using __GFP_ZERO,
and do initialization by ourselves to avoid unneeded/redundant zeroing
from mm.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:45 -07:00
Yunlei He
241b493d8f f2fs: issue all big range discards in umount process
This patch modify max_requests to UINT_MAX, to issue
all big range discards in umount.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:45 -07:00
Chao Yu
47cca3d62a f2fs: remove redundant block plug
For buffered IO, we don't need to use block plug to cache bio,
for direct IO, generic f2fs_direct_IO has already added block
plug, so let's remove redundant one in .write_iter.

As Yunlei described in his patch:

-f2fs_file_write_iter
  -blk_start_plug
    -__generic_file_write_iter
	...
	  -do_blockdev_direct_IO
	    -blk_start_plug
		...
	    -blk_finish_plug
	...
  -blk_finish_plug

which may conduct performance decrease in our platform

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:45 -07:00
Yunlong Song
8045249829 f2fs: remove unmatched zero_user_segment when convert inline dentry
Since the layout of regular dentry block is different from inline dentry
block, zero_user_segment starting from MAX_INLINE_DATA(dir) is not
correct for regular dentry block, besides, bitmap is already copied and
used, so there is no necessary to zero page at all, so just remove the
zero_user_segment is OK.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:44 -07:00
Chao Yu
59c844088b f2fs: introduce private inode status mapping
Previously, we use generic FS_*_FL defined by vfs to indicate inode status
for each bit of i_flags, so f2fs's flag status definition is tied to vfs'
one, it will be hard for f2fs to reuse bits f2fs never used to indicate
new status..

In order to solve this issue, we introduce private inode status mapping,
Note, for these bits have already been persisted into disk, we should
never change their definition, for other ones, we can remap them for
later new coming status.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 11:31:44 -07:00
Jaegeuk Kim
e555da9f31 f2fs: run fstrim asynchronously if runtime discard is on
We don't need to wait for whole bunch of discard candidates in fstrim, since
runtime discard will issue them in idle time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-31 10:20:48 -07:00
Chao Yu
cba608493d f2fs: turn down IO priority of discard from background
In order to avoid interfering normal r/w IO, let's turn down IO
priority of discard issued from background.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 08:58:59 -07:00
Chao Yu
377224c471 f2fs: don't split checkpoint in fstrim
Now, we issue discard asynchronously in separated thread instead of in
checkpoint, after that, we won't encounter long latency in checkpoint
due to huge number of synchronous discard command handling, so, we don't
need to split checkpoint to do trim in batch, merge it and obsolete
related sysfs entry.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 08:58:59 -07:00
Jaegeuk Kim
8bb4f2535c f2fs: issue discard commands proactively in high fs utilization
In the high utilization like over 80%, we don't expect huge # of large discard
commands, but do many small pending discards which affects FTL GCs a lot.
Let's issue them in that case.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 08:58:59 -07:00
Jaegeuk Kim
d6290814b0 f2fs: add fsync_mode=nobarrier for non-atomic files
For non-atomic files, this patch adds an option to give nobarrier which
doesn't issue flush commands to the device.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 12:04:08 -07:00
Jaegeuk Kim
9a997188ff f2fs: let fstrim issue discard commands in lower priority
The fstrim gathers huge number of large discard commands, and tries to issue
without IO awareness, which results in long user-perceive IO latencies on
READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
seconds due to long discard latency.

This patch limits the maximum size to 2MB per candidate, and check IO congestion
when issuing them to disk.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29 12:04:08 -07:00
Eric Biggers
e12ee6836a fscrypt: make fscrypt_operations.max_namelen an integer
Now ->max_namelen() is only called to limit the filename length when
adding NUL padding, and only for real filenames -- not symlink targets.
It also didn't give the correct length for symlink targets anyway since
it forgot to subtract 'sizeof(struct fscrypt_symlink_data)'.

Thus, change ->max_namelen from a function to a simple 'unsigned int'
that gives the filesystem's maximum filename length.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-05-20 16:21:03 -04:00
Christoph Hellwig
3f3942aca6 proc: introduce proc_create_single{,_data}
Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduces the boilerplate code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16 07:23:35 +02:00
Al Viro
1e2e547a93 do d_instantiate/unlock_new_inode combinations safely
For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
	lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex.  Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.

	Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode().  All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.

Cc: stable@kernel.org	# 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-11 15:36:37 -04:00
Jaegeuk Kim
5b19d284f5 f2fs: avoid fsync() failure caused by EAGAIN in writepage()
pageout() in MM traslates EAGAIN, so calls handle_write_error()
 -> mapping_set_error() -> set_bit(AS_EIO, ...).
 file_write_and_wait_range() will see EIO error, which is critical
 to return value of fsync() followed by atomic_write failure to user.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-04 10:51:22 -07:00
Jaegeuk Kim
17c500350b f2fs: clear PageError on writepage
This patch clears PageError in some pages tagged by read path, but when we
write the pages with valid contents, writepage should clear the bit likewise
ext4.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-02 14:30:58 -07:00
Jaegeuk Kim
a90a0884ac f2fs: check cap_resource only for data blocks
This patch changes the rule to check cap_resource for data blocks, not inode
or node blocks in order to avoid selinux denial.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-02 14:30:58 -07:00
Jaegeuk Kim
b87078ad3a Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"
This patch reverts copied f2fs_set_page_dirty_nobuffer to use generic function
for stability.

This reverts commit fe76b796fc.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-02 14:30:57 -07:00
Eric Biggers
ab3835aae6 f2fs: call unlock_new_inode() before d_instantiate()
xfstest generic/429 sometimes hangs on f2fs, caused by a thread being
unable to take a directory's i_rwsem for write in vfs_rmdir().  In the
test, one thread repeatedly creates and removes a directory, and other
threads repeatedly look up a file in the directory.  The bug is that
f2fs_mkdir() calls d_instantiate() before unlock_new_inode(), resulting
in the directory inode being exposed to lookups before it has been fully
initialized.  And with CONFIG_DEBUG_LOCK_ALLOC, unlock_new_inode()
reinitializes ->i_rwsem, corrupting its state when it is already held.

Fix it by calling unlock_new_inode() before d_instantiate().  This
matches what other filesystems do.

Fixes: 57397d86c6 ("f2fs: add inode operations for special inodes")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-02 14:30:57 -07:00
Eric Biggers
6dbb17961f f2fs: refactor read path to allow multiple postprocessing steps
Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only.  But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.

To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step.  The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step.  When that completes, it continues to the next step.  Pages that
fail any postprocessing step have PageError set.  Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.

Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.

This may also be useful for other future f2fs features such as
compression.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-02 14:30:57 -07:00
Eric Biggers
0cb8dae4a0 fscrypt: allow synchronous bio decryption
Currently, fscrypt provides fscrypt_decrypt_bio_pages() which decrypts a
bio's pages asynchronously, then unlocks them afterwards.  But, this
assumes that decryption is the last "postprocessing step" for the bio,
so it's incompatible with additional postprocessing steps such as
authenticity verification after decryption.

Therefore, rename the existing fscrypt_decrypt_bio_pages() to
fscrypt_enqueue_decrypt_bio().  Then, add fscrypt_decrypt_bio() which
decrypts the pages in the bio synchronously without unlocking the pages,
nor setting them Uptodate; and add fscrypt_enqueue_decrypt_work(), which
enqueues work on the fscrypt_read_workqueue.  The new functions will be
used by filesystems that support both fscrypt and fs-verity.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-02 14:30:57 -07:00
Matthew Wilcox
b93b016313 page cache: use xa_lock
Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox
f6bb2a2c0b xarray: add the xa_lock to the radix_tree_root
This results in no change in structure size on 64-bit machines as it
fits in the padding between the gfp_t and the void *.  32-bit machines
will grow the structure from 8 to 12 bytes.  Almost all radix trees are
protected with (at least) a spinlock, so as they are converted from
radix trees to xarrays, the data structures will shrink again.

Initialising the spinlock requires a name for the benefit of lockdep, so
RADIX_TREE_INIT() now needs to know the name of the radix tree it's
initialising, and so do IDR_INIT() and IDA_INIT().

Also add the xa_lock() and xa_unlock() family of wrappers to make it
easier to use the lock.  If we could rely on -fplan9-extensions in the
compiler, we could avoid all of this syntactic sugar, but that wasn't
added until gcc 4.6.

Link: http://lkml.kernel.org/r/20180313132639.17387-8-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Jaegeuk Kim
214c2461a8 f2fs: remain written times to update inode during fsync
This fixes xfstests/generic/392.

The failure was caused by different times between 1) one marked in the last
fsync(2) call and 2) the other given by roll-forward recovery after power-cut.
The reason was that we skipped updating inode block at 1), since its i_size
was recoverable along with 4KB-aligned data writes, which was fixed by:
  "f2fs: fix a wrong condition in f2fs_skip_inode_update"

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-03 18:52:47 -07:00
Yunlong Song
c79d152094 f2fs: make assignment of t->dentry_bitmap more readable
In make_dentry_ptr_block, it is confused with "&" for t->dentry_bitmap
but without "&" for t->dentry, so delete "&" to make code more readable.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-02 20:09:35 -07:00
Jaegeuk Kim
dc7a10ddee f2fs: truncate preallocated blocks in error case
If write is failed, we must deallocate the blocks that we couldn't write.

Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-02 20:09:34 -07:00
Junling Zheng
235831d7dd f2fs: fix a wrong condition in f2fs_skip_inode_update
Fix commit 97dd26ad83 (f2fs: fix wrong AUTO_RECOVER condition).
We should use ~PAGE_MASK to determine whether i_size is aligned to
the f2fs's block size or not.

Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-02 13:21:51 -07:00
Eric Biggers
53fedcc09c f2fs: reserve bits for fs-verity
Reserve an F2FS feature flag and inode flag for fs-verity.  This is an
in-development feature that is planned be discussed at LSF/MM 2018 [1].
It will provide file-based integrity and authenticity for read-only
files.  Most code will be in a filesystem-independent module, with
smaller changes needed to individual filesystems that opt-in to
supporting the feature.  An early prototype supporting F2FS is available
[2].  Reserving the F2FS on-disk bits for fs-verity will prevent users
of the prototype from conflicting with other new F2FS features.

Note that we're reserving the inode flag in f2fs_inode.i_advise, which
isn't really appropriate since it's not a hint or advice.  But
->i_advise is already being used to hold the 'encrypt' flag; and F2FS's
->i_flags uses the generic FS_* values, so it seems ->i_flags can't be
used for an F2FS-specific flag without additional work to remove the
assumption that ->i_flags uses the generic flags namespace.

[1] https://marc.info/?l=linux-fsdevel&m=151690752225644
[2] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/log/?h=fs-verity-dev

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-28 12:56:49 -07:00
Yunlei He
d21b0f238a f2fs: Add a segment type check in inplace write
This patch add a segment type check in IPU, in
case of something wrong with blkadd in dnode.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:13:54 -07:00
Yunlong Song
2f52052929 f2fs: no need to initialize zero value for GFP_F2FS_ZERO
Since f2fs_inode_info is allocated with flag GFP_F2FS_ZERO, so we do not
need to initialize zero value for its member any more.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:12:53 -07:00
Chao Yu
780de47cf6 f2fs: don't track new nat entry in nat set
Nat entry set is used only in checkpoint(), and during checkpoint() we
won't flush new nat entry with unallocated address, so we don't need to
add new nat entry into nat set, then nat_entry_set::entry_cnt can
indicate actual entry count we need to flush in checkpoint().

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:10:29 -07:00
Chao Yu
df033caf51 f2fs: clean up with F2FS_BLK_ALIGN
Clean up F2FS_BYTES_TO_BLK(x + F2FS_BLKSIZE - 1) with F2FS_BLK_ALIGN(x).

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:09:32 -07:00
Yunlei He
0833721ec3 f2fs: check blkaddr more accuratly before issue a bio
This patch check blkaddr more accuratly before issue a
write or read bio.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-18 23:33:50 -07:00
Ritesh Harjani
02117b8ae9 f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
Quota code itself is serializing the operations by taking mutex_lock.
It seems a below deadlock can happen if GF_NOFS is not used in
f2fs_quota_read

__switch_to+0x88
__schedule+0x5b0
schedule+0x78
schedule_preempt_disabled+0x20
__mutex_lock_slowpath+0xdc   		//mutex owner is itself
mutex_lock+0x2c
dquot_commit+0x30			//mutex_lock(&dqopt->dqio_mutex);
dqput+0xe0
__dquot_drop+0x80
dquot_drop+0x48
f2fs_evict_inode+0x218
evict+0xa8
dispose_list+0x3c
prune_icache_sb+0x58
super_cache_scan+0xf4
do_shrink_slab+0x208
shrink_slab.part.40+0xac
shrink_zone+0x1b0
do_try_to_free_pages+0x25c
try_to_free_pages+0x164
__alloc_pages_nodemask+0x534
do_read_cache_page+0x6c
read_cache_page+0x14
f2fs_quota_read+0xa4
read_blk+0x54
find_tree_dqentry+0xe4
find_tree_dqentry+0xb8
find_tree_dqentry+0xb8
find_tree_dqentry+0xb8
qtree_read_dquot+0x68
v2_read_dquot+0x24
dquot_acquire+0x5c			// mutex_lock(&dqopt->dqio_mutex);
dqget+0x238
__dquot_initialize+0xd4
dquot_initialize+0x10
dquot_file_open+0x34
f2fs_file_open+0x6c
do_dentry_open+0x1e4
vfs_open+0x6c
path_openat+0xa20
do_filp_open+0x4c
do_sys_open+0x178

Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-18 23:31:28 -07:00
Sheng Yong
ff62af200b f2fs: introduce a new mount option test_dummy_encryption
This patch introduces a new mount option `test_dummy_encryption'
to allow fscrypt to create a fake fscrypt context. This is used
by xfstests.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 14:02:57 +09:00
Sheng Yong
b7c409deda f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
This patch introduces a new feature, F2FS_FEATURE_LOST_FOUND, which
is set by mkfs. mkfs creates a directory named lost+found, which saves
unreachable files. If fsck finds a file which has no parent, or its
parent is removed by fsck, the file will be placed under lost+found
directory by fsck.

lost+found directory could not be encrypted. As a result, the root
directory cannot be encrypted too. So if LOST_FOUND feature is enabled,
let's avoid to encrypt root directory.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 14:02:47 +09:00
Qiuyang Sun
da5ce874f8 f2fs: release locks before return in f2fs_ioc_gc_range()
Currently, we will leave the kernel with locks still held when the gc_range
is invalid. This patch fixes the bug.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:58:16 +09:00
Jaegeuk Kim
bb1105e479 f2fs: align memory boundary for bitops
For example, in arm64, free_nid_bitmap should be aligned to word size in order
to use bit operations.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:39 +09:00
Chao Yu
c56675750d f2fs: remove unneeded set_cold_node()
When setting COLD_BIT_SHIFT flag in node block, we only need to call
set_cold_node() in new_node_page() and recover_inode_page() during
node page initialization. So remove unneeded set_cold_node() in other
places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:33 +09:00
Hyunchul Lee
b91050a80c f2fs: add nowait aio support
This patch adds nowait aio support[1].

Return EAGAIN if any of the following checks fail for direct I/O:
  - i_rwsem is not lockable
  - Blocks are not allocated at the write location

And xfstests generic/471 is passed.

 [1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT"

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:32 +09:00
Chao Yu
63189b7859 f2fs: wrap all options with f2fs_sb_info.mount_opt
This patch merges miscellaneous mount options into struct f2fs_mount_info,
After this patch, once we add new mount option, we don't need to worry
about recovery of it in remount_fs(), since we will recover the
f2fs_sb_info.mount_opt including all options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:30 +09:00
Yunlei He
5d7881cadf f2fs: Don't overwrite all types of node to keep node chain
Currently, we enable node SSR by default, and mixed
different types of node segment to do SSR more intensively.
Although reuse warm node is not allowed, warm node chain
will be destroyed by errors introduced by other types
node chain. So we'd better forbid reusing all types
of node to keep warm node chain.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:29 +09:00
Junling Zheng
93cf93f17c f2fs: introduce mount option for fsync mode
Commit "0a007b97aad6"(f2fs: recover directory operations by fsync)
fixed xfstest generic/342 case, but it also increased the written
data and caused the performance degradation. In most cases, there's
no need to do so heavy fsync actually.

So we introduce new mount option "fsync_mode={posix,strict}" to
control the policy of fsync. "fsync_mode=posix" is set by default,
and means that f2fs uses a light fsync, which follows POSIX semantics.
And "fsync_mode=strict" means that it's a heavy fsync, which behaves
in line with xfs, ext4 and btrfs, where generic/342 will pass, but
the performance will regress.

Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:28 +09:00
Chao Yu
eea52882d8 f2fs: fix to restore old mount option in ->remount_fs
This patch fixes to restore old mount option once we encounter failure
in ->remount_fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:27 +09:00
Chao Yu
deeedd71e3 f2fs: wrap sb_rdonly with f2fs_readonly
Use f2fs_readonly to wrap sb_rdonly for cleanup, and spread it in
all places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:26 +09:00
Jaegeuk Kim
162b27aec9 f2fs: avoid selinux denial on CAP_SYS_RESOURCE
This fixes CAP_SYS_RESOURCE denial of selinux when using resgid, since it
seems selinux reports it at the first place, but mostly we don't need to
check this condition first.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:55:43 +09:00
Chao Yu
b6a06cbbb5 f2fs: support hot file extension
This patch supports to recognize hot file extension in f2fs, so that we
can allocate proper hot segment location for its data, which can lead to
better hot/cold seperation in filesystem.

In addition, we changes a bit on query/add/del operation method for
extension_list sysfs entry as below:

- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:57 +09:00
Chao Yu
1dc0f8991d f2fs: fix to avoid race in between atomic write and background GC
Sqlite user			Background GC
				- move_data_block
				  : move page #1
				 - f2fs_is_atomic_file
- f2fs_ioc_start_atomic_write
- f2fs_ioc_commit_atomic_write
 - commit_inmem_pages
   : commit page #1 & set node #2 dirty
				 - f2fs_submit_page_write
				  - f2fs_update_data_blkaddr
				   - set_page_dirty
				     : set node #2 dirty
 - f2fs_do_sync_file
  - fsync_node_pages
   : commit node #1 & node #2, then sudden power-cut

In a race case, we may check FI_ATOMIC_FILE flag before starting atomic
write flow, then we will commit meta data before data with reversed
order, after a sudden pow-cut, database transaction will be inconsistent.

So we'd better to exclude gc/atomic_write to each other by using lock
instead of flag checking.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:55 +09:00
Jaegeuk Kim
b27bc8091c f2fs: do gc in greedy mode for whole range if gc_urgent mode is set
Otherwise, f2fs conducts GC on 8GB range only based on slow cost-benefit.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:54 +09:00
Jaegeuk Kim
dee02f0d62 f2fs: issue discard aggressively in the gc_urgent mode
This patch avoids to skip discard commands when user sets gc_urgent mode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:53 +09:00
Jaegeuk Kim
782911f491 f2fs: set readdir_ra by default
It gives general readdir improvement.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:52 +09:00
Jaegeuk Kim
84b89e5d94 f2fs: add auto tuning for small devices
If f2fs is running on top of very small devices, it's worth to avoid abusing
free LBAs. In order to achieve that, this patch introduces some parameter
tuning.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:51 +09:00
Jaegeuk Kim
079396270b f2fs: add mount option for segment allocation policy
This patch adds an mount option, "alloc_mode=%s" having two options, "default"
and "reuse".

In "alloc_mode=reuse" case, f2fs starts to allocate segments from 0'th segment
all the time to reassign segments. It'd be useful for small-sized eMMC parts.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:50 +09:00
Jaegeuk Kim
69babac019 f2fs: don't stop GC if GC is contended
Let's do GC as much as possible, while gc_urgent is set.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:49 +09:00
Chao Yu
846ae671ad f2fs: expose extension_list sysfs entry
This patch adds a sysfs entry 'extension_list' to support
query/add/del item in extension list.

Query:
cat /sys/fs/f2fs/<device>/extension_list

Add:
echo 'extension' > /sys/fs/f2fs/<device>/extension_list

Del:
echo '!extension' > /sys/fs/f2fs/<device>/extension_list

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:48 +09:00
Chao Yu
17cd07ae95 f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
As Jayashree Mohan reported:

A simple workload to reproduce this would be :
1. create foo
2. Write (8K - 16K)  // foo size = 16K now
3. fsync()
4. falloc zero_range , keep_size (4202496 - 4210688) // foo size must be 16K
5. fdatasync()
Crash now

On recovery, we see that the file size is 4210688 and not 16K, which
violates the semantics of keep_size flag. We have a test case to
reproduce this using CrashMonkey on 4.15 kernel. Try this out by
simply running :
 ./c_harness -f /dev/sda -d /dev/cow_ram0 -t f2fs -e 102400  -P -v
 tests/generic_468_zero.so

The root cause is that we miss to set KEEP_SIZE bit correctly in zero_range
when zeroing block cross EOF with FALLOC_FL_KEEP_SIZE, let's fix this
missing case.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:47 +09:00
Chao Yu
d0d3f1b329 f2fs: introduce sb_lock to make encrypt pwsalt update exclusive
f2fs_super_block.encrypt_pw_salt can be udpated and persisted
concurrently, result in getting different pwsalt in separated
threads, so let's introduce sb_lock to exclude concurrent
accessers.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:46 +09:00
Colin Ian King
8fe326cb99 f2fs: remove redundant initialization of pointer 'p'
Pointer p is initialized with a value that is never read and is later
re-assigned a new value, hence the initialization is redundant and can
be removed.

Cleans up clang warning:
fs/f2fs/extent_cache.c:463:19: warning: Value stored to 'p' during
its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:45 +09:00
Gao Xiang
46706d5917 f2fs: flush cp pack except cp pack 2 page at first
Previously, we attempt to flush the whole cp pack in a single bio,
however, when suddenly powering off at this time, we could get into
an extreme scenario that cp pack 1 page and cp pack 2 page are updated
and latest, but payload or current summaries are still partially
outdated. (see reliable write in the UFS specification)

This patch submits the whole cp pack except cp pack 2 page at first,
and then writes the cp pack 2 page with an extra independent
bio with pre-io barrier.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:43 +09:00
Sheng Yong
ccd31cb28f f2fs: clean up f2fs_sb_has_xxx functions
This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of
different f2fs_sb_has_xxx functions.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:42 +09:00
Tiezhu Yang
3bb09a0e7e f2fs: remove redundant check of page type when submit bio
This patch removes redundant check of page type when submit bio to
make the logic more clear.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:41 +09:00
Chao Yu
fb0e72c8b9 f2fs: fix to handle looped node chain during recovery
There is no checksum in node block now, so bit-transition from hardware
can make node_footer.next_blkaddr being corrupted w/o any detection,
result in node chain becoming looped one.

For this condition, during recovery, in order to avoid running into dead
loop, let's detect it and just skip out.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:40 +09:00
Jaegeuk Kim
0f9ec2a8f6 f2fs: handle quota for orphan inodes
This is to detect dquot_initialize errors early from evict_inode
for orphan inodes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:38 +09:00
Hyunchul Lee
f2e703f9a3 f2fs: support passing down write hints to block layer with F2FS policy
Add 'whint_mode=fs-based' mount option. In this mode, F2FS passes
down write hints with its policy.

* whint_mode=fs-based. F2FS passes down hints with its policy.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_MEDIUM;
                      HOT_NODE                 WRITE_LIFE_NOT_SET
                      WARM_NODE                "
                      COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:36 +09:00
Hyunchul Lee
0cdd319539 f2fs: support passing down write hints given by users to block layer
Add the 'whint_mode' mount option that controls which write
hints are passed down to block layer. There are "off" and
"user-based" mode. The default mode is "off".

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given
by users.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_NOT_SET
                      HOT_NODE                 "
                      WARM_NODE                "
                      COLD_NODE                "
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid build warning]
[Chao Yu: fix to restore whint_mode in ->remount_fs]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:35 +09:00
Chao Yu
cd36d7a17f f2fs: fix to clear CP_TRIMMED_FLAG
Once CP_TRIMMED_FLAG is set, after a reboot, we will never issue discard
before LBA becomes invalid again, fix it by clearing the flag in
checkpoint without CP_TRIMMED reason.

Fixes: 1f43e2ad7b ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:34 +09:00
Chao Yu
199bc3fef2 f2fs: support large nat bitmap
Previously, we will store all nat version bitmap in checkpoint pack block,
so our total node entry number has a limitation which caused total node
number can not exceed (3900 * 8) block * 455 node/block = 14196000. So
that once user wants to create more nodes in large size image, it becomes
a bottleneck, that's unreasonable.

This patch detects the new layout of nat/sit version bitmap in image in
order to enable supporting large nat bitmap.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:33 +09:00
Chao Yu
bf617f7a92 f2fs: fix to check extent cache in f2fs_drop_extent_tree
If noextent_cache mount option is on, we will never initialize extent tree
in inode, but still we're going to access it in f2fs_drop_extent_tree,
result in kernel panic as below:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
 IP: _raw_write_lock+0xc/0x30
 Call Trace:
  ? f2fs_drop_extent_tree+0x41/0x70 [f2fs]
  f2fs_fallocate+0x5a0/0xdd0 [f2fs]
  ? common_file_perm+0x47/0xc0
  ? apparmor_file_permission+0x1a/0x20
  vfs_fallocate+0x15b/0x290
  SyS_fallocate+0x44/0x70
  do_syscall_64+0x6e/0x160
  entry_SYSCALL64_slow_path+0x25/0x25

This patch fixes to check extent cache status before using in
f2fs_drop_extent_tree.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:32 +09:00
Chao Yu
4d817ae099 f2fs: restrict inline_xattr_size configuration
This patch limits to enable inline_xattr_size mount option only if
both extra_attr and flexible_inline_xattr feature is on in current
image.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:31 +09:00
Yunlong Song
b94929d975 f2fs: fix heap mode to reset it back
Commit 7a20b8a61e ("f2fs: allocate node
and hot data in the beginning of partition") introduces another mount
option, heap, to reset it back. But it does not do anything for heap
mode, so fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:30 +09:00
Sheng Yong
0964fc1a82 f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET
sb_getblk does not guarantee the buffer head is uptodate. If bh is not
uptodate, the data (may be used as boot code) in area before
F2FS_SUPER_OFFSET may get corrupted when super block is committed.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:29 +09:00
Yunlong Song
bdbc90fa55 f2fs: don't put dentry page in pagecache into highmem
Previous dentry page uses highmem, which will cause panic in platforms
using highmem (such as arm), since the address space of dentry pages
from highmem directly goes into the decryption path via the function
fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not
from highmem, and then cause panic since it doesn't call kmap_high but
kunmap_high is triggered at the end. To fix this problem in a simple
way, this patch avoids to put dentry page in pagecache into highmem.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:03 +09:00
Linus Torvalds
3462ac5703 Refactor support for encrypted symlinks to move common code to fscrypt.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlp2R3AACgkQ8vlZVpUN
 gaOIdAgApEdlFR2Gf93z2hMj5HxVL5rjkuPJVtVkKu0eH2HMQJyxNmjymrRfuFmM
 8W1CrEvVKi5Aj6r8q4KHIdVV247Ya0SVEhLwKM0LX4CvlZUXmwgCmZ/MPDTXA1eq
 C4vPVuJAuSNGNVYDlDs3+NiMHINGNVnBVQQFSPBP9P+iNWPD7o486712qaF8maVn
 RbfbQ2rWtOIRdlAOD1U5WqgQku59lOsmHk2pc0+X4LHCZFpMoaO80JVjENPAw+BF
 daRt6TX+WljMyx6DRIaszqau876CJhe/tqlZcCLOkpXZP0jJS13yodp26dVQmjCh
 w8YdiY7uHK2D+S/8eyj7h7DIwzu3vg==
 =ZjQP
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypt updates from Ted Ts'o:
 "Refactor support for encrypted symlinks to move common code to fscrypt"

Ted also points out about the merge:
 "This makes the f2fs symlink code use the fscrypt_encrypt_symlink()
  from the fscrypt tree. This will end up dropping the kzalloc() ->
  f2fs_kzalloc() change, which means the fscrypt-specific allocation
  won't get tested by f2fs's kmalloc error injection system; which is
  fine"

* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits)
  fscrypt: fix build with pre-4.6 gcc versions
  fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info()
  fscrypt: document symlink length restriction
  fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
  fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
  fscrypt: calculate NUL-padding length in one place only
  fscrypt: move fscrypt_symlink_data to fscrypt_private.h
  fscrypt: remove fscrypt_fname_usr_to_disk()
  ubifs: switch to fscrypt_get_symlink()
  ubifs: switch to fscrypt ->symlink() helper functions
  ubifs: free the encrypted symlink target
  f2fs: switch to fscrypt_get_symlink()
  f2fs: switch to fscrypt ->symlink() helper functions
  ext4: switch to fscrypt_get_symlink()
  ext4: switch to fscrypt ->symlink() helper functions
  fscrypt: new helper function - fscrypt_get_symlink()
  fscrypt: new helper functions for ->symlink()
  fscrypt: trim down fscrypt.h includes
  fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
  fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
  ...
2018-02-04 10:43:12 -08:00
Linus Torvalds
255442c938 Documentation updates for 4.16. New stuff includes refcount_t
documentation, errseq documentation, kernel-doc support for nested
 structure definitions, the removal of lots of crufty kernel-doc support for
 unused formats, SPDX tag documentation, the beginnings of a manual for
 subsystem maintainers, and lots of fixes and updates.
 
 As usual, some of the changesets reach outside of Documentation/ to effect
 kerneldoc comment fixes.  It also adds the new LICENSES directory, of which
 Thomas promises I do not need to be the maintainer.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJab11TAAoJEI3ONVYwIuV6i1UP/1LgGPHW9Ygq5qaLFbReZd/u
 Mx/orrhHX0PdkbCCE+CbL8Vm1m4UKFDTBdlpk3s542zxeeG0ZBXuTnvq4Kyk+cTN
 p4/vsIEzk/Ih13/glGE5MlV+EjiEK+8hK69TIUj7bAyuHmpzofjRz9/1M6RLDGDC
 HY6UI58AXG0yOQWMWCGRMYpQAFUGij2equ7Doe1ugXRq14dx7V4RsOhI140iRk7t
 bquAq1rS2fXniiuPFmLBUe4dWW28isVa/Vl/aXcaWQDKMyT0OLhjOMW36wWKqtPi
 WdVCpHv1NLZNyZZr9S3kvfOwW+BUqpEzfVwssyBLW4h0tsnIx0U0HVhSTY8/TvFZ
 QD9yCSana4LB/e5CHXIX5lBHbjHxf+rETXqVV4MgwDaMvM3mCo4X6WUTJDmZADo6
 vQISEKeb4su5uWAbc9T9xwRSLhZnFVdJ/QuYdNQ5+EpFJYLhzQ9eBvEz6JstSIXL
 p9ASBiPNY3ulpVZ8q0JOHJRBhq5mHJH6Dy8achzbILy2l/ZI4b8lJ53mw9II04cp
 puF96E6HpvuZ8Tgjjrg9U3ZdxXNrUgc/tjk2ZDkyTglk1XF2jKSq2tiNSZ3oLrJm
 XqJPnpCeyJM5UDvwkIBzgC41WEHwe8uvoNbUnc4X7UJSZegFzcSLQXf5qaprHS5k
 XeQ7sbd+S+jzVVjFi0W5
 =Z15Z
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.16' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Documentation updates for 4.16.

  New stuff includes refcount_t documentation, errseq documentation,
  kernel-doc support for nested structure definitions, the removal of
  lots of crufty kernel-doc support for unused formats, SPDX tag
  documentation, the beginnings of a manual for subsystem maintainers,
  and lots of fixes and updates.

  As usual, some of the changesets reach outside of Documentation/ to
  effect kerneldoc comment fixes. It also adds the new LICENSES
  directory, of which Thomas promises I do not need to be the
  maintainer"

* tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits)
  linux-next: docs-rst: Fix typos in kfigure.py
  linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt
  Documentation: Fix misconversion of #if
  docs: add index entry for networking/msg_zerocopy
  Documentation: security/credentials.rst: explain need to sort group_list
  LICENSES: Add MPL-1.1 license
  LICENSES: Add the GPL 1.0 license
  LICENSES: Add Linux syscall note exception
  LICENSES: Add the MIT license
  LICENSES: Add the BSD-3-clause "Clear" license
  LICENSES: Add the BSD 3-clause "New" or "Revised" License
  LICENSES: Add the BSD 2-clause "Simplified" license
  LICENSES: Add the LGPL-2.1 license
  LICENSES: Add the LGPL 2.0 license
  LICENSES: Add the GPL 2.0 license
  Documentation: Add license-rules.rst to describe how to properly identify file licenses
  scripts: kernel_doc: better handle show warnings logic
  fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at
  doc: md: Fix a file name to md-fault.c in fault-injection.txt
  errseq: Add to documentation tree
  ...
2018-01-31 19:25:25 -08:00
Linus Torvalds
3da90b159b f2fs-for-4.16-rc1
In this round, we've followed up to support some generic features such as
 cgroup, block reservation, linking fscrypt_ops, delivering write_hints,
 and some ioctls. And, we could fix some corner cases in terms of power-cut
 recovery and subtle deadlocks.
 
 Enhancement:
  - bitmap operations to handle NAT blocks
  - readahead to improve readdir speed
  - switch to use fscrypt_*
  - apply write hints for direct IO
  - add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid
  - modify b_avail and b_free to consider root reserved blocks
  - support cgroup writeback
  - support FIEMAP_FLAG_XATTR for fibmap
  - add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents
  - add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks
  - support inode creation time
 
 Bug fix:
  - sysfile-based quota operations
  - memory footprint accounting
  - allow to write data on partial preallocation case
  - fix deadlock case on fallocate
  - fix to handle fill_super errors
  - fix missing inode updates of fsync'ed file
  - recover renamed file which was fsycn'ed before
  - drop inmemory pages in corner error case
  - keep last_disk_size correctly
  - recover missing i_inline flags during roll-forward
 
 Various clean-up patches were added as well.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlpw7uUACgkQQBSofoJI
 UNIDEA//d0ScVxEWN6s2Qba5wa2XnTcIi3upzB61gYAkeKdy/8zan8Dp7yaZWh+h
 hAMG7iuaNZS6fwkNvF4SRf3zChIDDXofJneQVE5x3eyHpYCJTIBRijV/dtZ1Fzzd
 q+hrl3JxMKx1jMezh2tDxWnznL8Fgu34cz3UmcF/YBniM99nNP7ri4HsDB+r/691
 Yo/Z1nN3VEJRrHiIfTK2eR8LmlvUFWBq21R9mszvPTYpUz3GGJ5bInY1r92nzMC9
 1EDk4e0RFvV2p/CJPmFiOGMDVUb9LJ/J8icXF5FlQ5eE6DNIP6Q4609MJD29sVtE
 mDC11hV15QhKt+huazn/nivcPtwWgjUdyzw6EYJLtUdEaugQarA1iORR2ZNNBxOX
 ZmocX++rby4oHMd+Tl618jcRYOS3hUhHncgw8IxDJH9Kh1vI/4z2wdCfkucH5L/u
 eG5+1qMehE4vnSB2nMvEJdCERR3yHc5qZDUfMZs/e7jHjIUdT399kkAprljdEDSc
 QVlXeM5rdmiILVs9fY6gVgr6BgjzYB+DhrOJ3jQ8xrIOdcoqeN14RduOvZpAdiNr
 IwQgPxNQm8WBJQxomso7ySWotYmGIxWOPjqWtSyfL7TS4Jdiwf7eoo3pUDc8sg7A
 xi6zvDjdT4hmkqLKx71As3V82g6RmY4Ydcyk2XqnBjF26g2Kb68=
 =jZqE
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've followed up to support some generic features such
  as cgroup, block reservation, linking fscrypt_ops, delivering
  write_hints, and some ioctls. And, we could fix some corner cases in
  terms of power-cut recovery and subtle deadlocks.

  Enhancements:
   - bitmap operations to handle NAT blocks
   - readahead to improve readdir speed
   - switch to use fscrypt_*
   - apply write hints for direct IO
   - add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid
   - modify b_avail and b_free to consider root reserved blocks
   - support cgroup writeback
   - support FIEMAP_FLAG_XATTR for fibmap
   - add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents
   - add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks
   - support inode creation time

  Bug fixs:
   - sysfile-based quota operations
   - memory footprint accounting
   - allow to write data on partial preallocation case
   - fix deadlock case on fallocate
   - fix to handle fill_super errors
   - fix missing inode updates of fsync'ed file
   - recover renamed file which was fsycn'ed before
   - drop inmemory pages in corner error case
   - keep last_disk_size correctly
   - recover missing i_inline flags during roll-forward

  Various clean-up patches were added as well"

* tag 'f2fs-for-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (72 commits)
  f2fs: support inode creation time
  f2fs: rebuild sit page from sit info in mem
  f2fs: stop issuing discard if fs is readonly
  f2fs: clean up duplicated assignment in init_discard_policy
  f2fs: use GFP_F2FS_ZERO for cleanup
  f2fs: allow to recover node blocks given updated checkpoint
  f2fs: recover some i_inline flags
  f2fs: correct removexattr behavior for null valued extended attribute
  f2fs: drop page cache after fs shutdown
  f2fs: stop gc/discard thread after fs shutdown
  f2fs: hanlde error case in f2fs_ioc_shutdown
  f2fs: split need_inplace_update
  f2fs: fix to update last_disk_size correctly
  f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
  f2fs: clean up error path of fill_super
  f2fs: avoid hungtask when GC encrypted block if io_bits is set
  f2fs: allow quota to use reserved blocks
  f2fs: fix to drop all inmem pages correctly
  f2fs: speed up defragment on sparse file
  f2fs: support F2FS_IOC_PRECACHE_EXTENTS
  ...
2018-01-30 19:07:32 -08:00
Chao Yu
1c1d35df71 f2fs: support inode creation time
This patch adds creation time field in inode layout to support showing
kstat.btime in ->statx.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 14:10:39 -08:00
Yunlei He
068c3cd858 f2fs: rebuild sit page from sit info in mem
This patch rebuild sit page from sit info in mem instead
of issue a read io.

I test this method and the result is as below:

Pre:
 mmc_perf_test-12061 [001] ...1   976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [001] ...1   976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1   998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1   999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [002] ...1  1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [001] ...1  1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [001] ...1  1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [002] ...1  1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [002] ...1  1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [002] ...1  1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit

Some sit flush consume more than 50ms.

Now:
mmc_perf_test-2187  [007] ...1   196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [007] ...1   196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [007] ...1   219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [007] ...1   219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [002] ...1   243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [002] ...1   265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [002] ...1   265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [000] ...1   290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [003] ...1   317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [003] ...1   317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [005] ...1   343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [005] ...1   343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [000] ...1   370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [001] ...1   397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [001] ...1   397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [003] ...1   425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [003] ...1   425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit

Most sit flush consume no more than 1ms.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 10:44:34 -08:00
Chao Yu
3b60d802d9 f2fs: stop issuing discard if fs is readonly
If filesystem is readonly, stop to issue discard in daemon.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 10:40:10 -08:00
Chao Yu
6819b884e0 f2fs: clean up duplicated assignment in init_discard_policy
Remove duplicated codes of assignment for .max_requests and .io_aware_gran.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 10:40:01 -08:00
Chao Yu
2882d34310 f2fs: use GFP_F2FS_ZERO for cleanup
Clean up codes with GFP_F2FS_ZERO, no logic changes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 10:39:49 -08:00
Jaegeuk Kim
f236792311 f2fs: allow to recover node blocks given updated checkpoint
If fsck.f2fs changes crc, we have no way to recover some inode blocks by roll-
forward recovery. Let's relax the condition to recover them.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-22 14:56:59 -08:00