Commit Graph

1944 Commits

Author SHA1 Message Date
Chao Yu
b6895e8f99 f2fs: fix memory leak of write_io_dummy mempool during umount
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:40:09 -08:00
Chao Yu
540faedb00 f2fs: fix to update F2FS_{CP_}WB_DATA count correctly
We should only account F2FS_{CP_}WB_DATA IOs for write path, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:40:08 -08:00
Kinglong Mee
f0cdbfe6ef f2fs: use MAX_FREE_NIDS for the free nids target
F2FS has define MAX_FREE_NIDS for maximum of cached free nids target.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:48 -08:00
Chao Yu
4ac912427c f2fs: introduce free nid bitmap
In scenario of intensively node allocation, free nids will be ran out
soon, then it needs to stop to load free nids by traversing NAT blocks,
in worse case, if NAT blocks does not be cached in memory, it generates
IOs which slows down our foreground operations.

In order to speed up node allocation, in this patch we introduce a new
free_nid_bitmap array, so there is an bitmap table for each NAT block,
Once the NAT block is loaded, related bitmap cache will be switched on,
and bitmap will be set during traversing nat entries in NAT block, later
we can query and update nid usage status in memory completely.

With such implementation, I expect performance of node allocation can be
improved in the long-term after filesystem image is mounted.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:47 -08:00
Kinglong Mee
ced2c7ea8e f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint
There are four places that getting the crc value in f2fs_checkpoint,
just add a new helper cur_cp_crc for them.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:47 -08:00
Kinglong Mee
727ebb091e f2fs: update the comment of default nr_pages to skipping
Fixes: 2c237ebaa4 ("f2fs: avoid writing node/metapages during writes")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:46 -08:00
Kinglong Mee
0ec4a5b647 f2fs: drop the duplicate pval in f2fs_getxattr
Fixes: ba38c27eb9 ("f2fs: enhance lookup xattr")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:45 -08:00
Kinglong Mee
5f35a2cd5b f2fs: Don't update the xattr data that same as the exist
f2fs removes the old xattr data and appends the new data although
the new data is same as the exist.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:44 -08:00
Chao Yu
317e130096 f2fs: kill __is_extent_same
Since commit ee6d182f2a ("f2fs: remove syncing inode page in all the
cases") delayed inode element updating from inode cache to node page
cache, so once largest cached extent is updated, we can make inode dirty
immediately instead of checking and updating it in the end of extent
cache update.

The above commit didn't clean up unneeded codes in extent_cache.c, let's
finish the job in this patch.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:43 -08:00
Hou Pengyang
19f4e688f8 f2fs: avoid bggc->fggc when enough free segments are avaliable after cp
We use has_not_enough_free_secs to check if there are enough free segments,

    	(free_sections(sbi) + freed) <=
	        (node_secs + 2 * dent_secs + imeta_secs +
			         reserved_sections(sbi) + needed);

Under scenario with large number of dirty nodes, these nodes would be flushed
during cp, as a result, right side of the inequality would be decreased, while
left side stays unchanged if these nodes are flushed in SSR way, which means
there are enough free segments after this cp.

For this case, we just do a bggc instead of fggc.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:37 -08:00
Chao Yu
d27c3d89db f2fs: select target segment with closer temperature in SSR mode
In SSR mode, we can allocate target segment which has different
temperature type from the type of current block, in order to avoid
mixing coldest and hottest data/node as much as possible, change
SSR allocation policy to select closer temperature for current
block prior.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:56 -08:00
Chao Yu
55523519bc f2fs: show simple call stack in fault injection message
Previously kernel message can show that in which function we do the
injection, but unfortunately, most of the caller are the same, for
tracking more information of injection path, it needs to show upper
caller's name. This patch supports that ability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:55 -08:00
Yunlei He
dd7b2333e6 f2fs: no need lock_op in f2fs_write_inline_data
Similar as f2fs_write_inode, f2fs_write_inline_data just
mark inode page dirty, so it's no need to write inline data
under read lock of cp_rwsem.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:55 -08:00
Jaegeuk Kim
22ad0b6ab4 f2fs: add bitmaps for empty or full NAT blocks
This patches adds bitmaps to represent empty or full NAT blocks containing
free nid entries.

If we can find valid crc|cp_ver in the last block of checkpoint pack, we'll
use these bitmaps when building free nids. In order to avoid checkpointing
burden, up-to-date bitmaps will be flushed only during umount time. So,
normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs
which recovers this bitmap again.

After this patch, we build free nids from nid #0 at mount time to make more
full NAT blocks, but in runtime, we check empty NAT blocks to load free nids
without loading any NAT pages from disk.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:54 -08:00
Yunlei He
5e8256ac2e f2fs: replace rw semaphore extent_tree_lock with mutex lock
This patch replace rw semaphore extent_tree_lock with mutex lock
for no read cases with this lock.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:53 -08:00
Kinglong Mee
3f2be04304 f2fs: avoid m_flags overlay when allocating more data blocks
When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED
flags will be overlapped by F2FS_MAP_NEW at the later times.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:52 -08:00
Hou Pengyang
6bfaf7b150 f2fs: remove unsafe bitmap checking
proc A:                      proc B:
- writeback_sb_inodes
- __writeback_single_inode
- do_writepages
- f2fs_write_node_pages
- f2fs_balance_fs_bg         - write_checkpoint
- build_free_nids            - flush_nat_entries
- __build_free_nids          - __flush_nat_entry_set
- ra_meta_pages              - get_next_nat_page
- current_nat_addr           - set_to_next_nat
[do nat_bitmap checking]     - f2fs_change_bit

For proc A, nat_bitmap and nat_bitmap_mir would be compared without lock_op and
nm_i->nat_tree_lock, while proc B is changing nat_bitmap/nat_bitmap_ver in cp.

So it is normal for nat_bitmap/nat_bitmap diffrence under such scenario.

This patch fix this by removing the monitoring point.

[Fix: 599a09b f2fs: check in-memory nat version bitmap]
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:51 -08:00
Hou Pengyang
e15882b6c6 f2fs: init local extent_info to avoid stale stack info in tp
To avoid such stale(fops, blk, len) info in f2fs_lookup_extent_tree_end tp

dio-23095 [005] ...1 17878.856859: f2fs_lookup_extent_tree_end:
			dev = (259,30), ino = 856, pgofs = 0,
			ext_info(fofs: 3441207040, blk: 4294967232, len: 3481143808)

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:50 -08:00
Yunlong Song
77190e1f31 f2fs: remove unnecessary condition check for write_checkpoint in f2fs_gc
Since has_not_enough_free_secs(sbi, 0, 0) must be true if has_not_enough_
free_secs(sbi, sec_freed, 0) is true, write_checkpoint is sure to execute in
both conditions.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:50 -08:00
Jaegeuk Kim
9259228571 f2fs: check discard alignment only for SEQWRITE zones
For converntional zones, we don't need to align discard commands to exact zone
size.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:46 -08:00
Jaegeuk Kim
40465257ac f2fs: wait for discard completion after submission
We don't need to wait for each discard commands when unmounting the image.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:39 -08:00
Jaegeuk Kim
47b8980816 f2fs: much larger batched trim_fs job
We have a kernel thread to issue discard commands, so we can increase the
number of batched discard sections. By default, now it becomes 4GB range.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:30 -08:00
Jaegeuk Kim
ad4d307fce f2fs: avoid very large discard command
This patch adds MAX_DISCARD_BLOCKS() to avoid issuing too much large single
discard command.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 09:59:20 -08:00
Dave Jiang
11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Jaegeuk Kim
70d625cbdb f2fs: do SSR for node segments more aggresively
This patch gives more SSR chances for node blocks.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-24 10:01:41 -08:00
Jaegeuk Kim
c192f7a477 f2fs: find data segments across all the types
Previously, if type is CURSEG_HOT_DATA, we only check CURSEG_HOT_DATA only.
This patch fixes to search all the different types for SSR.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-24 10:01:08 -08:00
Jaegeuk Kim
d0db7703ac f2fs: do SSR in higher priority
Let's check SSR in prior to LFS allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-24 09:39:53 -08:00
Yunlong Song
035e97adab f2fs: do SSR for data when there is enough free space
In allocate_segment_by_default(), need_SSR() already detected it's time to do
SSR. So, let's try to find victims for data segments more aggressively in time.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-24 09:39:52 -08:00
Hou Pengyang
b9cd20619e f2fs: node segment is prior to data segment selected victim
As data segment gc may lead dnode dirty, so the greedy cost for data segment
should be valid blocks * 2, that is data segment is prior to node segment.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-24 09:39:40 -08:00
Yunlong Song
3436c4bdb3 f2fs: put allocate_segment after refresh_sit_entry
SIT information should be updated before segment allocation, since SSR needs
latest valid block information. Current code does not update the old_blkaddr
info in sit_entry, so adjust the allocate_segment to its proper location. Commit
5e443818fa ("f2fs: handle dirty segments inside
refresh_sit_entry") puts it into wrong location.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-24 09:37:30 -08:00
Hou Pengyang
e93b986525 f2fs: add ovp valid_blocks check for bg gc victim to fg_gc
For foreground gc, greedy algorithm should be adapted, which makes
this formula work well:

	(2 * (100 / config.overprovision + 1) + 6)

But currently, we fg_gc have a prior to select bg_gc victim segments to gc
first, these victims are selected by cost-benefit algorithm, we can't guarantee
such segments have the small valid blocks, which may destroy the f2fs rule, on
the worstest case, would consume all the free segments.

This patch fix this by add a filter in check_bg_victims, if segment's has # of
valid blocks over overprovision ratio, skip such segments.

Cc: <stable@vger.kernel.org>
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 11:28:20 -08:00
Jaegeuk Kim
86d54795c9 f2fs: do not wait for writeback in write_begin
Otherwise we can get livelock like below.

[79880.428136] dbench          D    0 18405  18404 0x00000000
[79880.428139] Call Trace:
[79880.428142]  __schedule+0x219/0x6b0
[79880.428144]  schedule+0x36/0x80
[79880.428147]  schedule_timeout+0x243/0x2e0
[79880.428152]  ? update_sd_lb_stats+0x16b/0x5f0
[79880.428155]  ? ktime_get+0x3c/0xb0
[79880.428157]  io_schedule_timeout+0xa6/0x110
[79880.428161]  __lock_page+0xf7/0x130
[79880.428164]  ? unlock_page+0x30/0x30
[79880.428167]  pagecache_get_page+0x16b/0x250
[79880.428171]  grab_cache_page_write_begin+0x20/0x40
[79880.428182]  f2fs_write_begin+0xa2/0xdb0 [f2fs]
[79880.428192]  ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs]
[79880.428197]  ? kmem_cache_free+0x79/0x200
[79880.428203]  ? __mark_inode_dirty+0x17f/0x360
[79880.428206]  generic_perform_write+0xbb/0x190
[79880.428213]  ? file_update_time+0xa4/0xf0
[79880.428217]  __generic_file_write_iter+0x19b/0x1e0
[79880.428226]  f2fs_file_write_iter+0x9c/0x180 [f2fs]
[79880.428231]  __vfs_write+0xc5/0x140
[79880.428235]  vfs_write+0xb2/0x1b0
[79880.428238]  SyS_write+0x46/0xa0
[79880.428242]  entry_SYSCALL_64_fastpath+0x1e/0xad

Fixes: cae96a5c8a ("f2fs: check io submission more precisely")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 11:23:27 -08:00
Yunlei He
05eeb118a0 f2fs: replace __get_victim by dirty_segments in FG_GC
In FG_GC process, it will search victim section twice. This will
cause some dirty section with less valid blocks skip garbage
collection.

section # 26425 : valid blocks # 3
142.037567: get_victim_by_default: victim 26425 : valid blocks # 3
142.037585: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.039494: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19022 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.070247: new_curseg: Debug: alloc new segment 26746
142.244341: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26054 ofs_unit = 1, pre_victim_secno = 26054, prefree = 0, free = 243
142.254475: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.293131: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23466 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.319001: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23467 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.368879: get_victim_by_default: victim 26425 : valid blocks # 3
142.368894: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.378127: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19612 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.416917: new_curseg: Debug: alloc new segment 26054
142.656794: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 25404 ofs_unit = 1, pre_victim_secno = 25404, prefree = 0, free = 243
142.662139: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.684159: new_curseg: Debug: alloc new segment 25197
142.685059: get_victim_by_default: victim 26425 : valid blocks # 3
142.685079: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 243
142.701427: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26238 ofs_unit = 1, pre_victim_secno = 26238, prefree = 0, free = 243
142.707105: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.802444: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23473 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.804422: get_victim_by_default: victim 26425 : valid blocks # 3
142.804443: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.851567: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19092 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.865014: new_curseg: Debug: alloc new segment 26238
143.082245: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26307 ofs_unit = 1, pre_victim_secno = 26307, prefree = 0, free = 244
143.088252: do_garbage_collect: Debug: FG_GC, seg_freed = 1
143.128307: new_curseg: Debug: alloc new segment 25404
143.181846: get_victim_by_default: victim 26425 : valid blocks # 3
143.181872: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 11:23:26 -08:00
Jaegeuk Kim
88c5c13a50 f2fs: fix multiple f2fs_add_link() calls having same name
It turns out a stakable filesystem like sdcardfs in AOSP can trigger multiple
vfs_create() to lower filesystem. In that case, f2fs will add multiple dentries
having same name which breaks filesystem consistency.

Until upper layer fixes, let's work around by f2fs, which shows actually not
much performance regression.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 11:23:25 -08:00
Jaegeuk Kim
d50aaeec90 f2fs: show actual device info in tracepoints
This patch shows actual device information in the tracepoints.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 11:23:24 -08:00
Jaegeuk Kim
5b6c6be2d8 f2fs: use SSR for warm node as well
We have had node chains, but haven't used it so far due to stale node blocks.
Now, we have crc|cp_ver in node footer and give random cp_ver at format time,
we can start to use it again.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 11:23:22 -08:00
Chao Yu
39133a5015 f2fs: enable inline_xattr by default
In android, since SElinux is enable, security policy will be appliedd for
each file, it stores in inode as an xattr entry, so it will take one 4k
size node block additionally for each file.

Let's enable inline_xattr by default in order to save storage space.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:21:49 -08:00
Chao Yu
23cf7212a1 f2fs: introduce noinline_xattr mount option
This patch introduces new mount option 'noinline_xattr', so we can disable
inline xattr functionality which is already set as a default mount option.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:21:48 -08:00
Jaegeuk Kim
25cc5d3b9d f2fs: avoid reading NAT page by get_node_info
We've not seen this buggy case for a long time, so it's time to avoid this
unnecessary get_node_info() call which reading NAT page to cache nat entry.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:21:47 -08:00
Jaegeuk Kim
9b064f7d0c f2fs: remove build_free_nids() during checkpoint
Let's avoid build_free_nids() in checkpoint path.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:53 -08:00
Chao Yu
d260081ccf f2fs: change recovery policy of xattr node block
Currently, if we call fsync after updating the xattr date belongs to the
file, f2fs needs to trigger checkpoint to keep xattr data consistent. But,
this policy cause low performance as checkpoint will block most foreground
operations and cause unneeded and unrelated IOs around checkpoint.

This patch will reuse regular file recovery policy for xattr node block,
so, we change to write xattr node block tagged with fsync flag to warm
area instead of cold area, and during recovery, we search warm node chain
for fsynced xattr block, and do the recovery.

So, for below application IO pattern, performance can be improved
obviously:
- touch file
- create/update/delete xattr entry in file
- fsync file

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:52 -08:00
Bhumika Goyal
2ad0ef846b f2fs: super: constify fscrypt_operations structure
Declare fscrypt_operations structure as const as it is only stored in
the s_cop field of a super_block structure. This field is of type const,
so fscrypt_operations structure having this property can be made const
too.

File size before: fs/f2fs/super.o
   text	   data	    bss	    dec	    hex	filename
  54131	  31355	    184	  85670	  14ea6	fs/f2fs/super.o

File size after: fs/f2fs/super.o
   text	   data	    bss	    dec	    hex	filename
  54227	  31259	    184	  85670	  14ea6	fs/f2fs/super.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:51 -08:00
Jaegeuk Kim
1200abb26f f2fs: show checkpoint version at mount time
If we mounted f2fs successfully, let's show current checkpoint version.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:50 -08:00
Jaegeuk Kim
7f54f51f46 f2fs: remove preflush for nobarrier case
This patch removes REQ_PREFLUSH in the nobarrier case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:48 -08:00
Jaegeuk Kim
942fd3192f f2fs: check last page index in cached bio to decide submission
If the cached bio has the last page's index, then we need to submit it.
Otherwise, we don't need to submit it and can wait for further IO merges.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:48 -08:00
Jaegeuk Kim
d68f735b3b f2fs: check io submission more precisely
This patch check IO submission more precisely than previous rough check.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:47 -08:00
Jaegeuk Kim
f566bae846 f2fs: call internal __write_data_page directly
This patch introduces __write_data_page to call it by f2fs_write_cache_pages
directly..

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:46 -08:00
Jaegeuk Kim
e7c75ab099 f2fs: avoid out-of-order execution of atomic writes
We need to flush data writes before flushing last node block writes by using
FUA with PREFLUSH. We don't need to guarantee precedent node writes since if
those are not written, we can't reach to the last node block when scanning
node block chain during roll-forward recovery.
Afterwards f2fs_wait_on_page_writeback guarantees all the IO submission to
disk, which builds a valid node block chain.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:10:35 -08:00
Jaegeuk Kim
faa24895ac f2fs: move write_node_page above fsync_node_pages
This patch just moves write_node_page and introduces an inner function.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:09:43 -08:00
Jaegeuk Kim
c1b221078b f2fs: move flush tracepoint
This patch moves the tracepoint location for flush command.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-23 10:08:43 -08:00
Jaegeuk Kim
a00861dbca f2fs: show # of APPEND and UPDATE inodes
This patch shows cached # of APPEND and UPDATE inode entries.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:54:53 -08:00
DongOh Shin
cac5a3d8f5 f2fs: fix 446 coding style warnings in f2fs.h
1) Nine coding style warnings below have been resolved:
"Missing a blank line after declarations"

2) 435 coding style warnings below have been resolved:
"function definition argument 'x' should also have an identifier name"

3) Two coding style warnings below have been resolved:
"macros should not use a trailing semicolon"

Signed-off-by: DongOh Shin <doscode.kr@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:55 -08:00
DongOh Shin
c64ab12e36 f2fs: fix 3 coding style errors in f2fs.h
Two coding style errors below have been resolved:
"Macros with complex values should be enclosed in parentheses"

And a coding style error below has been resolved:
"space prohibited before that ',' (ctx:WxW)"

Signed-off-by: DongOh Shin <doscode.kr@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:55 -08:00
Jaegeuk Kim
8ed5974552 f2fs: declare missing static function
We missed two functions declared as static functions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:54 -08:00
Kaixu Xia
0cc0dec2b6 f2fs: show the fault injection mount option
This patch shows the fault injection mount option in
f2fs_show_options().

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:53 -08:00
Chao Yu
73545817c9 f2fs: fix null pointer dereference when issuing flush in ->fsync
We only allocate flush merge control structure sbi::sm_info::fcc_info when
flush_merge option is on, but in f2fs_issue_flush we still try to access
member of the control structure without that option, it incurs panic as
show below, fix it.

Call Trace:
 __remove_ino_entry+0xa9/0xc0 [f2fs]
 f2fs_do_sync_file.isra.27+0x214/0x6d0 [f2fs]
 f2fs_sync_file+0x18/0x20 [f2fs]
 vfs_fsync_range+0x3d/0xb0
 __do_page_fault+0x261/0x4d0
 do_fsync+0x3d/0x70
 SyS_fsync+0x10/0x20
 do_syscall_64+0x6e/0x180
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f18ce260de0
RSP: 002b:00007ffdd4589258 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f18ce260de0
RDX: 0000000000000006 RSI: 00000000016c0360 RDI: 0000000000000003
RBP: 00000000016c0360 R08: 000000000000ffff R09: 000000000000001f
R10: 00007ffdd4589020 R11: 0000000000000246 R12: 00000000016c0100
R13: 0000000000000000 R14: 00000000016c1f00 R15: 00000000016c0100
Code: fb 81 e3 00 08 00 00 48 89 45 a0 0f 1f 44 00 00 31 c0 85 db 75 27 41 81 e7 00 04 00 00 74 0c 41 8b 45 20 85 c0 0f 85 81 00 00 00 <f0> 41 ff 45 20 4c 89 e7 e8 f8 e9 ff ff f0 41 ff 4d 20 48 83 c4
RIP: f2fs_issue_flush+0x5b/0x170 [f2fs] RSP: ffffc90003b5fd78
CR2: 0000000000000020
---[ end trace a09314c24f037648 ]---

Reported-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:52 -08:00
Chao Yu
dba79f38bc f2fs: fix to avoid overflow when left shifting page offset
We use following method to calculate size with current page index:
size = index << PAGE_SHIFT
If type of index has only 32-bits size, left shifting will incur overflow,
which makes result incorrect.

So let's cast index with 64-bits type to avoid such issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:51 -08:00
Chao Yu
ba38c27eb9 f2fs: enhance lookup xattr
Previously, in getxattr we will load all entries both in inline xattr and
xattr node block, and then do the lookup in all entries, but our lookup
flow shows low efficiency, since if we can lookup and hit in inline xattr
of inode page cache first, we don't need to load and lookup xattr node
block, which can obviously save cpu time and IO latency.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: initialize NULL to avoid warning]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:51 -08:00
Wei Fang
b86e33075e f2fs: fix a dead loop in f2fs_fiemap()
A dead loop can be triggered in f2fs_fiemap() using the test case
as below:

	...
	fd = open();
	fallocate(fd, 0, 0, 4294967296);
	ioctl(fd, FS_IOC_FIEMAP, fiemap_buf);
	...

It's caused by an overflow in __get_data_block():
	...
	bh->b_size = map.m_len << inode->i_blkbits;
	...
map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits
on 64 bits archtecture, type conversion from an unsigned int to a size_t
will result in an overflow.

In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap()
will call get_data_block() at block 0 again an again.

Fix this by adding a force conversion before left shift.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:49 -08:00
Jaegeuk Kim
dc91de78e5 f2fs: do not preallocate blocks which has wrong buffer
Sheng Yong reports needless preallocation if write(small_buffer, large_size)
is called.

In that case, f2fs preallocates large_size, but vfs returns early due to
small_buffer size. Let's detect it before preallocation phase in f2fs.

Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:48 -08:00
Jaegeuk Kim
dcc9165dbf f2fs: show # of on-going flush and discard bios
This patch adds stat information for flush and discard commands.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:47 -08:00
Jaegeuk Kim
1546996348 f2fs: add a kernel thread to issue discard commands asynchronously
This patch adds a kernel thread to issue discard commands.
It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
bio status.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 20:24:45 -08:00
Jaegeuk Kim
0b54fb8458 f2fs: factor out discard command info into discard_cmd_control
This patch adds discard_cmd_control with the existing discarding controls.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:53 -08:00
Jaegeuk Kim
d4adb30f25 f2fs: reorganize stat information
This patch modifies stat information more clearly.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:52 -08:00
Jaegeuk Kim
b01a92019c f2fs: clean up flush/discard command namings
This patch simply cleans up the names for flush/discard commands.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:51 -08:00
Chao Yu
ae27d62e6b f2fs: check in-memory sit version bitmap
This patch adds a mirror for sit version bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:50 -08:00
Chao Yu
599a09b2c1 f2fs: check in-memory nat version bitmap
This patch adds a mirror for nat version bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:49 -08:00
Chao Yu
355e78913c f2fs: check in-memory block bitmap
This patch adds a mirror for valid block bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:49 -08:00
Chao Yu
5fe457430e f2fs: introduce FI_ATOMIC_COMMIT
This patch introduces a new flag to indicate inode status of doing atomic
write committing, so that, we can keep atomic write status for inode
during atomic committing, then we can skip GCing pages of atomic write inode,
that avoids random GCed datas being mixed with current transaction, so
isolation of transaction can be kept.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:48 -08:00
Chao Yu
939afa943c f2fs: clean up with list_{first, last}_entry
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:47 -08:00
Jaegeuk Kim
25290fa559 f2fs: return fs_trim if there is no candidate
If there is no candidate to submit discard command during f2fs_trim_fs, let's
return without checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 18:48:40 -08:00
Jaegeuk Kim
0333ad4e4f f2fs: avoid needless checkpoint in f2fs_trim_fs
The f2fs_trim_fs() doesn't need to do checkpoint if there are newly allocated
data blocks only which didn't change the critical checkpoint data such as nat
and sit entries.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-22 13:16:36 -08:00
Linus Torvalds
6c24337f22 Various cleanups for the file system encryption feature.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlirP6wACgkQ8vlZVpUN
 gaMwpQgApR67CxzlstxYjZpWPAqC8McJ2FBDX+mCOle5Vkc1WQDklwr0oCfQThTj
 eDSFRhNfIvyPh0DJ589PxBCsWOqN5h6Si7hD5ZinomVNI+IL89OytaU5EV2OpWaW
 iKdJgO9Tm8U7LuY6FOIoVdX57kUXVdkWoj61rC056B1SNhnNiVeofi7lYDM8Ix4q
 IGSQ9W24iQKmCk4hCwgObhJBRK9RnlOH0GLUmpMaS+jnfnj/uePwdxWEFsPuCOob
 8acAJ49lr55kjIw79E0BAyWxhEZ2aiArHk8PaWynT/DyNq3ftcapPlpftoeba8vo
 glBJRX70QxPvt0iHEp0ykfExkhWhFA==
 =Joki
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypt updates from Ted Ts'o:
 "Various cleanups for the file system encryption feature"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: constify struct fscrypt_operations
  fscrypt: properly declare on-stack completion
  fscrypt: split supp and notsupp declarations into their own headers
  fscrypt: remove redundant assignment of res
  fscrypt: make fscrypt_operations.key_prefix a string
  fscrypt: remove unused 'mode' member of fscrypt_ctx
  ext4: don't allow encrypted operations without keys
  fscrypt: make test_dummy_encryption require a keyring key
  fscrypt: factor out bio specific functions
  fscrypt: pass up error codes from ->get_context()
  fscrypt: remove user-triggerable warning messages
  fscrypt: use EEXIST when file already uses different policy
  fscrypt: use ENOTDIR when setting encryption policy on nondirectory
  fscrypt: use ENOKEY when file cannot be created w/o key
2017-02-20 18:22:31 -08:00
Eric Biggers
6f69f0ed61 fscrypt: constify struct fscrypt_operations
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Richard Weinberger <richard@nod.at>
2017-02-08 10:59:57 -05:00
Eric Biggers
46f47e4800 fscrypt: split supp and notsupp declarations into their own headers
Previously, each filesystem configured without encryption support would
define all the public fscrypt functions to their notsupp_* stubs.  This
list of #defines had to be updated in every filesystem whenever a change
was made to the public fscrypt functions.  To make things more
maintainable now that we have three filesystems using fscrypt, split the
old header fscrypto.h into several new headers.  fscrypt_supp.h contains
the real declarations and is included by filesystems when configured
with encryption support, whereas fscrypt_notsupp.h contains the inline
stubs and is included by filesystems when configured without encryption
support.  fscrypt_common.h contains common declarations needed by both.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-06 23:26:43 -05:00
Jaegeuk Kim
4e6a8d9b22 f2fs: relax async discard commands more
This patch relaxes async discard commands to avoid waiting its end_io during
checkpoint.
Instead of waiting them during checkpoint, it will be done when actually reusing
them.

Test on initial partition of nvme drive.

 # time fstrim /mnt/test

Before : 6.158s
After : 4.822s

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
bb95d9ab2a f2fs: drop exist_data for inline_data when truncated to 0
A test program gets the SEEK_DATA with two values between
a new created file and the exist file on f2fs filesystem.

F2FS filesystem,  (the first "test1" is a new file)
SEEK_DATA size != 0 (offset = 8192)
SEEK_DATA size != 0 (offset = 4096)

PNFS filesystem, (the first "test1" is a new file)
SEEK_DATA size != 0 (offset = 4096)
SEEK_DATA size != 0 (offset = 4096)

int main(int argc, char **argv)
{
        char *filename = argv[1];
        int offset = 1, i = 0, fd = -1;

        if (argc < 2) {
                printf("Usage: %s f2fsfilename\n", argv[0]);
                return -1;
        }

        /*
        if (!access(filename, F_OK) || errno != ENOENT) {
                printf("Needs a new file for test, %m\n");
                return -1;
        }*/

        fd = open(filename, O_RDWR | O_CREAT, 0777);
        if (fd < 0) {
                printf("Create test file %s failed, %m\n", filename);
                return -1;
        }

        for (i = 0; i < 20; i++) {
                offset = 1 << i;
                ftruncate(fd, 0);
                lseek(fd, offset, SEEK_SET);
                write(fd, "test", 5);
                /* Get the alloc size by seek data equal zero*/
                if (lseek(fd, 0, SEEK_DATA)) {
                        printf("SEEK_DATA size != 0 (offset = %d)\n", offset);
                        break;
                }
        }

        close(fd);
        return 0;
}

Reported-and-Tested-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
363fa4e078 f2fs: don't allow encrypted operations without keys
This patch fixes the renaming bug on encrypted filenames, which was pointed by

 (ext4: don't allow encrypted operations without keys)

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
26a28a0c1e f2fs: show the max number of atomic operations
This patch adds to show the max number of atomic operations which are
conducting concurrently.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
ec91538dcc f2fs: get io size bit from mount option
This patch adds to set io_size_bits from mount option.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
0a595ebaaa f2fs: support IO alignment for DATA and NODE writes
This patch implements IO alignment by filling dummy blocks in DATA and NODE
write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs,
we can eliminate underlying dummy page problem which FTL conducts in order to
close MLC or TLC partial written pages.

Note that,
 - it requires "-o mode=lfs".
 - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256.
 - read IO is still 4KB.
 - do checkpoint at fsync, if dummy NODE page was written.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
554b5125f5 f2fs: add submit_bio tracepoint
This patch adds final submit_bio() tracepoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Jaegeuk Kim
9d52a504db f2fs: reassign new segment for mode=lfs
Otherwise we can remain wrong curseg->next_blkoff, resulting in fsck failure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Yunlei He
650d3c4e56 f2fs: fix a missing discard prefree segments
If userspace issue a fstrim with a range not involve prefree segments,
it will reuse these segments without discard. This patch fix it.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Geliang Tang
ed0b56209f f2fs: use rb_entry_safe
Use rb_entry_safe() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Yunlei He
746e240392 f2fs: add a case of no need to read a page in write begin
If the range we write cover the whole valid data in the last page,
we do not need to read it.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Yunlei He
7855eba4d6 f2fs: fix a problem of using memory after free
This patch fix a problem of using memory after free
in function __try_merge_extent_node.

Fixes: 0f825ee6e8 ("f2fs: add new interfaces for extent tree")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:01 +09:00
Dan Carpenter
07fe8d4440 f2fs: remove unneeded condition
We checked that "inode" is not an error pointer earlier so there is
no need to check again here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:00 +09:00
Chao Yu
5c9e418436 f2fs: don't cache nat entry if out of memory
If we run out of memory, in cache_nat_entry, it's better to avoid loop
for allocating memory to cache nat entry, so in low memory scenario, for
read path of node block, I expect this can avoid unneeded latency.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:00 +09:00
Yunlei He
fed2466848 f2fs: remove unused values in recover_fsync_data
This patch remove unused values in function recover_fsync_data

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-01-29 12:46:00 +09:00
Damien Le Moal
f99e86485c block: Rename blk_queue_zone_size and bdev_zone_size
All block device data fields and functions returning a number of 512B
sectors are by convention named xxx_sectors while names in the form
xxx_size are generally used for a number of bytes. The blk_queue_zone_size
and bdev_zone_size functions were not following this convention so rename
them.

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Collapsed the two patches, they were nonsensically split and broke
bisection.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-12 07:58:32 -07:00
Eric Biggers
a5d431eff2 fscrypt: make fscrypt_operations.key_prefix a string
There was an unnecessary amount of complexity around requesting the
filesystem-specific key prefix.  It was unclear why; perhaps it was
envisioned that different instances of the same filesystem type could
use different key prefixes, or that key prefixes could be binary.
However, neither of those things were implemented or really make sense
at all.  So simplify the code by making key_prefix a const char *.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-01-08 01:03:41 -05:00
Eric Biggers
54475f531b fscrypt: use ENOKEY when file cannot be created w/o key
As part of an effort to clean up fscrypt-related error codes, make
attempting to create a file in an encrypted directory that hasn't been
"unlocked" fail with ENOKEY.  Previously, several error codes were used
for this case, including ENOENT, EACCES, and EPERM, and they were not
consistent between and within filesystems.  ENOKEY is a better choice
because it expresses that the failure is due to lacking the encryption
key.  It also matches the error code returned when trying to open an
encrypted regular file without the key.

I am not aware of any users who might be relying on the previous
inconsistent error codes, which were never documented anywhere.

This failure case will be exercised by an xfstest.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-31 16:26:20 -05:00
Linus Torvalds
231753ef78 Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull partial readlink cleanups from Miklos Szeredi.

This is the uncontroversial part of the readlink cleanup patch-set that
simplifies the default readlink handling.

Miklos and Al are still discussing the rest of the series.

* git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  vfs: make generic_readlink() static
  vfs: remove ".readlink = generic_readlink" assignments
  vfs: default to generic_readlink()
  vfs: replace calling i_op->readlink with vfs_readlink()
  proc/self: use generic_readlink
  ecryptfs: use vfs_get_link()
  bad_inode: add missing i_op initializers
2016-12-17 19:16:12 -08:00
Linus Torvalds
5084fdf081 This merge request includes the dax-4.0-iomap-pmd branch which is
needed for both ext4 and xfs dax changes to use iomap for DAX.  It
 also includes the fscrypt branch which is needed for ubifs encryption
 work as well as ext4 encryption and fscrypt cleanups.
 
 Lots of cleanups and bug fixes, especially making sure ext4 is robust
 against maliciously corrupted file systems --- especially maliciously
 corrupted xattr blocks and a maliciously corrupted superblock.  Also
 fix ext4 support for 64k block sizes so it works well on ppcle.  Fixed
 mbcache so we don't miss some common xattr blocks that can be merged.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlhQQVEACgkQ8vlZVpUN
 gaN9TQgAoCD+V4kJjMCFhiV8u6QR3hqD6bOZbggo5wJf4CHglWkmrbAmc3jANOgH
 CKsXDRRjxuDjPXf1ukB1i4M7ArLYjkbbzKdsu7lismoJLS+w8uwUKSNdep+LYMjD
 alxUcf5DCzLlUmdOdW4yE22L+CwRfqfs8IpBvKmJb7DrAKiwJVA340ys6daBGuu1
 63xYx0QIyPzq0xjqLb6TVf88HUI4NiGVXmlm2wcrnYd5966hEZd/SztOZTVCVWOf
 Z0Z0fGQ1WJzmaBB9+YV3aBi+BObOx4m2PUprIa531+iEW02E+ot5Xd4vVQFoV/r4
 NX3XtoBrT1XlKagy2sJLMBoCavqrKw==
 =j4KP
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "This merge request includes the dax-4.0-iomap-pmd branch which is
  needed for both ext4 and xfs dax changes to use iomap for DAX. It also
  includes the fscrypt branch which is needed for ubifs encryption work
  as well as ext4 encryption and fscrypt cleanups.

  Lots of cleanups and bug fixes, especially making sure ext4 is robust
  against maliciously corrupted file systems --- especially maliciously
  corrupted xattr blocks and a maliciously corrupted superblock. Also
  fix ext4 support for 64k block sizes so it works well on ppcle. Fixed
  mbcache so we don't miss some common xattr blocks that can be merged"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
  dax: Fix sleep in atomic contex in grab_mapping_entry()
  fscrypt: Rename FS_WRITE_PATH_FL to FS_CTX_HAS_BOUNCE_BUFFER_FL
  fscrypt: Delay bounce page pool allocation until needed
  fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page()
  fscrypt: Cleanup fscrypt_{decrypt,encrypt}_page()
  fscrypt: Never allocate fscrypt_ctx on in-place encryption
  fscrypt: Use correct index in decrypt path.
  fscrypt: move the policy flags and encryption mode definitions to uapi header
  fscrypt: move non-public structures and constants to fscrypt_private.h
  fscrypt: unexport fscrypt_initialize()
  fscrypt: rename get_crypt_info() to fscrypt_get_crypt_info()
  fscrypto: move ioctl processing more fully into common code
  fscrypto: remove unneeded Kconfig dependencies
  MAINTAINERS: fscrypto: recommend linux-fsdevel for fscrypto patches
  ext4: do not perform data journaling when data is encrypted
  ext4: return -ENOMEM instead of success
  ext4: reject inodes with negative size
  ext4: remove another test in ext4_alloc_file_blocks()
  Documentation: fix description of ext4's block_validity mount option
  ext4: fix checks for data=ordered and journal_async_commit options
  ...
2016-12-14 09:17:42 -08:00
Linus Torvalds
09cb6464fe for-f2fs-4.10
This patch series contains several performance tuning patches regarding to the
 IO submission flow, in addition to supporting new features such as a ZBC-base
 drive and multiple devices.
 
 It also includes some major bug fixes such as:
  - checkpoint version control
  - fdatasync-related roll-forward recovery routine
  - memory boundary or null-pointer access in corner cases
  - missing error cases
 
 It has various minor clean-up patches as well.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYTx44AAoJEEAUqH6CSFDSnAQP/jeYJq5Zd0bweEF5g00Ec1Qg
 qNKQ57e9EHDRaDLBUmHHEaCEPRL0bw6SOUUWWqzGA07KcsIK+Yb/dGAyIcuV7WMl
 PjntVbYm4yARDYBHGupdOCzFSkzr8gDalb+98jJnoGUonsftljhES9jedQ1NjAms
 GFPHDNtirZM/r0bjKkYKjpqJ6FCxFxcGPfb/GtohDajIpohWfKZiemaXGTgtYR4d
 iBVek16h+Hprz90ycZBY69uz0TdAwu/gb+htMVBrAdExHWvlFzgp35OIywiAB/YX
 3QD/x4t2HqOBaNYiiOAY4ukVW/Yyqa/ZAzbm+m5B5CAcFYiWXMy+cMXUY9HJJ/K0
 wdvi//Avtvgpp2PVZFn2pASx14vgMFylBzuNgKpP6MPdtWTEL33jT7VYs9Nuz45E
 dgZ9IpiDt4DeTRuZ4mPO5iH7bVHPvAVV80bpXzirCCzDeNZ1EFFIQzXh/2UAmCxI
 twPXGBIYul0aIl9JkWAyhCZSd3XDSqedpfPudknjhzM9Xb1H5X0QJco7f/UwsWXH
 WxV6lHr1Q7UH96wJ7x/GAqj8ArOAASRV18+K51dqU+DWHnFPpBArJe39FVf8NGWs
 Fz1ZmlWBQ0ZgzvLkGa80llhjalXIEy/JabMrpy6VrzQGxHdmW4cVxe4dJ3710WxX
 VysJUcNMRKxMUTWOKsxp
 =Boum
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch series contains several performance tuning patches
  regarding to the IO submission flow, in addition to supporting new
  features such as a ZBC-base drive and multiple devices.

  It also includes some major bug fixes such as:
   - checkpoint version control
   - fdatasync-related roll-forward recovery routine
   - memory boundary or null-pointer access in corner cases
   - missing error cases

  It has various minor clean-up patches as well"

* tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
  f2fs: fix a missing size change in f2fs_setattr
  f2fs: fix to access nullified flush_cmd_control pointer
  f2fs: free meta pages if sanity check for ckpt is failed
  f2fs: detect wrong layout
  f2fs: call sync_fs when f2fs is idle
  Revert "f2fs: use percpu_counter for # of dirty pages in inode"
  f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
  f2fs: do not activate auto_recovery for fallocated i_size
  f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
  f2fs: fix 32-bit build
  f2fs: set ->owner for debugfs status file's file_operations
  f2fs: fix incorrect free inode count in ->statfs
  f2fs: drop duplicate header timer.h
  f2fs: fix wrong AUTO_RECOVER condition
  f2fs: do not recover i_size if it's valid
  f2fs: fix fdatasync
  f2fs: fix to account total free nid correctly
  f2fs: fix an infinite loop when flush nodes in cp
  f2fs: don't wait writeback for datas during checkpoint
  f2fs: fix wrong written_valid_blocks counting
  ...
2016-12-14 09:07:36 -08:00
Linus Torvalds
36869cb93d Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the main block pull request this series. Contrary to previous
  release, I've kept the core and driver changes in the same branch. We
  always ended up having dependencies between the two for obvious
  reasons, so makes more sense to keep them together. That said, I'll
  probably try and keep more topical branches going forward, especially
  for cycles that end up being as busy as this one.

  The major parts of this pull request is:

   - Improved support for O_DIRECT on block devices, with a small
     private implementation instead of using the pig that is
     fs/direct-io.c. From Christoph.

   - Request completion tracking in a scalable fashion. This is utilized
     by two components in this pull, the new hybrid polling and the
     writeback queue throttling code.

   - Improved support for polling with O_DIRECT, adding a hybrid mode
     that combines pure polling with an initial sleep. From me.

   - Support for automatic throttling of writeback queues on the block
     side. This uses feedback from the device completion latencies to
     scale the queue on the block side up or down. From me.

   - Support from SMR drives in the block layer and for SD. From Hannes
     and Shaun.

   - Multi-connection support for nbd. From Josef.

   - Cleanup of request and bio flags, so we have a clear split between
     which are bio (or rq) private, and which ones are shared. From
     Christoph.

   - A set of patches from Bart, that improve how we handle queue
     stopping and starting in blk-mq.

   - Support for WRITE_ZEROES from Chaitanya.

   - Lightnvm updates from Javier/Matias.

   - Supoort for FC for the nvme-over-fabrics code. From James Smart.

   - A bunch of fixes from a whole slew of people, too many to name
     here"

* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
  blk-stat: fix a few cases of missing batch flushing
  blk-flush: run the queue when inserting blk-mq flush
  elevator: make the rqhash helpers exported
  blk-mq: abstract out blk_mq_dispatch_rq_list() helper
  blk-mq: add blk_mq_start_stopped_hw_queue()
  block: improve handling of the magic discard payload
  blk-wbt: don't throttle discard or write zeroes
  nbd: use dev_err_ratelimited in io path
  nbd: reset the setup task for NBD_CLEAR_SOCK
  nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
  nvme-fabrics: Add target support for FC transport
  nvme-fabrics: Add host support for FC transport
  nvme-fabrics: Add FC transport LLDD api definitions
  nvme-fabrics: Add FC transport FC-NVME definitions
  nvme-fabrics: Add FC transport error codes to nvme.h
  Add type 0x28 NVME type code to scsi fc headers
  nvme-fabrics: patch target code in prep for FC transport support
  nvme-fabrics: set sqe.command_id in core not transports
  parser: add u64 number parser
  nvme-rdma: align to generic ib_event logging helper
  ...
2016-12-13 10:19:16 -08:00
Yunlei He
c0ed4405a9 f2fs: fix a missing size change in f2fs_setattr
This patch fix a missing size change in f2fs_setattr

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-12 11:09:05 -08:00
David Gstir
bd7b829038 fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page()
Rename the FS_CFLG_INPLACE_ENCRYPTION flag to FS_CFLG_OWN_PAGES which,
when set, indicates that the fs uses pages under its own control as
opposed to writeback pages which require locking and a bounce buffer for
encryption.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-11 16:26:12 -05:00
Eric Biggers
db717d8e26 fscrypto: move ioctl processing more fully into common code
Multiple bugs were recently fixed in the "set encryption policy" ioctl.
To make it clear that fscrypt_process_policy() and fscrypt_get_policy()
implement ioctls and therefore their implementations must take standard
security and correctness precautions, rename them to
fscrypt_ioctl_set_policy() and fscrypt_ioctl_get_policy().  Make the
latter take in a struct file * to make it consistent with the former.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-11 16:26:07 -05:00
Miklos Szeredi
dfeef68862 vfs: remove ".readlink = generic_readlink" assignments
If .readlink == NULL implies generic_readlink().

Generated by:

to_del="\.readlink.*=.*generic_readlink"
for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-09 16:45:04 +01:00
Jaegeuk Kim
5eba8c5d1f f2fs: fix to access nullified flush_cmd_control pointer
f2fs_sync_file()             remount_ro
 - f2fs_readonly
                               - destroy_flush_cmd_control
 - f2fs_issue_flush
   - no fcc pointer!

So, this patch doesn't free fcc in this case, but just stop its kernel thread
which sends flush commands.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07 18:56:50 -08:00
Jaegeuk Kim
a2125ff7dd f2fs: free meta pages if sanity check for ckpt is failed
This fixes missing freeing meta pages in the error case.

Tested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07 14:38:16 -08:00
Jaegeuk Kim
2040fce83f f2fs: detect wrong layout
Previous mkfs.f2fs allows small partition inappropriately, so f2fs should detect
that as well.

Refer this in f2fs-tools.

mkfs.f2fs: detect small partition by overprovision ratio and # of segments

Reported-and-Tested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07 14:37:33 -08:00
Jaegeuk Kim
f455c8a5f0 f2fs: call sync_fs when f2fs is idle
The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05 11:44:07 -08:00
Jaegeuk Kim
204706c7ac Revert "f2fs: use percpu_counter for # of dirty pages in inode"
This reverts commit 1beba1b3a9.

The perpcu_counter doesn't provide atomicity in single core and consume more
DRAM. That incurs fs_mark test failure due to ENOMEM.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05 11:43:59 -08:00
Chao Yu
0002b61bda f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-29 15:43:00 -08:00
Jaegeuk Kim
26787236b3 f2fs: do not activate auto_recovery for fallocated i_size
If a file needs to keep its i_size by fallocate, we need to turn off auto
recovery during roll-forward recovery.

This will resolve the below scenario.

1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync"
3. md5sum /mnt/f2fs/file;
4. godown /mnt/f2fs/
5. umount /mnt/f2fs/
6. mount -t f2fs /dev/sdx /mnt/f2fs
7. md5sum /mnt/f2fs/file

Reported-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-29 15:42:58 -08:00
Jaegeuk Kim
8508e44ae9 f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
We don't guarantee cp_addr is fixed by cp_version.
This is to sync with f2fs-tools.

Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-28 13:39:58 -08:00
Arnd Bergmann
19c526515f f2fs: fix 32-bit build
The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED
on 32-bit machines because of a 64-bit division:

fs/f2fs/f2fs.o: In function `__issue_discard_async':
extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod'

Fortunately, bdev_zone_size() is guaranteed to return a power-of-two
number, so we can replace the % operator with a cheaper bit mask.

Fixes: 792b84b74b54 ("f2fs: support multiple devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:09 -08:00
Nicolai Stange
05e6ea2685 f2fs: set ->owner for debugfs status file's file_operations
The struct file_operations instance serving the f2fs/status debugfs file
lacks an initialization of its ->owner.

This means that although that file might have been opened, the f2fs module
can still get removed. Any further operation on that opened file, releasing
included,  will cause accesses to unmapped memory.

Indeed, Mike Marshall reported the following:

  BUG: unable to handle kernel paging request at ffffffffa0307430
  IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
  <...>
  Call Trace:
   [] __fput+0xdf/0x1d0
   [] ____fput+0xe/0x10
   [] task_work_run+0x8e/0xc0
   [] do_exit+0x2ae/0xae0
   [] ? __audit_syscall_entry+0xae/0x100
   [] ? syscall_trace_enter+0x1ca/0x310
   [] do_group_exit+0x44/0xc0
   [] SyS_exit_group+0x14/0x20
   [] do_syscall_64+0x61/0x150
   [] entry_SYSCALL64_slow_path+0x25/0x25
  <...>
  ---[ end trace f22ae883fa3ea6b8 ]---
  Fixing recursive fault but reboot is needed!

Fix this by initializing the f2fs/status file_operations' ->owner with
THIS_MODULE.

This will allow debugfs to grab a reference to the f2fs module upon any
open on that file, thus preventing it from getting removed.

Fixes: 902829aa0b ("f2fs: move proc files to debugfs")
Reported-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:08 -08:00
Chao Yu
b08b12d2dd f2fs: fix incorrect free inode count in ->statfs
While calculating inode count that we can create at most in the left space,
we should consider space which data/node blocks occupied, since we create
data/node mixly in main area. So fix the wrong calculation in ->statfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:07 -08:00
Geliang Tang
b4ceec2921 f2fs: drop duplicate header timer.h
Drop duplicate header timer.h from segment.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:06 -08:00
Jaegeuk Kim
97dd26ad83 f2fs: fix wrong AUTO_RECOVER condition
If i_size is not aligned to the f2fs's block size, we should not skip inode
update during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:05 -08:00
Jaegeuk Kim
3a3a5ead7b f2fs: do not recover i_size if it's valid
If i_size is already valid during roll_forward recovery, we should not update
it according to the block alignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:04 -08:00
Chao Yu
281518c694 f2fs: fix fdatasync
For below two cases, we can't guarantee data consistence:

a)
1. xfs_io "pwrite 0 4195328" "fsync"
2. xfs_io "pwrite 4195328 1024" "fdatasync"
3. godown
4. umount & mount
--> isize we updated before fdatasync won't be recovered

b)
1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync"
2. xfs_io "fpunch 4194304 4096" "fdatasync"
3. godown
4. umount & mount
--> dnode we punched before fdatasync won't be recovered

The reason is that normally fdatasync won't be aware of modification
of metadata in file, e.g. isize changing, dnode updating, so in ->fsync
we will skip flushing node pages for above cases, result in making
fdatasynced file being lost during recovery.

Currently we have introduced DIRTY_META global list in sbi for tracking
dirty inode selectively, so in fdatasync we can choose to flush nodes
depend on dirty state of current inode in the list.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:03 -08:00
Chao Yu
04d47e6738 f2fs: fix to account total free nid correctly
Thread A		Thread B		Thread C
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
    alloc last nid
  - f2fs_unlock_op
			- f2fs_create
			 - f2fs_new_inode
			  - f2fs_lock_op
			   - alloc_nid
			    as node count still not
			    be increased, we will
			    loop in alloc_nid
						- f2fs_write_node_pages
						 - f2fs_balance_fs_bg
						  - f2fs_sync_fs
						   - write_checkpoint
						    - block_operations
						     - f2fs_lock_all
 - f2fs_lock_op

While creating new inode, we do not allocate and account nid atomically,
so that when there is almost no free nids left, we may encounter deadloop
like above stack.

In order to avoid that, reuse nm_i::available_nids for accounting free nids
and make nid allocation and counting being atomical during node creation.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:01 -08:00
Yunlei He
d40a43af0a f2fs: fix an infinite loop when flush nodes in cp
Thread A			Thread B

- write_checkpoint
 - block_operations
   -blk_start_plug
    -sync_node_pages		- f2fs_do_sync_file
				 - fsync_node_pages
				  - f2fs_wait_on_page_writeback

Thread A wait for global F2FS_DIRTY_NODES decreased to zero,
it start a plug list, some requests have been added to this list.
Thread B lock one dirty node page, and wait this page write back.
But this page has been in plug list of thread A with PG_writeback flag.
Thread A keep on running and its plug list has no chance to finish,
so it seems a deadlock between cp and fsync path.

This patch add a wait on page write back before set node page dirty
to avoid this problem.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:16:00 -08:00
Chao Yu
36951b38d1 f2fs: don't wait writeback for datas during checkpoint
Normally, while committing checkpoint, we will wait on all pages to be
writebacked no matter the page is data or metadata, so in scenario where
there are lots of data IO being submitted with metadata, we may suffer
long latency for waiting writeback during checkpoint.

Indeed, we only care about persistence for pages with metadata, but not
pages with data, as file system consistent are only related to metadate,
so in order to avoid encountering long latency in above scenario, let's
recognize and reference metadata in submitted IOs, wait writeback only
for metadatas.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:59 -08:00
Jaegeuk Kim
c79b7ff1d3 f2fs: fix wrong written_valid_blocks counting
Previously, written_valid_blocks was got by ckpt->valid_block_count. But if
the last checkpoint has some NEW_ADDR due to power-cut, we can get wrong value.
Fix it to get the number from actual written block count from sit entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:58 -08:00
Jaegeuk Kim
7702bdbe50 f2fs: avoid BG_GC in f2fs_balance_fs
If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same
time, all the threads would do FG_GC or BG_GC.
In this critical path, we totally don't need to do BG_GC at all.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:57 -08:00
Jaegeuk Kim
c040ff9d69 f2fs: fix redundant block allocation
In direct_IO path of f2fs_file_write_iter(),
1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO)
   -> allocate LBA X
2. f2fs_direct_IO()
   -> return 0;

Then,
f2fs_write_data_page() will allocate another LBA X+1.

This makes EIO triggered by HM-SMR.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:50 -08:00
Jaegeuk Kim
a7de608691 f2fs: use err for f2fs_preallocate_blocks
This patch has no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:14 -08:00
Jaegeuk Kim
3c62be17d4 f2fs: support multiple devices
This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.

Internal block management is very simple, but we will modify block allocation
and background GC policy to boost IO speed by exploiting them accoording to
each device speed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:13 -08:00
Jaegeuk Kim
e57e9ae5b1 f2fs: allow dio read for LFS mode
We can allow dio reads for LFS mode, while doing buffered writes for dio writes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:12 -08:00
Jaegeuk Kim
6ae1be13e8 f2fs: revert segment allocation for direct IO
Now we don't need to be too much careful about storage alignment for dio, since
its speed becomes quite fast and we'd better avoid any misalignment first.

Revert: 38aa0889b2 (f2fs: align direct_io'ed data to section)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25 10:15:02 -08:00
Yunlei He
2061471128 f2fs: return directly if block has been removed from the victim
If one block has been to written to a new place, just return
in move data process. This patch check it again with holding
page lock.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:31 -08:00
Chao Yu
d47b871595 Revert "f2fs: do not recover from previous remained wrong dnodes"
i_times of inode will be set with current system time which can be
configured through 'date', so it's not safe to judge dnode block as
garbage data or unchanged inode depend on i_times.

Now, we have used enhanced 'cp_ver + cp' crc method to verify valid
dnode block, so I expect recoverying invalid dnode is almost not
possible.

This reverts commit 807b1e1c8e.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:30 -08:00
Jaegeuk Kim
b4b9d34c85 f2fs: remove checkpoint in f2fs_freeze
The generic freeze_super() calls sync_filesystems() before f2fs_freeze().
So, basically we don't need to do checkpoint in f2fs_freeze(). But, in xfs/068,
it triggers circular locking problem below due to gc_mutex for checkpoint.

======================================================
[ INFO: possible circular locking dependency detected ]
4.9.0-rc1+ #132 Tainted: G           OE
-------------------------------------------------------

1. wait for __sb_start_write() by

 [<ffffffff9845f353>] dump_stack+0x85/0xc2
 [<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230
 [<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0
 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffff9826bdd0>] __sb_start_write+0x130/0x200
 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffff98289991>] iput+0x171/0x2c0
 [<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
 [<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs]
 [<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs]
 [<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffff9803e9d9>] ? sched_clock+0x9/0x10
 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs]
 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
 [<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30
 [<ffffffff9826ca3e>] iterate_supers+0xae/0x100
 [<ffffffff982a50b5>] sys_sync+0x55/0x90
 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

2. wait for sbi->gc_mutex by

 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
 [<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0
 [<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs]
 [<ffffffff9826b6ef>] freeze_super+0xcf/0x190
 [<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0
 [<ffffffff9827f099>] SyS_ioctl+0x79/0x90
 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:29 -08:00
Jaegeuk Kim
bdb7d964c4 f2fs: assign segments correctly for direct_io
Previously, we assigned CURSEG_WARM_DATA for direct_io, but if we have two or
four logs, we do not use that type at all.
Let's fix it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:28 -08:00
Chao Yu
9f0552e078 f2fs: fix wrong i_atime recovery
Shouldn't update in-memory i_atime with on-disk i_mtime of inode when
recovering inode.

Shuoran found this bug which is hidden for a long time, honour is belong
to him.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:26 -08:00
Chao Yu
60dcedc997 f2fs: record inode updating status correctly
We should record updating status of inode only for living inode, for those
unlinked inode it needs to clear its ino cache, otherwise after the ino
was been reused, it will cause unneeded node page writing during ->fsync.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:25 -08:00
Damien Le Moal
126606c7a9 f2fs: Trace reset zone events
Similarly to the regular discard, trace zone reset events.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:24 -08:00
Damien Le Moal
f46e8809e8 f2fs: Reset sequential zones on zoned block devices
When a zoned block device is mounted, discarding sections
contained in sequential zones must reset the zone write pointer.
For sections contained in conventional zones, the regular discard
is used if the drive supports it.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:23 -08:00
Damien Le Moal
178053e2f1 f2fs: Cache zoned block devices zone type
With the zoned block device feature enabled, section discard
need to do a zone reset for sections contained in sequential
zones, and a regular discard (if supported) for sections
stored in conventional zones. Avoid the need for a costly
report zones to obtain a section zone type when discarding it
by caching the types of the device zones in the super block
information. This cache is initialized at mount time for mounts
with the zoned block device feature enabled.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:22 -08:00
Damien Le Moal
3adc57e977 f2fs: Do not allow adaptive mode for host-managed zoned block devices
The LFS mode is mandatory for host-managed zoned block devices as
update in place optimizations are not possible for segments in
sequential zones.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:20 -08:00
Damien Le Moal
96ba2decb4 f2fs: Always enable discard for zoned blocks devices
Zone write pointer reset acts as discard for zoned block
devices. So if the zoned block device feature is enabled,
always declare that discard is enabled, even if the device
does not actually support the command.
For the same reason, prevent the use the "nodicard" mount
option.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:19 -08:00
Damien Le Moal
0ab0299835 f2fs: Suppress discard warning message for zoned block devices
For zoned block devices, discard is replaced by zone reset. So
do not warn if the device does not supports discard.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:18 -08:00
Damien Le Moal
d1b959c877 f2fs: Check zoned block feature for host-managed zoned block devices
The F2FS_FEATURE_BLKZONED feature indicates that the drive was formatted
 with zone alignment optimization. This is optional for host-aware
devices, but mandatory for host-managed zoned block devices.
So check that the feature is set in this latter case.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:17 -08:00
Damien Le Moal
0bfd7a091c f2fs: Use generic zoned block device terminology
SMR stands for "Shingled Magnetic Recording" which makes sense
only for hard disk drives (spinning rust). The ZBC/ZAC standards
enable management of SMR disks, but solid state drives may also
support those standards. So rename the HMSMR feature to BLKZONED
to avoid a HDD centric terminology. For the same reason, rename
f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:16 -08:00
Damien Le Moal
487df616de f2fs: Add missing break in switch-case
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:14 -08:00
Jaegeuk Kim
099228000e f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes
This patch should fix an infinite loop case below.

F2FS-fs : inject IO error in f2fs_read_end_io+0xf3/0x120 [f2fs]
F2FS-fs (nvme0n1p1): recover_orphan_inode: orphan failed (ino=39ac1a), run fsck to fix.
...
[<ffffffffc0b11ede>] sync_meta_pages+0xae/0x270 [f2fs]
[<ffffffffc0b288dd>] ? flush_sit_entries+0x8d/0x960 [f2fs]
[<ffffffffc0b13801>] write_checkpoint+0x361/0xf20 [f2fs]
[<ffffffffb40e979d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc0b0a199>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc0b0a1a5>] f2fs_sync_fs+0x85/0x190 [f2fs]
[<ffffffffc0b2560e>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
[<ffffffffc0b216c4>] f2fs_write_node_pages+0x34/0x320 [f2fs]
[<ffffffffb41dff21>] do_writepages+0x21/0x30
[<ffffffffb429edb1>] __writeback_single_inode+0x61/0x760
[<ffffffffb490a937>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffb42a0805>] writeback_single_inode+0xd5/0x190
[<ffffffffb42a0959>] write_inode_now+0x99/0xc0
[<ffffffffb4289a16>] iput+0x1f6/0x2c0
[<ffffffffc0b0e3be>] f2fs_fill_super+0xe0e/0x1300 [f2fs]
[<ffffffffb426c394>] ? sget_userns+0x4f4/0x530
[<ffffffffb426c692>] mount_bdev+0x182/0x1b0
[<ffffffffc0b0d5b0>] ? f2fs_commit_super+0x100/0x100 [f2fs]
[<ffffffffc0b0a375>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffffb426d038>] mount_fs+0x38/0x170
[<ffffffffb428ec9b>] vfs_kern_mount+0x6b/0x160
[<ffffffffb4291d9e>] do_mount+0x1be/0xd60
[<ffffffffb4291a57>] ? copy_mount_options+0xb7/0x220
[<ffffffffb4292c54>] SyS_mount+0x94/0xd0
[<ffffffffb490b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:13 -08:00
Chao Yu
ed6bd4b146 f2fs: report error of f2fs_fill_dentries
Report error of f2fs_fill_dentries to ->iterate_shared, otherwise when
error ocurrs, user may just list part of dirents in target directory
without any hints.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:12 -08:00
Arnd Bergmann
230436b3ef f2fs: hide a maybe-uninitialized warning
gcc is unsure about the use of last_ofs_in_node, which might happen
without a prior initialization:

fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’:
fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {

As pointed out by Chao Yu, the code is actually correct as 'prealloc'
is only set if the last_ofs_in_node has been set, the two always
get updated together.

This initializes last_ofs_in_node to dn.ofs_in_node for each
new dnode at the start of the 'next_block' loop, which at that
point is a correct initialization as well. I assume that compilers
that correctly track the contents of the variables and do not
warn about the condition also figure out that they can eliminate
the extra assignment here.

Fixes: 46008c6d42 ("f2fs: support in batch multi blocks preallocation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:11 -08:00
Jaegeuk Kim
35782b233f f2fs: remove percpu_count due to performance regression
This patch removes percpu_count usage due to performance regression in iozone.

Fixes: 523be8a6b3 ("f2fs: use percpu_counter for page counters")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:10 -08:00
Jaegeuk Kim
18340edc8d f2fs: make clean inodes when flushing inode page
This patch tries to make more clean inodes when flushing dirty inodes in
checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:09 -08:00
Jaegeuk Kim
7c45729a4d f2fs: keep dirty inodes selectively for checkpoint
This is to avoid no free segment bug during checkpoint caused by a number of
dirty inodes.

The case was reported by Chao like this.
1. mount with lazytime option
2. fill 4k file until disk is full
3. sync filesystem
4. read all files in the image
5. umount

In this case, we actually don't need to flush dirty inode to inode page during
checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:08 -08:00
Jaegeuk Kim
664ba972df f2fs: use BIO_MAX_PAGES for bio allocation
We don't need to allocate bio partially in order to maximize sequential writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:07 -08:00
Jaegeuk Kim
3e7b5bbbef f2fs: declare static function for __build_free_nids
This patch avoids build warning.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:06 -08:00
Jaegeuk Kim
15d0435455 f2fs: call f2fs_balance_fs for setattr
If inode becomes dirty, we need to check the # of dirty inodes whether or not
further checkpoint would be required.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:05 -08:00
Jaegeuk Kim
b9610bdfcb f2fs: count dirty inodes to flush node pages during checkpoint
If there are a lot of dirty inodes, we need to flush all of them when doing
checkpoint. So, we need to count this for enough free space.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:04 -08:00
Chao Yu
02110a4fd5 f2fs: avoid casted negative value as shrink count
This patch makes sure it returns a positive value instead of a probable
casted negative value as shrink count.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:03 -08:00
Chao Yu
3a2ad5672b f2fs: don't interrupt free nids building during nid allocation
Let build_free_nids support sync/async methods, in allocation flow of nids,
we use synchronuous method, so that we can avoid looping in alloc_nid when
free memory is low; in unblock_operations and f2fs_balance_fs_bg we use
asynchronuous method in where low memory condition can interrupt us.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:02 -08:00
Jaegeuk Kim
eb0aa4b807 f2fs: clean up free nid list operations
This patch cleans up to use consistent free nid list ops.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:01 -08:00
Chao Yu
b8559dc242 f2fs: split free nid list
During free nid allocation, in order to do preallocation, we will tag free
nid entry as allocated one and still leave it in free nid list, for other
allocators who want to grab free nids, it needs to traverse the free nid
list for lookup. It becomes overhead in scenario of allocating free nid
intensively by multithreads.

This patch splits free nid list to two list: {free,alloc}_nid_list, to
keep free nids and preallocated free nids separately, after that, traverse
latency will be gone, besides split nid_cnt for separate statistic.

Additionally, introduce __insert_nid_to_list and __remove_nid_from_list for
cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:11:00 -08:00
Chao Yu
a11b9f65ea f2fs: clear nlink if fail to add_link
We don't need to keep incomplete created inode in cache, so if we fail to
add link into directory during new inode creation, it's better to set
nlink of inode to zero, then we can evict inode immediately. Otherwise
release of nid belong to inode will be delayed until inode cache is being
shrunk, it may cause a seemingly endless loop while allocating free nids
in time of testing generic/269 case of fstest suit.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add update_inode_page to fix kernel panic]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:59 -08:00
Eric Biggers
0c0b471e43 f2fs: fix sparse warnings
f2fs contained a number of endianness conversion bugs.

Also, one function should have been 'static'.

Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/'

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:57 -08:00
Chao Yu
9de6927975 f2fs: fix error handling in fsync_node_pages
In fsync_node_pages, if f2fs was taged with CP_ERROR_FLAG, make sure bio
cache was flushed before return.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:56 -08:00
Chao Yu
b691d98fdd f2fs: fix to update largest extent under lock
In order to avoid racing problem, make largest extent cache being updated
under lock.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:55 -08:00
Chao Yu
58736fa60f f2fs: be aware of extent beyond EOF in fiemap
f2fs can support fallocating blocks beyond file size without changing the
size, but ->fiemap of f2fs was restricted and can't detect these extents
fallocated past EOF, now relieve the restriction.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:54 -08:00
Chao Yu
6f2d8ed654 f2fs: don't miss any f2fs_balance_fs cases
In f2fs_map_blocks, let f2fs_balance_fs detects node page modification
with dn.node_changed to avoid miss some corner cases.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:53 -08:00
Chao Yu
9434fcde1f f2fs: add missing f2fs_balance_fs in f2fs_zero_range
f2fs_balance_fs should be called in between node page updating, otherwise
node page count will exceeded far beyond watermark of triggering
foreground garbage collection, result in facing high risk of hitting LFS
allocation failure.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:52 -08:00
Chao Yu
933439c8f3 f2fs: give a chance to detach from dirty list
If there is no dirty pages in inode, we should give a chance to detach
the inode from global dirty list, otherwise it needs to call another
unnecessary .writepages for detaching.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:51 -08:00
Chao Yu
2dd15654ac f2fs: fix to release discard entries during checkpoint
In f2fs_fill_super, if there is any IO error occurs during recovery,
cached discard entries will be leaked, in order to avoid this, make
write_checkpoint() handle memory release by itself, besides, move
clear_prefree_segments to write_checkpoint for readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:50 -08:00
Chao Yu
2411cf5bef f2fs: exclude free nids building and allocation
During nid allocation, it needs to exclude building and allocating flow
of free nids, this is because while building free nid cache, there are two
steps: a) load free nids from unused nat entries in NAT pages, b) update
free nid cache by checking nat journal. The two steps should be atomical,
otherwise an used nid can be allocated as free one after a) and before b).

This patch adds missing lock which covers build_free_nids in
unlock_operation and f2fs_balance_fs_bg to avoid that.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:49 -08:00
Jaegeuk Kim
e87f7329bb f2fs: fix overflow due to condition check order
In the last ilen case, i was already increased, resulting in accessing out-
of-boundary entry of do_replace and blkaddr.
Fix to check ilen first to exit the loop.

Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up")
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-23 12:10:48 -08:00
David Gstir
9c4bb8a3a9 fscrypt: Let fs select encryption index/tweak
Avoid re-use of page index as tweak for AES-XTS when multiple parts of
same page are encrypted. This will happen on multiple (partial) calls of
fscrypt_encrypt_page on same page.
page->index is only valid for writeback pages.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 20:18:16 -05:00
David Gstir
7821d4dd45 fscrypt: Enable partial page encryption
Not all filesystems work on full pages, thus we should allow them to
hand partial pages to fscrypt for en/decryption.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 18:55:21 -05:00
Jens Axboe
7637241e65 writeback: add wbc_to_write_flags()
Add wbc_to_write_flags(), which returns the write modifier flags to use,
based on a struct writeback_control. No functional changes in this
patch, but it prepares us for factoring other wbc fields for write type.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-11-02 10:24:03 -06:00
Christoph Hellwig
70fd76140a block,fs: use REQ_* flags directly
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly.  Where applicable this also drops usage of the
bio_set_op_attrs wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Christoph Hellwig
ef295ecf09 block: better op and flags encoding
Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields.  This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.

In addition this allows passing around only one value in the block layer
instead of two (and eventuall also in the file systems, but we can do
that later) and thus clean up a lot of code.

Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits.  Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-28 08:48:16 -06:00
Linus Torvalds
1a1891d762 This includes fixing a bug which references a wrong pointer, sum_page, in
f2fs_gc. It was newly introduced in 4.9-rc1.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYBoL+AAoJEEAUqH6CSFDSQEoP/irufW4HDUszwKCxISLGksBN
 /i85zfbL9pY11Ci78qW4N5pegd2ouKk9WtUdrtXwT6y8Y+CWs9Gx5FaYbdA8aj7T
 nQBfRlfN/zKIJkRqqqXh/YRTMyQfUticur0dWZ00l7wpGAxA6Xhe+QDV1T7rxQxZ
 W8Ne9jcAD+SLu8G5Ci8zOTDcK+q6SWeXFNFtM1MPqr/S86PXiTRmlWFNcubACBi9
 iqLl5moD++oYBJSU6sqxaKXvf27GhJMZUGp+upYT9lEIGF98vyhw+yXT64HYDvrW
 lMJspz80m5CRT3Pz7nxG+1d6ctONbXwDjyBPeEINUlHi4DN07UJZCJUnUWyMYM8D
 JffKq4DGhmsh0CJoX5pKjzxDhuHtrWwN+d/gMuUZrgC7oecXs0U5O2WR49oMGCs8
 MMh6T/MUh1fWBI7F1Y3+OvyshU5XA9LCcNguu0T/RSKsjvqxkYTlpS/Zo1U/mpqv
 ar34mlvLXlLQNj8gTYDNybq/ufBfS2Fzq+juWbXCaVZ4OV0rsu66D9aUInQ66Iy6
 1EqjlLaqAwTF+DJsyGeAxNEn6tMPiqMVhK9LTT9dz5Uk5vK4FdvuDxqeFuwwTnCi
 xYHdsZn/yhs7Nn6akwOclJayegDKiV3OVEG4ZyUHrXVR+njFTJyW20VcuUvN9RCn
 AxHW5MI9jaU/U+O4Opbb
 =1MpQ
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs bugfix from Jaegeuk Kim:
 "This fixes a bug which referenced the wrong pointer, sum_page, in
  f2fs_gc.  It was newly introduced in 4.9-rc1.

* tag 'for-f2fs-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: fix wrong sum_page pointer in f2fs_gc
2016-10-18 14:15:23 -07:00
Jaegeuk Kim
de0dcc40f6 f2fs: fix wrong sum_page pointer in f2fs_gc
This patch fixes using a wrong pointer for sum_page in f2fs_gc.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-12 16:23:36 -07:00
Michal Hocko
5114a97a8b fs: use mapping_set_error instead of opencoded set_bit
The mapping_set_error() helper sets the correct AS_ flag for the mapping
so there is no reason to open code it.  Use the helper directly.

[akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Linus Torvalds
101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Al Viro
3873691e5a Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
Linus Torvalds
97d2116708 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr updates from Al Viro:
 "xattr stuff from Andreas

  This completes the switch to xattr_handler ->get()/->set() from
  ->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: Remove {get,set,remove}xattr inode operations
  xattr: Stop calling {get,set,remove}xattr inode operations
  vfs: Check for the IOP_XATTR flag in listxattr
  xattr: Add __vfs_{get,set,remove}xattr helpers
  libfs: Use IOP_XATTR flag for empty directory handling
  vfs: Use IOP_XATTR flag for bad-inode handling
  vfs: Add IOP_XATTR inode operations flag
  vfs: Move xattr_resolve_name to the front of fs/xattr.c
  ecryptfs: Switch to generic xattr handlers
  sockfs: Get rid of getxattr iop
  sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
  kernfs: Switch to generic xattr handlers
  hfs: Switch to generic xattr handlers
  jffs2: Remove jffs2_{get,set,remove}xattr macros
  xattr: Remove unnecessary NULL attribute name check
2016-10-10 17:11:50 -07:00
Linus Torvalds
abb5a14fa2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted misc bits and pieces.

  There are several single-topic branches left after this (rename2
  series from Miklos, current_time series from Deepa Dinamani, xattr
  series from Andreas, uaccess stuff from from me) and I'd prefer to
  send those separately"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
  proc: switch auxv to use of __mem_open()
  hpfs: support FIEMAP
  cifs: get rid of unused arguments of CIFSSMBWrite()
  posix_acl: uapi header split
  posix_acl: xattr representation cleanups
  fs/aio.c: eliminate redundant loads in put_aio_ring_file
  fs/internal.h: add const to ns_dentry_operations declaration
  compat: remove compat_printk()
  fs/buffer.c: make __getblk_slow() static
  proc: unsigned file descriptors
  fs/file: more unsigned file descriptors
  fs: compat: remove redundant check of nr_segs
  cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
  cifs: don't use memcpy() to copy struct iov_iter
  get rid of separate multipage fault-in primitives
  fs: Avoid premature clearing of capabilities
  fs: Give dentry to inode_change_ok() instead of inode
  fuse: Propagate dentry down to inode_change_ok()
  ceph: Propagate dentry down to inode_change_ok()
  xfs: Propagate dentry down to inode_change_ok()
  ...
2016-10-10 13:04:49 -07:00
Al Viro
e55f1d1d13 Merge remote-tracking branch 'jk/vfs' into work.misc 2016-10-08 11:06:08 -04:00
Andreas Gruenbacher
fd50ecaddf vfs: Remove {get,set,remove}xattr inode operations
These inode operations are no longer used; remove them.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-07 21:48:36 -04:00
Linus Torvalds
2eee010d09 Lots of bug fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJX9pA6AAoJEPL5WVaVDYGj7fwH/0YcdQWBg0O5d7iXFnTcimh9
 fiYkqKniBWQhgBAOFPMoNPRIW4tyeQmTtu8Rywx2Hr+v4lzJvuOaT18NDANdq/pp
 u5eDrnJ4R+uqPJlgxVOzopLVJ6I2glgSSRdvAKYxwTYcv8F88ObzVfsJ4M415gPq
 cbEKF+JT3l5hTGENR5sqmYvHYaNfOFkOqt4gulPtgk1eshy+BH/05M+qBSeA5a6k
 srdon0pFRoUV68m+T4G8FqOZxdybeT5Yx6X0GJf0eQJoX7IaiQTPcDrXzlrbDBbN
 rrzbpwsDeDKtgSOckbarCBroZKdToHFekfnOJ7IPWYq8IwYTSnZKFCWIRKO6z38=
 =IvhS
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Lots of bug fixes and cleanups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
  ext4: remove unused variable
  ext4: use journal inode to determine journal overhead
  ext4: create function to read journal inode
  ext4: unmap metadata when zeroing blocks
  ext4: remove plugging from ext4_file_write_iter()
  ext4: allow unlocked direct IO when pages are cached
  ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY
  fscrypto: use standard macros to compute length of fname ciphertext
  ext4: do not unnecessarily null-terminate encrypted symlink data
  ext4: release bh in make_indexed_dir
  ext4: Allow parallel DIO reads
  ext4: allow DAX writeback for hole punch
  jbd2: fix lockdep annotation in add_transaction_credits()
  blockgroup_lock.h: simplify definition of NR_BG_LOCKS
  blockgroup_lock.h: remove debris from bgl_lock_ptr() conversion
  fscrypto: make filename crypto functions return 0 on success
  fscrypto: rename completion callbacks to reflect usage
  fscrypto: remove unnecessary includes
  fscrypto: improved validation when loading inode encryption metadata
  ext4: fix memory leak when symlink decryption fails
  ...
2016-10-07 15:15:33 -07:00
Linus Torvalds
4c1fad64ef In this round, we've investigated how f2fs deals with errors given by our fault
injection facility. With this, we could fix several corner cases. And, in order
 to improve the performance, we set inline_dentry by default and enhance the
 exisiting discard issue flow. In addition, we added f2fs_migrate_page for better
 memory management.
 
 = Enhancement =
  - set inline_dentry by default
  - improve discard issue flow
  - add more fault injection cases in f2fs
  - allow block preallocation for encrypted files
  - introduce migrate_page callback function
  - avoid truncating the next direct node block at every checkpoint
 
 = Bug fixes =
  - set page flag correctly between write_begin and write_end
  - missing error handling cases detected by fault injection
  - preallocate blocks regarding to 4KB alignement correctly
  - dentry and filename handling of encryption
  - lost xattrs of directories
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX9sMhAAoJEEAUqH6CSFDSFhQQAIQ99GkcaPmSACHg7JNa9zG1
 wb6eeKIDee+Jr4vu7yQ++T3Ih4lesl2ZLABVaP+IcXlsYWI2VUvlChczuwVSDQMg
 ZiBIR2IwXVVY6Zpb0xuw8C/vmQAJjLZTBV33s+wgsYHaTDobYexVUjkCM+pekrzj
 HBXrk7zx8NHUh41yr/kVQl6FY8KPC6bTtBH23UUp6Vuy1zMZDR/VjL440IyT5Ded
 JRSBX0XSAC9He6n+kZ4S2kMc11kmqZYW7mE4SmiPDzAhGwUv4SmQ1871lK00EOUp
 5EN1Lcy8M7kkl8en2zpZ002R/LDbzRTYjb1fjGJVR+s5Q3piGokxtwAMd0/a7k9v
 wwZm64Bm4NMHBEK6uc/DPWFUmnUySrboTvOCDRunNogPGTjMJwnzAQmTcB/Hdpr5
 oAJQwyAq7ZzkMk3xt0ifeNqy+78uiwfpPEnZDoWqU6zxa+vIyqpFDD+8wEPBO9qo
 JLRocH0Yl7+ExJvi+2W9wMQq9DsxZWR+CwUc8pg68E+1oOEycJ3weAwg5XSVHoNr
 59I2blZQU6P922sH2HVhp0n58xZfYrR7Z3NSsiSfKXeL4gN222dHHT1UfRUmY+A3
 7EeuYm8EUecKV0fZimMcqCCrUXQpubT+qGZfI6NZhu3Qhno1Y8ApxqH8Ieypx7ol
 YD5prZs2qqVKO5LjLV5o
 =crpN
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've investigated how f2fs deals with errors given by
  our fault injection facility. With this, we could fix several corner
  cases. And, in order to improve the performance, we set inline_dentry
  by default and enhance the exisiting discard issue flow. In addition,
  we added f2fs_migrate_page for better memory management.

  Enhancements:
   - set inline_dentry by default
   - improve discard issue flow
   - add more fault injection cases in f2fs
   - allow block preallocation for encrypted files
   - introduce migrate_page callback function
   - avoid truncating the next direct node block at every checkpoint

  Bug fixes:
   - set page flag correctly between write_begin and write_end
   - missing error handling cases detected by fault injection
   - preallocate blocks regarding to 4KB alignement correctly
   - dentry and filename handling of encryption
   - lost xattrs of directories"

* tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (69 commits)
  f2fs: introduce update_ckpt_flags to clean up
  f2fs: don't submit irrelevant page
  f2fs: fix to commit bio cache after flushing node pages
  f2fs: introduce get_checkpoint_version for cleanup
  f2fs: remove dead variable
  f2fs: remove redundant io plug
  f2fs: support checkpoint error injection
  f2fs: fix to recover old fault injection config in ->remount_fs
  f2fs: do fault injection initialization in default_options
  f2fs: remove redundant value definition
  f2fs: support configuring fault injection per superblock
  f2fs: adjust display format of segment bit
  f2fs: remove dirty inode pages in error path
  f2fs: do not unnecessarily null-terminate encrypted symlink data
  f2fs: handle errors during recover_orphan_inodes
  f2fs: avoid gc in cp_error case
  f2fs: should put_page for summary page
  f2fs: assign return value in f2fs_gc
  f2fs: add customized migrate_page callback
  f2fs: introduce cp_lock to protect updating of ckpt_flags
  ...
2016-10-06 15:30:40 -07:00
Jaegeuk Kim
e4c5d8489a f2fs: introduce update_ckpt_flags to clean up
This patch add update_ckpt_flags() to clean up the flow.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:55:24 -07:00
Chao Yu
6ca56ca429 f2fs: don't submit irrelevant page
While we call ->writepages, there are two cases:
a. we didn't writeout any dirty pages, since they are writebacked by other
thread concurrently.
b. we writeout dirty pages, and have already submitted bio to block layer.

In these cases, we don't need to do additional bio flushing unnecessarily,
it may split bio in cache into smaller one.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:39 -07:00
Chao Yu
3f5f4959b1 f2fs: fix to commit bio cache after flushing node pages
In sync_node_pages, we won't check and commit last merged pages in private
bio cache of f2fs, as these pages were taged as writeback, someone who is
waiting for writebacking of the page will be blocked until the cache was
committed by someone else.

We need to commit node type bio cache to avoid potential deadlock or long
delay of waiting writeback.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:38 -07:00
Tiezhu Yang
fc0065adb2 f2fs: introduce get_checkpoint_version for cleanup
There exists almost same codes when get the value of pre_version
and cur_version in function validate_checkpoint, this patch adds
get_checkpoint_version to clean up redundant codes.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:37 -07:00
Sheng Yong
3fa565039e f2fs: remove dead variable
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:37 -07:00
Chao Yu
7fd748df45 f2fs: remove redundant io plug
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:36 -07:00
Chao Yu
0f34802858 f2fs: support checkpoint error injection
This patch adds to support checkpoint error injection in f2fs for testing
fatal error tolerance, it will be useful that it can simulate abnormal
power off by f2fs itself instead of calling godown ioctl by running apps.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:35 -07:00
Chao Yu
2443b8b363 f2fs: fix to recover old fault injection config in ->remount_fs
In ->remount_fs, we didn't recover original fault injection config if
we encounter error, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:34 -07:00
Chao Yu
36dbd3287f f2fs: do fault injection initialization in default_options
Do fault injection initialization in default_options to keep consistent
with other default option configurating.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:33 -07:00
Yunlei He
9c094040c5 f2fs: remove redundant value definition
This patch remove redundant value definition in build_sit_entries

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:32 -07:00
Chao Yu
1ecc0c5c50 f2fs: support configuring fault injection per superblock
Previously, we only support global fault injection configuration, so that
when we configure type/rate of fault injection through sysfs, mount
option, it will influence all f2fs partition which is being used.

It is not make sence, since it will be not convenient if developer want
to test separated partitions with different fault injection rate/type
simultaneously, also it's not possible to enable fault injection in one
partition and disable fault injection in other one.

>From now on, we move global configuration of fault injection in module
into per-superblock, hence injection testing can be more flexible.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:31 -07:00
Chao Yu
d32853de50 f2fs: adjust display format of segment bit
Just adjust segment bit info printed in procfs.

Before:
1008      5|0  |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1009      3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 cc e6 8e bf f9 ff 5 3d 31 3d 13
1010      3|1  |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After:
1008      5|0  | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1009      4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef
1010      4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:30 -07:00
Jaegeuk Kim
bb5dada7d2 f2fs: remove dirty inode pages in error path
When getting EIO while handling orphan inodes, we can get some dirty node
pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try
to flush node pages. But in this case, we should prevent to do that, since
we will try again from the start.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:29 -07:00
Eric Biggers
ef68bf1197 f2fs: do not unnecessarily null-terminate encrypted symlink data
Null-terminating the fscrypt_symlink_data on read is unnecessary because
it is not string data --- it contains binary ciphertext.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:28 -07:00
Jaegeuk Kim
d41065e204 f2fs: handle errors during recover_orphan_inodes
This patch fixes to handle EIO during recover_orphan_inode() given the below
panic.

F2FS-fs : inject IO error in f2fs_read_end_io+0xe6/0x100 [f2fs]
------------[ cut here ]------------
RIP: 0010:[<ffffffffc0b244e3>]  [<ffffffffc0b244e3>] f2fs_evict_inode+0x433/0x470 [f2fs]
RSP: 0018:ffff92f8b7fb7c30  EFLAGS: 00010246
RAX: ffff92fb88a13500 RBX: ffff92f890566ea0 RCX: 00000000fd3c255c
RDX: 0000000000000001 RSI: ffff92fb88a13d90 RDI: ffff92fb8ee127e8
RBP: ffff92f8b7fb7c58 R08: 0000000000000001 R09: ffff92fb88a13d58
R10: 000000005a6a9373 R11: 0000000000000001 R12: 00000000fffffffb
R13: ffff92fb8ee12000 R14: 00000000000034ca R15: ffff92fb8ee12620
FS:  00007f1fefd8e880(0000) GS:ffff92fb95600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc211d34cdb CR3: 000000012d43a000 CR4: 00000000001406e0
Stack:
 ffff92f890566ea0 ffff92f890567078 ffffffffc0b5a0c0 ffff92f890566f28
 ffff92fb888b2000 ffff92f8b7fb7c80 ffffffffbc27ff55 ffff92f890566ea0
 ffff92fb8bf10000 ffffffffc0b5a0c0 ffff92f8b7fb7cb0 ffffffffbc28090d
Call Trace:
 [<ffffffffbc27ff55>] evict+0xc5/0x1a0
 [<ffffffffbc28090d>] iput+0x1ad/0x2c0
 [<ffffffffc0b3304c>] recover_orphan_inodes+0x10c/0x2e0 [f2fs]
 [<ffffffffc0b2e0f4>] f2fs_fill_super+0x884/0x1150 [f2fs]
 [<ffffffffbc2644ac>] mount_bdev+0x18c/0x1c0
 [<ffffffffc0b2d870>] ? f2fs_commit_super+0x100/0x100 [f2fs]
 [<ffffffffc0b2a755>] f2fs_mount+0x15/0x20 [f2fs]
 [<ffffffffbc264e49>] mount_fs+0x39/0x170
 [<ffffffffbc28555b>] vfs_kern_mount+0x6b/0x160
 [<ffffffffbc2881df>] do_mount+0x1cf/0xd00
 [<ffffffffbc287f2c>] ? copy_mount_options+0xac/0x170
 [<ffffffffbc289003>] SyS_mount+0x83/0xd0
 [<ffffffffbc8ee880>] entry_SYSCALL_64_fastpath+0x23/0xc1

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:27 -07:00
Jaegeuk Kim
646e759a4d f2fs: avoid gc in cp_error case
Otherwise, we can hit
	f2fs_bug_on(sbi, !PageUptodate(sum_page));

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:26 -07:00
Jaegeuk Kim
f6fe2be3c6 f2fs: should put_page for summary page
We should call put_page for preloaded summary pages in do_garbage_collect.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:25 -07:00
Jaegeuk Kim
2956e450fa f2fs: assign return value in f2fs_gc
This patch adds a return value of write_checkpoint for f2fs_gc.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:24 -07:00
Weichao Guo
5b7a487cf3 f2fs: add customized migrate_page callback
This patch improves the migration of dirty pages and allows migrating atomic
written pages that F2FS uses in Page Cache. Instead of the fallback releasing
page path, it provides better performance for memory compaction, CMA and other
users of memory page migrating. For dirty pages, there is no need to write back
first when migrating. For an atomic written page before committing, we can
migrate the page and update the related 'inmem_pages' list at the same time.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix some coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:23 -07:00
Chao Yu
aaec2b1d18 f2fs: introduce cp_lock to protect updating of ckpt_flags
This patch introduces spinlock to protect updating process of ckpt_flags
field in struct f2fs_checkpoint, it avoids incorrectly updating in race
condition.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 17:34:20 -07:00
Chao Yu
fadb2fb8af f2fs: fix to avoid race condition when updating sbi flag
Making updating of sbi flag atomic by using {test,set,clear}_bit,
otherwise in concurrency scenario, the flag could be updated incorrectly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 10:05:50 -07:00
Jaegeuk Kim
9e1e6df412 f2fs: put directory inodes before checkpoint in roll-forward recovery
Before checkpoint, we'd be better drop any inodes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 10:05:49 -07:00
Jaegeuk Kim
a468f0ef51 f2fs: use crc and cp version to determine roll-forward recovery
Previously, we used cp_version only to detect recoverable dnodes.
In order to avoid same garbage cp_version, we needed to truncate the next
dnode during checkpoint, resulting in additional discard or data write.
If we can distinguish this by using crc in addition to cp_version, we can
remove this overhead.

There is backward compatibility concern where it changes node_footer layout.
So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to
detect new layout. New layout will be activated only when this flag is set.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-30 10:05:46 -07:00
Deepa Dinamani
c2050a454c fs: Replace current_fs_time() with current_time()
current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:22 -04:00
Deepa Dinamani
078cd8279e fs: Replace CURRENT_TIME with current_time() for inode timestamps
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:21 -04:00
Miklos Szeredi
2773bf00ae fs: rename "rename2" i_op to "rename"
Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-27 11:03:58 +02:00
Yunlei He
5d4c0af41f f2fs: preallocate blocks for encrypted file
This patch allow preallocates data blocks for buffered aio writes
in encrypted file.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix to avoid BUG_ON]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-22 11:43:08 -07:00
Chao Yu
5bc994a043 f2fs: show dirty inode number
This patch enables showing dirty inode number in procfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-22 11:43:07 -07:00
Chao Yu
8b038c70df f2fs: support IO error injection
This patch adds to support IO error injection for testing IO error
tolerance of f2fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-22 11:43:06 -07:00
Chao Yu
866969668a f2fs: fix to return error number of read_all_xattrs correctly
We treat all error in read_all_xattrs as a no memory error, which covers
the real reason of failure in it. Fix it by return correct errno in order
to reflect the real cause.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-22 11:43:05 -07:00
Chao Yu
ebfa732217 f2fs: make f2fs_filetype_table static
There is no more user of f2fs_filetype_table outside of dir.c, make it
static.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-22 11:43:04 -07:00
Jan Kara
31051c85b5 fs: Give dentry to inode_change_ok() instead of inode
inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22 10:56:19 +02:00
Jan Kara
073931017b posix_acl: Clear SGID bit when setting file permissions
When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok().  Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows to bypass the check in chmod(2).  Fix that.

References: CVE-2016-7097
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2016-09-22 10:55:32 +02:00
Miklos Szeredi
280db3c88c f2fs: use filemap_check_errors()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-16 12:44:21 +02:00
Eric Biggers
ef1eb3aa50 fscrypto: make filename crypto functions return 0 on success
Several filename crypto functions: fname_decrypt(),
fscrypt_fname_disk_to_usr(), and fscrypt_fname_usr_to_disk(), returned
the output length on success or -errno on failure.  However, the output
length was redundant with the value written to 'oname->len'.  It is also
potentially error-prone to make callers have to check for '< 0' instead
of '!= 0'.

Therefore, make these functions return 0 instead of a length, and make
the callers who cared about the return value being a length use
'oname->len' instead.  For consistency also make other callers check for
a nonzero result rather than a negative result.

This change also fixes the inconsistency of fname_encrypt() actually
already returning 0 on success, not a length like the other filename
crypto functions and as documented in its function comment.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-15 17:25:55 -04:00
Jaegeuk Kim
5905f9afa2 f2fs: handle error in recover_orphan_inode
This patch enhances the error path in recover_orphan_inode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-15 13:50:24 -07:00
Tiezhu Yang
49ed09dd85 f2fs: remove dead code f2fs_check_acl
The macro f2fs_check_acl is defined but never used since
the initial commit, this patch removes the code that has
been dead for several years.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-14 16:52:36 -07:00
Fan Li
d95fd91c1a f2fs: exclude special cases for f2fs_move_file_range
When src and dst is the same file, and the latter part of source region
overlaps with the former part of destination region, current implement
will overwrite data which hasn't been moved yet and truncate data in
overlapped region.
This patch return -EINVAL when such cases occur and return 0 when
source region and destination region is actually the same part of
the same file.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-14 16:52:06 -07:00
Jaegeuk Kim
649d7df29c f2fs: fix to set PageUptodate in f2fs_write_end correctly
Previously, f2fs_write_begin sets PageUptodate all the time. But, when user
tries to update the entire page (i.e., len == PAGE_SIZE), we need to consider
that the page is able to be copied partially afterwards. In such the case,
we will lose the remaing region in the page.

This patch fixes this by setting PageUptodate in f2fs_write_end as given copied
result. In the short copy case, it returns zero to let generic_perform_write
retry copying user data again.

As a result, f2fs_write_end() works:
   PageUptodate      len      copied    return   retry
1. no                4096     4096      4096     false  -> return 4096
2. no                4096     1024      0        true   -> goto #1 case
3. yes               2048     2048      2048     false  -> return 2048
4. yes               2048     1024      1024     false  -> return 1024

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-13 13:02:34 -07:00
Fan Li
61e4da1172 f2fs: fix parameters of __exchange_data_block
__exchange_data_block should take block indexes as parameters
instead of offsets in bytes.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-13 13:02:33 -07:00
Jaegeuk Kim
e8ea9b3d7e f2fs: avoid ENOMEM during roll-forward recovery
This patch gives another chances during roll-forward recovery regarding to
-ENOMEM.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-13 13:02:29 -07:00
Jaegeuk Kim
f4702d61eb f2fs: add common iget in add_fsync_inode
There is no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-12 13:55:11 -07:00
Jaegeuk Kim
7f3037a5ec f2fs: check free_sections for defragmentation
Fix wrong condition check for defragmentation of a file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-12 10:30:41 -07:00
Yunlei He
ed214a1183 f2fs: forbid to do fstrim if fs has some error
This patch skip fstrim if sbi set SBI_NEED_FSCK flag

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-12 10:30:40 -07:00
Jaegeuk Kim
34b5d5c22d f2fs: avoid page allocation for truncating partial inline_data
When truncating cached inline_data, we don't need to allocate a new page
all the time. Instead, it must check its page cache only.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-12 10:30:39 -07:00
Eric Biggers
ba63f23d69 fscrypto: require write access to mount to set encryption policy
Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem.  This was handled correctly by f2fs but not by ext4.  Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-10 01:18:57 -04:00
Jaegeuk Kim
68f313935f f2fs: no need to make zeros beyond i_size
We don't need to make zeros beyond i_size, since we already wrote that through
NEW_ADDR case.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 18:53:50 -07:00
Chao Yu
7732c26ac3 f2fs: fix to detect temporary name of multimedia file
Some applications may create multimeida file with temporary name like
'*.jpg.tmp' or '*.mp4.tmp', then rename to '*.jpg' or '*.mp4'.

Now, f2fs can only detect multimedia filename with specified format:
"filename + '.' + extension", so it will make f2fs missing to detect
multimedia file with special temporary name, result in failing to set
cold flag on file.

This patch enhances detection flow for enabling lookup extension in the
middle of temporary filename.

Reported-by: Xue Liu <liuxueliu.liu@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 18:53:49 -07:00
Chao Yu
6ab2a3085e f2fs: fix minor typo
Correct typo from 'destory' to 'destroy'.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 18:53:48 -07:00
Jaegeuk Kim
6bf6b267d2 f2fs: set dentry bits on random location in memory
This fixes pointer panic when using inline_dentry, which was triggered when
backporting to 3.10.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 18:53:47 -07:00
Chao Yu
c2a080aefa f2fs: fix to set superblock dirty correctly
tests/generic/251 of fstest suit complains us with below message:

------------[ cut here ]------------
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 7698 Comm: fstrim Tainted: G           O    4.7.0+ #21
task: e9f4e000 task.stack: e7262000
EIP: 0060:[<f89fcefe>] EFLAGS: 00010202 CPU: 2
EIP is at write_checkpoint+0xfde/0x1020 [f2fs]
EAX: f33eb300 EBX: eecac310 ECX: 00000001 EDX: ffff0001
ESI: eecac000 EDI: eecac5f0 EBP: e7263dec ESP: e7263d18
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b76ab01c CR3: 2eb89de0 CR4: 000406f0
Stack:
 00000001 a220fb7b e9f4e000 00000002 419ff2d3 b3a05151 00000002 e9f4e5d8
 e9f4e000 419ff2d3 b3a05151 eecac310 c10b8154 b3a05151 419ff2d3 c10b78bd
 e9f4e000 e9f4e000 e9f4e5d8 00000001 e9f4e000 ec409000 eecac2cc eecac288
Call Trace:
 [<c10b8154>] ? __lock_acquire+0x3c4/0x760
 [<c10b78bd>] ? mark_held_locks+0x5d/0x80
 [<f8a10632>] f2fs_trim_fs+0x1c2/0x2e0 [f2fs]
 [<f89e9f56>] f2fs_ioctl+0x6b6/0x10b0 [f2fs]
 [<c13d51df>] ? __this_cpu_preempt_check+0xf/0x20
 [<c10b4281>] ? trace_hardirqs_off_caller+0x91/0x120
 [<f89e98a0>] ? __exchange_data_block+0xd30/0xd30 [f2fs]
 [<c120b2e1>] do_vfs_ioctl+0x81/0x7f0
 [<c11d57c5>] ? kmem_cache_free+0x245/0x2e0
 [<c1217840>] ? get_unused_fd_flags+0x40/0x40
 [<c1206eec>] ? putname+0x4c/0x50
 [<c11f631e>] ? do_sys_open+0x16e/0x1d0
 [<c1001990>] ? do_fast_syscall_32+0x30/0x1c0
 [<c13d51df>] ? __this_cpu_preempt_check+0xf/0x20
 [<c120baa8>] SyS_ioctl+0x58/0x80
 [<c1001a01>] do_fast_syscall_32+0xa1/0x1c0
 [<c178cc54>] sysenter_past_esp+0x45/0x74
EIP: [<f89fcefe>] write_checkpoint+0xfde/0x1020 [f2fs] SS:ESP 0068:e7263d18
---[ end trace 4de95d7e6b3aa7c6 ]---

The reason is: with below call stack, we will encounter BUG_ON during
doing fstrim.

Thread A				Thread B
- write_checkpoint
 - do_checkpoint
					- f2fs_write_inode
					 - update_inode_page
					  - update_inode
					   - set_page_dirty
					    - f2fs_set_node_page_dirty
					     - inc_page_count
					      - percpu_counter_inc
					      - set_sbi_flag(SBI_IS_DIRTY)
  - clear_sbi_flag(SBI_IS_DIRTY)

Thread C				Thread D
- f2fs_write_node_page
 - set_node_addr
  - __set_nat_cache_dirty
   - nm_i->dirty_nat_cnt++
					- do_vfs_ioctl
					 - f2fs_ioctl
					  - f2fs_trim_fs
					   - write_checkpoint
					    - f2fs_bug_on(nm_i->dirty_nat_cnt)

Fix it by setting superblock dirty correctly in do_checkpoint and
f2fs_write_node_page.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 18:53:47 -07:00
Shuoran Liu
e7ba108a06 f2fs: add roll-forward recovery process for encrypted dentry
Add roll-forward recovery process for encrypted dentry, so the first fsync
issued to an encrypted file does not need writing checkpoint.

This improves the performance of the following test at thousands of small
files: open -> write -> fsync -> close

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify kernel message to show encrypted names]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 17:27:40 -07:00
Jaegeuk Kim
bbf156f7af f2fs: fix lost xattrs of directories
This patch enhances the xattr consistency of dirs from suddern power-cuts.

Possible scenario would be:
1. dir->setxattr used by per-file encryption
2. file->setxattr goes into inline_xattr
3. file->fsync

In that case, we should do checkpoint for #1.
Otherwise we'd lose dir's key information for the file given #2.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 17:27:39 -07:00
Chao Yu
275b66b09e f2fs: support async discard
Like most filesystems, f2fs will issue discard command synchronously, so
when user trigger fstrim through ioctl, multiple discard commands will be
issued serially with sync mode, which makes poor performance.

In this patch we try to support async discard, so that all discard
commands can be issued and be waited for endio in batch to improve
performance.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 17:27:38 -07:00
Shuoran Liu
167451efb5 f2fs: set encryption name flag in add inline entry path
This patch sets encryption name flag in the add inline entry path
if filename is encrypted.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 17:27:37 -07:00
Chao Yu
e06f86e61d f2fs crypto: avoid unneeded memory allocation in ->readdir
When decrypting dirents in ->readdir, fscrypt_fname_disk_to_usr won't
change content of original encrypted dirent, we don't need to allocate
additional buffer for storing mirror of it, so get rid of it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 17:27:36 -07:00
Chao Yu
9421d57051 f2fs: fix to do security initialization of encrypted inode with original filename
When creating new inode, security_inode_init_security will be called for
initializing security info related to the inode, and filename is passed to
security module, it helps security module such as SElinux to know which
rule or label could be applied for the inode with specified name.

Previously, if new inode is created as an encrypted one, f2fs will transfer
encrypted filename to security module which may fail the check of security
policy belong to the inode. So in order to this issue, alter to transfer
original unencrypted filename instead.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 17:27:35 -07:00
Chao Yu
7ea984b060 f2fs: do in batch synchronously readahead during GC
In order to enhance performance, we try to readahead node page during
GC, but before loading node page we should get block address of node page
which is stored in NAT table, so synchronously read of single NAT page
block our readahead flow.

f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE

This patch adds one phase that do readahead NAT pages in batch before
readahead node page for more effeciently.

f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 17:27:34 -07:00
Chao Yu
74fa5f3d43 f2fs: schedule in between two continous batch discards
In batch discard approach of fstrim will grab/release gc_mutex lock
repeatly, it makes contention of the lock becoming more intensive.

So after one batch discards were issued in checkpoint and the lock
was released, it's better to do schedule() to increase opportunity
of grabbing gc_mutex lock for other competitors.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-07 17:27:33 -07:00
Chao Yu
97c1794a5d f2fs: enable inline_dentry by default and add noinline_dentry option
Make inline_dentry as default mount option to improve space usage and
IO performance in scenario of numerous small directory.
It adds noinline_dentry mount option, instead.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:17 -07:00
Shuoran Liu
5d2b42ede7 f2fs: fix a bug when using namehash to locate dentry bucket
In the following scenario,

1) we don't have the key and doing a lookup for encrypted file,
2) and the encrypted filename is big name

we should use fname->hash as name hash value instead of what is
calculated by fname->disk_name. Because in such case,
fname->disk_name is empty.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:16 -07:00
Chao Yu
dfd02e4de1 f2fs: fix to preallocate block only aligned to 4K
In write_begin(), we skip checking dnode block for preallocating block
when whole block needs to be updated since we preallocated its block in
f2fs_preallocate_blocks, for partial updated block, we will still try
to lock its node and do preallocation in write_begin(), so in
f2fs_preallocate_blocks we should not preallocate its block.

But previously, the calculation of preallocating block number is
incorrect, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:15 -07:00
Wei Yongjun
6a7a3aedd5 f2fs: fix non static symbol warning
Fixes the following sparse warning:

fs/f2fs/data.c:969:12: warning:
 symbol 'f2fs_grab_bio' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:14 -07:00
Sheng Yong
69494229ba f2fs: remove unnecessary initialization
`flags' is used to save value from userspace, there is no need to
initialize it, and FS_FL_USER_VISIBLE is the mask for getflags.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:13 -07:00
Chao Yu
5f8eaf1f9b f2fs: remove redundant judgement condition in available_free_memory
In available_free_memory, there are two same judgement conditions which
is used for checking NAT excess, remove one of them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:12 -07:00
Chao Yu
e932835377 f2fs: check return value of write_checkpoint during fstrim
During fstrim, if one of multiple write_checkpoint failed, break off and
return error number to caller.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:11 -07:00
Chao Yu
58383befc3 f2fs: fix to do f2fs_balance_fs in f2fs_map_blocks correctly
If we preallocate blocks with f2fs_reserve_blocks in f2fs_map_blocks, we
should call f2fs_balance_fs for checking and reclaiming space, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:10 -07:00
Chao Yu
d600af236d f2fs: avoid unneeded loop in build_sit_entries
When building each sit entry in cache, firstly, we will load it from
sit page, and then check all entries in sit journal, if there is one
updated entry in journal, cover cached entry with the journaled one.

Actually, most of check operation is unneeded since we only need
to update cached entries with journaled entries in batch, so
changing the flow as below for more efficient:
1. load all sit entries into cache from sit pages;
2. update sit entries with journal.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-29 18:31:09 -07:00