Commit Graph

320 Commits

Author SHA1 Message Date
Chao Yu
3024c9a1fe Revert "f2fs: move i_size_write in f2fs_write_end"
This reverts commit a2ee0a3003.

When testing with generic/032 of xfstest suit, failure message will be
reported as below:

generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
    --- tests/generic/032.out	2015-01-11 16:52:27.643681072 +0800
    +++ results/generic/032.out.bad	2016-08-06 13:44:43.861330500 +0800
    @@ -1,5 +1,5 @@
     QA output created by 032
    -100 iterations
    -0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
    -*
    -0100000
    +1: [768..775]: unwritten
    +Unwritten extents found!
    ...
    (Run 'diff -u tests/generic/032.out results/generic/032.out.bad'  to see the entire diff)
Ran: generic/032
Failures: generic/032
Failed 1 of 1 tests

In write_end(), we should update i_size of inode before unlock page,
otherwise, we will lose newly updated data in following race condition.

Thread A			Thread B
- write_end
 - unlock page
				- writepages
				 - lock_page
				  - writepage
				  if page is out-of-range of file size,
				  we will skip writting the page.
 - update i_size

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-08-19 11:15:08 +09:00
Jens Axboe
1aee6b9a7d f2fs: drop bio->bi_rw manual assignment
Merge 4fc29c1aa3 included this extra line, but it's not needed (or
useful) since we'll bio_set_op_attrs() right after to properly set
the op and flags for the bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04 14:19:16 -06:00
Linus Torvalds
4fc29c1aa3 The major change in this version is mitigating cpu overheads on write paths by
replacing redundant inode page updates with mark_inode_dirty calls. And we tried
 to reduce lock contentions as well to improve filesystem scalability.
 Other feature is setting F2FS automatically when detecting host-managed SMR.
 
 = Enhancement =
  - ioctl to move a range of data between files
  - inject orphan inode errors
  - avoid flush commands congestion
  - support lazytime
 
 = Bug fixes =
  - return proper results for some dentry operations
  - fix deadlock in add_link failure
  - disable extent_cache for fcollapse/finsert
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXmDJFAAoJEEAUqH6CSFDSJeYP/0ru8+5/ui5VTCdNPQB9KxYD
 DIUaDGpeoLvmn3ZdrMEdyNr6kWbgjCE9JjOGPQ7l1/apErOGVPyaBwflKcCDwloU
 pAlEqVM1Q9j4qH4i9SWTlvPtsHBHB7G7YSe3vDB9fJGSTqumubIlnaBm+Wfjx31U
 p53WcPn9LpOyzfmvZf2tOHmvZ7bWLkE/a07x9kPC6XHUFb9C17jLRFFGeuhZQHv1
 Yo7HgokBnPExa8TnEILYyX/x+eecFS/1Cp/cN0STsebSu8pStTHTcAP7qEpKQB88
 Cc51Lf+d5gFeydxKDFxwdH3VWOGIr9Ppako+lHW83gJcHP0zw8zdxULab+HJMa4n
 MOByRRiafwu1sL0dl7TCfsYNIHdEnXhWbhcRhMVZbb5C2Q6+Htuac8ZrKSOWExNN
 DUqRkzeTib9u+cHxUTFFPgOGdUjDLmg3XHU7mvb+2hViluVjIImC4tqD5XPpv7vt
 WnaDJxLCGD/6DF2yhiVY9NysuxInLTNFFCF06LworZ4L24hlg5TvN0UeUNRO9954
 ux6f+lSORCzV3TmrsHP5vwjSAW26FviPXV1q1HHJeTpWKMlhsZtHmOAJOtZKKmxP
 WFnHT0aiWF+sQf4qfxVQL+lLqtgRKJAI9zqGRyfDJWJp5aXdRuVsZs9pWNQF7lCo
 5gVnCYk3ULjXG3b23j2S
 =tKTR
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "The major change in this version is mitigating cpu overheads on write
  paths by replacing redundant inode page updates with mark_inode_dirty
  calls.  And we tried to reduce lock contentions as well to improve
  filesystem scalability.  Other feature is setting F2FS automatically
  when detecting host-managed SMR.

  Enhancements:
   - ioctl to move a range of data between files
   - inject orphan inode errors
   - avoid flush commands congestion
   - support lazytime

  Bug fixes:
   - return proper results for some dentry operations
   - fix deadlock in add_link failure
   - disable extent_cache for fcollapse/finsert"

* tag 'for-f2fs-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: clean up coding style and redundancy
  f2fs: get victim segment again after new cp
  f2fs: handle error case with f2fs_bug_on
  f2fs: avoid data race when deciding checkpoin in f2fs_sync_file
  f2fs: support an ioctl to move a range of data blocks
  f2fs: fix to report error number of f2fs_find_entry
  f2fs: avoid memory allocation failure due to a long length
  f2fs: reset default idle interval value
  f2fs: use blk_plug in all the possible paths
  f2fs: fix to avoid data update racing between GC and DIO
  f2fs: add maximum prefree segments
  f2fs: disable extent_cache for fcollapse/finsert inodes
  f2fs: refactor __exchange_data_block for speed up
  f2fs: fix ERR_PTR returned by bio
  f2fs: avoid mark_inode_dirty
  f2fs: move i_size_write in f2fs_write_end
  f2fs: fix to avoid redundant discard during fstrim
  f2fs: avoid mismatching block range for discard
  f2fs: fix incorrect f_bfree calculation in ->statfs
  f2fs: use percpu_rw_semaphore
  ...
2016-07-27 10:36:31 -07:00
Linus Torvalds
0e06f5c0de Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2

 - most(?) of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
2016-07-26 19:55:54 -07:00
Michal Hocko
8a5c743e30 mm, memcg: use consistent gfp flags during readahead
Vladimir has noticed that we might declare memcg oom even during
readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
restriction) while __do_page_cache_readahead uses
page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
OOMs.  This gfp mask discrepancy is really unfortunate and easily
fixable.  Drop page_cache_alloc_readahead() which only has one user and
outsource the gfp_mask logic into readahead_gfp_mask and propagate this
mask from __do_page_cache_readahead down to read_pages.

This alone would have only very limited impact as most filesystems are
implementing ->readpages and the common implementation mpage_readpages
does GFP_KERNEL (with mapping_gfp restriction) again.  We can tell it to
use readahead_gfp_mask instead as this function is called only during
readahead as well.  The same applies to read_cache_pages.

ext4 has its own ext4_mpage_readpages but the path which has pages !=
NULL can use the same gfp mask.  Btrfs, cifs, f2fs and orangefs are
doing a very similar pattern to mpage_readpages so the same can be
applied to them as well.

[akpm@linux-foundation.org: coding-style fixes]
[mhocko@suse.com: restrict gfp mask in mpage_alloc]
  Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Chris Mason <clm@fb.com>
Cc: Steve French <sfrench@samba.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Jaegeuk Kim
5302fb000d f2fs: clean up coding style and redundancy
This patch includes minor clean-ups.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-25 12:58:12 -07:00
Jaegeuk Kim
9dfa1baff7 f2fs: use blk_plug in all the possible paths
This patch reverts 19a5f5e2ef (f2fs: drop any block plugging),
and adds blk_plug in write paths additionally.

The main reason is that blk_start_plug can be used to wake up from low-power
mode before submitting further bios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 15:21:23 -07:00
Chao Yu
82e0a5aa5d f2fs: fix to avoid data update racing between GC and DIO
Datas in file can be operated by GC and DIO simultaneously, so we will
face race case as below:

For write case:
Thread A				Thread B
- generic_file_direct_write
 - invalidate_inode_pages2_range
 - f2fs_direct_IO
  - do_blockdev_direct_IO
   - do_direct_IO
    - get_more_blocks
					- f2fs_gc
					 - do_garbage_collect
					  - gc_data_segment
					   - move_data_page
					    - do_write_data_page
					    migrate data block to new block address
   - dio_bio_submit
   update user data to old block address

For read case:
Thread A                                Thread B
- generic_file_direct_write
 - invalidate_inode_pages2_range
 - f2fs_direct_IO
  - do_blockdev_direct_IO
   - do_direct_IO
    - get_more_blocks
					- f2fs_balance_fs
					 - f2fs_gc
					  - do_garbage_collect
					   - gc_data_segment
					    - move_data_page
					     - do_write_data_page
					     migrate data block to new block address
					  - write_checkpoint
					   - do_checkpoint
					    - clear_prefree_segments
					     - f2fs_issue_discard
                                             discard old block adress
   - dio_bio_submit
   update user buffer from obsolete block address

In order to fix this, for one file, we should let DIO and GC getting exclusion
against with each other.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 15:21:22 -07:00
Jaegeuk Kim
1d353eb7e4 f2fs: fix ERR_PTR returned by bio
This is to fix wrong error pointer handling flow reported by Dan.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-15 15:21:19 -07:00
Jaegeuk Kim
a2ee0a3003 f2fs: move i_size_write in f2fs_write_end
We don't need to do i_size_write under page lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:35 -07:00
Jaegeuk Kim
237c0790e5 f2fs: call SetPageUptodate if needed
SetPageUptodate() issues memory barrier, resulting in performance degrdation.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:29 -07:00
Jaegeuk Kim
fe76b796fc f2fs: introduce f2fs_set_page_dirty_nobuffer
This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
overall performance.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:28 -07:00
Chao Yu
1563ac75e7 f2fs: fix to detect truncation prior rather than EIO during read
In procedure of synchonized read, after sending out the read request, reader
will try to lock the page for waiting device to finish the read jobs and
unlock the page, but meanwhile, truncater will race with reader, so after
reader get lock of the page, it should check page's mapping to detect
whether someone has truncated the page in advance, then reader has the
chance to do the retry if truncation was done, otherwise read can be failed
due to previous condition check.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:25 -07:00
Chao Yu
78682f7944 f2fs: fix to avoid reading out encrypted data in page cache
For encrypted inode, if user overwrites data of the inode, f2fs will read
encrypted data into page cache, and then do the decryption.

However reader can race with overwriter, and it will see encrypted data
which has not been decrypted by overwriter yet. Fix it by moving decrypting
work to background and keep page non-uptodated until data is decrypted.

Thread A				Thread B
- f2fs_file_write_iter
 - __generic_file_write_iter
  - generic_perform_write
   - f2fs_write_begin
    - f2fs_submit_page_bio
					- generic_file_read_iter
					 - do_generic_file_read
					  - lock_page_killable
					  - unlock_page
					  - copy_page_to_iter
					  hit the encrypted data in updated page
    - lock_page
    - fscrypt_decrypt_page

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-08 10:33:24 -07:00
Jaegeuk Kim
ac6f199984 f2fs: avoid latency-critical readahead of node pages
The f2fs_map_blocks is very related to the performance, so let's avoid any
latency to read ahead node pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-06 10:44:10 -07:00
Jaegeuk Kim
52763a4b7a f2fs: detect host-managed SMR by feature flag
If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs
by default.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-07-06 10:44:07 -07:00
Jaegeuk Kim
36abef4e79 f2fs: introduce mode=lfs mount option
This mount option is to enable original log-structured filesystem forcefully.
So, there should be no random writes for main area.

Especially, this supports host-managed SMR device.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-13 11:55:21 -07:00
Jaegeuk Kim
19a5f5e2ef f2fs: drop any block plugging
In f2fs, we don't need to keep block plugging for NODE and DATA writes, since
we already merged bios as much as possible.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-08 10:25:51 -07:00
Jaegeuk Kim
7f319975cc f2fs: set mapping error for EIO
If EIO occurred, we need to set all the mapping to avoid any further IOs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-08 10:25:50 -07:00
Mike Christie
04d328defd f2fs: use bio op accessors
Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie
4e49ea4a3d block/fs/drivers: remove rw argument from submit_bio
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.

Signed-off-by: Mike Christie <mchristi@redhat.com>

Fixed up fs/ext4/crypto.c

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Jaegeuk Kim
b230e6cabf f2fs: handle writepage correctly
Previously, f2fs_write_data_pages() calls __f2fs_writepage() which calls
f2fs_write_data_page().
If f2fs_write_data_page() returns AOP_WRITEPAGE_ACTIVATE, __f2fs_writepage()
calls mapping_set_error(). But, this should not happen at every time, since
sometimes f2fs_write_data_page() tries to skip writing pages without error.
For example, volatile_write() gives EIO all the time, as Shuoran Liu pointed
out.

Reported-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:24 -07:00
Jaegeuk Kim
46ae957f9b f2fs: remove two steps to flush dirty data pages
If there is no cold page, we don't need to do a loop to flush dirty
data pages.

On /dev/pmem0,

1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync
 Before : 1.1 GB/s
 After  : 1.2 GB/s

2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048
 Before : 2.2 GB/s
 After  : 2.3 GB/s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:21 -07:00
Jaegeuk Kim
28ea6162e2 f2fs: do not skip writing data pages
For data pages, let's try to flush as much as possible in background.

On /dev/pmem0,

1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync
 Before : 800 MB/s
 After  : 1.1 GB/s

2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048
 Before : 1.3 GB/s
 After  : 2.2 GB/s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:20 -07:00
Jaegeuk Kim
b93f771286 f2fs: remove writepages lock
This patch removes writepages lock.
We can improve multi-threading performance.

tiobench, 32 threads, 4KB write per fsync on SSD
Before: 25.88 MB/s
After: 28.03 MB/s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:17 -07:00
Jaegeuk Kim
26de9b1171 f2fs: avoid unnecessary updating inode during fsync
If roll-forward recovery can recover i_size, we don't need to update inode's
metadata during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:13 -07:00
Jaegeuk Kim
ee6d182f2a f2fs: remove syncing inode page in all the cases
This patch reduces to call them across the whole tree.
- sync_inode_page()
- update_inode_page()
- update_inode()
- f2fs_write_inode()

Instead, checkpoint will flush all the dirty inode metadata before syncing
node pages.
Note that, this is doable, since we call mark_inode_dirty_sync() for all
inode's field change which needs to update on-disk inode as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:12 -07:00
Jaegeuk Kim
8edd03c870 f2fs: introduce f2fs_i_blocks_write with mark_inode_dirty_sync
This patch introduces f2fs_i_blocks_write() to call mark_inode_dirty_sync() when
changing inode->i_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:09 -07:00
Jaegeuk Kim
fc9581c809 f2fs: introduce f2fs_i_size_write with mark_inode_dirty_sync
This patch introduces f2fs_i_size_write() to call mark_inode_dirty_sync() with
i_size_write().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:08 -07:00
Jaegeuk Kim
91942321e4 f2fs: use inode pointer for {set, clear}_inode_flag
This patch refactors to use inode pointer for set_inode_flag and
clear_inode_flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:07 -07:00
Linus Torvalds
f6c658df63 Enhancement
- fs-specific prefix for fscrypto
 - fault injection facility
 - expose validity bitmaps for user to be aware of fragmentation
 - fallocate/rm/preallocation speed up
 - use percpu counters
 
 Bug fixes
 - some inline_dentry/inline_data bugs
 - error handling for atomic/volatile/orphan inodes
 - recover broken superblock
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXQPu4AAoJEEAUqH6CSFDSILgP/1dj6fmtytr8c+55EBqXUGpo
 M7rS93JTxlmU5BduIo9psJsEquTQoVEmxB/Gjd+ZnI5R6Rp1c/REaP0ba374rEhZ
 ecMQh5QqzM1gRNFXrQhWFEL/KtfRqt3T80zebQP7pxFUm/m9NGMLWT43RzQ8AAhr
 Y3P0NLdvxA4HAnipKptkPJcGZQlWnL9W/MR+LgsXLXqLDwJHkVu61GcF0y2ibcJM
 lEtIRmyH5tg7hP5c5LTw9pKQFHkIZt5cHFLjrJ1x8FSm2TXOcJPbjOrThvcb+NKK
 e0O+6R0meH2eMpak+BTkZp2YbPPyXOb1N00j//lmbPjCoJPd4ZuiJ+oRoHUlTxtU
 FhO67t0brlDbMFQVRFrtv8VA8M6by+DTAAP3Ffx62I/TJkphKANCSoyQRhlWtxxO
 kRU69N7ipnRNxO4WCv40FjaQjSIElCKysP1POazRmAOQm7UFTGT9Nj37+eqUcEPJ
 HZ7O61DEHNemb0SMlJ8WSClstt0yUU+2cjRfTPAr2Wd3V8gYbRs0QUg5M2GLgywR
 EmiJfpkXse3f/nR8W6g1hganSOXA0AZX+EUibed6VkV3oYemdFbm8OymeEmLmWpM
 y2F3D7dPLW7MCoTXJqtwFWdoDwI+zkH4rJaPGTq5TVBRWVU/njX8OvoB47pOvKV1
 kccL7zv2PekE1hSDO5WF
 =6MSp
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, as Ted pointed out, fscrypto allows one more key prefix
  given by filesystem to resolve backward compatibility issues.  Other
  than that, we've fixed several error handling cases by introducing
  a fault injection facility.  We've also achieved performance
  improvement in some workloads as well as a bunch of bug fixes.

  Summary:

  Enhancements:
   - fs-specific prefix for fscrypto
   - fault injection facility
   - expose validity bitmaps for user to be aware of fragmentation
   - fallocate/rm/preallocation speed up
   - use percpu counters

  Bug fixes:
   - some inline_dentry/inline_data bugs
   - error handling for atomic/volatile/orphan inodes
   - recover broken superblock"

* tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (73 commits)
  f2fs: fix to update dirty page count correctly
  f2fs: flush pending bios right away when error occurs
  f2fs: avoid ENOSPC fault in the recovery process
  f2fs: make exit_f2fs_fs more clear
  f2fs: use percpu_counter for total_valid_inode_count
  f2fs: use percpu_counter for alloc_valid_block_count
  f2fs: use percpu_counter for # of dirty pages in inode
  f2fs: use percpu_counter for page counters
  f2fs: use bio count instead of F2FS_WRITEBACK page count
  f2fs: manipulate dirty file inodes when DATA_FLUSH is set
  f2fs: add fault injection to sysfs
  f2fs: no need inc dirty pages under inode lock
  f2fs: fix incorrect error path handling in f2fs_move_rehashed_dirents
  f2fs: fix i_current_depth during inline dentry conversion
  f2fs: correct return value type of f2fs_fill_super
  f2fs: fix deadlock when flush inline data
  f2fs: avoid f2fs_bug_on during recovery
  f2fs: show # of orphan inodes
  f2fs: support in batch fzero in dnode page
  f2fs: support in batch multi blocks preallocation
  ...
2016-05-21 18:25:28 -07:00
Jaegeuk Kim
38f91ca8c0 f2fs: flush pending bios right away when error occurs
Given errors, this patch flushes pending bios as soon as possible.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-20 11:46:15 -07:00
Jaegeuk Kim
f573018491 f2fs: use bio count instead of F2FS_WRITEBACK page count
This can reduce page counting overhead.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-18 13:57:25 -07:00
Linus Torvalds
c2e7b20705 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs cleanups from Al Viro:
 "More cleanups from Christoph"

* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nfsd: use RWF_SYNC
  fs: add RWF_DSYNC aand RWF_SYNC
  ceph: use generic_write_sync
  fs: simplify the generic_write_sync prototype
  fs: add IOCB_SYNC and IOCB_DSYNC
  direct-io: remove the offset argument to dio_complete
  direct-io: eliminate the offset argument to ->direct_IO
  xfs: eliminate the pos variable in xfs_file_dio_aio_write
  filemap: remove the pos argument to generic_file_direct_write
  filemap: remove pos variables in generic_file_read_iter
2016-05-17 15:05:23 -07:00
Chao Yu
ab47036d8f f2fs: fix deadlock when flush inline data
Below backtrace info was reported by Yunlei He:

Call Trace:
 [<ffffffff817a9395>] schedule+0x35/0x80
 [<ffffffff817abb7d>] rwsem_down_read_failed+0xed/0x130
 [<ffffffff813c12a8>] call_rwsem_down_read_failed+0x18/0x
 [<ffffffff817ab1d0>] down_read+0x20/0x30
 [<ffffffffa02a1a12>] f2fs_evict_inode+0x242/0x3a0 [f2fs]
 [<ffffffff81217057>] evict+0xc7/0x1a0
 [<ffffffff81217cd6>] iput+0x196/0x200
 [<ffffffff812134f9>] __dentry_kill+0x179/0x1e0
 [<ffffffff812136f9>] dput+0x199/0x1f0
 [<ffffffff811fe77b>] __fput+0x18b/0x220
 [<ffffffff811fe84e>] ____fput+0xe/0x10
 [<ffffffff81097427>] task_work_run+0x77/0x90
 [<ffffffff81074d62>] exit_to_usermode_loop+0x73/0xa2
 [<ffffffff81003b7a>] do_syscall_64+0xfa/0x110
 [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25

Call Trace:
 [<ffffffff817a9395>] schedule+0x35/0x80
 [<ffffffff81216dc3>] __wait_on_freeing_inode+0xa3/0xd0
 [<ffffffff810bc300>] ? autoremove_wake_function+0x40/0x4
 [<ffffffff8121771d>] find_inode_fast+0x7d/0xb0
 [<ffffffff8121794a>] ilookup+0x6a/0xd0
 [<ffffffffa02bc740>] sync_node_pages+0x210/0x650 [f2fs]
 [<ffffffff8122e690>] ? do_fsync+0x70/0x70
 [<ffffffffa02b085e>] block_operations+0x9e/0xf0 [f2fs]
 [<ffffffff8137b795>] ? bio_endio+0x55/0x60
 [<ffffffffa02b0942>] write_checkpoint+0x92/0xba0 [f2fs]
 [<ffffffff8117da57>] ? mempool_free_slab+0x17/0x20
 [<ffffffff8117de8b>] ? mempool_free+0x2b/0x80
 [<ffffffff8122e690>] ? do_fsync+0x70/0x70
 [<ffffffffa02a53e3>] f2fs_sync_fs+0x63/0xd0 [f2fs]
 [<ffffffff8129630f>] ? ext4_sync_fs+0xbf/0x190
 [<ffffffff8122e6b0>] sync_fs_one_sb+0x20/0x30
 [<ffffffff812002e9>] iterate_supers+0xb9/0x110
 [<ffffffff8122e7b5>] sys_sync+0x55/0x90
 [<ffffffff81003ae9>] do_syscall_64+0x69/0x110
 [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25

With following excuting serials, we will set inline_node in inode page
after inode was unlinked, result in a deadloop described as below:
1. open file
2. write file
3. unlink file
4. write file
5. close file

Thread A				Thread B
 - dput
  - iput_final
   - inode->i_state |= I_FREEING
   - evict
    - f2fs_evict_inode
					 - f2fs_sync_fs
					  - write_checkpoint
					   - block_operations
					    - f2fs_lock_all (down_write(cp_rwsem))
     - f2fs_lock_op (down_read(cp_rwsem))
					    - sync_node_pages
					     - ilookup
					      - find_inode_fast
					       - __wait_on_freeing_inode
					         (wait on I_FREEING clear)

Here, we change to set inline_node flag only for linked inode for fixing.

Reported-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: stable@vger.kernel.org # v4.6
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-11 09:56:38 -07:00
Chao Yu
46008c6d42 f2fs: support in batch multi blocks preallocation
This patch introduces reserve_new_blocks to make preallocation of multi
blocks as in batch operation, so it can avoid lots of redundant
operation, result in better performance.

In virtual machine, with rotational device:

time fallocate -l 32G /mnt/f2fs/file

Before:
real	0m4.584s
user	0m0.000s
sys	0m4.580s

After:
real	0m0.292s
user	0m0.000s
sys	0m0.272s

In x86, with SSD:

time fallocate -l 500G $MNT/testfile

Before : 24.758 s
After  :  1.604 s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix bugs and add performance numbers measured in x86.]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-11 09:56:35 -07:00
Jaegeuk Kim
0080c50764 f2fs: do not preallocate block unaligned to 4KB
Previously f2fs_preallocate_blocks() tries to allocate unaligned blocks.
In f2fs_write_begin(), however, prepare_write_begin() does not skip its
allocation due to (len != 4KB).
So, it needs locking node page twice unexpectedly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:44:57 -07:00
Chao Yu
43473f9645 f2fs: fix incorrect mapping in ->bmap
Currently, generic_block_bmap is used in f2fs_bmap, its semantics is when
the mapping is been found, return position of target physical block,
otherwise return zero.

But, previously, when there is no mapping info for specified logical block,
f2fs_bmap will map target physical block to a uninitialized variable, which
should be wrong. Fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:32:32 -07:00
Chao Yu
23dc974eed f2fs: fix to clear private data in page
Private data in page should be removed during ->releasepage or
->invalidatepage, otherwise garbage data would be remained in that page.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-03 11:20:05 -07:00
Christoph Hellwig
c8b8e32d70 direct-io: eliminate the offset argument to ->direct_IO
Including blkdev_direct_IO and dax_do_io.  It has to be ki_pos to actually
work, so eliminate the superflous argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-01 19:58:39 -04:00
Jaegeuk Kim
6bfc49197e f2fs: issue cache flush on direct IO
Under direct IO path with O_(D)SYNC, it needs to set proper APPEND or UPDATE
flags, so taht f2fs_sync_file can make its data safe.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:25:05 -07:00
Jaegeuk Kim
e6e5f5610d f2fs: avoid writing 0'th page in volatile writes
The first page of volatile writes usually contains a sort of header information
which will be used for recovery.
(e.g., journal header of sqlite)

If this is written without other journal data, user needs to handle the stale
journal information.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:24:39 -07:00
Jaegeuk Kim
4da7bf5a43 f2fs: remove redundant condition check
This patch resolves the redundant condition check reported by David.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15 08:49:47 -07:00
Jaegeuk Kim
b32e4482aa fscrypto: don't let data integrity writebacks fail with ENOMEM
This patch fixes the issue introduced by the ext4 crypto fix in a same manner.
For F2FS, however, we flush the pending IOs and wait for a while to acquire free
memory.

Fixes: c9af28fdd4 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM")
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12 10:25:30 -07:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Jaegeuk Kim
0b81d07790 fs crypto: move per-file encryption from f2fs tree to fs/crypto
This patch adds the renamed functions moved from the f2fs crypto files.

1. definitions for per-file encryption used by ext4 and f2fs.

2. crypto.c for encrypt/decrypt functions
 a. IO preparation:
  - fscrypt_get_ctx / fscrypt_release_ctx
 b. before IOs:
  - fscrypt_encrypt_page
  - fscrypt_decrypt_page
  - fscrypt_zeroout_range
 c. after IOs:
  - fscrypt_decrypt_bio_pages
  - fscrypt_pullback_bio_page
  - fscrypt_restore_control_page

3. policy.c supporting context management.
 a. For ioctls:
  - fscrypt_process_policy
  - fscrypt_get_policy
 b. For context permission
  - fscrypt_has_permitted_context
  - fscrypt_inherit_context

4. keyinfo.c to handle permissions
  - fscrypt_get_encryption_info
  - fscrypt_free_encryption_info

5. fname.c to support filename encryption
 a. general wrapper functions
  - fscrypt_fname_disk_to_usr
  - fscrypt_fname_usr_to_disk
  - fscrypt_setup_filename
  - fscrypt_free_filename

 b. specific filename handling functions
  - fscrypt_fname_alloc_buffer
  - fscrypt_fname_free_buffer

6. Makefile and Kconfig

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:33 -07:00
Chao Yu
406657dd18 f2fs: introduce f2fs_flush_merged_bios for cleanup
Add a new helper f2fs_flush_merged_bios to clean up redundant codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 11:52:02 -08:00
Chao Yu
f28b3434af f2fs: introduce f2fs_update_data_blkaddr for cleanup
Add a new help f2fs_update_data_blkaddr to clean up redundant codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 11:52:01 -08:00
Chao Yu
7a9d75481b f2fs: trace old block address for CoWed page
This patch enables to trace old block address of CoWed page for better
debugging.

f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE

f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA

f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 21:40:02 -08:00
Chao Yu
28bc106b23 f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file

With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.

But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.

So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.

If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00