Commit Graph

1652 Commits

Author SHA1 Message Date
Jaegeuk Kim
7daaea256d f2fs: add noextent_cache mount option
This patch adds noextent_cache mount option.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:55 -07:00
Jaegeuk Kim
554df79e52 f2fs: shrink extent_cache entries
This patch registers shrinking extent_caches.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:55 -07:00
Jaegeuk Kim
1b38dc8e74 f2fs: shrink nat_cache entries
This patch registers shrinking nat_cache entries.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:55 -07:00
Jaegeuk Kim
2658e50de6 f2fs: introduce a shrinker for mounted fs
This patch introduces a shrinker targeting to reduce memory footprint consumed
by a number of in-memory f2fs data structures.

In addition, it newly adds:
 - sbi->umount_mutex to avoid data races on shrinker and put_super
 - sbi->shruinker_run_no to not revisit objects

Note that the basic implementation was copied from fs/ubifs/shrinker.c

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:54 -07:00
Jaegeuk Kim
244f4fc1c5 f2fs: set cached_en after checking finally
This patch relocates cached_en not only to be covered by spin_lock, but also
to set once after checking out completely.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:54 -07:00
Jaegeuk Kim
cbe91923a9 f2fs: update on-disk extents even under extent_cache
Previously, f2fs_update_extent_cache() updates in-memory extent_cache all the
time, and then finally preserves its up-to-date extent into on-disk one during
f2fs_evict_inode.

But, in the following scenario:

1. mount
2. open & write an extent X
3. f2fs_evict_inode; on-disk extent is X
4. open & update the extent X with Y
5. sync; trigger checkpoint
6. power-cut

after power-on, f2fs should serve extent Y, but we have an on-disk extent X.

This causes a failure on xfstests/311.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:54 -07:00
Jaegeuk Kim
7a2cb67867 f2fs: fix wrong block address calculation for a split extent
This patch fixes wrong calculation on block address field when an extent is
split.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:54 -07:00
Jaegeuk Kim
97a7b2c274 f2fs: convert inline_data for various fallocate
For newly added fallocate types, it should convert inline_data before handling
block swapping.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:53 -07:00
Jaegeuk Kim
c9b63bd01d f2fs: avoid to use failed inode immediately
Before iput is called, the inode number used by a bad inode can be reassigned
to other new inode, resulting in any abnormal behaviors on the new inode.
This should not happen for the new inode.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:53 -07:00
Jaegeuk Kim
eca616f8c1 f2fs: avoid freed stat information
The write_checkpoint can update stat information, so we should destroy the stat
structure after it.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:53 -07:00
Chao Yu
5ac9f36fca f2fs: fix to record dirty page count for symlink
Dirty page can be exist in mapping of newly created symlink, but previously
we did not maintain the counting of dirty page for symlink like we maintained
for regular/directory, so the counting we lookuped should be wrong.

This patch adds missed dirty page counting for symlink to fix this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:52 -07:00
Markus Elfring
92859a5efd f2fs crypto: delete an unnecessary check before the function call "key_put"
The key_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:52 -07:00
Christoph Hellwig
4246a0b63b block: add a bi_error field to struct bio
Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-29 08:55:15 -06:00
Jaegeuk Kim
6282adbf93 f2fs: call set_page_dirty to attach i_wb for cgroup
The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback
is called, __inc_wb_stat() updates i_wb's stat.

So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to
any writebacking pages.

This patch should resolve the following kernel panic reported by Andreas Reis.

https://bugzilla.kernel.org/show_bug.cgi?id=101801

--- Comment #2 from Andreas Reis <andreas.reis@gmail.com> ---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
PGD 2951ff067 PUD 2df43f067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu #1
Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS
T01 02/03/2015
task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>]
__percpu_counter_add+0x1a/0x90
RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
Call Trace:
 [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49
89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca
65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a
RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
 RSP <ffff880295143ac8>
CR2: 00000000000000a8
---[ end trace 5132449a58ed93a3 ]---
note: gcc[10356] exited with preempt_count 2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-07-25 08:54:26 -07:00
Jaegeuk Kim
548aedac51 f2fs: handle error cases in move_encrypted_block
This patch fixes some missing error handlers.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-07-25 08:54:25 -07:00
Linus Torvalds
e4bc13adfd Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block
Pull cgroup writeback support from Jens Axboe:
 "This is the big pull request for adding cgroup writeback support.

  This code has been in development for a long time, and it has been
  simmering in for-next for a good chunk of this cycle too.  This is one
  of those problems that has been talked about for at least half a
  decade, finally there's a solution and code to go with it.

  Also see last weeks writeup on LWN:

        http://lwn.net/Articles/648292/"

* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
  writeback, blkio: add documentation for cgroup writeback support
  vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
  writeback: do foreign inode detection iff cgroup writeback is enabled
  v9fs: fix error handling in v9fs_session_init()
  bdi: fix wrong error return value in cgwb_create()
  buffer: remove unusued 'ret' variable
  writeback: disassociate inodes from dying bdi_writebacks
  writeback: implement foreign cgroup inode bdi_writeback switching
  writeback: add lockdep annotation to inode_to_wb()
  writeback: use unlocked_inode_to_wb transaction in inode_congested()
  writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
  writeback: implement [locked_]inode_to_wb_and_lock_list()
  writeback: implement foreign cgroup inode detection
  writeback: make writeback_control track the inode being written back
  writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
  mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
  writeback: implement memcg writeback domain based throttling
  writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
  writeback: implement memcg wb_domain
  writeback: update wb_over_bg_thresh() to use wb_domain aware operations
  ...
2015-06-25 16:00:17 -07:00
Linus Torvalds
cfcc0ad47f Merge tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New features:
   - per-file encryption (e.g., ext4)
   - FALLOC_FL_ZERO_RANGE
   - FALLOC_FL_COLLAPSE_RANGE
   - RENAME_WHITEOUT

  Major enhancement/fixes:
   - recovery broken superblocks
   - enhance f2fs_trim_fs with a discard_map
   - fix a race condition on dentry block allocation
   - fix a deadlock during summary operation
   - fix a missing fiemap result

  .. and many minor bug fixes and clean-ups were done"

* tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (83 commits)
  f2fs: do not trim preallocated blocks when truncating after i_size
  f2fs crypto: add alloc_bounce_page
  f2fs crypto: fix to handle errors likewise ext4
  f2fs: drop the volatile_write flag only
  f2fs: skip committing valid superblock
  f2fs: setting discard option in parse_options()
  f2fs: fix to return exact trimmed size
  f2fs: support FALLOC_FL_INSERT_RANGE
  f2fs: hide common code in f2fs_replace_block
  f2fs: disable the discard option when device doesn't support
  f2fs crypto: remove alloc_page for bounce_page
  f2fs: fix a deadlock for summary page lock vs. sentry_lock
  f2fs crypto: clean up error handling in f2fs_fname_setup_filename
  f2fs crypto: avoid f2fs_inherit_context for symlink
  f2fs crypto: do not set encryption policy for non-directory by ioctl
  f2fs crypto: allow setting encryption policy once
  f2fs crypto: check context consistent for rename2
  f2fs: avoid duplicated code by reusing f2fs_read_end_io
  f2fs crypto: use per-inode tfm structure
  f2fs: recovering broken superblock during mount
  ...
2015-06-24 20:38:29 -07:00
Chao Yu
3c45414527 f2fs: do not trim preallocated blocks when truncating after i_size
When we perform generic/092 in xfstests, output is like below:

     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     0: [0..10239]: data
     0: [0..10239]: data
    -1: [10240..20479]: unwritten
    +1: [10240..14335]: unwritten

This is because with this testcase, we redefine the regulation for
truncate in perallocated space past i_size as below:

"There was some confused about what the fs was supposed to do when you
truncate at i_size with preallocated space past i_size. We decided on the
following things.

1) truncate(i_size) will trim all blocks past i_size.
2) truncate(x) where x > i_size will not trim all blocks past i_size.
"

This method is used in xfs, and then ext4/btrfs will follow the rule.

This patch fixes to follow the new rule for f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-11 18:30:49 -07:00
Jaegeuk Kim
43f54cd52f f2fs crypto: add alloc_bounce_page
This patch adds alloc_bounce_page likewise ext4.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-11 15:04:20 -07:00
Jaegeuk Kim
7e8e754a4b f2fs crypto: fix to handle errors likewise ext4
This patch makes some error handling policies same with ext4.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-11 15:00:36 -07:00
Jaegeuk Kim
de6a8ec982 f2fs: drop the volatile_write flag only
When aborting volatile_writes, let's drop its flag and give up any further
volatile_writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-09 13:56:47 -07:00
Chao Yu
c5bda1c8b1 f2fs: skip committing valid superblock
In recovery procedure for superblock, we try to write data of valid
superblock into invalid one for recovery, work should be finished here,
but then still we will write the valid one with its original data.
This operation is not needed. Let's skip doing this unnecessary work.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-08 11:20:51 -07:00
Chao Yu
09d54cdd20 f2fs: setting discard option in parse_options()
For the first mount of f2fs image with realtime discard option, we will
disable discard option if device is not supported, but for remount
operation, our discard option can still be set, this should be avoided.

This patch moves configuring of discard option to parse_options() to fix
this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-08 11:09:19 -07:00
Jaegeuk Kim
f56aa1c57e f2fs: fix to return exact trimmed size
Now, we add all the candidates for trim commands and then finally issue
discard commands.
So, we should count the trimmed size in back-end.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-02 15:48:20 -07:00
Chao Yu
f62185d0e2 f2fs: support FALLOC_FL_INSERT_RANGE
FALLOC_FL_INSERT_RANGE flag for ->fallocate was introduced in commit
dd46c78778 ("fs: Add support FALLOC_FL_INSERT_RANGE for fallocate").

The effect of FALLOC_FL_INSERT_RANGE command is the opposite of
FALLOC_FL_COLLAPSE_RANGE, if this command was performed, all data from
offset to EOF in our file will be shifted to right as given length, and
then range [offset, offset + length] becomes a hole.

This command is useful for our user who wants to add some data in the
middle of the file, for example: video/music editor will insert a keyframe
in specified position of media file, with this command we can easily create
a hole for inserting without removing original data.

This patch introduces f2fs_insert_range() to support FALLOC_FL_INSERT_RANGE.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-02 09:53:27 -07:00
Chao Yu
528e34593d f2fs: hide common code in f2fs_replace_block
This patch clean up codes through:
1.rename f2fs_replace_block to __f2fs_replace_block().
2.introduce new f2fs_replace_block() to include __f2fs_replace_block()
and some common related codes around __f2fs_replace_block().

Then, newly introduced function f2fs_replace_block can be used by
following patch.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-02 09:52:07 -07:00
Tejun Heo
66114cad64 writeback: separate out include/linux/backing-dev-defs.h
With the planned cgroup writeback support, backing-dev related
declarations will be more widely used across block and cgroup;
unfortunately, including backing-dev.h from include/linux/blkdev.h
makes cyclic include dependency quite likely.

This patch separates out backing-dev-defs.h which only has the
essential definitions and updates blkdev.h to include it.  c files
which need access to more backing-dev details now include
backing-dev.h directly.  This takes backing-dev.h off the common
include dependency chain making it a lot easier to use it across block
and cgroup.

v2: fs/fat build failure fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Tejun Heo
a88a341a73 writeback: move bandwidth related fields from backing_dev_info into bdi_writeback
Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
and the role of the separation is unclear.  For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi.  To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.

This patch moves bandwidth related fields from backing_dev_info into
bdi_writeback.

* The moved fields are: bw_time_stamp, dirtied_stamp, written_stamp,
  write_bandwidth, avg_write_bandwidth, dirty_ratelimit,
  balanced_dirty_ratelimit, completions and dirty_exceeded.

* writeback_chunk_size() and over_bground_thresh() now take @wb
  instead of @bdi.

* bdi_writeout_fraction(bdi, ...)	-> wb_writeout_fraction(wb, ...)
  bdi_dirty_limit(bdi, ...)		-> wb_dirty_limit(wb, ...)
  bdi_position_ration(bdi, ...)		-> wb_position_ratio(wb, ...)
  bdi_update_writebandwidth(bdi, ...)	-> wb_update_write_bandwidth(wb, ...)
  [__]bdi_update_bandwidth(bdi, ...)	-> [__]wb_update_bandwidth(wb, ...)
  bdi_{max|min}_pause(bdi, ...)		-> wb_{max|min}_pause(wb, ...)
  bdi_dirty_limits(bdi, ...)		-> wb_dirty_limits(wb, ...)

* Init/exits of the relocated fields are moved to bdi_wb_init/exit()
  respectively.  Note that explicit zeroing is dropped in the process
  as wb's are cleared in entirety anyway.

* As there's still only one bdi_writeback per backing_dev_info, all
  uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
  introducing no behavior changes.

v2: Typo in description fixed as suggested by Jan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:34 -06:00
Chenxi Mao
96c6dd59bf f2fs: disable the discard option when device doesn't support
Current f2fs check the whether the blk device can support discard.
However, the code will cause the discard option cannot be enabled.
Because the clear_opt(sbi, DISCARD) will be invoked forever.

This patch can fix this issue.

Jaegeuk Kim:
 The original patch was intended to disable the discard option when device
 does not support trim command.
 Rather than remaining the buggy patch, let's replace with this patch as
 an integrated one.

Signed-off-by: Chenxi Mao <chenxi.mao2013@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:32 -07:00
Jaegeuk Kim
4683ff837c f2fs crypto: remove alloc_page for bounce_page
We don't need to call alloc_page() prior to mempool_alloc(), since the
mempool_alloc() calls alloc_page() internally.
And, if __GFP_WAIT is set, it never fails on page allocation, so let's
give GFP_NOWAIT and handle ENOMEM by writepage().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:10 -07:00
Jaegeuk Kim
9236cac566 f2fs: fix a deadlock for summary page lock vs. sentry_lock
In f2fs_gc:                      In f2fs_replace_block:
 - lock_page(sum_page)
  - check_valid_map()            - mutex_lock(sentry_lock)
   - mutex_lock(sentry_lock)     - change_curseg()
                                  - lock_page(sum_page)

This patch fixes the deadlock condition.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:09 -07:00
Jaegeuk Kim
e5e0906b6b f2fs crypto: clean up error handling in f2fs_fname_setup_filename
Sync with:
  ext4 crypto: clean up error handling in ext4_fname_setup_filename

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:08 -07:00
Jaegeuk Kim
e992e238ff f2fs crypto: avoid f2fs_inherit_context for symlink
This patch fixes to call f2fs_inherit_context twice for newly created symlink.
The original one is called by f2fs_add_link(), which invokes f2fs_setxattr.
If the second one is called again, f2fs_setxattr is triggered again with same
encryption index.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:07 -07:00
Chao Yu
4637fd11ff f2fs crypto: do not set encryption policy for non-directory by ioctl
Encryption policy should only be set to an empty directory through ioctl,
This patch add a judgement condition to verify type of the target inode
to avoid incorrectly configuring for non-directory.

Additionally, remove unneeded inline data conversion since regular or symlink
file should not be processed here.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:07 -07:00
Chao Yu
81b0a8ffaa f2fs crypto: allow setting encryption policy once
This patch add XATTR_CREATE flag in setxattr when setting encryption
context for inode. Without this flag the context could be set more than
once, this should never happen. So, fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:06 -07:00
Chao Yu
d3baf7c472 f2fs crypto: check context consistent for rename2
For exchange rename, we should check context consistent of encryption
between new_dir and old_inode or old_dir and new_inode. Otherwise
inheritance of parent's encryption context will be broken.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: sync with ext4 approach]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:05 -07:00
Chao Yu
1237702471 f2fs: avoid duplicated code by reusing f2fs_read_end_io
This patch tries to clean up code because part code of f2fs_read_end_io
and mpage_end_io are the same, so it's better to merge and reuse them.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:04 -07:00
Jaegeuk Kim
26bf3dc7e2 f2fs crypto: use per-inode tfm structure
This patch applies the following ext4 patch:

  ext4 crypto: use per-inode tfm structure

As suggested by Herbert Xu, we shouldn't allocate a new tfm each time
we read or write a page.  Instead we can use a single tfm hanging off
the inode's crypt_info structure for all of our encryption needs for
that inode, since the tfm can be used by multiple crypto requests in
parallel.

Also use cmpxchg() to avoid races that could result in crypt_info
structure getting doubly allocated or doubly freed.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:04 -07:00
hujianyang
da554e48ca f2fs: recovering broken superblock during mount
This patch recovers a broken superblock with the other valid one.

Signed-off-by: hujianyang <hujianyang@huawei.com>
[Jaegeuk Kim: reinitialize local variables in f2fs_fill_super for retrial]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:03 -07:00
Jaegeuk Kim
304eecc346 f2fs crypto: check encryption for tmpfile
This patch adds to check encryption for tmpfile in early stage.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:02 -07:00
Chao Yu
7e01e7ad74 f2fs: support RENAME_WHITEOUT
As the description of rename in manual, RENAME_WHITEOUT is a special operation
that only makes sense for overlay/union type filesystem.

When performing rename with RENAME_WHITEOUT, dst will be replace with src, and
meanwhile, a 'whiteout' will be create with name of src.

A "whiteout" is designed to be a char device with 0,0 device number, it has
specially meaning for stackable filesystem. In these filesystems, there are
multiple layers exist, and only top of these can be modified. So a whiteout
in top layer is used to hide a corresponding file in lower layer, as well
removal of whiteout will make the file appear.

Now in overlayfs, when we rename a file which is exist in lower layer, it
will be copied up to upper if it is not on upper layer yet, and then rename
it on upper layer, source file will be whiteouted to hide corresponding file
in lower layer at the same time.

So in upper layer filesystem, implementation of RENAME_WHITEOUT provide a
atomic operation for stackable filesystem to support rename operation.

There are multiple ways to implement RENAME_WHITEOUT in log of this commit:
7dcf5c3e45 ("xfs: add RENAME_WHITEOUT support") which pointed out by
Dave Chinner.

For now, we just try to follow the way that xfs/ext4 use.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:01 -07:00
Chao Yu
381722d2ac f2fs: introduce update_meta_page
Add a help function update_meta_page() to update meta page with specified
buffer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:00 -07:00
Chao Yu
cb5c94cf3a f2fs crypto: zero next free dnode block
Now page cache of meta inode is used by garbage collection for encrypted page,
it may contain random data, so we should zero it before issuing discard.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:00 -07:00
Jaegeuk Kim
cfc4d971df f2fs crypto: split f2fs_crypto_init/exit with two parts
This patch splits f2fs_crypto_init/exit with two parts: base initialization and
memory allocation.

Firstly, f2fs module declares the base encryption memory pointers.
Then, allocating internal memories is done at the first encrypted inode access.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:59 -07:00
Chao Yu
b9da898b05 f2fs crypto: fix incorrect release for crypto ctx
When encryption feature is enable, if we rmmod f2fs module,
we will encounter a stack backtrace reported in syslog:

"BUG: Bad page state in process rmmod  pfn:aaf8a
page:f0f4f148 count:0 mapcount:129 mapping:ee2f4104 index:0x80
flags: 0xee2830a4(referenced|lru|slab|private_2|writeback|swapbacked|mlocked)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x2030a0(lru|slab|private_2|writeback|mlocked)
Modules linked in: f2fs(O-) fuse bnep rfcomm bluetooth dm_crypt binfmt_misc snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm
snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device joydev ppdev mac_hid lp hid_generic i2c_piix4
parport_pc psmouse snd serio_raw parport soundcore ext4 jbd2 mbcache usbhid hid e1000 [last unloaded: f2fs]
CPU: 1 PID: 3049 Comm: rmmod Tainted: G    B      O    4.1.0-rc3+ #10
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000000 00000000 c0021eb4 c15b7518 f0f4f148 c0021ed8 c112e0b7 c1779174
c9b75674 000aaf8a 01b13ce1 c17791a4 f0f4f148 ee2830a4 c0021ef8 c112e3c3
00000000 f0f4f148 c0021f34 f0f4f148 ee2830a4 ef9f0000 c0021f20 c112fdf8
Call Trace:
[<c15b7518>] dump_stack+0x41/0x52
[<c112e0b7>] bad_page.part.72+0xa7/0x100
[<c112e3c3>] free_pages_prepare+0x213/0x220
[<c112fdf8>] free_hot_cold_page+0x28/0x120
[<c1073380>] ? try_to_wake_up+0x2b0/0x2b0
[<c112ff15>] __free_pages+0x25/0x30
[<c112c4fd>] mempool_free_pages+0xd/0x10
[<c112c5f1>] mempool_free+0x31/0x90
[<f0f441cf>] f2fs_exit_crypto+0x6f/0xf0 [f2fs]
[<f0f456c4>] exit_f2fs_fs+0x23/0x95f [f2fs]
[<c10c30e0>] SyS_delete_module+0x130/0x180
[<c11556d6>] ? vm_munmap+0x46/0x60
[<c15bd888>] sysenter_do_call+0x12/0x12"

The reason is that:

since commit 0827e645fd35
("f2fs crypto: shrink size of the f2fs_crypto_ctx structure") is merged,
some fields in f2fs_crypto_ctx structure are merged into a union as they
will never be used simultaneously in write path, read path or on free list.

In f2fs_exit_crypto, we traverse each crypto ctx from free list, in this
moment, our free_list field in union is valid, but still we will try to
release memory space which is pointed by other invalid field in union
structure for each ctx.

Then the error occurs, let's fix it with this patch.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:58 -07:00
Chao Yu
7bf4b5576a f2fs crypto: fix to release buffer for fname crypto
This patch fixes memory leak issue in error path of f2fs_fname_setup_filename().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:57 -07:00
Jaegeuk Kim
ca40b03052 f2fs crypto: shrink size of the f2fs_crypto_ctx structure
This patch integrates the below patch into f2fs.

"ext4 crypto: shrink size of the ext4_crypto_ctx structure

Some fields are only used when the crypto_ctx is being used on the
read path, some are only used on the write path, and some are only
used when the structure is on free list.  Optimize memory use by using
a union."

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:57 -07:00
Jaegeuk Kim
640778fbc9 f2fs crypto: get rid of ci_mode from struct f2fs_crypt_info
This patch integrates the below patch into f2fs.

"ext4 crypto: get rid of ci_mode from struct ext4_crypt_info

The ci_mode field was superfluous, and getting rid of it gets rid of
an unused hole in the structure."

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:56 -07:00
Jaegeuk Kim
8bacf6deb0 f2fs crypto: use slab caches
This patch integrates the below patch into f2fs.

"ext4 crypto: use slab caches

Use slab caches the ext4_crypto_ctx and ext4_crypt_info structures for
slighly better memory efficiency and debuggability."

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:55 -07:00
Jaegeuk Kim
06e1bc05ca f2fs: truncate data blocks for orphan inode
As Hu reported, F2FS has a space leak problem, when conducting:

1) format a 4GB f2fs partition
2) dd a 3G file,
3) unlink it.

So, when doing f2fs_drop_inode(), we need to truncate data blocks
before skipping it.
We can also drop unused caches assigned to each inode.

Reported-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:54 -07:00
Dan Carpenter
912a83b509 f2fs: cleanup a confusing indent
The return was not indented far enough so it looked like it was supposed
to go with the other if statement.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:53 -07:00
Arnd Bergmann
7beb428eda f2fs: fix building on 32-bit architectures
A bug fix to the debug output extended the type of some local
variables to 64-bit, which now causes the kernel to fail building
because of missing 64-bit division functions:

ERROR: "__aeabi_uldivmod" [fs/f2fs/f2fs.ko] undefined!

In the kernel, we have to use div_u64 or do_div to do this,
in order to annotate that this is an expensive operation.

As the function is only called for debug out, we know this
is not performance critical, so it is safe to use div_u64.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: d1f85bd38db19 ("f2fs: avoid value overflow in showing current status")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:53 -07:00
Jaegeuk Kim
e19ef527aa f2fs: avoid buggy functions
This patch avoids to use a buggy function for now.
It needs to fix them later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:52 -07:00
hujianyang
08b95126c7 f2fs: add compat_ioctl to provide backward compatability
introduce compat_ioctl to regular files, but doesn't add this
functionality to f2fs_dir_operations.

While running a 32-bit busybox, I met an error like this:
(A is a directory)

chattr: reading flags on A: Inappropriate ioctl for device

This patch copies compat_ioctl from f2fs_file_operations and
fix this problem.

Signed-off-by: hujianyang <hujianyang@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:51 -07:00
Jaegeuk Kim
40a02be178 f2fs: do not issue next dnode discard redundantly
We have a discard map, so that we can avoid redundant discard issues.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:20:50 -07:00
Yunlei He
498c5e9fcd f2fs: add default mount options to remount
I use f2fs filesystem with /data partition on my Android phone
by the default mount options. When I remount /data in order to
adding discard option to run some benchmarks, I find the default
options such as background_gc, user_xattr and acl turned off.

So I introduce a function named default_options in super.c. It do
some default setting, and both mount and remount operations will
call this function to complete default setting.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:58 -07:00
Jaegeuk Kim
d690358b2b f2fs crypto: remove checking key context during lookup
No matter what the key is valid or not, readdir shows the dir entries correctly.
So, lookup should not failed.
But, we expect further accesses should be denied from open, rename, link, and so
on.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:57 -07:00
Jaegeuk Kim
edf3fb8e9e f2fs crypto: fix missing key when reading a page
1. mount $mnt
2. cp data $mnt/
3. umount $mnt
4. log out
5. log in
6. cat $mnt/data

-> panic, due to no i_crypt_info.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:56 -07:00
Jaegeuk Kim
cbaf042a3c f2fs crypto: add symlink encryption
This patch implements encryption support for symlink.

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:55 -07:00
Jaegeuk Kim
e7d5545285 f2fs crypto: add filename encryption for roll-forward recovery
This patch adds a bit flag to indicate whether or not i_name in the inode
is encrypted.

If this name is encrypted, we can't do recover_dentry during roll-forward.
So, f2fs_sync_file() needs to do checkpoint, if this will be needed in future.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:55 -07:00
Jaegeuk Kim
6e22c691ba f2fs crypto: add filename encryption for f2fs_lookup
This patch implements filename encryption support for f2fs_lookup.

Note that, f2fs_find_entry should be outside of f2fs_(un)lock_op().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:54 -07:00
Jaegeuk Kim
d8c6822a05 f2fs crypto: add filename encryption for f2fs_readdir
This patch implements filename encryption support for f2fs_readdir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:53 -07:00
Jaegeuk Kim
9ea97163c6 f2fs crypto: add filename encryption for f2fs_add_link
This patch adds filename encryption support for f2fs_add_link.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:52 -07:00
Jaegeuk Kim
4375a33664 f2fs crypto: add encryption support in read/write paths
This patch adds encryption support in read and write paths.

Note that, in f2fs, we need to consider cleaning operation.
In cleaning procedure, we must avoid encrypting and decrypting written blocks.
So, this patch implements move_encrypted_block().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:52 -07:00
Jaegeuk Kim
fcc85a4d86 f2fs crypto: activate encryption support for fs APIs
This patch activates the following APIs for encryption support.

The rules quoted by ext4 are:
 - An unencrypted directory may contain encrypted or unencrypted files
   or directories.
 - All files or directories in a directory must be protected using the
   same key as their containing directory.
 - Encrypted inode for regular file should not have inline_data.
 - Encrypted symlink and directory may have inline_data and inline_dentry.

This patch activates the following APIs.
1. f2fs_link              : validate context
2. f2fs_lookup            :      ''
3. f2fs_rename            :      ''
4. f2fs_create/f2fs_mkdir : inherit its dir's context
5. f2fs_direct_IO         : do buffered io for regular files
6. f2fs_open              : check encryption info
7. f2fs_file_mmap         :      ''
8. f2fs_setattr           :      ''
9. f2fs_file_write_iter   :      ''           (Called by sys_io_submit)
10. f2fs_fallocate        : do not support fcollapse
11. f2fs_evict_inode      : free_encryption_info

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:51 -07:00
Jaegeuk Kim
6b3bd08f93 f2fs crypto: filename encryption facilities
This patch adds filename encryption infra.
Most of codes are copied from ext4 part, but changed to adjust f2fs
directory structure.

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:50 -07:00
Jaegeuk Kim
0adda907f2 f2fs crypto: add encryption key management facilities
This patch copies from encrypt_key.c in ext4, and modifies for f2fs.

Use GFP_NOFS, since _f2fs_get_encryption_info is called under f2fs_lock_op.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:49 -07:00
Jaegeuk Kim
57e5055b0a f2fs crypto: add f2fs encryption facilities
Most of parts were copied from ext4, except:

 - add f2fs_restore_and_release_control_page which returns control page and
   restore control page
 - remove ext4_encrypted_zeroout()
 - remove sbi->s_file_encryption_mode & sbi->s_dir_encryption_mode
 - add f2fs_end_io_crypto_work for mpage_end_io

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:49 -07:00
Jaegeuk Kim
f424f664f0 f2fs crypto: add encryption policy and password salt support
This patch adds encryption policy and password salt support through ioctl
implementation.

It adds three ioctls:
 F2FS_IOC_SET_ENCRYPTION_POLICY,
 F2FS_IOC_GET_ENCRYPTION_POLICY,
 F2FS_IOC_GET_ENCRYPTION_PWSALT, which use xattr operations.

Note that, these definition and codes are taken from ext4 crypto support.
For f2fs, xattr operations and on-disk flags for superblock and inode were
changed.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:48 -07:00
Jaegeuk Kim
b93531dd7b f2fs crypto: add encryption xattr support
This patch add some definition for enrcyption xattr.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:47 -07:00
Jaegeuk Kim
d33793fb89 f2fs crypto: add f2fs encryption Kconfig
This patch adds f2fs encryption config.

This patch integrates:

"ext4 crypto: require CONFIG_CRYPTO_CTR if ext4 encryption is enabled

On arm64 this is apparently needed for CTS mode to function correctly.
Otherwise attempts to use CTS return ENOENT."

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:46 -07:00
Jaegeuk Kim
cde4de1205 f2fs crypto: declare some definitions for f2fs encryption feature
This definitions will be used by inode and superblock for encyption.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:45 -07:00
Jaegeuk Kim
7f63eb77af f2fs: report unwritten area in f2fs_fiemap
This patch slightly changes f2fs_fiemap function to report unwritten area.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:45 -07:00
Jaegeuk Kim
3589a9190b f2fs: avoid value overflow in showing current status
This patch fixes overflow when do cat /sys/kernel/debug/f2fs/status.
If a section is relatively large, dist value can be overflowed.

Reported-by: Yossi Goldfill <ygoldfill@radianmemory.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:44 -07:00
Chao Yu
75cd4e098d f2fs: support FALLOC_FL_ZERO_RANGE
Now, FALLOC_FL_ZERO_RANGE flag in ->fallocate is supported in ext4/xfs.

In commit, the semantics of this flag is descripted as following:"
1) Make sure that both offset and len are block size aligned.
2) Update the i_size of inode by len bytes.
3) Compute the file's logical block number against offset. If the computed
   block number is not the starting block of the extent, split the extent
   such that the block number is the starting block of the extent.
4) Shift all the extents which are lying between
   [offset, last allocated extent] towards right by len bytes. This step
   will make a hole of len bytes at offset."

This patch implements fallocate's FALLOC_FL_ZERO_RANGE for f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:43 -07:00
Chao Yu
b4ace33703 f2fs: support FALLOC_FL_COLLAPSE_RANGE
Now, FALLOC_FL_COLLAPSE_RANGE flag in ->fallocate is supported in ext4/xfs.

In commit, the semantics of this flag is descripted as following:"
1) It collapses the range lying between offset and length by removing any
   data blocks which are present in this range and than updates all the
   logical offsets of extents beyond "offset + len" to nullify the hole
   created by removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned
   in case of xfs and ext4.
4) Collaspe range does not work beyond i_size."

This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:42 -07:00
Chao Yu
19f106bc03 f2fs: introduce f2fs_replace_block() for reuse
Introduce a generic function replace_block base on recover_data_page,
and export it. So with it we can operate file's meta data which is in
CP/SSA area when we invoke fallocate with FALLOC_FL_COLLAPSE_RANGE
flag.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:42 -07:00
Chao Yu
d5b692b786 f2fs: do not re-lookup nat cache with same nid
In set_node_addr, we try to lookup cached nat entry of inode and then
set flag in it.

But previously in this function, we have already grabbed nat entry with
current node id, if the node id is the same as the one of inode, we
do not need to lookup it in cache again.

So this patch adds condition judgment for reducing unneeded lookup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:41 -07:00
Chao Yu
402f721d2f f2fs: remove unneeded f2fs_make_empty declaration
Remove f2fs_make_empty() declaration, since the main body of this function
is move into do_make_empty_dir() and the function is obsolete now.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:40 -07:00
Jaegeuk Kim
836b5a6356 f2fs: issue discard with finally produced len and minlen
This patch determines to issue discard commands by comparing given minlen and
the length of produced final candidates.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:39 -07:00
Jaegeuk Kim
a66cdd9855 f2fs: introduce discard_map for f2fs_trim_fs
This patch adds a bitmap for discard issues from f2fs_trim_fs.
There-in rule is to issue discard commands only for invalidated blocks
after mount.
Once mount is done, f2fs_trim_fs trims out whole invalid area.
After ehn, it will not issue and discrads redundantly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:39 -07:00
Jaegeuk Kim
d6c67a4fee f2fs: revmove spin_lock for write_orphan_inodes
This patch removes spin_lock, since this is covered by f2fs_lock_op already.
And, we should avoid to use page operations inside spin_lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:38 -07:00
Jaegeuk Kim
43f3eae1d3 f2fs: split find_data_page according to specific purposes
This patch splits find_data_page as follows.

1. f2fs_gc
 - use get_read_data_page() with read only

2. find_in_level
 - use find_data_page without locked page

3. truncate_partial_page
 - In the case cache_only mode, just drop cached page.
 - Ohterwise, use get_lock_data_page() and guarantee to truncate

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:37 -07:00
Jaegeuk Kim
2fb2c95496 f2fs: fix counting the number of inline_data inodes
This patch fixes to count the missing symlink case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:36 -07:00
Jaegeuk Kim
2dcf51ab2f f2fs: add need_dentry_mark
This patch introduces need_dentry_mark() to clean up and avoid redundant
node locks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:36 -07:00
Jaegeuk Kim
01f28610a1 f2fs: fix race on allocating and deallocating a dentry block
There are two threads:
 f2fs_delete_entry()              get_new_data_page()
                                  f2fs_reserve_block()
				  dn.blkaddr = XXX
 lock_page(dentry_block)
 truncate_hole()
 dn.blkaddr = NULL
 unlock_page(dentry_block)
                                  lock_page(dentry_block)
                                  fill the block from XXX address
                                  add new dentries
                                  unlock_page(dentry_block)

Later, f2fs_write_data_page() will truncate the dentry_block, since
its block address is NULL.

The reason for this was due to the wrong lock order.
In this case, we should do f2fs_reserve_block() after locking its dentry block.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:35 -07:00
Jaegeuk Kim
eaa693f4dc f2fs: introduce dot and dotdot name check
This patch adds an inline function to check dot and dotdot names.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:34 -07:00
Jaegeuk Kim
c879f90da9 f2fs: move get_page for gc victims
This patch moves getting victim page into move_data_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:33 -07:00
Jaegeuk Kim
05ca3632e5 f2fs: add sbi and page pointer in f2fs_io_info
This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure.
With this change, we can reduce a lot of parameters for IO functions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:32 -07:00
Jaegeuk Kim
01b960e94a f2fs: add f2fs_may_inline_{data, dentry}
This patch adds f2fs_may_inline_data and f2fs_may_inline_dentry.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:32 -07:00
Jaegeuk Kim
06957e8fe6 f2fs: clean up f2fs_lookup
This patch cleans up to avoid deep indentation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:31 -07:00
Jaegeuk Kim
f1e8866016 f2fs: expose f2fs_mpage_readpages
This patch implements f2fs_mpage_readpages for further optimization on
encryption support.

The basic code was taken from fs/mpage.c, and changed to be simple by adjusting
that block_size is equal to page_size in f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:30 -07:00
Jaegeuk Kim
26d815ad75 f2fs: introduce f2fs_commit_super
This patch introduces f2fs_commit_super to write updated superblock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:29 -07:00
Jaegeuk Kim
003a3e1d60 f2fs: add f2fs_map_blocks
This patch introduces f2fs_map_blocks structure likewise ext4_map_blocks.
Now, f2fs uses f2fs_map_blocks when handling get_block.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:29 -07:00
Jaegeuk Kim
76f105a2db f2fs: add feature facility in superblock
This patch introduces a feature in superblock, which will indicate any new
features for f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:28 -07:00
Jaegeuk Kim
b5492af78c f2fs: move existing definitions into f2fs.h
This patch moves some inode-related definitions from node.h to f2fs.h to
add new features.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:27 -07:00
Al Viro
5f2c4179e1 switch ->put_link() from dentry to inode
only one instance looks at that argument at all; that sole
exception wants inode rather than dentry.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11 08:13:12 -04:00
Al Viro
6e77137b36 don't pass nameidata to ->follow_link()
its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidata

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-10 22:20:15 -04:00
Al Viro
680baacbca new ->follow_link() and ->put_link() calling conventions
a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (into void * passed by address by caller) and returns
the symlink body.  Returning ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and pointer to symlink body for normal symlinks.  Stored pointer
is ignored in all cases except the last one.

Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().

b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is.  In the cases when we used the symlink body
to free stuff, ->follow_link() now should store it as opaque pointer in addition
to returning it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-10 22:19:45 -04:00
Chao Yu
2aa7c51a45 f2fs: make has_fsynced_inode static
has_fsynced_inode() has no other caller out of node.c, make it static.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-07 11:38:32 -07:00
Taehee Yoo
587c0a4255 f2fs: add offset check routine before punch_hole() in f2fs_fallocate()
In the punch_hole(), if offset bigger than inode size, it returns SUCCESS.
Then f2fs_fallocate() will update time and dirty mark.
In that case, inode has not been modified actually.
So I have added offset check routine that prevent to call the punch_hole().

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-07 11:38:32 -07:00
Chao Yu
f0c9cadae6 f2fs: use is_valid_blkaddr to verify blkaddr for readability
Export is_valid_blkaddr() and use it to replace some codes for readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-07 11:38:32 -07:00
Chao Yu
272e083f7a f2fs: make posix_acl_create() safer and cleaner
Our f2fs_acl_create is copied from posix_acl_create in ./fs/posix_acl.c and
modified to avoid deadlock bug when inline_dentry feature is enabled.

Dan Carpenter rewrites posix_acl_create in commit 2799563b281f
("fs/posix_acl.c: make posix_acl_create() safer and cleaner") to make this
function more safer, so that we can avoid potential bug in its caller,
especially for ocfs2.

Let's back port the patch to f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-07 11:38:31 -07:00
Jaegeuk Kim
7263b1bd04 f2fs: fix wrong error hanlder in f2fs_follow_link
The page_follow_link_light returns NULL and its error pointer was remained
in nd->path.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-04 14:15:16 -07:00
Jaegeuk Kim
5463e7c18e Revert "f2fs: enhance multi-threads performance"
This reports performance regression by Yuanhan Liu.
The basic idea was to reduce one-point mutex, but it turns out this causes
another contention like context swithes.

https://lkml.org/lkml/2015/4/21/11

Until finishing the analysis on this issue, I'd like to revert this for a while.

This reverts commit 78373b7319.
2015-05-04 14:15:15 -07:00
Linus Torvalds
9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
Linus Torvalds
06a60deca8 Merge tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New features:
   - in-memory extent_cache
   - fs_shutdown to test power-off-recovery
   - use inline_data to store symlink path
   - show f2fs as a non-misc filesystem

  Major fixes:
   - avoid CPU stalls on sync_dirty_dir_inodes
   - fix some power-off-recovery procedure
   - fix handling of broken symlink correctly
   - fix missing dot and dotdot made by sudden power cuts
   - handle wrong data index during roll-forward recovery
   - preallocate data blocks for direct_io

  ... and a bunch of minor bug fixes and cleanups"

* tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits)
  f2fs: pass checkpoint reason on roll-forward recovery
  f2fs: avoid abnormal behavior on broken symlink
  f2fs: flush symlink path to avoid broken symlink after POR
  f2fs: change 0 to false for bool type
  f2fs: do not recover wrong data index
  f2fs: do not increase link count during recovery
  f2fs: assign parent's i_mode for empty dir
  f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
  f2fs: fix mismatching lock and unlock pages for roll-forward recovery
  f2fs: fix sparse warnings
  f2fs: limit b_size of mapped bh in f2fs_map_bh
  f2fs: persist system.advise into on-disk inode
  f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get
  f2fs: preallocate fallocated blocks for direct IO
  f2fs: enable inline data by default
  f2fs: preserve extent info for extent cache
  f2fs: initialize extent tree with on-disk extent info of inode
  f2fs: introduce __{find,grab}_extent_tree
  f2fs: split set_data_blkaddr from f2fs_update_extent_cache
  f2fs: enable fast symlink by utilizing inline data
  ...
2015-04-18 11:17:20 -04:00
Jaegeuk Kim
10027551cc f2fs: pass checkpoint reason on roll-forward recovery
This patch adds CP_RECOVERY to remain recovery information for checkpoint.
And, it makes sure writing checkpoint in this case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-16 09:45:40 -07:00
Jaegeuk Kim
feb7cbb079 f2fs: avoid abnormal behavior on broken symlink
When f2fs_symlink was triggered and checkpoint was done before syncing its
link path, f2fs can get broken symlink like "xxx -> \0\0\0".
This incurs abnormal path_walk by VFS.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-16 09:45:40 -07:00
Jaegeuk Kim
d0cae97cb6 f2fs: flush symlink path to avoid broken symlink after POR
This patch tries to avoid broken symlink case after POR in best effort.
This results in performance regression.
But, if f2fs has inline_data and the target path is under 3KB-sized long,
the page would be stored in its inode_block, so that there would be no
performance regression.

Note that, if user wants to keep this file atomically, it needs to trigger
dir->fsync.
And, there is still a hole to produce broken symlink.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-16 09:45:35 -07:00
Taehee Yoo
9df47ba759 f2fs: change 0 to false for bool type
in the f2fs_fill_super function, variable "retry" is bool type
i think that it should be set as false.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-15 16:13:28 -07:00
David Howells
2b0143b5c9 VFS: normal filesystems (and lustre): d_inode() annotations
that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:57 -04:00
Omar Sandoval
22c6186ece direct_IO: remove rw from a_ops->direct_IO()
Now that no one is using rw, remove it completely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:45 -04:00
Omar Sandoval
6f67376318 direct_IO: use iov_iter_rw() instead of rw everywhere
The rw parameter to direct_IO is redundant with iov_iter->type, and
treated slightly differently just about everywhere it's used: some users
do rw & WRITE, and others do rw == WRITE where they should be doing a
bitwise check. Simplify this with the new iov_iter_rw() helper, which
always returns either READ or WRITE.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:45 -04:00
Omar Sandoval
17f8c842d2 Remove rw from {,__,do_}blockdev_direct_IO()
Most filesystems call through to these at some point, so we'll start
here.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:44 -04:00
Al Viro
5d5d568975 make new_sync_{read,write}() static
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:40 -04:00
Jaegeuk Kim
e03b07d908 f2fs: do not recover wrong data index
During the roll-forward recovery, if we found a new data index written fsync
lastly, we need to recover new block address.
But, if that address was corrupted, we should not recover that.
Otherwise, f2fs gets kernel panic from:

 In check_index_in_prev_nodes(),

    sentry = get_seg_entry(sbi, segno);
             --------------------------> out-of-range segno.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:59 -07:00
Jaegeuk Kim
418f6c2770 f2fs: do not increase link count during recovery
If there are multiple fsynced dnodes having a dent flag, roll-forward routine
sets FI_INC_LINK for their inode, and recovery_dentry increases its link count
accordingly.
That results in normal file having a link count as 2, so we can't unlink those
files.

This was added to handle several inode blocks having same inode number with
different directory paths.
But, current f2fs doesn't replay all of path changes and only recover its dentry
for the last fsynced inode block.
So, there is no reason to do this.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:58 -07:00
Jaegeuk Kim
cb58463bc9 f2fs: assign parent's i_mode for empty dir
When assigning i_mode for dotdot, it needs to assign parent's i_mode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:58 -07:00
Jaegeuk Kim
510022a858 f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
If f2fs was corrupted with missing dot dentries, it needs to recover them after
fsck.f2fs detection.

The underlying precedure is:

1. The fsck.f2fs remains F2FS_INLINE_DOTS flag in directory inode, if it detects
missing dot dentries.

2. When f2fs looks up the corrupted directory, it triggers f2fs_add_link with
proper inode numbers and their dot and dotdot names.

3. Once f2fs recovers the directory without errors, it removes F2FS_INLINE_DOTS
finally.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:57 -07:00
Jaegeuk Kim
c9ef481097 f2fs: fix mismatching lock and unlock pages for roll-forward recovery
Previously, inode page is not correctly locked and unlocked in pair during
the roll-forward recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:56 -07:00
Jaegeuk Kim
adad81ed42 f2fs: fix sparse warnings
This patch fixes the below warning.

sparse warnings: (new ones prefixed by >>)

>> fs/f2fs/inode.c:56:23: sparse: restricted __le32 degrades to integer
>> fs/f2fs/inode.c:56:52: sparse: restricted __le32 degrades to integer

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:55 -07:00
Chao Yu
1b3e27a92a f2fs: limit b_size of mapped bh in f2fs_map_bh
Map bh over max size which caller defined is not needed, limit it in
f2fs_map_bh.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:55 -07:00
Chao Yu
30c62fdb25 f2fs: persist system.advise into on-disk inode
This patch fixes to dirty inode for persisting i_advise of f2fs inode info into
on-disk inode if user sets system.advise through setxattr. Otherwise the new
value will be lost.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:54 -07:00
Chao Yu
84e97c2767 f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get
We will encounter oops by executing below command.
getfattr -n system.advise /mnt/f2fs/file
Killed

message log:
BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<f8b54d69>] f2fs_xattr_advise_get+0x29/0x40 [f2fs]
*pdpt = 00000000319b7001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
Modules linked in: f2fs(O) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev
snd_seq_device snd_timer bnep snd rfcomm microcode bluetooth soundcore i2c_piix4 mac_hid serio_raw parport_pc ppdev lp parport
binfmt_misc hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
CPU: 3 PID: 3134 Comm: getfattr Tainted: G           O    4.0.0-rc1 #6
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f3a71b60 ti: f19a6000 task.ti: f19a6000
EIP: 0060:[<f8b54d69>] EFLAGS: 00010246 CPU: 3
EIP is at f2fs_xattr_advise_get+0x29/0x40 [f2fs]
EAX: 00000000 EBX: f19a7e71 ECX: 00000000 EDX: f8b5b467
ESI: 00000000 EDI: f2008570 EBP: f19a7e14 ESP: f19a7e08
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 00000000 CR3: 319b8000 CR4: 000007f0
Stack:
 f8b5a634 c0cbb580 00000000 f19a7e34 c1193850 00000000 00000007 f19a7e71
 f19a7e64 c0cbb580 c1193810 f19a7e50 c1193c00 00000000 00000000 00000000
 c0cbb580 00000000 f19a7f70 c1194097 00000000 00000000 00000000 74737973
Call Trace:
 [<c1193850>] generic_getxattr+0x40/0x50
 [<c1193810>] ? xattr_resolve_name+0x80/0x80
 [<c1193c00>] vfs_getxattr+0x70/0xa0
 [<c1194097>] getxattr+0x87/0x190
 [<c11801d7>] ? path_lookupat+0x57/0x5f0
 [<c11819d2>] ? putname+0x32/0x50
 [<c116653a>] ? kmem_cache_alloc+0x2a/0x130
 [<c11819d2>] ? putname+0x32/0x50
 [<c11819d2>] ? putname+0x32/0x50
 [<c11819d2>] ? putname+0x32/0x50
 [<c11827f9>] ? user_path_at_empty+0x49/0x70
 [<c118283f>] ? user_path_at+0x1f/0x30
 [<c11941e7>] path_getxattr+0x47/0x80
 [<c11948e7>] SyS_getxattr+0x27/0x30
 [<c163f748>] sysenter_do_call+0x12/0x12
Code: 66 90 55 89 e5 57 56 53 66 66 66 66 90 8b 78 20 89 d3 ba 67 b4 b5 f8 89 d8 89 ce e8 42 7c 7b c8 85 c0 75 16 0f b6 87 44 01 00
00 <88> 06 b8 01 00 00 00 5b 5e 5f 5d c3 8d 76 00 b8 ea ff ff ff eb
EIP: [<f8b54d69>] f2fs_xattr_advise_get+0x29/0x40 [f2fs] SS:ESP 0068:f19a7e08
CR2: 0000000000000000
---[ end trace 860260654f1f416a ]---

The reason is that in getfattr there are two steps which is indicated by strace info:
1) try to lookup and get size of specified xattr.
2) get value of the extented attribute.

strace info:
getxattr("/mnt/f2fs/file", "system.advise", 0x0, 0) = 1
getxattr("/mnt/f2fs/file", "system.advise", "\x00", 256) = 1

For the first step, getfattr may pass a NULL pointer in @value and zero in @size
as parameters for ->getxattr, but we access this @value pointer directly without
checking whether the pointer is valid or not in f2fs_xattr_advise_get, so the
oops occurs.

This patch fixes this issue by verifying @value pointer before using.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:53 -07:00
Chao Yu
df6136ef55 f2fs: preallocate fallocated blocks for direct IO
Normally, due to DIO_SKIP_HOLES flag is set by default, blockdev_direct_IO in
f2fs_direct_IO tries to skip DIO in holes when writing inside i_size, this
makes us falling back to buffered IO which shows lower performance.

So in commit 59b802e5a4 ("f2fs: allocate data blocks in advance for
f2fs_direct_IO"), we improve perfromance by allocating data blocks in advance
if we meet holes no matter in i_size or not, since with it we can avoid falling
back to buffered IO.

But we forget to consider for unwritten fallocated block in this commit.
This patch tries to fix it for fallocate case, this helps to improve
performance.

Test result:
Storage info: sandisk ultra 64G micro sd card.

touch /mnt/f2fs/file
truncate -s 67108864 /mnt/f2fs/file
fallocate -o 0 -l 67108864 /mnt/f2fs/file
time dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=64 conv=notrunc oflag=direct

Time before applying the patch:
67108864 bytes (67 MB) copied, 36.16 s, 1.9 MB/s
real    0m36.162s
user    0m0.000s
sys     0m0.180s

Time after applying the patch:
67108864 bytes (67 MB) copied, 27.7776 s, 2.4 MB/s
real    0m27.780s
user    0m0.000s
sys     0m0.036s

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:52 -07:00
Wanpeng Li
7534279798 f2fs: enable inline data by default
Enable inline_data feature by default since it brings us better
performance and space utilization and now has already stable.
Add another option noinline_data to disable it during mount.

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Suggested-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:52 -07:00
Chao Yu
0bdee48250 f2fs: preserve extent info for extent cache
This patch tries to preserve last extent info in extent tree cache into on-disk
inode, so this can help us to reuse the last extent info next time for
performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:51 -07:00
Chao Yu
028a41e893 f2fs: initialize extent tree with on-disk extent info of inode
With normal extent info cache, we records largest extent mapping between logical
block and physical block into extent info, and we persist extent info in on-disk
inode.

When we enable extent tree cache, if extent info of on-disk inode is exist, and
the extent is not a small fragmented mapping extent. We'd better to load the
extent info into extent tree cache when inode is loaded. By this way we can have
more chance to hit extent tree cache rather than taking more time to read dnode
page for block address.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:50 -07:00
Chao Yu
93dfc52656 f2fs: introduce __{find,grab}_extent_tree
This patch introduces __{find,grab}_extent_tree for reusing by following
patches.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:49 -07:00
Chao Yu
216a620a7c f2fs: split set_data_blkaddr from f2fs_update_extent_cache
Split __set_data_blkaddr from f2fs_update_extent_cache for readability.

Additionally rename __set_data_blkaddr to set_data_blkaddr for exporting.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:49 -07:00
Wanpeng Li
368a0e40b5 f2fs: enable fast symlink by utilizing inline data
Fast symlink can utilize inline data flow to avoid using any
i_addr region, since we need to handle many cases such as
truncation, roll-forward recovery, and fsck/dump tools.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:48 -07:00
Jaegeuk Kim
8ce67cb07d f2fs: add some tracepoints to debug volatile and atomic writes
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:47 -07:00
Jaegeuk Kim
3c6c2bebef f2fs: avoid punch_hole overhead when releasing volatile data
This patch is to avoid some punch_hole overhead when releasing volatile data.
If volatile data was not written yet, we just can make the first page as zero.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:46 -07:00
Jaegeuk Kim
83e21db693 f2fs: avoid wrong f2fs_bug_on when truncating inline_data
This patch removes wrong f2fs_bug_on in truncate_inline_inode.

When there is no space, it can happen a corner case where i_isze is over
MAX_INLINE_SIZE while its inode is still inline_data.

The scenario is
 1. write small data into file #A.
 2. fill the whole partition to 100%.
 3. truncate 4096 on file #A.
 4. write data at 8192 offset.
  --> f2fs_write_begin
    -> -ENOSPC = f2fs_convert_inline_page
    -> f2fs_write_failed
      -> truncate_blocks
        -> truncate_inline_inode
	  BUG_ON, since i_size is 4096.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:46 -07:00
Jaegeuk Kim
78373b7319 f2fs: enhance multi-threads performance
Previously, f2fs_write_data_pages has a mutex, sbi->writepages, to serialize
data writes to maximize write bandwidth, while sacrificing multi-threads
performance.
Practically, however, multi-threads environment is much more important for
users. So this patch tries to remove the mutex.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:45 -07:00
Jaegeuk Kim
3402e87cfb f2fs: set buffer_new when new blocks are allocated
This patch modifies to call set_buffer_new, if new blocks are allocated.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:44 -07:00
Chao Yu
2adc3505cf f2fs: set SBI_NEED_FSCK when encountering exception in recovery
This patch tries to set SBI_NEED_FSCK flag into sbi only when we fail to recover
in fill_super, so we could skip fscking image when we fail to fill super for
other reason.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:43 -07:00
Jaegeuk Kim
21cb1d99bc f2fs: fix to cover sentry_lock for block allocation
In the following call stack, f2fs changes the bitmap for dirty segments and # of
dirty sentries without grabbing sit_i->sentry_lock.
This can result in mismatch on bitmap and # of dirty sentries, since if there
are some direct_io operations.

In allocate_data_block,
 - __allocate_new_segments
  - mutex_lock(&curseg->curseg_mutex);
  - s_ops->allocate_segment
   - new_curseg/change_curseg
    - reset_curseg
     - __set_sit_entry_type
      - __mark_sit_entry_dirty
       - set_bit(dirty_sentries_bitmap)
       - dirty_sentries++;

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:43 -07:00
Chao Yu
d6d4f1cb91 f2fs: fix to check current blkaddr in __allocate_data_blocks
In __allocate_data_blocks, we should check current blkaddr which is located at
ofs_in_node of dnode page instead of checking first blkaddr all the time.
Otherwise we can only allocate one blkaddr in each dnode page. Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:42 -07:00
Chao Yu
0bfcfcca3d f2fs: fix to truncate inline data past EOF
Previously if inode is with inline data, we will try to invalid partial inline
data in page #0 when we truncate size of inode in truncate_partial_data_page().
And then we set page #0 to dirty, after this we can synchronize inode page with
page #0 at ->writepage().

But sometimes we will fail to operate page #0 in truncate_partial_data_page()
due to below reason:
a) if offset is zero, we will skip setting page #0 to dirty.
b) if page #0 is not uptodate, we will fail to update it as it has no mapping
data.

So with following operations, we will meet recent data which should be
truncated.

1.write inline data to file
2.sync first data page to inode page
3.truncate file size to 0
4.truncate file size to max_inline_size
5.echo 1 > /proc/sys/vm/drop_caches
6.read file --> meet original inline data which is remained in inode page.

This patch renames truncate_inline_data() to truncate_inline_inode() for code
readability, then use truncate_inline_inode() to truncate inline data in inode
page in truncate_blocks() and truncate page #0 in truncate_partial_data_page()
for fixing.

v2:
 o truncate partially #0 page in truncate_partial_data_page to avoid keeping
   old data in #0 page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:41 -07:00
Chao Yu
83dfe53c18 f2fs: fix reference leaks in f2fs_acl_create
Our f2fs_acl_create is copied and modified from posix_acl_create to avoid
deadlock bug when inline_dentry feature is enabled.

Now, we got reference leaks in posix_acl_create, and this has been fixed in
commit fed0b588be ("posix_acl: fix reference leaks in posix_acl_create")
by Omar Sandoval.
https://lkml.org/lkml/2015/2/9/5

Let's fix this issue in f2fs_acl_create too.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Changman Lee <cm224.lee@ssamsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:40 -07:00
Chao Yu
bda190760b f2fs: fix to calculate max length of contiguous free slots correctly
When lookuping for creating, we will try to record the level of current dentry
hash table if current dentry has enough contiguous slots for storing name of new
file which will be created later, this can save our lookup time when add a link
into parent dir.

But currently in find_target_dentry, our current length of contiguous free slots
is not calculated correctly. This make us leaving some holes in dentry block
occasionally, it wastes our space of dentry block.

Let's refactor the lookup flow for max slots as following to fix this issue:
a) increase max_len if current slot is free;
b) update max_slots with max_len if max_len is larger than max_slots;
c) reset max_len to zero if current slot is not free.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:40 -07:00
Wanpeng Li
57ed1e95ba f2fs: fix unlocked nat set cache operation
nm_i->nat_tree_lock is used to sync both the operations of nat entry
cache tree and nat set cache tree, however, it isn't held when flush
nat entries during checkpoint which lead to potential race, this patch
fix it by holding the lock when gang lookup nat set cache and delete
item from nat set cache.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:39 -07:00
Changman Lee
e0150392dd f2fs: cleanup statement about max orphan inodes calc
Through each macro, we can read the meaning easily.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:38 -07:00
Yuan Zhong
d9f46bb1a8 f2fs: remove unnecessary condition judgment
Remove the unnecessary condition judgment, because
'max_slots' has been initialized to '0' at the beginging
of the function, as following:
if (max_slots)
       *max_slots = 0;

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:38 -07:00
Yuan Zhong
b1f73b79d2 f2fs: set the correct place of initializing *res_page
The function 'find_in_inline_dir()' contain 'res_page'
as an argument. So, we should initiaize 'res_page' before
this function.

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:37 -07:00
Wanpeng Li
7fd97019b8 f2fs: reduce searching region of segmap when set free section
In __set_free we will check whether all segment are free in one section
when free one segment, in order to set section to free status. But the
searching region of segmap is from start segno to last segno of main
area, it's not necessary. So let's just only check all segment bitmap
of target section.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:36 -07:00
Wanpeng Li
fdf6c8be33 f2fs: fix extent cache memory leak
extent tree/node slab cache is created during f2fs insmod,
how, it isn't destroyed during f2fs rmmod, this patch fix
it by destroy extent tree/node slab cache once rmmod f2fs.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:35 -07:00
Jaegeuk Kim
d7196c5a32 f2fs: relocate Kconfig from misc filesystems
The f2fs has been shipped on many smartphone devices during a couple of years.
So, it is worth to relocate Kconfig into main page from misc filesystems for
developers to choose it more easily.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:35 -07:00
Jaegeuk Kim
7662916591 f2fs: report -ENOENT for unreached data indices
If inode has inline_data, it should report -ENOENT when accessing out-of-bound
region.
This is used by f2fs_fiemap which treats -ENOENT with no error.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:34 -07:00
Jaegeuk Kim
cff28521bb f2fs: clear append/update flags once fsync is done
When fsync is done through checkpoint, previous f2fs missed to clear append
and update flag. This patch fixes to clear them.

This was originally catched by Changman Lee before.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:33 -07:00
Jaegeuk Kim
d5669f7b9b f2fs: avoid to trigger writepage during POR
This patch doesn't make any effect on previous behavior, since
f2fs_write_data_page bypasses writing the page during POR.

But, the difference is that this patch avoids holding writepages mutex.
This is to avoid the following false warning, since this can happen only
when mount and shutdown are triggered at the same time.

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 4.0.0-rc1+ #3 Tainted: G           O
 -------------------------------------------------------
 kworker/u8:0/2270 is trying to acquire lock:
  (&sbi->gc_mutex){+.+.+.}, at: [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs]

 but task is already holding lock:
  (&sbi->writepages){+.+...}, at: [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (&sbi->writepages){+.+...}:
        [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
        [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
        [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs]
        [<ffffffff811c38c1>] do_writepages+0x21/0x50
        [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0
        [<ffffffff8126e23a>] writeback_single_inode+0xea/0x1c0
        [<ffffffff8126e425>] write_inode_now+0x95/0xa0
        [<ffffffff81259dab>] iput+0x20b/0x3f0
        [<ffffffffa02c1c8b>] recover_data.constprop.14+0x26b/0xa80 [f2fs]
        [<ffffffffa02c2776>] recover_fsync_data+0x2b6/0x5e0 [f2fs]
        [<ffffffffa02a9744>] f2fs_fill_super+0xb24/0xb90 [f2fs]
        [<ffffffff8123d7f4>] mount_bdev+0x1a4/0x1e0
        [<ffffffffa02a3c85>] f2fs_mount+0x15/0x20 [f2fs]
        [<ffffffff8123e159>] mount_fs+0x39/0x180
        [<ffffffff8125e51b>] vfs_kern_mount+0x6b/0x160
        [<ffffffff81261554>] do_mount+0x204/0xbe0
        [<ffffffff8126223b>] SyS_mount+0x8b/0xe0
        [<ffffffff81863e6d>] system_call_fastpath+0x16/0x1b

 -> #1 (&sbi->cp_mutex){+.+...}:
        [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
        [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
        [<ffffffffa02acbf2>] write_checkpoint+0x42/0x1230 [f2fs]
        [<ffffffffa02a847d>] f2fs_sync_fs+0x9d/0x2a0 [f2fs]
        [<ffffffff81272f82>] sync_filesystem+0x82/0xb0
        [<ffffffff8123c214>] generic_shutdown_super+0x34/0x100
        [<ffffffff8123c5f7>] kill_block_super+0x27/0x70
        [<ffffffffa02a3c60>] kill_f2fs_super+0x20/0x30 [f2fs]
        [<ffffffff8123ca49>] deactivate_locked_super+0x49/0x80
        [<ffffffff8123d05e>] deactivate_super+0x4e/0x70
        [<ffffffff8125df63>] cleanup_mnt+0x43/0x90
        [<ffffffff8125e002>] __cleanup_mnt+0x12/0x20
        [<ffffffff810a82e4>] task_work_run+0xc4/0xf0
        [<ffffffff8101f0bd>] do_notify_resume+0x8d/0xa0
        [<ffffffff81864141>] int_signal+0x12/0x17

 -> #0 (&sbi->gc_mutex){+.+.+.}:
        [<ffffffff810e2866>] __lock_acquire+0x1ac6/0x1c90
        [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
        [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
        [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs]
        [<ffffffffa02b5938>] f2fs_write_data_page+0x348/0x5b0 [f2fs]
        [<ffffffffa02af9da>] __f2fs_writepage+0x1a/0x50 [f2fs]
        [<ffffffff811c1b54>] write_cache_pages+0x274/0x6f0
        [<ffffffffa02b2630>] f2fs_write_data_pages+0xe0/0x3a0 [f2fs]
        [<ffffffff811c38c1>] do_writepages+0x21/0x50
        [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0
        [<ffffffff8126d44a>] writeback_sb_inodes+0x32a/0x710
        [<ffffffff8126d8cf>] __writeback_inodes_wb+0x9f/0xd0
        [<ffffffff8126dcdb>] wb_writeback+0x3db/0x850
        [<ffffffff8126e848>] bdi_writeback_workfn+0x148/0x980
        [<ffffffff810a3782>] process_one_work+0x1e2/0x840
        [<ffffffff810a3f01>] worker_thread+0x121/0x460
        [<ffffffff810a9dc8>] kthread+0xf8/0x110
        [<ffffffff81863dbc>] ret_from_fork+0x7c/0xb0

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:32 -07:00
Changman Lee
e1235983e3 f2fs: add stat info for moved blocks by background gc
This patch is for looking into gc performance of f2fs in detail.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: fix build errors]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:32 -07:00
Chao Yu
b28c3f9493 f2fs: fix to issue small discard in real-time mode discard
Now in f2fs, we share functions and structures for batch mode and real-time mode
discard. For real-time mode discard, in shared function add_discard_addrs, we
will use uninitialized trim_minlen in struct cp_control to compare with length
of contiguous free blocks to decide whether skipping discard fragmented freespace
or not, this makes us ignore small discard sometimes. Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by : Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:31 -07:00
Sebastian Andrzej Siewior
7ecebe5e07 f2fs: add cond_resched() to sync_dirty_dir_inodes()
In a preempt-off enviroment a alot of FS activity (write/delete) I run
into a CPU stall:

| NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
| Modules linked in:
| CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G        W      3.19.0-00010-g10c11c51ffed #153
| Workqueue: writeback bdi_writeback_workfn (flush-179:0)
| task: df230000 ti: df23e000 task.ti: df23e000
| PC is at __submit_merged_bio+0x6c/0x110
| LR is at f2fs_submit_merged_bio+0x74/0x80
…
| [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
| Exception stack(0xdf23fb48 to 0xdf23fb90)
| fb40:                   deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
| fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
| fb80: c02191f0 c0218fa0 60000013 ffffffff
| [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
| [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
| [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
| [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
| [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
| [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
| [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
| [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
| [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
| [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
| [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
| [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
| [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
| [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
| [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
| [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
| [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)

As it turns out, the code loops in sync_dirty_dir_inodes() and waits for
others to make progress but since it never leaves the CPU there is no
progress made. At the time of this stall, there is also a rm process
blocked:
| rm              R running      0  1989   1774 0x00000000
| [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
| [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
| [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
| [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
| [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
| [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
| [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
| [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
| [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
| [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
| [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
| [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
| [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)

As explained by Jaegeuk Kim:
|This inode is the directory (c.f., do_rmdir) causing a infinite loop on
|sync_dirty_dir_inodes.
|The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
|inode is under eviction, it submits bios and do it again until eviction
|is finished.

This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO
is submitted so other thread can make progress.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[Jaegeuk Kim: change fs/f2fs to f2fs in subject as naming convention]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:30 -07:00
Wanpeng Li
14b4281776 f2fs: fix max orphan inodes calculation
cp_payload is introduced for sit bitmap to support large volume, and it is
just after the block of f2fs_checkpoint + nat bitmap, so the first segment
should include F2FS_CP_PACKS + NR_CURSEG_TYPE + cp_payload + orphan blocks.
However, current max orphan inodes calculation don't consider cp_payload,
this patch fix it by reducing the number of cp_payload from total blocks of
the first segment when calculate max orphan inodes.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:29 -07:00
Wanpeng Li
2b11a74b21 f2fs: don't need to collect dirty sit entries and flush journal when there's no dirty sit entries
Don't need to collect dirty sit entries and flush sit journal to sit
 entries when there's no dirty sit entries. This patch check dirty_sentries
 earlier just like flush_nat_entries.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:29 -07:00
Wanpeng Li
2bda542d59 f2fs: fix block_ops trace point
block operations is used to flush all dirty node and dentry blocks in
the page cache and suspend ordinary writing activities, however, there
are some facts such like cp error or mount read-only etc which lead to
block operations can't be invoked. Current trace point print block_ops
start premature even if block_ops doesn't have opportunity to execute.
This patch fix it by move block_ops trace point just before block_ops.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:28 -07:00
Jaegeuk Kim
b7f204cca4 f2fs: check its block allocation to avoid producing wrong dirty pages
If a page is cached but its block was deallocated, we don't need to make
the page dirty again by gc and truncate_partial_data_page.

In that case, it needs to check its block allocation all the time instead
of giving up-to-date page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:27 -07:00
Jaegeuk Kim
2bca1e2388 f2fs: clear page's up-to-date if block was deallocated
If page's on-disk block was deallocated, let's remove up-to-date flag to avoid
further access with wrong contents.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:26 -07:00
Wanpeng Li
3c64298579 f2fs: fix the number of orphan inode blocks
cp_pack_start_sum is calculated in do_checkpoint and is equal to
cpu_to_le32(1 + cp_payload_blks + orphan_blocks). The number of
orphan inode blocks is take advantage of by recover_orphan_inodes
to readahead meta pages and recovery inodes. However, current codes
forget to reduce the number of cp payload blocks when calculate
the number of orphan inode blocks. This patch fix it.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:26 -07:00
Wanpeng Li
551414861f f2fs: introduce macro __cp_payload
This patch introduce macro __cp_payload.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:08:25 -07:00
Jaegeuk Kim
1abff93d01 f2fs: support fs shutdown
This patch introduces a generic ioctl for fs shutdown, which was used by xfs.

If this shutdown is triggered, filesystem stops any further IOs according to the
following options.

1. FS_GOING_DOWN_FULLSYNC
 : this will flush all the data and dentry blocks, and do checkpoint before
   shutdown.

2. FS_GOING_DOWN_METASYNC
 : this will do checkpoint before shutdown.

3. FS_GOING_DOWN_NOSYNC
 : this will trigger shutdown as is.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-04-10 15:07:57 -07:00
Christoph Hellwig
e2e40f2c1e fs: move struct kiocb to fs.h
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-25 20:28:11 -04:00
Jaegeuk Kim
8fbc418f99 f2fs: avoid wrong error during recovery
During the roll-forward recovery, -ENOENT for f2fs_iget can be skipped.
So, this error value should not be propagated.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:48 -08:00
Jaegeuk Kim
1614091dc1 f2fs: remove obsolete code
This patch removes obsolete code in which summary variable is not needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:48 -08:00
Chao Yu
cb3bc9ee06 f2fs: use extent cache for dir
We update extent cache for all user inode of f2fs including dir inode, so this
patch gives another chance to try to get physical address of page from extent
cache for dir inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:48 -08:00
Chao Yu
91c5d9bce7 f2fs: switch to check FI_NO_EXTENT in f2fs_{lookup,update}_extent_cache
This patch switch to check FI_NO_EXTENT in f2fs_{lookup,update}_extent_cache
instead of f2fs_{lookup,update}_extent_tree or {lookup,update}_extent_info.

No functionality modification in this patch.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:48 -08:00
Chao Yu
62c8af651b f2fs: support fast lookup in extent cache
This patch adds a fast lookup path for rb-tree extent cache.

In this patch we add a recently accessed extent node pointer 'cached_en' in
extent tree. In lookup path of extent cache, we will firstly lookup the last
accessed extent node which cached_en points, if we do not hit in this node,
we will try to lookup extent node in rb-tree.

By this way we can avoid unnecessary slow lookup in rb-tree sometimes.

Note that, side-effect of this patch is that we will increase memory cost,
because we will store a pointer variable in each struct extent tree
additionally.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:47 -08:00
Chao Yu
1ec4610c52 f2fs: add trace for rb-tree extent cache ops
This patch adds trace for lookup/update/shrink/destroy ops in rb-tree extent cache.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:47 -08:00
Chao Yu
4bf6fd9fed f2fs: show extent tree, node stat info in debugfs
This patch add and show stat info of total memory footprint for extent tree,node
in debugfs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:47 -08:00
Chao Yu
1dcc336b02 f2fs: enable rb-tree extent cache
This patch enables rb-tree based extent cache in f2fs.

When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into rb-tree based extent cache as much as possible, instead
of original one extent info cache.

By this way, f2fs can support more effective cache between dnode page cache and
disk. It will supply high hit ratio in the cache with fewer memory when dnode
page cache are reclaimed in environment of low memory.

Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)

Extent Hit Ratio:
		before		patched
Hit Ratio	121 / 1071	1071 / 1071

Performance:
		before		patched
real    	0m37.051s	0m35.556s
user    	0m0.040s	0m0.026s
sys     	0m2.990s	0m2.251s

Memory Cost:
		before		patched
Tree Count:	0		1 (size: 24 bytes)
Node Count:	0		45 (size: 1440 bytes)

v3:
 o retest and given more details of test result.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:47 -08:00
Chao Yu
8967215954 f2fs: add a mount option for rb-tree extent cache
This patch adds a mount option 'extent_cache' in f2fs.

It is try to use a rb-tree based extent cache to cache more mapping information
with less memory if this option is set, otherwise we will use the original one
extent info cache.

Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:46 -08:00
Chao Yu
429511cdf8 f2fs: add core functions for rb-tree extent cache
This patch adds core functions including slab cache init function and
init/lookup/update/shrink/destroy function for rb-tree based extent cache.

Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about detail
design and implementation of extent cache.

Todo:
 * register rb-based extent cache shrink with mm shrink interface.

v2:
 o move set_extent_info and __is_{extent,back,front}_mergeable into f2fs.h.
 o introduce __{attach,detach}_extent_node for code readability.
 o add cond_resched() when fail to invoke kmem_cache_alloc/radix_tree_insert.
 o fix some coding style and typo issues.

v3:
 o fix oops due to using an unassigned pointer.
 o use list_del to remove extent node in shrink list.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: add static for some funcitons and declare in f2fs.h]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:46 -08:00
Chao Yu
13054c548a f2fs: introduce infra macro and data structure of rb-tree extent cache
Introduce infra macro and data structure for rb-tree based extent cache:

Macros:
 * EXT_TREE_VEC_SIZE: indicate vector size for gang lookup in extent tree.
 * F2FS_MIN_EXTENT_LEN: indicate minimum length of extent managed in cache.
 * EXTENT_CACHE_SHRINK_NUMBER: indicate number of extent in cache will be shrunk.

Basic data structures for extent cache:
 * struct extent_tree: extent tree entry per inode.
 * struct extent_node: extent info node linked in extent tree.

Besides, adding new extent cache related fields in f2fs_sb_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:46 -08:00
Chao Yu
7e4dde79df f2fs: introduce universal lookup/update interface for extent cache
In this patch, we do these jobs:
1. rename {check,update}_extent_cache to {lookup,update}_extent_info;
2. introduce universal lookup/update interface of extent cache:
f2fs_{lookup,update}_extent_cache including above two real functions, then
export them to function callers.

So after above cleanup, we can add new rb-tree based extent cache into exported
interfaces.

v2:
 o remove "f2fs_" for inner function {lookup,update}_extent_info suggested by
   Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:46 -08:00
Chao Yu
a2e7d1bfeb f2fs: introduce f2fs_map_bh to clean codes of check_extent_cache
This patch introduces f2fs_map_bh to clean codes of check_extent_cache.

v2:
 o cleanup f2fs_map_bh pointed out by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:45 -08:00
Chao Yu
4d0b0bd438 f2fs: simplfy a field name in struct f2fs_extent,extent_info
Rename a filed name from 'blk_addr' to 'blk' in struct {f2fs_extent,extent_info}
as annotation of this field descripts its meaning well to us.

By this way, we can avoid long statement in code of following patches.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:45 -08:00
Chao Yu
0c872e2ded f2fs: move ext_lock out of struct extent_info
Move ext_lock out of struct extent_info, then in the following patches we can
use variables with struct extent_info type as a parameter to pass pure data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:45 -08:00
Chao Yu
3c0d84d6f1 f2fs: fix incorrectly stat number of inline data inode
We should stat inline data information for temp file in f2fs_tmpfile if we
enable inline_data feature.

Otherwise, inline data stat number will be wrong after this temp file is
evicted.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:45 -08:00
Chao Yu
97dc3fd2cb f2fs: use ->writepage in sync_meta_pages
This patch uses ->writepage of meta mapping in sync_meta_pages instead of
f2fs_write_meta_page, by this way, in its caller we can ignore any changes
(e.g. changing name) of this registered function.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:44 -08:00
Chao Yu
3b4d732a56 f2fs: introduce f2fs_update_dentry to clean up duplicated codes
This patch introduces f2fs_update_dentry to remove redundant code in
f2fs_add_inline_entry and __f2fs_add_link.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:44 -08:00
Chao Yu
1753396a0a f2fs: remove unused inline_dentry_addr
inline_dentry_addr is introduced with inline dentry feature without being used,
now we do not need to keep it for any reason, so remove it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-03-03 09:58:44 -08:00
Linus Torvalds
c7d7b98671 Merge tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "Major changes are to:
   - add f2fs_io_tracer and F2FS_IOC_GETVERSION
   - fix wrong acl assignment from parent
   - fix accessing wrong data blocks
   - fix wrong condition check for f2fs_sync_fs
   - align start block address for direct_io
   - add and refactor the readahead flows of FS metadata
   - refactor atomic and volatile write policies

  But most of patches are for clean-ups and minor bug fixes.  Some of
  them refactor old code too"

* tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (64 commits)
  f2fs: use spinlock for segmap_lock instead of rwlock
  f2fs: fix accessing wrong indexed data blocks
  f2fs: avoid variable length array
  f2fs: fix sparse warnings
  f2fs: allocate data blocks in advance for f2fs_direct_IO
  f2fs: introduce macros to convert bytes and blocks in f2fs
  f2fs: call set_buffer_new for get_block
  f2fs: check node page contents all the time
  f2fs: avoid data offset overflow when lseeking huge file
  f2fs: fix to use highmem for pages of newly created directory
  f2fs: introduce a batched trim
  f2fs: merge {invalidate,release}page for meta/node/data pages
  f2fs: show the number of writeback pages in stat
  f2fs: keep PagePrivate during releasepage
  f2fs: should fail mount when trying to recover data on read-only dev
  f2fs: split UMOUNT and FASTBOOT flags
  f2fs: avoid write_checkpoint if f2fs is mounted readonly
  f2fs: support norecovery mount option
  f2fs: fix not to drop mount options when retrying fill_super
  f2fs: merge flags in struct f2fs_sb_info
  ...
2015-02-12 19:28:50 -08:00
Chao Yu
1a118ccfd6 f2fs: use spinlock for segmap_lock instead of rwlock
rwlock can provide better concurrency when there are much more readers than
writers because readers can hold the rwlock simultaneously.

But now, for segmap_lock rwlock in struct free_segmap_info, there is only one
reader 'mount' from below call path:
->f2fs_fill_super
  ->build_segment_manager
    ->build_dirty_segmap
      ->init_dirty_segmap
        ->find_next_inuse
          read_lock
          ...
          read_unlock

Now that our concurrency can not be improved since there is no other reader for
this lock, we do not need to use rwlock_t type for segmap_lock, let's replace it
with spinlock_t type.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:51 -08:00
Jaegeuk Kim
f1a3b98e73 f2fs: fix accessing wrong indexed data blocks
This patch fixes the following test.

This causes:
 attempt to access beyond end of device
 sdb2: rw=16384, want=14413962000, limit=16777216

The reason is:
 - f2fs_write_begin
  - f2fs_convert_inline_inode returns -ENOSPC
  - f2fs_write_failed
   - truncate_blocks
    - truncate_partial_data_page
     - find_data_page
      - get_dnode_of_data returns wrong data index retrieved from inline_data
      - f2fs_submit_page_bio(wrong data index)
       - submit_bio(wrong data index)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:51 -08:00
Jaegeuk Kim
60a3b782b1 f2fs: avoid variable length array
Instead of using variable length array, this patch let preallocate memory for
them.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:50 -08:00
Jaegeuk Kim
29e7043f40 f2fs: fix sparse warnings
This patch resolves the following warnings.

include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)

fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static?
fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32
fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32

fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static?
fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static?

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:49 -08:00
Jaegeuk Kim
59b802e5a4 f2fs: allocate data blocks in advance for f2fs_direct_IO
This patch adds preallocation for data blocks to prepare f2fs_direct_IO.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:49 -08:00
Jaegeuk Kim
f7ef9b83b5 f2fs: introduce macros to convert bytes and blocks in f2fs
This patch adds two macros for transition between byte and block offsets.
Currently, f2fs only supports 4KB blocks, so use the default size for now.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:48 -08:00
Jaegeuk Kim
da17eece03 f2fs: call set_buffer_new for get_block
This patch fixes wrong handling of buffer_new flag in get_block.
If f2fs allocates new blocks and mapped buffer_head, it needs to set buffer_new
for the bh_result.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:47 -08:00
Jaegeuk Kim
aaf9607516 f2fs: check node page contents all the time
In get_node_page, if the page is up-to-date, we assumed that the page was not
reclaimed at all.
But, sometimes it was reported that its contents was missing.
So, just for sure, let's check its mapping and contents.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:46 -08:00
Chao Yu
2e023174a8 f2fs: avoid data offset overflow when lseeking huge file
xfstest generic/285 complains our issue in lseeking huge file.

Here is the detail output of generic/285:
"./check -f2fs tests/generic/285
Ran: generic/285
Failures: generic/285
Failed 1 of 1 tests

10. Test a huge file for offset overflow
10.01 SEEK_HOLE expected 65536 or 8589934592, got 65536.          succ
10.02 SEEK_HOLE expected 65536 or 8589934592, got 65536.          succ
10.03 SEEK_DATA expected 0 or 0, got 0.                           succ
10.04 SEEK_DATA expected 1 or 1, got 1.                           succ
10.05 SEEK_HOLE expected 8589934592 or 8589934592, got 0.         FAIL
10.06 SEEK_DATA expected 8589869056 or 8589869056, got 8589869056. succ
10.07 SEEK_DATA expected 8589869057 or 8589869057, got 8589869057. succ
10.08 SEEK_DATA expected 8589869056 or 8589869056, got 4294901760. FAIL"

The reason of this issue is:
We will calculate current offset through left shifting page-offset with
PAGE_CACHE_SHIFT bits, but our page-offset is a type of unsigned long, its size
is 4 bytes in 32-bits machine.

So if our page-offset is bigger than (1 << 32 / pagesize - 1), result of left
shifting will overflow.

Let's fix this issue by casting type of page-offset to type of current offset:
loff_t.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:46 -08:00
Chao Yu
560d4672e2 f2fs: fix to use highmem for pages of newly created directory
In commit a78186ebe5 ("f2fs: use highmem for directory pages"), we have set
__GFP_HIGHMEM into dir mapping's gfp flag in f2fs_iget, so high address memory
could be used for these existing dir's page.

But we forgot to set flag for newly created dir, due to this reason, our newly
created dir pages could not be allocated from high address memory. Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:45 -08:00
Jaegeuk Kim
bba681cbb2 f2fs: introduce a batched trim
This patch introduces a batched trimming feature, which submits split discard
commands.

This is to avoid long latency due to huge trim commands.
If fstrim was triggered ranging from 0 to the end of device, we should lock
all the checkpoint-related mutexes, resulting in very long latency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:44 -08:00
Chao Yu
487261f39b f2fs: merge {invalidate,release}page for meta/node/data pages
This patch merges ->{invalidate,release}page function for meta/node/data pages.

After this, duplication of codes could be removed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:44 -08:00
Jaegeuk Kim
d24bdcbfc6 f2fs: show the number of writeback pages in stat
This patch adds the # of writeback pages in stat info.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:43 -08:00
Jaegeuk Kim
f68daeebba f2fs: keep PagePrivate during releasepage
If PagePrivate is removed by releasepage, f2fs loses counting dirty pages.

e.g., try_to_release_page will not release page when the page is dirty,
but our releasepage removes PagePrivate.

    [<ffffffff81188d75>] try_to_release_page+0x35/0x50
    [<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0
    [<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs]
    [<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs]
    [<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs]
    [<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170
    [<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350
    [<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0
    [<ffffffff81203081>] new_sync_write+0x81/0xb0
    [<ffffffff81203837>] vfs_write+0xb7/0x1f0
    [<ffffffff81204459>] SyS_write+0x49/0xb0
    [<ffffffff817c286d>] system_call_fastpath+0x16/0x1b

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:42 -08:00
Jaegeuk Kim
081d78c2fc f2fs: should fail mount when trying to recover data on read-only dev
If device is read-only, we should not proceed data recovery.
But, if the previous checkpoint was done by normal clean shutdown, it's safe to
proceed the recovery, since there will be no data to be recovered.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:42 -08:00
Jaegeuk Kim
119ee91445 f2fs: split UMOUNT and FASTBOOT flags
This patch adds FASTBOOT flag into checkpoint as follows.

 - CP_UMOUNT_FLAG is set when system is umounted.
 - CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries
   was done.

So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly.
Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:41 -08:00
Jaegeuk Kim
11504a8e7e f2fs: avoid write_checkpoint if f2fs is mounted readonly
Do not change any partition when f2fs is changed to readonly mode.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:40 -08:00
Jaegeuk Kim
2d834bf9ac f2fs: support norecovery mount option
This patch adds a mount option, norecovery, which is mostly same as
disable_roll_forward. The only difference is that norecovery should be activated
with read-only mount option.

This can be used when user wants to check whether f2fs is mountable or not
without any recovery process. (e.g., xfstests/200)

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:40 -08:00
Jaegeuk Kim
dabc4a5c60 f2fs: fix not to drop mount options when retrying fill_super
If wrong mount option was requested, f2fs tries to fill_super again.
But, during the next trial, f2fs has no valid mount options, since
parse_options deleted all the separators in the original string.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:39 -08:00
Chao Yu
caf0047e7e f2fs: merge flags in struct f2fs_sb_info
Currently, there are several variables with Boolean type as below:

struct f2fs_sb_info {
...
	int s_dirty;
	bool need_fsck;
	bool s_closing;
...
	bool por_doing;
...
}

For this there are some issues:
1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean
   type variables by compiler.
2. if we continuously add new flag into f2fs_sb_info, structure will be messed
   up.

So in this patch, we try to:
1. switch s_dirty to Boolean type variable since it has two status 0/1.
2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate different states of sbi.
4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag
   to operate flags for sbi.

After that, above issues will be fixed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:38 -08:00
Chao Yu
88dd893419 f2fs: clean up {in,de}create_sleep_time
Use pointer parameter @wait to pass result in {in,de}create_sleep_time for
cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:37 -08:00
Chao Yu
feeb0debfb f2fs: make truncate_inline_date static
1. make truncate_inline_date static;
2. remove parameter @from of truncate_inline_date as callers only pass zero.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:37 -08:00
Kinglong Mee
3b6709b771 f2fs: fix a bug of inheriting default ACL from parent
Introduced by a6dda0e63e
"f2fs: use generic posix ACL infrastructure".

When testing default acl, gets in recent kernel (3.19.0-rc5),
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:root:rwx
default😷:rwx
default:other::r-x

]# getfacl testdir/
user::rwx
group::rwx
                // missing an acl "group:root:rwx" inherited from parent
other::r-x
default:user::rwx
default:group::r-x
default:group:root:rwx
default😷:rwx
default:other::r-x

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:36 -08:00
Chao Yu
f28e503429 f2fs: use f2fs_radix_tree_insert to clean codes
No modification in functionality, just clean codes with f2fs_radix_tree_insert.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:35 -08:00
Chao Yu
d49f3e8902 f2fs: add F2FS_IOC_GETVERSION support
In this patch we add the FS_IOC_GETVERSION ioctl for getting i_generation from
inode, after that, users can list file's generation number by using "lsattr -v".

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:35 -08:00
Jaegeuk Kim
bc4a1f873b f2fs: leave comment for code readability
During the recovery, any xattr blocks should not be found, since they are
written into cold log, not the warm node chain.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:34 -08:00
Chao Yu
1601839e9e f2fs: fix to release count of meta page in ->invalidatepage
We will encounter deadloop in below scenario:

1. increase page count for F2FS_DIRTY_META type in following path:
->recover_fsync_data
  ->recover_data
    ->do_recover_data
      ->recover_data_page
        ->change_curseg
          ->write_sum_page
            ->set_page_dirty
2. fail in recover_data()
3. invalidate meta pages in truncate_inode_pages_final without decreasing page
   count.
4. deadloop when sync_meta_pages as page count will always be non-zero.

message:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!

 [<c1129a37>] pagevec_lookup_tag+0x27/0x30
 [<f0e774c7>] sync_meta_pages+0x87/0x160 [f2fs]
 [<f0e86dd9>] recover_fsync_data+0xeb9/0xf10 [f2fs]
 [<f0e75398>] f2fs_fill_super+0x888/0x980 [f2fs]
 [<c11733ca>] mount_bdev+0x16a/0x1a0
 [<f0e7180f>] f2fs_mount+0x1f/0x30 [f2fs]
 [<c1173da6>] mount_fs+0x36/0x170
 [<c118b6f5>] vfs_kern_mount+0x55/0xe0
 [<c118d63f>] do_mount+0x1df/0x9f0
 [<c118e110>] SyS_mount+0x70/0xb0
 [<c15a0c48>] sysenter_do_call+0x12/0x12

To avoid page count leak, let's add ->invalidatepage and ->releasepage in
f2fs_meta_aops as f2fs_node_aops to release meta page count correctly.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:33 -08:00
Jaegeuk Kim
85dc2f2c6c f2fs: do checkpoint when umount flag is not set
If the previous checkpoint was done without CP_UMOUNT flag, it needs to do
checkpoint with CP_UMOUNT for the next fast boot.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:33 -08:00
Jaegeuk Kim
30a5537f9a f2fs: trigger correct checkpoint during umount
This patch fixes to trigger checkpoint with umount flag when kill_sb was called.
In kill_sb, f2fs_sync_fs was finally called, but at this time, f2fs can't do
checkpoint with CP_UMOUNT.
After then, f2fs_put_super is not doing checkpoint, since it is not dirty.

So, this patch adds a flag to indicate f2fs_sync_fs is called during umount.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:32 -08:00
Jaegeuk Kim
6f0aacbc3c f2fs: update memory footprint information
This patch adds missing memory usages, and splits them in detail.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:31 -08:00
Chao Yu
9066c6a7eb f2fs: fix wrong memory footprint statistics in debugfs
Our value of memory footprint statistics showed in debugfs is not calculated
correctly. Fix it in this patch.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:31 -08:00
Jaegeuk Kim
871f599f4a f2fs: avoid infinite loop on cp_error
If cp_error is set, we should avoid all the infinite loop.
In f2fs_sync_file, there is a hole, and this patch fixes that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:30 -08:00
Kirill A. Shutemov
d83a08db5b mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub
Nobody uses it anymore.

[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10 14:30:30 -08:00
kbuild test robot
08e4126e1e f2fs: pids_lock can be static
fs/f2fs/trace.c:19:12: sparse: symbol 'pids_lock' was not declared. Should it be static?

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:29 -08:00
Jaegeuk Kim
351f4fba84 f2fs: add f2fs_destroy_trace_ios to free radix tree
This patch removes radix tree after finishing tracing IOs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim
c05086506f f2fs: add spin_lock to cover radix operations in IO tracer
This patch adds spin_lock to cover radix tree operations in IO tracer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim
dd4e4b59b1 f2fs: add nat/sit entries into status
This patch adds NAT/SIT entry informations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim
7aed0d4537 f2fs: free radix_tree_nodes used by nat_set entries
In the normal case, the radix_tree_nodes are freed successfully.
But, when cp_error was detected, we should destroy them forcefully.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:28 -08:00
Jaegeuk Kim
df1991391f f2fs: fix wrong unlock_page call
This patch removes wrongly called unlock_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Chao Yu
9e5ba77fdb f2fs: get rid of kzalloc in __recover_inline_status
We use kzalloc to allocate memory in __recover_inline_status, and use this
all-zero memory to check the inline date content of inode page by comparing
them. This is low effective and not needed, let's check inline date content
directly.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: make the code more neat]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim
38aa0889b2 f2fs: align direct_io'ed data to section
This patch aligns the start block address of a file for direct io to the f2fs's
section size.

Some flash devices manage an over 4KB-sized page as a write unit, and if the
direct_io'ed data are written but not aligned to that unit, the performance can
be degraded due to the partial page copies.

Thus, since f2fs has a section that is well aligned to FTL units, we can align
the block address to the section size so that f2fs avoids this misalignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim
41ef94b35c f2fs: remove uncovered code path
This patch removes unnecessary function calls.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:27 -08:00
Jaegeuk Kim
3547ea961d f2fs: avoid potential unnecessary codes
This patch relocates some operations to avoid unnecessary execution.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Jaegeuk Kim
e1509cf294 f2fs: clean up to remove parameter
This patch uses dn->data_blkaddr as a parameter for the destination block
address.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu
062920734c f2fs: reuse inode_entry_slab in gc procedure for using slab more effectively
There are two slab cache inode_entry_slab and winode_slab using the same
structure as below:

struct dir_inode_entry {
	struct list_head list;	/* list head */
	struct inode *inode;	/* vfs inode pointer */
};

struct inode_entry {
	struct list_head list;
	struct inode *inode;
};

It's a little waste that the two cache can not share their memory space for each
other.
So in this patch we remove one redundant winode_slab slab cache, then use more
universal name struct inode_entry as remaining data structure name of slab,
finally we reuse the inode_entry_slab to store dirty dir item and gc item for
more effective.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu
2ace38e00e f2fs: cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio
Cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio
as one parameter.

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:26 -08:00
Chao Yu
3e1c8f125e f2fs: cleanup trace event of f2fs_submit_page_{m,}bio with DECLARE_EVENT_CLASS
This patch adds missing parameter _type_ for trace_f2fs_submit_page_bio, then
use DECLARE_EVENT_CLASS/DEFINE_EVENT_CONDITION pair to cleanup some trace event
code related to f2fs_submit_page_{m,}bio.

Additionally, after we remove redundant code, size of code can be reduced:
   text    data     bss     dec     hex filename
 176787    8712      56  185555   2d4d3 f2fs.ko.org
 174408    8648      56  183112   2cb48 f2fs.ko

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim
09eb483e89 f2fs: fix missing cold bit during recovery
In do_recover_data, we find and update previous node pages after updating
its new block addresses.
After then, we call fill_node_footer without reset field, we erase its
cold bit so that this new cold node block is written to wrong log area.
This patch fixes not to miss its old flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Changman Lee
b9a2c25207 f2fs: add block count by in-place-update in stat info
This patch adds block count by in-place-update in stat.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim
dd802406e3 f2fs: avoid double lock for cp_rwsem
The __f2fs_add_link is covered by cp_rwsem all the time.
This calls init_inode_metadata, which conducts some acl operations including
memory allocation with GFP_KERNEL previously.
But, under memory pressure, f2fs_write_data_page can be called, which also
grabs cp_rwsem too.

In this case, this incurs a deadlock pointed by Chao.
Thread #1        Thread #2
 down_read
                 down_write
  down_read
 -> here down_read should wait forever.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:25 -08:00
Jaegeuk Kim
db9f7c1a95 f2fs: activate f2fs_trace_ios
This patch activates f2fs_trace_ios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim
9e4ded3f30 f2fs: activate f2fs_trace_pid
This patch activates f2fs_trace_pid.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim
0e689d0369 f2fs: add key functions for f2fs_io_tracer
This patch adds two key functions to trace process ids and IOs.
The basic idea is to
1. remain process ids, pids, in page->private.
2. show pids in IO traces.

So, later we can retrieve process information according to IO traces.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim
63f92ddc8a f2fs: add f2fs_io_tracer support
This patch adds:
 o initial trace.c and trace.h with skeleton functions
 o Kconfig and Makefile to activate this feature

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:24 -08:00
Jaegeuk Kim
cf04e8eb55 f2fs: use f2fs_io_info to clean up messy parameters during IO path
This patch cleans up parameters on IO paths.
The key idea is to use f2fs_io_info adding a parameter, block address, and then
use this structure as parameters.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu
9ecf4b80bd f2fs: use ra_meta_pages to simplify readahead code in restore_node_summary
Use more common function ra_meta_pages() with META_POR to readahead node blocks
in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the
readahead code there, and also we can remove unused function ra_sum_pages().

changes from v2:
 o use invalidate_mapping_pages as before suggested by Changman Lee.
changes from v1:
 o fix one bug when using truncate_inode_pages_range which is pointed out by
   Jaegeuk Kim.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu
5c27f4ee44 f2fs: merge two uchar variable in struct node_info to reduce memory cost
This patch moves one member of struct nat_entry: _flag_ to struct node_info,
so _version_ in struct node_info and _flag_ which are unsigned char type will
merge to one 32-bit space in register/memory. So the size of nat_entry will be
reduced from 28 bytes to 24 bytes (for 64-bit machine, reduce its size from 40
bytes to 32 bytes) and then slab memory using by f2fs will be reduced.

changes from v2:
 o update description of memory usage gain for 64-bit machine suggested by
   Changman Lee.
changes from v1:
 o introduce inline copy_node_info() to copy valid data from node info suggested
   by Jaegeuk Kim, it can avoid bug.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Chao Yu
3fa06d7bc9 f2fs: readahead contiguous current summary blocks in checkpoint
Let's add readahead code for reading contiguous compact/normal summary blocks
in checkpoint, then we will gain better performance in mount procedure.

Changes from v1
  o remove inappropriate 'unlikely' in npages_for_summary_flush.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:23 -08:00
Jaegeuk Kim
5df1f1da7a f2fs: use missing the use of f2fs_kunmap_page
This patch calls f2fs_kunmap_page which I missed before.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim
042b7816aa f2fs: remove unnecessary call to invalidate inmemory pages
Now we use inmemory pages for atomic write only and provide abort procedure,
we don't need to truncate them explicitly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim
d7bc2484b8 f2fs: fix small discards not to issue redundantly
The ckpt_valid_map and cur_valid_map are synced by seg_info_to_raw_sit.

In the case of small discards, the candidates are selected before sync,
while fitrim selects candidates after sync.

So, for small discards, we need to add candidates only just being obsoleted.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim
1e84371ffe f2fs: change atomic and volatile write policies
This patch adds two new ioctls to release inmemory pages grabbed by atomic
writes.
 o f2fs_ioc_abort_volatile_write
  - If transaction was failed, all the grabbed pages and data should be written.
 o f2fs_ioc_release_volatile_write
  - This is to enhance the performance of PERSIST mode in sqlite.

In order to avoid huge memory consumption which causes OOM, this patch changes
volatile writes to use normal dirty pages, instead blocked flushing to the disk
as long as system does not suffer from memory pressure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:22 -08:00
Jaegeuk Kim
70c640b1d6 f2fs: don't need to call lock_op and lock_page for abort
We don't need to call lock_op and lock_page at the aborting path.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Jaegeuk Kim
88a70a69c0 f2fs: fix wrong condition check to trigger f2fs_sync_fs
If there is not enough available memory, we need to trigger f2fs_sync_fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Jaegeuk Kim
cd52b6368f f2fs: remove checking dirty_exceed
We don't need to force to write dirty_exceeded for f2fs_balance_fs_bg.
This flag was only meaningful to write bypassing conditions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-01-09 17:02:21 -08:00
Chao Yu
635aee1fef f2fs: avoid to ra unneeded blocks in recover flow
To improve recovery speed, f2fs try to readahead many contiguous blocks in warm
node segment, but for most time, abnormal power-off do not occur frequently, so
when mount a normal power-off f2fs image, by contrary ra so many blocks and then
invalid them will hurt the performance of mount.
It's better to just ra the first next-block for normal condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 14:19:09 -08:00
Chao Yu
66b00c1867 f2fs: introduce is_valid_blkaddr to cleanup codes in ra_meta_pages
This patch does cleanup work, it introduces is_valid_blkaddr() to include
verification code for blkaddr with upper and down boundary value which were in
ra_meta_pages previous.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 14:19:08 -08:00
Chao Yu
13da549460 f2fs: fix to enable readahead for SSA/CP blocks
1.We use zero as upper boundary value for ra SSA/CP blocks, we will skip
readahead as verification failure with max number, it causes low performance.
2.Low boundary value is not accurate for SSA/CP/POR region verification, so
these values need to be redefined.

This patch fixes above issues.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 14:19:07 -08:00
Chao Yu
03e14d522e f2fs: use atomic for counting inode with inline_{dir,inode} flag
As inline_{dir,inode} stat is increased/decreased concurrently by multi threads,
so the value is not so accurate, let's use atomic type for counting accurately.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:54:59 -08:00
Changman Lee
51455b1938 f2fs: cleanup path to need cp at fsync
Added some commentaries for code readability and cleaned up if-statement
clearly.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:40:22 -08:00
Changman Lee
9c7bb70212 f2fs: check if inode state is dirty at fsync
If inode state is dirty, go straight to write.

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:37:13 -08:00
Jaegeuk Kim
8dcf2ff721 f2fs: count the number of inmemory pages
This patch adds counting # of inmemory pages in the page cache.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:15 -08:00
Jaegeuk Kim
126622343a f2fs: release inmemory pages when the file was closed
If file is closed, let's drop inmemory pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:15 -08:00
Jaegeuk Kim
0722b1011a f2fs: set page private for inmemory pages for truncation
The inmemory pages should be handled by invalidate_page since it needs to be
released int the truncation path.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:14 -08:00
Jaegeuk Kim
9d1015dd4c f2fs: count inline_xx in do_read_inode
In do_read_inode, if we failed __recover_inline_status, the inode has inline
flag without increasing its count.
Later, f2fs_evict_inode will decrease the count, which causes -1.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:13 -08:00
Jaegeuk Kim
9be32d72be f2fs: do retry operations with cond_resched
This patch revists retrial paths in f2fs.
The basic idea is to use cond_resched instead of retrying from the very early
stage.

Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-08 10:35:05 -08:00
Jaegeuk Kim
769ec6e5b7 f2fs: call radix_tree_preload before radix_tree_insert
This patch tries to fix:

 BUG: using smp_processor_id() in preemptible [00000000] code: f2fs_gc-254:0/384
  (radix_tree_node_alloc+0x14/0x74) from [<c033d8a0>] (radix_tree_insert+0x110/0x200)
  (radix_tree_insert+0x110/0x200) from [<c02e8264>] (gc_data_segment+0x340/0x52c)
  (gc_data_segment+0x340/0x52c) from [<c02e8658>] (f2fs_gc+0x208/0x400)
  (f2fs_gc+0x208/0x400) from [<c02e8a98>] (gc_thread_func+0x248/0x28c)
  (gc_thread_func+0x248/0x28c) from [<c0139944>] (kthread+0xa0/0xac)
  (kthread+0xa0/0xac) from [<c0105ef8>] (ret_from_fork+0x14/0x3c)

The reason is that f2fs calls radix_tree_insert under enabled preemption.
So, before calling it, we need to call radix_tree_preload.

Otherwise, we should use _GFP_WAIT for the radix tree, and use mutex or
semaphore to cover the radix tree operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-05 09:51:04 -08:00
Jaegeuk Kim
8b26ef98da f2fs: use rw_semaphore for nat entry lock
Previoulsy, we used rwlock for nat_entry lock.
But, now we have a lot of complex operations in set_node_addr.
(e.g., allocating kernel memories, handling radix_trees, and so on)

So, this patches tries to change spinlock to rw_semaphore to give CPUs to other
threads.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-03 21:23:29 -08:00
Jaegeuk Kim
4634d71ed1 f2fs: fix missing kmem_cache_free
This patch fixes missing kmem_cache_free when handling errors.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-03 16:40:28 -08:00
Changman Lee
7dda2af83b f2fs: more fast lookup for gc_inode list
If there are many inodes that have data blocks in victim segment,
it takes long time to find a inode in gc_inode list.
Let's use radix_tree to reduce lookup time.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-02 11:02:50 -08:00
Changman Lee
9c01503f4d f2fs: cleanup redundant macro
We've already made fi and sbi for inode. Let's avoid duplicated work.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-01 14:16:50 -08:00
Chao Yu
cd34e2969b f2fs: fix to return correct error number in f2fs_write_begin
Fix the wrong error number in error path of f2fs_write_begin.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-12-01 13:56:02 -08:00
Changman Lee
31a3268839 f2fs: cleanup if-statement of phase in gc_data_segment
Little cleanup to distinguish each phase easily

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: modify indentation for code readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-27 20:30:17 -08:00
Jaegeuk Kim
95f5b0fc5e f2fs: fix to recover converted inline_data
If an inode has converted inline_data which was written to the disk, we should
set its inode flag for further fsync so that this inline_data can be recovered
from sudden power off.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 18:08:00 -08:00
Jaegeuk Kim
158c194c37 f2fs: make clean the page before writing
If a page is set to be written to the disk, we can make clean the page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 17:33:31 -08:00
Changman Lee
80ec2e914d f2fs: no more dirty_nat_entires when flushing
After flushing dirty nat entries, it has to be no more dirty nat
entries.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 17:26:36 -08:00
Changman Lee
20d047c876 f2fs: check dirty_nat_cnt before flushing nat entries in journal
It's meaningless to check dirty_nat_cnt after re-dirtying nat entries in
journal. And although there are rooms for dirty nat entires if dirty_nat_cnt
is zero, it's also meaningless to check __has_cursum_space.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 17:26:34 -08:00
Jaegeuk Kim
5f72739583 f2fs: fix deadlock during inline_data conversion
A deadlock can be occurred:
Thread 1]                             Thread 2]
 - f2fs_write_data_pages              - f2fs_write_begin
   - lock_page(page #0)
                                        - grab_cache_page(page #X)
                                        - get_node_page(inode_page)
                                        - grab_cache_page(page #0)
                                          : to convert inline_data
   - f2fs_write_data_page
     - f2fs_write_inline_data
       - get_node_page(inode_page)

In this case, trying to lock inode_page and page #0 causes deadlock.
In order to avoid this, this patch adds a rule for this locking policy,
which is that page #0 should be locked followed by inode_page lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 12:08:30 -08:00
Markus Elfring
ce3e6d25f3 f2fs: fix typos for the word "destroy" in jump labels
Two jump labels were adjusted in the implementation of the
create_node_manager_caches() function because these identifiers
contained typos.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-25 12:08:22 -08:00
Jaegeuk Kim
0341845efc f2fs: fix livelock calling f2fs_iget during f2fs_evict_inode
In f2fs_evict_inode,
 commit_inmemory_pages
   f2fs_gc
     f2fs_iget
       iget_locked
         -> wait for inode free

Here, if the inode is same as the one to be evicted, f2fs should wait forever.
Actually, we should not call f2fs_balance_fs during f2fs_evict_inode to avoid
this.

But, the commit_inmem_pages calls f2fs_balance_fs by default, even if
f2fs_evict_inode wants to free inmemory pages only.

Hence, this patch adds to trigger f2fs_balance_fs only when there is something
to write.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:51:57 -08:00
Jaegeuk Kim
9486ba442b f2fs: introduce f2fs_dentry_kunmap to clean up
This patch introduces f2fs_dentry_kunmap to clean up dirty codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:51:53 -08:00
Changman Lee
c9ee00857c f2fs: fix wrong data structure when create slab
It used nat_entry_set when create slab for sit_entry_set.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:48:49 -08:00
Jaegeuk Kim
09b8b3c839 f2fs: call flush_dcache_page when the page was updated
Whenever f2fs updates mapped pages, it needs to call flush_dcache_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-23 21:48:31 -08:00
Jaegeuk Kim
857dc4e059 f2fs: write SSA pages under memory pressure
Under memory pressure, we don't need to skip SSA page writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:33 -08:00
Jaegeuk Kim
27c6bd60ac f2fs: submit bio for node blocks in the reclaim path
If a node page is request to be written during the reclaiming path, we should
submit the bio to avoid pending to recliam it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:32 -08:00
Chao Yu
67298804f3 f2fs: introduce struct inode_management to wrap inner fields
Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO,
and we manage fields related to inode cache separately in struct f2fs_sb_info
for each inode cache type.
This makes codes a bit messy, so that this patch intorduce a new struct
inode_management to wrap inner fields as following which make codes more neat.

/* for inner inode cache management */
struct inode_management {
	struct radix_tree_root ino_root;	/* ino entry array */
	spinlock_t ino_lock;			/* for ino entry lock */
	struct list_head ino_list;		/* inode list head */
	unsigned long ino_num;			/* number of entries */
};

struct f2fs_sb_info {
	...
	struct inode_management im[MAX_INO_ENTRY];      /* manage inode cache */
	...
}

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:32 -08:00
Chao Yu
aba291b3d8 f2fs: remove unneeded check code with option in f2fs_remount
Because we have checked the contrary condition in case of "if" judgment, we do
not need to check the condition again in case of "else" judgment. Let's remove
it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:31 -08:00
Chao Yu
6c02993203 f2fs: avoid unable to restart gc thread in remount
In f2fs_remount, we will stop gc thread and set need_restart_gc as true when new
option is set without BG_GC, then if any error occurred in the following
procedure, we can restore to start the gc thread.
But after that, We will fail to restore gc thread in start_gc_thread as BG_GC is
not set in new option, so we'd better move this condition judgment out of
start_gc_thread to fix this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-19 22:49:30 -08:00
Jaegeuk Kim
8cdcb71322 f2fs: put the inode page when error was occurred
We should put the inode page when error was occurred.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-18 17:04:33 -08:00
Jaegeuk Kim
6d20aff83c f2fs: fix to call put_page at the error handling routine
The locked page should be released before returning the function.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-18 17:02:47 -08:00
Jaegeuk Kim
92dffd0179 f2fs: convert inline_data when i_size becomes large
If i_size becomes large outside of MAX_INLINE_DATA, we shoud convert the inode.
Otherwise, we can make some dirty pages during the truncation, and those pages
will be written through f2fs_write_data_page.
At that moment, the inode has still inline_data, so that it tries to write non-
zero pages into inline_data area.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-11 14:16:12 -08:00
Jaegeuk Kim
764d2c8040 f2fs: fix deadlock to grab 0'th data page
The scenario is like this.

One trhead triggers:
  f2fs_write_data_pages
    lock_page
    f2fs_write_data_page
      f2fs_lock_op  <- wait

The other thread triggers:
  f2fs_truncate
    truncate_blocks
      f2fs_lock_op
        truncate_partial_data_page
          lock_page  <- wait for locking the page

This patch resolves this bug by relocating truncate_partial_data_page.
This function is just to truncate user data page and not related to FS
consistency as well.
And, we don't need to call truncate_inline_data. Rather than that,
f2fs_write_data_page will finally update inline_data later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-11 14:15:48 -08:00
Jaegeuk Kim
57e2a2c0a6 f2fs: reduce the number of inline_data inode before clearing it
The # of inline_data inode is decreased only when it has inline_data.
After clearing the flag, we can't decreased the number.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10 16:29:14 -08:00
Jaegeuk Kim
b7e1d80003 f2fs: implement -o dirsync
If a mount option has dirsync, we should call checkpoint for all the directory
operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10 06:51:39 -08:00
Jaegeuk Kim
510184c89f f2fs: do not skip any writes under memory pressure
Under memory pressure, let's avoid skipping data writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10 06:51:38 -08:00
Jaegeuk Kim
2f97c326bf f2fs: write node pages if checkpoint is not doing
It needs to write node pages if checkpoint is not doing in order to avoid
memory pressure.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-10 06:51:28 -08:00
Jaegeuk Kim
e5e7ea3c86 f2fs: control the memory footprint used by ino entries
This patch adds to control the memory footprint used by ino entries.
This will conduct best effort, not strictly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-06 15:24:46 -08:00
Jaegeuk Kim
8c402946f0 f2fs: introduce the number of inode entries
This patch adds to monitor the number of ino entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-06 15:17:43 -08:00
Jaegeuk Kim
a344b9fda0 f2fs: disable roll-forward when active_logs = 2
The roll-forward mechanism should be activated when the number of active
logs is not 2.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-05 20:05:53 -08:00
Jaegeuk Kim
d5053a34a9 f2fs: introduce -o fastboot for reducing booting time only
If a system wants to reduce the booting time as a top priority, now we can
use a mount option, -o fastboot.
With this option, f2fs conducts a little bit slow write_checkpoint, but
it can avoid the node page reads during the next mount time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04 17:34:15 -08:00
Jaegeuk Kim
6a8f8ca582 f2fs: avoid race condition in handling wait_io
__submit_merged_bio    f2fs_write_end_io        f2fs_write_end_io
                       wait_io = X              wait_io = x
                       complete(X)              complete(X)
                       wait_io = NULL
wait_for_completion()
free(X)
                                                 spin_lock(X)
                                                 kernel panic

In order to avoid this, this patch removes the wait_io facility.
Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04 17:34:14 -08:00
Jaegeuk Kim
adf4983bde f2fs: send discard commands in larger extent
If there is a chance to make a huge sized discard command, we don't need
to split it out, since each blkdev_issue_discard should wait one at a
time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04 17:34:13 -08:00
Jaegeuk Kim
b3d208f96d f2fs: revisit inline_data to avoid data races and potential bugs
This patch simplifies the inline_data usage with the following rule.
1. inline_data is set during the file creation.
2. If new data is requested to be written ranges out of inline_data,
 f2fs converts that inode permanently.
3. There is no cases which converts non-inline_data inode to inline_data.
4. The inline_data flag should be changed under inode page lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-04 17:34:11 -08:00
Jan Kara
1f7732fe6c f2fs: remove pointless bit testing in f2fs_delete_entry()
There's no point in using test_and_clear_bit_le() when we don't use the
return value of the function. Just use clear_bit_le() instead.

Coverity-id: 1016434
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:38 -08:00
Jaegeuk Kim
e3fb1b794b f2fs: do not discard data protected by the previous checkpoint
We should not discard any data protected by the previous checkpoint all
the time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:38 -08:00
Jaegeuk Kim
427a45c8e2 f2fs: flush_dcache_page for inline data
When reading inline data, we should call flush_dcache_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:37 -08:00
Jaegeuk Kim
ca4b02eeed f2fs: call write_checkpoint under disabled gc
During the write_checkpoint, we should avoid f2fs_gc trigger to avoid any
filesystem consistency.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:37 -08:00
Jan Kara
9234f3190b f2fs: fix possible data corruption in f2fs_write_begin()
f2fs_write_begin() doesn't initialize the 'dn' variable if the inode has
inline data. However it uses its contents to decide whether it should
just zero out the page or load data to it. Thus if we are unlucky we can
zero out page contents instead of loading inline data into a page.

CC: stable@vger.kernel.org
CC: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:37 -08:00
Gu Zheng
2cc2218611 f2fs: use current_sit_addr to replace the open code
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:37 -08:00
Gu Zheng
52aca07425 f2fs: rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit
Rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit, which mean
set/clear bit and return the old value, for better readability.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:36 -08:00
Gu Zheng
1730663cb7 f2fs: set raw_super default to NULL to avoid compile warning
Set raw_super default to NULL to avoid the possibly used
uninitialized warning, though we may never hit it in fact.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:36 -08:00
Gu Zheng
c6ac4c0ec4 f2fs: introduce f2fs_change_bit to simplify the change bit logic
Introduce f2fs_change_bit to simplify the change bit logic in
function set_to_next_nat{sit}.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:36 -08:00
Gu Zheng
fa528722d0 f2fs: remove the redundant function cond_clear_inode_flag
Use clear_inode_flag to replace the redundant cond_clear_inode_flag.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:36 -08:00
Gu Zheng
8a2d0ace3a f2fs: remove the seems unneeded argument 'type' from __get_victim
Remove the unneeded argument 'type' from __get_victim, use
NO_CHECK_TYPE directly when calling v_ops->get_victim().

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:35 -08:00
Jan Kara
9bd27ae4aa f2fs: avoid returning uninitialized value to userspace from f2fs_trim_fs()
If user specifies too low end sector for trimming, f2fs_trim_fs() will
use uninitialized value as a number of trimmed blocks and returns it to
userspace. Initialize number of trimmed blocks early to avoid the
problem.

Coverity-id: 1248809
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:35 -08:00
Jaegeuk Kim
d64948a4df f2fs: declare f2fs_convert_inline_dir as a static function
This patch declares f2fs_convert_inline_dir as a static function, which was
reported by kbuild test robot.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:35 -08:00
Jaegeuk Kim
f1e33a041e f2fs: use kmap_atomic instead of kmap
For better performance, we need to use kmap_atomic instead of kmap.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:35 -08:00
Jaegeuk Kim
062a3e7ba7 f2fs: reuse make_empty_dir code for inline_dentry
This patch introduces do_make_empty_dir to mitigate code redundancy
for inline_dentry.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim
7b3cd7d6f0 f2fs: introduce f2fs_dentry_ptr structure for code clean-up
This patch introduces f2fs_dentry_ptr structure for the use of a function
parameter in inline_dentry operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim
5ab18570b8 f2fs: should not truncate any inline_dentry
If the inode has inline_dentry, we should not truncate any block indices.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim
38594de767 f2fs: reuse core function in f2fs_readdir for inline_dentry
This patch introduces a core function, f2fs_fill_dentries, to remove
redundant code in f2fs_readdir and f2fs_read_inline_dir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:34 -08:00
Jaegeuk Kim
e7a2bf2283 f2fs: fix counting inline_data inode numbers
This patch fixes wrongly counting inline_data inode numbers.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim
3289c061c5 f2fs: add stat info for inline_dentry inodes
This patch adds status information for inline_dentry inodes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim
bce8d11207 f2fs: avoid deadlock on init_inode_metadata
Previously, init_inode_metadata does not hold any parent directory's inode
page. So, f2fs_init_acl can grab its parent inode page without any problem.
But, when we use inline_dentry, that page is grabbed during f2fs_add_link,
so that we can fall into deadlock condition like below.

INFO: task mknod:11006 blocked for more than 120 seconds.
      Tainted: G           OE  3.17.0-rc1+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mknod           D ffff88003fc94580     0 11006  11004 0x00000000
 ffff880007717b10 0000000000000002 ffff88003c323220 ffff880007717fd8
 0000000000014580 0000000000014580 ffff88003daecb30 ffff88003c323220
 ffff88003fc94e80 ffff88003ffbb4e8 ffff880007717ba0 0000000000000002
Call Trace:
 [<ffffffff8173dc40>] ? bit_wait+0x50/0x50
 [<ffffffff8173d4cd>] io_schedule+0x9d/0x130
 [<ffffffff8173dc6c>] bit_wait_io+0x2c/0x50
 [<ffffffff8173da3b>] __wait_on_bit_lock+0x4b/0xb0
 [<ffffffff811640a7>] __lock_page+0x67/0x70
 [<ffffffff810acf50>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811652cc>] pagecache_get_page+0x14c/0x1e0
 [<ffffffffa029afa9>] get_node_page+0x59/0x130 [f2fs]
 [<ffffffffa02a63ad>] read_all_xattrs+0x24d/0x430 [f2fs]
 [<ffffffffa02a6ca2>] f2fs_getxattr+0x52/0xe0 [f2fs]
 [<ffffffffa02a7481>] f2fs_get_acl+0x41/0x2d0 [f2fs]
 [<ffffffff8122d847>] get_acl+0x47/0x70
 [<ffffffff8122db5a>] posix_acl_create+0x5a/0x150
 [<ffffffffa02a7759>] f2fs_init_acl+0x29/0xcb [f2fs]
 [<ffffffffa0286a8d>] init_inode_metadata+0x5d/0x340 [f2fs]
 [<ffffffffa029253a>] f2fs_add_inline_entry+0x12a/0x2e0 [f2fs]
 [<ffffffffa0286ea5>] __f2fs_add_link+0x45/0x4a0 [f2fs]
 [<ffffffffa028b5b6>] ? f2fs_new_inode+0x146/0x220 [f2fs]
 [<ffffffffa028b816>] f2fs_mknod+0x86/0xf0 [f2fs]
 [<ffffffff811e3ec1>] vfs_mknod+0xe1/0x160
 [<ffffffff811e4b26>] SyS_mknod+0x1f6/0x200
 [<ffffffff81741d7f>] tracesys+0xe1/0xe6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim
59a0615540 f2fs: fix to wait correct block type
The inode page needs to wait NODE block io.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:33 -08:00
Jaegeuk Kim
4e6ebf6d49 f2fs: reuse find_in_block code for find_in_inline_dir
This patch removes redundant copied code in find_in_inline_dir.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Jaegeuk Kim
a82afa2019 f2fs: reuse room_for_filename for inline dentry operation
This patch introduces to reuse the existing room_for_filename for inline dentry
operation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Chao Yu
622f28ae9b f2fs: enable inline dir handling
Add inline dir functions into normal dir ops' function to handle inline ops.
Besides, we enable inline dir mode when a new dir inode is created if
inline_data option is on.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:32 -08:00
Chao Yu
201a05be96 f2fs: add key function to handle inline dir
Adds Functions to implement inline dir init/lookup/insert/delete/convert ops.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove needless reserved area copy, pointed by Dan Carpenter]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Chao Yu
dbeacf02eb f2fs: export dir operations for inline dir
This patch exports some dir operations for inline dir, additionally introduces
f2fs_drop_nlink from f2fs_delete_entry for reusing by inline dir function.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Chao Yu
5efd3c6f1b f2fs: add a new mount option for inline dir
Adds a new mount option 'inline_dentry' for inline dir.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Chao Yu
34d67debe0 f2fs: add infra struct and helper for inline dir
This patch defines macro/inline dentry structure, and adds some helpers for
inline dir infrastructure.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Jaegeuk Kim
af41d3ee00 f2fs: avoid infinite loop at cp_error
This patch avoids an infinite loop in sync_dirty_inode_page when -EIO was
detected.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:31 -08:00
Jaegeuk Kim
4a257ed677 f2fs: avoid build warning
This patch removes build warning.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:30 -08:00
Jaegeuk Kim
13fd8f89f6 f2fs: fix to call f2fs_unlock_op
This patch fixes to call f2fs_unlock_op, which was missing before.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:30 -08:00
Jaegeuk Kim
9ba69cf987 f2fs: avoid to allocate when inline_data was written
The sceanrio is like this.
inline_data   i_size     page                 write_begin/vm_page_mkwrite
  X             30       dirty_page
  X             30                            write to #4096 position
  X             30       get_dnode_of_data    wait for get_dnode_of_data
  O             30       write inline_data
  O             30                            get_dnode_of_data
  O             30                            reserve data block
..

In this case, we have #0 = NEW_ADDR and inline_data as well.
We should not allow this condition for further access.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:30 -08:00
Jaegeuk Kim
a78186ebe5 f2fs: use highmem for directory pages
This patch fixes to use highmem for directory pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:30 -08:00
Jaegeuk Kim
1ce86bf6f8 f2fs: fix race conditon on truncation with inline_data
Let's consider the following scenario.

blkaddr[0] inline_data i_size  i_blocks writepage           truncate
  NEW        X        4096        2    dirty page #0
  NEW        X         0                                    change i_size
  NEW        X         0          2    f2fs_write_inline_data
  NEW        X         0          2    get_dnode_of_data
  NEW        X         0          2    truncate_data_blocks_range
  NULL       O         0          1    memcpy(inline_data)
  NULL       O         0          1    f2fs_put_dnode
  NULL       O         0          1                         f2fs_truncate
  NULL       O         0          1                         get_dnode_of_data
  NULL       O         0          1                       *invalid block addr*

This patch adds checking inline_data flag during f2fs_truncate not to refer
corrupted block indices.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:29 -08:00
Jaegeuk Kim
c08a690b46 f2fs: should truncate any allocated block for inline_data write
When trying to write inline_data, we should truncate any data block allocated
and pointed by the inode block.
We should consider the data index is not 0.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:29 -08:00
Jaegeuk Kim
cbcb2872e3 f2fs: invalidate inmemory page
If user truncates file's data, we should truncate inmemory pages too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:29 -08:00
Jaegeuk Kim
34ba94bac9 f2fs: do not make dirty any inmemory pages
This patch let inmemory pages be clean all the time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-11-03 16:07:29 -08:00
Jaegeuk Kim
02a1335f25 f2fs: support volatile operations for transient data
This patch adds support for volatile writes which keep data pages in memory
until f2fs_evict_inode is called by iput.

For instance, we can use this feature for the sqlite database as follows.
While supporting atomic writes for main database file, we can keep its journal
data temporarily in the page cache by the following sequence.

1. open
 -> ioctl(F2FS_IOC_START_VOLATILE_WRITE);
2. writes
 : keep all the data in the page cache.
3. flush to the database file with atomic writes
  a. ioctl(F2FS_IOC_START_ATOMIC_WRITE);
  b. writes
  c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
4. close
 -> drop the cached data

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-10-07 11:54:41 -07:00
Jaegeuk Kim
88b88a6679 f2fs: support atomic writes
This patch introduces a very limited functionality for atomic write support.
In order to support atomic write, this patch adds two ioctls:
 o F2FS_IOC_START_ATOMIC_WRITE
 o F2FS_IOC_COMMIT_ATOMIC_WRITE

The database engine should be aware of the following sequence.
1. open
 -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
2. writes
  : all the written data will be treated as atomic pages.
3. commit
 -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
  : this flushes all the data blocks to the disk, which will be shown all or
  nothing by f2fs recovery procedure.
4. repeat to #2.

The IO pattens should be:

  ,- START_ATOMIC_WRITE                  ,- COMMIT_ATOMIC_WRITE
 CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                      `- COMMIT_ATOMIC_WRITE

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-10-06 17:39:50 -07:00
Jaegeuk Kim
120c2cba1d f2fs: remove unused return value
Don't return any value without any usage.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-10-05 21:05:15 -07:00
Jaegeuk Kim
52656e6cf7 f2fs: clean up f2fs_ioctl functions
This patch cleans up f2fs_ioctl functions for better readability.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:56 -07:00
Dan Carpenter
8a21984d5d f2fs: potential shift wrapping buf in f2fs_trim_fs()
My static checker complains that segment is a u64 but only the lower 31
bits can be used before we hit a shift wrapping bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:56 -07:00
Jaegeuk Kim
44c1615651 f2fs: call f2fs_unlock_op after error was handled
This patch relocates f2fs_unlock_op in every directory operations to be called
after any error was processed.
Otherwise, the checkpoint can be entered with valid node ids without its
dentry when -ENOSPC is occurred.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:55 -07:00
Jaegeuk Kim
7cd8558baa f2fs: check the use of macros on block counts and addresses
This patch cleans up the existing and new macros for readability.

Rule is like this.

         ,-----------------------------------------> MAX_BLKADDR -,
         |  ,------------- TOTAL_BLKS ----------------------------,
         |  |                                                     |
         |  ,- seg0_blkaddr   ,----- sit/nat/ssa/main blkaddress  |
block    |  | (SEG0_BLKADDR)  | | | |   (e.g., MAIN_BLKADDR)      |
address  0..x................ a b c d .............................
            |                                                     |
global seg# 0...................... m .............................
            |                       |                             |
            |                       `------- MAIN_SEGS -----------'
            `-------------- TOTAL_SEGS ---------------------------'
                                    |                             |
 seg#                               0..........xx..................

= Note =
 o GET_SEGNO_FROM_SEG0 : blk address -> global segno
 o GET_SEGNO           : blk address -> segno
 o START_BLOCK         : segno -> starting block address

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:34:47 -07:00
Jaegeuk Kim
309cc2b6e7 f2fs: refactor flush_nat_entries to remove costly reorganizing ops
Previously, f2fs tries to reorganize the dirty nat entries into multiple sets
according to its nid ranges. This can improve the flushing nat pages, however,
if there are a lot of cached nat entries, it becomes a bottleneck.

This patch introduces a new set management flow by removing dirty nat list and
adding a series of set operations when the nat entry becomes dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:30:41 -07:00
Jaegeuk Kim
4b2fecc846 f2fs: introduce FITRIM in f2fs_ioctl
This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue small discards and prefree discards as many as
possible for the given area.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:06:09 -07:00
Jaegeuk Kim
75ab4cb830 f2fs: introduce cp_control structure
This patch add a new data structure to control checkpoint parameters.
Currently, it presents the reason of checkpoint such as is_umount and normal
sync.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-30 15:01:28 -07:00
Jaegeuk Kim
95dd897301 f2fs: use more free segments until SSR is activated
Previously, f2fs activates SSR if the # of free segments reaches to the # of
overprovisioned segments.
In this case, SSR starts to use dirty segments only, so that the overprovisoned
space cannot be selected for new data.
This means that we have no chance to utilizae the overprovisioned space at all.

This patch fixes that by allowing LFS allocations until the # of free segments
reaches to the last threshold, reserved space.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:24 -07:00
Jaegeuk Kim
9b5f136fd4 f2fs: change the ipu_policy option to enable combinations
This patch changes the ipu_policy setting to use any combination of orthogonal policies.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:24 -07:00
Chao Yu
210f41bc04 f2fs: fix to search whole dirty segmap when get_victim
In ->get_victim we get max_search value from dirty_i->nr_dirty without
protection of seglist_lock, after that, nr_dirty can be increased/decreased
before we hold seglist_lock lock.
Then in main loop we attempt to traverse all dirty section one time to find
victim section, but it's not accurate to use max_search as the total loop count,
because we might lose checking several sections or check sections redundantly
for the case of nr_dirty are increased or decreased previously.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:23 -07:00
Chao Yu
26666c8a43 f2fs: fix to clean previous mount option when remount_fs
In manual of mount, we descript remount as below:

"mount -o remount,rw /dev/foo /dir
After  this call all old mount options are replaced and arbitrary stuff from
fstab is ignored, except the loop= option which is internally generated and
maintained by the mount command."

Previously f2fs do not clear up old mount options when remount_fs, so we have no
chance of disabling previous option (e.g. flush_merge). Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:22 -07:00
Chao Yu
14cecc5cd6 f2fs: skip punching hole in special condition
Now punching hole in directory is not supported in f2fs, so let's limit file
type in punch_hole().

In addition, in punch_hole if offset is exceed file size, we should skip
punching hole.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:21 -07:00
Chao Yu
55cf9cb63f f2fs: support large sector size
Block size in f2fs is 4096 bytes, so theoretically, f2fs can support 4096 bytes
sector device at maximum. But now f2fs only support 512 bytes size sector, so
block device such as zRAM which uses page cache as its block storage space will
not be mounted successfully as mismatch between sector size of zRAM and sector
size of f2fs supported.

In this patch we support large sector size in f2fs, so block device with sector
size of 512/1024/2048/4096 bytes can be supported in f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:20 -07:00
Chao Yu
09db6a2ef8 f2fs: fix to truncate blocks past EOF in ->setattr
By using FALLOC_FL_KEEP_SIZE in ->fallocate of f2fs, we can fallocate block past
EOF without changing i_size of inode. These blocks past EOF will not be
truncated in ->setattr as we truncate them only when change the file size.

We should give a chance to truncate blocks out of filesize in setattr().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:20 -07:00
Jaegeuk Kim
976e4c50ae f2fs: update i_size when __allocate_data_block
The f2fs_direct_IO uses __allocate_data_block, but inside the allocation path,
we should update i_size at the changed time to update its inode page.
Otherwise, we can get wrong i_size after roll-forward recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:19 -07:00
Jaegeuk Kim
90a893c749 f2fs: use MAX_BIO_BLOCKS(sbi)
This patch cleans up a simple macro.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:18 -07:00
Jaegeuk Kim
c52e1b10b1 f2fs: remove redundant operation during roll-forward recovery
If same data is updated multiple times, we don't need to redo whole the
operations.
Let's just update the lastest one.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:17 -07:00
Jaegeuk Kim
19c9c466e5 f2fs: do not skip latest inode information
In f2fs_sync_file, if there is no written appended writes, it skips
to write its node blocks.
But, if there is up-to-date inode page, we should write it to update
its metadata during the roll-forward recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:16 -07:00
Jaegeuk Kim
441ac5cb32 f2fs: fix roll-forward missing scenarios
We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Recover to the latest dnode(F), and drop the last inode(x)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:16 -07:00
Jaegeuk Kim
88bd02c947 f2fs: fix conditions to remain recovery information in f2fs_sync_file
This patch revisited whole the recovery information during the f2fs_sync_file.

In this patch, there are three information to make a decision.

a) IS_CHECKPOINTED,	/* is it checkpointed before? */
b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */

And, the scenarios for our rule are based on:

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
2. inode(x) | CP | inode(F) | dnode(F)
3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
4. inode(x) | CP | dnode(F) | inode(F)
5. CP | inode(x) | dnode(F) | inode(DF)
6. CP | inode(DF) | dnode(F)
7. CP | dnode(F) | inode(DF)
8. CP | dnode(F) | inode(x) | inode(DF)

For example, #3, the three conditions should be changed as follows.

   inode(x) | CP | dnode(F) | inode(x) | inode(F)
a)    x       o      o          o          o
b)    x       x      x          x          o
c)    x       o      o          x          o

If f2fs_sync_file stops   ------^,
 it should write inode(F)    --------------^

So, the need_inode_block_update should return true, since
 c) get_nat_flag(e, HAS_LAST_FSYNC), is false.

For example, #8,
      CP | alloc | dnode(F) | inode(x) | inode(DF)
a)    o      x        x          x          x
b)    x               x          x          o
c)    o               o          x          o

If f2fs_sync_file stops   -------^,
 it should write inode(DF)    --------------^

Note that, the roll-forward policy should follow this rule, which means,
if there are any missing blocks, we doesn't need to recover that inode.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:15 -07:00
Jaegeuk Kim
7ef35e3b9e f2fs: introduce a flag to represent each nat entry information
This patch introduces a flag in the nat entry structure to merge various
information such as checkpointed and fsync_done marks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:14 -07:00
Jaegeuk Kim
4c521f493b f2fs: use meta_inode cache to improve roll-forward speed
Previously, all the dnode pages should be read during the roll-forward recovery.
Even worsely, whole the chain was traversed twice.
This patch removes that redundant and costly read operations by using page cache
of meta_inode and readahead function as well.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:12 -07:00
Jaegeuk Kim
60979115a6 f2fs: fix double lock for inode page during roll-foward recovery
If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.

Error case is like this.

1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut

-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*

This patch should resolve that bug.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:47 -07:00
Huang Ying
c6e489305e f2fs: fix a race condition in next_free_nid
The nm_i->fcnt checking is executed before spin_lock, so if another
thread delete the last free_nid from the list, the wrong nid may be
gotten.  So fix the race condition by moving the nm_i->fnct checking
into spin_lock.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:46 -07:00
Huang Ying
7704182387 f2fs: use nm_i->next_scan_nid as default for next_free_nid
Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved
into next_free_nid of checkpoint, this may cause useless scanning for
next mount.  nm_i->next_scan_nid should be a better default value than
0.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:45 -07:00
Jaegeuk Kim
c1ce1b02bb f2fs: give an option to enable in-place-updates during fsync to users
If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
only starts to try in-place-updates.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps out-of-order manner. Otherwise, it triggers in-place-updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:44 -07:00
Jaegeuk Kim
a7ffdbe22c f2fs: expand counting dirty pages in the inode page cache
Previously f2fs only counts dirty dentry pages, but there is no reason not to
expand the scope.

This patch changes the names on the management of dirty pages and to count
dirty pages in each inode info as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:39 -07:00
Jaegeuk Kim
2403c155b8 f2fs: remove lengthy inode->i_ino
This patch is to remove lengthy name by adding a new variable.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-10 17:00:25 -07:00
Jaegeuk Kim
0b4c5afde9 f2fs: fix negative value for lseek offset
If application throws negative value of lseek with SEEK_DATA|SEEK_HOLE,
previous f2fs went into BUG_ON in get_dnode_of_data, which was reported
by Tommi Rantala.

He could make a simple code to detect this having:
	lseek(fd, -17595150933902LL, SEEK_DATA);

This patch should resolve that bug.

Reported-by: Tommi Rentala <tt.rantala@gmail.com>
[Jaegeuk Kim: relocate the condition as suggested by Chao]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 14:46:36 -07:00
Huang Ying
9a01b56b1a f2fs: avoid node page to be written twice in gc_node_segment
In gc_node_segment, if node page gc is run concurrently with node page
writeback, and check_valid_map and get_node_page run after page locked
and before cur_valid_map is updated as below, it is possible for the
page to be written twice unnecessarily.

			sync_node_pages
			  try_lock_page
			  ...
check_valid_map		  f2fs_write_node_page
			    ...
			    write_node_page
			      do_write_page
			        allocate_data_block
				  ...
				  refresh_sit_entry /* update cur_valid_map */
				  ...
			    ...
			    unlock_page
get_node_page
...
set_page_dirty
...
f2fs_put_page
  unlock_page

This can be solved via calling check_valid_map after get_node_page again.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:07 -07:00
Gu Zheng
721bd4d5c3 f2fs: use lock-less list(llist) to simplify the flush cmd management
We use flush cmd control to collect many flush cmds, and flush them
together. In this case, we use two list to manage the flush cmds
(collect and dispatch), and one spin lock is used to protect this.
In fact, the lock-less list(llist) is very suitable to this case,
and we use simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks for Yu's suggestion.
-

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:06 -07:00
Chao Yu
184a5cd2ce f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c6 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time."

Actually, we have the same problem in using SIT journal area.

In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.

In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.

In my testing environment, it shows this patch can help to reduce SIT block
update obviously.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
		sit page num	cp count	sit pages/cp
based		2006.50		1349.75		1.486
patched		1566.25		1463.25		1.070

Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)	dirty sit count
36038		2151
49168		2123
37174		2232

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:05 -07:00
Chao Yu
d3a14afd5e f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO
sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO is not used, remove it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:05 -07:00
Jaegeuk Kim
b0c44f05a2 f2fs: need fsck.f2fs if the recovery was failed
If the roll-forward recovery was failed, we'd better conduct fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:04 -07:00
Jaegeuk Kim
ec325b5270 f2fs: handle bug cases by letting fsck.f2fs initiate
This patch adds to handle corner buggy cases for fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:03 -07:00
Jaegeuk Kim
05796763b8 f2fs: add BUG cases to initiate fsck.f2fs
This patch replaces BUG cases with f2fs_bug_on to remain fsck.f2fs information.
And it implements some void functions to initiate fsck.f2fs too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:03 -07:00
Jaegeuk Kim
9850cf4a89 f2fs: need fsck.f2fs when f2fs_bug_on is triggered
If any f2fs_bug_on is triggered, fsck.f2fs is needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:02 -07:00
Jaegeuk Kim
2ae4c673e3 f2fs: retain inconsistency information to initiate fsck.f2fs
This patch adds sbi->need_fsck to conduct fsck.f2fs later.
This flag can only be removed by fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:14:25 -07:00
Jaegeuk Kim
4081363fbe f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB
This patch adds three inline functions to clean up dirty casting codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-03 17:37:13 -07:00
Chao Yu
b73e52824c f2fs: reposition unlock_new_inode to prevent accessing invalid inode
As the race condition on the inode cache, following scenario can appear:
[Thread a]				[Thread b]
					->f2fs_mkdir
					  ->f2fs_add_link
					    ->__f2fs_add_link
					      ->init_inode_metadata failed here
->gc_thread_func
  ->f2fs_gc
    ->do_garbage_collect
      ->gc_data_segment
        ->f2fs_iget
          ->iget_locked
            ->wait_on_inode
					  ->unlock_new_inode
        ->move_data_page
					  ->make_bad_inode
					  ->iput

When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode
should be set as bad to avoid being accessed by other thread. But in above
scenario, it allows f2fs to access the invalid inode before this inode was set
as bad.
This patch fix the potential problem, and this issue was found by code review.

change log from v1:
 o Add condition judgment in gc_data_segment() suggested by Changman Lee.
 o use iget_failed to simplify code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-02 00:22:24 -07:00
Jaegeuk Kim
3304b56401 f2fs: fix wrong casting for dentry name
The dentry name type is unsigned char *.
If we don't match this type, some character codes can be changed by signed bit.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-29 00:26:50 -07:00
Dan Carpenter
922cedbd00 f2fs: simplify by using a literal
We can make the code a bit simpler because we know that "!retry" is
zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-28 09:25:29 -07:00
Jaegeuk Kim
c2e69583a4 f2fs: truncate stale block for inline_data
This verifies to truncate any allocated blocks, offset[0], by inline_data.
Not figured out, but for making sure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-25 14:52:09 -07:00
Chao Yu
b5b822050c f2fs: use macro for code readability
This patch introduces DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macro
instead of numbers in code for readability.

change log from v1:
 o fix typo pointed out by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-22 13:56:47 -07:00
Chao Yu
9d1589ef2e f2fs: introduce need_do_checkpoint for readability
This patch introduce need_do_checkpoint() to include numerous judgment condition
for readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:07 -07:00
Chao Yu
c200b1aa6c f2fs: fix incorrect calculation with total/free inode num
Theoretically, our total inodes number is the same as total node number, but
there are three node ids are reserved in f2fs, they are 0, 1 (node nid), and 2
(meta nid), and they should never be used by user, so our total/free inode
number calculated in ->statfs is wrong.

This patch indroduces F2FS_RESERVED_NODE_NUM and then fixes this issue by
recalculating total/free inode number with the macro.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:06 -07:00
Jaegeuk Kim
04859dba50 f2fs: remove rename and use rename2
Refer the following patch.

commit 7177a9c4b5
Author: Miklos Szeredi <mszeredi@suse.cz>
Date:   Wed Jul 23 15:15:30 2014 +0200

    fs: call rename2 if exists

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:04 -07:00
Jaegeuk Kim
ec4e7af4ca f2fs: skip if inline_data was converted already
This patch checks inline_data one more time under the inode page lock whether
its inline_data is converted or not.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:03 -07:00
Jaegeuk Kim
202095a7a0 f2fs: remove rewrite_node_page
I think we need to let the dirty node pages remain in the page cache instead
of rewriting them in their places.
So, after done with successful recovery, write_checkpoint will flush all of them
through the normal write path.
Through this, we can avoid potential error cases in terms of block allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:02 -07:00
Jaegeuk Kim
764aa3e978 f2fs: avoid double lock in truncate_blocks
The init_inode_metadata calls truncate_blocks when error is occurred.
The callers holds f2fs_lock_op, so we should not call it again in
truncate_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:01 -07:00
Jaegeuk Kim
14f4e69085 f2fs: prevent checkpoint during roll-forward
Any checkpoint should not be done during the core roll-forward procedure.
Especially, it includes error cases too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:00 -07:00
Jaegeuk Kim
b3fe0a0da2 f2fs: add WARN_ON in f2fs_bug_on
This patch adds WARN_ON when f2fs_bug_on is disable to see kernel messages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:56:59 -07:00
Jaegeuk Kim
cf779cab14 f2fs: handle EIO not to break fs consistency
There are two rules when EIO is occurred.
1. don't write any checkpoint data to preserve the previous checkpoint
2. don't lose the cached dentry/node/meta pages

So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure.
Then, writing checkpoint/dentry/node blocks is not allowed.

Note that, for the data pages, we can't just throw away by redirtying them.
Otherwise, kworker can fall into infinite loop to flush them.
(Ref. xfstests/019)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:55:05 -07:00
Jaegeuk Kim
8501017e50 f2fs: check s_dirty under cp_mutex
It needs to check s_dirty under cp_mutex, since s_dirty is reset under that
mutex.
And previous condition was not correct, since we can omit doing checkpoint
when checkpoint was done followed by all the node pages were written back.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:21:02 -07:00
Jaegeuk Kim
5274651927 f2fs: unlock_page when node page is redirtied out
This patch fixes missing unlock_page when a node page is redirtied out.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:21:01 -07:00
Jaegeuk Kim
1e968fdfe6 f2fs: introduce f2fs_cp_error for readability
This patch adds f2fs_cp_error for readability.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:21:00 -07:00
Jaegeuk Kim
ed2e621a95 f2fs: give a chance to mount again when encountering errors
This patch gives another chance to try mount process when we encounter an error.
This makes an effect on the roll-forward recovery failures as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:21:00 -07:00
Jaegeuk Kim
6f12ac25f0 f2fs: trigger release_dirty_inode in f2fs_put_super
The generic_shutdown_super calls sync_filesystem, evict_inode, and then
f2fs_put_super. In f2fs_evict_inode, we remain some dirty inode information
so we should release them at f2fs_put_super.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:20:29 -07:00
Jaegeuk Kim
97c3c5cac2 f2fs: don't skip checkpoint if there is no dirty node pages
This is the errorneous scenario.
1. write data
2. do checkpoint
3. produce some dirty node pages by the gc thread
4. write back dirty node pages
5. f2fs_put_super will skip the checkpoint, since dirty count for node pages is
  zero.

This patch removes such the wrong condition check.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:35 -07:00
Jaegeuk Kim
b307384e4f f2fs: avoid bug_on when error is occurred
During the recovery, if an error like EIO or ENOMEM, f2fs_bug_on should skip.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:35 -07:00
Jaegeuk Kim
1c35a90e8a f2fs: fix to recover inline_xattr/data and blocks
This patch fixes not to skip xattr recovery and inline xattr/data recovery
order.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:34 -07:00
Jaegeuk Kim
e3b4d43f7c f2fs: should clear the inline_xattr flag
During the recovery, we should clear the inline_xattr flag if its xattr node
block is recovered.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:34 -07:00
Jaegeuk Kim
695facc05a f2fs: clear FI_INC_LINK during the recovery
If an inode are fsynced multiple times with fsync & dent marks, this inode will
set FI_INC_LINK at find_fsync_dnodes during the recovery.
But, in recover_inode, recover_dentry doesn't clear that flag when multiple hits
were occurred.

So this patch removes the flag for the further consistency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:34 -07:00
Jaegeuk Kim
617deb8c05 f2fs: fix the initial inode page for recovery
If a new inode page is needed for recover_dentry, we should assing i_inline
as zero.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:34 -07:00
Jaegeuk Kim
0342fd301a f2fs: make clear on test condition and return types
This patch adds a parentheses to make clear for condition check.
And also it changes the return type for better meanings.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:33 -07:00
Jaegeuk Kim
b067ba1f1b f2fs: should convert inline_data during the mkwrite
If mkwrite is called to an inode having inline_data, it can overwrite the data
index space as NEW_ADDR. (e.g., the first 4 bytes are coincidently zero)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:33 -07:00
arter97
e1c4204520 f2fs: fix typo
Fix typo and some grammatical errors.

The words "filesystem" and "readahead" are being used without the space treewide.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:33 -07:00
Chao Yu
b65ee14818 f2fs: use for_each_set_bit to simplify the code
This patch uses for_each_set_bit to simplify some codes in f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04 13:20:53 -07:00
Chao Yu
497a0930bb f2fs: add f2fs_balance_fs for expand_inode_data
This patch adds f2fs_balance_fs in expand_inode_data to avoid allocation failure
with segment.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04 13:02:03 -07:00
Chao Yu
002a41cabb f2fs: invalidate xattr node page when evict inode
When inode is evicted, all the page cache belong to this inode should be
released including the xattr node page. But previously we didn't do this, this
patch fixed this issue.

v2:
 o reposition invalidate_mapping_pages() to the right place suggested by
Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04 13:01:22 -07:00
Chao Yu
70cfed88ef f2fs: avoid skipping recover_inline_xattr after recover_inline_data
When we recover data of inode in roll-forward procedure, and the inode has both
inline data and inline xattr. We may skip recovering inline xattr if we recover
inline data form node page first.
This patch will fix the problem that we lost inline xattr data in above
scenario.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-02 07:43:51 -07:00
Chao Yu
70407fad85 f2fs: add tracepoint for f2fs_direct_IO
This patch adds a tracepoint for f2fs_direct_IO.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-02 07:34:46 -07:00
Chao Yu
b3582c6892 f2fs: reduce competition among node page writes
We do not need to block on ->node_write among different node page writers e.g.
fsync/flush, unless we have a node page writer from write_checkpoint.
So it's better use rw_semaphore instead of mutex type for ->node_write to
promote performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 23:28:37 -07:00
Jaegeuk Kim
65b85ccce0 f2fs: fix coding style
This patch fixes wrong coding style.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 17:25:54 -07:00
Dongho Sim
33be828ada f2fs: remove redundant lines in allocate_data_block
There are redundant lines in allocate_data_block.

In this function, we call refresh_sit_entry with old seg and old curseg.
After that, we call locate_dirty_segment with old curseg.

But, the new address is always allocated from old curseg and
we call locate_dirty_segment with old curseg in refresh_sit_entry.
So, we do not need to call locate_dirty_segment with old curseg again.

We've discussed like below:

Jaegeuk said:
 "When considering SSR, we need to take care of the following scenario.
  - old segno : X
  - new address : Z
  - old curseg : Y
  This means, a new block is supposed to be written to Z from X.
  And Z is newly allocated in the same path from Y.

  In that case, we should trigger locate_dirty_segment for Y, since
  it was a current_segment and can be dirty owing to SSR.
  But that was not included in the dirty list."

Changman said:
 "We already choosed old curseg(Y) and then we allocate new address(Z) from old
  curseg(Y). After that we call refresh_sit_entry(old address, new address).
  In the funcation, we call locate_dirty_segment with old seg and old curseg.
  So calling locate_dirty_segment after refresh_sit_entry again is redundant."

Jaegeuk said:
 "Right. The new address is always allocated from old_curseg."

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Dongho Sim <dh.sim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 15:38:59 -07:00
Jaegeuk Kim
24a9ee0fa3 f2fs: add tracepoint for f2fs_issue_flush
This patch adds a tracepoint for f2fs_issue_flush.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 14:13:36 -07:00
Jaegeuk Kim
cf2271e781 f2fs: avoid retrying wrong recovery routine when error was occurred
This patch eliminates the propagation of recovery errors to the next mount.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 14:13:35 -07:00
Jaegeuk Kim
61e0f2d0a5 f2fs: test before set/clear bits
If the bit is already set, we don't need to reset it, and vice versa.
Because we don't need to make the caches dirty for that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 14:13:35 -07:00
Jaegeuk Kim
01229f5e1b f2fs: fix wrong condition for unlikely
This patch fixes the wrongly used unlikely condition.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 14:13:31 -07:00
Jaegeuk Kim
ea1aa12ca2 f2fs: enable in-place-update for fdatasync
This patch enforces in-place-updates only when fdatasync is requested.
If we adopt this in-place-updates for the fdatasync, we can skip to write the
recovery information.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 14:13:23 -07:00
Jaegeuk Kim
6d99ba41a7 f2fs: skip unnecessary data writes during fsync
This patch intends to improve the fsync performance by skipping remaining the
recovery information, only when there is no data that we should recover.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 14:13:03 -07:00
Jaegeuk Kim
fff04f90c1 f2fs: add info of appended or updated data writes
This patch introduces a inode number list in which represents inodes having
appended data writes or updated data writes after last checkpoint.
This will be used at fsync to determine whether the recovery information
should be written or not.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-29 07:46:11 -07:00
Jaegeuk Kim
39efac41fb f2fs: use radix_tree for ino management
For better ino management, this patch replaces the data structure from list
to radix tree.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-29 07:46:11 -07:00
Jaegeuk Kim
6451e041c8 f2fs: add infra for ino management
This patch changes the naming of orphan-related data structures to use as
inode numbers managed globally.
Later, we can use this facility for managing any inode number lists.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-29 07:45:54 -07:00
Jaegeuk Kim
953e6cc6bc f2fs: punch the core function for inode management
This patch punches out the core functions to manage the inode numbers.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-29 07:40:25 -07:00
Jaegeuk Kim
0f7b2abd18 f2fs: add nobarrier mount option
This patch adds a mount option, nobarrier, in f2fs.
The assumption in here is that file system keeps the IO ordering, but
doesn't care about cache flushes inside the storages.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-29 05:27:48 -07:00
Chao Yu
9d84795077 f2fs: fix to put root inode in error path of fill_super
We should put root inode correctly in error path of fill_super, otherwise we
may encounter a leak case of inode resource.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-25 08:19:57 -07:00
Chao Yu
dbf20cb259 f2fs: avoid use invalid mapping of node_inode when evict meta inode
Andrey Tsyvarev reported:
"Using memory error detector reveals the following use-after-free error
in 3.15.0:

AddressSanitizer: heap-use-after-free in f2fs_evict_inode
Read of size 8 by thread T22279:
  [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs]
  [<ffffffff812359af>] evict+0x15f/0x290
  [<     inlined    >] iput+0x196/0x280 iput_final
  [<ffffffff812369a6>] iput+0x196/0x280
  [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs]
  [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0
  [<ffffffff812105fd>] kill_block_super+0x4d/0xb0
  [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80
  [<ffffffff81211c98>] deactivate_super+0x68/0x80
  [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250
  [<     inlined    >] SyS_umount+0xe9/0x1a0 SYSC_umount
  [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0
  [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b

Freed by thread T3:
  [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs]
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks
  [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930
  [<ffffffff8107cce2>] __do_softirq+0x142/0x380
  [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50
  [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280
  [<ffffffff810a8238>] kthread+0x148/0x160
  [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0

Allocated by thread T22276:
  [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs]
  [<ffffffff81235e2a>] iget_locked+0x10a/0x230
  [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs]
  [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs]
  [<ffffffff81211bce>] mount_bdev+0x1de/0x240
  [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs]
  [<ffffffff81212a85>] mount_fs+0x55/0x220
  [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200
  [<     inlined    >] do_mount+0x2b4/0x1120 do_new_mount
  [<ffffffff812400d4>] do_mount+0x2b4/0x1120
  [<     inlined    >] SyS_mount+0xb2/0x110 SYSC_mount
  [<ffffffff812414a2>] SyS_mount+0xb2/0x110
  [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b

The buggy address ffff8800587866c8 is located 48 bytes inside
  of 680-byte region [ffff880058786698, ffff880058786940)

Memory state around the buggy address:
  ffff880058786100: ffffffff ffffffff ffffffff ffffffff
  ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
  ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
  ffff880058786400: ffffffff ffffffff ffffffff ffffffff
  ffff880058786500: ffffffff ffffffff ffffffff fffffffr
 >ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
                                                ^
  ffff880058786700: ffffffff ffffffff ffffffff ffffffff
  ffff880058786800: ffffffff ffffffff ffffffff ffffffff
  ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
  ffff880058786a00: ........ ........ ........ ........
  ffff880058786b00: ........ ........ ........ ........
Legend:
  f - 8 freed bytes
  r - 8 redzone bytes
  . - 8 allocated bytes
  x=1..7 - x allocated bytes + (8-x) redzone bytes

Investigation shows, that f2fs_evict_inode, when called for
'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
iput().

It seems that in common usage scenario this use-after-free is benign,
because 'node_inode' remains partially valid data even after
kmem_cache_free().
But things may change if, while 'meta_inode' is evicted in one f2fs
filesystem, another (mounted) f2fs filesystem requests inode from cache,
and formely
'node_inode' of the first filesystem is returned."

Nids for both meta_inode and node_inode are reservation, so it's not necessary
for us to invalidate pages which will never be allocated.
To fix this issue, let's skipping needlessly invalidating pages for
{meta,node}_inode in f2fs_evict_inode.

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-25 08:19:55 -07:00
Chao Yu
32f9bc25cb f2fs: support ->rename2()
Now new interface ->rename2() is added to VFS, here are related description:
https://lkml.org/lkml/2014/2/7/873
https://lkml.org/lkml/2014/2/7/758

This patch adds function f2fs_rename2() to support ->rename2() including
handling both RENAME_EXCHANGE and RENAME_NOREPLACE flag.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-25 08:14:08 -07:00
Huang Ying
79e35dc3c2 f2fs: add f2fs_balance_fs for direct IO
Otherwise, if a large amount of direct IO writes were done, the
segment allocation may be failed because no enough segments are gced.

Changes:

v2: add f2fs_balance_fs into __get_data_block instead of f2fs_direct_IO.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-25 08:13:58 -07:00
Chao Yu
f1121ab0ba f2fs: reduce searching region of segmap when free section
In __set_test_and_free we will check whether all segment are free in one section
When free one segment, in order to set section to free status.
But the searching region of segmap is from start segno to last segno of f2fs,
it's not necessary. So let's just only check all segment bitmap of target
section.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-15 13:56:49 -07:00
Gu Zheng
4b2868aa4f f2fs: remove the unused stat_lock
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-11 15:01:48 -07:00
Gu Zheng
7a6c76b1b2 f2fs: cleanup the needless return of f2fs_create_root_stats
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-11 15:01:47 -07:00
Chao Yu
81e366f87f f2fs: check name_len of dir entry to prevent from deadloop
We assume that modification of some special application could result in zeroed
name_len, or it is consciously made by somebody. We will deadloop in
find_in_block when name_len of dir entry is zero.

This patch is added for preventing deadloop in above scenario.

change log from v1:
 o use f2fs_bug_on rather than break out from searching dir entry suggested by
Jaegeuk Kim.

Jaegeuk describe:
"Well, IMO, it would be good to add f2fs_bug_on() here with a specific comment.
In the current phase of f2fs, it is more important to investigate the file
system bugs, rather than workarounds for any corrupted images.
And, definitely it needs to stop the kernel if any corrupted image was mounted,
so that we can figure out where the bugs are occurred."

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-10 17:00:02 -07:00
Chao Yu
6b2920a513 f2fs: use inner macro and function to clean up codes
In this patch we use below inner macro and function to clean up codes.
1. ADDRS_PER_PAGE
2. SM_I
3. f2fs_readonly

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:26 -07:00
Chao Yu
3aab8f828e f2fs: introduce f2fs_write_failed to handle error case when write
When we fail in ->write_begin()/->direct_IO(), our allocated node block in disk
and page cache are still kept, despite these may not be used again.

This patch introduce f2fs_write_failed() to handle the error case of these two
interfaces, it will truncate page cache and blocks of this file according to
i_size.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:26 -07:00
Gu Zheng
eee6160f2e f2fs: arguments cleanup of finding file flow functions
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:26 -07:00
Gu Zheng
1c3bb97899 f2fs: remove the needless point-cast
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:26 -07:00
Gu Zheng
34e6d456da f2fs: remove the redundant validation check of acl
kernel side(xx_init_acl), the acl is get/cloned from the parent dir's,
which is credible. So remove the redundant validation check of acl
here.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:25 -07:00
Chao Yu
1256010ab1 f2fs: reduce region of f2fs_lock_op covered for better concurrency
In our rename process, region of f2fs_lock_op covered is too big as some of the
code like f2fs_empty_dir/f2fs_find_entry are not needed to protect by this lock.

So in the extreme case like doing checkpoint when we rename old inode to exist
inode in a large directory could cause lower concurrency.

Let's reduce the region of f2fs_lock_op to fix this.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:25 -07:00
Fabian Frederick
b434babf85 f2fs: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.

Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:25 -07:00
Chao Yu
aec71382c6 f2fs: refactor flush_nat_entries codes for reducing NAT writes
Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time.

In this patch we merge dirty entries located in same NAT block to nat entry set,
and linked all set to list, sorted ascending order by entries' count of set.
Later we flush entries in sparse set into journal as many as we can, and then
flush merged entries to disk. In this way we can not only gain in performance,
but also save lifetime of flash device.

In my testing environment, it shows this patch can help to reduce NAT block
writes obviously. In hard disk test case: cost time of fsstress is stablely
reduced by about 5%.

1. virtual machine + hard disk:
fsstress -p 20 -n 200 -l 5
		node num	cp count	nodes/cp
based		4599.6		1803.0		2.551
patched		2714.6		1829.6		1.483

2. virtual machine + 32g micro SD card:
fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0
-f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5
-f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S

		node num	cp count	nodes/cp
based		84.5		43.7		1.933
patched		49.2		40.0		1.23

Our latency of merging op shows not bad when handling extreme case like:
merging a great number of dirty nats:
latency(ns)	dirty nat count
3089219		24922
5129423		27422
4000250		24523

change log from v1:
 o fix wrong logic in add_nat_entry when grab a new nat entry set.
 o swith to create slab cache in create_node_manager_caches.
 o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency.

change log from v2:
 o make comment position more appropriate suggested by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:25 -07:00
Jaegeuk Kim
a014e037be f2fs: clean up an unused parameter and assignment
This patch cleans up simple unnecessary codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:25 -07:00
Jaegeuk Kim
b97a9b5da8 f2fs: introduce f2fs_do_tmpfile for code consistency
This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata
flow.
Throught this, we can provide the consistent lock usage, e.g., fi->i_sem,  and
this will enable better debugging stuffs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:24 -07:00
Chao Yu
50732df02e f2fs: support ->tmpfile()
Add function f2fs_tmpfile() to support O_TMPFILE file creation, and modify logic
of init_inode_metadata to enable linkat temp file.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:24 -07:00
Chao Yu
ca0a81b397 f2fs: avoid to truncate non-updated page partially
After we call find_data_page in truncate_partial_data_page, we could not
guarantee this page is updated or not as error may occurred in lower layer.

We'd better check status of the page to avoid this no updated page be
writebacked to device.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:24 -07:00
Chao Yu
5576cd6ca5 f2fs: avoid unneeded SetPageUptodate in f2fs_write_end
We have already set page update in ->write_begin, so we should remove redundant
SetPageUptodate in ->write_end.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 14:04:24 -07:00
Chao Yu
50e1f8d221 f2fs: avoid to access NULL pointer in issue_flush_thread
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=75861

Denis 2014-05-10 11:28:59 UTC reported:
"F2FS-fs (mmcblk0p28): mounting..
 Unable to handle kernel NULL pointer dereference at virtual address 00000018
 ...
 [<c0a2f678>] (_raw_spin_lock+0x3c/0x70) from [<c03a0330>] (issue_flush_thread+0x50/0x17c)
 [<c03a0330>] (issue_flush_thread+0x50/0x17c) from [<c01b4064>] (kthread+0x98/0xa4)
 [<c01b4064>] (kthread+0x98/0xa4) from [<c0108060>] (kernel_thread_exit+0x0/0x8)"

This patch assign cmd_control_info in sm_info before issue_flush_thread is being
created, so this make sure that issue flush thread will have no chance to access
invalid info in fcc.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 05:59:55 -07:00
Jaegeuk Kim
2743f86554 f2fs: check bdi->dirty_exceeded when trying to skip data writes
If we don't check the current backing device status, balance_dirty_pages can
fall into infinite pausing routine.

This can be occurred when a lot of directories make a small number of dirty
dentry pages including files.

Reported-by: Brian Chadwick <brianchad@westnet.com.au>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 05:59:45 -07:00
Jaegeuk Kim
b2c0829912 f2fs: do checkpoint for the renamed inode
If an inode is renamed, it should be registered as file_lost_pino to conduct
checkpoint at f2fs_sync_file.
Otherwise, the inode cannot be recovered due to no dent_mark in the following
scenario.

Note that, this scenario is from xfstests/322.

1. create "a"
2. fsync "a"
3. rename "a" to "b"
4. fsync "b"
5. Sudden power-cut

After recovery is done, "b" should be seen.
However, the result shows "a", since the recovery procedure does not enter
recover_dentry due to no dent_mark.

The reason is like below.
- The nid of "a" is checkpointed during #2, f2fs_sync_file.
- The inode page for "b" produced by #3 is written without dent_mark by
sync_node_pages.

So, this patch fixes this bug by assinging file_lost_pino to the "a"'s inode.
If the pino is lost, f2fs_sync_file conducts checkpoint, and then recovers
the latest pino and its dentry information for further recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 05:59:31 -07:00
Chao Yu
dd4d961fe7 f2fs: release new entry page correctly in error path of f2fs_rename
This patch correct releasing code of new_page to avoid BUG_ON in error patch of
f2fs_rename.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 05:59:11 -07:00
Chao Yu
90d72459cc f2fs: fix error path in init_inode_metadata
If we fail in this path:
->init_inode_metadata
  ->make_empty_dir
    ->get_new_data_page
      ->grab_cache_page return -ENOMEM

We will bug on in error path of init_inode_metadata when call remove_inode_page
because i_block = 2 (one inode block will be released later & one dentry block).

We should release the dentry block in init_inode_metadata to avoid this BUG_ON,
and avoid leak of dentry block resource, because we never have second chance to
release that block in ->evict_inode as in upper error path we make this inode
'bad'.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 05:58:50 -07:00
Chao Yu
d6b7d4b31d f2fs: check lower bound nid value in check_nid_range
This patch add lower bound verification for nid in check_nid_range, so nids
reserved like 0, node, meta passed by caller could be checked there.

And then check_nid_range could be used in f2fs_nfs_get_inode for simplifying
code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 05:58:08 -07:00
Chao Yu
8bc6f60e3f f2fs: remove unused variables in f2fs_sm_info
Remove unused variables in struct f2fs_sm_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-09 05:57:57 -07:00
Jaegeuk Kim
98397ff3cd f2fs: fix not to allocate unnecessary blocks during fallocate
This patch fixes the fallocate bug like below. (See xfstests/255)

In fallocate(fd, 0, 20480),
expand_inode_data processes
	for (index = pg_start; index <= pg_end; index++) {
		f2fs_reserve_block();
		...
	}

So, even though fallocate requests 20480, 5 blocks, f2fs allocates 6 blocks
including pg_end.
So, this patch adds one condition to avoid block allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-23 10:05:08 +09:00
Jaegeuk Kim
ead432756a f2fs: recover fallocated data and its i_size together
This patch arranges the f2fs_locks to cover the fallocated data and its i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-23 10:05:08 +09:00
Jaegeuk Kim
ccfb30001f f2fs: fix to report newly allocate region as extent
Previous get_block in f2fs didn't report the newly allocated region which has
NEW_ADDR.
For reader, it should not report, but fiemap needs this.
So, this patch introduces two get_block sharing core function.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-23 10:05:07 +09:00
Linus Torvalds
16b9057804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "This the bunch that sat in -next + lock_parent() fix.  This is the
  minimal set; there's more pending stuff.

  In particular, I really hope to get acct.c fixes merged this cycle -
  we need that to deal sanely with delayed-mntput stuff.  In the next
  pile, hopefully - that series is fairly short and localized
  (kernel/acct.c, fs/super.c and fs/namespace.c).  In this pile: more
  iov_iter work.  Most of prereqs for ->splice_write with sane locking
  order are there and Kent's dio rewrite would also fit nicely on top of
  this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
  lock_parent: don't step on stale ->d_parent of all-but-freed one
  kill generic_file_splice_write()
  ceph: switch to iter_file_splice_write()
  shmem: switch to iter_file_splice_write()
  nfs: switch to iter_splice_write_file()
  fs/splice.c: remove unneeded exports
  ocfs2: switch to iter_file_splice_write()
  ->splice_write() via ->write_iter()
  bio_vec-backed iov_iter
  optimize copy_page_{to,from}_iter()
  bury generic_file_aio_{read,write}
  lustre: get rid of messing with iovecs
  ceph: switch to ->write_iter()
  ceph_sync_direct_write: stop poking into iov_iter guts
  ceph_sync_read: stop poking into iov_iter guts
  new helper: copy_page_from_iter()
  fuse: switch to ->write_iter()
  btrfs: switch to ->write_iter()
  ocfs2: switch to ->write_iter()
  xfs: switch to ->write_iter()
  ...
2014-06-12 10:30:18 -07:00
Al Viro
8d0207652c ->splice_write() via ->write_iter()
iter_file_splice_write() - a ->splice_write() instance that gathers the
pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
it to ->write_iter().  A bunch of simple cases coverted to that...

[AV: fixed the braino spotted by Cyrill]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-12 00:18:51 -04:00
Linus Torvalds
64b2d1fbbf f2fs updates for v3.16
This patch-set includes the following major enhancement patches.
  o enhance wait_on_page_writeback
  o support SEEK_DATA and SEEK_HOLE
  o enhance readahead flows
  o enhance IO flushes
  o support fiemap
  o add some tracepoints
 
 The other bug fixes are as follows.
  o fix to support a large volume > 2TB correctly
  o recovery bug fix wrt fallocated space
  o fix recursive lock on xattr operations
  o fix some cases on the remount flow
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTleLYAAoJEEAUqH6CSFDSdhEP/iY5hTZ02jH4ZiFPP/Nee4hS
 l0BHeZvrMjoccaWUDu0ZvIPC8BiJ7kLOgK+/VWS7LAfY1PY11ALEtYQOrW+RM47+
 jMfULegTod/F8WS2B6dk31QMhAZldtnsYvA5PS1VV3S0rht+qbOz+PDejZFgSMc3
 VVQ7OZzq30gMmtsw7oh3FHfeTu4xe/bxygdMRXgljQQD2MvorJiOb4MA+ovEDd9z
 CZMMTQvRKjc0d8LPf0gOkZEvG63GR6klCgFMuiappUsua0O8IPIEhCyEGFrE66vS
 fObVKpqNAsR2ABhS2grgn6Q7UUvn4xrF6jOwDH5tuw2yzmkQiMAWINwBdgnbEy1c
 D5S89PQ8TkQK9KBSoU0v8WKWC4HzWZF4ZEd6eq9VxVTS8iT0w8DtNHXK518FVC34
 82iqrLc0EhrcGNiW/7Nrc6WzHrWqorylCFN7atB3ruhVqeMh3MZsDU4Gq0WgB2oh
 pF9XVBEpJQpV35DYSAPzLkm+2+mwHVNqwdY3HcvUs7IP90+wZlrWSRG2FEfFt/e8
 6nwvbORrHMTI5VfdYlEPgpjuesFmYPZqEvRGdaDa1YrHqhvvgStEPT9qiq2qVn9+
 tr0HjpNRDD/etkaE6ujYOYqdxuk3mm6RY68h+KSbNcY1/VTvt2bN2telwWuDsxIq
 8yOsxV2x3JB/euDAJsSU
 =xqsO
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, there is no special interesting feature, but we've
  investigated a couple of tuning points with respect to the I/O flow.
  Several major bug fixes and a bunch of clean-ups also have been made.

  This patch-set includes the following major enhancement patches:
   - enhance wait_on_page_writeback
   - support SEEK_DATA and SEEK_HOLE
   - enhance readahead flows
   - enhance IO flushes
   - support fiemap
   - add some tracepoints

  The other bug fixes are as follows:
   - fix to support a large volume > 2TB correctly
   - recovery bug fix wrt fallocated space
   - fix recursive lock on xattr operations
   - fix some cases on the remount flow

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
  f2fs: support f2fs_fiemap
  f2fs: avoid not to call remove_dirty_inode
  f2fs: recover fallocated space
  f2fs: fix to recover data written by dio
  f2fs: large volume support
  f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
  f2fs: avoid overflow when large directory feathure is enabled
  f2fs: fix recursive lock by f2fs_setxattr
  MAINTAINERS: add a co-maintainer from samsung for F2FS
  MAINTAINERS: change the email address for f2fs
  f2fs: use inode_init_owner() to simplify codes
  f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency
  f2fs: add a tracepoint for f2fs_read_data_page
  f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages
  f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page
  f2fs: add a tracepoint for f2fs_write_end
  f2fs: add a tracepoint for f2fs_write_begin
  f2fs: fix checkpatch warning
  f2fs: deactivate inode page if the inode is evicted
  f2fs: decrease the lock granularity during write_begin
  ...
2014-06-09 19:11:44 -07:00
Jaegeuk Kim
9ab7013492 f2fs: support f2fs_fiemap
This patch links f2fs_fiemap with generic function with get_block.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-08 08:56:49 +09:00
Jaegeuk Kim
86928f984e f2fs: avoid not to call remove_dirty_inode
There is an errorneous case during the recovery like below.

In recovery_dentry,
 1) dir = f2fs_iget();
 2) mark the dir with FI_DELAY_IPUT
 3) goto unmap_out

After the end of recovery routine, there is no dirty dentries so the dir cannot
be released by iput in remove_dirty_dir_inode.

This patch fixes such the bug case by handling the iget and iput in the
recovery_dentry procedure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-07 03:18:36 +09:00
Jaegeuk Kim
6fa1df533a f2fs: recover fallocated space
If a fallocated file is fsynced, we should recover the i_size after sudden
power cut.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-07 03:18:35 +09:00
Mel Gorman
2457aec637 mm: non-atomically mark page accessed during page cache allocation where possible
aops->write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after.  Once the page is
visible the atomic operations are necessary which is noticable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticable with fast storage.  The objective of the patch is to initialse
the accessed information with non-atomic operations before the page is
visible.

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page.  This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
called before the page is visible and can be done non-atomically.

The primary APIs of concern in this care are the following and are used
by most filesystems.

	find_get_page
	find_lock_page
	find_or_create_page
	grab_cache_page_nowait
	grab_cache_page_write_begin

All of them are very similar in detail to the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behavior such as whether the page should be marked accessed or not.  Then
old API is preserved but is basically a thin wrapper around this core
function.

Each of the filesystems are then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job.  There is a slight snag in that the timing of the
mark_page_accessed() has now changed so in rare cases it's possible a page
gets to the end of the LRU as PageReferenced where as previously it might
have been repromoted.  This is expected to be rare but it's worth the
filesystem people thinking about it in case they see a problem with the
timing change.  It is also the case that some filesystems may be marking
pages accessed that previously did not but it makes sense that filesystems
have consistent behaviour in this regard.

The test case used to evaulate this is a simple dd of a large file done
multiple times with the file deleted on each iterations.  The size of the
file is 1/10th physical memory to avoid dirty page balancing.  In the
async case it will be possible that the workload completes without even
hitting the disk and will have variable results but highlight the impact
of mark_page_accessed for async IO.  The sync results are expected to be
more stable.  The exception is tmpfs where the normal case is for the "IO"
to not hit the disk.

The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts.  Throughput and wall times are presented for sync IO, only wall
times are shown for async as the granularity reported by dd and the
variability is unsuitable for comparison.  As async results were variable
do to writback timings, I'm only reporting the maximum figures.  The sync
results were stable enough to make the mean and stddev uninteresting.

The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.

async dd
                                    3.15.0-rc3            3.15.0-rc3
                                       vanilla           accessed-v2
ext3    Max      elapsed     13.9900 (  0.00%)     11.5900 ( 17.16%)
tmpfs	Max      elapsed      0.5100 (  0.00%)      0.4900 (  3.92%)
btrfs   Max      elapsed     12.8100 (  0.00%)     12.7800 (  0.23%)
ext4	Max      elapsed     18.6000 (  0.00%)     13.3400 ( 28.28%)
xfs	Max      elapsed     12.5600 (  0.00%)      2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck but the average figures looked reasonable.

        samples percentage
ext3       86107    0.9783  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext3       23833    0.2710  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3        5036    0.0573  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4       64566    0.8961  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext4        5322    0.0713  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4        2869    0.0384  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs        62126    1.7675  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
xfs         1904    0.0554  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs          103    0.0030  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs      10655    0.1338  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
btrfs       2020    0.0273  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs        587    0.0079  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs      59562    3.2628  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
tmpfs       1210    0.0696  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs         94    0.0054  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed

[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:10 -07:00
Jaegeuk Kim
b6fe5873cb f2fs: fix to recover data written by dio
If data are overwritten through dio, previous f2fs doesn't remain the fsync mark
due to no additional node writes.

Note that this patch should resolve the xfstests:311.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04 18:41:38 +09:00
Changman Lee
1dbe415216 f2fs: large volume support
f2fs's cp has one page which consists of struct f2fs_checkpoint and
version bitmap of sit and nat. To support lots of segments, we need more
blocks for sit bitmap. So let's arrange sit bitmap as following:
+-----------------+------------+
| f2fs_checkpoint | sit bitmap |
| + nat bitmap    |            |
+-----------------+------------+
0                 4k        N blocks

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: simple code change for readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04 13:34:30 +09:00
Chao Yu
bac4eef653 f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
Previously we allocate pages with no mapping in ra_sum_pages(), so we may
encounter a crash in event trace of f2fs_submit_page_mbio where we access
mapping data of the page.

We'd better allocate pages in bd_inode mapping and invalidate these pages after
we restore data from pages. It could avoid crash in above scenario.

Changes from V1
 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim.

Call Trace:
 [<f1031630>] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs]
 [<f10377bb>] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs]
 [<f103c5da>] restore_node_summary+0x13a/0x280 [f2fs]
 [<f103e22d>] build_curseg+0x2bd/0x620 [f2fs]
 [<f104043b>] build_segment_manager+0x1cb/0x920 [f2fs]
 [<f1032c85>] f2fs_fill_super+0x535/0x8e0 [f2fs]
 [<c115b66a>] mount_bdev+0x16a/0x1a0
 [<f102f63f>] f2fs_mount+0x1f/0x30 [f2fs]
 [<c115c096>] mount_fs+0x36/0x170
 [<c1173635>] vfs_kern_mount+0x55/0xe0
 [<c1175388>] do_mount+0x1e8/0x900
 [<c1175d72>] SyS_mount+0x82/0xc0
 [<c16059cc>] sysenter_do_call+0x12/0x22

Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04 13:34:30 +09:00
Chao Yu
bfec07d0f8 f2fs: avoid overflow when large directory feathure is enabled
When large directory feathure is enable, We have one case which could cause
overflow in dir_buckets() as following:
special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2.

Here we define MAX_DIR_BUCKETS to limit the return value when the condition
could trigger potential overflow.

Changes from V1
 o modify description of calculation in f2fs.txt suggested by Changman Lee.

Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04 13:34:30 +09:00
Jaegeuk Kim
d631abdac9 f2fs: fix recursive lock by f2fs_setxattr
This patch should resolve the following recursive lock.

[<ffffffff8135a9c3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffffa01749dc>] f2fs_setxattr+0x5c/0xa0 [f2fs]
[<ffffffffa0174c99>] __f2fs_set_acl+0x1b9/0x340 [f2fs]
[<ffffffffa017515a>] f2fs_init_acl+0x4a/0xcb [f2fs]
[<ffffffffa0159abe>] __f2fs_add_link+0x26e/0x780 [f2fs]
[<ffffffffa015d4d8>] f2fs_mkdir+0xb8/0x150 [f2fs]
[<ffffffff811cebd7>] vfs_mkdir+0xb7/0x160
[<ffffffff811cf89b>] SyS_mkdir+0xab/0xe0
[<ffffffff817244bf>] tracesys+0xe1/0xe6
[<ffffffffffffffff>] 0xffffffffffffffff

The call path indicates:
- f2fs_add_link
   : down_write(&fi->i_sem);

 - init_inode_metadata
   - f2fs_init_acl
     - __f2fs_set_acl
       - f2fs_setxattr
         : down_write(&fi->i_sem);

Here we should not call f2fs_setxattr, but __f2fs_setxattr.
But __f2fs_setxattr is a static function in xattr.c, so that I found the other
generic approach to use f2fs_setxattr.

In f2fs_setxattr, the page pointer is only given from init_inode_metadata.
So, this patch adds this condition to avoid this in f2fs_setxattr.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-02 22:13:16 +09:00
Chao Yu
70ff5dfeb6 f2fs: use inode_init_owner() to simplify codes
This patch uses exported inode_init_owner() to simplify codes in
f2fs_new_inode().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-08 18:23:21 +09:00
Chao Yu
adf8d90b6a f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency
If we use slab memory in f2fs_issue_flush(), we will face memory pressure and
latency time caused by racing of kmem_cache_{alloc,free}.

Let's alloc memory in stack instead of slab.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-08 18:23:21 +09:00
Chao Yu
c20e89cde6 f2fs: add a tracepoint for f2fs_read_data_page
This patch adds a tracepoint for f2fs_read_data_page to trace when page is
readed by user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:59 +09:00
Chao Yu
e574843438 f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages
This patch adds a tracepoint for f2fs_write_{meta,node,data}_pages to trace when
pages are fsyncing/flushing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:59 +09:00
Chao Yu
ecda0de343 f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page
This patch adds a tracepoint for f2fs_write_{meta,node,data}_page to trace when
page is writting out.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:59 +09:00
Chao Yu
dfb2bf38bf f2fs: add a tracepoint for f2fs_write_end
This patch adds a tracepoint for f2fs_write_end to trace write op of user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:59 +09:00
Chao Yu
62aed044ea f2fs: add a tracepoint for f2fs_write_begin
This patch adds a tracepoint for f2fs_write_begin to trace write op of user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:59 +09:00
Zhang Zhen
8b376249e7 f2fs: fix checkpatch warning
fix the following checkpatch warning:
WARNING: do {} while (0) macros should not be semicolon terminated

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:59 +09:00
Jaegeuk Kim
8198899b94 f2fs: deactivate inode page if the inode is evicted
If the inode page is clean during its inode eviction, it'd better drop the page
to reduce further memory pressure.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:58 +09:00
Jaegeuk Kim
d5f66990bb f2fs: decrease the lock granularity during write_begin
This patch reduces the lock granularity during write_begin.
When the system is under memory pressure, it would be better to reduce
the locking time for the data pages.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:58 +09:00
Jaegeuk Kim
bde446866c f2fs: no need to wait on page writebck to meta pages
This patch removes grab_cache_page_write_begin for meta pages.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:58 +09:00
Jaegeuk Kim
9ac1349ad7 f2fs: avoid grab_cache_page_write_begin for data pages
We don't need to wait on page writeback for these cases.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:58 +09:00
Jaegeuk Kim
54b591dfda f2fs: split grab_cache_page and wait_on_page_writeback for node pages
This patch splits grab_cache_page_write_begin into grab_cache_page and
wait_on_page_writeback for node pages.

This patch intends to enhance the latency to get node pages by alleviating
unnecessary wait_on_page_writeback.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:58 +09:00
Chao Yu
8aa6f1c5bd f2fs: fix to truncate inline data in inode page when setattr
Previous we do not truncate inline data in inode page when setattr, so following
case could still read the inline data which has already truncated:

1.write inline data
2.ftruncate size to 0
3.ftruncate size to max inline data size
4.read from offset 0

This patch introduces truncate_inline_data() to fix this problem.

change log from v1:
 o fix a bug and do not truncate first page data after truncate inline data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:58 +09:00
Chao Yu
817202d937 f2fs: readahead multi pages of directory for performance
We have no so such readahead mechanism in ->iterate() path as the one in
->read() path, it cause low performance when we read large directory.
This patch add readahead in f2fs_readdir() for better performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:57 +09:00
Chao Yu
5c1f9927ec f2fs: set errno when f2fs_iget failed in recover_dentry
We should set the error number correctly when we fail in recover_dentry(), so
the recover flow could stop for the reason as error number shows instead of
continuing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:57 +09:00
Jaegeuk Kim
7f7670fe9f f2fs: consider fallocated space for SEEK_DATA
If an amount of data are allocated though fallocate and user writes a couple of
data among the space, f2fs should return the data offset made by user when
SEEK_DATA is requested.

For example, (N: NEW_ADDR by fallocate, X: NEW_ADDR by user)
1) fallocate 0 ~ 10MB
f -> N N N N N N N N N N N N ... N

2) write 4KB at 5MB offset
f -> N N N N N X N N N N N N ... N

3) SEEK_DATA from 0 should return 5MB offset

So, this patch adds a routine to search the first dirty page to handle that.
Then, the SEEK_DATA flow skips NEW_ADDR offsets until any dirty page is found.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:57 +09:00
Jaegeuk Kim
fe369bc8ba f2fs: return i_size if the hole is outside of i_size
When SEEK_HOLE is requeted, it should return i_size if the hole position is
found outside of i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:57 +09:00
Chao Yu
267378d4de f2fs: introduce f2fs_seek_block to support SEEK_{DATA, HOLE} in llseek
In This patch we introduce f2fs_seek_block to support SEEK_{DATA,HOLE} of
lseek(2).

change log from v1:
 o fix bug when lseek from middle of page and fix wrong calculation of
PGOFS_OF_NEXT_DNODE macro.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:57 +09:00
Gu Zheng
2163d19815 f2fs: introduce help function {create,destroy}_flush_cmd_control
Introduce help function {create,destroy}_flush_cmd_control to clean up
the create/destory flush merge operation.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:57 +09:00
Gu Zheng
a688b9d9e5 f2fs: introduce struct flush_cmd_control to wrap the flush_merge fields
Split the flush_merge fields from sm_i, and use the new struct flush_cmd_control
to wrap it, so that we can igonre these fileds if flush_merge is disable, and
it alse can the structs more neat.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:56 +09:00
Chao Yu
6403eb1f64 f2fs: introduce help macro ADDRS_PER_PAGE()
Introduce help macro ADDRS_PER_PAGE() to get the number of address pointers in
direct node or inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:56 +09:00
Jaegeuk Kim
2aea39eca6 f2fs: submit bio at the reclaim path
If f2fs_write_data_page is called through the reclaim path, we should submit
the bio right away.

This patch resolves the following issue that Marc Dietrich reported.
"It took me a while to bisect a problem which causes my ARM (tegra2) netbook to
frequently stall for 5-10 seconds when I enable EXA acceleration (opentegra
experimental ddx)."
And this patch fixes that.

Reported-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:56 +09:00
Jaegeuk Kim
916decbf39 f2fs: return errors right after checking them
This patch adds two error conditions early in the setxattr operations.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:56 +09:00
Jaegeuk Kim
c02745ef68 f2fs: pass flags field to setxattr functions
This patch passes the "flags" field to the low level setxattr functions
to use XATTR_REPLACE in the following patches.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:56 +09:00
Jaegeuk Kim
e112326805 f2fs: clean up long variable names
This patch includes simple clean-ups to reduce unnecessary long variable names.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:56 +09:00
Chao Yu
454ae7e519 f2fs: handle inline data independently in f2fs_bmap
We'd better handle inline data case independently in f2fs_bmap().
It can reduce our handling time in f2fs_bmap().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:56 +09:00
Jaegeuk Kim
6fb03f3a40 f2fs: adjust free mem size to flush dentry blocks
If so many dirty dentry blocks are cached, not reached to the flush condition,
we should fall into livelock in balance_dirty_pages.
So, let's consider the mem size for the condition.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:55 +09:00
Jaegeuk Kim
e8271fa390 f2fs: avoid BUG_ON when mouting corrupted image having garbage blocks
If the disk has some garbage blocks, F2FS is able to face with BUG_ON when
recovering direct node blocks.
This patch detects the error case and avoids that prior to reaching BUG_ON.

Alexey Khoroshilov addressed the potential security issues as follows.
"An ability to trigger a BUG_ON assert by mounting a crafted image is
usually considered as a local denial of service [1-3]. As far as I
understand, the reason is that some kernel data may become inconsistent
that can lead to further problems.

[1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3353
[2] http://www.openwall.com/lists/oss-security/2011/06/24/4
[3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2928
etc."

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:55 +09:00
Jaegeuk Kim
7ee0eeabcd f2fs: add available_nids to fix handling max_nid correctly
This patch introduces available_nids for alloc_nids() and fixes max_nid for
build_free_nids() and scan_nat_pages().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:55 +09:00
Fabian Frederick
b49ad51e6d f2fs: add static to get_max_meta_blks
inline get_max_meta_blks is only used in checkpoint.c
Use standard static inline format.

Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:55 +09:00
Chao Yu
94dac22e72 f2fs: introduce raw_nat_from_node_info() to simplfy codes
This patch introduce raw_nat_from_node_info() to simplfy some codes, and also
use exist function node_info_from_raw_nat() to do the same job.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:55 +09:00
Gu Zheng
876dc59eb1 f2fs: add the flush_merge handle in the remount flow
Add the *remount* handle of flush_merge option, so that the users
can enable flush_merge in the runtime, such as the underlying device
handles the cache_flush command relatively slowly.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:55 +09:00
Zhang Zhen
8abfb36ab3 f2fs: atomically set inode->i_flags in f2fs_set_inode_flags()
Use set_mask_bits() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
FS_IMMUTABLE_FL, FS_APPEND_FL, etc. flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:54 +09:00
Jingoo Han
b156d54241 f2fs: make recover_inline_xattr() static
Make recover_inline_xattr() static, because this function is
used only in this file.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:54 +09:00
Jaegeuk Kim
ed57c27f73 f2fs: remove costly dirty_dir_inode operations
This patch removes list opeations in handling dirty dir inodes.
Previously, F2FS traverses whole the list of dirty dir inodes to check whether
there is an existing inode or not, resulting in heavy CPU overheads.

So this patch removes such the traverse operations by adding FI_DIRTY_DIR to
indicate the inode lies on the list or not.
Through this simple flag, we can remove redundant operations gracefully.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:54 +09:00
Jaegeuk Kim
15c6e3aae6 f2fs: fix to unlock f2fs_lock at the omitted error case
If it occurs an error, we should call f2fs_unlock_op.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:54 +09:00
Jaegeuk Kim
76f60268e7 f2fs: call redirty_page_for_writepage
This patch replace some general codes with redirty_page_for_writepage, which
can be enabled after consideration on additional procedure like counting dirty
pages appropriately.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:54 +09:00
Jaegeuk Kim
1e87a78d95 f2fs: avoid to conduct roll-forward due to the remained garbage blocks
The f2fs always scans the next chain of direct node blocks.
But some garbage blocks are able to be remained due to no discard support or
SSR triggers.
This occasionally wreaks recovering wrong inodes that were used or BUG_ONs
due to reallocating node ids as follows.

When mount this f2fs image:
http://linuxtesting.org/downloads/f2fs_fault_image.zip
BUG_ON is triggered in f2fs driver (messages below are generated on
kernel 3.13.2; for other kernels output is similar):

kernel BUG at fs/f2fs/node.c:215!
 Call Trace:
 [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
 [<ffffffff811446e7>] ? __lock_page+0x67/0x70
 [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
 [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
 [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
 [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
 [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
 [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
 [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
 [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
 [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
 [<ffffffff811b9497>] mount_fs+0x47/0x1c0
 [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
 [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
 [<ffffffff811d6763>] do_mount+0x493/0x910
 [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
 [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
 [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b

Found by Linux File System Verification project (linuxtesting.org).

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:54 +09:00
Gu Zheng
b270ad6f0a f2fs: enable flush_merge only in f2fs is not read-only
Enable flush_merge only in f2fs is not read-only, so does the mount
option show.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:54 +09:00
Gu Zheng
197d46476c f2fs: use __GFP_ZERO to avoid appending set-NULL
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:53 +09:00
Gu Zheng
a4ed23f2f1 f2fs: put the bio when issue_flush completed
Put the bio when the flush cmd issued, it also can fix the following
kmemleak:
unreferenced object 0xffff8800270c73c0 (size 200):
  comm "f2fs_flush-7:0", pid 27161, jiffies 4312127988 (age 988.503s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 40 07 81 19 01 88 ff ff  ........@.......
    01 00 00 00 00 00 00 f0 11 14 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81559866>] kmemleak_alloc+0x72/0x96
    [<ffffffff81156f7e>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff811595b1>] kmem_cache_alloc+0xec/0x157
    [<ffffffff8111924d>] mempool_alloc_slab+0x15/0x17
    [<ffffffff81119513>] mempool_alloc+0x71/0x138
    [<ffffffff81193548>] bio_alloc_bioset+0x93/0x18c
    [<ffffffffa040f857>] issue_flush_thread+0x8d/0x145 [f2fs]
    [<ffffffff8107ac16>] kthread+0xba/0xc2
    [<ffffffff81571b2c>] ret_from_fork+0x7c/0xb0
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-05-07 10:21:53 +09:00
Al Viro
8174202b34 write_iter variants of {__,}generic_file_aio_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:38:00 -04:00
Al Viro
aad4f8bb42 switch simple generic_file_aio_read() users to ->read_iter()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:37:55 -04:00
Al Viro
5b46f25ddc f2fs: switch to iov_iter_alignment()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:52 -04:00
Al Viro
31b140398c switch {__,}blockdev_direct_IO() to iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:46 -04:00
Al Viro
d8d3d94b80 pass iov_iter to ->direct_IO()
unmodified, for now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:44 -04:00
Linus Torvalds
26c12d9334 Merge branch 'akpm' (incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
 - the rest of MM
 - zram updates
 - zswap updates
 - exit
 - procfs
 - exec
 - wait
 - crash dump
 - lib/idr
 - rapidio
 - adfs, affs, bfs, ufs
 - cris
 - Kconfig things
 - initramfs
 - small amount of IPC material
 - percpu enhancements
 - early ioremap support
 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
  MAINTAINERS: update Intel C600 SAS driver maintainers
  fs/ufs: remove unused ufs_super_block_third pointer
  fs/ufs: remove unused ufs_super_block_second pointer
  fs/ufs: remove unused ufs_super_block_first pointer
  fs/ufs/super.c: add __init to init_inodecache()
  doc/kernel-parameters.txt: add early_ioremap_debug
  arm64: add early_ioremap support
  arm64: initialize pgprot info earlier in boot
  x86: use generic early_ioremap
  mm: create generic early_ioremap() support
  x86/mm: sparse warning fix for early_memremap
  lglock: map to spinlock when !CONFIG_SMP
  percpu: add preemption checks to __this_cpu ops
  vmstat: use raw_cpu_ops to avoid false positives on preemption checks
  slub: use raw_cpu_inc for incrementing statistics
  net: replace __this_cpu_inc in route.c with raw_cpu_inc
  modules: use raw_cpu_write for initialization of per cpu refcount.
  mm: use raw_cpu ops for determining current NUMA node
  percpu: add raw_cpu_ops
  slub: fix leak of 'name' in sysfs_slab_add
  ...
2014-04-07 16:38:06 -07:00
Kirill A. Shutemov
f1820361f8 mm: implement ->map_pages for page cache
filemap_map_pages() is generic implementation of ->map_pages() for
filesystems who uses page cache.

It should be safe to use filemap_map_pages() for ->map_pages() if
filesystem use filemap_fault() for ->fault().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:53 -07:00
Linus Torvalds
3021112598 f2fs updates for v3.15
This patch-set includes the following major enhancement patches.
  o introduce large directory support
  o introduce f2fs_issue_flush to merge redundant flush commands
  o merge write IOs as much as possible aligned to the segment
  o add sysfs entries to tune the f2fs configuration
  o use radix_tree for the free_nid_list to reduce in-memory operations
  o remove costly bit operations in f2fs_find_entry
  o enhance the readahead flow for CP/NAT/SIT/SSA blocks
 
 The other bug fixes are as follows.
  o recover xattr node blocks correctly after sudden-power-cut
  o fix to calculate the maximum number of node ids
  o enhance to handle many error cases
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTQiQrAAoJEEAUqH6CSFDSlbIP/iq06BrUeMDLoQFhA2GQFKFD
 wd0A5h9hCiFcKBcI/u/aAQqj/a5wdwzDl9XzH2PzJ45IM6sVGQZ0lv+kdLhab6rk
 ipNbV7G0yLAX+8ygS6GZF7pSKfMzGSGTrRvfdtoiunIip1jCY1IkUxv1XMgBSPza
 wnWYrE5HXEqRUDCqPXJyxrPmx0/0jw8/V82Ng9stnY34ySs+l/3Pvg65Kh0QuSSy
 BRjJUGlOCF68KUBKd+6YB2T5KlbQde3/5lhP+GMOi+xm5sFB+j+59r/WpJpF2Nxs
 ImxQs5GkiU01ErH/rn5FgHY/zzddQenBKwOvrjEeUA1eVpBurdsIr1JN0P6qDbgB
 ho5U8LzCQq+HZiW444eQGkXSOagpUKqDhTVJO7Fji/wG88Atc9gLX3ix8TH2skxT
 C5CvvrJM7DKBtkZyTzotKY/cWorOZhge6E/EkbGaM1sSHdK5b1Rg4YlFi9TDyz0n
 QjGD1uuvEeukeKGdIG9pjc7o5ledbMDYwLpT2RuRXenLOTsn8BqDOo9aRTg+5Kag
 tJNJLFumjPR2mEBNKjicJMUf381J/SKDwZszAz9mgvCZXldMza/Ax0LzJDJCVmkP
 UuBiVzGxVzpd33IsESUDr0J9hc+t8kS10jfAeKnE3cpb6n7/RYxstHh6CHOFKNXM
 gPUSYPN3CYiP47DnSfzA
 =eSW+
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce large directory support
   - introduce f2fs_issue_flush to merge redundant flush commands
   - merge write IOs as much as possible aligned to the segment
   - add sysfs entries to tune the f2fs configuration
   - use radix_tree for the free_nid_list to reduce in-memory operations
   - remove costly bit operations in f2fs_find_entry
   - enhance the readahead flow for CP/NAT/SIT/SSA blocks

  The other bug fixes are as follows:
   - recover xattr node blocks correctly after sudden-power-cut
   - fix to calculate the maximum number of node ids
   - enhance to handle many error cases

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits)
  f2fs: fix wrong statistics of inline data
  f2fs: check the acl's validity before setting
  f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
  f2fs: fix to cover io->bio with io_rwsem
  f2fs: fix error path when fail to read inline data
  f2fs: use list_for_each_entry{_safe} for simplyfying code
  f2fs: avoid free slab cache under spinlock
  f2fs: avoid unneeded lookup when xattr name length is too long
  f2fs: avoid unnecessary bio submit when wait page writeback
  f2fs: return -EIO when node id is not matched
  f2fs: avoid RECLAIM_FS-ON-W warning
  f2fs: skip unnecessary node writes during fsync
  f2fs: introduce fi->i_sem to protect fi's info
  f2fs: change reclaim rate in percentage
  f2fs: add missing documentation for dir_level
  f2fs: remove unnecessary threshold
  f2fs: throttle the memory footprint with a sysfs entry
  f2fs: avoid to drop nat entries due to the negative nr_shrink
  f2fs: call f2fs_wait_on_page_writeback instead of native function
  f2fs: introduce nr_pages_to_write for segment alignment
  ...
2014-04-07 10:55:36 -07:00
Chao Yu
48b230a583 f2fs: fix wrong statistics of inline data
If we remove a file that has inline data after mount, our statistics turns to
inaccurate.

cat /sys/kernel/debug/f2fs/status
  - Inline_data Inode: 4294967295

Let's add stat_inc_inline_inode() to stat inline info of the file when lookup.

Change log from v1:
 o stat in f2fs_lookup() instead of in do_read_inode() for excluding wrong stat.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07 12:40:58 +09:00
ZhangZhen
3a8861e271 f2fs: check the acl's validity before setting
Before setting the acl, call posix_acl_valid() to check if it is
valid or not.

Signed-off-by: zhangzhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07 12:18:30 +09:00
Jaegeuk Kim
6b4afdd794 f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
Some storage devices show relatively high latencies to complete cache_flush
commands, even though their normal IO speed is prettry much high. In such
the case, it needs to merge cache_flush commands as much as possible to avoid
issuing them redundantly.
So, this patch introduces a mount option, "-o flush_merge", to mitigate such
the overhead.

If this option is enabled by user, F2FS merges the cache_flush commands and then
issues just one cache_flush on behalf of them. Once the single command is
finished, F2FS sends a completion signal to all the pending threads.

Note that, this option can be used under a workload consisting of very intensive
concurrent fsync calls, while the storage handles cache_flush commands slowly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07 09:50:58 +09:00
Linus Torvalds
24e7ea3bea Major changes for 3.14 include support for the newly added ZERO_RANGE
and COLLAPSE_RANGE fallocate operations, and scalability improvements
 in the jbd2 layer and in xattr handling when the extended attributes
 spill over into an external block.
 
 Other than that, the usual clean ups and minor bug fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y
 Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A
 ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI
 kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd
 bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15
 a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs
 rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa
 ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/
 +165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA
 2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr
 700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU
 DrPKlXwIgva7zq5/S0Vr
 =4s1Z
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Major changes for 3.14 include support for the newly added ZERO_RANGE
  and COLLAPSE_RANGE fallocate operations, and scalability improvements
  in the jbd2 layer and in xattr handling when the extended attributes
  spill over into an external block.

  Other than that, the usual clean ups and minor bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
  ext4: fix premature freeing of partial clusters split across leaf blocks
  ext4: remove unneeded test of ret variable
  ext4: fix comment typo
  ext4: make ext4_block_zero_page_range static
  ext4: atomically set inode->i_flags in ext4_set_inode_flags()
  ext4: optimize Hurd tests when reading/writing inodes
  ext4: kill i_version support for Hurd-castrated file systems
  ext4: each filesystem creates and uses its own mb_cache
  fs/mbcache.c: doucple the locking of local from global data
  fs/mbcache.c: change block and index hash chain to hlist_bl_node
  ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
  ext4: refactor ext4_fallocate code
  ext4: Update inode i_size after the preallocation
  ext4: fix partial cluster handling for bigalloc file systems
  ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
  ext4: only call sync_filesystm() when remounting read-only
  fs: push sync_filesystem() down to the file system's remount_fs()
  jbd2: improve error messages for inconsistent journal heads
  jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
  jbd2: minimize region locked by j_list_lock in journal_get_create_access()
  ...
2014-04-04 15:39:39 -07:00
Johannes Weiner
91b0abe36a mm + fs: store shadow entries in page cache
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Jaegeuk Kim
ce23447fe5 f2fs: fix to cover io->bio with io_rwsem
In the f2fs_wait_on_page_writeback, io->bio should be covered by io_rwsem.
Otherwise, the bio pointer can become a dangling pointer due to data races.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-02 09:56:27 +09:00
Chao Yu
d54c795b49 f2fs: fix error path when fail to read inline data
We should unlock page in ->readpage() path and also should unlock & release page
in error path of ->write_begin() to avoid deadlock or memory leak.
So let's add release code to fix the problem when we fail to read inline data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-02 09:56:27 +09:00
Chao Yu
2d7b822ad9 f2fs: use list_for_each_entry{_safe} for simplyfying code
This patch use list_for_each_entry{_safe} instead of list_for_each{_safe} for
simplfying code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-02 09:56:27 +09:00
Chao Yu
cf0ee0f09b f2fs: avoid free slab cache under spinlock
Move kmem_cache_free out of spinlock protection region for better performance.

Change log from v1:
 o remove spinlock protection for kmem_cache_free in destroy_node_manager
suggested by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-02 09:56:12 +09:00
Chao Yu
6e452d69d4 f2fs: avoid unneeded lookup when xattr name length is too long
In f2fs_setxattr we have limit this attribute name length, so we should also
check it in f2fs_getxattr to avoid useless lookup caused by invalid name length.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01 18:54:24 +09:00
Chao Yu
df0f8dc0e1 f2fs: avoid unnecessary bio submit when wait page writeback
This patch introduce is_merged_page() to check whether current page is merged
in f2fs bio cache. When page is not in cache, we can avoid submitting bio cache,
resulting in having more chance to merge pages.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01 18:53:41 +09:00
Jaegeuk Kim
3bb5e2c8fe f2fs: return -EIO when node id is not matched
During the cleaing of node segments, F2FS can get errored node blocks due to
data race between node page lock and its valid bitmap operations.
In that case, it needs to return an error to skip such the obsolete block copy.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01 17:38:26 +09:00
Jaegeuk Kim
808a1d7490 f2fs: avoid RECLAIM_FS-ON-W warning
This patch should resolve the following possible bug.

RECLAIM_FS-ON-W at:
 mark_held_locks+0xb9/0x140
 lockdep_trace_alloc+0x85/0xf0
 __kmalloc+0x53/0x1d0
 read_all_xattrs+0x3d1/0x3f0 [f2fs]
 f2fs_getxattr+0x4f/0x100 [f2fs]
 f2fs_get_acl+0x4c/0x290 [f2fs]
 get_acl+0x4f/0x80
 posix_acl_create+0x72/0x180
 f2fs_init_acl+0x29/0xcc [f2fs]
 __f2fs_add_link+0x259/0x710 [f2fs]
 f2fs_create+0xad/0x1c0 [f2fs]
 vfs_create+0xed/0x150
 do_last+0xd36/0xed0
 path_openat+0xc5/0x680
 do_filp_open+0x43/0xa0
 do_sys_open+0x13c/0x230
 SyS_creat+0x1e/0x20
 system_call_fastpath+0x16/0x1b

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:21:08 +09:00
Jaegeuk Kim
479f40c44a f2fs: skip unnecessary node writes during fsync
If multiple redundant fsync calls are triggered, we don't need to write its
node pages with fsync mark continuously.

So, this patch adds FI_NEED_FSYNC to track whether the latest node block is
written with the fsync mark or not.
If the mark was set, a new fsync doesn't need to write a node block.
Otherwise, we should do a new node block with the mark for roll-forward
recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:11 +09:00
Jaegeuk Kim
d928bfbfe7 f2fs: introduce fi->i_sem to protect fi's info
This patch introduces fi->i_sem to protect fi's info that includes xattr_ver,
pino, i_nlink.
This enables to remove i_mutex during f2fs_sync_file, resulting in performance
improvement when a number of fsync calls are triggered from many concurrent
threads.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:11 +09:00
Jaegeuk Kim
58c410351e f2fs: change reclaim rate in percentage
It is more reasonable to determine the reclaiming rate of prefree segments
according to the volume size, which is set to 5% by default.
For example, if the volume is 128GB, the prefree segments are reclaimed
when the number reaches to 6.4GB.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:10 +09:00
Jaegeuk Kim
a5f420101d f2fs: remove unnecessary threshold
The NM_WOUT_THRESHOLD is now obsolete since f2fs starts to control on a basis
of the memory footprint.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:09 +09:00
Jaegeuk Kim
cdfc41c134 f2fs: throttle the memory footprint with a sysfs entry
This patch introduces ram_thresh, a sysfs entry, which controls the memory
footprint used by the free nid list and the nat cache.

Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat
cache was managed by NM_WOUT_THRESHOLD.
However, this approach cannot be applied dynamically according to the system.

So, this patch adds ram_thresh that users can specify the threshold, which is
in order of 1 / 1024.
For example, if the total ram size is 4GB and the value is set to 10 by default,
f2fs tries to control the number of free nids and nat caches not to consume over
10 * (4GB / 1024) = 10MB.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:09 +09:00
Jaegeuk Kim
40bb0058c8 f2fs: avoid to drop nat entries due to the negative nr_shrink
The try_to_free_nats should not receive the negative nr_shrink.
Otherwise, it can drop all the nat entries by the while loop.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:08 +09:00
Jaegeuk Kim
3cb5ad152b f2fs: call f2fs_wait_on_page_writeback instead of native function
If a page is on writeback, f2fs can face with deadlock due to under writepages.
This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw
merged IOs, which is implemented by f2fs_wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:04 +09:00
Jaegeuk Kim
50c8cdb35a f2fs: introduce nr_pages_to_write for segment alignment
This patch introduces nr_pages_to_write to align page writes to the segment
or other operational unit size, which can be tuned according to the system
environment.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 16:37:53 +09:00
Jaegeuk Kim
d3baf95da5 f2fs: increase pages_skipped when skipping writepages
This patch increases pages_skipped when skipping writepages.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 16:37:16 +09:00
Jaegeuk Kim
87d6f89094 f2fs: avoid small data writes by skipping writepages
This patch introduces nr_pages_to_skip(sbi, type) to determine writepages can
be skipped.
The dentry, node, and meta pages can be conrolled by F2FS without breaking the
FS consistency.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 13:58:59 +09:00
Jaegeuk Kim
f8b2c1f940 f2fs: introduce get_dirty_dents for readability
The get_dirty_dents gives us the number of dirty dentry pages.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 12:34:30 +09:00
Chao Yu
04c0938844 f2fs: fix incorrect parsing with option string
Previously 'background_gc={on***,off***}' is being parsed as correct option,
with this patch we cloud fix the trivial bug in mount process.

Change log from v1:
 o need to check length of parameter suggested by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 10:13:02 +09:00
Chao Yu
e4fc5fbfc9 f2fs: avoid to return incorrect errno of read_normal_summaries
We should return error number of read_normal_summaries instead of -EINVAL when
read_normal_summaries failed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 09:29:53 +09:00
Chao Yu
4bc8e9bcf5 f2fs: introduce f2fs_has_xattr_block for better readability
This patch introduces a help function f2fs_has_xattr_block for better
readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 09:29:46 +09:00
Chao Yu
90aa6dc9b9 f2fs: print type for each segment in segment_info's show
The original segment_info's show looks out-of-format:
cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 512
512 512 512 512 512 512 512 0 0 512
348 0 263 0 0 512 0 0 512 512
512 512 0 512 512 512 512 512 512 512
512 512 511 328 512 512 512 512 512 512
512 512 512 512 512 512 512 0 0 175

Let's fix this and show type for each segment.
cat /proc/fs/f2fs/loop0/segment_info
format: segment_type|valid_blocks
segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN)
0    2|0   1|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
10   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
20   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
30   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
40   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
50   3|0   3|0   3|0   3|0   3|0   3|0   3|0   0|0   3|0   3|0
60   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|512
70   3|512 3|512 3|512 3|512 3|512 3|512 3|512 3|0   3|0   3|512
80   3|0   3|0   3|0   3|0   3|0   3|512 3|0   3|0   3|512 3|512
90   3|512 0|512 3|274 0|512 0|512 0|512 0|512 0|512 0|512 3|512
100  3|512 0|512 3|511 0|328 3|512 0|512 0|512 3|512 0|512 0|512
110  0|512 0|512 0|512 0|512 0|512 0|512 0|512 5|0   4|0   3|512

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18 09:27:18 +09:00
Theodore Ts'o
02b9984d64 fs: push sync_filesystem() down to the file system's remount_fs()
Previously, the no-op "mount -o mount /dev/xxx" operation when the
file system is already mounted read-write causes an implied,
unconditional syncfs().  This seems pretty stupid, and it's certainly
documented or guaraunteed to do this, nor is it particularly useful,
except in the case where the file system was mounted rw and is getting
remounted read-only.

However, it's possible that there might be some file systems that are
actually depending on this behavior.  In most file systems, it's
probably fine to only call sync_filesystem() when transitioning from
read-write to read-only, and there are some file systems where this is
not needed at all (for example, for a pseudo-filesystem or something
like romfs).

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Jan Kara <jack@suse.cz>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Anders Larsen <al@alarsen.net>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: xfs@oss.sgi.com
Cc: linux-btrfs@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: codalist@coda.cs.cmu.edu
Cc: linux-ext4@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: fuse-devel@lists.sourceforge.net
Cc: cluster-devel@redhat.com
Cc: linux-mtd@lists.infradead.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: linux-nilfs@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Cc: reiserfs-devel@vger.kernel.org
2014-03-13 10:14:33 -04:00
Chao Yu
910bb12d29 f2fs: check upper bound of ino value in f2fs_nfs_get_inode
Upper bound checking of ino should be added to f2fs_nfs_get_inode, so unneeded
process before do_read_inode in f2fs_iget could be avoided when ino is invalid.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-12 18:15:38 +09:00
Chao Yu
987c7c3112 f2fs: introduce f2fs_has_inline_xattr for better readability
This patch introduces a help function f2fs_has_inline_xattr for better
readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-12 17:23:35 +09:00
Chao Yu
28cdce0459 f2fs: recover inline xattr data in roll-forward process
Previously we do not recover inline xattr data of inode after power-cut, so
inline xattr data may be lost.
We should recover the data during the roll-forward process.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-11 16:31:06 +09:00
Gu Zheng
d653788a43 f2fs: optimize restore_node_summary slightly
Previously, we ra_sum_pages to pre-read contiguous pages as more
as possible, and if we fail to alloc more pages, an ENOMEM error
will be reported upstream, even though we have alloced some pages
yet. In fact, we can use the available pages to do the job partly,
and continue the rest in the following circle. Only reporting ENOMEM
upstream if we really can not alloc any available page.

And another fix is ignoring dealing with the following pages if an
EIO occurs when reading page from page_list.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: modify the flow for better neat code]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10 18:45:15 +09:00
Gu Zheng
46c04366bb f2fs: format segment_info's show for better legibility
The original segment_info's show is a bit out-of-format:

[root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
......
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 [root@guz Demoes]#

so we fix it here for better legibility.
[root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
......
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1
[root@guz Demoes]#

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10 18:45:15 +09:00
Gu Zheng
e8512d2e0c f2fs: remove the unused ctor argument of f2fs_kmem_cache_create()
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10 18:45:14 +09:00
Gu Zheng
b6ce391e61 f2fs: update start nid only once each circle
Integrated a couple of minor changes for better readability suggested by
Chao Yu.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10 18:45:09 +09:00
Jaegeuk Kim
20f70751c6 f2fs: fix wrong kernel coding style
This patch includes a simple fix to adjust coding style.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-05 10:48:53 +09:00
Jaegeuk Kim
c81bf1c84f f2fs: fix to write node pages with WRITE_SYNC
This patch fixes performance regression of dbench reported by
Alex <hbx7d@yandex.com>.

This issue was revealed by Phoronix tests results:
http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=2

It turns out that we need to assign WRITE_SYNC to the node writes, if
fsync is triggered.

The performance numbers are like below, which is measured by Alex.
1. 355MB/s       ext4
2. 225MB/s       f2fs : WRITE for node writes
3. 525MB/s       f2fs : WRITE_SYNC for node writes

Reported-And-Tested-by: Alex <hbx7d@yandex.com>.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-03 11:28:40 +09:00
Chao Yu
9cf3c3898a f2fs: fix dirty page accounting when redirty
We should de-account dirty counters for page when redirty in ->writepage().

Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75':
"writeback: fix dirtied pages accounting on redirty
De-account the accumulative dirty counters on page redirty.

Page redirties (very common in ext4) will introduce mismatch between
counters (a) and (b)

a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
b) NR_WRITTEN, BDI_WRITTEN

This will introduce systematic errors in balanced_rate and result in
dirty page position errors (ie. the dirty pages are no longer balanced
around the global/bdi setpoints)."

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-28 13:09:08 +09:00
Chao Yu
695fd1ed3b f2fs: use existing macro to clean up some codes
This patch use existing macro F2FS_INODE/NEXT_FREE_BLKADDR to clean up some
codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 21:09:28 +09:00
Chao Yu
81c1a0f13e f2fs: readahead contiguous SSA blocks for f2fs_gc
If there are multi segments in one section, we will read those SSA blocks which
have contiguous address one by one in f2fs_gc. It may lost performance, let's
read ahead SSA blocks by merge multi read request.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 20:40:36 +09:00
Jaegeuk Kim
ab9fa662e4 f2fs: add an sysfs entry to control the directory level
This patch adds an sysfs entry to control dir_level used by the large directory.

The description of this entry is:

 dir_level                    This parameter controls the directory level to
			      support large directory. If a directory has a
			      number of files, it can reduce the file lookup
			      latency by increasing this dir_level value.
			      Otherwise, it needs to decrease this value to
			      reduce the space overhead. The default value is 0.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 20:31:15 +09:00
Jaegeuk Kim
3843154598 f2fs: introduce large directory support
This patch introduces an i_dir_level field to support large directory.

Previously, f2fs maintains multi-level hash tables to find a dentry quickly
from a bunch of chiild dentries in a directory, and the hash tables consist of
the following tree structure as below.

In Documentation/filesystems/f2fs.txt,

----------------------
A : bucket
B : block
N : MAX_DIR_HASH_DEPTH
----------------------

level #0   | A(2B)
           |
level #1   | A(2B) - A(2B)
           |
level #2   | A(2B) - A(2B) - A(2B) - A(2B)
     .     |   .       .       .       .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

But, if we can guess that a directory will handle a number of child files,
we don't need to traverse the tree from level #0 to #N all the time.
Since the lower level tables contain relatively small number of dentries,
the miss ratio of the target dentry is likely to be high.

In order to avoid that, we can configure the hash tables sparsely from level #0
like this.

level #0   | A(2B) - A(2B) - A(2B) - A(2B)

level #1   | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

With this structure, we can skip the ineffective tree searches in lower level
hash tables.

This patch adds just a facility for this by introducing i_dir_level in
f2fs_inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 19:56:09 +09:00
Jaegeuk Kim
5d0c667121 f2fs: remove costly bit operations for f2fs_find_entry
It turns out that a bit operation like find_next_bit is not always fast enough
for f2fs_find_entry.
Instead, it is pretty much simple and fast to traverse each dentries.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 16:25:20 +09:00
Jaegeuk Kim
8b8343fa9d f2fs: implement a lock-free stat_show
The stat_show is just to show the current status of f2fs.
So, we can remove all the there-in locks.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:41 +09:00
Jaegeuk Kim
8a7ed66aaf f2fs: introduce a radix_tree for the free_nid list
This patch introduces a radix tree for the list of free_nids, which enhances
the performance on free nid management.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:41 +09:00
Gu Zheng
f978f5a061 f2fs: introduce help macro on_build_free_nids()
Introduce help macro on_build_free_nids() which just uses build_lock
to judge whether the building free nid is going, so that we can remove
the on_build_free_nids field from f2fs_sb_info.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: remove an unnecessary white line removal]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:40 +09:00
Jaegeuk Kim
fffc2a00fc f2fs: fix to mark the checkpointed nat entry correctly
The nat cache entry maintains a status whether it is checkpointed or not.
So, if a new cache entry is loaded from the last checkpoint,
nat_entry->checkpointed should be true.
If the cache entry is modified as being dirty, nat_entry->checkpoint should
be false.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:40 +09:00
Jaegeuk Kim
6437d1b0ad f2fs: fix to do build_stat prior to the recovery procedure
At the end of the recovery procedure, write_checkpoint is called and updates
the cp count which is managed by f2fs stat.
But, previously build_stat() is called after the recovery procedure, which
results in:

BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
IP: [<ffffffffa03b1030>] write_checkpoint+0x720/0xbc0 [f2fs]
Call Trace:
 [<ffffffff810a6b44>] ? mark_held_locks+0x74/0x140
 [<ffffffff8109a3e0>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffffa03bf036>] recover_fsync_data+0x656/0xf20 [f2fs]
 [<ffffffff812ee3eb>] ? security_d_instantiate+0x1b/0x30
 [<ffffffffa03aeb4d>] f2fs_fill_super+0x94d/0xa00 [f2fs]
 [<ffffffff811a9825>] mount_bdev+0x1a5/0x1f0
 [<ffffffff8114915e>] ? __get_free_pages+0xe/0x40
 [<ffffffffa03ae200>] ? f2fs_remount+0x130/0x130 [f2fs]
 [<ffffffffa03aa575>] f2fs_mount+0x15/0x20 [f2fs]
 [<ffffffff811aa713>] mount_fs+0x43/0x1b0
 [<ffffffff811c7124>] vfs_kern_mount+0x74/0x160
 [<ffffffff811c5cb1>] ? __get_fs_type+0x51/0x60
 [<ffffffff811c9727>] do_mount+0x237/0xb50
 [<ffffffff811c936a>] ? copy_mount_options+0x3a/0x170

So, this patche changes the order of recovery_fsync_data() and
f2fs_build_stats().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:39 +09:00
Jaegeuk Kim
8618b881e9 f2fs: fix not to write data pages on the page reclaiming path
Even if f2fs_write_data_page is called by the page reclaiming path, we should
not write the page to provide enough free segments for the worst case scenario.
Otherwise, f2fs can face with no free segment while gc is conducted, resulting
in:

 ------------[ cut here ]------------
 kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/segment.c:565!
 RIP: 0010:[<ffffffffa02c3b11>]  [<ffffffffa02c3b11>] new_curseg+0x331/0x340 [f2fs]
 Call Trace:
  allocate_segment_by_default+0x204/0x280 [f2fs]
  allocate_data_block+0x108/0x210 [f2fs]
  write_data_page+0x8a/0xc0 [f2fs]
  do_write_data_page+0xe1/0x2a0 [f2fs]
  move_data_page+0x8a/0xf0 [f2fs]
  f2fs_gc+0x446/0x970 [f2fs]
  f2fs_balance_fs+0xb6/0xd0 [f2fs]
  f2fs_write_begin+0x50/0x350 [f2fs]
  ? unlock_page+0x27/0x30
  ? unlock_page+0x27/0x30
  generic_file_buffered_write+0x10a/0x280
  ? file_update_time+0xa3/0xf0
  __generic_file_aio_write+0x1c8/0x3d0
  ? generic_file_aio_write+0x52/0xb0
  ? generic_file_aio_write+0x52/0xb0
  generic_file_aio_write+0x65/0xb0
  do_sync_write+0x5a/0x90
  vfs_write+0xc5/0x1f0
  SyS_write+0x55/0xa0
  system_call_fastpath+0x16/0x1b

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24 16:00:33 +09:00
Jaegeuk Kim
b63da15e8b f2fs: fix the calculation of max_nids
Total nids that f2fs can use should not include 0, nid for node inode, and nid
for meta inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Changman Lee
942e0be621 f2fs: show counts of checkpoint in status
This patch shows the counts of checkpoint in f2fs' status.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Chao Yu
662befda25 f2fs: introduce ra_meta_pages to readahead CP/NAT/SIT pages
This patch help us to cleanup the readahead code by merging ra_{sit,nat}_pages
function into ra_meta_pages.
Additionally the new function is used to readahead cp block in
recover_orphan_inodes.

Change log from v1:
 o fix a deadloop bug pointed by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Chao Yu
3375f696bd f2fs: use inode mutex to keep atomicity of f2fs_falloc
Previously without protection of inode mutex, f2fs_falloc and other data
correlated operations will interfere with each other.
So let's use inode mutex to keep atomicity of f2fs_falloc.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Jaegeuk Kim
1fe54f9dd3 f2fs: clean up redundant function call
This patch integrates inode_[inc|dec]_dirty_dents with inc_page_count to remove
redundant calls.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Jaegeuk Kim
203681f65b f2fs: fix f2fs_write_meta_page at no checkpoint status
If f2fs entered errorneous checkpoint status, it should skip writing meta
pages instead of redirtying the pages out.
Otherwise, it cannot unmount the partition even though f2fs is under read-only
status.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:53 +09:00
Jaegeuk Kim
bd859c6598 f2fs: fix to truncate dentry pages in the error case
When a new directory is allocated, if an error is occurred, we should truncate
preallocated dentry pages too.

This bug was reported by Andrey Tsyvarev after a while as follows.

mkdir()->
 f2fs_add_link()->
  init_inode_metadata()->
    f2fs_init_acl()->
      f2fs_get_acl()->
        f2fs_getxattr()->
          read_all_xattrs() fails.

Also there was a BUG_ON triggered after the fault in
mkdir()->
 f2fs_add_link()->
   init_inode_metadata()->
    remove_inode_page() ->
      f2fs_bug_on(inode->i_blocks != 0 && inode->i_blocks != 1);

But, previous patch wasn't perfect to resolve that bug, so the following bug
report was also submitted.

kernel BUG at fs/f2fs/inode.c:274!
Call Trace:
 [<ffffffff811fde03>] evict+0xa3/0x1a0
 [<ffffffff811fe615>] iput+0xf5/0x180
 [<ffffffffa01c7f63>] f2fs_mkdir+0xf3/0x150 [f2fs]
 [<ffffffff811f2a77>] vfs_mkdir+0xb7/0x160
 [<ffffffff811f36bf>] SyS_mkdir+0x5f/0xc0
 [<ffffffff81680769>] system_call_fastpath+0x16/0x1b

Finally, this patch resolves all the issues like below.

If an error is occurred after make_empty_dir(),
 1. truncate_inode_pages()
   The make_bad_inode() prior to iput() will change i_mode to S_IFREG, which
   means that f2fs will not decrement fi->dirty_dents during f2fs_evict_inode.
   But, by calling it here, we can do that.

 2. truncate_blocks()
   Preallocated dentry pages are trucated here to sync i_blocks.

 3. remove_dirty_dir_inode()
   Remove this directory inode from the list.

Reported-and-Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
f6517cfc84 f2fs: fix a build warning
This patch modifies flow a little bit to avoid the following build warnings.

src/fs/f2fs/recovery.c: In function ‘check_index_in_prev_nodes’:
src/fs/f2fs/recovery.c:288:51: warning: ‘sum.<U5390>.<U52f8>.ofs_in_node’ may
	be used uninitialized in this function [-Wmaybe-uninitialized]
src/fs/f2fs/recovery.c:260:23: warning: ‘sum.nid’ may be used uninitialized
	in this function [-Wmaybe-uninitialized]

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
491c0854b4 f2fs: clean up with a macro
This patch adds GET_BLKOFF_FROM_SEG0 to clean up some codes.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
924a2ddbd0 f2fs: fix the potential mismatch between dir's i_size and i_blocks
This is the erroneous scenario.

                             i_size    on-disk i_size    i_blocks
__f2fs_add_link()             4096           4096           2
 get_new_data_page            8192           4096           3
 -ENOSPC = init_inode_metadata
 checkpoint                     -            4096           3
 POR and reboot

__f2fs_add_link()             4096           4096           3
 page = get_new_data_page (page->index = 1 by NEW_ADDR)
 add a dentry to the page successfully

f2fs_rmdir()
 f2fs_empty_dir()             4096           4096           3
 f2fs_unlink() goes, since there is no valid dentry due to i_size = 4096.
 But, still there is one dentry in page->index = 1.

So this patch moves the code to write dir->i_size into on-disk i_size in order
to sync dir's i_size, on-disk i_size, and its i_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
1b1f559fc3 f2fs: remove the ugly pointer conversion
This patch modifies the use of bi_private to remove pointer chasing for sbi.
Previously, we had a bi_private structure, but it needs memory allocation.
So this patch uses bi_private by the sbi pointer and adds a completion pointer
into the sbi.
This can achieve no memory allocation and nice use of the bi_private.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
abb2366c82 f2fs: fix to recover xattr node block
If a new xattr node page was allocated and its inode is fsynced, we should
recover the xattr node page during the roll-forward process after power-cut.
But, previously, f2fs didn't handle that case, resulting in kernel panic as
follows reported by Tom Li.

BUG: unable to handle kernel paging request at ffffc9001c861a98
IP: [<ffffffffa0295236>] check_index_in_prev_nodes+0x86/0x2d0 [f2fs]
Call Trace:
 [<ffffffff815ece9b>] ? printk+0x48/0x4a
 [<ffffffffa029626a>] recover_fsync_data+0xdca/0xf50 [f2fs]
 [<ffffffffa02873ae>] f2fs_fill_super+0x92e/0x970 [f2fs]
 [<ffffffff8112c9f8>] mount_bdev+0x1b8/0x200
 [<ffffffffa0286a80>] ? f2fs_remount+0x130/0x130 [f2fs]
 [<ffffffffa0285e40>] f2fs_mount+0x10/0x20 [f2fs]
 [<ffffffff8112d4de>] mount_fs+0x3e/0x1b0
 [<ffffffff810ef4eb>] ? __alloc_percpu+0xb/0x10
 [<ffffffff8114761f>] vfs_kern_mount+0x6f/0x120
 [<ffffffff811497b9>] do_mount+0x259/0xa90
 [<ffffffff810ead1d>] ? memdup_user+0x3d/0x80
 [<ffffffff810eadb3>] ? strndup_user+0x53/0x70
 [<ffffffff8114a2c9>] SyS_mount+0x89/0xd0
 [<ffffffff815feae2>] system_call_fastpath+0x16/0x1b

This patch adds a recovery function of xattr node pages.

Reported-by: Tom Li <biergaizi@members.fsf.org>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
5e443818fa f2fs: handle dirty segments inside refresh_sit_entry
This patch cleans up the refresh_sit_entry to handle locate_dirty_segments.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:52 +09:00
Jaegeuk Kim
744602cf45 f2fs: update_inode_page should be done all the time
In order to make fs consistency, update_inode_page should not be failed all
the time. Otherwise, it is possible to lose some metadata in the inode like
a link count.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-17 14:58:51 +09:00
Linus Torvalds
f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds
bf3d846b78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28 08:38:04 -08:00
Christoph Hellwig
a6dda0e63e f2fs: use generic posix ACL infrastructure
f2fs has some weird mode bit handling, so still using the old
chmod code for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:19 -05:00
Christoph Hellwig
37bc15392a fs: make posix_acl_create more useful
Rename the current posix_acl_created to __posix_acl_create and add
a fully featured helper to set up the ACLs on file creation that
uses get_acl().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:18 -05:00
Christoph Hellwig
5bf3258fd2 fs: make posix_acl_chmod more useful
Rename the current posix_acl_chmod to __posix_acl_chmod and add
a fully featured ACL chmod helper that uses the ->set_acl inode
operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:18 -05:00
Jaegeuk Kim
bf39c00a9a f2fs: drop obsolete node page when it is truncated
If a node page is trucated, we'd better drop the page in the node_inode's page
cache for better memory footprint.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-23 08:04:21 +09:00
Jaegeuk Kim
4ef51a8fcc f2fs: introduce NODE_MAPPING for code consistency
This patch adds NODE_MAPPING which is similar as META_MAPPING introduced by
Gu Zheng.

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:08 +09:00
Gu Zheng
63f5384c9a f2fs: remove the orphan block page array
As the orphan_blocks may be max to 504, so it is not security
and rigorous to store such a large array in the kernel stack
as Dan Carpenter said.
In fact, grab_meta_page has locked the page in the page cache,
and we can use find_get_page() to fetch the page safely in the
downstream, so we can remove the page array directly.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:08 +09:00
Gu Zheng
9df27d982d f2fs: add help function META_MAPPING
Introduce help function META_MAPPING() to get the cache meta blocks'
address space.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:07 +09:00
Jaegeuk Kim
e8dae60458 f2fs: move a branch for code redability
This patch moves a function in f2fs_delete_entry for code readability.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:41:07 +09:00
Jaegeuk Kim
a18ff06340 f2fs: call mark_inode_dirty to flush dirty pages
If a dentry page is updated, we should call mark_inode_dirty to add the inode
into the dirty list, so that its dentry pages are flushed to the disk.
Otherwise, the inode can be evicted without flush.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-22 18:40:34 +09:00
Chris Fries
6c311ec6c2 f2fs: clean checkpatch warnings
Fixed a variety of trivial checkpatch warnings.  The only delta should
be some minor formatting on log strings that were split / too long.

Signed-off-by: Chris Fries <cfries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-20 10:27:12 +09:00
Changman Lee
c434cbc0ed f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH)
Doing sync_meta_pages with META_FLUSH when checkpoint, we overide rw
using WRITE_FLUSH_FUA. At this time, we also should set
REQ_META|REQ_PRIO.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-16 17:28:35 +09:00
Jaegeuk Kim
c33ec32692 f2fs: avoid f2fs_balance_fs call during pageout
This patch should resolve the following bug.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.13.0-rc5.f2fs+ #6 Not tainted
---------------------------------------------------------
kswapd0/41 just changed the state of lock:
 (&sbi->gc_mutex){+.+.-.}, at: [<ffffffffa030503e>] f2fs_balance_fs+0xae/0xd0 [f2fs]
but this lock took another, RECLAIM_FS-READ-unsafe lock in the past:
 (&sbi->cp_rwsem){++++.?}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Chain exists of:
  &sbi->gc_mutex --> &sbi->cp_mutex --> &sbi->cp_rwsem

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sbi->cp_rwsem);
                               local_irq_disable();
                               lock(&sbi->gc_mutex);
                               lock(&sbi->cp_mutex);
  <Interrupt>
    lock(&sbi->gc_mutex);

 *** DEADLOCK ***

This bug is due to the f2fs_balance_fs call in f2fs_write_data_page.
If f2fs_write_data_page is triggered by wbc->for_reclaim via kswapd, it should
not call f2fs_balance_fs which tries to get a mutex grabbed by original syscall
flow.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-16 16:20:40 +09:00
Changman Lee
499046ab2c f2fs: add delimiter to seperate name and value in debug phrase
Support for f2fs-tools/tools/f2stat to monitor
/sys/kernel/debug/f2fs/status

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 18:22:17 +09:00
Gu Zheng
17b692f60e f2fs: use spinlock rather than mutex for better speed
With the 2 previous changes, all the long time operations are moved out
of the protection region, so here we can use spinlock rather than mutex
(orphan_inode_mutex) for lower overhead.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 18:12:05 +09:00
Gu Zheng
c1ef372572 f2fs: move alloc new orphan node out of lock protection region
Move alloc new orphan node out of lock protection region.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 18:12:04 +09:00
Gu Zheng
4531929e39 f2fs: move grabing orphan pages out of protection region
Move grabing orphan block page out of protection region, and grab all
the orphan block pages ahead.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unnecessary code pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 18:11:20 +09:00
Yuan Zhong
5514f0aadd f2fs: remove the needless parameter of f2fs_wait_on_page_writeback
"boo sync" parameter is never referenced in f2fs_wait_on_page_writeback.
We should remove this parameter.

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-14 17:45:54 +09:00
Jaegeuk Kim
b1c57c1caa f2fs: add a sysfs entry to control max_victim_search
Previously during SSR and GC, the maximum number of retrials to find a victim
segment was hard-coded by MAX_VICTIM_SEARCH, 4096 by default.

This number makes an effect on IO locality, when SSR mode is activated, which
results in performance fluctuation on some low-end devices.

If max_victim_search = 4, the victim will be searched like below.
("D" represents a dirty segment, and "*" indicates a selected victim segment.)

 D1 D2 D3 D4 D5 D6 D7 D8 D9
[   *       ]
      [   *    ]
            [         * ]
	                [ ....]

This patch adds a sysfs entry to control the number dynamically through:
  /sys/fs/f2fs/$dev/max_victim_search

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-08 13:45:08 +09:00
Jaegeuk Kim
fb5566da91 f2fs: improve write performance under frequent fsync calls
When considering a bunch of data writes with very frequent fsync calls, we
are able to think the following performance regression.

N: Node IO, D: Data IO, IO scheduler: cfq

Issue    pending IOs
	 D1 D2 D3 D4
 D1         D2 D3 D4 N1
 D2            D3 D4 N1 N2
 N1            D3 D4 N2 D1
 --> N1 can be selected by cfq becase of the same priority of N and D.
     Then D3 and D4 would be delayed, resuling in performance degradation.

So, when processing the fsync call, it'd better give higher priority to data IOs
than node IOs by assigning WRITE and WRITE_SYNC respectively.
This patch improves the random wirte performance with frequent fsync calls by up
to 10%.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-08 11:16:20 +09:00
Chao Yu
04a17fb17f f2fs: avoid to read inline data except first page
Here is a case which could read inline page data not from first page.

1. write inline data
2. lseek to offset 4096
3. read 4096 bytes from offset 4096
	(read_inline_data read inline data page to non-first page,
	And previously VFS has add this page to page cache)
4. ftruncate offset 8192
5. read 4096 bytes from offset 4096
	(we meet this updated page with inline data in cache)

So we should leave this page with inited data and uptodate flag
for this case.

Change log from v1:
 o fix a deadlock bug

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:22 +09:00
Chao Yu
18309aaa41 f2fs: avoid to left uninitialized data in page when read inline data
Change log from v1:
 o reduce unneeded memset in __f2fs_convert_inline_data

>From 58796be2bd2becbe8d52305210fb2a64e7dd80b6 Mon Sep 17 00:00:00 2001
From: Chao Yu <chao2.yu@samsung.com>
Date: Mon, 30 Dec 2013 09:21:33 +0800
Subject: [PATCH] f2fs: avoid to left uninitialized data in page when read
 inline data

We left uninitialized data in the tail of page when we read an inline data
page. So let's initialize left part of the page excluding inline data region.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:22 +09:00
shifei10.ge
a225dca394 f2fs: fix truncate_partial_nodes bug
The truncate_partial_nodes puts pages incorrectly in the following two cases.
Note that the value for argc 'depth' can only be 2 or 3.
Please see truncate_inode_blocks() and truncate_partial_nodes().

1) An err is occurred in the first 'for' loop
  When err is occurred with depth = 2, pages[0] is invalid, so this page doesn't
  need to be put. There is no problem, however, when depth is 3, it doesn't put
  the pages correctly where pages[0] is valid and pages[1] is invalid.
  In this case, depth is set to 2 (ref to statemnt depth = i + 1), and then
  'goto fail'.
  In label 'fail', for (i = depth - 3; i >= 0; i--) cannot meet the condition
  because i = -1, so pages[0] cann't be put.

2) An err happened in the second 'for' loop
  Now we've got pages[0] with depth = 2, or we've got pages[0] and pages[1]
  with depth = 3. When an err is detected, we need 'goto fail' to put such
  the pages.
  When depth is 2, in label 'fail', for (i = depth - 3; i >= 0; i--) cann't
  meet the condition because i = -1, so pages[0] cann't be put.
  When depth is 3, in label 'fail', for (i = depth - 3; i >= 0; i--) can
  only put pages[0], pages[1] also cann't be put.

Note that 'depth' has been changed before first 'goto fail' (ref to statemnt
depth = i + 1), so passing this modified 'depth' to the tracepoint,
trace_f2fs_truncate_partial_nodes, is also incorrect.

Signed-off-by: Shifei Ge <shifei10.ge@samsung.com>
[Jaegeuk Kim: modify the description and fix one bug]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:21 +09:00
Jaegeuk Kim
a8865372a8 f2fs: handle errors correctly during f2fs_reserve_block
The get_dnode_of_data nullifies inode and node page when error is occurred.

There are two cases that passes inode page into get_dnode_of_data().

1. make_empty_dir()
    -> get_new_data_page()
      -> f2fs_reserve_block(ipage)
	-> get_dnode_of_data()

2. f2fs_convert_inline_data()
    -> __f2fs_convert_inline_data()
      -> f2fs_reserve_block(ipage)
	-> get_dnode_of_data()

This patch adds correct error handling codes when get_dnode_of_data() returns
an error.

At first, f2fs_reserve_block() calls f2fs_put_dnode() whenever reserve_new_block
returns an error.
So, the rule of f2fs_reserve_block() is to nullify inode page when there is any
error internally.

Finally, two callers of f2fs_reserve_block() should call f2fs_put_dnode()
appropriately if they got an error since successful f2fs_reserve_block().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:21 +09:00
Jaegeuk Kim
1e1bb4baf1 f2fs: add inline_data recovery routine
This patch adds a inline_data recovery routine with the following policy.

[prev.] [next] of inline_data flag
   o       o  -> recover inline_data
   o       x  -> remove inline_data, and then recover data blocks
   x       o  -> remove inline_data, and then recover inline_data
   x       x  -> recover data blocks

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:20 +09:00
Jaegeuk Kim
0dbdc2ae9b f2fs: add the number of inline_data files to status info
This patch adds the number of inline_data files into the status information.
Note that the number is reset whenever the filesystem is newly mounted.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:20 +09:00
Jaegeuk Kim
9e09fc855d f2fs: refactor f2fs_convert_inline_data
Change log from v1:
 o handle NULL pointer of grab_cache_page_write_begin() pointed by Chao Yu.

This patch refactors f2fs_convert_inline_data to check a couple of conditions
internally for deciding whether it needs to convert inline_data or not.

So, the new f2fs_convert_inline_data initially checks:
1) f2fs_has_inline_data(), and
2) the data size to be changed.

If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA,
then we don't need to convert the inline_data with data allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:19 +09:00
Jaegeuk Kim
26f466f4a9 f2fs: call f2fs_put_page at the error case
In f2fs_write_begin(), if f2fs_conver_inline_data() returns an error like
-ENOSPC, f2fs should call f2fs_put_page().
Otherwise, it is remained as a locked page, resulting in the following bug.

[<ffffffff8114657e>] sleep_on_page+0xe/0x20
[<ffffffff81146567>] __lock_page+0x67/0x70
[<ffffffff81157d08>] truncate_inode_pages_range+0x368/0x5d0
[<ffffffff81157ff5>] truncate_inode_pages+0x15/0x20
[<ffffffff8115804b>] truncate_pagecache+0x4b/0x70
[<ffffffff81158082>] truncate_setsize+0x12/0x20
[<ffffffffa02a1842>] f2fs_setattr+0x72/0x270 [f2fs]
[<ffffffff811cdae3>] notify_change+0x213/0x400
[<ffffffff811ab376>] do_truncate+0x66/0xa0
[<ffffffff811ab541>] vfs_truncate+0x191/0x1b0
[<ffffffff811ab5bc>] do_sys_truncate+0x5c/0xa0
[<ffffffff811ab78e>] SyS_truncate+0xe/0x10
[<ffffffff81756052>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:19 +09:00
Jaegeuk Kim
8230a0a49f f2fs: convert inline_data for punch_hole
In the punch_hole(), let's convert inline_data all the time for simplicity and
to avoid potential deadlock conditions.
It is pretty much not a big deal to do this.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-06 16:42:12 +09:00
Jaegeuk Kim
f185ff979f f2fs: don't need to get f2fs_lock_op for the inline_data test
This patch locates checking the inline_data prior to calling f2fs_lock_op()
in truncate_blocks(), since getting the lock is unnecessary.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-27 12:40:41 +09:00
Huajun Li
9ffe0fb5f3 f2fs: handle inline data operations
Hook inline data read/write, truncate, fallocate, setattr, etc.

Files need meet following 2 requirement to inline:
 1) file size is not greater than MAX_INLINE_DATA;
 2) file doesn't pre-allocate data blocks by fallocate().

FI_INLINE_DATA will not be set while creating a new regular inode because
most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when
data is submitted to block layer, ranther than set it while creating a new
inode, this also avoids converting data from inline to normal data block
and vice versa.

While writting inline data to inode block, the first data block should be
released if the file has a block indexed by i_addr[0].

On the other hand, when a file operation is appied to a file with inline
data, we need to test if this file can remain inline by doing this
operation, otherwise it should be convert into normal file by reserving
a new data block, copying inline data to this new block and clear
FI_INLINE_DATA flag. Because reserve a new data block here will make use
of i_addr[0], if we save inline data in i_addr[0..872], then the first
4 bytes would be overwriten. This problem can be avoided simply by
not using i_addr[0] for inline data.

Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 20:40:41 +09:00
Huajun Li
e18c65b2ac f2fs: key functions to handle inline data
Functions to implement inline data read/write, and move inline data to
normal data block when file size exceeds inline data limitation.

Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 20:40:09 +09:00
Gu Zheng
0d47c1adc2 f2fs: convert max_orphans to a field of f2fs_sb_info
Previously, we need to calculate the max orphan num when we try to acquire an
orphan inode, but it's a stable value since the super block was inited. So
converting it to a field of f2fs_sb_info and use it directly when needed seems
a better choose.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 20:37:52 +09:00
Jaegeuk Kim
944fcfc184 f2fs: check the blocksize before calling generic_direct_IO path
The f2fs supports 4KB block size. If user requests dwrite with under 4KB data,
it allocates a new 4KB data block.
However, f2fs doesn't add zero data into the untouched data area inside the
newly allocated data block.

This incurs an error during the xfstest #263 test as follow.

263 12s ... [failed, exit status 1] - output mismatch (see 263.out.bad)
	--- 263.out	2013-03-09 03:37:15.043967603 +0900
	+++ 263.out.bad	2013-12-27 04:20:39.230203114 +0900
	@@ -1,3 +1,976 @@
	QA output created by 263
	fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
	-fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
	+fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
	+truncating to largest ever: 0x12a00
	+truncating to largest ever: 0x75400
	+fallocating to largest ever: 0x79cbf
	...
	(Run 'diff -u 263.out 263.out.bad' to see the entire diff)
	Ran: 263
	Failures: 263
	Failed 1 of 1 tests

It turns out that, when the test tries to write 2KB data with dio, the new dio
path allocates 4KB data block without filling zero data inside the remained 2KB
area. Finally, the output file contains a garbage data for that region.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 20:33:07 +09:00
Jaegeuk Kim
1ec79083b2 f2fs: should put the dnode when NEW_ADDR is detected
When get_dnode_of_data() in get_data_block() returns a successful dnode, we
should put the dnode.
But, previously, if its data block address is equal to NEW_ADDR, we didn't do
that, resulting in a deadlock condition.
So, this patch splits original error conditions with this case, and then calls
f2fs_put_dnode before finishing the function.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 20:33:06 +09:00
Jaegeuk Kim
58bfaf44df f2fs: introduce F2FS_INODE macro to get f2fs_inode
This patch introduces F2FS_INODE that returns struct f2fs_inode * from the inode
page.
By using this macro, we can remove unnecessary casting codes like below.

   struct f2fs_inode *ri = &F2FS_NODE(inode_page)->i;
-> struct f2fs_inode *ri = F2FS_INODE(inode_page);

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 20:32:48 +09:00
Chao Yu
d96b143151 f2fs: check filename length in recover_dentry
In current flow, we will get Null return value of f2fs_find_entry in
recover_dentry when name.len is bigger than F2FS_NAME_LEN, and then we
still add this inode into its dir entry.
To avoid this situation, we must check filename length before we use it.

Another point is that we could remove the code of checking filename length
In f2fs_find_entry, because f2fs_lookup will be called previously to ensure of
validity of filename length.

V2:
 o add WARN_ON() as Jaegeuk Kim suggested.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 12:50:09 +09:00
Chao Yu
deead09009 f2fs: avoid to set wrong pino of inode when rename dir
When we rename a dir to new name which is not exist previous,
we will set pino of parent inode with ino of child inode in f2fs_set_link.
It destroy consistency of pino, it should be fixed.

Thanks for previous work of Shu Tan.

Signed-off-by: Shu Tan <shu.tan@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:42:51 +09:00
Chao Yu
4f4124d0b9 f2fs: update several comments
Update several comments:
1. use f2fs_{un}lock_op install of mutex_{un}lock_op.
2. update comment of get_data_block().
3. update description of node offset.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:26:03 +09:00
Gu Zheng
7e8f23081a f2fs: remove the rw_flag domain from f2fs_io_info
When using the f2fs_io_info in the low level, we still need to merge the
rw and rw_flag, so use the rw to hold all the io flags directly,
and remove the rw_flag field.

ps.It is based on the previous patch:
f2fs: move all the bio initialization into __bio_alloc

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:07 +09:00
Gu Zheng
940a6d34b3 f2fs: move all the bio initialization into __bio_alloc
Move all the bio initialization into __bio_alloc, and some minor cleanups are
also added.

v3:
  Use 'bool' rather than 'int' as Kim suggested.

v2:
  Use 'is_read' rather than 'rw' as Yu Chao suggested.
  Remove the needless initialization of bio->bi_private.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:07 +09:00
Jaegeuk Kim
5459aa9770 f2fs: write dirty meta pages collectively
This patch enhances writing dirty meta pages collectively in background.
During the file data writes, it'd better avoid to write small dirty meta pages
frequently.
So let's give a chance to collect a number of dirty meta pages for a while.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:07 +09:00
Jaegeuk Kim
bfad7c2d40 f2fs: introduce a new direct_IO write path
Previously, f2fs doesn't support direct IOs with high performance, which throws
every write requests via the buffered write path, resulting in highly
performance degradation due to memory opeations like copy_from_user.

This patch introduces a new direct IO path in which every write requests are
processed by generic blockdev_direct_IO() with enhanced get_block function.

The get_data_block() in f2fs handles:
1. if original data blocks are allocates, then give them to blockdev.
2. otherwise,
  a. preallocate requested block addresses
  b. do not use extent cache for better performance
  c. give the block addresses to blockdev

This policy induces that:
- new allocated data are sequentially written to the disk
- updated data are randomly written to the disk.
- f2fs gives consistency on its file meta, not file data.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:07 +09:00
Jaegeuk Kim
216fbd6443 f2fs: introduce sysfs entry to control in-place-update policy
This patch introduces new sysfs entries for users to control the policy of
in-place-updates, namely IPU, in f2fs.

Sometimes f2fs suffers from performance degradation due to its out-of-place
update policy that produces many additional node block writes.
If the storage performance is very dependant on the amount of data writes
instead of IO patterns, we'd better drop this out-of-place update policy.

This patch suggests 5 polcies and their triggering conditions as follows.

[sysfs entry name = ipu_policy]

0: F2FS_IPU_FORCE       all the time,
1: F2FS_IPU_SSR         if SSR mode is activated,
2: F2FS_IPU_UTIL        if FS utilization is over threashold,
3: F2FS_IPU_SSR_UTIL    if SSR mode is activated and FS utilization is over
                        threashold,
4: F2FS_IPU_DISABLE    disable IPU. (=default option)

[sysfs entry name = min_ipu_util]

This parameter controls the threshold to trigger in-place-updates.
The number indicates percentage of the filesystem utilization, and used by
F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.

For more details, see need_inplace_update() in segment.h.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:07 +09:00
Changman Lee
5dcd8a7150 f2fs: missing kmem_cache_destroy for discard_entry
insmod f2fs.ko is failed after insmod and rmmod firstly.

$ sudo insmod fs/f2fs/f2fs.ko
insmod: error inserting 'fs/f2fs/f2fs.ko': -1 Cannot allocate memory

-- dmesg --
kmem_cache_sanity_check (free_nid): Cache name already exists.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:07 +09:00
Jaegeuk Kim
76130ccabc f2fs: fix the location of tracepoint
We need to get a trace before submit_bio, since its bi_sector is remapped during
the submit_bio.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:06 +09:00
Jaegeuk Kim
458e6197c3 f2fs: refactor bio->rw handling
This patch introduces f2fs_io_info to mitigate the complex parameter list.

struct f2fs_io_info {
	enum page_type type;		/* contains DATA/NODE/META/META_FLUSH */
	int rw;				/* contains R/RS/W/WS */
	int rw_flag;			/* contains REQ_META/REQ_PRIO */
}

1. f2fs_write_data_pages
 - DATA
 - WRITE_SYNC is set when wbc->WB_SYNC_ALL.

2. sync_node_pages
 - NODE
 - WRITE_SYNC all the time

3. sync_meta_pages
 - META
 - WRITE_SYNC all the time
 - REQ_META | REQ_PRIO all the time

 ** f2fs_submit_merged_bio() handles META_FLUSH.

4. ra_nat_pages, ra_sit_pages, ra_sum_pages
 - META
 - READ_SYNC

Cc: Fan Li <fanofcode.li@samsung.com>
Cc: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:06 +09:00
Fan Li
63a0b7cb33 f2fs: merge pages with the same sync_mode flag
Previously f2fs submits most of write requests using WRITE_SYNC, but f2fs_write_data_pages
submits last write requests by sync_mode flags callers pass.

This causes a performance problem since continuous pages with different sync flags
can't be merged in cfq IO scheduler(thanks yu chao for pointing it out), and synchronous
requests often take more time.

This patch makes the following modifies to DATA writebacks:

1. every page will be written back using the sync mode caller pass.
2. only pages with the same sync mode can be merged in one bio request.

These changes are restricted to DATA pages.Other types of writebacks are modified
To remain synchronous.

In my test with tiotest, f2fs sequence write performance is improved by about 7%-10% ,
and this patch has no obvious impact on other performance tests.

Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:06 +09:00
Jaegeuk Kim
6bacf52fb5 f2fs: add unlikely() macro for compiler more aggressively
This patch adds unlikely() macro into the most of codes.
The basic rule is to add that when:
- checking unusual errors,
- checking page mappings,
- and the other unlikely conditions.

Change log from v1:
 - Don't add unlikely for the NULL test and error test: advised by Andi Kleen.

Cc: Chao Yu <chao2.yu@samsung.com>
Cc: Andi Kleen <andi@firstfloor.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:06 +09:00
Chao Yu
cfb271d485 f2fs: add unlikely() macro for compiler optimization
As we know, some of our branch condition will rarely be true. So we could add
'unlikely' to let compiler optimize these code, by this way we could drop
unneeded 'jump' assemble code to improve performance.

change log:
 o add *unlikely* as many as possible across the whole source files at once
   suggested by Jaegeuk Kim.

Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:06 +09:00
Chao Yu
b9987a277f f2fs: avoid unneeded page release for correct _count of page
In find_fsync_dnodes() and recover_data(), our flow is like this:

->f2fs_submit_page_bio()
	-> f2fs_put_page()
		-> page_cache_release()	---- page->_count declined to zero.
->__free_pages()
	-> put_page_testzero() ---- page->_count will be declined again.

We will get a segment fault in put_page_testzero when CONFIG_DEBUG_VM
is on, or return MM with a bad page with wrong _count num.

So let's just release this page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:06 +09:00
Chao Yu
a0acdfe05a f2fs: use inner macro GFP_F2FS_ZERO for simplification
Use inner macro GFP_F2FS_ZERO to instead of GFP_NOFS | __GFP_ZERO for
simplification of code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:05 +09:00
Younger Liu
40e1ebe97d f2fs: replace the debugfs_root with f2fs_debugfs_root
This minor change for the naming conventions of debugfs_root
to avoid any possible conflicts to the other filesystem.

Signed-off-by: Younger Liu <younger.liucn@gmail.com>
Cc: Younger Liu <younger.liucn@gmail.com>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
[Jaegeuk Kim: change the patch name]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:05 +09:00
Younger Liu
c524723ebf f2fs: remove debufs dir if debugfs_create_file() failed
When debugfs_create_file() failed in f2fs_create_root_stats(),
debugfs_root should be remove.

Signed-off-by: Younger Liu <liuyiyang@hisense.com>
Cc: Younger Liu <younger.liucn@gmail.com>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:05 +09:00
Chao Yu
9af0ff1c52 f2fs: readahead contiguous pages for restore_node_summary
If cp has no CP_UMOUNT_FLAG, we will read all pages in whole node segment
one by one, it makes low performance. So let's merge contiguous pages and
readahead for better performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: adjust the new bio operations]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:05 +09:00
Jaegeuk Kim
93dfe2ac51 f2fs: refactor bio-related operations
This patch integrates redundant bio operations on read and write IOs.

1. Move bio-related codes to the top of data.c.
2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which handles read
   bios additionally.
3. Introduce __submit_merged_bio to submit the merged bio.
4. Change f2fs_readpage to f2fs_submit_page_bio.
5. Introduce f2fs_submit_page_mbio to integrate previous submit_read_page and
   submit_write_page.

Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com >
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:05 +09:00
Jaegeuk Kim
187b5b8b3d f2fs: remove the own bi_private allocation
Previously f2fs allocates its own bi_private data structure all the time even
though we don't use it. But, can we remove this bi_private allocation?

This patch removes such the additional bi_private allocation.

1. Retrieve f2fs_sb_info from its page->mapping->host->i_sb.
 - This removes the usecases of bi_private in end_io.

2. Use bi_private only when we really need it.
 - The bi_private is used only when the checkpoint procedure is conducted.
 - When conducting the checkpoint, f2fs submits a META_FLUSH bio to wait its bio
completion.
 - Since we have no dependancies to remove bi_private now, let's just use
 bi_private pointer as the completion pointer.

Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:05 +09:00
Chao Yu
8f99a946f3 f2fs: convert recover_orphan_inodes to void
The recover_orphan_inodes() returns no error all the time, so we don't need to
check its errors.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: add description]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:05 +09:00
Chao Yu
1069bbf7b9 f2fs: check return value of f2fs_readpage in find_data_page
We should return error if we do not get an updated page in find_date_page
when f2fs_readpage failed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:04 +09:00
Chao Yu
01d2d1aa06 f2fs: use true and false for boolean variable
The inode_page_locked should be a boolean variable.

struct dnode_of_data {
	struct inode *inode;            /* vfs inode pointer */
	struct page *inode_page;        /* its inode page, NULL is possible */
	struct page *node_page;         /* cached direct node page */
	nid_t nid;                      /* node id of the direct node block */
	unsigned int ofs_in_node;       /* data offset in the node page */
==>	bool inode_page_locked;         /* inode page is locked or not */
	block_t data_blkaddr;           /* block address of the node block */
};

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: add description]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:04 +09:00
Chao Yu
aac44046a2 f2fs: correct type of wait in struct bio_private
The void *wait in bio_private is used for waiting completion of checkpoint bio.
So we don't need to use its type as void, but declare it as completion type.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: add description]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:04 +09:00
Chao Yu
6947eea957 f2fs: avoid to calculate incorrect max orphan number
Because we will write node summaries when do_checkpoint with umount flag,
our number of max orphan blocks should minus NR_CURSEG_NODE_TYPE additional.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Shu Tan <shu.tan@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:04 +09:00
Chao Yu
a66c7b2fcf f2fs: remove unneeded code in punch_hole
Because FALLOC_FL_PUNCH_HOLE flag must be ORed with FALLOC_FL_KEEP_SIZE
in fallocate, so we could remove the useless 'keep size' branch code which
will never be excuted in punch_hole.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
[Jaegeuk Kim: remove an unnecessary parameter togather]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:04 +09:00
Jaegeuk Kim
031fa8cc9b f2fs: remove unnecessary condition checks
This patch removes the unnecessary condition checks on:

fs/f2fs/gc.c:667 do_garbage_collect() warn: 'sum_page' isn't an ERR_PTR
fs/f2fs/f2fs.h:795 f2fs_put_page() warn: 'page' isn't an ERR_PTR

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:04 +09:00
Jaegeuk Kim
f9a4e6df52 f2fs: bug fix on bit overflow from 32bits to 64bits
This patch fixes some bit overflows by the shift operations.

Dan Carpenter reported potential bugs on bit overflows as follows.

fs/f2fs/segment.c:910 submit_write_page()
	warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
fs/f2fs/checkpoint.c:429 get_valid_checkpoint()
	warn: should '1 << ()' be a 64 bit type?
fs/f2fs/data.c:408 f2fs_readpage()
	warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
fs/f2fs/data.c:457 submit_read_page()
	warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
fs/f2fs/data.c:525 get_data_block_ro()
	warn: should 'i << blkbits' be a 64 bit type?

Bug-Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:04 +09:00
Gu Zheng
3679556794 f2fs: fix a potential out of range issue
Fix a potential out of range issue introduced by commit:
22fb72225a
f2fs: simplify write_orphan_inodes for better readable

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:04 +09:00
Jaegeuk Kim
0e80220ac5 f2fs: remove unnecessary return value
Let's remove the unnecessary return value.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:03 +09:00
Huajun Li
8274de77b7 f2fs: add a new mount option: inline_data
Add a mount option: inline_data. If the mount option is set,
data of New created small files can be stored in their inode.

Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:03 +09:00
Huajun Li
1001b3479c f2fs: add flags and helpers to support inline data
Add new inode flags F2FS_INLINE_DATA and FI_INLINE_DATA to indicate
whether the inode has inline data.

Inline data makes use of inode block's data indices region to save small
file. Currently there are 923 data indices in an inode block. Since
inline xattr has made use of the last 50 indices to save its data, there
are 873 indices left which can be used for inline data. When
FI_INLINE_DATA is set, the layout of inode block's indices region is
like below:

+-----------------+
|                 | Reserved. reserve_new_block() will make use of
| i_addr[0]       | i_addr[0] when we need to reserve a new data block
|                 | to convert inline data into regular one's.
|-----------------|
|                 | Used by inline data. A file whose size is less than
| i_addr[1~872]   | 3488 bytes(~3.4k) and doesn't reserve extra
|                 | blocks by fallocate() can be saved here.
|-----------------|
|                 |
| i_addr[873~922] | Reserved for inline xattr
|                 |
+-----------------+

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:03 +09:00
Changman Lee
03232305ff f2fs: send REQ_META or REQ_PRIO when reading meta area
Let's send REQ_META or REQ_PRIO when reading meta area such as NAT/SIT
etc.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:03 +09:00
Jaegeuk Kim
a709f4a2f2 f2fs: add detailed information of bio types in the tracepoints
This patch inserts information of bio types in more detail.
So, we can now see REQ_META and REQ_PRIO too.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:03 +09:00
Huajun Li
b600965c43 f2fs: add a new function: f2fs_reserve_block()
Add the function f2fs_reserve_block() to easily reserve new blocks, and
use it to clean up more codes.

Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:03 +09:00
Jaegeuk Kim
0daaad97dc f2fs: avoid lock debugging overhead
If CONFIG_F2FS_CHECK_FS is unset, we don't need to add any debugging overhead.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:03 +09:00
Chao Yu
74de593af7 f2fs: read contiguous sit entry pages by merging for mount performance
Previously we read sit entries page one by one, this method lost the chance
of reading contiguous page together. So we read pages as contiguous as
possible for better mount performance.

change log:
 o merge judgements/use 'Continue' or 'Break' instead of 'Goto' as Gu Zheng
   suggested.
 o add mark_page_accessed() before release page to delay VM reclaiming.
 o remove '*order' for simplification of function as Jaegeuk Kim suggested.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix a bug on the block address calculation]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:02 +09:00
Chao Yu
d4d288bc72 f2fs: adds a tracepoint for f2fs_submit_read_bio
This patch adds a tracepoint for f2fs_submit_read_bio.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_bio]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:02 +09:00
Chao Yu
87b8872d5b f2fs: adds a tracepoint for submit_read_page
This patch adds a tracepoint for submit_read_page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_page]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:02 +09:00
Changman Lee
61ae45c880 f2fs: simplify IS_DATASEG and IS_NODESEG macro
It is not efficient comparing each segment type to find node or data.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: remove unnecessary white spaces]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:02 +09:00
Jaegeuk Kim
7107e0a9b1 f2fs: merge read IOs at ra_nat_pages()
Change log from v1:
  o add mark_page_accessed() not to reclaim the nat pages.

This patch changes the policy of submitting read bios at ra_nat_pages.

Previously, f2fs submits small read bios with block plugging.
But, with this patch, f2fs itself merges read bios first and then submits a
large bio, which can reduce the bio handling overheads.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:02 +09:00
Chao Yu
924b720b58 f2fs: add a new function to support for merging contiguous read
For better read performance, we add a new function to support for merging
contiguous read as the one for write.

v1-->v2:
 o add declarations here as Gu Zheng suggested.
 o use new structure f2fs_bio_info introduced by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Acked-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
2013-12-23 10:18:02 +09:00
Gu Zheng
ce3b7d80ed f2fs: move the list_head initialization into the lock protection region
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:02 +09:00
Gu Zheng
502c6e0bcd f2fs: simplify write_orphan_inodes for better readable
Simplify write_orphan_inodes for better readable. Because we hold the
orphan_inode_mutex, so it's safe to use list_for_each_entry instead of
list_for_each_safe.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:01 +09:00
Gu Zheng
ef86d70994 f2fs: convert inc/dec_valid_node_count to inc/dec one count
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:01 +09:00
Gu Zheng
da19b0dc50 f2fs: convert dev_valid_block_count to void
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:01 +09:00
Gu Zheng
58e674d6ab f2fs: convert remove_inode_page to void
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:01 +09:00
Jaegeuk Kim
1ff7bd3bb5 f2fs: introduce a bio array for per-page write bios
The f2fs has three bio types, NODE, DATA, and META, and manages some data
structures per each bio types.

The codes are a little bit messy, thus, this patch introduces a bio array
which groups individual data structures as follows.

struct f2fs_bio_info {
	struct bio *bio;		/* bios to merge */
	sector_t last_block_in_bio;	/* last block number */
	struct mutex io_mutex;		/* mutex for bio */
};

struct f2fs_sb_info {
	...
	struct f2fs_bio_info write_io[NR_PAGE_TYPE];	/* for write bios */
	...
};

The code changes from this new data structure are trivial.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:01 +09:00
Jaegeuk Kim
c11abd1a80 f2fs: disable the extent cache ops on high fragmented files
The f2fs manages an extent cache to search a number of consecutive data blocks
very quickly.

However it conducts unnecessary cache operations if the file is highly
fragmented with no valid extent cache.

In such the case, we don't need to handle the extent cache, but just can disable
the cache facility.

Nevertheless, this patch gives one more chance to enable the extent cache.

For example,
1. create a file
2. write data sequentially which produces a large valid extent cache
3. update some data, resulting in a fragmented extent
4. if the fragmented extent is too small, then drop extent cache
5. close the file

6. open the file again
7. give another chance to make a new extent cache
8. write data sequentially again which creates another big extent cache.
...

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:01 +09:00
Jaegeuk Kim
971767caf6 f2fs: use sbi->write_mutex for write bios
This patch removes an unnecessary semaphore (i.e., sbi->bio_sem).
There is no reason to use the semaphore when f2fs submits read and write IOs.
Instead, let's use a write mutex and cover the sbi->bio[] by the lock.

Change log from v1:
 o split write_mutex suggested by Chao Yu

Chao described,
"All DATA/NODE/META bio buffers in superblock is protected by
'sbi->write_mutex', but each bio buffer area is independent, So we
should split write_mutex to three for DATA/NODE/META."

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:01 +09:00
Jaegeuk Kim
7d5e510944 f2fs: clean up the do_submit_bio flow
This patch introduces PAGE_TYPE_OF_BIO() and cleans up do_submit_bio() with it.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:00 +09:00
Chao Yu
75c3c8bc88 f2fs: use f2fs_put_page to release page for uniform style
We should use f2fs_put_page to release page for uniform style of f2fs code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:00 +09:00
Jaegeuk Kim
1661d07c2d f2fs: add a tracepoint for f2fs_issue_discard
This patch adds a tracepoint for f2fs_issue_discard.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:00 +09:00
Jaegeuk Kim
3720887910 f2fs: introduce f2fs_issue_discard() to clean up
Change log from v1:
 o fix 32bit drops reported by Dan Carpenter

This patch adds f2fs_issue_discard() to clean up blkdev_issue_discard() flows.

Dan carpenter reported:
"block_t is a 32 bit type and sector_t is a 64 bit type.  The upper 32
bits of the sector_t are not used because the shift will wrap."

Bug-Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:00 +09:00
Jaegeuk Kim
7ac8c3b051 f2fs: add a sysfs entry to control max_discards
If frequent small discards are issued to the device, the performance would
be degraded significantly.
So, this patch adds a sysfs entry to control the number of discards to be
issued during a checkpoint procedure.

By default, f2fs does not issue any small discards, which means max_discards
is zero.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:00 +09:00
Jaegeuk Kim
b29555505d f2fs: add key functions for small discards
This patch adds key functions to activate the small discard feature.

Note that this procedure is conducted during the checkpoint only.

In flush_sit_entries(), when a new dirty sit entry is flushed, f2fs calls
add_discard_addrs() which searches candidates to be discarded.
The candidates should be marked *invalidated* and also previous checkpoint
recognizes it as *valid*.

At the end of a checkpoint procedure, f2fs throws discards based on the
discard entry list.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:00 +09:00
Jaegeuk Kim
7fd9e544fb f2fs: add a slab cache entry for small discards
This patch adds a slab cache entry for small discards.

Each entry consists of:

struct discard_entry {
	struct list_head list;	/* list head */
	block_t blkaddr;	/* block address to be discarded */
	int len;		/* # of consecutive blocks of the discard */
};

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:00 +09:00
Changman Lee
e81c93cf8c f2fs: improve searching speed of __next_free_blkoff
To find a zero bit using the result of OR operation between ckpt_valid_map
and cur_valid_map is more fast than find a zero bit in each bitmap.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: adjust changed function name]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:17:59 +09:00
Changman Lee
9a7f143ab5 f2fs: introduce __find_rev_next(_zero)_bit
When f2fs_set_bit is used, in a byte MSB and LSB is reversed,
in that case we can use __find_rev_next_bit or __find_rev_next_zero_bit.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: change the function names]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:17:59 +09:00
Kent Overstreet
4f024f3797 block: Abstract out bvec iterator
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6
2013-11-23 22:33:47 -08:00
Kent Overstreet
2c30c71bd6 block: Convert various code to bio_for_each_segment()
With immutable biovecs we don't want code accessing bi_io_vec directly -
the uses this patch changes weren't incorrect since they all own the
bio, but it makes the code harder to audit for no good reason - also,
this will help with multipage bvecs later.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-23 22:33:46 -08:00
Changman Lee
29e59c14ae f2fs: issue more large discard command
o Changes from v1
  Use find_next(_zero)_bit suggested by jg.kim

When f2fs issues discard command, if segment is contiguous,
let's issue more large segment to gather adjacent segments.

** blktrace **
179,1    0     5859    42.619023770   971  C   D 131072 + 2097152 [0]
179,1    0    33665   108.840475468   971  C   D 2228224 + 2494464 [0]
179,1    0    33671   109.131616427   971  C   D 14909440 + 344064 [0]
179,1    0    33677   109.137100677   971  C   D 15261696 + 4096 [0]

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-11 09:36:32 +09:00
Chao Yu
1d15bd2034 f2fs: fix memory leak after kobject init failed in fill_super
If we failed to init&add kobject when fill_super, stats info and proc object of
f2fs will not be released.
We should free them before we finish fill_super.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-08 14:10:29 +09:00
Changman Lee
fb51b5ef9c f2fs: cleanup waiting routine for writeback pages in cp
use genernal method supported by kernel

 o changes from v1
   If any waiter exists at end io, wake up it.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-08 14:10:29 +09:00
Chao Yu
3b03f72445 f2fs: avoid to use a NULL point in destroy_segment_manager
A NULL point should avoid to be used in destroy_segment_manager after allocating
memory fail for f2fs_sm_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-06 16:37:44 +09:00
Chao Yu
4bf08ff6f9 f2fs: remove unnecessary TestClearPageError when wait pages writeback
In wait_on_node_pages_writeback we will test and clear error flag for all
pages in radix tree, but not necessary.
So we only do this for pages belong to the specified inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-04 12:24:01 +09:00
Jaegeuk Kim
cfe58f9dcd f2fs: avoid to wait all the node blocks during fsync
Previously, f2fs_sync_file() waits for all the node blocks to be written.
But, we don't need to do that, but wait only the inode-related node blocks.

This patch adds wait_on_node_pages_writeback() in which waits inode-related
node blocks that are on writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-31 16:01:03 +09:00
Chao Yu
44c60bf2b9 f2fs: check all ones or zeros bitmap with bitops for better mount performance
Previously, check_block_count check valid_map with bit data type in common
scenario that sit has all ones or zeros bitmap, it makes low mount performance.
So let's check the special bitmap with integer data type instead of the bit one.

v1-->v2:
 o use find_next_{zero_}bit_le for better performance and readable as Jaegeuk
   suggested.
 o use neat logogram in comment as Gu Zheng suggested.
 o search continuous ones or zeros for better performance when checking mixed
   bitmap.

Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Shu Tan <shu.tan@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-30 12:23:23 +09:00
Fan Li
9a47938b22 f2fs: change the method of calculating the number summary blocks
npages_for_summary_flush uses (SUMMARY_SIZE + 1) as the size of a f2fs_summary
while its actual size is  SUMMARY_SIZE. So the result sometimes is bigger than
actual number by one, which causes checkpoint can't be written into disk
contiguously, and sometimes summary blocks can't be compacted like they should.
Besides, when writing summary blocks into pages, if remain space in a page
isn't big enough for one f2fs_summary, it will be left unused, current code
seems not to take it into account.

Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-30 12:17:58 +09:00
Chao Yu
cc3de6a3ac f2fs: fix calculating incorrect free size when update xattr in __f2fs_setxattr
During xattr updating, free size should be corrected to remainder free size
+ old entry size.
It can avoid ENOSPC error when we update old entry with the same size new
entry at fully filled xattr.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-29 15:56:08 +09:00
Jaegeuk Kim
5d56b6718a f2fs: add an option to avoid unnecessary BUG_ONs
If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS
in your kernel config.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-29 15:44:38 +09:00
Jaegeuk Kim
3b218e3a21 f2fs: introduce CONFIG_F2FS_CHECK_FS for BUG_ON control
This config will support an option to remove so many BUG_ONs that degrade
the performance potentially.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-29 15:43:01 +09:00
Jaegeuk Kim
2ed2d5b33c f2fs: fix a deadlock during init_acl procedure
The deadlock is found through the following scenario.

sys_mkdir()
 -> f2fs_add_link()
  -> __f2fs_add_link()
   -> init_inode_metadata()
     : lock_page(inode);
    -> f2fs_init_acl()
     -> f2fs_set_acl()
      -> f2fs_setxattr(..., NULL)
       : This NULL page incurs a deadlock at update_inode_page().

So, likewise f2fs_init_security(), this patch adds a parameter to transfer the
locked inode page to f2fs_setxattr().

Found by Linux File System Verification project (linuxtesting.org).

Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-28 13:39:09 +09:00
Jaegeuk Kim
b8b60e1a65 f2fs: clean up acl flow for better readability
This patch cleans up a couple of acl codes.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-28 13:38:21 +09:00
Changman Lee
4625d6aac2 f2fs: remove unnecessary segment bitmap updates
Only one dirty type is set in __locate_dirty_segment and we can know
dirty type of segment. So we don't need to check other dirty types.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-28 13:38:16 +09:00
Jaegeuk Kim
e943a10d94 f2fs: add tracepoint for vm_page_mkwrite
This patch adds a tracepoint for f2fs_vm_page_mkwrite.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:40 +09:00
Jaegeuk Kim
26c6b88799 f2fs: add tracepoint for set_page_dirty
This patch adds a tracepoint for set_page_dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:40 +09:00
Chao Yu
e8d61a7488 f2fs: remove redundant set_page_dirty from write_compacted_summaries
Previously, set_page_dirty is called every time after writting one summary info
into compacted summary page,
To avoid redundant set_page_dirty, we only call set_page_dirty before release
page.

Signed-off-by: Yu Chao <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:39 +09:00
Jaegeuk Kim
ea91e9b043 f2fs: add reclaiming control by sysfs
This patch adds a control method in sysfs to reclaim prefree segments.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:39 +09:00
Jaegeuk Kim
4660f9c0fe f2fs: introduce f2fs_balance_fs_bg for some background jobs
This patch merges some background jobs into this new function.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:38 +09:00