Commit Graph

289 Commits

Author SHA1 Message Date
Jaegeuk Kim
36abef4e79 f2fs: introduce mode=lfs mount option
This mount option is to enable original log-structured filesystem forcefully.
So, there should be no random writes for main area.

Especially, this supports host-managed SMR device.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-13 11:55:21 -07:00
Jaegeuk Kim
9a449e9c3b f2fs: remove obsolete parameter in f2fs_truncate
We don't need lock parameter, which is always true.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-07 09:45:39 -07:00
Jaegeuk Kim
26de9b1171 f2fs: avoid unnecessary updating inode during fsync
If roll-forward recovery can recover i_size, we don't need to update inode's
metadata during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:13 -07:00
Jaegeuk Kim
ee6d182f2a f2fs: remove syncing inode page in all the cases
This patch reduces to call them across the whole tree.
- sync_inode_page()
- update_inode_page()
- update_inode()
- f2fs_write_inode()

Instead, checkpoint will flush all the dirty inode metadata before syncing
node pages.
Note that, this is doable, since we call mark_inode_dirty_sync() for all
inode's field change which needs to update on-disk inode as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:12 -07:00
Jaegeuk Kim
205b98221c f2fs: call mark_inode_dirty_sync for i_field changes
This patch calls mark_inode_dirty_sync() for the following on-disk inode
changes.

 -> largest
 -> ctime/mtime/atime
 -> i_current_depth
 -> i_xattr_nid
 -> i_pino
 -> i_advise
 -> i_flags
 -> i_mode

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:11 -07:00
Jaegeuk Kim
fc9581c809 f2fs: introduce f2fs_i_size_write with mark_inode_dirty_sync
This patch introduces f2fs_i_size_write() to call mark_inode_dirty_sync() with
i_size_write().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:08 -07:00
Jaegeuk Kim
91942321e4 f2fs: use inode pointer for {set, clear}_inode_flag
This patch refactors to use inode pointer for set_inode_flag and
clear_inode_flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-06-02 18:05:07 -07:00
Linus Torvalds
f6c658df63 Enhancement
- fs-specific prefix for fscrypto
 - fault injection facility
 - expose validity bitmaps for user to be aware of fragmentation
 - fallocate/rm/preallocation speed up
 - use percpu counters
 
 Bug fixes
 - some inline_dentry/inline_data bugs
 - error handling for atomic/volatile/orphan inodes
 - recover broken superblock
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXQPu4AAoJEEAUqH6CSFDSILgP/1dj6fmtytr8c+55EBqXUGpo
 M7rS93JTxlmU5BduIo9psJsEquTQoVEmxB/Gjd+ZnI5R6Rp1c/REaP0ba374rEhZ
 ecMQh5QqzM1gRNFXrQhWFEL/KtfRqt3T80zebQP7pxFUm/m9NGMLWT43RzQ8AAhr
 Y3P0NLdvxA4HAnipKptkPJcGZQlWnL9W/MR+LgsXLXqLDwJHkVu61GcF0y2ibcJM
 lEtIRmyH5tg7hP5c5LTw9pKQFHkIZt5cHFLjrJ1x8FSm2TXOcJPbjOrThvcb+NKK
 e0O+6R0meH2eMpak+BTkZp2YbPPyXOb1N00j//lmbPjCoJPd4ZuiJ+oRoHUlTxtU
 FhO67t0brlDbMFQVRFrtv8VA8M6by+DTAAP3Ffx62I/TJkphKANCSoyQRhlWtxxO
 kRU69N7ipnRNxO4WCv40FjaQjSIElCKysP1POazRmAOQm7UFTGT9Nj37+eqUcEPJ
 HZ7O61DEHNemb0SMlJ8WSClstt0yUU+2cjRfTPAr2Wd3V8gYbRs0QUg5M2GLgywR
 EmiJfpkXse3f/nR8W6g1hganSOXA0AZX+EUibed6VkV3oYemdFbm8OymeEmLmWpM
 y2F3D7dPLW7MCoTXJqtwFWdoDwI+zkH4rJaPGTq5TVBRWVU/njX8OvoB47pOvKV1
 kccL7zv2PekE1hSDO5WF
 =6MSp
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, as Ted pointed out, fscrypto allows one more key prefix
  given by filesystem to resolve backward compatibility issues.  Other
  than that, we've fixed several error handling cases by introducing
  a fault injection facility.  We've also achieved performance
  improvement in some workloads as well as a bunch of bug fixes.

  Summary:

  Enhancements:
   - fs-specific prefix for fscrypto
   - fault injection facility
   - expose validity bitmaps for user to be aware of fragmentation
   - fallocate/rm/preallocation speed up
   - use percpu counters

  Bug fixes:
   - some inline_dentry/inline_data bugs
   - error handling for atomic/volatile/orphan inodes
   - recover broken superblock"

* tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (73 commits)
  f2fs: fix to update dirty page count correctly
  f2fs: flush pending bios right away when error occurs
  f2fs: avoid ENOSPC fault in the recovery process
  f2fs: make exit_f2fs_fs more clear
  f2fs: use percpu_counter for total_valid_inode_count
  f2fs: use percpu_counter for alloc_valid_block_count
  f2fs: use percpu_counter for # of dirty pages in inode
  f2fs: use percpu_counter for page counters
  f2fs: use bio count instead of F2FS_WRITEBACK page count
  f2fs: manipulate dirty file inodes when DATA_FLUSH is set
  f2fs: add fault injection to sysfs
  f2fs: no need inc dirty pages under inode lock
  f2fs: fix incorrect error path handling in f2fs_move_rehashed_dirents
  f2fs: fix i_current_depth during inline dentry conversion
  f2fs: correct return value type of f2fs_fill_super
  f2fs: fix deadlock when flush inline data
  f2fs: avoid f2fs_bug_on during recovery
  f2fs: show # of orphan inodes
  f2fs: support in batch fzero in dnode page
  f2fs: support in batch multi blocks preallocation
  ...
2016-05-21 18:25:28 -07:00
Andy Shevchenko
8da4b8c48e lib/uuid.c: move generate_random_uuid() to uuid.c
Let's gather the UUID related functions under one hood.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Jaegeuk Kim
38f91ca8c0 f2fs: flush pending bios right away when error occurs
Given errors, this patch flushes pending bios as soon as possible.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-20 11:46:15 -07:00
Jaegeuk Kim
1beba1b3a9 f2fs: use percpu_counter for # of dirty pages in inode
This patch adds percpu_counter for # of dirty pages in inode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-18 13:57:28 -07:00
Linus Torvalds
c2e7b20705 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs cleanups from Al Viro:
 "More cleanups from Christoph"

* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nfsd: use RWF_SYNC
  fs: add RWF_DSYNC aand RWF_SYNC
  ceph: use generic_write_sync
  fs: simplify the generic_write_sync prototype
  fs: add IOCB_SYNC and IOCB_DSYNC
  direct-io: remove the offset argument to dio_complete
  direct-io: eliminate the offset argument to ->direct_IO
  xfs: eliminate the pos variable in xfs_file_dio_aio_write
  filemap: remove the pos argument to generic_file_direct_write
  filemap: remove pos variables in generic_file_read_iter
2016-05-17 15:05:23 -07:00
Chao Yu
6e9619499f f2fs: support in batch fzero in dnode page
This patch tries to speedup fzero_range by making space preallocation and
address removal of blocks in one dnode page as in batch operation.

In virtual machine, with zram driver:

dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=4096
time xfs_io -f /mnt/f2fs/file -c "fzero 0 4096M"

Before:
real	0m3.276s
user	0m0.008s
sys	0m3.260s

After:
real	0m1.568s
user	0m0.000s
sys	0m1.564s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: consider ENOSPC case]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-11 09:56:36 -07:00
Chao Yu
0fac558b96 f2fs: make atomic/volatile operation exclusive
atomic/volatile ioctl interfaces are exposed to user like other file
operation interface, it needs to make them getting exclusion against
to each other to avoid potential conflict among these operations
in concurrent scenario.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-11 09:56:34 -07:00
Chao Yu
7fb17fe44b f2fs: use mnt_{want,drop}_write_file in ioctl
In interfaces of ioctl, mnt_{want,drop}_write_file should be used for:
- get exclusion against file system freezing which may used by lvm
  snapshot.
- do telling filesystem that a write is about to be performed on it, and
  make sure that the writes are permitted.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-11 09:56:32 -07:00
Jaegeuk Kim
79344efb93 f2fs: read node blocks ahead when truncating blocks
This patch enables reading node blocks in advance when truncating large
data blocks.

 > time rm $MNT/testfile (500GB) after drop_cachees
Before : 9.422 s
After  : 4.821 s

Reported-by: Stephen Bates <stephen.bates@microsemi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:44:56 -07:00
Jaegeuk Kim
e12dd7bd87 f2fs: fallocate data blocks in single locked node page
This patch is to improve the expand_inode speed in fallocate by allocating
data blocks as many as possible in single locked node page.

In SSD,
 # time fallocate -l 500G $MNT/testfile

Before : 1m 33.410 s
After  : 24.758 s

Reported-by: Stephen Bates <stephen.bates@microsemi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:44:55 -07:00
Chao Yu
09210c973a f2fs: avoid panic when truncating to max filesize
The following panic occurs when truncating inode which has inline
xattr to max filesize.

[<ffffffffa013d3be>] get_dnode_of_data+0x4e/0x580 [f2fs]
[<ffffffffa013aca1>] ? read_node_page+0x51/0x90 [f2fs]
[<ffffffffa013ad99>] ? get_node_page.part.34+0xb9/0x170 [f2fs]
[<ffffffffa01235b1>] truncate_blocks+0x131/0x3f0 [f2fs]
[<ffffffffa01238e3>] f2fs_truncate+0x73/0x100 [f2fs]
[<ffffffffa01239d2>] f2fs_setattr+0x62/0x2a0 [f2fs]
[<ffffffff811a72c8>] notify_change+0x158/0x300
[<ffffffff8118a42b>] do_truncate+0x6b/0xa0
[<ffffffff8118e539>] ? __sb_start_write+0x49/0x100
[<ffffffff8118a798>] do_sys_ftruncate.constprop.12+0x118/0x170
[<ffffffff8118a82e>] SyS_ftruncate+0xe/0x10
[<ffffffff8169efcf>] tracesys+0xe1/0xe6
[<ffffffffa0139ae0>] get_node_path+0x210/0x220 [f2fs]
 <ffff880206a89ce8>
--[ end trace 5fea664dfbcc6625 ]---

The reason is truncate_blocks tries to truncate all node and data blocks
start from specified block offset with value of (max filesize / block
size), but actually, our valid max block offset is (max filesize / block
size) - 1, so f2fs detects such invalid block offset with BUG_ON in
truncation path.

This patch lets f2fs skip truncating data which is exceeding max
filesize.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-05-07 10:32:33 -07:00
Christoph Hellwig
e259221763 fs: simplify the generic_write_sync prototype
The kiocb already has the new position, so use that.  The only interesting
case is AIO, where we currently don't bother updating ki_pos.  We're about
to free the kiocb after we're done, so we might as well update it to make
everyone's life simpler.

While we're at it also return the bytes written argument passed in if
we were successful so that the boilerplate error switch code in the
callers can go away.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-01 19:58:39 -04:00
Christoph Hellwig
dde0c2e798 fs: add IOCB_SYNC and IOCB_DSYNC
This will allow us to do per-I/O sync file writes, as required by a lot
of fileservers or storage targets.

XXX: Will need a few additional audits for O_DSYNC

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-01 19:58:39 -04:00
Jaegeuk Kim
608514deba f2fs: set fsync mark only for the last dnode
In order to give atomic writes, we should consider power failure during
sync_node_pages in fsync.
So, this patch marks fsync flag only in the last dnode block.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:24:59 -07:00
Jaegeuk Kim
c267ec1526 f2fs: report unwritten status in fsync_node_pages
The fsync_node_pages should return pass or failure so that user could know
fsync is completed or not.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:24:54 -07:00
Jaegeuk Kim
5268137564 f2fs: split sync_node_pages with fsync_node_pages
This patch splits the existing sync_node_pages into (f)sync_node_pages.
The fsync_node_pages is used for f2fs_sync_file only.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26 14:24:48 -07:00
Jaegeuk Kim
c27753d675 f2fs: flush dirty pages before starting atomic writes
If somebody wrote some data before atomic writes, we should flush them in order
to handle atomic data in a right period.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15 08:50:25 -07:00
Jaegeuk Kim
26dc3d4424 f2fs: unset atomic/volatile flag in f2fs_release_file
The atomic/volatile operation should be done in pair of start and commit
ioctl.
For example, if a killed process remains open-ended atomic operation, we should
drop its flag as well as its atomic data. Otherwise, if sqlite initiates another
operation which doesn't require atomic writes, it will lose every data, since
f2fs still treats with them as atomic writes; nobody will trigger its commit.

Reported-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15 08:49:47 -07:00
Jaegeuk Kim
de5307e46d f2fs: fix dropping inmemory pages in a wrong time
When one reader closes its file while the other writer is doing atomic writes,
f2fs_release_file drops atomic data resulting in an empty commit.
This patch fixes this wrong commit problem by checking openess of the file.

 Process0                       Process1
 				open file
 start atomic write
 write data
 read data
				close file
				f2fs_release_file()
				clear atomic data
 commit atomic write

Reported-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-15 08:49:47 -07:00
Jaegeuk Kim
33b1395124 f2fs: use dget_parent and file_dentry in f2fs_file_open
This patch synced with the below two ext4 crypto fixes together.

In 4.6-rc1, f2fs newly introduced accessing f_path.dentry which crashes
overlayfs. To fix, now we need to use file_dentry() to access that field.

Fixes: c0a37d4878 ("ext4: use file_dentry()")
Fixes: 9dd78d8c9a ("ext4: use dget_parent() in ext4_file_open()")
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-12 10:24:22 -07:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Jaegeuk Kim
8074bb5150 f2fs crypto: sync ext4_lookup and ext4_file_open
This patch tries to catch up with lookup and open policies in ext4.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:42 -07:00
Jaegeuk Kim
0b81d07790 fs crypto: move per-file encryption from f2fs tree to fs/crypto
This patch adds the renamed functions moved from the f2fs crypto files.

1. definitions for per-file encryption used by ext4 and f2fs.

2. crypto.c for encrypt/decrypt functions
 a. IO preparation:
  - fscrypt_get_ctx / fscrypt_release_ctx
 b. before IOs:
  - fscrypt_encrypt_page
  - fscrypt_decrypt_page
  - fscrypt_zeroout_range
 c. after IOs:
  - fscrypt_decrypt_bio_pages
  - fscrypt_pullback_bio_page
  - fscrypt_restore_control_page

3. policy.c supporting context management.
 a. For ioctls:
  - fscrypt_process_policy
  - fscrypt_get_policy
 b. For context permission
  - fscrypt_has_permitted_context
  - fscrypt_inherit_context

4. keyinfo.c to handle permissions
  - fscrypt_get_encryption_info
  - fscrypt_free_encryption_info

5. fname.c to support filename encryption
 a. general wrapper functions
  - fscrypt_fname_disk_to_usr
  - fscrypt_fname_usr_to_disk
  - fscrypt_setup_filename
  - fscrypt_free_filename

 b. specific filename handling functions
  - fscrypt_fname_alloc_buffer
  - fscrypt_fname_free_buffer

6. Makefile and Kconfig

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:33 -07:00
Chao Yu
f28b3434af f2fs: introduce f2fs_update_data_blkaddr for cleanup
Add a new help f2fs_update_data_blkaddr to clean up redundant codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-26 11:52:01 -08:00
Chao Yu
80dd9c0e9d f2fs: fix incorrect upper bound when iterating inode mapping tree
1. Inode mapping tree can index page in range of [0, ULONG_MAX], however,
in some places, f2fs only search or iterate page in ragne of [0, LONG_MAX],
result in miss hitting in page cache.

2. filemap_fdatawait_range accepts range parameters in unit of bytes, so
the max range it covers should be [0, LLONG_MAX], if we use [0, LONG_MAX]
as range for waiting on writeback, big number of pages will not be covered.

This patch corrects above two issues.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-25 17:27:03 -08:00
Chao Yu
ae10866864 f2fs crypto: handle unexpected lack of encryption keys
This patch syncs f2fs with commit abdd438b26 ("ext4 crypto: handle
unexpected lack of encryption keys") from ext4.

Fix up attempts by users to try to write to a file when they don't
have access to the encryption key.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
28bc106b23 f2fs: support revoking atomic written pages
f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file

With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.

But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.

So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.

If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
29b96b547e f2fs: split drop_inmem_pages from commit_inmem_pages
Split drop_inmem_pages from commit_inmem_pages for code readability,
and prepare for the following modification.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
8ef2af45ae f2fs: increase i_size to avoid missing data
When finsert is doing with dirting pages, we should increase i_size right away.
Otherwise, the moved page is able to be dropped by the following
filemap_write_and_wait_range before updating i_size.
Especially, it can be done by
	if ((page->index >= end_index + 1) || !offset)
		goto out;
in f2fs_write_data_page.

This should resolve the below xfstests/091 failure reported by Dave.

$ diff -u tests/generic/091.out /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad
--- tests/generic/091.out       2014-01-20 16:57:33.000000000 +1100
+++ /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad       2016-02-08 15:21:02.701375087 +1100
@@ -1,7 +1,18 @@
 QA output created by 091
 fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
+mapped writes DISABLED
+skipping insert range behind EOF
+skipping insert range behind EOF
+truncating to largest ever: 0x11e00
+dowrite: write: Invalid argument
+LOG DUMP (7 total operations):
+1(  1 mod 256): SKIPPED (no operation)
+2(  2 mod 256): SKIPPED (no operation)
+3(  3 mod 256): FALLOC   0x2e0f2 thru 0x3134a  (0x3258 bytes) PAST_EOF
+4(  4 mod 256): SKIPPED (no operation)
+5(  5 mod 256): SKIPPED (no operation)
+6(  6 mod 256): TRUNCATE UP    from 0x0 to 0x11e00
+7(  7 mod 256): WRITE    0x73400 thru 0x79fff  (0x6c00 bytes) HOLE
+Log of operations saved to "/mnt/test/junk.fsxops"; replay with --replay-ops
+Correct content saved for comparison
+(maybe hexdump "/mnt/test/junk" vs "/mnt/test/junk.fsxgood")

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
b439b103a6 f2fs: move dio preallocation into f2fs_file_write_iter
This patch moves preallocation code for direct IOs into f2fs_file_write_iter.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
da85985c61 f2fs: speed up handling holes in fiemap
This patch makes f2fs_map_blocks supporting returning next potential
page offset which skips hole region in indirect tree of inode, and
use it to speed up fiemap in handling big hole case.

Test method:
xfs_io -f /mnt/f2fs/file  -c "pwrite 1099511627776 4096"
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"

Before:
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
/mnt/f2fs/file:
 EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
   0: [0..2147483647]:         hole             2147483648
   1: [2147483648..2147483655]: 81920..81927         8   0x1

real    3m3.518s
user    0m0.000s
sys     3m3.456s

After:
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
/mnt/f2fs/file:
 EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
   0: [0..2147483647]:         hole             2147483648
   1: [2147483648..2147483655]: 81920..81927         8   0x1

real    0m0.008s
user    0m0.000s
sys     0m0.008s

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
3cf4574705 f2fs: introduce get_next_page_offset to speed up SEEK_DATA
When seeking data in ->llseek, if we encounter a big hole which covers
several dnode pages, we will try to seek data from index of page which
is the first page of next dnode page, at most we could skip searching
(ADDRS_PER_BLOCK - 1) pages.

However it's still not efficient, because if our indirect/double-indirect
pointer are NULL, there are no dnode page locate in the tree indirect/
double-indirect pointer point to, it's not necessary to search the whole
region.

This patch introduces get_next_page_offset to calculate next page offset
based on current searching level and max searching level returned from
get_dnode_of_data, with this, we could skip searching the entire area
indirect or double-indirect node block is not exist.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Chao Yu
81ca7350ce f2fs: remove unneeded pointer conversion
There are redundant pointer conversion in following call stack:
 - at position a, inode was been converted to f2fs_file_info.
 - at position b, f2fs_file_info was been converted to inode again.

 - truncate_blocks(inode,..)
  - fi = F2FS_I(inode)		---a
  - ADDRS_PER_PAGE(node_page, fi)
   - addrs_per_inode(fi)
    - inode = &fi->vfs_inode	---b
    - f2fs_has_inline_xattr(inode)
     - fi = F2FS_I(inode)
     - is_inode_flag_set(fi,..)

In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and
addrs_per_inode to acept parameter with type of inode pointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Jaegeuk Kim
fec1d6576c f2fs: use wait_for_stable_page to avoid contention
In write_begin, if storage supports stable_page, we don't need to wait for
writeback to update its contents.
This patch introduces to use wait_for_stable_page instead of
wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22 16:07:23 -08:00
Al Viro
5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Jaegeuk Kim
447135a866 f2fs: should unset atomic flag after successful commit
If there is an error during commit, we should keep the flag in order to
abort it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-11 15:56:44 -08:00
Jaegeuk Kim
d0239e1bf5 f2fs: detect idle time depending on user behavior
This patch adds last time that user requested filesystem operations.
This information is used to detect whether system is idle or not later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-11 15:56:37 -08:00
Jaegeuk Kim
2c4db1a6f6 f2fs: clean up f2fs_balance_fs
This patch adds one parameter to clean up all the callers of f2fs_balance_fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:45:23 -08:00
Jaegeuk Kim
2a4b8e9fab f2fs: remove redundant calls
This patch removes redundant calls.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:45:23 -08:00
Jaegeuk Kim
c46a155bdf f2fs: use IPU for fdatasync
This patch fixes missing IPU condition when fdatasync is called.
With this patch, fdatasync is able to avoid additional node writes for recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 13:49:17 -08:00
Jaegeuk Kim
732d56489f f2fs: fix f2fs_ioc_abort_volatile_write
There are two rules to handle aborting volatile or atomic writes.

1. drop atomic writes
 - we don't need to keep any stale db data.

2. write journal data
 - we should keep the journal data with fsync for db recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:53:25 -08:00
Chao Yu
e96248bb45 f2fs: clean up f2fs_ioc_write_checkpoint
Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unused err variable]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:18 -08:00
Chao Yu
6d5a1495ee f2fs: let user being aware of IO error
Sometimes we keep dumb when IO error occur in lower layer device, so user
will not receive any error return value for some operation, but actually,
the operation did not succeed.

This sould be avoided, so this patch reports such kind of error to user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:15 -08:00
Chao Yu
c34f42e2cb f2fs: report error of do_checkpoint
do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.

So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:09 -08:00
Jaegeuk Kim
2a34076070 f2fs: call f2fs_balance_fs only when node was changed
If user tries to update or read data, we don't need to call f2fs_balance_fs
which triggers f2fs_gc, which increases unnecessary long latency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:09 -08:00
Jaegeuk Kim
b9d777b85f f2fs: check inline_data flag at converting time
We can check inode's inline_data flag  when calling to convert it.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:05 -08:00
Chao Yu
e8240f656d f2fs: don't grab super block buffer header all the time
We have already got one copy of valid super block in memory, do not grab
buffer header of super block all the time.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:06 -08:00
Fan Li
a1c1e9b74f f2fs: fix to reset variable correctlly
f2fs_map_blocks will set m_flags and m_len to 0, so we don't need to
reset m_flags ourselves, but have to reset m_len to correct value
before use it again.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16 08:58:04 -08:00
Chao Yu
a49324f127 f2fs: rename {add,remove,release}_dirty_inode to {add,remove,release}_ino_entry
remove_dirty_dir_inode will be renamed to remove_dirty_inode as a generic
function in following patch for removing directory/regular/symlink inode
in global dirty list.

Here rename ino management related functions for readability, also in
order to avoid name conflict.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-15 13:23:43 -08:00
Fan Li
e8931582bd f2fs: fix to update variable correctly when skip a unmapped block
map.m_len should be reduced after skip a block

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-14 10:17:54 -08:00
Fan Li
d8fe4f0e74 f2fs: write only the pages in range during defragment
@lend of filemap_write_and_wait_range is supposed to be a "offset
in bytes where the range ends (inclusive)". Subtract 1 to avoid
writing an extra page.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-14 10:17:54 -08:00
Jaegeuk Kim
ea212a4a7a f2fs: use lock_buffer when changing superblock
When modifying sb contents, we need to use lock its buffer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-09 09:53:18 -08:00
Chao Yu
0cab80ee0c f2fs: fix to convert inline inode in ->setattr
In commit 3c45414527 ("f2fs: do not trim preallocated blocks when
truncating after i_size"), in order to follow the regulation: "truncate(x)
where x > i_size will not trim all blocks past i_size." like other file
systems, in ->setattr we invoked truncate_setsize instead of f2fs_truncate
to avoid unneeded block trimming in such case, but forgot to call
f2fs_convert_inline_inode keep consistency of inline data conversion rule.

This patch fixes to convert inline data if necessary.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 13:14:44 -08:00
Chao Yu
3519e3f992 f2fs: use sbi->blocks_per_seg to avoid unnecessary calculation
Use sbi->blocks_per_seg directly to avoid unnecessary calculation when using
1 << sbi->log_blocks_per_seg.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 12:07:57 -08:00
Chao Yu
04ef4b626c f2fs: fix to enable missing ioctl interfaces in ->compat_ioctl
In 64-bit kernel f2fs can supports 32-bit ioctl system call by identifying
encoded code which is converted from 32-bit one to 64-bit one in
->compat_ioctl.

When we introduced new interfaces in ->ioctl, we forgot to enable them in
->compat_ioctl, so enable them for fixing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix wrongly added spaces together]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:34 -08:00
Chao Yu
d323d005ac f2fs: support file defragment
This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file
defragment in a specified range of regular file.

This ioctl can be used in very limited workload: if user expects high
sequential read performance in randomly written file, this interface
can be used for defragmentation, after that file can be written as
continuous as possible in the device.

Meanwhile, it has side-effect, it will make holes in segments where
blocks located originally, so it's better to trigger GC to eliminate
fragment in segments.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04 11:52:33 -08:00
Jaegeuk Kim
d7b8b384b0 f2fs: fix leakage of inmemory atomic pages
If we got failure during commit_atomic_write, abort_volatile_write will be
called, but will not drop the inmemory pages due to no FI_ATOMIC_FILE.
Actually, there is no reason to check the flag in abort_volatile_write.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-21 19:04:17 -07:00
Chao Yu
08b39fbd59 f2fs crypto: fix racing of accessing encrypted page among
different competitors

Since we use different page cache (normally inode's page cache for R/W
and meta inode's page cache for GC) to cache the same physical block
which is belong to an encrypted inode. Writeback of these two page
cache should be exclusive, but now we didn't handle writeback state
well, so there may be potential racing problem:

a)
kworker:				f2fs_gc:
 - f2fs_write_data_pages
  - f2fs_write_data_page
   - do_write_data_page
    - write_data_page
     - f2fs_submit_page_mbio
(page#1 in inode's page cache was queued
in f2fs bio cache, and be ready to write
to new blkaddr)
					 - gc_data_segment
					  - move_encrypted_block
					   - pagecache_get_page
					(page#2 in meta inode's page cache
					was cached with the invalid datas
					of physical block located in new
					blkaddr)
					   - f2fs_submit_page_mbio
					(page#1 was submitted, later, page#2
					with invalid data will be submitted)

b)
f2fs_gc:
 - gc_data_segment
  - move_encrypted_block
   - f2fs_submit_page_mbio
(page#1 in meta inode's page cache was
queued in f2fs bio cache, and be ready
to write to new blkaddr)
					user thread:
					 - f2fs_write_begin
					  - f2fs_submit_page_bio
					(we submit the request to block layer
					to update page#2 in inode's page cache
					with physical block located in new
					blkaddr, so here we may read gabbage
					data from new blkaddr since GC hasn't
					writebacked the page#1 yet)

This patch fixes above potential racing problem for encrypted inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-13 09:52:34 -07:00
Jaegeuk Kim
a56c7c6fb3 f2fs: set GFP_NOFS for grab_cache_page
For normal inodes, their pages are allocated with __GFP_FS, which can cause
filesystem calls when reclaiming memory.
This can incur a dead lock condition accordingly.

So, this patch addresses this problem by introducing
f2fs_grab_cache_page(.., bool for_write), which calls
grab_cache_page_write_begin() with AOP_FLAG_NOFS.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 13:38:03 -07:00
Jaegeuk Kim
6e2c64ad7c f2fs: fix SSA updates resulting in corruption
The f2fs_collapse_range and f2fs_insert_range changes the block addresses
directly. But that can cause uncovered SSA updates.
In that case, we need to give up to change the block addresses and do buffered
writes to keep filesystem consistency.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 13:38:02 -07:00
Jaegeuk Kim
c912a8298c f2fs: add F2FS_GOING_DOWN_METAFLUSH to test power-failure
This patch introduces F2FS_GOING_DOWN_METAFLUSH which flushes meta pages like
SSA blocks and then blocks all the writes.
This can be used by power-failure tests.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 13:37:54 -07:00
Chao Yu
456b88e4d1 f2fs: introduce a new ioctl F2FS_IOC_WRITE_CHECKPOINT
This patch introduce a new ioctl for those users who want to trigger
checkpoint from userspace through ioctl.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:56 -07:00
Chao Yu
d530d4d8e2 f2fs: support synchronous gc in ioctl
This patch drops in batches gc triggered through ioctl, since user
can easily control the gc by designing the loop around the ->ioctl.

We support synchronous gc by forcing using FG_GC in f2fs_gc, so with
it, user can make sure that in this round all blocks gced were
persistent in the device until ioctl returned.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:56 -07:00
Chao Yu
345a6b2ee2 f2fs: fix to update {m,c}time correctly when truncating larger
This patch fixes to update ctime and atime correctly when truncating
larger in ->setattr.

The bug is reported by xfstest generic/313 as below:

generic/313 2s ... - output mismatch (see ./results/generic/313.out.bad)
    --- tests/generic/313.out   2015-08-04 15:28:53.430798882 +0800
    +++ results/generic/313.out.bad   2015-09-28 17:04:27.294278016 +0800
    @@ -1,2 +1,4 @@
     QA output created by 313
     Silence is golden
    +ctime not updated after truncate up
    +mtime not updated after truncate up
    ...
    (Run 'diff -u tests/generic/313.out tests/generic/313.out.bad'  to see the entire diff)
Ran: generic/313
Failures: generic/313
Failed 1 of 1 tests

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Chao Yu
ea58711e88 f2fs: do in batches truncation in truncate_hole
truncate_data_blocks_range can do in batches truncation which makes all
changes in dnode page content, dnode page status, extent cache, block
count updating together.

But previously, truncate_hole() always truncates one block in dnode page
at a time by invoking truncate_data_blocks_range(,1), which make thing
slow.

This patch changes truncate_hole() to do in batches truncation for all
target blocks in one direct node inside truncate_data_blocks_range, which
can make our punch hole operation in ->fallocate more efficent.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
9edcdabf36 f2fs: fix overflow of size calculation
We have potential overflow issue when calculating size of object, when
we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
32-bits space in 32-bit architecture, left shifting will incur overflow,
i.e:

pgoff_t index =  0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000

So we should cast index with 64-bits type to avoid this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
c998012b0b f2fs: verify file type early in f2fs_fallocate
This patch changes to verify file type early in f2fs_fallocate for
cleanup, meanwhile this also fixes to add missing verification for
expand_inode_data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Nicholas Krause
538e17e7e9 f2fs: fix incorrect return statement in the function f2fs_ioc_release_volatile_write
This fixes the incorrect return statement at the end of the function
f2fs_ioc_release_volatile_write's body for returning zero as this is
incorrect due to the function call before this return statement to
the function punch_hole being able to fail and we should return this
function's return fail directly in order to signal to callers of the
function f2fs_ioc_release_volatile if a failure arises with this call
to punch_hole fails.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Chao Yu
19b2c30d3c f2fs: update extent tree in batches
This patch introduce a new helper f2fs_update_extent_tree_range which can
do extent mapping update at a specified range.

The main idea is:
1) punch all mapping info in extent node(s) which are at a specified range;
2) try to merge new extent mapping with adjacent node, or failing that,
   insert the mapping into extent tree as a new node.

In order to see the benefit, I add a function for stating time stamping
count as below:

uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return (uint64_t)hi << 32 | lo;
}

My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd.

truncation path:	update extent cache from truncate_data_blocks_range
non-truncataion path:	update extent cache from other paths
total:			all update paths

a) Removing 128MB file which has one extent node mapping whole range of
file:
1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
2. sync
3. rm /mnt/f2fs/128M

Before:
		total		count		average
truncation:	7651022		32768		233.49

Patched:
		total		count		average
truncation:	3321		33		100.64

b) fsstress:
fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
Test times:		5 times.

Before:
		total		count		average
truncation:	5812480.6	20911.6		277.95
non-truncation:	7783845.6	13440.8		579.12
total:		13596326.2	34352.4		395.79

Patched:
		total		count		average
truncation:	1281283.0	3041.6		421.25
non-truncation:	7355844.4	13662.8		538.38
total:		8637127.4	16704.4		517.06

1) For the updates in truncation path:
 - we can see updating in batches leads total tsc and update count reducing
   explicitly;
 - besides, for a single batched updating, punching multiple extent nodes
   in a loop, result in executing more operations, so our average tsc
   increase intensively.
2) For the updates in non-truncation path:
 - there is a little improvement, that is because for the scenario that we
   just need to update in the head or tail of extent node, new interface
   optimize to update info in extent node directly, rather than removing
   original extent node for updating and then inserting that updated one
   into cache as new node.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-26 11:50:35 -07:00
Chao Yu
b01548919c f2fs: handle f2fs_truncate error correctly
This patch fixes to return error number of f2fs_truncate, so that we
can handle the error correctly in callers.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-24 09:39:56 -07:00
Jaegeuk Kim
2286c0205d f2fs: fix to cover lock_op for update_inode_page
Previously, update_inode_page is not called under f2fs_lock_op.
Instead we should call with f2fs_write_inode.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20 09:00:11 -07:00
Chao Yu
6394328ab8 f2fs: report error of fill_zero
fill_zero can fail due to a lot of reason, but previously we do not handle
its return value, so its callers such as punch_hole/f2fs_zero_range may
report success, but actually can fail because of error occurs inside
fill_zero.

This patch fixes to report correct return value of fill_zero.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-10 12:26:34 -07:00
Jaegeuk Kim
edb27deea7 f2fs: handle error cases in commit_inmem_pages
This patch adds to handle error cases in commit_inmem_pages.
If an error occurs, it stops to write the pages and return the error right
away.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 08:08:15 -07:00
Chao Yu
f4c9c743ac f2fs: convert inline data before set atomic/volatile flag
In f2fs_ioc_start_{atomic,volatile}_write, if we failed in converting
inline data, we will report error to user, but still remain atomic/volatile
flag in inode, it will impact further writes for this file. Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 08:08:13 -07:00
Chao Yu
a5f64b6aa6 f2fs: fix to wait all atomic written pages writeback
This patch fixes the incorrect range (0, LONG_MAX) which is used
in ranged fsync. If we use LONG_MAX as the parameter for indicating
the end of file we want to synchronize, in 32-bits architecture
machine, these datas after 4GB offset may not be persisted in
storage after ->fsync returned.

Here, we alter LONG_MAX to LLONG_MAX to fix this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 08:08:12 -07:00
Chao Yu
55f57d2c42 f2fs: fix double lock in handle_failed_inode
In handle_failed_inode, there is a potential deadlock which can happen
in below call path:

- f2fs_create
 - f2fs_lock_op   down_read(cp_rwsem)
 - f2fs_add_link
  - __f2fs_add_link
   - init_inode_metadata
    - f2fs_init_security    failed
    - truncate_blocks    failed
 - handle_failed_inode
  - f2fs_truncate
   - truncate_blocks(..,true)
					- write_checkpoint
					 - block_operations
					  - f2fs_lock_all  down_write(cp_rwsem)
    - f2fs_lock_op   down_read(cp_rwsem)

So in this path, we pass parameter to f2fs_truncate to make sure
cp_rwsem in truncate_blocks will not be locked again.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 08:08:09 -07:00
Chao Yu
ecbaa4068f f2fs: reduce region of cp_rwsem covered in f2fs_do_collapse
In f2fs_do_collapse, region cp_rwsem covered is large, since it will be
held until all blocks are left shifted, so if we try to collapse small
area at the beginning of large file, checkpoint who want to grab writer's
lock of cp_rwsem will be delayed for long time.

In order to avoid this condition, altering to lock/unlock cp_rwsem each
shift operation.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 08:08:09 -07:00
Chao Yu
5b3391244d f2fs: warm up cold page after mmaped write
With cost-benifit method, background gc will consider old section with
fewer valid blocks as candidate victim, these old blocks in section will
be treated as cold data, and laterly will be moved into cold segment.

But if the gcing page is attached by user through buffered or mmaped
write, we should reset the page as non-cold one, because this page may
have more opportunity for further updating.

So fix to add clearing code for the missed 'mmap' case.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:59 -07:00
Chao Yu
c1c1b58359 f2fs: add new ioctl F2FS_IOC_GARBAGE_COLLECT
When background gc is off, the only way to trigger gc is executing
a force gc in some operations who wants to grab space in disk.

The executing condition is limited: to execute force gc, we should
wait for the time when there is almost no more free section for LFS
allocation. This seems not reasonable for our user who wants to
control triggering gc by himself.

This patch introduces F2FS_IOC_GARBAGE_COLLECT interface for
triggering garbage collection by using ioctl. It provides our users
one more option to trigger gc.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:58 -07:00
Jaegeuk Kim
97a7b2c274 f2fs: convert inline_data for various fallocate
For newly added fallocate types, it should convert inline_data before handling
block swapping.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:53 -07:00
Jaegeuk Kim
6282adbf93 f2fs: call set_page_dirty to attach i_wb for cgroup
The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback
is called, __inc_wb_stat() updates i_wb's stat.

So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to
any writebacking pages.

This patch should resolve the following kernel panic reported by Andreas Reis.

https://bugzilla.kernel.org/show_bug.cgi?id=101801

--- Comment #2 from Andreas Reis <andreas.reis@gmail.com> ---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
PGD 2951ff067 PUD 2df43f067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu #1
Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS
T01 02/03/2015
task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>]
__percpu_counter_add+0x1a/0x90
RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
Call Trace:
 [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49
89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca
65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a
RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
 RSP <ffff880295143ac8>
CR2: 00000000000000a8
---[ end trace 5132449a58ed93a3 ]---
note: gcc[10356] exited with preempt_count 2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-07-25 08:54:26 -07:00
Chao Yu
3c45414527 f2fs: do not trim preallocated blocks when truncating after i_size
When we perform generic/092 in xfstests, output is like below:

     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     0: [0..10239]: data
     0: [0..10239]: data
    -1: [10240..20479]: unwritten
    +1: [10240..14335]: unwritten

This is because with this testcase, we redefine the regulation for
truncate in perallocated space past i_size as below:

"There was some confused about what the fs was supposed to do when you
truncate at i_size with preallocated space past i_size. We decided on the
following things.

1) truncate(i_size) will trim all blocks past i_size.
2) truncate(x) where x > i_size will not trim all blocks past i_size.
"

This method is used in xfs, and then ext4/btrfs will follow the rule.

This patch fixes to follow the new rule for f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-11 18:30:49 -07:00
Jaegeuk Kim
de6a8ec982 f2fs: drop the volatile_write flag only
When aborting volatile_writes, let's drop its flag and give up any further
volatile_writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-09 13:56:47 -07:00
Chao Yu
c5bda1c8b1 f2fs: skip committing valid superblock
In recovery procedure for superblock, we try to write data of valid
superblock into invalid one for recovery, work should be finished here,
but then still we will write the valid one with its original data.
This operation is not needed. Let's skip doing this unnecessary work.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-08 11:20:51 -07:00
Chao Yu
f62185d0e2 f2fs: support FALLOC_FL_INSERT_RANGE
FALLOC_FL_INSERT_RANGE flag for ->fallocate was introduced in commit
dd46c78778 ("fs: Add support FALLOC_FL_INSERT_RANGE for fallocate").

The effect of FALLOC_FL_INSERT_RANGE command is the opposite of
FALLOC_FL_COLLAPSE_RANGE, if this command was performed, all data from
offset to EOF in our file will be shifted to right as given length, and
then range [offset, offset + length] becomes a hole.

This command is useful for our user who wants to add some data in the
middle of the file, for example: video/music editor will insert a keyframe
in specified position of media file, with this command we can easily create
a hole for inserting without removing original data.

This patch introduces f2fs_insert_range() to support FALLOC_FL_INSERT_RANGE.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-02 09:53:27 -07:00
Chao Yu
528e34593d f2fs: hide common code in f2fs_replace_block
This patch clean up codes through:
1.rename f2fs_replace_block to __f2fs_replace_block().
2.introduce new f2fs_replace_block() to include __f2fs_replace_block()
and some common related codes around __f2fs_replace_block().

Then, newly introduced function f2fs_replace_block can be used by
following patch.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-02 09:52:07 -07:00
Chao Yu
4637fd11ff f2fs crypto: do not set encryption policy for non-directory by ioctl
Encryption policy should only be set to an empty directory through ioctl,
This patch add a judgement condition to verify type of the target inode
to avoid incorrectly configuring for non-directory.

Additionally, remove unneeded inline data conversion since regular or symlink
file should not be processed here.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:07 -07:00
Jaegeuk Kim
e7d5545285 f2fs crypto: add filename encryption for roll-forward recovery
This patch adds a bit flag to indicate whether or not i_name in the inode
is encrypted.

If this name is encrypted, we can't do recover_dentry during roll-forward.
So, f2fs_sync_file() needs to do checkpoint, if this will be needed in future.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:55 -07:00
Jaegeuk Kim
4375a33664 f2fs crypto: add encryption support in read/write paths
This patch adds encryption support in read and write paths.

Note that, in f2fs, we need to consider cleaning operation.
In cleaning procedure, we must avoid encrypting and decrypting written blocks.
So, this patch implements move_encrypted_block().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:52 -07:00
Jaegeuk Kim
fcc85a4d86 f2fs crypto: activate encryption support for fs APIs
This patch activates the following APIs for encryption support.

The rules quoted by ext4 are:
 - An unencrypted directory may contain encrypted or unencrypted files
   or directories.
 - All files or directories in a directory must be protected using the
   same key as their containing directory.
 - Encrypted inode for regular file should not have inline_data.
 - Encrypted symlink and directory may have inline_data and inline_dentry.

This patch activates the following APIs.
1. f2fs_link              : validate context
2. f2fs_lookup            :      ''
3. f2fs_rename            :      ''
4. f2fs_create/f2fs_mkdir : inherit its dir's context
5. f2fs_direct_IO         : do buffered io for regular files
6. f2fs_open              : check encryption info
7. f2fs_file_mmap         :      ''
8. f2fs_setattr           :      ''
9. f2fs_file_write_iter   :      ''           (Called by sys_io_submit)
10. f2fs_fallocate        : do not support fcollapse
11. f2fs_evict_inode      : free_encryption_info

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:51 -07:00
Jaegeuk Kim
f424f664f0 f2fs crypto: add encryption policy and password salt support
This patch adds encryption policy and password salt support through ioctl
implementation.

It adds three ioctls:
 F2FS_IOC_SET_ENCRYPTION_POLICY,
 F2FS_IOC_GET_ENCRYPTION_POLICY,
 F2FS_IOC_GET_ENCRYPTION_PWSALT, which use xattr operations.

Note that, these definition and codes are taken from ext4 crypto support.
For f2fs, xattr operations and on-disk flags for superblock and inode were
changed.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:48 -07:00
Chao Yu
75cd4e098d f2fs: support FALLOC_FL_ZERO_RANGE
Now, FALLOC_FL_ZERO_RANGE flag in ->fallocate is supported in ext4/xfs.

In commit, the semantics of this flag is descripted as following:"
1) Make sure that both offset and len are block size aligned.
2) Update the i_size of inode by len bytes.
3) Compute the file's logical block number against offset. If the computed
   block number is not the starting block of the extent, split the extent
   such that the block number is the starting block of the extent.
4) Shift all the extents which are lying between
   [offset, last allocated extent] towards right by len bytes. This step
   will make a hole of len bytes at offset."

This patch implements fallocate's FALLOC_FL_ZERO_RANGE for f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:43 -07:00
Chao Yu
b4ace33703 f2fs: support FALLOC_FL_COLLAPSE_RANGE
Now, FALLOC_FL_COLLAPSE_RANGE flag in ->fallocate is supported in ext4/xfs.

In commit, the semantics of this flag is descripted as following:"
1) It collapses the range lying between offset and length by removing any
   data blocks which are present in this range and than updates all the
   logical offsets of extents beyond "offset + len" to nullify the hole
   created by removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned
   in case of xfs and ext4.
4) Collaspe range does not work beyond i_size."

This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:42 -07:00