Commit Graph

105 Commits

Author SHA1 Message Date
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Al Viro
21fc61c73c don't put symlink bodies in pagecache into highmem
kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases.  page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:36 -05:00
Al Viro
9cdce3c074 ufs: get rid of ->setattr() for symlinks
It was to needed for a couple of months in 2010, until UFS
quota support got dropped.  Since then it's equivalent to
simple_setattr() (i.e. the default) for everything except the
regular files.  And dropping it there allows to convert all
UFS symlinks to {page,simple}_symlink_inode_operations, getting
rid of fs/ufs/symlink.c completely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 20:43:26 -05:00
Al Viro
4e317ce73a ufs_inode_get{frag,block}(): get rid of 'phys' argument
Just pass NULL as locked_page in case of first block in the indirect
chain.  Old calling conventions aside, a reason for having 'phys'
was that ufs_inode_getfrag() used to be able to do _two_ allocations
- indirect block and extending/reallocating a tail.  We needed
locked_page for the latter (it's a data), but we also needed to
figure out that indirect block is metadata.  So we used to pass
non-NULL locked_page in all cases *and* used NULL phys as
indication of being asked to allocate an indirect.

With tail unpacking taken into a separate function we don't need
those convolutions anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:05 -04:00
Al Viro
0385f1f9e3 ufs_getfrag_block(): tidy up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:04 -04:00
Al Viro
5fbfb238f7 ufs_inode_getblock(): failure to read an indirect block is -EIO
... and not "write to beginning of the disk", TYVM...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:03 -04:00
Al Viro
4eeff4c932 ufs_getfrag_block(): turn following indirects into a loop
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:02 -04:00
Al Viro
5336970be0 ufs_inode_getfrag(): pass index instead of 'fragment'
same story as with ufs_inode_getblock()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:01 -04:00
Al Viro
0f3c1294be ufs_inode_getfrag(): split extending the partial blocks off
ufs_extend_tail() is handling that now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:00 -04:00
Al Viro
619cfac091 ufs_inode_getblock(): pass indirect block number and full index
... instead of messing with buffer_head.  We can bloody well do
sb_bread() in there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:59 -04:00
Al Viro
721435a767 ufs_inode_getblock(): pass index instead of 'fragment'
The value passed to ufs_inode_getblock() as the 3rd argument
had lower bits ignored; the upper bits were shifted down
and used and they actually make sense - those are _lower_ bits
of index in indirect block (i.e. they form the index within
a fragment within an indirect block).

Pass those as argument.  Upper bits of index (i.e. the number
of fragment within indirect block) will join them shortly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:58 -04:00
Al Viro
177848a018 ufs_inode_get{frag,block}(): leave sb_getblk() to caller
just return the damn block number

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:57 -04:00
Al Viro
8d9dcf1436 ufs_getfrag_block(): get rid of macro jungles
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:56 -04:00
Al Viro
bbb3eb9d34 ufs_inode_get{frag,block}(): consolidate success exits
These calling conventions are rudiments of pre-2.3 times; they
really need to be sanitized.  This is the first step; next
will be _always_ returning a block number, instead of this
"return a pointer to buffer_head, except when we get to the
actual data" crap.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:55 -04:00
Al Viro
71dd42846f ufs: use the branch depth in ufs_getfrag_block()
we'd already calculated it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:54 -04:00
Al Viro
4b7068c8b1 ufs: move calculation of offsets into ufs_getfrag_block()
... and massage ufs_frag_map() to take those instead of fragment number.

As it is, we duplicate the damn thing on the write side, open-coded and
bloody hard to follow.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:53 -04:00
Al Viro
5a39c25562 ufs_inode_get{frag,block}(): get rid of retries
We are holding ->truncate_mutex, so nobody else can alter our
block pointers.  Rechecks/retries were needed back when we
only held BKL there, and had to cope with write_begin/writepage
and writepage/truncate races.  Can't happen anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:52 -04:00
Al Viro
f53bd1421b __ufs_truncate_blocks(): avoid excessive dirtying of indirect blocks
There's a case when an indirect block gets dirtied for no good
reason - when there's a hole starting in the middle of area
covered by it and spanning past its end, and truncate() is done
precisely to the beginning of the hole.

The block is obviously not modified at all - all removals happen
beyond it.  However, existing code ends up dirtying it just in
case.  It's trivial to fix and while it's not a real bug by any
stretch of imagination, it makes the damn thing harder to follow.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:51 -04:00
Al Viro
cc7231e309 free_full_branch(): don't bother modifying the block we are going to free
Note that it's already made unreachable from the inode, so we don't have
to worry about ufs_frag_map() walking into something already freed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:50 -04:00
Al Viro
b6eede0ec6 move marking inode dirty to the end of __ufs_truncate_blocks()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:49 -04:00
Al Viro
163073db51 free_full_branch(): saner calling conventions
Have caller fetch the block number *and* remove it from wherever
it was.  Pass the block number instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:48 -04:00
Al Viro
7b4e4f7f81 ufs_trunc_branch(): kill recursion
turn recursion into a pair of loops

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:47 -04:00
Al Viro
6aab6dd379 ufs_trunc_branch(): massage towards killing recursion
We always have 0 < depth2 <= depth in there, so
if (--depth) {
	if (--depth2)
		A
	B
} else {
	C // not using depth2
}
D // not using depth2

is equivalent to

if (--depth2)
	A with s/depth/depth - 1/
if (--depth)
	B
else
	C
D

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:46 -04:00
Al Viro
6d1ebbca2b split ufs_truncate_branch() into full- and partial-branch variants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:45 -04:00
Al Viro
a138b4b688 ufs: unify the logics for collecting adjacent data blocks to free
open-coded in several places...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:44 -04:00
Al Viro
a96574233c ufs_trunc_branch(): separate the calls with non-NULL offsets
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:43 -04:00
Al Viro
97e0f8f87c ufs_trunc_branch(): never call with offsets != NULL && depth2 == 0
For calls in __ufs_truncate_blocks() it's just a matter of not
incrementing offsets[0] and not making that call - immediately
following loop will be executed one extra time and we'll be just
fine.  For recursive call in ufs_trunc_branch() itself, just
assing NULL to offsets if we would be about to make such call.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:42 -04:00
Al Viro
42432739b5 __ufs_trunc_blocks(): turn the part after switch into a loop
... and turn the switch into if (), since all cases with
depth != 1 have just become identical.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:41 -04:00
Al Viro
ef3a315d4c __ufs_truncate_blocks(): unify freeing the full branches
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:40 -04:00
Al Viro
9e0fbbde27 unify ufs_trunc_..indirect()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:39 -04:00
Al Viro
6775e24d9c ufs_trunc_..indirect(): more massage towards unifying
Instead of manually checking that the array contains only zeroes,
find the position of the last non-zero (in __ufs_truncate(), where
we can conveniently do that) and use that to tell if there's
any non-zero in the array tail passed to ufs_trunc_...indirect().

The goal of all that clumsiness is to get fold these functions
together.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:38 -04:00
Al Viro
85416288bf ufs_trunc_...indirect(): pass the array of indices instead of offsets
rather than bitslicing the offset just formed as sum of shifted indices,
pass the array of those indices itself.  NULL is used as equivalent
of "all zeroes" (== free the entire branch).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:37 -04:00
Al Viro
7a4fdda724 __ufs_truncate(); find cutoff distances into branches by offsets[] array
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:36 -04:00
Al Viro
7bad5939fc ufs_trunc_dindirect(): pass the number of blocks to keep
same as the previous two.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:35 -04:00
Al Viro
6ac36b8777 ufs_trunc_indirect(): pass the index of the first pointer to free
... instead of file offset.  Same cleanups as in the tindirect
conversion in previous commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:34 -04:00
Al Viro
18ca51d821 ufs_trunc_tindirect(): pass the number of blocks to keep
IOW, the distance of cutoff from the begining of the branch
(in blocks).

That (and the fact that block just prior to cutoff is guaranteed to
be present) allows to tell whether to free triple indirect block
just by looking at the offset.

While we are at it, using u64 for index in the block is wrong -
those should be unsigned int.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:33 -04:00
Al Viro
31cd043e1a ufs: beginning of __ufs_truncate_block() massage
Use ufs_block_to_path() to find the cutoff path in the block pointers' tree.
For now just use the information about the depth (to bypass the fully
preserved subtrees); subsequent commits will use the information about actual
path.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:32 -04:00
Al Viro
4e3911f3d7 ufs: the offsets ufs_block_to_path() puts into array are not sector_t
type makes no sense - those are indices in block number arrays, not
block numbers.  And no, UFS is not likely to grow indirect blocks with
4Gpointers in them...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:31 -04:00
Al Viro
010d331fc3 ufs: move truncate code into inode.c
It is closely tied to block pointers handling there, can benefit
from existing helpers, etc. - no point keeping them apart.

Trimmed the trailing whitespaces in inode.c at the same time.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:30 -04:00
Al Viro
724bb09fdc ufs: don't use lock_ufs() for block pointers tree protection
* stores to block pointers are under per-inode seqlock (meta_lock) and
mutex (truncate_mutex)
* fetches of block pointers are either under truncate_mutex, or wrapped
into seqretry loop on meta_lock
* all changes of ->i_size are under truncate_mutex and i_mutex
* all changes of ->i_lastfrag are under truncate_mutex

It's similar to what ext2 is doing; the main difference is that unlike
ext2 we can't rely upon the atomicity of stores into block pointers -
on UFS2 they are 64bit.  So we can't cut the corner when switching
a pointer from NULL to non-NULL as we could in ext2_splice_branch()
and need to use meta_lock on all modifications.

We use seqlock where ext2 uses rwlock; ext2 could probably also benefit
from such change...

Another non-trivial difference is that with UFS we *cannot* have reader
grab truncate_mutex in case of race - it has to keep retrying.  That
might be possible to change, but not until we lift tail unpacking
several levels up in call chain.

After that commit we do *NOT* hold fs-wide serialization on accesses
to block pointers anymore.  Moreover, lock_ufs() can become a normal
mutex now - it's only used on statfs, remount and sync_fs and none
of those uses are recursive.  As the matter of fact, *now* it can be
collapsed with ->s_lock, and be eventually replaced with saner
per-cylinder-group spinlocks, but that's a separate story.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:25 -04:00
Al Viro
3b7a3a05e8 ufs: free excessive blocks upon ->write_begin() failure/short copy
Broken in "[PATCH] ufs: truncate should allocate block for last byte";
all way back in 2006.  ufs_setattr() hadn't been the only user of
vmtruncate() and eliminating ->truncate() method required corrections
in a bunch of places.  Eventually those places had migrated into
->write_begin() failure exit and ->write_end() after short copy...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:21 -04:00
Al Viro
d622f167b8 ufs: switch ufs_evict_inode() to trimmed-down variant of ufs_truncate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:20 -04:00
Al Viro
f3e0f3da1b ufs: kill more lock_ufs() calls
a) move it inside ufs_truncate()
b) ufs_free_inode() doesn't need it - it's serialized on ->s_lock
c) ufs_write_inode() doesn't need it either (and can be called without
it anyway).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:19 -04:00
Al Viro
4ef51e8b7a Merge branch 'for-linus' into for-next 2015-06-17 14:44:05 -04:00
Fabian Frederick
13b987ea27 fs/ufs: revert "ufs: fix deadlocks introduced by sb mutex merge"
This reverts commit 9ef7db7f38 ("ufs: fix deadlocks introduced by sb
mutex merge") That patch tried to solve commit 0244756edc ("ufs: sb
mutex merge + mutex_destroy") which is itself partially reverted due to
multiple deadlocks.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Suggested-by: Jan Kara <jack@suse.cz>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Roger Pau Monne <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-06-14 11:31:51 -04:00
Al Viro
4b8061a67f ufs: switch to simple_follow_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-10 22:18:25 -04:00
Alexey Khoroshilov
9ef7db7f38 ufs: fix deadlocks introduced by sb mutex merge
Commit 0244756edc ("ufs: sb mutex merge + mutex_destroy") introduces
deadlocks in ufs_new_inode() and ufs_free_inode().
Most callers of that functions acqure the mutex by themselves and
ufs_{new,free}_inode() do that via lock_ufs(),
i.e we have an unavoidable double lock.

The patch proposes to resolve the issue by making sure that
ufs_{new,free}_inode() are not called with the mutex held.

Found by Linux Driver Verification project (linuxtesting.org).

Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-07 13:26:39 -04:00
Fabian Frederick
edc023caf4 fs/ufs/inode.c: kernel-doc warning fixes
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:21 -07:00
Johannes Weiner
91b0abe36a mm + fs: store shadow entries in page cache
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Kirill A. Shutemov
7caef26767 truncate: drop 'oldsize' truncate_pagecache() parameter
truncate_pagecache() doesn't care about old size since commit
cedabed49b ("vfs: Fix vmtruncate() regression").  Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:02 -07:00