Use folios throughout the release_folio path.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
If we need a release_folio, we can add it back.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Use folios throughout the release_folio paths.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
The use of folios should be pushed further down into jfs from here.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Use a folio throughout hfsplus_release_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
While converting f2fs_release_page() to f2fs_release_folio(), cache the
sb_info so we don't need to retrieve it twice, and remove the redundant
call to set_page_private(). The use of folios should be pushed further
into f2fs from here.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
The use of folios should be pushed deeper into ext4 from here.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Use a folio in erofs_managed_cache_release_folio(), but use of folios
should be pushed into erofs_try_to_free_cached_page().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Use a folio throughout cifs_release_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Use a folio throughout ceph_release_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
I've only converted the outer layers of the btrfs release_folio paths
to use folios; the use of folios should be pushed further down into
btrfs from here.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
A straightforward conversion as they already work in terms of folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
A straightforward conversion as it already works in terms of folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Change all the filesystems which used iomap_releasepage to use the
new function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
This replaces aops->releasepage. Update the documentation, and call it
if it exists.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
When running the stress-ng clone benchmark with multiple testing threads,
it was found that there were significant spinlock contention in sget_fc().
The contended spinlock was the sb_lock. It is under heavy contention
because the following code in the critcal section of sget_fc():
hlist_for_each_entry(old, &fc->fs_type->fs_supers, s_instances) {
if (test(old, fc))
goto share_extant_sb;
}
After testing with added instrumentation code, it was found that the
benchmark could generate thousands of ipc namespaces with the
corresponding number of entries in the mqueue's fs_supers list where the
namespaces are the key for the search. This leads to excessive time in
scanning the list for a match.
Looking back at the mqueue calling sequence leading to sget_fc():
mq_init_ns()
=> mq_create_mount()
=> fc_mount()
=> vfs_get_tree()
=> mqueue_get_tree()
=> get_tree_keyed()
=> vfs_get_super()
=> sget_fc()
Currently, mq_init_ns() is the only mqueue function that will indirectly
call mqueue_get_tree() with a newly allocated ipc namespace as the key for
searching. As a result, there will never be a match with the exising ipc
namespaces stored in the mqueue's fs_supers list.
So using get_tree_keyed() to do an existing ipc namespace search is just a
waste of time. Instead, we could use get_tree_nodev() to eliminate the
useless search. By doing so, we can greatly reduce the sb_lock hold time
and avoid the spinlock contention problem in case a large number of ipc
namespaces are present.
Of course, if the code is modified in the future to allow
mqueue_get_tree() to be called with an existing ipc namespace instead of a
new one, we will have to use get_tree_keyed() in this case.
The following stress-ng clone benchmark command was run on a 2-socket
48-core Intel system:
./stress-ng --clone 32 --verbose --oomable --metrics-brief -t 20
The "bogo ops/s" increased from 5948.45 before patch to 9137.06 after
patch. This is an increase of 54% in performance.
Link: https://lkml.kernel.org/r/20220121172315.19652-1-longman@redhat.com
Fixes: 935c6912b1 ("ipc: Convert mqueue fs to fs_context")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When processing a "file" entry, gen_init_cpio attempts to allocate a
buffer large enough to stage the entire contents of the source file. It
then attempts to fill the buffer via a single read() call and subsequently
writes out the entire buffer length, without checking that read() returned
the full length, potentially writing uninitialized buffer memory.
Fix this by breaking up file I/O into 64k chunks and only writing the
length returned by the prior read() call.
Link: https://lkml.kernel.org/r/20220404093429.27570-5-ddiss@suse.de
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
initramfs cpio mtime preservation, as implemented in commit 889d51a107
("initramfs: add option to preserve mtime from initramfs cpio images"),
uses a linked list to defer directory mtime processing until after all
other items in the cpio archive have been processed. This is done to
ensure that parent directory mtimes aren't overwritten via subsequent
child creation.
The lkml link below indicates that the mtime retention use case was for
embedded devices with applications running exclusively out of initramfs,
where the 32-bit mtime value provided a rough file version identifier.
Linux distributions which discard an extracted initramfs immediately after
the root filesystem has been mounted may want to avoid the unnecessary
overhead.
This change adds a new INITRAMFS_PRESERVE_MTIME Kconfig option, which can
be used to disable on-by-default mtime retention and in turn speed up
initramfs extraction, particularly for cpio archives with large directory
counts.
Benchmarks with a one million directory cpio archive extracted 20 times
demonstrated:
mean extraction time (s) std dev
INITRAMFS_PRESERVE_MTIME=y 3.808 0.006
INITRAMFS_PRESERVE_MTIME unset 3.056 0.004
The above extraction times were measured using ftrace (initcall_finish -
initcall_start) values for populate_rootfs() with initramfs_async
disabled.
[ddiss@suse.de: rebase atop dir_entry.name flexible array member and drop separate initramfs_mtime.h header]
Link: https://lkml.org/lkml/2008/9/3/424
Link: https://lkml.kernel.org/r/20220404093429.27570-4-ddiss@suse.de
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "initramfs: "crc" cpio format and INITRAMFS_PRESERVE_MTIME", v7.
This patchset does some minor initramfs refactoring and allows cpio entry
mtime preservation to be disabled via a new Kconfig
INITRAMFS_PRESERVE_MTIME option.
Patches 4/6 to 6/6 implement support for creation and extraction of "crc"
cpio archives, which carry file data checksums. Basic tests for this
functionality can be found at https://github.com/rapido-linux/rapido/pull/163
This patch (of 6):
do_header() is called for each cpio entry and fails if the first six bytes
don't match "newc" magic. The magic check includes a special case error
message if POSIX.1 ASCII (cpio -H odc) magic is detected. This special
case POSIX.1 check can be nested under the "newc" mismatch code path to
avoid calling memcmp() twice in a non-error case.
Link: https://lkml.kernel.org/r/20220404093429.27570-1-ddiss@suse.de
Link: https://lkml.kernel.org/r/20220404093429.27570-2-ddiss@suse.de
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a process exits, /proc/${pid}, and /proc/${pid}/net dentries are
flushed. However some leaf dentries like /proc/${pid}/net/arp_cache
aren't. That's because respective PDEs have proc_misc_d_revalidate() hook
which returns 1 and leaves dentries/inodes in the LRU.
Force revalidation/lookup on everything under /proc/${pid}/net by
inheriting proc_net_dentry_ops.
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/YjdVHgildbWO7diJ@localhost.localdomain
Fixes: c6c75deda8 ("proc: fix lookup in /proc/net subdirectories after setns(2)")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: hui li <juanfengpy@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently various places test if direct IO is possible on a file by
checking for the existence of the direct_IO address space operation.
This is a poor choice, as the direct_IO operation may not be used - it is
only used if the generic_file_*_iter functions are called for direct IO
and some filesystems - particularly NFS - don't do this.
Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the
various places to check this (avoiding pointer dereferences).
do_dentry_open() will set this flag if ->direct_IO is present, so
filesystems do not need to be changed.
NFS *is* changed, to set the flag explicitly and discard the direct_IO
entry in the address_space_operations for files.
Other filesystems which currently use noop_direct_IO could usefully be
changed to set this flag instead.
Link: https://lkml.kernel.org/r/164859778128.29473.15189737957277399416.stgit@noble.brown
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swap_writepage() is given one page at a time, but may be called repeatedly
in succession.
For block-device swapspace, the blk_plug functionality allows the multiple
pages to be combined together at lower layers. That cannot be used for
SWP_FS_OPS as blk_plug may not exist - it is only active when
CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page
reads.
With this patch we pass a pointer-to-pointer via the wbc. swap_writepage
can store state between calls - much like the pointer passed explicitly to
swap_readpage. After calling swap_writepage() some number of times, the
state will be passed to swap_write_unplug() which can submit the combined
request.
Link: https://lkml.kernel.org/r/164859778128.29473.5191868522654408537.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swap_readpage() is given one page at a time, but may be called repeatedly
in succession.
For block-device swap-space, the blk_plug functionality allows the
multiple pages to be combined together at lower layers. That cannot be
used for SWP_FS_OPS as blk_plug may not exist - it is only active when
CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page
reads.
With this patch we pass in a pointer-to-pointer when swap_readpage can
store state between calls - much like the effect of blk_plug. After
calling swap_readpage() some number of times, the state will be passed to
swap_read_unplug() which can submit the combined request.
Link: https://lkml.kernel.org/r/164859778127.29473.14059420492644907783.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swap currently uses ->readpage to read swap pages. This can only request
one page at a time from the filesystem, which is not most efficient.
swap uses ->direct_IO for writes which while this is adequate is an
inappropriate over-loading. ->direct_IO may need to had handle allocate
space for holes or other details that are not relevant for swap.
So this patch introduces a new address_space operation: ->swap_rw. In
this patch it is used for reads, and a subsequent patch will switch writes
to use it.
No filesystem yet supports ->swap_rw, but that is not a problem because
no filesystem actually works with filesystem-based swap.
Only two filesystems set SWP_FS_OPS:
- cifs sets the flag, but ->direct_IO always fails so swap cannot work.
- nfs sets the flag, but ->direct_IO calls generic_write_checks()
which has failed on swap files for several releases.
To ensure that a NULL ->swap_rw isn't called, ->activate_swap() for both
NFS and cifs are changed to fail if ->swap_rw is not set. This can be
removed if/when the function is added.
Future patches will restore swap-over-NFS functionality.
To submit an async read with ->swap_rw() we need to allocate a structure
to hold the kiocb and other details. swap_readpage() cannot handle
transient failure, so we create a mempool to provide the structures.
Link: https://lkml.kernel.org/r/164859778125.29473.13430559328221330589.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If a filesystem wishes to handle all swap IO itself (via ->direct_IO and
->readpage), rather than just providing devices addresses for
submit_bio(), SWP_FS_OPS must be set.
Currently the protocol for setting this it to have ->swap_activate return
zero. In that case SWP_FS_OPS is set, and add_swap_extent() is called for
the entire file.
This is a little clumsy as different return values for ->swap_activate
have quite different meanings, and it makes it hard to search for which
filesystems require SWP_FS_OPS to be set.
So remove the special meaning of a zero return, and require the filesystem
to set SWP_FS_OPS if it so desires, and to always call add_swap_extent()
as required.
Currently only NFS and CIFS return zero for add_swap_extent().
Link: https://lkml.kernel.org/r/164859778123.29473.17908205846599043598.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
folios that are written to swap are owned by the MM subsystem - not any
filesystem.
When such a folio is passed to a filesystem to be written out to a
swap-file, the filesystem handles the data, but the folio itself does not
belong to the filesystem. So calling the filesystem's ->dirty_folio()
address_space operation makes no sense. This is for folios in the given
address space, and a folio to be written to swap does not exist in the
given address space.
So drop swap_dirty_folio() which calls the address-space's
->dirty_folio(), and always use noop_dirty_folio(), which is appropriate
for folios being swapped out.
Link: https://lkml.kernel.org/r/164859778123.29473.6900942583784889976.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "MM changes to improve swap-over-NFS support".
Assorted improvements for swap-via-filesystem.
This is a resend of these patches, rebased on current HEAD. The only
substantial changes is that swap_dirty_folio has replaced
swap_set_page_dirty.
Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem. It
has previously worked for NFS but that broke a few releases back. This
series changes to use a new ->swap_rw rather than ->readpage and
->direct_IO. It also makes other improvements.
There is a companion series already in linux-next which fixes various
issues with NFS. Once both series land, a final patch is needed which
changes NFS over to use ->swap_rw.
This patch (of 10):
Many functions declared in include/linux/swap.h are only used within mm/
Create a new "mm/swap.h" and move some of these declarations there.
Remove the redundant 'extern' from the function declarations.
[akpm@linux-foundation.org: mm/memory-failure.c needs mm/swap.h]
Link: https://lkml.kernel.org/r/164859751830.29473.5309689752169286816.stgit@noble.brown
Link: https://lkml.kernel.org/r/164859778120.29473.11725907882296224053.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The gup_test binary will fail showing only the output of perror("open") in
the case that /sys/kernel/debug/gup_test is not found. This will almost
always be due to CONFIG_GUP_TEST not being set, which enables
compilation of a kernel that provides this file.
Add a short error message to clarify this failure and point the user to
the solution.
Link: https://lkml.kernel.org/r/20220502224942.995427-1-jsavitz@redhat.com
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>