2019-05-27 06:55:06 +00:00
|
|
|
// SPDX-License-Identifier: GPL-2.0-or-later
|
2013-07-10 23:05:03 +00:00
|
|
|
/*
|
|
|
|
* zswap.c - zswap driver file
|
|
|
|
*
|
2023-07-17 16:02:27 +00:00
|
|
|
* zswap is a cache that takes pages that are in the process
|
2013-07-10 23:05:03 +00:00
|
|
|
* of being swapped out and attempts to compress and store them in a
|
|
|
|
* RAM-based memory pool. This can result in a significant I/O reduction on
|
|
|
|
* the swap device and, in the case where decompressing from RAM is faster
|
|
|
|
* than reading from the swap device, can also improve workload performance.
|
|
|
|
*
|
|
|
|
* Copyright (C) 2012 Seth Jennings <sjenning@linux.vnet.ibm.com>
|
|
|
|
*/
|
|
|
|
|
|
|
|
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
|
|
|
|
|
|
|
|
#include <linux/module.h>
|
|
|
|
#include <linux/cpu.h>
|
|
|
|
#include <linux/highmem.h>
|
|
|
|
#include <linux/slab.h>
|
|
|
|
#include <linux/spinlock.h>
|
|
|
|
#include <linux/types.h>
|
|
|
|
#include <linux/atomic.h>
|
|
|
|
#include <linux/rbtree.h>
|
|
|
|
#include <linux/swap.h>
|
|
|
|
#include <linux/crypto.h>
|
2020-12-15 03:14:18 +00:00
|
|
|
#include <linux/scatterlist.h>
|
mempolicy: alloc_pages_mpol() for NUMA policy without vma
Shrink shmem's stack usage by eliminating the pseudo-vma from its folio
allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the
principal actor for passing mempolicy choice down to __alloc_pages(),
rather than vma_alloc_folio(gfp, order, vma, addr, hugepage).
vma_alloc_folio() and alloc_pages() remain, but as wrappers around
alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the
additional args to policy_nodemask(), which subsumes policy_node().
Cleanup throughout, cutting out some unhelpful "helpers".
It would all be much simpler without MPOL_INTERLEAVE, but that adds a
dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8bfd
("tmpfs: distribute interleave better across nodes"), which added ino bias
to the interleave, hidden from mm/mempolicy.c until this commit.
Hence "ilx" throughout, the "interleave index". Originally I thought it
could be done just with nid, but that's wrong: the nodemask may come from
the shared policy layer below a shmem vma, or it may come from the task
layer above a shmem vma; and without the final nodemask then nodeid cannot
be decided. And how ilx is applied depends also on page order.
The interleave index is almost always irrelevant unless MPOL_INTERLEAVE:
with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX
passed down from vma-less alloc_pages() is also used as hint not to use
THP-style hugepage allocation - to avoid the overhead of a hugepage arg
(though I don't understand why we never just added a GFP bit for THP - if
it actually needs a different allocation strategy from other pages of the
same order). vma_alloc_folio() still carries its hugepage arg here, but
it is not used, and should be removed when agreed.
get_vma_policy() no longer allows a NULL vma: over time I believe we've
eradicated all the places which used to need it e.g. swapoff and madvise
used to pass NULL vma to read_swap_cache_async(), but now know the vma.
[hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()]
Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com
Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com
Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun heo <tj@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-19 20:39:08 +00:00
|
|
|
#include <linux/mempolicy.h>
|
2013-07-10 23:05:03 +00:00
|
|
|
#include <linux/mempool.h>
|
2014-08-06 23:08:40 +00:00
|
|
|
#include <linux/zpool.h>
|
2020-12-15 03:14:18 +00:00
|
|
|
#include <crypto/acompress.h>
|
2023-07-17 16:02:27 +00:00
|
|
|
#include <linux/zswap.h>
|
2013-07-10 23:05:03 +00:00
|
|
|
#include <linux/mm_types.h>
|
|
|
|
#include <linux/page-flags.h>
|
|
|
|
#include <linux/swapops.h>
|
|
|
|
#include <linux/writeback.h>
|
|
|
|
#include <linux/pagemap.h>
|
2020-01-31 06:15:04 +00:00
|
|
|
#include <linux/workqueue.h>
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2022-05-10 01:20:47 +00:00
|
|
|
#include "swap.h"
|
mm: zswap: shrink until can accept
This update addresses an issue with the zswap reclaim mechanism, which
hinders the efficient offloading of cold pages to disk, thereby
compromising the preservation of the LRU order and consequently
diminishing, if not inverting, its performance benefits.
The functioning of the zswap shrink worker was found to be inadequate, as
shown by basic benchmark test. For the test, a kernel build was utilized
as a reference, with its memory confined to 1G via a cgroup and a 5G swap
file provided. The results are presented below, these are averages of
three runs without the use of zswap:
real 46m26s
user 35m4s
sys 7m37s
With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
system), the results changed to:
real 56m4s
user 35m13s
sys 8m43s
written_back_pages: 18
reject_reclaim_fail: 0
pool_limit_hit:1478
Besides the evident regression, one thing to notice from this data is the
extremely low number of written_back_pages and pool_limit_hit.
The pool_limit_hit counter, which is increased in zswap_frontswap_store
when zswap is completely full, doesn't account for a particular scenario:
once zswap hits his limit, zswap_pool_reached_full is set to true; with
this flag on, zswap_frontswap_store rejects pages if zswap is still above
the acceptance threshold. Once we include the rejections due to
zswap_pool_reached_full && !zswap_can_accept(), the number goes from 1478
to a significant 21578266.
Zswap is stuck in an undesirable state where it rejects pages because it's
above the acceptance threshold, yet fails to attempt memory reclaimation.
This happens because the shrink work is only queued when
zswap_frontswap_store detects that it's full and the work itself only
reclaims one page per run.
This state results in hot pages getting written directly to disk, while
cold ones remain memory, waiting only to be invalidated. The LRU order is
completely broken and zswap ends up being just an overhead without
providing any benefits.
This commit applies 2 changes: a) the shrink worker is set to reclaim
pages until the acceptance threshold is met and b) the task is also
enqueued when zswap is not full but still above the threshold.
Testing this suggested update showed much better numbers:
real 36m37s
user 35m8s
sys 9m32s
written_back_pages: 10459423
reject_reclaim_fail: 12896
pool_limit_hit: 75653
Link: https://lkml.kernel.org/r/20230526183227.793977-1-cerasuolodomenico@gmail.com
Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-26 18:32:27 +00:00
|
|
|
#include "internal.h"
|
2022-05-10 01:20:47 +00:00
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/*********************************
|
|
|
|
* statistics
|
|
|
|
**********************************/
|
2014-08-06 23:08:40 +00:00
|
|
|
/* Total bytes used by the compressed storage */
|
2022-05-19 21:08:53 +00:00
|
|
|
u64 zswap_pool_total_size;
|
2013-07-10 23:05:03 +00:00
|
|
|
/* The number of compressed pages currently stored in zswap */
|
2022-05-19 21:08:53 +00:00
|
|
|
atomic_t zswap_stored_pages = ATOMIC_INIT(0);
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
/* The number of same-value filled pages currently stored in zswap */
|
|
|
|
static atomic_t zswap_same_filled_pages = ATOMIC_INIT(0);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The statistics below are not protected from concurrent access for
|
|
|
|
* performance reasons so they may not be a 100% accurate. However,
|
|
|
|
* they do provide useful information on roughly how many times a
|
|
|
|
* certain event is occurring.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* Pool limit was hit (see zswap_max_pool_percent) */
|
|
|
|
static u64 zswap_pool_limit_hit;
|
|
|
|
/* Pages written back when pool limit was reached */
|
|
|
|
static u64 zswap_written_back_pages;
|
|
|
|
/* Store failed due to a reclaim failure after pool limit was reached */
|
|
|
|
static u64 zswap_reject_reclaim_fail;
|
2023-10-24 23:45:09 +00:00
|
|
|
/* Store failed due to compression algorithm failure */
|
|
|
|
static u64 zswap_reject_compress_fail;
|
2013-07-10 23:05:03 +00:00
|
|
|
/* Compressed page was too big for the allocator to (optimally) store */
|
|
|
|
static u64 zswap_reject_compress_poor;
|
|
|
|
/* Store failed because underlying allocator could not get memory */
|
|
|
|
static u64 zswap_reject_alloc_fail;
|
|
|
|
/* Store failed because the entry metadata could not be allocated (rare) */
|
|
|
|
static u64 zswap_reject_kmemcache_fail;
|
|
|
|
/* Duplicate store was encountered (rare) */
|
|
|
|
static u64 zswap_duplicate_entry;
|
|
|
|
|
2020-01-31 06:15:04 +00:00
|
|
|
/* Shrinker work queue */
|
|
|
|
static struct workqueue_struct *shrink_wq;
|
|
|
|
/* Pool limit was hit, we need to calm down */
|
|
|
|
static bool zswap_pool_reached_full;
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/*********************************
|
|
|
|
* tunables
|
|
|
|
**********************************/
|
2015-06-25 22:00:35 +00:00
|
|
|
|
2017-02-27 22:26:50 +00:00
|
|
|
#define ZSWAP_PARAM_UNSET ""
|
|
|
|
|
2023-04-03 12:13:18 +00:00
|
|
|
static int zswap_setup(void);
|
|
|
|
|
2020-04-07 03:08:03 +00:00
|
|
|
/* Enable/disable zswap */
|
|
|
|
static bool zswap_enabled = IS_ENABLED(CONFIG_ZSWAP_DEFAULT_ON);
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
static int zswap_enabled_param_set(const char *,
|
|
|
|
const struct kernel_param *);
|
2020-12-15 03:14:11 +00:00
|
|
|
static const struct kernel_param_ops zswap_enabled_param_ops = {
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
.set = zswap_enabled_param_set,
|
|
|
|
.get = param_get_bool,
|
|
|
|
};
|
|
|
|
module_param_cb(enabled, &zswap_enabled_param_ops, &zswap_enabled, 0644);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2015-09-09 22:35:21 +00:00
|
|
|
/* Crypto compressor to use */
|
2020-04-07 03:08:03 +00:00
|
|
|
static char *zswap_compressor = CONFIG_ZSWAP_COMPRESSOR_DEFAULT;
|
2015-09-09 22:35:21 +00:00
|
|
|
static int zswap_compressor_param_set(const char *,
|
|
|
|
const struct kernel_param *);
|
2020-12-15 03:14:11 +00:00
|
|
|
static const struct kernel_param_ops zswap_compressor_param_ops = {
|
2015-09-09 22:35:21 +00:00
|
|
|
.set = zswap_compressor_param_set,
|
2015-11-07 00:29:15 +00:00
|
|
|
.get = param_get_charp,
|
|
|
|
.free = param_free_charp,
|
2015-09-09 22:35:21 +00:00
|
|
|
};
|
|
|
|
module_param_cb(compressor, &zswap_compressor_param_ops,
|
2015-11-07 00:29:15 +00:00
|
|
|
&zswap_compressor, 0644);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2015-09-09 22:35:21 +00:00
|
|
|
/* Compressed storage zpool to use */
|
2020-04-07 03:08:03 +00:00
|
|
|
static char *zswap_zpool_type = CONFIG_ZSWAP_ZPOOL_DEFAULT;
|
2015-09-09 22:35:21 +00:00
|
|
|
static int zswap_zpool_param_set(const char *, const struct kernel_param *);
|
2020-12-15 03:14:11 +00:00
|
|
|
static const struct kernel_param_ops zswap_zpool_param_ops = {
|
2015-11-07 00:29:15 +00:00
|
|
|
.set = zswap_zpool_param_set,
|
|
|
|
.get = param_get_charp,
|
|
|
|
.free = param_free_charp,
|
2015-09-09 22:35:21 +00:00
|
|
|
};
|
2015-11-07 00:29:15 +00:00
|
|
|
module_param_cb(zpool, &zswap_zpool_param_ops, &zswap_zpool_type, 0644);
|
2014-08-06 23:08:40 +00:00
|
|
|
|
2015-09-09 22:35:21 +00:00
|
|
|
/* The maximum percentage of memory that the compressed pool can occupy */
|
|
|
|
static unsigned int zswap_max_pool_percent = 20;
|
|
|
|
module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
|
2014-04-07 22:38:27 +00:00
|
|
|
|
2020-01-31 06:15:04 +00:00
|
|
|
/* The threshold for accepting new pages after the max_pool_percent was hit */
|
|
|
|
static unsigned int zswap_accept_thr_percent = 90; /* of max pool size */
|
|
|
|
module_param_named(accept_threshold_percent, zswap_accept_thr_percent,
|
|
|
|
uint, 0644);
|
|
|
|
|
2022-03-22 21:47:43 +00:00
|
|
|
/*
|
|
|
|
* Enable/disable handling same-value filled pages (enabled by default).
|
|
|
|
* If disabled every page is considered non-same-value filled.
|
|
|
|
*/
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
static bool zswap_same_filled_pages_enabled = true;
|
|
|
|
module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
|
|
|
|
bool, 0644);
|
|
|
|
|
2022-03-22 21:47:43 +00:00
|
|
|
/* Enable/disable handling non-same-value filled pages (enabled by default) */
|
|
|
|
static bool zswap_non_same_filled_pages_enabled = true;
|
|
|
|
module_param_named(non_same_filled_pages_enabled, zswap_non_same_filled_pages_enabled,
|
|
|
|
bool, 0644);
|
|
|
|
|
2023-06-07 19:51:43 +00:00
|
|
|
static bool zswap_exclusive_loads_enabled = IS_ENABLED(
|
|
|
|
CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON);
|
|
|
|
module_param_named(exclusive_loads, zswap_exclusive_loads_enabled, bool, 0644);
|
|
|
|
|
2023-06-20 19:46:44 +00:00
|
|
|
/* Number of zpools in zswap_pool (empirically determined for scalability) */
|
|
|
|
#define ZSWAP_NR_ZPOOLS 32
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/*********************************
|
2015-09-09 22:35:19 +00:00
|
|
|
* data structures
|
2013-07-10 23:05:03 +00:00
|
|
|
**********************************/
|
|
|
|
|
2020-12-15 03:14:18 +00:00
|
|
|
struct crypto_acomp_ctx {
|
|
|
|
struct crypto_acomp *acomp;
|
|
|
|
struct acomp_req *req;
|
|
|
|
struct crypto_wait wait;
|
|
|
|
u8 *dstmem;
|
|
|
|
struct mutex *mutex;
|
|
|
|
};
|
|
|
|
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
/*
|
|
|
|
* The lock ordering is zswap_tree.lock -> zswap_pool.lru_lock.
|
|
|
|
* The only case where lru_lock is not acquired while holding tree.lock is
|
|
|
|
* when a zswap_entry is taken off the lru for writeback, in that case it
|
|
|
|
* needs to be verified that it's still valid in the tree.
|
|
|
|
*/
|
2015-09-09 22:35:19 +00:00
|
|
|
struct zswap_pool {
|
2023-06-20 19:46:44 +00:00
|
|
|
struct zpool *zpools[ZSWAP_NR_ZPOOLS];
|
2020-12-15 03:14:18 +00:00
|
|
|
struct crypto_acomp_ctx __percpu *acomp_ctx;
|
2015-09-09 22:35:19 +00:00
|
|
|
struct kref kref;
|
|
|
|
struct list_head list;
|
2020-01-31 06:15:04 +00:00
|
|
|
struct work_struct release_work;
|
|
|
|
struct work_struct shrink_work;
|
2016-11-26 23:13:40 +00:00
|
|
|
struct hlist_node node;
|
2015-09-09 22:35:19 +00:00
|
|
|
char tfm_name[CRYPTO_MAX_ALG_NAME];
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
struct list_head lru;
|
|
|
|
spinlock_t lru_lock;
|
2013-07-10 23:05:03 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* struct zswap_entry
|
|
|
|
*
|
|
|
|
* This structure contains the metadata for tracking a single compressed
|
|
|
|
* page within zswap.
|
|
|
|
*
|
|
|
|
* rbnode - links the entry into red-black tree for the appropriate swap type
|
2023-08-08 06:20:56 +00:00
|
|
|
* swpentry - associated swap entry, the offset indexes into the red-black tree
|
2013-07-10 23:05:03 +00:00
|
|
|
* refcount - the number of outstanding reference to the entry. This is needed
|
|
|
|
* to protect against premature freeing of the entry by code
|
2014-04-07 22:38:25 +00:00
|
|
|
* concurrent calls to load, invalidate, and writeback. The lock
|
2013-07-10 23:05:03 +00:00
|
|
|
* for the zswap_tree structure that contains the entry must
|
|
|
|
* be held while changing the refcount. Since the lock must
|
|
|
|
* be held, there is no reason to also make refcount atomic.
|
|
|
|
* length - the length in bytes of the compressed page data. Needed during
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
* decompression. For a same value filled page length is 0, and both
|
|
|
|
* pool and lru are invalid and must be ignored.
|
2015-09-09 22:35:19 +00:00
|
|
|
* pool - the zswap_pool the entry's data is in
|
|
|
|
* handle - zpool allocation handle that stores the compressed page data
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
* value - value of the same-value filled pages which have same content
|
2023-08-08 06:20:56 +00:00
|
|
|
* objcg - the obj_cgroup that the compressed memory is charged to
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
* lru - handle to the pool's lru used to evict pages.
|
2013-07-10 23:05:03 +00:00
|
|
|
*/
|
|
|
|
struct zswap_entry {
|
|
|
|
struct rb_node rbnode;
|
2023-06-12 09:38:15 +00:00
|
|
|
swp_entry_t swpentry;
|
2013-07-10 23:05:03 +00:00
|
|
|
int refcount;
|
|
|
|
unsigned int length;
|
2015-09-09 22:35:19 +00:00
|
|
|
struct zswap_pool *pool;
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
union {
|
|
|
|
unsigned long handle;
|
|
|
|
unsigned long value;
|
|
|
|
};
|
2022-05-19 21:08:53 +00:00
|
|
|
struct obj_cgroup *objcg;
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
struct list_head lru;
|
2013-07-10 23:05:03 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The tree lock in the zswap_tree struct protects a few things:
|
|
|
|
* - the rbtree
|
|
|
|
* - the refcount field of each entry in the tree
|
|
|
|
*/
|
|
|
|
struct zswap_tree {
|
|
|
|
struct rb_root rbroot;
|
|
|
|
spinlock_t lock;
|
|
|
|
};
|
|
|
|
|
|
|
|
static struct zswap_tree *zswap_trees[MAX_SWAPFILES];
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
/* RCU-protected iteration */
|
|
|
|
static LIST_HEAD(zswap_pools);
|
|
|
|
/* protects zswap_pools list modification */
|
|
|
|
static DEFINE_SPINLOCK(zswap_pools_lock);
|
2016-05-05 23:22:23 +00:00
|
|
|
/* pool counter to provide unique names to zpool */
|
|
|
|
static atomic_t zswap_pools_count = ATOMIC_INIT(0);
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2023-04-03 12:13:17 +00:00
|
|
|
enum zswap_init_type {
|
|
|
|
ZSWAP_UNINIT,
|
|
|
|
ZSWAP_INIT_SUCCEED,
|
|
|
|
ZSWAP_INIT_FAILED
|
|
|
|
};
|
2015-09-09 22:35:21 +00:00
|
|
|
|
2023-04-03 12:13:17 +00:00
|
|
|
static enum zswap_init_type zswap_init_state;
|
2015-09-09 22:35:21 +00:00
|
|
|
|
2023-04-03 12:13:18 +00:00
|
|
|
/* used to ensure the integrity of initialization */
|
|
|
|
static DEFINE_MUTEX(zswap_init_lock);
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
|
2017-02-27 22:26:47 +00:00
|
|
|
/* init completed, but couldn't create the initial pool */
|
|
|
|
static bool zswap_has_pool;
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
/*********************************
|
|
|
|
* helpers and fwd declarations
|
|
|
|
**********************************/
|
|
|
|
|
|
|
|
#define zswap_pool_debug(msg, p) \
|
|
|
|
pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \
|
2023-06-20 19:46:44 +00:00
|
|
|
zpool_get_type((p)->zpools[0]))
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2023-06-12 09:38:15 +00:00
|
|
|
static int zswap_writeback_entry(struct zswap_entry *entry,
|
2023-06-12 09:38:14 +00:00
|
|
|
struct zswap_tree *tree);
|
2015-09-09 22:35:19 +00:00
|
|
|
static int zswap_pool_get(struct zswap_pool *pool);
|
|
|
|
static void zswap_pool_put(struct zswap_pool *pool);
|
|
|
|
|
|
|
|
static bool zswap_is_full(void)
|
|
|
|
{
|
2018-12-28 08:34:29 +00:00
|
|
|
return totalram_pages() * zswap_max_pool_percent / 100 <
|
|
|
|
DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
|
2015-09-09 22:35:19 +00:00
|
|
|
}
|
|
|
|
|
2020-01-31 06:15:04 +00:00
|
|
|
static bool zswap_can_accept(void)
|
|
|
|
{
|
|
|
|
return totalram_pages() * zswap_accept_thr_percent / 100 *
|
|
|
|
zswap_max_pool_percent / 100 >
|
|
|
|
DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
|
|
|
|
}
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
static void zswap_update_total_size(void)
|
|
|
|
{
|
|
|
|
struct zswap_pool *pool;
|
|
|
|
u64 total = 0;
|
2023-06-20 19:46:44 +00:00
|
|
|
int i;
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
|
|
|
|
list_for_each_entry_rcu(pool, &zswap_pools, list)
|
2023-06-20 19:46:44 +00:00
|
|
|
for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
|
|
|
|
total += zpool_get_total_size(pool->zpools[i]);
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
|
|
|
zswap_pool_total_size = total;
|
|
|
|
}
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/*********************************
|
|
|
|
* zswap entry functions
|
|
|
|
**********************************/
|
|
|
|
static struct kmem_cache *zswap_entry_cache;
|
|
|
|
|
|
|
|
static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
|
|
|
|
{
|
|
|
|
struct zswap_entry *entry;
|
|
|
|
entry = kmem_cache_alloc(zswap_entry_cache, gfp);
|
|
|
|
if (!entry)
|
|
|
|
return NULL;
|
|
|
|
entry->refcount = 1;
|
2013-11-12 23:08:27 +00:00
|
|
|
RB_CLEAR_NODE(&entry->rbnode);
|
2013-07-10 23:05:03 +00:00
|
|
|
return entry;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void zswap_entry_cache_free(struct zswap_entry *entry)
|
|
|
|
{
|
|
|
|
kmem_cache_free(zswap_entry_cache, entry);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*********************************
|
|
|
|
* rbtree functions
|
|
|
|
**********************************/
|
|
|
|
static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
|
|
|
|
{
|
|
|
|
struct rb_node *node = root->rb_node;
|
|
|
|
struct zswap_entry *entry;
|
2023-06-12 09:38:15 +00:00
|
|
|
pgoff_t entry_offset;
|
2013-07-10 23:05:03 +00:00
|
|
|
|
|
|
|
while (node) {
|
|
|
|
entry = rb_entry(node, struct zswap_entry, rbnode);
|
2023-06-12 09:38:15 +00:00
|
|
|
entry_offset = swp_offset(entry->swpentry);
|
|
|
|
if (entry_offset > offset)
|
2013-07-10 23:05:03 +00:00
|
|
|
node = node->rb_left;
|
2023-06-12 09:38:15 +00:00
|
|
|
else if (entry_offset < offset)
|
2013-07-10 23:05:03 +00:00
|
|
|
node = node->rb_right;
|
|
|
|
else
|
|
|
|
return entry;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* In the case that a entry with the same offset is found, a pointer to
|
|
|
|
* the existing entry is stored in dupentry and the function returns -EEXIST
|
|
|
|
*/
|
|
|
|
static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
|
|
|
|
struct zswap_entry **dupentry)
|
|
|
|
{
|
|
|
|
struct rb_node **link = &root->rb_node, *parent = NULL;
|
|
|
|
struct zswap_entry *myentry;
|
2023-06-12 09:38:15 +00:00
|
|
|
pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
|
|
|
while (*link) {
|
|
|
|
parent = *link;
|
|
|
|
myentry = rb_entry(parent, struct zswap_entry, rbnode);
|
2023-06-12 09:38:15 +00:00
|
|
|
myentry_offset = swp_offset(myentry->swpentry);
|
|
|
|
if (myentry_offset > entry_offset)
|
2013-07-10 23:05:03 +00:00
|
|
|
link = &(*link)->rb_left;
|
2023-06-12 09:38:15 +00:00
|
|
|
else if (myentry_offset < entry_offset)
|
2013-07-10 23:05:03 +00:00
|
|
|
link = &(*link)->rb_right;
|
|
|
|
else {
|
|
|
|
*dupentry = myentry;
|
|
|
|
return -EEXIST;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
rb_link_node(&entry->rbnode, parent, link);
|
|
|
|
rb_insert_color(&entry->rbnode, root);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before
returning from zswap_frontswap_load(), after dropping the local reference.
However, the tree lock is dropped during decompression after the local
reference is acquired, so the entry could be invalidated before we drop
the local ref. If this happens, the entry is freed once we drop the local
ref, and zswap_invalidate_entry() tries to invalidate an already freed
entry.
Fix this by:
(a) Making sure zswap_invalidate_entry() is always called with a local
ref held, to avoid being called on a freed entry.
(b) Making sure zswap_invalidate_entry() only drops the ref if the entry
was actually on the rbtree. Otherwise, another invalidation could
have already happened, and the initial ref is already dropped.
With these changes, there is no need to check that there is no need to
make sure the entry still exists in the tree in zswap_reclaim_entry()
before invalidating it, as zswap_reclaim_entry() will make this check
internally.
Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com
Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-21 09:30:09 +00:00
|
|
|
static bool zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
|
2013-11-12 23:08:27 +00:00
|
|
|
{
|
|
|
|
if (!RB_EMPTY_NODE(&entry->rbnode)) {
|
|
|
|
rb_erase(&entry->rbnode, root);
|
|
|
|
RB_CLEAR_NODE(&entry->rbnode);
|
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before
returning from zswap_frontswap_load(), after dropping the local reference.
However, the tree lock is dropped during decompression after the local
reference is acquired, so the entry could be invalidated before we drop
the local ref. If this happens, the entry is freed once we drop the local
ref, and zswap_invalidate_entry() tries to invalidate an already freed
entry.
Fix this by:
(a) Making sure zswap_invalidate_entry() is always called with a local
ref held, to avoid being called on a freed entry.
(b) Making sure zswap_invalidate_entry() only drops the ref if the entry
was actually on the rbtree. Otherwise, another invalidation could
have already happened, and the initial ref is already dropped.
With these changes, there is no need to check that there is no need to
make sure the entry still exists in the tree in zswap_reclaim_entry()
before invalidating it, as zswap_reclaim_entry() will make this check
internally.
Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com
Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-21 09:30:09 +00:00
|
|
|
return true;
|
2013-11-12 23:08:27 +00:00
|
|
|
}
|
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before
returning from zswap_frontswap_load(), after dropping the local reference.
However, the tree lock is dropped during decompression after the local
reference is acquired, so the entry could be invalidated before we drop
the local ref. If this happens, the entry is freed once we drop the local
ref, and zswap_invalidate_entry() tries to invalidate an already freed
entry.
Fix this by:
(a) Making sure zswap_invalidate_entry() is always called with a local
ref held, to avoid being called on a freed entry.
(b) Making sure zswap_invalidate_entry() only drops the ref if the entry
was actually on the rbtree. Otherwise, another invalidation could
have already happened, and the initial ref is already dropped.
With these changes, there is no need to check that there is no need to
make sure the entry still exists in the tree in zswap_reclaim_entry()
before invalidating it, as zswap_reclaim_entry() will make this check
internally.
Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com
Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-21 09:30:09 +00:00
|
|
|
return false;
|
2013-11-12 23:08:27 +00:00
|
|
|
}
|
|
|
|
|
2023-06-20 19:46:44 +00:00
|
|
|
static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
|
|
|
|
{
|
|
|
|
int i = 0;
|
|
|
|
|
|
|
|
if (ZSWAP_NR_ZPOOLS > 1)
|
|
|
|
i = hash_ptr(entry, ilog2(ZSWAP_NR_ZPOOLS));
|
|
|
|
|
|
|
|
return entry->pool->zpools[i];
|
|
|
|
}
|
|
|
|
|
2013-11-12 23:08:27 +00:00
|
|
|
/*
|
2014-08-06 23:08:40 +00:00
|
|
|
* Carries out the common pattern of freeing and entry's zpool allocation,
|
2013-11-12 23:08:27 +00:00
|
|
|
* freeing the entry itself, and decrementing the number of stored pages.
|
|
|
|
*/
|
2014-04-07 22:38:27 +00:00
|
|
|
static void zswap_free_entry(struct zswap_entry *entry)
|
2013-11-12 23:08:27 +00:00
|
|
|
{
|
2022-05-19 21:08:53 +00:00
|
|
|
if (entry->objcg) {
|
|
|
|
obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
|
|
|
|
obj_cgroup_put(entry->objcg);
|
|
|
|
}
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
if (!entry->length)
|
|
|
|
atomic_dec(&zswap_same_filled_pages);
|
|
|
|
else {
|
2023-06-12 09:38:13 +00:00
|
|
|
spin_lock(&entry->pool->lru_lock);
|
|
|
|
list_del(&entry->lru);
|
|
|
|
spin_unlock(&entry->pool->lru_lock);
|
2023-06-20 19:46:44 +00:00
|
|
|
zpool_free(zswap_find_zpool(entry), entry->handle);
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
zswap_pool_put(entry->pool);
|
|
|
|
}
|
2013-11-12 23:08:27 +00:00
|
|
|
zswap_entry_cache_free(entry);
|
|
|
|
atomic_dec(&zswap_stored_pages);
|
2015-09-09 22:35:19 +00:00
|
|
|
zswap_update_total_size();
|
2013-11-12 23:08:27 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/* caller must hold the tree lock */
|
|
|
|
static void zswap_entry_get(struct zswap_entry *entry)
|
|
|
|
{
|
|
|
|
entry->refcount++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* caller must hold the tree lock
|
|
|
|
* remove from the tree and free it, if nobody reference the entry
|
|
|
|
*/
|
|
|
|
static void zswap_entry_put(struct zswap_tree *tree,
|
|
|
|
struct zswap_entry *entry)
|
|
|
|
{
|
|
|
|
int refcount = --entry->refcount;
|
|
|
|
|
2023-07-27 16:22:24 +00:00
|
|
|
WARN_ON_ONCE(refcount < 0);
|
2013-11-12 23:08:27 +00:00
|
|
|
if (refcount == 0) {
|
2023-07-27 16:22:24 +00:00
|
|
|
WARN_ON_ONCE(!RB_EMPTY_NODE(&entry->rbnode));
|
2014-04-07 22:38:27 +00:00
|
|
|
zswap_free_entry(entry);
|
2013-11-12 23:08:27 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* caller must hold the tree lock */
|
|
|
|
static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
|
|
|
|
pgoff_t offset)
|
|
|
|
{
|
2015-11-07 00:29:09 +00:00
|
|
|
struct zswap_entry *entry;
|
2013-11-12 23:08:27 +00:00
|
|
|
|
|
|
|
entry = zswap_rb_search(root, offset);
|
|
|
|
if (entry)
|
|
|
|
zswap_entry_get(entry);
|
|
|
|
|
|
|
|
return entry;
|
|
|
|
}
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/*********************************
|
|
|
|
* per-cpu code
|
|
|
|
**********************************/
|
|
|
|
static DEFINE_PER_CPU(u8 *, zswap_dstmem);
|
2020-12-15 03:14:18 +00:00
|
|
|
/*
|
|
|
|
* If users dynamically change the zpool type and compressor at runtime, i.e.
|
|
|
|
* zswap is running, zswap can have more than one zpool on one cpu, but they
|
|
|
|
* are sharing dtsmem. So we need this mutex to be per-cpu.
|
|
|
|
*/
|
|
|
|
static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2016-11-26 23:13:39 +00:00
|
|
|
static int zswap_dstmem_prepare(unsigned int cpu)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
2020-12-15 03:14:18 +00:00
|
|
|
struct mutex *mutex;
|
2013-07-10 23:05:03 +00:00
|
|
|
u8 *dst;
|
|
|
|
|
2016-11-26 23:13:39 +00:00
|
|
|
dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
|
2017-07-06 22:40:40 +00:00
|
|
|
if (!dst)
|
2016-11-26 23:13:39 +00:00
|
|
|
return -ENOMEM;
|
2017-07-06 22:40:40 +00:00
|
|
|
|
2020-12-15 03:14:18 +00:00
|
|
|
mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
|
|
|
|
if (!mutex) {
|
|
|
|
kfree(dst);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
mutex_init(mutex);
|
2016-11-26 23:13:39 +00:00
|
|
|
per_cpu(zswap_dstmem, cpu) = dst;
|
2020-12-15 03:14:18 +00:00
|
|
|
per_cpu(zswap_mutex, cpu) = mutex;
|
2016-11-26 23:13:39 +00:00
|
|
|
return 0;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
|
|
|
|
2016-11-26 23:13:39 +00:00
|
|
|
static int zswap_dstmem_dead(unsigned int cpu)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
2020-12-15 03:14:18 +00:00
|
|
|
struct mutex *mutex;
|
2016-11-26 23:13:39 +00:00
|
|
|
u8 *dst;
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2020-12-15 03:14:18 +00:00
|
|
|
mutex = per_cpu(zswap_mutex, cpu);
|
|
|
|
kfree(mutex);
|
|
|
|
per_cpu(zswap_mutex, cpu) = NULL;
|
|
|
|
|
2016-11-26 23:13:39 +00:00
|
|
|
dst = per_cpu(zswap_dstmem, cpu);
|
|
|
|
kfree(dst);
|
|
|
|
per_cpu(zswap_dstmem, cpu) = NULL;
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-11-26 23:13:40 +00:00
|
|
|
static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
|
2015-09-09 22:35:19 +00:00
|
|
|
{
|
2016-11-26 23:13:40 +00:00
|
|
|
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
|
2020-12-15 03:14:18 +00:00
|
|
|
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
|
|
|
|
struct crypto_acomp *acomp;
|
|
|
|
struct acomp_req *req;
|
|
|
|
|
|
|
|
acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
|
|
|
|
if (IS_ERR(acomp)) {
|
|
|
|
pr_err("could not alloc crypto acomp %s : %ld\n",
|
|
|
|
pool->tfm_name, PTR_ERR(acomp));
|
|
|
|
return PTR_ERR(acomp);
|
|
|
|
}
|
|
|
|
acomp_ctx->acomp = acomp;
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2020-12-15 03:14:18 +00:00
|
|
|
req = acomp_request_alloc(acomp_ctx->acomp);
|
|
|
|
if (!req) {
|
|
|
|
pr_err("could not alloc crypto acomp_request %s\n",
|
|
|
|
pool->tfm_name);
|
|
|
|
crypto_free_acomp(acomp_ctx->acomp);
|
2016-11-26 23:13:40 +00:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2020-12-15 03:14:18 +00:00
|
|
|
acomp_ctx->req = req;
|
|
|
|
|
|
|
|
crypto_init_wait(&acomp_ctx->wait);
|
|
|
|
/*
|
|
|
|
* if the backend of acomp is async zip, crypto_req_done() will wakeup
|
|
|
|
* crypto_wait_req(); if the backend of acomp is scomp, the callback
|
|
|
|
* won't be called, crypto_wait_req() will return without blocking.
|
|
|
|
*/
|
|
|
|
acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
|
|
|
|
crypto_req_done, &acomp_ctx->wait);
|
|
|
|
|
|
|
|
acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
|
|
|
|
acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-11-26 23:13:40 +00:00
|
|
|
static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
|
2015-09-09 22:35:19 +00:00
|
|
|
{
|
2016-11-26 23:13:40 +00:00
|
|
|
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
|
2020-12-15 03:14:18 +00:00
|
|
|
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
|
|
|
|
|
|
|
|
if (!IS_ERR_OR_NULL(acomp_ctx)) {
|
|
|
|
if (!IS_ERR_OR_NULL(acomp_ctx->req))
|
|
|
|
acomp_request_free(acomp_ctx->req);
|
|
|
|
if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
|
|
|
|
crypto_free_acomp(acomp_ctx->acomp);
|
|
|
|
}
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2016-11-26 23:13:40 +00:00
|
|
|
return 0;
|
2015-09-09 22:35:19 +00:00
|
|
|
}
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/*********************************
|
2015-09-09 22:35:19 +00:00
|
|
|
* pool functions
|
2013-07-10 23:05:03 +00:00
|
|
|
**********************************/
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
static struct zswap_pool *__zswap_pool_current(void)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
2015-09-09 22:35:19 +00:00
|
|
|
struct zswap_pool *pool;
|
|
|
|
|
|
|
|
pool = list_first_or_null_rcu(&zswap_pools, typeof(*pool), list);
|
2017-02-27 22:26:47 +00:00
|
|
|
WARN_ONCE(!pool && zswap_has_pool,
|
|
|
|
"%s: no page storage pool!\n", __func__);
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
return pool;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct zswap_pool *zswap_pool_current(void)
|
|
|
|
{
|
|
|
|
assert_spin_locked(&zswap_pools_lock);
|
|
|
|
|
|
|
|
return __zswap_pool_current();
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct zswap_pool *zswap_pool_current_get(void)
|
|
|
|
{
|
|
|
|
struct zswap_pool *pool;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
|
|
|
|
pool = __zswap_pool_current();
|
2017-02-27 22:26:47 +00:00
|
|
|
if (!zswap_pool_get(pool))
|
2015-09-09 22:35:19 +00:00
|
|
|
pool = NULL;
|
|
|
|
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
|
|
|
return pool;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct zswap_pool *zswap_pool_last_get(void)
|
|
|
|
{
|
|
|
|
struct zswap_pool *pool, *last = NULL;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
|
|
|
|
list_for_each_entry_rcu(pool, &zswap_pools, list)
|
|
|
|
last = pool;
|
2017-02-27 22:26:47 +00:00
|
|
|
WARN_ONCE(!last && zswap_has_pool,
|
|
|
|
"%s: no page storage pool!\n", __func__);
|
|
|
|
if (!zswap_pool_get(last))
|
2015-09-09 22:35:19 +00:00
|
|
|
last = NULL;
|
|
|
|
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
|
|
|
return last;
|
|
|
|
}
|
|
|
|
|
2015-12-18 22:22:04 +00:00
|
|
|
/* type and compressor must be null-terminated */
|
2015-09-09 22:35:19 +00:00
|
|
|
static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
|
|
|
|
{
|
|
|
|
struct zswap_pool *pool;
|
|
|
|
|
|
|
|
assert_spin_locked(&zswap_pools_lock);
|
|
|
|
|
|
|
|
list_for_each_entry_rcu(pool, &zswap_pools, list) {
|
2015-12-18 22:22:04 +00:00
|
|
|
if (strcmp(pool->tfm_name, compressor))
|
2015-09-09 22:35:19 +00:00
|
|
|
continue;
|
2023-06-20 19:46:44 +00:00
|
|
|
/* all zpools share the same type */
|
|
|
|
if (strcmp(zpool_get_type(pool->zpools[0]), type))
|
2015-09-09 22:35:19 +00:00
|
|
|
continue;
|
|
|
|
/* if we can't get it, it's about to be destroyed */
|
|
|
|
if (!zswap_pool_get(pool))
|
|
|
|
continue;
|
|
|
|
return pool;
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before
returning from zswap_frontswap_load(), after dropping the local reference.
However, the tree lock is dropped during decompression after the local
reference is acquired, so the entry could be invalidated before we drop
the local ref. If this happens, the entry is freed once we drop the local
ref, and zswap_invalidate_entry() tries to invalidate an already freed
entry.
Fix this by:
(a) Making sure zswap_invalidate_entry() is always called with a local
ref held, to avoid being called on a freed entry.
(b) Making sure zswap_invalidate_entry() only drops the ref if the entry
was actually on the rbtree. Otherwise, another invalidation could
have already happened, and the initial ref is already dropped.
With these changes, there is no need to check that there is no need to
make sure the entry still exists in the tree in zswap_reclaim_entry()
before invalidating it, as zswap_reclaim_entry() will make this check
internally.
Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com
Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-21 09:30:09 +00:00
|
|
|
/*
|
|
|
|
* If the entry is still valid in the tree, drop the initial ref and remove it
|
|
|
|
* from the tree. This function must be called with an additional ref held,
|
|
|
|
* otherwise it may race with another invalidation freeing the entry.
|
|
|
|
*/
|
2023-06-14 14:31:22 +00:00
|
|
|
static void zswap_invalidate_entry(struct zswap_tree *tree,
|
|
|
|
struct zswap_entry *entry)
|
|
|
|
{
|
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before
returning from zswap_frontswap_load(), after dropping the local reference.
However, the tree lock is dropped during decompression after the local
reference is acquired, so the entry could be invalidated before we drop
the local ref. If this happens, the entry is freed once we drop the local
ref, and zswap_invalidate_entry() tries to invalidate an already freed
entry.
Fix this by:
(a) Making sure zswap_invalidate_entry() is always called with a local
ref held, to avoid being called on a freed entry.
(b) Making sure zswap_invalidate_entry() only drops the ref if the entry
was actually on the rbtree. Otherwise, another invalidation could
have already happened, and the initial ref is already dropped.
With these changes, there is no need to check that there is no need to
make sure the entry still exists in the tree in zswap_reclaim_entry()
before invalidating it, as zswap_reclaim_entry() will make this check
internally.
Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com
Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-21 09:30:09 +00:00
|
|
|
if (zswap_rb_erase(&tree->rbroot, entry))
|
|
|
|
zswap_entry_put(tree, entry);
|
2023-06-14 14:31:22 +00:00
|
|
|
}
|
|
|
|
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
static int zswap_reclaim_entry(struct zswap_pool *pool)
|
|
|
|
{
|
|
|
|
struct zswap_entry *entry;
|
|
|
|
struct zswap_tree *tree;
|
|
|
|
pgoff_t swpoffset;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* Get an entry off the LRU */
|
|
|
|
spin_lock(&pool->lru_lock);
|
|
|
|
if (list_empty(&pool->lru)) {
|
|
|
|
spin_unlock(&pool->lru_lock);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
entry = list_last_entry(&pool->lru, struct zswap_entry, lru);
|
|
|
|
list_del_init(&entry->lru);
|
|
|
|
/*
|
|
|
|
* Once the lru lock is dropped, the entry might get freed. The
|
|
|
|
* swpoffset is copied to the stack, and entry isn't deref'd again
|
|
|
|
* until the entry is verified to still be alive in the tree.
|
|
|
|
*/
|
2023-06-12 09:38:15 +00:00
|
|
|
swpoffset = swp_offset(entry->swpentry);
|
|
|
|
tree = zswap_trees[swp_type(entry->swpentry)];
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
spin_unlock(&pool->lru_lock);
|
|
|
|
|
|
|
|
/* Check for invalidate() race */
|
|
|
|
spin_lock(&tree->lock);
|
|
|
|
if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) {
|
|
|
|
ret = -EAGAIN;
|
|
|
|
goto unlock;
|
|
|
|
}
|
|
|
|
/* Hold a reference to prevent a free during writeback */
|
|
|
|
zswap_entry_get(entry);
|
|
|
|
spin_unlock(&tree->lock);
|
|
|
|
|
2023-06-12 09:38:15 +00:00
|
|
|
ret = zswap_writeback_entry(entry, tree);
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
|
|
|
|
spin_lock(&tree->lock);
|
|
|
|
if (ret) {
|
|
|
|
/* Writeback failed, put entry back on LRU */
|
|
|
|
spin_lock(&pool->lru_lock);
|
|
|
|
list_move(&entry->lru, &pool->lru);
|
|
|
|
spin_unlock(&pool->lru_lock);
|
2023-06-12 09:38:14 +00:00
|
|
|
goto put_unlock;
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
}
|
|
|
|
|
2023-06-14 14:31:22 +00:00
|
|
|
/*
|
|
|
|
* Writeback started successfully, the page now belongs to the
|
|
|
|
* swapcache. Drop the entry from zswap - unless invalidate already
|
|
|
|
* took it out while we had the tree->lock released for IO.
|
|
|
|
*/
|
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before
returning from zswap_frontswap_load(), after dropping the local reference.
However, the tree lock is dropped during decompression after the local
reference is acquired, so the entry could be invalidated before we drop
the local ref. If this happens, the entry is freed once we drop the local
ref, and zswap_invalidate_entry() tries to invalidate an already freed
entry.
Fix this by:
(a) Making sure zswap_invalidate_entry() is always called with a local
ref held, to avoid being called on a freed entry.
(b) Making sure zswap_invalidate_entry() only drops the ref if the entry
was actually on the rbtree. Otherwise, another invalidation could
have already happened, and the initial ref is already dropped.
With these changes, there is no need to check that there is no need to
make sure the entry still exists in the tree in zswap_reclaim_entry()
before invalidating it, as zswap_reclaim_entry() will make this check
internally.
Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com
Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-21 09:30:09 +00:00
|
|
|
zswap_invalidate_entry(tree, entry);
|
2023-06-12 09:38:14 +00:00
|
|
|
|
|
|
|
put_unlock:
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
/* Drop local reference */
|
|
|
|
zswap_entry_put(tree, entry);
|
|
|
|
unlock:
|
|
|
|
spin_unlock(&tree->lock);
|
|
|
|
return ret ? -EAGAIN : 0;
|
|
|
|
}
|
|
|
|
|
2020-01-31 06:15:04 +00:00
|
|
|
static void shrink_worker(struct work_struct *w)
|
|
|
|
{
|
|
|
|
struct zswap_pool *pool = container_of(w, typeof(*pool),
|
|
|
|
shrink_work);
|
mm: zswap: shrink until can accept
This update addresses an issue with the zswap reclaim mechanism, which
hinders the efficient offloading of cold pages to disk, thereby
compromising the preservation of the LRU order and consequently
diminishing, if not inverting, its performance benefits.
The functioning of the zswap shrink worker was found to be inadequate, as
shown by basic benchmark test. For the test, a kernel build was utilized
as a reference, with its memory confined to 1G via a cgroup and a 5G swap
file provided. The results are presented below, these are averages of
three runs without the use of zswap:
real 46m26s
user 35m4s
sys 7m37s
With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
system), the results changed to:
real 56m4s
user 35m13s
sys 8m43s
written_back_pages: 18
reject_reclaim_fail: 0
pool_limit_hit:1478
Besides the evident regression, one thing to notice from this data is the
extremely low number of written_back_pages and pool_limit_hit.
The pool_limit_hit counter, which is increased in zswap_frontswap_store
when zswap is completely full, doesn't account for a particular scenario:
once zswap hits his limit, zswap_pool_reached_full is set to true; with
this flag on, zswap_frontswap_store rejects pages if zswap is still above
the acceptance threshold. Once we include the rejections due to
zswap_pool_reached_full && !zswap_can_accept(), the number goes from 1478
to a significant 21578266.
Zswap is stuck in an undesirable state where it rejects pages because it's
above the acceptance threshold, yet fails to attempt memory reclaimation.
This happens because the shrink work is only queued when
zswap_frontswap_store detects that it's full and the work itself only
reclaims one page per run.
This state results in hot pages getting written directly to disk, while
cold ones remain memory, waiting only to be invalidated. The LRU order is
completely broken and zswap ends up being just an overhead without
providing any benefits.
This commit applies 2 changes: a) the shrink worker is set to reclaim
pages until the acceptance threshold is met and b) the task is also
enqueued when zswap is not full but still above the threshold.
Testing this suggested update showed much better numbers:
real 36m37s
user 35m8s
sys 9m32s
written_back_pages: 10459423
reject_reclaim_fail: 12896
pool_limit_hit: 75653
Link: https://lkml.kernel.org/r/20230526183227.793977-1-cerasuolodomenico@gmail.com
Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-26 18:32:27 +00:00
|
|
|
int ret, failures = 0;
|
|
|
|
|
|
|
|
do {
|
2023-06-12 09:38:13 +00:00
|
|
|
ret = zswap_reclaim_entry(pool);
|
mm: zswap: shrink until can accept
This update addresses an issue with the zswap reclaim mechanism, which
hinders the efficient offloading of cold pages to disk, thereby
compromising the preservation of the LRU order and consequently
diminishing, if not inverting, its performance benefits.
The functioning of the zswap shrink worker was found to be inadequate, as
shown by basic benchmark test. For the test, a kernel build was utilized
as a reference, with its memory confined to 1G via a cgroup and a 5G swap
file provided. The results are presented below, these are averages of
three runs without the use of zswap:
real 46m26s
user 35m4s
sys 7m37s
With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
system), the results changed to:
real 56m4s
user 35m13s
sys 8m43s
written_back_pages: 18
reject_reclaim_fail: 0
pool_limit_hit:1478
Besides the evident regression, one thing to notice from this data is the
extremely low number of written_back_pages and pool_limit_hit.
The pool_limit_hit counter, which is increased in zswap_frontswap_store
when zswap is completely full, doesn't account for a particular scenario:
once zswap hits his limit, zswap_pool_reached_full is set to true; with
this flag on, zswap_frontswap_store rejects pages if zswap is still above
the acceptance threshold. Once we include the rejections due to
zswap_pool_reached_full && !zswap_can_accept(), the number goes from 1478
to a significant 21578266.
Zswap is stuck in an undesirable state where it rejects pages because it's
above the acceptance threshold, yet fails to attempt memory reclaimation.
This happens because the shrink work is only queued when
zswap_frontswap_store detects that it's full and the work itself only
reclaims one page per run.
This state results in hot pages getting written directly to disk, while
cold ones remain memory, waiting only to be invalidated. The LRU order is
completely broken and zswap ends up being just an overhead without
providing any benefits.
This commit applies 2 changes: a) the shrink worker is set to reclaim
pages until the acceptance threshold is met and b) the task is also
enqueued when zswap is not full but still above the threshold.
Testing this suggested update showed much better numbers:
real 36m37s
user 35m8s
sys 9m32s
written_back_pages: 10459423
reject_reclaim_fail: 12896
pool_limit_hit: 75653
Link: https://lkml.kernel.org/r/20230526183227.793977-1-cerasuolodomenico@gmail.com
Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-26 18:32:27 +00:00
|
|
|
if (ret) {
|
|
|
|
zswap_reject_reclaim_fail++;
|
|
|
|
if (ret != -EAGAIN)
|
|
|
|
break;
|
|
|
|
if (++failures == MAX_RECLAIM_RETRIES)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
cond_resched();
|
|
|
|
} while (!zswap_can_accept());
|
2020-01-31 06:15:04 +00:00
|
|
|
zswap_pool_put(pool);
|
|
|
|
}
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
|
|
|
|
{
|
2023-06-20 19:46:44 +00:00
|
|
|
int i;
|
2015-09-09 22:35:19 +00:00
|
|
|
struct zswap_pool *pool;
|
2016-05-05 23:22:23 +00:00
|
|
|
char name[38]; /* 'zswap' + 32 char (max) num + \0 */
|
2015-11-07 00:28:21 +00:00
|
|
|
gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
|
2016-11-26 23:13:40 +00:00
|
|
|
int ret;
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2017-02-27 22:26:50 +00:00
|
|
|
if (!zswap_has_pool) {
|
|
|
|
/* if either are unset, pool initialization failed, and we
|
|
|
|
* need both params to be set correctly before trying to
|
|
|
|
* create a pool.
|
|
|
|
*/
|
|
|
|
if (!strcmp(type, ZSWAP_PARAM_UNSET))
|
|
|
|
return NULL;
|
|
|
|
if (!strcmp(compressor, ZSWAP_PARAM_UNSET))
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
pool = kzalloc(sizeof(*pool), GFP_KERNEL);
|
2017-07-06 22:40:34 +00:00
|
|
|
if (!pool)
|
2015-09-09 22:35:19 +00:00
|
|
|
return NULL;
|
|
|
|
|
2023-06-20 19:46:44 +00:00
|
|
|
for (i = 0; i < ZSWAP_NR_ZPOOLS; i++) {
|
|
|
|
/* unique name for each pool specifically required by zsmalloc */
|
|
|
|
snprintf(name, 38, "zswap%x",
|
|
|
|
atomic_inc_return(&zswap_pools_count));
|
2016-05-05 23:22:23 +00:00
|
|
|
|
2023-06-20 19:46:44 +00:00
|
|
|
pool->zpools[i] = zpool_create_pool(type, name, gfp);
|
|
|
|
if (!pool->zpools[i]) {
|
|
|
|
pr_err("%s zpool not available\n", type);
|
|
|
|
goto error;
|
|
|
|
}
|
2015-09-09 22:35:19 +00:00
|
|
|
}
|
2023-06-20 19:46:44 +00:00
|
|
|
pr_debug("using %s zpool\n", zpool_get_type(pool->zpools[0]));
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2021-05-05 01:39:57 +00:00
|
|
|
strscpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
|
2020-12-15 03:14:18 +00:00
|
|
|
|
|
|
|
pool->acomp_ctx = alloc_percpu(*pool->acomp_ctx);
|
|
|
|
if (!pool->acomp_ctx) {
|
2015-09-09 22:35:19 +00:00
|
|
|
pr_err("percpu alloc failed\n");
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2016-11-26 23:13:40 +00:00
|
|
|
ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
|
|
|
|
&pool->node);
|
|
|
|
if (ret)
|
2015-09-09 22:35:19 +00:00
|
|
|
goto error;
|
|
|
|
pr_debug("using %s compressor\n", pool->tfm_name);
|
|
|
|
|
|
|
|
/* being the current pool takes 1 ref; this func expects the
|
|
|
|
* caller to always add the new pool as the current pool
|
|
|
|
*/
|
|
|
|
kref_init(&pool->kref);
|
|
|
|
INIT_LIST_HEAD(&pool->list);
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
INIT_LIST_HEAD(&pool->lru);
|
|
|
|
spin_lock_init(&pool->lru_lock);
|
2020-01-31 06:15:04 +00:00
|
|
|
INIT_WORK(&pool->shrink_work, shrink_worker);
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
zswap_pool_debug("created", pool);
|
|
|
|
|
|
|
|
return pool;
|
|
|
|
|
|
|
|
error:
|
2020-12-15 03:14:18 +00:00
|
|
|
if (pool->acomp_ctx)
|
|
|
|
free_percpu(pool->acomp_ctx);
|
2023-06-20 19:46:44 +00:00
|
|
|
while (i--)
|
|
|
|
zpool_destroy_pool(pool->zpools[i]);
|
2015-09-09 22:35:19 +00:00
|
|
|
kfree(pool);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2023-04-03 12:13:18 +00:00
|
|
|
static struct zswap_pool *__zswap_pool_create_fallback(void)
|
2015-09-09 22:35:19 +00:00
|
|
|
{
|
2017-02-27 22:26:50 +00:00
|
|
|
bool has_comp, has_zpool;
|
|
|
|
|
2020-12-15 03:14:18 +00:00
|
|
|
has_comp = crypto_has_acomp(zswap_compressor, 0, 0);
|
2020-04-07 03:08:03 +00:00
|
|
|
if (!has_comp && strcmp(zswap_compressor,
|
|
|
|
CONFIG_ZSWAP_COMPRESSOR_DEFAULT)) {
|
2015-09-09 22:35:19 +00:00
|
|
|
pr_err("compressor %s not available, using default %s\n",
|
2020-04-07 03:08:03 +00:00
|
|
|
zswap_compressor, CONFIG_ZSWAP_COMPRESSOR_DEFAULT);
|
2015-11-07 00:29:15 +00:00
|
|
|
param_free_charp(&zswap_compressor);
|
2020-04-07 03:08:03 +00:00
|
|
|
zswap_compressor = CONFIG_ZSWAP_COMPRESSOR_DEFAULT;
|
2020-12-15 03:14:18 +00:00
|
|
|
has_comp = crypto_has_acomp(zswap_compressor, 0, 0);
|
2015-09-09 22:35:19 +00:00
|
|
|
}
|
2017-02-27 22:26:50 +00:00
|
|
|
if (!has_comp) {
|
|
|
|
pr_err("default compressor %s not available\n",
|
|
|
|
zswap_compressor);
|
|
|
|
param_free_charp(&zswap_compressor);
|
|
|
|
zswap_compressor = ZSWAP_PARAM_UNSET;
|
|
|
|
}
|
|
|
|
|
|
|
|
has_zpool = zpool_has_pool(zswap_zpool_type);
|
2020-04-07 03:08:03 +00:00
|
|
|
if (!has_zpool && strcmp(zswap_zpool_type,
|
|
|
|
CONFIG_ZSWAP_ZPOOL_DEFAULT)) {
|
2015-09-09 22:35:19 +00:00
|
|
|
pr_err("zpool %s not available, using default %s\n",
|
2020-04-07 03:08:03 +00:00
|
|
|
zswap_zpool_type, CONFIG_ZSWAP_ZPOOL_DEFAULT);
|
2015-11-07 00:29:15 +00:00
|
|
|
param_free_charp(&zswap_zpool_type);
|
2020-04-07 03:08:03 +00:00
|
|
|
zswap_zpool_type = CONFIG_ZSWAP_ZPOOL_DEFAULT;
|
2017-02-27 22:26:50 +00:00
|
|
|
has_zpool = zpool_has_pool(zswap_zpool_type);
|
2015-09-09 22:35:19 +00:00
|
|
|
}
|
2017-02-27 22:26:50 +00:00
|
|
|
if (!has_zpool) {
|
|
|
|
pr_err("default zpool %s not available\n",
|
|
|
|
zswap_zpool_type);
|
|
|
|
param_free_charp(&zswap_zpool_type);
|
|
|
|
zswap_zpool_type = ZSWAP_PARAM_UNSET;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!has_comp || !has_zpool)
|
|
|
|
return NULL;
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
return zswap_pool_create(zswap_zpool_type, zswap_compressor);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void zswap_pool_destroy(struct zswap_pool *pool)
|
|
|
|
{
|
2023-06-20 19:46:44 +00:00
|
|
|
int i;
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
zswap_pool_debug("destroying", pool);
|
|
|
|
|
2016-11-26 23:13:40 +00:00
|
|
|
cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
|
2020-12-15 03:14:18 +00:00
|
|
|
free_percpu(pool->acomp_ctx);
|
2023-06-20 19:46:44 +00:00
|
|
|
for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
|
|
|
|
zpool_destroy_pool(pool->zpools[i]);
|
2015-09-09 22:35:19 +00:00
|
|
|
kfree(pool);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __must_check zswap_pool_get(struct zswap_pool *pool)
|
|
|
|
{
|
2017-02-27 22:26:47 +00:00
|
|
|
if (!pool)
|
|
|
|
return 0;
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
return kref_get_unless_zero(&pool->kref);
|
|
|
|
}
|
|
|
|
|
mm/zswap: use workqueue to destroy pool
Add a work_struct to struct zswap_pool, and change __zswap_pool_empty to
use the workqueue instead of using call_rcu().
When zswap destroys a pool no longer in use, it uses call_rcu() to
perform the destruction/freeing. Since that executes in softirq
context, it must not sleep. However, actually destroying the pool
involves freeing the per-cpu compressors (which requires locking the
cpu_add_remove_lock mutex) and freeing the zpool, for which the
implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy, which
locks the slab_mutex). So if either mutex is currently taken, or any
other part of the compressor or zpool implementation sleeps, it will
result in a BUG().
It's not easy to reproduce this when changing zswap's params normally.
In testing with a loaded system, this does not fail:
$ cd /sys/module/zswap/parameters
$ echo lz4 > compressor ; echo zsmalloc > zpool
nor does this:
$ while true ; do
> echo lzo > compressor ; echo zbud > zpool
> sleep 1
> echo lz4 > compressor ; echo zsmalloc > zpool
> sleep 1
> done
although it's still possible either of those might fail, depending on
whether anything else besides zswap has locked the mutexes.
However, changing a parameter with no delay immediately causes the
schedule while atomic BUG:
$ while true ; do
> echo lzo > compressor ; echo lz4 > compressor
> done
This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
but moved to zswap, to cover compressor and zpool freeing.
Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 23:59:54 +00:00
|
|
|
static void __zswap_pool_release(struct work_struct *work)
|
2015-09-09 22:35:19 +00:00
|
|
|
{
|
2020-01-31 06:15:04 +00:00
|
|
|
struct zswap_pool *pool = container_of(work, typeof(*pool),
|
|
|
|
release_work);
|
mm/zswap: use workqueue to destroy pool
Add a work_struct to struct zswap_pool, and change __zswap_pool_empty to
use the workqueue instead of using call_rcu().
When zswap destroys a pool no longer in use, it uses call_rcu() to
perform the destruction/freeing. Since that executes in softirq
context, it must not sleep. However, actually destroying the pool
involves freeing the per-cpu compressors (which requires locking the
cpu_add_remove_lock mutex) and freeing the zpool, for which the
implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy, which
locks the slab_mutex). So if either mutex is currently taken, or any
other part of the compressor or zpool implementation sleeps, it will
result in a BUG().
It's not easy to reproduce this when changing zswap's params normally.
In testing with a loaded system, this does not fail:
$ cd /sys/module/zswap/parameters
$ echo lz4 > compressor ; echo zsmalloc > zpool
nor does this:
$ while true ; do
> echo lzo > compressor ; echo zbud > zpool
> sleep 1
> echo lz4 > compressor ; echo zsmalloc > zpool
> sleep 1
> done
although it's still possible either of those might fail, depending on
whether anything else besides zswap has locked the mutexes.
However, changing a parameter with no delay immediately causes the
schedule while atomic BUG:
$ while true ; do
> echo lzo > compressor ; echo lz4 > compressor
> done
This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
but moved to zswap, to cover compressor and zpool freeing.
Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 23:59:54 +00:00
|
|
|
|
|
|
|
synchronize_rcu();
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
/* nobody should have been able to get a kref... */
|
|
|
|
WARN_ON(kref_get_unless_zero(&pool->kref));
|
|
|
|
|
|
|
|
/* pool is now off zswap_pools list and has no references. */
|
|
|
|
zswap_pool_destroy(pool);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __zswap_pool_empty(struct kref *kref)
|
|
|
|
{
|
|
|
|
struct zswap_pool *pool;
|
|
|
|
|
|
|
|
pool = container_of(kref, typeof(*pool), kref);
|
|
|
|
|
|
|
|
spin_lock(&zswap_pools_lock);
|
|
|
|
|
|
|
|
WARN_ON(pool == zswap_pool_current());
|
|
|
|
|
|
|
|
list_del_rcu(&pool->list);
|
mm/zswap: use workqueue to destroy pool
Add a work_struct to struct zswap_pool, and change __zswap_pool_empty to
use the workqueue instead of using call_rcu().
When zswap destroys a pool no longer in use, it uses call_rcu() to
perform the destruction/freeing. Since that executes in softirq
context, it must not sleep. However, actually destroying the pool
involves freeing the per-cpu compressors (which requires locking the
cpu_add_remove_lock mutex) and freeing the zpool, for which the
implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy, which
locks the slab_mutex). So if either mutex is currently taken, or any
other part of the compressor or zpool implementation sleeps, it will
result in a BUG().
It's not easy to reproduce this when changing zswap's params normally.
In testing with a loaded system, this does not fail:
$ cd /sys/module/zswap/parameters
$ echo lz4 > compressor ; echo zsmalloc > zpool
nor does this:
$ while true ; do
> echo lzo > compressor ; echo zbud > zpool
> sleep 1
> echo lz4 > compressor ; echo zsmalloc > zpool
> sleep 1
> done
although it's still possible either of those might fail, depending on
whether anything else besides zswap has locked the mutexes.
However, changing a parameter with no delay immediately causes the
schedule while atomic BUG:
$ while true ; do
> echo lzo > compressor ; echo lz4 > compressor
> done
This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
but moved to zswap, to cover compressor and zpool freeing.
Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 23:59:54 +00:00
|
|
|
|
2020-01-31 06:15:04 +00:00
|
|
|
INIT_WORK(&pool->release_work, __zswap_pool_release);
|
|
|
|
schedule_work(&pool->release_work);
|
2015-09-09 22:35:19 +00:00
|
|
|
|
|
|
|
spin_unlock(&zswap_pools_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void zswap_pool_put(struct zswap_pool *pool)
|
|
|
|
{
|
|
|
|
kref_put(&pool->kref, __zswap_pool_empty);
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
|
|
|
|
2015-09-09 22:35:21 +00:00
|
|
|
/*********************************
|
|
|
|
* param callbacks
|
|
|
|
**********************************/
|
|
|
|
|
2023-04-03 12:13:18 +00:00
|
|
|
static bool zswap_pool_changed(const char *s, const struct kernel_param *kp)
|
|
|
|
{
|
|
|
|
/* no change required */
|
|
|
|
if (!strcmp(s, *(char **)kp->arg) && zswap_has_pool)
|
|
|
|
return false;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2015-11-07 00:29:15 +00:00
|
|
|
/* val must be a null-terminated string */
|
2015-09-09 22:35:21 +00:00
|
|
|
static int __zswap_param_set(const char *val, const struct kernel_param *kp,
|
|
|
|
char *type, char *compressor)
|
|
|
|
{
|
|
|
|
struct zswap_pool *pool, *put_pool = NULL;
|
2015-11-07 00:29:15 +00:00
|
|
|
char *s = strstrip((char *)val);
|
2023-04-03 12:13:18 +00:00
|
|
|
int ret = 0;
|
|
|
|
bool new_pool = false;
|
2015-09-09 22:35:21 +00:00
|
|
|
|
2023-04-03 12:13:18 +00:00
|
|
|
mutex_lock(&zswap_init_lock);
|
2023-04-03 12:13:17 +00:00
|
|
|
switch (zswap_init_state) {
|
|
|
|
case ZSWAP_UNINIT:
|
|
|
|
/* if this is load-time (pre-init) param setting,
|
|
|
|
* don't create a pool; that's done during init.
|
|
|
|
*/
|
2023-04-03 12:13:18 +00:00
|
|
|
ret = param_set_charp(s, kp);
|
|
|
|
break;
|
2023-04-03 12:13:17 +00:00
|
|
|
case ZSWAP_INIT_SUCCEED:
|
2023-04-03 12:13:18 +00:00
|
|
|
new_pool = zswap_pool_changed(s, kp);
|
2023-04-03 12:13:17 +00:00
|
|
|
break;
|
|
|
|
case ZSWAP_INIT_FAILED:
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
pr_err("can't set param, initialization failed\n");
|
2023-04-03 12:13:18 +00:00
|
|
|
ret = -ENODEV;
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
}
|
2023-04-03 12:13:18 +00:00
|
|
|
mutex_unlock(&zswap_init_lock);
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
|
2023-04-03 12:13:18 +00:00
|
|
|
/* no need to create a new pool, return directly */
|
|
|
|
if (!new_pool)
|
|
|
|
return ret;
|
2015-09-09 22:35:21 +00:00
|
|
|
|
|
|
|
if (!type) {
|
2015-11-07 00:29:15 +00:00
|
|
|
if (!zpool_has_pool(s)) {
|
|
|
|
pr_err("zpool %s not available\n", s);
|
2015-09-09 22:35:21 +00:00
|
|
|
return -ENOENT;
|
|
|
|
}
|
2015-11-07 00:29:15 +00:00
|
|
|
type = s;
|
2015-09-09 22:35:21 +00:00
|
|
|
} else if (!compressor) {
|
2020-12-15 03:14:18 +00:00
|
|
|
if (!crypto_has_acomp(s, 0, 0)) {
|
2015-11-07 00:29:15 +00:00
|
|
|
pr_err("compressor %s not available\n", s);
|
2015-09-09 22:35:21 +00:00
|
|
|
return -ENOENT;
|
|
|
|
}
|
2015-11-07 00:29:15 +00:00
|
|
|
compressor = s;
|
|
|
|
} else {
|
|
|
|
WARN_ON(1);
|
|
|
|
return -EINVAL;
|
2015-09-09 22:35:21 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
spin_lock(&zswap_pools_lock);
|
|
|
|
|
|
|
|
pool = zswap_pool_find_get(type, compressor);
|
|
|
|
if (pool) {
|
|
|
|
zswap_pool_debug("using existing", pool);
|
2017-02-27 22:26:53 +00:00
|
|
|
WARN_ON(pool == zswap_pool_current());
|
2015-09-09 22:35:21 +00:00
|
|
|
list_del_rcu(&pool->list);
|
|
|
|
}
|
|
|
|
|
2017-02-27 22:26:53 +00:00
|
|
|
spin_unlock(&zswap_pools_lock);
|
|
|
|
|
|
|
|
if (!pool)
|
|
|
|
pool = zswap_pool_create(type, compressor);
|
|
|
|
|
2015-09-09 22:35:21 +00:00
|
|
|
if (pool)
|
2015-11-07 00:29:15 +00:00
|
|
|
ret = param_set_charp(s, kp);
|
2015-09-09 22:35:21 +00:00
|
|
|
else
|
|
|
|
ret = -EINVAL;
|
|
|
|
|
2017-02-27 22:26:53 +00:00
|
|
|
spin_lock(&zswap_pools_lock);
|
|
|
|
|
2015-09-09 22:35:21 +00:00
|
|
|
if (!ret) {
|
|
|
|
put_pool = zswap_pool_current();
|
|
|
|
list_add_rcu(&pool->list, &zswap_pools);
|
2017-02-27 22:26:47 +00:00
|
|
|
zswap_has_pool = true;
|
2015-09-09 22:35:21 +00:00
|
|
|
} else if (pool) {
|
|
|
|
/* add the possibly pre-existing pool to the end of the pools
|
|
|
|
* list; if it's new (and empty) then it'll be removed and
|
|
|
|
* destroyed by the put after we drop the lock
|
|
|
|
*/
|
|
|
|
list_add_tail_rcu(&pool->list, &zswap_pools);
|
|
|
|
put_pool = pool;
|
2017-02-27 22:26:53 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
spin_unlock(&zswap_pools_lock);
|
|
|
|
|
|
|
|
if (!zswap_has_pool && !pool) {
|
2017-02-27 22:26:47 +00:00
|
|
|
/* if initial pool creation failed, and this pool creation also
|
|
|
|
* failed, maybe both compressor and zpool params were bad.
|
|
|
|
* Allow changing this param, so pool creation will succeed
|
|
|
|
* when the other param is changed. We already verified this
|
2020-12-15 03:14:18 +00:00
|
|
|
* param is ok in the zpool_has_pool() or crypto_has_acomp()
|
2017-02-27 22:26:47 +00:00
|
|
|
* checks above.
|
|
|
|
*/
|
|
|
|
ret = param_set_charp(s, kp);
|
2015-09-09 22:35:21 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/* drop the ref from either the old current pool,
|
|
|
|
* or the new pool we failed to add
|
|
|
|
*/
|
|
|
|
if (put_pool)
|
|
|
|
zswap_pool_put(put_pool);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int zswap_compressor_param_set(const char *val,
|
|
|
|
const struct kernel_param *kp)
|
|
|
|
{
|
|
|
|
return __zswap_param_set(val, kp, zswap_zpool_type, NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int zswap_zpool_param_set(const char *val,
|
|
|
|
const struct kernel_param *kp)
|
|
|
|
{
|
|
|
|
return __zswap_param_set(val, kp, NULL, zswap_compressor);
|
|
|
|
}
|
|
|
|
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
static int zswap_enabled_param_set(const char *val,
|
|
|
|
const struct kernel_param *kp)
|
|
|
|
{
|
2023-04-03 12:13:18 +00:00
|
|
|
int ret = -ENODEV;
|
|
|
|
|
|
|
|
/* if this is load-time (pre-init) param setting, only set param. */
|
|
|
|
if (system_state != SYSTEM_RUNNING)
|
|
|
|
return param_set_bool(val, kp);
|
|
|
|
|
|
|
|
mutex_lock(&zswap_init_lock);
|
2023-04-03 12:13:17 +00:00
|
|
|
switch (zswap_init_state) {
|
|
|
|
case ZSWAP_UNINIT:
|
2023-04-03 12:13:18 +00:00
|
|
|
if (zswap_setup())
|
|
|
|
break;
|
|
|
|
fallthrough;
|
2023-04-03 12:13:17 +00:00
|
|
|
case ZSWAP_INIT_SUCCEED:
|
2023-04-03 12:13:18 +00:00
|
|
|
if (!zswap_has_pool)
|
2023-04-03 12:13:17 +00:00
|
|
|
pr_err("can't enable, no pool configured\n");
|
2023-04-03 12:13:18 +00:00
|
|
|
else
|
|
|
|
ret = param_set_bool(val, kp);
|
|
|
|
break;
|
2023-04-03 12:13:17 +00:00
|
|
|
case ZSWAP_INIT_FAILED:
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
pr_err("can't enable, initialization failed\n");
|
2017-02-27 22:26:47 +00:00
|
|
|
}
|
2023-04-03 12:13:18 +00:00
|
|
|
mutex_unlock(&zswap_init_lock);
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
|
2023-04-03 12:13:18 +00:00
|
|
|
return ret;
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
}
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/*********************************
|
|
|
|
* writeback code
|
|
|
|
**********************************/
|
|
|
|
/*
|
|
|
|
* Attempts to free an entry by adding a page to the swap cache,
|
|
|
|
* decompressing the entry data into the page, and issuing a
|
|
|
|
* bio write to write the page back to the swap device.
|
|
|
|
*
|
|
|
|
* This can be thought of as a "resumed writeback" of the page
|
|
|
|
* to the swap device. We are basically resuming the same swap
|
2023-07-17 16:02:27 +00:00
|
|
|
* writeback path that was intercepted with the zswap_store()
|
2013-07-10 23:05:03 +00:00
|
|
|
* in the first place. After the page has been decompressed into
|
|
|
|
* the swap cache, the compressed version stored by zswap can be
|
|
|
|
* freed.
|
|
|
|
*/
|
2023-06-12 09:38:15 +00:00
|
|
|
static int zswap_writeback_entry(struct zswap_entry *entry,
|
2023-06-12 09:38:14 +00:00
|
|
|
struct zswap_tree *tree)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
2023-06-12 09:38:15 +00:00
|
|
|
swp_entry_t swpentry = entry->swpentry;
|
2013-07-10 23:05:03 +00:00
|
|
|
struct page *page;
|
mempolicy: alloc_pages_mpol() for NUMA policy without vma
Shrink shmem's stack usage by eliminating the pseudo-vma from its folio
allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the
principal actor for passing mempolicy choice down to __alloc_pages(),
rather than vma_alloc_folio(gfp, order, vma, addr, hugepage).
vma_alloc_folio() and alloc_pages() remain, but as wrappers around
alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the
additional args to policy_nodemask(), which subsumes policy_node().
Cleanup throughout, cutting out some unhelpful "helpers".
It would all be much simpler without MPOL_INTERLEAVE, but that adds a
dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8bfd
("tmpfs: distribute interleave better across nodes"), which added ino bias
to the interleave, hidden from mm/mempolicy.c until this commit.
Hence "ilx" throughout, the "interleave index". Originally I thought it
could be done just with nid, but that's wrong: the nodemask may come from
the shared policy layer below a shmem vma, or it may come from the task
layer above a shmem vma; and without the final nodemask then nodeid cannot
be decided. And how ilx is applied depends also on page order.
The interleave index is almost always irrelevant unless MPOL_INTERLEAVE:
with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX
passed down from vma-less alloc_pages() is also used as hint not to use
THP-style hugepage allocation - to avoid the overhead of a hugepage arg
(though I don't understand why we never just added a GFP bit for THP - if
it actually needs a different allocation strategy from other pages of the
same order). vma_alloc_folio() still carries its hugepage arg here, but
it is not used, and should be removed when agreed.
get_vma_policy() no longer allows a NULL vma: over time I believe we've
eradicated all the places which used to need it e.g. swapoff and madvise
used to pass NULL vma to read_swap_cache_async(), but now know the vma.
[hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()]
Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com
Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com
Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun heo <tj@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-19 20:39:08 +00:00
|
|
|
struct mempolicy *mpol;
|
2020-12-15 03:14:18 +00:00
|
|
|
struct scatterlist input, output;
|
|
|
|
struct crypto_acomp_ctx *acomp_ctx;
|
2023-06-20 19:46:44 +00:00
|
|
|
struct zpool *pool = zswap_find_zpool(entry);
|
2023-07-27 16:22:25 +00:00
|
|
|
bool page_was_allocated;
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
u8 *src, *tmp = NULL;
|
2013-07-10 23:05:03 +00:00
|
|
|
unsigned int dlen;
|
2013-11-12 23:08:27 +00:00
|
|
|
int ret;
|
2013-07-10 23:05:03 +00:00
|
|
|
struct writeback_control wbc = {
|
|
|
|
.sync_mode = WB_SYNC_NONE,
|
|
|
|
};
|
|
|
|
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
if (!zpool_can_sleep_mapped(pool)) {
|
2022-11-22 03:30:22 +00:00
|
|
|
tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
if (!tmp)
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/* try to allocate swap cache page */
|
mempolicy: alloc_pages_mpol() for NUMA policy without vma
Shrink shmem's stack usage by eliminating the pseudo-vma from its folio
allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the
principal actor for passing mempolicy choice down to __alloc_pages(),
rather than vma_alloc_folio(gfp, order, vma, addr, hugepage).
vma_alloc_folio() and alloc_pages() remain, but as wrappers around
alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the
additional args to policy_nodemask(), which subsumes policy_node().
Cleanup throughout, cutting out some unhelpful "helpers".
It would all be much simpler without MPOL_INTERLEAVE, but that adds a
dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8bfd
("tmpfs: distribute interleave better across nodes"), which added ino bias
to the interleave, hidden from mm/mempolicy.c until this commit.
Hence "ilx" throughout, the "interleave index". Originally I thought it
could be done just with nid, but that's wrong: the nodemask may come from
the shared policy layer below a shmem vma, or it may come from the task
layer above a shmem vma; and without the final nodemask then nodeid cannot
be decided. And how ilx is applied depends also on page order.
The interleave index is almost always irrelevant unless MPOL_INTERLEAVE:
with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX
passed down from vma-less alloc_pages() is also used as hint not to use
THP-style hugepage allocation - to avoid the overhead of a hugepage arg
(though I don't understand why we never just added a GFP bit for THP - if
it actually needs a different allocation strategy from other pages of the
same order). vma_alloc_folio() still carries its hugepage arg here, but
it is not used, and should be removed when agreed.
get_vma_policy() no longer allows a NULL vma: over time I believe we've
eradicated all the places which used to need it e.g. swapoff and madvise
used to pass NULL vma to read_swap_cache_async(), but now know the vma.
[hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()]
Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com
Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com
Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun heo <tj@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-19 20:39:08 +00:00
|
|
|
mpol = get_task_policy(current);
|
|
|
|
page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
|
|
|
|
NO_INTERLEAVE_INDEX, &page_was_allocated);
|
2023-07-27 16:22:25 +00:00
|
|
|
if (!page) {
|
2013-07-10 23:05:03 +00:00
|
|
|
ret = -ENOMEM;
|
|
|
|
goto fail;
|
2023-07-27 16:22:25 +00:00
|
|
|
}
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2023-07-27 16:22:25 +00:00
|
|
|
/* Found an existing page, we raced with load/swapin */
|
|
|
|
if (!page_was_allocated) {
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 12:29:47 +00:00
|
|
|
put_page(page);
|
2013-07-10 23:05:03 +00:00
|
|
|
ret = -EEXIST;
|
|
|
|
goto fail;
|
2023-07-27 16:22:25 +00:00
|
|
|
}
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2023-07-27 16:22:25 +00:00
|
|
|
/*
|
|
|
|
* Page is locked, and the swapcache is now secured against
|
|
|
|
* concurrent swapping to and from the slot. Verify that the
|
|
|
|
* swap entry hasn't been invalidated and recycled behind our
|
|
|
|
* backs (our zswap_entry reference doesn't prevent that), to
|
|
|
|
* avoid overwriting a new swap page with old compressed data.
|
|
|
|
*/
|
|
|
|
spin_lock(&tree->lock);
|
|
|
|
if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
|
mm: fix zswap writeback race condition
The zswap writeback mechanism can cause a race condition resulting in
memory corruption, where a swapped out page gets swapped in with data that
was written to a different page.
The race unfolds like this:
1. a page with data A and swap offset X is stored in zswap
2. page A is removed off the LRU by zpool driver for writeback in
zswap-shrink work, data for A is mapped by zpool driver
3. user space program faults and invalidates page entry A, offset X is
considered free
4. kswapd stores page B at offset X in zswap (zswap could also be
full, if so, page B would then be IOed to X, then skip step 5.)
5. entry A is replaced by B in tree->rbroot, this doesn't affect the
local reference held by zswap-shrink work
6. zswap-shrink work writes back A at X, and frees zswap entry A
7. swapin of slot X brings A in memory instead of B
The fix:
Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW),
zswap-shrink work just checks that the local zswap_entry reference is
still the same as the one in the tree. If it's not the same it means that
it's either been invalidated or replaced, in both cases the writeback is
aborted because the local entry contains stale data.
Reproducer:
I originally found this by running `stress` overnight to validate my work
on the zswap writeback mechanism, it manifested after hours on my test
machine. The key to make it happen is having zswap writebacks, so
whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do
the trick.
In order to reproduce this faster on a vm, I setup a system with ~100M of
available memory and a 500M swap file, then running `stress --vm 1
--vm-bytes 300000000 --vm-stride 4000` makes it happen in matter of tens
of minutes. One can speed things up even more by swinging
/sys/module/zswap/parameters/max_pool_percent up and down between, say, 20
and 1; this makes it reproduce in tens of seconds. It's crucial to set
`--vm-stride` to something other than 4096 otherwise `stress` won't
realize that memory has been corrupted because all pages would have the
same data.
Link: https://lkml.kernel.org/r/20230503151200.19707-1-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-03 15:12:00 +00:00
|
|
|
spin_unlock(&tree->lock);
|
2023-07-27 16:22:25 +00:00
|
|
|
delete_from_swap_cache(page_folio(page));
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
spin_unlock(&tree->lock);
|
mm: fix zswap writeback race condition
The zswap writeback mechanism can cause a race condition resulting in
memory corruption, where a swapped out page gets swapped in with data that
was written to a different page.
The race unfolds like this:
1. a page with data A and swap offset X is stored in zswap
2. page A is removed off the LRU by zpool driver for writeback in
zswap-shrink work, data for A is mapped by zpool driver
3. user space program faults and invalidates page entry A, offset X is
considered free
4. kswapd stores page B at offset X in zswap (zswap could also be
full, if so, page B would then be IOed to X, then skip step 5.)
5. entry A is replaced by B in tree->rbroot, this doesn't affect the
local reference held by zswap-shrink work
6. zswap-shrink work writes back A at X, and frees zswap entry A
7. swapin of slot X brings A in memory instead of B
The fix:
Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW),
zswap-shrink work just checks that the local zswap_entry reference is
still the same as the one in the tree. If it's not the same it means that
it's either been invalidated or replaced, in both cases the writeback is
aborted because the local entry contains stale data.
Reproducer:
I originally found this by running `stress` overnight to validate my work
on the zswap writeback mechanism, it manifested after hours on my test
machine. The key to make it happen is having zswap writebacks, so
whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do
the trick.
In order to reproduce this faster on a vm, I setup a system with ~100M of
available memory and a 500M swap file, then running `stress --vm 1
--vm-bytes 300000000 --vm-stride 4000` makes it happen in matter of tens
of minutes. One can speed things up even more by swinging
/sys/module/zswap/parameters/max_pool_percent up and down between, say, 20
and 1; this makes it reproduce in tens of seconds. It's crucial to set
`--vm-stride` to something other than 4096 otherwise `stress` won't
realize that memory has been corrupted because all pages would have the
same data.
Link: https://lkml.kernel.org/r/20230503151200.19707-1-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-03 15:12:00 +00:00
|
|
|
|
2023-07-27 16:22:25 +00:00
|
|
|
/* decompress */
|
|
|
|
acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
|
|
|
|
dlen = PAGE_SIZE;
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
|
2023-07-27 16:22:25 +00:00
|
|
|
src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
|
|
|
|
if (!zpool_can_sleep_mapped(pool)) {
|
|
|
|
memcpy(tmp, src, entry->length);
|
|
|
|
src = tmp;
|
|
|
|
zpool_unmap_handle(pool, entry->handle);
|
|
|
|
}
|
zswap: fix writeback lock ordering for zsmalloc
Patch series "Implement writeback for zsmalloc", v7.
Unlike other zswap allocators such as zbud or z3fold, zsmalloc currently
lacks the writeback mechanism. This means that when the zswap pool is
full, it will simply reject further allocations, and the pages will be
written directly to swap.
This series of patches implements writeback for zsmalloc. When the zswap
pool becomes full, zsmalloc will attempt to evict all the compressed
objects in the least-recently used zspages.
This patch (of 6):
zswap's customary lock order is tree->lock before pool->lock, because the
tree->lock protects the entries' refcount, and the free callbacks in the
backends acquire their respective pool locks to dispatch the backing
object. zsmalloc's map callback takes the pool lock, so zswap must not
grab the tree->lock while a handle is mapped. This currently only happens
during writeback, which isn't implemented for zsmalloc. In preparation
for it, move the tree->lock section out of the mapped entry section
Link: https://lkml.kernel.org/r/20221128191616.1261026-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20221128191616.1261026-2-nphamcs@gmail.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-28 19:16:10 +00:00
|
|
|
|
2023-07-27 16:22:25 +00:00
|
|
|
mutex_lock(acomp_ctx->mutex);
|
|
|
|
sg_init_one(&input, src, entry->length);
|
|
|
|
sg_init_table(&output, 1);
|
|
|
|
sg_set_page(&output, page, PAGE_SIZE, 0);
|
|
|
|
acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
|
|
|
|
ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
|
|
|
|
dlen = acomp_ctx->req->dlen;
|
|
|
|
mutex_unlock(acomp_ctx->mutex);
|
zswap: fix writeback lock ordering for zsmalloc
Patch series "Implement writeback for zsmalloc", v7.
Unlike other zswap allocators such as zbud or z3fold, zsmalloc currently
lacks the writeback mechanism. This means that when the zswap pool is
full, it will simply reject further allocations, and the pages will be
written directly to swap.
This series of patches implements writeback for zsmalloc. When the zswap
pool becomes full, zsmalloc will attempt to evict all the compressed
objects in the least-recently used zspages.
This patch (of 6):
zswap's customary lock order is tree->lock before pool->lock, because the
tree->lock protects the entries' refcount, and the free callbacks in the
backends acquire their respective pool locks to dispatch the backing
object. zsmalloc's map callback takes the pool lock, so zswap must not
grab the tree->lock while a handle is mapped. This currently only happens
during writeback, which isn't implemented for zsmalloc. In preparation
for it, move the tree->lock section out of the mapped entry section
Link: https://lkml.kernel.org/r/20221128191616.1261026-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20221128191616.1261026-2-nphamcs@gmail.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-28 19:16:10 +00:00
|
|
|
|
2023-07-27 16:22:25 +00:00
|
|
|
if (!zpool_can_sleep_mapped(pool))
|
|
|
|
kfree(tmp);
|
|
|
|
else
|
|
|
|
zpool_unmap_handle(pool, entry->handle);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2023-07-27 16:22:25 +00:00
|
|
|
BUG_ON(ret);
|
|
|
|
BUG_ON(dlen != PAGE_SIZE);
|
|
|
|
|
|
|
|
/* page is up to date */
|
|
|
|
SetPageUptodate(page);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2013-11-12 23:07:52 +00:00
|
|
|
/* move it to the tail of the inactive list after end_writeback */
|
|
|
|
SetPageReclaim(page);
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/* start writeback */
|
2022-08-11 14:17:41 +00:00
|
|
|
__swap_writepage(page, &wbc);
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 12:29:47 +00:00
|
|
|
put_page(page);
|
2013-07-10 23:05:03 +00:00
|
|
|
zswap_written_back_pages++;
|
|
|
|
|
zswap: fix writeback lock ordering for zsmalloc
Patch series "Implement writeback for zsmalloc", v7.
Unlike other zswap allocators such as zbud or z3fold, zsmalloc currently
lacks the writeback mechanism. This means that when the zswap pool is
full, it will simply reject further allocations, and the pages will be
written directly to swap.
This series of patches implements writeback for zsmalloc. When the zswap
pool becomes full, zsmalloc will attempt to evict all the compressed
objects in the least-recently used zspages.
This patch (of 6):
zswap's customary lock order is tree->lock before pool->lock, because the
tree->lock protects the entries' refcount, and the free callbacks in the
backends acquire their respective pool locks to dispatch the backing
object. zsmalloc's map callback takes the pool lock, so zswap must not
grab the tree->lock while a handle is mapped. This currently only happens
during writeback, which isn't implemented for zsmalloc. In preparation
for it, move the tree->lock section out of the mapped entry section
Link: https://lkml.kernel.org/r/20221128191616.1261026-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20221128191616.1261026-2-nphamcs@gmail.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-28 19:16:10 +00:00
|
|
|
return ret;
|
2023-07-27 16:22:25 +00:00
|
|
|
|
zswap: fix writeback lock ordering for zsmalloc
Patch series "Implement writeback for zsmalloc", v7.
Unlike other zswap allocators such as zbud or z3fold, zsmalloc currently
lacks the writeback mechanism. This means that when the zswap pool is
full, it will simply reject further allocations, and the pages will be
written directly to swap.
This series of patches implements writeback for zsmalloc. When the zswap
pool becomes full, zsmalloc will attempt to evict all the compressed
objects in the least-recently used zspages.
This patch (of 6):
zswap's customary lock order is tree->lock before pool->lock, because the
tree->lock protects the entries' refcount, and the free callbacks in the
backends acquire their respective pool locks to dispatch the backing
object. zsmalloc's map callback takes the pool lock, so zswap must not
grab the tree->lock while a handle is mapped. This currently only happens
during writeback, which isn't implemented for zsmalloc. In preparation
for it, move the tree->lock section out of the mapped entry section
Link: https://lkml.kernel.org/r/20221128191616.1261026-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20221128191616.1261026-2-nphamcs@gmail.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-28 19:16:10 +00:00
|
|
|
fail:
|
|
|
|
if (!zpool_can_sleep_mapped(pool))
|
|
|
|
kfree(tmp);
|
2013-11-12 23:08:27 +00:00
|
|
|
|
|
|
|
/*
|
2023-07-27 16:22:25 +00:00
|
|
|
* If we get here because the page is already in swapcache, a
|
|
|
|
* load may be happening concurrently. It is safe and okay to
|
|
|
|
* not free the entry. It is also okay to return !0.
|
|
|
|
*/
|
2013-07-10 23:05:03 +00:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
|
|
|
|
{
|
|
|
|
unsigned long *page;
|
2023-02-05 19:00:36 +00:00
|
|
|
unsigned long val;
|
|
|
|
unsigned int pos, last_pos = PAGE_SIZE / sizeof(*page) - 1;
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
|
|
|
|
page = (unsigned long *)ptr;
|
2023-02-05 19:00:36 +00:00
|
|
|
val = page[0];
|
|
|
|
|
|
|
|
if (val != page[last_pos])
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for (pos = 1; pos < last_pos; pos++) {
|
|
|
|
if (val != page[pos])
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
return 0;
|
|
|
|
}
|
2023-02-05 19:00:36 +00:00
|
|
|
|
|
|
|
*value = val;
|
|
|
|
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void zswap_fill_page(void *ptr, unsigned long value)
|
|
|
|
{
|
|
|
|
unsigned long *page;
|
|
|
|
|
|
|
|
page = (unsigned long *)ptr;
|
|
|
|
memset_l(page, value, PAGE_SIZE / sizeof(unsigned long));
|
|
|
|
}
|
|
|
|
|
2023-07-15 04:23:40 +00:00
|
|
|
bool zswap_store(struct folio *folio)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
2023-08-21 16:08:48 +00:00
|
|
|
swp_entry_t swp = folio->swap;
|
2023-07-17 16:02:27 +00:00
|
|
|
int type = swp_type(swp);
|
|
|
|
pgoff_t offset = swp_offset(swp);
|
2023-07-15 04:23:40 +00:00
|
|
|
struct page *page = &folio->page;
|
2013-07-10 23:05:03 +00:00
|
|
|
struct zswap_tree *tree = zswap_trees[type];
|
|
|
|
struct zswap_entry *entry, *dupentry;
|
2020-12-15 03:14:18 +00:00
|
|
|
struct scatterlist input, output;
|
|
|
|
struct crypto_acomp_ctx *acomp_ctx;
|
2022-05-19 21:08:53 +00:00
|
|
|
struct obj_cgroup *objcg = NULL;
|
|
|
|
struct zswap_pool *pool;
|
2023-06-20 19:46:44 +00:00
|
|
|
struct zpool *zpool;
|
2023-06-12 09:38:15 +00:00
|
|
|
unsigned int dlen = PAGE_SIZE;
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
unsigned long handle, value;
|
2013-07-10 23:05:03 +00:00
|
|
|
char *buf;
|
|
|
|
u8 *src, *dst;
|
zswap: use movable memory if zpool support allocate movable memory
This is the third version that was updated according to the comments from
Sergey Senozhatsky https://lkml.org/lkml/2019/5/29/73 and Shakeel Butt
https://lkml.org/lkml/2019/6/4/973
zswap compresses swap pages into a dynamically allocated RAM-based memory
pool. The memory pool should be zbud, z3fold or zsmalloc. All of them
will allocate unmovable pages. It will increase the number of unmovable
page blocks that will bad for anti-fragment.
zsmalloc support page migration if request movable page:
handle = zs_malloc(zram->mem_pool, comp_len,
GFP_NOIO | __GFP_HIGHMEM |
__GFP_MOVABLE);
And commit "zpool: Add malloc_support_movable to zpool_driver" add
zpool_malloc_support_movable check malloc_support_movable to make sure if
a zpool support allocate movable memory.
This commit let zswap allocate block with gfp
__GFP_HIGHMEM | __GFP_MOVABLE if zpool support allocate movable memory.
Following part is test log in a pc that has 8G memory and 2G swap.
Without this commit:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4826062 usecs = 549973 KB/s
2717908992 bytes / 4864201 usecs = 545661 KB/s
2717908992 bytes / 4867015 usecs = 545346 KB/s
2717908992 bytes / 4915485 usecs = 539968 KB/s
397853 usecs to free memory
357820 usecs to free memory
421333 usecs to free memory
420454 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 6 5 8 6 6 5 4 1 1 1 0
Node 0, zone DMA32, type Movable 25 20 20 19 22 15 14 11 11 5 767
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 4753 5588 5159 4613 3712 2520 1448 594 188 11 0
Node 0, zone Normal, type Movable 16 3 457 2648 2143 1435 860 459 223 224 296
Node 0, zone Normal, type Reclaimable 0 0 44 38 11 2 0 0 0 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 4 1652 0 0 0 0
Node 0, zone Normal 931 1485 15 0 0 0
With this commit:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4689240 usecs = 566020 KB/s
2717908992 bytes / 4760605 usecs = 557535 KB/s
2717908992 bytes / 4803621 usecs = 552543 KB/s
2717908992 bytes / 5069828 usecs = 523530 KB/s
431546 usecs to free memory
383397 usecs to free memory
456454 usecs to free memory
224487 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 10 8 10 9 10 4 3 2 3 0 0
Node 0, zone DMA32, type Movable 18 12 14 16 16 11 9 5 5 6 775
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 2669 1236 452 118 37 14 4 1 2 3 0
Node 0, zone Normal, type Movable 3850 6086 5274 4327 3510 2494 1520 934 438 220 470
Node 0, zone Normal, type Reclaimable 56 93 155 124 47 31 17 7 3 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 4 1650 2 0 0 0
Node 0, zone Normal 79 2326 26 0 0 0
You can see that the number of unmovable page blocks is decreased
when the kernel has this commit.
Link: http://lkml.kernel.org/r/20190605100630.13293-2-teawaterz@linux.alibaba.com
Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-23 22:39:40 +00:00
|
|
|
gfp_t gfp;
|
2023-07-17 16:02:27 +00:00
|
|
|
int ret;
|
|
|
|
|
2023-07-15 04:23:40 +00:00
|
|
|
VM_WARN_ON_ONCE(!folio_test_locked(folio));
|
|
|
|
VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2023-07-15 04:23:40 +00:00
|
|
|
/* Large folios aren't supported */
|
|
|
|
if (folio_test_large(folio))
|
2023-07-17 16:02:27 +00:00
|
|
|
return false;
|
mm, swap, frontswap: fix THP swap if frontswap enabled
It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfault and memory corruption will occur in
random user space applications as follow,
kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
#0 0x00007fc08889ae0d _int_malloc (libc.so.6)
#1 0x00007fc08889c2f3 malloc (libc.so.6)
#2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
#3 0x0000560e6005e75c n/a (urxvt)
#4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
#5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
#6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
#7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
#8 0x0000560e6005cb55 ev_run (urxvt)
#9 0x0000560e6003b9b9 main (urxvt)
#10 0x00007fc08883af4a __libc_start_main (libc.so.6)
#11 0x0000560e6003f9da _start (urxvt)
After bisection, it was found the first bad commit is bd4c82c22c36 ("mm,
THP, swap: delay splitting THP after swapped out").
The root cause is as follows:
When the pages are written to swap device during swapping out in
swap_writepage(), zswap (fontswap) is tried to compress the pages to
improve performance. But zswap (frontswap) will treat THP as a normal
page, so only the head page is saved. After swapping in, tail pages
will not be restored to their original contents, causing memory
corruption in the applications.
This is fixed by refusing to save page in the frontswap store functions
if the page is a THP. So that the THP will be swapped out to swap
device.
Another choice is to split THP if frontswap is enabled. But it is found
that the frontswap enabling isn't flexible. For example, if
CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
zswap itself isn't enabled.
Frontswap has multiple backends, to make it easy for one backend to
enable THP support, the THP checking is put in backend frontswap store
functions instead of the general interfaces.
Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org> [put THP checking in backend]
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shaohua Li <shli@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21 22:45:39 +00:00
|
|
|
|
2023-07-17 16:02:27 +00:00
|
|
|
if (!zswap_enabled || !tree)
|
|
|
|
return false;
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2023-09-22 17:22:11 +00:00
|
|
|
/*
|
|
|
|
* If this is a duplicate, it must be removed before attempting to store
|
|
|
|
* it, otherwise, if the store fails the old page won't be removed from
|
|
|
|
* the tree, and it might be written back overriding the new data.
|
|
|
|
*/
|
|
|
|
spin_lock(&tree->lock);
|
|
|
|
dupentry = zswap_rb_search(&tree->rbroot, offset);
|
|
|
|
if (dupentry) {
|
|
|
|
zswap_duplicate_entry++;
|
|
|
|
zswap_invalidate_entry(tree, dupentry);
|
|
|
|
}
|
|
|
|
spin_unlock(&tree->lock);
|
|
|
|
|
2023-05-30 22:24:40 +00:00
|
|
|
/*
|
|
|
|
* XXX: zswap reclaim does not work with cgroups yet. Without a
|
|
|
|
* cgroup-aware entry LRU, we will push out entries system-wide based on
|
|
|
|
* local cgroup limits.
|
|
|
|
*/
|
2023-07-15 04:23:41 +00:00
|
|
|
objcg = get_obj_cgroup_from_folio(folio);
|
2023-07-17 16:02:27 +00:00
|
|
|
if (objcg && !obj_cgroup_may_zswap(objcg))
|
2023-05-30 22:24:40 +00:00
|
|
|
goto reject;
|
2022-05-19 21:08:53 +00:00
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/* reclaim space if needed */
|
|
|
|
if (zswap_is_full()) {
|
|
|
|
zswap_pool_limit_hit++;
|
2020-01-31 06:15:04 +00:00
|
|
|
zswap_pool_reached_full = true;
|
2022-05-19 21:08:53 +00:00
|
|
|
goto shrink;
|
2020-01-31 06:15:04 +00:00
|
|
|
}
|
2018-07-26 23:37:42 +00:00
|
|
|
|
2020-01-31 06:15:04 +00:00
|
|
|
if (zswap_pool_reached_full) {
|
2023-07-17 16:02:27 +00:00
|
|
|
if (!zswap_can_accept())
|
mm: zswap: shrink until can accept
This update addresses an issue with the zswap reclaim mechanism, which
hinders the efficient offloading of cold pages to disk, thereby
compromising the preservation of the LRU order and consequently
diminishing, if not inverting, its performance benefits.
The functioning of the zswap shrink worker was found to be inadequate, as
shown by basic benchmark test. For the test, a kernel build was utilized
as a reference, with its memory confined to 1G via a cgroup and a 5G swap
file provided. The results are presented below, these are averages of
three runs without the use of zswap:
real 46m26s
user 35m4s
sys 7m37s
With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
system), the results changed to:
real 56m4s
user 35m13s
sys 8m43s
written_back_pages: 18
reject_reclaim_fail: 0
pool_limit_hit:1478
Besides the evident regression, one thing to notice from this data is the
extremely low number of written_back_pages and pool_limit_hit.
The pool_limit_hit counter, which is increased in zswap_frontswap_store
when zswap is completely full, doesn't account for a particular scenario:
once zswap hits his limit, zswap_pool_reached_full is set to true; with
this flag on, zswap_frontswap_store rejects pages if zswap is still above
the acceptance threshold. Once we include the rejections due to
zswap_pool_reached_full && !zswap_can_accept(), the number goes from 1478
to a significant 21578266.
Zswap is stuck in an undesirable state where it rejects pages because it's
above the acceptance threshold, yet fails to attempt memory reclaimation.
This happens because the shrink work is only queued when
zswap_frontswap_store detects that it's full and the work itself only
reclaims one page per run.
This state results in hot pages getting written directly to disk, while
cold ones remain memory, waiting only to be invalidated. The LRU order is
completely broken and zswap ends up being just an overhead without
providing any benefits.
This commit applies 2 changes: a) the shrink worker is set to reclaim
pages until the acceptance threshold is met and b) the task is also
enqueued when zswap is not full but still above the threshold.
Testing this suggested update showed much better numbers:
real 36m37s
user 35m8s
sys 9m32s
written_back_pages: 10459423
reject_reclaim_fail: 12896
pool_limit_hit: 75653
Link: https://lkml.kernel.org/r/20230526183227.793977-1-cerasuolodomenico@gmail.com
Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-26 18:32:27 +00:00
|
|
|
goto shrink;
|
2023-07-17 16:02:27 +00:00
|
|
|
else
|
2020-01-31 06:15:04 +00:00
|
|
|
zswap_pool_reached_full = false;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/* allocate entry */
|
|
|
|
entry = zswap_entry_cache_alloc(GFP_KERNEL);
|
|
|
|
if (!entry) {
|
|
|
|
zswap_reject_kmemcache_fail++;
|
|
|
|
goto reject;
|
|
|
|
}
|
|
|
|
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
if (zswap_same_filled_pages_enabled) {
|
|
|
|
src = kmap_atomic(page);
|
|
|
|
if (zswap_is_page_same_filled(src, &value)) {
|
|
|
|
kunmap_atomic(src);
|
2023-06-12 09:38:15 +00:00
|
|
|
entry->swpentry = swp_entry(type, offset);
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
entry->length = 0;
|
|
|
|
entry->value = value;
|
|
|
|
atomic_inc(&zswap_same_filled_pages);
|
|
|
|
goto insert_entry;
|
|
|
|
}
|
|
|
|
kunmap_atomic(src);
|
|
|
|
}
|
|
|
|
|
2023-07-17 16:02:27 +00:00
|
|
|
if (!zswap_non_same_filled_pages_enabled)
|
2022-03-22 21:47:43 +00:00
|
|
|
goto freepage;
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
/* if entry is successfully added, it keeps the reference */
|
|
|
|
entry->pool = zswap_pool_current_get();
|
2023-07-17 16:02:27 +00:00
|
|
|
if (!entry->pool)
|
2015-09-09 22:35:19 +00:00
|
|
|
goto freepage;
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/* compress */
|
2020-12-15 03:14:18 +00:00
|
|
|
acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
|
|
|
|
|
|
|
|
mutex_lock(acomp_ctx->mutex);
|
|
|
|
|
|
|
|
dst = acomp_ctx->dstmem;
|
|
|
|
sg_init_table(&input, 1);
|
|
|
|
sg_set_page(&input, page, PAGE_SIZE, 0);
|
|
|
|
|
|
|
|
/* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
|
|
|
|
sg_init_one(&output, dst, PAGE_SIZE * 2);
|
|
|
|
acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
|
|
|
|
/*
|
|
|
|
* it maybe looks a little bit silly that we send an asynchronous request,
|
|
|
|
* then wait for its completion synchronously. This makes the process look
|
|
|
|
* synchronous in fact.
|
|
|
|
* Theoretically, acomp supports users send multiple acomp requests in one
|
|
|
|
* acomp instance, then get those requests done simultaneously. but in this
|
2023-07-17 16:02:27 +00:00
|
|
|
* case, zswap actually does store and load page by page, there is no
|
2020-12-15 03:14:18 +00:00
|
|
|
* existing method to send the second page before the first page is done
|
2023-07-17 16:02:27 +00:00
|
|
|
* in one thread doing zwap.
|
2020-12-15 03:14:18 +00:00
|
|
|
* but in different threads running on different cpu, we have different
|
|
|
|
* acomp instance, so multiple threads can do (de)compression in parallel.
|
|
|
|
*/
|
|
|
|
ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
|
|
|
|
dlen = acomp_ctx->req->dlen;
|
|
|
|
|
2023-10-24 23:45:09 +00:00
|
|
|
if (ret) {
|
|
|
|
zswap_reject_compress_fail++;
|
2015-09-09 22:35:19 +00:00
|
|
|
goto put_dstmem;
|
2023-10-24 23:45:09 +00:00
|
|
|
}
|
2013-07-10 23:05:03 +00:00
|
|
|
|
|
|
|
/* store */
|
2023-06-20 19:46:44 +00:00
|
|
|
zpool = zswap_find_zpool(entry);
|
zswap: use movable memory if zpool support allocate movable memory
This is the third version that was updated according to the comments from
Sergey Senozhatsky https://lkml.org/lkml/2019/5/29/73 and Shakeel Butt
https://lkml.org/lkml/2019/6/4/973
zswap compresses swap pages into a dynamically allocated RAM-based memory
pool. The memory pool should be zbud, z3fold or zsmalloc. All of them
will allocate unmovable pages. It will increase the number of unmovable
page blocks that will bad for anti-fragment.
zsmalloc support page migration if request movable page:
handle = zs_malloc(zram->mem_pool, comp_len,
GFP_NOIO | __GFP_HIGHMEM |
__GFP_MOVABLE);
And commit "zpool: Add malloc_support_movable to zpool_driver" add
zpool_malloc_support_movable check malloc_support_movable to make sure if
a zpool support allocate movable memory.
This commit let zswap allocate block with gfp
__GFP_HIGHMEM | __GFP_MOVABLE if zpool support allocate movable memory.
Following part is test log in a pc that has 8G memory and 2G swap.
Without this commit:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4826062 usecs = 549973 KB/s
2717908992 bytes / 4864201 usecs = 545661 KB/s
2717908992 bytes / 4867015 usecs = 545346 KB/s
2717908992 bytes / 4915485 usecs = 539968 KB/s
397853 usecs to free memory
357820 usecs to free memory
421333 usecs to free memory
420454 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 6 5 8 6 6 5 4 1 1 1 0
Node 0, zone DMA32, type Movable 25 20 20 19 22 15 14 11 11 5 767
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 4753 5588 5159 4613 3712 2520 1448 594 188 11 0
Node 0, zone Normal, type Movable 16 3 457 2648 2143 1435 860 459 223 224 296
Node 0, zone Normal, type Reclaimable 0 0 44 38 11 2 0 0 0 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 4 1652 0 0 0 0
Node 0, zone Normal 931 1485 15 0 0 0
With this commit:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4689240 usecs = 566020 KB/s
2717908992 bytes / 4760605 usecs = 557535 KB/s
2717908992 bytes / 4803621 usecs = 552543 KB/s
2717908992 bytes / 5069828 usecs = 523530 KB/s
431546 usecs to free memory
383397 usecs to free memory
456454 usecs to free memory
224487 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 10 8 10 9 10 4 3 2 3 0 0
Node 0, zone DMA32, type Movable 18 12 14 16 16 11 9 5 5 6 775
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 2669 1236 452 118 37 14 4 1 2 3 0
Node 0, zone Normal, type Movable 3850 6086 5274 4327 3510 2494 1520 934 438 220 470
Node 0, zone Normal, type Reclaimable 56 93 155 124 47 31 17 7 3 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 4 1650 2 0 0 0
Node 0, zone Normal 79 2326 26 0 0 0
You can see that the number of unmovable page blocks is decreased
when the kernel has this commit.
Link: http://lkml.kernel.org/r/20190605100630.13293-2-teawaterz@linux.alibaba.com
Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-23 22:39:40 +00:00
|
|
|
gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
|
2023-06-20 19:46:44 +00:00
|
|
|
if (zpool_malloc_support_movable(zpool))
|
zswap: use movable memory if zpool support allocate movable memory
This is the third version that was updated according to the comments from
Sergey Senozhatsky https://lkml.org/lkml/2019/5/29/73 and Shakeel Butt
https://lkml.org/lkml/2019/6/4/973
zswap compresses swap pages into a dynamically allocated RAM-based memory
pool. The memory pool should be zbud, z3fold or zsmalloc. All of them
will allocate unmovable pages. It will increase the number of unmovable
page blocks that will bad for anti-fragment.
zsmalloc support page migration if request movable page:
handle = zs_malloc(zram->mem_pool, comp_len,
GFP_NOIO | __GFP_HIGHMEM |
__GFP_MOVABLE);
And commit "zpool: Add malloc_support_movable to zpool_driver" add
zpool_malloc_support_movable check malloc_support_movable to make sure if
a zpool support allocate movable memory.
This commit let zswap allocate block with gfp
__GFP_HIGHMEM | __GFP_MOVABLE if zpool support allocate movable memory.
Following part is test log in a pc that has 8G memory and 2G swap.
Without this commit:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4826062 usecs = 549973 KB/s
2717908992 bytes / 4864201 usecs = 545661 KB/s
2717908992 bytes / 4867015 usecs = 545346 KB/s
2717908992 bytes / 4915485 usecs = 539968 KB/s
397853 usecs to free memory
357820 usecs to free memory
421333 usecs to free memory
420454 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 6 5 8 6 6 5 4 1 1 1 0
Node 0, zone DMA32, type Movable 25 20 20 19 22 15 14 11 11 5 767
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 4753 5588 5159 4613 3712 2520 1448 594 188 11 0
Node 0, zone Normal, type Movable 16 3 457 2648 2143 1435 860 459 223 224 296
Node 0, zone Normal, type Reclaimable 0 0 44 38 11 2 0 0 0 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 4 1652 0 0 0 0
Node 0, zone Normal 931 1485 15 0 0 0
With this commit:
~# echo lz4 > /sys/module/zswap/parameters/compressor
~# echo zsmalloc > /sys/module/zswap/parameters/zpool
~# echo 1 > /sys/module/zswap/parameters/enabled
~# swapon /swapfile
~# cd /home/teawater/kernel/vm-scalability/
/home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
/home/teawater/kernel/vm-scalability# ./case-anon-w-seq
2717908992 bytes / 4689240 usecs = 566020 KB/s
2717908992 bytes / 4760605 usecs = 557535 KB/s
2717908992 bytes / 4803621 usecs = 552543 KB/s
2717908992 bytes / 5069828 usecs = 523530 KB/s
431546 usecs to free memory
383397 usecs to free memory
456454 usecs to free memory
224487 usecs to free memory
/home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 10 8 10 9 10 4 3 2 3 0 0
Node 0, zone DMA32, type Movable 18 12 14 16 16 11 9 5 5 6 775
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 1
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 2669 1236 452 118 37 14 4 1 2 3 0
Node 0, zone Normal, type Movable 3850 6086 5274 4327 3510 2494 1520 934 438 220 470
Node 0, zone Normal, type Reclaimable 56 93 155 124 47 31 17 7 3 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
Node 0, zone DMA 1 7 0 0 0 0
Node 0, zone DMA32 4 1650 2 0 0 0
Node 0, zone Normal 79 2326 26 0 0 0
You can see that the number of unmovable page blocks is decreased
when the kernel has this commit.
Link: http://lkml.kernel.org/r/20190605100630.13293-2-teawaterz@linux.alibaba.com
Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-23 22:39:40 +00:00
|
|
|
gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
|
2023-06-20 19:46:44 +00:00
|
|
|
ret = zpool_malloc(zpool, dlen, gfp, &handle);
|
2013-07-10 23:05:03 +00:00
|
|
|
if (ret == -ENOSPC) {
|
|
|
|
zswap_reject_compress_poor++;
|
2015-09-09 22:35:19 +00:00
|
|
|
goto put_dstmem;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
|
|
|
if (ret) {
|
|
|
|
zswap_reject_alloc_fail++;
|
2015-09-09 22:35:19 +00:00
|
|
|
goto put_dstmem;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
2023-06-20 19:46:44 +00:00
|
|
|
buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
|
2023-06-12 09:38:15 +00:00
|
|
|
memcpy(buf, dst, dlen);
|
2023-06-20 19:46:44 +00:00
|
|
|
zpool_unmap_handle(zpool, handle);
|
2020-12-15 03:14:18 +00:00
|
|
|
mutex_unlock(acomp_ctx->mutex);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
|
|
|
/* populate entry */
|
2023-06-12 09:38:15 +00:00
|
|
|
entry->swpentry = swp_entry(type, offset);
|
2013-07-10 23:05:03 +00:00
|
|
|
entry->handle = handle;
|
|
|
|
entry->length = dlen;
|
|
|
|
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
insert_entry:
|
2022-05-19 21:08:53 +00:00
|
|
|
entry->objcg = objcg;
|
|
|
|
if (objcg) {
|
|
|
|
obj_cgroup_charge_zswap(objcg, entry->length);
|
|
|
|
/* Account before objcg ref is moved to tree */
|
|
|
|
count_objcg_event(objcg, ZSWPOUT);
|
|
|
|
}
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/* map */
|
|
|
|
spin_lock(&tree->lock);
|
2023-09-22 17:22:11 +00:00
|
|
|
/*
|
|
|
|
* A duplicate entry should have been removed at the beginning of this
|
|
|
|
* function. Since the swap entry should be pinned, if a duplicate is
|
|
|
|
* found again here it means that something went wrong in the swap
|
|
|
|
* cache.
|
|
|
|
*/
|
2023-07-17 16:02:27 +00:00
|
|
|
while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
|
2023-09-22 17:22:11 +00:00
|
|
|
WARN_ON(1);
|
2023-07-17 16:02:27 +00:00
|
|
|
zswap_duplicate_entry++;
|
2023-07-27 16:22:23 +00:00
|
|
|
zswap_invalidate_entry(tree, dupentry);
|
2023-07-17 16:02:27 +00:00
|
|
|
}
|
2023-06-12 09:38:13 +00:00
|
|
|
if (entry->length) {
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
spin_lock(&entry->pool->lru_lock);
|
|
|
|
list_add(&entry->lru, &entry->pool->lru);
|
|
|
|
spin_unlock(&entry->pool->lru_lock);
|
|
|
|
}
|
2013-07-10 23:05:03 +00:00
|
|
|
spin_unlock(&tree->lock);
|
|
|
|
|
|
|
|
/* update stats */
|
|
|
|
atomic_inc(&zswap_stored_pages);
|
2015-09-09 22:35:19 +00:00
|
|
|
zswap_update_total_size();
|
2022-05-19 21:08:53 +00:00
|
|
|
count_vm_event(ZSWPOUT);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2023-07-17 16:02:27 +00:00
|
|
|
return true;
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
put_dstmem:
|
2020-12-15 03:14:18 +00:00
|
|
|
mutex_unlock(acomp_ctx->mutex);
|
2015-09-09 22:35:19 +00:00
|
|
|
zswap_pool_put(entry->pool);
|
|
|
|
freepage:
|
2013-07-10 23:05:03 +00:00
|
|
|
zswap_entry_cache_free(entry);
|
|
|
|
reject:
|
2022-05-19 21:08:53 +00:00
|
|
|
if (objcg)
|
|
|
|
obj_cgroup_put(objcg);
|
2023-07-17 16:02:27 +00:00
|
|
|
return false;
|
2022-05-19 21:08:53 +00:00
|
|
|
|
|
|
|
shrink:
|
|
|
|
pool = zswap_pool_last_get();
|
2023-10-06 16:00:24 +00:00
|
|
|
if (pool && !queue_work(shrink_wq, &pool->shrink_work))
|
|
|
|
zswap_pool_put(pool);
|
2022-05-19 21:08:53 +00:00
|
|
|
goto reject;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
|
|
|
|
2023-07-15 04:23:43 +00:00
|
|
|
bool zswap_load(struct folio *folio)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
2023-08-21 16:08:48 +00:00
|
|
|
swp_entry_t swp = folio->swap;
|
2023-07-17 16:02:27 +00:00
|
|
|
int type = swp_type(swp);
|
|
|
|
pgoff_t offset = swp_offset(swp);
|
2023-07-15 04:23:43 +00:00
|
|
|
struct page *page = &folio->page;
|
2013-07-10 23:05:03 +00:00
|
|
|
struct zswap_tree *tree = zswap_trees[type];
|
|
|
|
struct zswap_entry *entry;
|
2020-12-15 03:14:18 +00:00
|
|
|
struct scatterlist input, output;
|
|
|
|
struct crypto_acomp_ctx *acomp_ctx;
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
u8 *src, *dst, *tmp;
|
2023-06-20 19:46:44 +00:00
|
|
|
struct zpool *zpool;
|
2013-07-10 23:05:03 +00:00
|
|
|
unsigned int dlen;
|
2023-07-17 16:02:27 +00:00
|
|
|
bool ret;
|
|
|
|
|
2023-07-15 04:23:43 +00:00
|
|
|
VM_WARN_ON_ONCE(!folio_test_locked(folio));
|
2013-07-10 23:05:03 +00:00
|
|
|
|
|
|
|
/* find */
|
|
|
|
spin_lock(&tree->lock);
|
2013-11-12 23:08:27 +00:00
|
|
|
entry = zswap_entry_find_get(&tree->rbroot, offset);
|
2013-07-10 23:05:03 +00:00
|
|
|
if (!entry) {
|
|
|
|
spin_unlock(&tree->lock);
|
2023-07-17 16:02:27 +00:00
|
|
|
return false;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
|
|
|
spin_unlock(&tree->lock);
|
|
|
|
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
if (!entry->length) {
|
|
|
|
dst = kmap_atomic(page);
|
|
|
|
zswap_fill_page(dst, entry->value);
|
|
|
|
kunmap_atomic(dst);
|
2023-07-17 16:02:27 +00:00
|
|
|
ret = true;
|
2022-05-19 21:08:53 +00:00
|
|
|
goto stats;
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
}
|
|
|
|
|
2023-06-20 19:46:44 +00:00
|
|
|
zpool = zswap_find_zpool(entry);
|
|
|
|
if (!zpool_can_sleep_mapped(zpool)) {
|
2022-11-22 03:30:22 +00:00
|
|
|
tmp = kmalloc(entry->length, GFP_KERNEL);
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
if (!tmp) {
|
2023-07-17 16:02:27 +00:00
|
|
|
ret = false;
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
goto freeentry;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
/* decompress */
|
|
|
|
dlen = PAGE_SIZE;
|
2023-06-20 19:46:44 +00:00
|
|
|
src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
|
2020-12-15 03:14:18 +00:00
|
|
|
|
2023-06-20 19:46:44 +00:00
|
|
|
if (!zpool_can_sleep_mapped(zpool)) {
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
memcpy(tmp, src, entry->length);
|
|
|
|
src = tmp;
|
2023-06-20 19:46:44 +00:00
|
|
|
zpool_unmap_handle(zpool, entry->handle);
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
}
|
|
|
|
|
2020-12-15 03:14:18 +00:00
|
|
|
acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
|
|
|
|
mutex_lock(acomp_ctx->mutex);
|
|
|
|
sg_init_one(&input, src, entry->length);
|
|
|
|
sg_init_table(&output, 1);
|
|
|
|
sg_set_page(&output, page, PAGE_SIZE, 0);
|
|
|
|
acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
|
2023-07-17 16:02:27 +00:00
|
|
|
if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait))
|
|
|
|
WARN_ON(1);
|
2020-12-15 03:14:18 +00:00
|
|
|
mutex_unlock(acomp_ctx->mutex);
|
|
|
|
|
2023-06-20 19:46:44 +00:00
|
|
|
if (zpool_can_sleep_mapped(zpool))
|
|
|
|
zpool_unmap_handle(zpool, entry->handle);
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
else
|
|
|
|
kfree(tmp);
|
|
|
|
|
2023-07-17 16:02:27 +00:00
|
|
|
ret = true;
|
2022-05-19 21:08:53 +00:00
|
|
|
stats:
|
|
|
|
count_vm_event(ZSWPIN);
|
2022-05-19 21:08:53 +00:00
|
|
|
if (entry->objcg)
|
|
|
|
count_objcg_event(entry->objcg, ZSWPIN);
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
freeentry:
|
2013-07-10 23:05:03 +00:00
|
|
|
spin_lock(&tree->lock);
|
2023-07-17 16:02:27 +00:00
|
|
|
if (ret && zswap_exclusive_loads_enabled) {
|
2023-06-07 19:51:43 +00:00
|
|
|
zswap_invalidate_entry(tree, entry);
|
2023-07-15 04:23:43 +00:00
|
|
|
folio_mark_dirty(folio);
|
2023-06-12 09:38:13 +00:00
|
|
|
} else if (entry->length) {
|
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-12 09:38:09 +00:00
|
|
|
spin_lock(&entry->pool->lru_lock);
|
|
|
|
list_move(&entry->lru, &entry->pool->lru);
|
|
|
|
spin_unlock(&entry->pool->lru_lock);
|
2023-06-07 19:51:43 +00:00
|
|
|
}
|
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before
returning from zswap_frontswap_load(), after dropping the local reference.
However, the tree lock is dropped during decompression after the local
reference is acquired, so the entry could be invalidated before we drop
the local ref. If this happens, the entry is freed once we drop the local
ref, and zswap_invalidate_entry() tries to invalidate an already freed
entry.
Fix this by:
(a) Making sure zswap_invalidate_entry() is always called with a local
ref held, to avoid being called on a freed entry.
(b) Making sure zswap_invalidate_entry() only drops the ref if the entry
was actually on the rbtree. Otherwise, another invalidation could
have already happened, and the initial ref is already dropped.
With these changes, there is no need to check that there is no need to
make sure the entry still exists in the tree in zswap_reclaim_entry()
before invalidating it, as zswap_reclaim_entry() will make this check
internally.
Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com
Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-21 09:30:09 +00:00
|
|
|
zswap_entry_put(tree, entry);
|
2013-07-10 23:05:03 +00:00
|
|
|
spin_unlock(&tree->lock);
|
|
|
|
|
mm/zswap: add the flag can_sleep_mapped
Patch series "Fix the compatibility of zsmalloc and zswap".
Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.
The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context. So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.
This patch (of 2):
Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false. Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.
[natechancellor@gmail.com: add return value in zswap_frontswap_load]
Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 01:18:17 +00:00
|
|
|
return ret;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
|
|
|
|
2023-07-17 16:02:27 +00:00
|
|
|
void zswap_invalidate(int type, pgoff_t offset)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
|
|
|
struct zswap_tree *tree = zswap_trees[type];
|
|
|
|
struct zswap_entry *entry;
|
|
|
|
|
|
|
|
/* find */
|
|
|
|
spin_lock(&tree->lock);
|
|
|
|
entry = zswap_rb_search(&tree->rbroot, offset);
|
|
|
|
if (!entry) {
|
|
|
|
/* entry was written back */
|
|
|
|
spin_unlock(&tree->lock);
|
|
|
|
return;
|
|
|
|
}
|
2023-06-07 19:51:43 +00:00
|
|
|
zswap_invalidate_entry(tree, entry);
|
2013-07-10 23:05:03 +00:00
|
|
|
spin_unlock(&tree->lock);
|
|
|
|
}
|
|
|
|
|
2023-07-17 16:02:27 +00:00
|
|
|
void zswap_swapon(int type)
|
|
|
|
{
|
|
|
|
struct zswap_tree *tree;
|
|
|
|
|
|
|
|
tree = kzalloc(sizeof(*tree), GFP_KERNEL);
|
|
|
|
if (!tree) {
|
|
|
|
pr_err("alloc failed, zswap disabled for swap type %d\n", type);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
tree->rbroot = RB_ROOT;
|
|
|
|
spin_lock_init(&tree->lock);
|
|
|
|
zswap_trees[type] = tree;
|
|
|
|
}
|
|
|
|
|
|
|
|
void zswap_swapoff(int type)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
|
|
|
struct zswap_tree *tree = zswap_trees[type];
|
2013-09-11 21:25:33 +00:00
|
|
|
struct zswap_entry *entry, *n;
|
2013-07-10 23:05:03 +00:00
|
|
|
|
|
|
|
if (!tree)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* walk the tree and free everything */
|
|
|
|
spin_lock(&tree->lock);
|
2013-11-12 23:08:27 +00:00
|
|
|
rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
|
2014-04-07 22:38:27 +00:00
|
|
|
zswap_free_entry(entry);
|
2013-07-10 23:05:03 +00:00
|
|
|
tree->rbroot = RB_ROOT;
|
|
|
|
spin_unlock(&tree->lock);
|
2013-10-16 20:46:54 +00:00
|
|
|
kfree(tree);
|
|
|
|
zswap_trees[type] = NULL;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*********************************
|
|
|
|
* debugfs functions
|
|
|
|
**********************************/
|
|
|
|
#ifdef CONFIG_DEBUG_FS
|
|
|
|
#include <linux/debugfs.h>
|
|
|
|
|
|
|
|
static struct dentry *zswap_debugfs_root;
|
|
|
|
|
2023-04-03 12:13:18 +00:00
|
|
|
static int zswap_debugfs_init(void)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
|
|
|
if (!debugfs_initialized())
|
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
zswap_debugfs_root = debugfs_create_dir("zswap", NULL);
|
|
|
|
|
2018-06-14 22:27:58 +00:00
|
|
|
debugfs_create_u64("pool_limit_hit", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_pool_limit_hit);
|
|
|
|
debugfs_create_u64("reject_reclaim_fail", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_reject_reclaim_fail);
|
|
|
|
debugfs_create_u64("reject_alloc_fail", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_reject_alloc_fail);
|
|
|
|
debugfs_create_u64("reject_kmemcache_fail", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_reject_kmemcache_fail);
|
2023-10-24 23:45:09 +00:00
|
|
|
debugfs_create_u64("reject_compress_fail", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_reject_compress_fail);
|
2018-06-14 22:27:58 +00:00
|
|
|
debugfs_create_u64("reject_compress_poor", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_reject_compress_poor);
|
|
|
|
debugfs_create_u64("written_back_pages", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_written_back_pages);
|
|
|
|
debugfs_create_u64("duplicate_entry", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_duplicate_entry);
|
|
|
|
debugfs_create_u64("pool_total_size", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_pool_total_size);
|
|
|
|
debugfs_create_atomic_t("stored_pages", 0444,
|
|
|
|
zswap_debugfs_root, &zswap_stored_pages);
|
zswap: same-filled pages handling
Zswap is a cache which compresses the pages that are being swapped out
and stores them into a dynamically allocated RAM-based memory pool.
Experiments have shown that around 10-20% of pages stored in zswap are
same-filled pages (i.e. contents of the page are all same), but these
pages are handled as normal pages by compressing and allocating memory
in the pool.
This patch adds a check in zswap_frontswap_store() to identify
same-filled page before compression of the page. If the page is a
same-filled page, set zswap_entry.length to zero, save the same-filled
value and skip the compression of the page and alloction of memory in
zpool. In zswap_frontswap_load(), check if value of zswap_entry.length
is zero corresponding to the page to be loaded. If zswap_entry.length
is zero, fill the page with same-filled value. This saves the
decompression time during load.
On a ARM Quad Core 32-bit device with 1.5GB RAM by launching and
relaunching different applications, out of ~64000 pages stored in zswap,
~11000 pages were same-value filled pages (including zero-filled pages)
and ~9000 pages were zero-filled pages.
An average of 17% of pages(including zero-filled pages) in zswap are
same-value filled pages and 14% pages are zero-filled pages. An average
of 3% of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 26.5ms 18ms 32%
(of same value pages)
*Zswap Load Time
(of same value pages) 25.5ms 13ms 49%
-----------------------------------------------------------------
On Ubuntu PC with 2GB RAM, while executing kernel build and other test
scripts and running multimedia applications, out of 360000 pages stored
in zswap 78000(~22%) of pages were found to be same-value filled pages
(including zero-filled pages) and 64000(~17%) are zero-filled pages. So
an average of %5 of pages are same-filled non-zero pages.
The below table shows the execution time profiling with the patch.
Baseline With patch % Improvement
-----------------------------------------------------------------
*Zswap Store Time 91ms 74ms 19%
(of same value pages)
*Zswap Load Time 50ms 7.5ms 85%
(of same value pages)
-----------------------------------------------------------------
*The execution times may vary with test device used.
Dan said:
: I did test this patch out this week, and I added some instrumentation to
: check the performance impact, and tested with a small program to try to
: check the best and worst cases.
:
: When doing a lot of swap where all (or almost all) pages are same-value, I
: found this patch does save both time and space, significantly. The exact
: improvement in time and space depends on which compressor is being used,
: but roughly agrees with the numbers you listed.
:
: In the worst case situation, where all (or almost all) pages have the
: same-value *except* the final long (meaning, zswap will check each long on
: the entire page but then still have to pass the page to the compressor),
: the same-value check is around 10-15% of the total time spent in
: zswap_frontswap_store(). That's a not-insignificant amount of time, but
: it's not huge. Considering that most systems will probably be swapping
: pages that aren't similar to the worst case (although I don't have any
: data to know that), I'd say the improvement is worth the possible
: worst-case performance impact.
[srividya.dr@samsung.com: add memset_l instead of for loop]
Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
Cc: SHARAN ALLUR <sharan.allur@samsung.com>
Cc: RAJIB BASU <rajib.basu@samsung.com>
Cc: JUHUN KIM <juhunkim@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 00:15:59 +00:00
|
|
|
debugfs_create_atomic_t("same_filled_pages", 0444,
|
2018-06-14 22:27:58 +00:00
|
|
|
zswap_debugfs_root, &zswap_same_filled_pages);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#else
|
2023-04-03 12:13:18 +00:00
|
|
|
static int zswap_debugfs_init(void)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
/*********************************
|
|
|
|
* module init and exit
|
|
|
|
**********************************/
|
2023-04-03 12:13:18 +00:00
|
|
|
static int zswap_setup(void)
|
2013-07-10 23:05:03 +00:00
|
|
|
{
|
2015-09-09 22:35:19 +00:00
|
|
|
struct zswap_pool *pool;
|
2016-11-26 23:13:39 +00:00
|
|
|
int ret;
|
2014-04-07 22:38:27 +00:00
|
|
|
|
2023-04-03 12:13:16 +00:00
|
|
|
zswap_entry_cache = KMEM_CACHE(zswap_entry, 0);
|
|
|
|
if (!zswap_entry_cache) {
|
2013-07-10 23:05:03 +00:00
|
|
|
pr_err("entry cache creation failed\n");
|
2015-09-09 22:35:19 +00:00
|
|
|
goto cache_fail;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2016-11-26 23:13:39 +00:00
|
|
|
ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
|
|
|
|
zswap_dstmem_prepare, zswap_dstmem_dead);
|
|
|
|
if (ret) {
|
2015-09-09 22:35:19 +00:00
|
|
|
pr_err("dstmem alloc failed\n");
|
|
|
|
goto dstmem_fail;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2016-11-26 23:13:40 +00:00
|
|
|
ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
|
|
|
|
"mm/zswap_pool:prepare",
|
|
|
|
zswap_cpu_comp_prepare,
|
|
|
|
zswap_cpu_comp_dead);
|
|
|
|
if (ret)
|
|
|
|
goto hp_fail;
|
|
|
|
|
2015-09-09 22:35:19 +00:00
|
|
|
pool = __zswap_pool_create_fallback();
|
2017-02-27 22:26:47 +00:00
|
|
|
if (pool) {
|
|
|
|
pr_info("loaded using pool %s/%s\n", pool->tfm_name,
|
2023-06-20 19:46:44 +00:00
|
|
|
zpool_get_type(pool->zpools[0]));
|
2017-02-27 22:26:47 +00:00
|
|
|
list_add(&pool->list, &zswap_pools);
|
|
|
|
zswap_has_pool = true;
|
|
|
|
} else {
|
2015-09-09 22:35:19 +00:00
|
|
|
pr_err("pool creation failed\n");
|
2017-02-27 22:26:47 +00:00
|
|
|
zswap_enabled = false;
|
2013-07-10 23:05:03 +00:00
|
|
|
}
|
2014-04-07 22:38:27 +00:00
|
|
|
|
2020-01-31 06:15:04 +00:00
|
|
|
shrink_wq = create_workqueue("zswap-shrink");
|
|
|
|
if (!shrink_wq)
|
|
|
|
goto fallback_fail;
|
|
|
|
|
2013-07-10 23:05:03 +00:00
|
|
|
if (zswap_debugfs_init())
|
|
|
|
pr_warn("debugfs initialization failed\n");
|
2023-04-03 12:13:17 +00:00
|
|
|
zswap_init_state = ZSWAP_INIT_SUCCEED;
|
2013-07-10 23:05:03 +00:00
|
|
|
return 0;
|
2015-09-09 22:35:19 +00:00
|
|
|
|
2020-01-31 06:15:04 +00:00
|
|
|
fallback_fail:
|
2020-01-31 06:15:07 +00:00
|
|
|
if (pool)
|
|
|
|
zswap_pool_destroy(pool);
|
2016-11-26 23:13:40 +00:00
|
|
|
hp_fail:
|
2016-11-26 23:13:39 +00:00
|
|
|
cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
|
2015-09-09 22:35:19 +00:00
|
|
|
dstmem_fail:
|
2023-04-03 12:13:16 +00:00
|
|
|
kmem_cache_destroy(zswap_entry_cache);
|
2015-09-09 22:35:19 +00:00
|
|
|
cache_fail:
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
/* if built-in, we aren't unloaded on failure; don't allow use */
|
2023-04-03 12:13:17 +00:00
|
|
|
zswap_init_state = ZSWAP_INIT_FAILED;
|
zswap: disable changing params if init fails
Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false. Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.
Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs. Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist. This prevents that by
immediately exiting any of the param callbacks if initialization failed.
This was reported here:
https://marc.info/?l=linux-mm&m=147004228125528&w=4
And fixes this WARNING:
[ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
The warning is just noise, and not serious. However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache. The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.
If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).
Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 21:13:09 +00:00
|
|
|
zswap_enabled = false;
|
2013-07-10 23:05:03 +00:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2023-04-03 12:13:18 +00:00
|
|
|
|
|
|
|
static int __init zswap_init(void)
|
|
|
|
{
|
|
|
|
if (!zswap_enabled)
|
|
|
|
return 0;
|
|
|
|
return zswap_setup();
|
|
|
|
}
|
2013-07-10 23:05:03 +00:00
|
|
|
/* must be late so crypto has time to come up */
|
2023-04-03 12:13:18 +00:00
|
|
|
late_initcall(zswap_init);
|
2013-07-10 23:05:03 +00:00
|
|
|
|
2014-11-13 03:08:46 +00:00
|
|
|
MODULE_AUTHOR("Seth Jennings <sjennings@variantweb.net>");
|
2013-07-10 23:05:03 +00:00
|
|
|
MODULE_DESCRIPTION("Compressed cache for swap pages");
|