Writing to file-backed mappings which require folio dirty tracking using
GUP is a fundamentally broken operation, as kernel write access to GUP
mappings do not adhere to the semantics expected by a file system.
A GUP caller uses the direct mapping to access the folio, which does not
cause write notify to trigger, nor does it enforce that the caller marks
the folio dirty.
The problem arises when, after an initial write to the folio, writeback
results in the folio being cleaned and then the caller, via the GUP
interface, writes to the folio again.
As a result of the use of this secondary, direct, mapping to the folio no
write notify will occur, and if the caller does mark the folio dirty, this
will be done so unexpectedly.
For example, consider the following scenario:-
1. A folio is written to via GUP which write-faults the memory, notifying
the file system and dirtying the folio.
2. Later, writeback is triggered, resulting in the folio being cleaned and
the PTE being marked read-only.
3. The GUP caller writes to the folio, as it is mapped read/write via the
direct mapping.
4. The GUP caller, now done with the page, unpins it and sets it dirty
(though it does not have to).
This results in both data being written to a folio without writenotify,
and the folio being dirtied unexpectedly (if the caller decides to do so).
This issue was first reported by Jan Kara [1] in 2018, where the problem
resulted in file system crashes.
This is only relevant when the mappings are file-backed and the underlying
file system requires folio dirty tracking. File systems which do not,
such as shmem or hugetlb, are not at risk and therefore can be written to
without issue.
Unfortunately this limitation of GUP has been present for some time and
requires future rework of the GUP API in order to provide correct write
access to such mappings.
However, for the time being we introduce this check to prevent the most
egregious case of this occurring, use of the FOLL_LONGTERM pin.
These mappings are considerably more likely to be written to after folios
are cleaned and thus simply must not be permitted to do so.
This patch changes only the slow-path GUP functions, a following patch
adapts the GUP-fast path along similar lines.
[1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz/
Link: https://lkml.kernel.org/r/7282506742d2390c125949c2f9894722750bb68a.1683235180.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Mika Penttilä <mpenttil@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Kirill A . Shutemov <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/gup: disallow GUP writing to file-backed mappings by
default", v9.
Writing to file-backed mappings which require folio dirty tracking using
GUP is a fundamentally broken operation, as kernel write access to GUP
mappings do not adhere to the semantics expected by a file system.
A GUP caller uses the direct mapping to access the folio, which does not
cause write notify to trigger, nor does it enforce that the caller marks
the folio dirty.
The problem arises when, after an initial write to the folio, writeback
results in the folio being cleaned and then the caller, via the GUP
interface, writes to the folio again.
As a result of the use of this secondary, direct, mapping to the folio no
write notify will occur, and if the caller does mark the folio dirty, this
will be done so unexpectedly.
For example, consider the following scenario:-
1. A folio is written to via GUP which write-faults the memory, notifying
the file system and dirtying the folio.
2. Later, writeback is triggered, resulting in the folio being cleaned and
the PTE being marked read-only.
3. The GUP caller writes to the folio, as it is mapped read/write via the
direct mapping.
4. The GUP caller, now done with the page, unpins it and sets it dirty
(though it does not have to).
This change updates both the PUP FOLL_LONGTERM slow and fast APIs. As
pin_user_pages_fast_only() does not exist, we can rely on a slightly
imperfect whitelisting in the PUP-fast case and fall back to the slow case
should this fail.
This patch (of 3):
vma_wants_writenotify() is specifically intended for setting PTE page
table flags, accounting for existing page table flag state and whether the
underlying filesystem performs dirty tracking for a file-backed mapping.
Everything is predicated firstly on whether the mapping is shared
writable, as this is the only instance where dirty tracking is pertinent -
MAP_PRIVATE mappings will always be CoW'd and unshared, and read-only
file-backed shared mappings cannot be written to, even with FOLL_FORCE.
All other checks are in line with existing logic, though now separated
into checks eplicitily for dirty tracking and those for determining how to
set page table flags.
We make this change so we can perform checks in the GUP logic to determine
which mappings might be problematic when written to.
Link: https://lkml.kernel.org/r/cover.1683235180.git.lstoakes@gmail.com
Link: https://lkml.kernel.org/r/0f218370bd49b4e6bbfbb499f7c7b92c26ba1ceb.1683235180.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Mika Penttilä <mpenttil@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Kirill A . Shutemov <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use helper macro FLUSH_TIME to indicate the flush time to improve the
readability a bit. No functional change intended.
Link: https://lkml.kernel.org/r/20230603072116.1101690-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove some unneeded header files. No functional change intended.
Link: https://lkml.kernel.org/r/20230603112558.213694-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The loser of a race to service a pte for a device private entry in the
swap path previously unlocked the ptl, but failed to unmap the pte. This
only affects highmem systems since unmapping a pte is a noop on
non-highmem systems.
Link: https://lkml.kernel.org/r/20230602092949.545577-5-ryan.roberts@arm.com
Fixes: 16ce101db8 ("mm/memory.c: fix race when faulting a device private page")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With the fix in place to atomically test and clear young on ptes and pmds,
simplify the code to handle the clearing for both the primary mmu and the
mmu notifier with a single API call.
Link: https://lkml.kernel.org/r/20230602092949.545577-4-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It is racy to non-atomically read a pte, then clear the young bit, then
write it back as this could discard dirty information. Further, it is bad
practice to directly set a pte entry within a table. Instead clearing
young must go through the arch-provided helper,
ptep_test_and_clear_young() to ensure it is modified atomically and to
give the arch code visibility and allow it to check (and potentially
modify) the operation.
Link: https://lkml.kernel.org/r/20230602092949.545577-3-ryan.roberts@arm.com
Fixes: 3f49584b26 ("mm/damon: implement primitives for the virtual memory address spaces").
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fixes for pte encapsulation bypasses", v3.
A series to improve the encapsulation of pte entries by disallowing
non-arch code from directly dereferencing pte_t pointers.
This patch (of 4):
It is bad practice to directly set pte entries within a pte table.
Instead all modifications must go through arch-provided helpers such as
set_pte_at() to give the arch code visibility and allow it to check (and
potentially modify) the operation.
Link: https://lkml.kernel.org/r/20230602092949.545577-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230602092949.545577-2-ryan.roberts@arm.com
Fixes: 3e9a9e256b ("mm: add a vmap_pfn function")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A customer provided evidence indicating that a process
was stalled in direct reclaim:
- The process was trapped in throttle_direct_reclaim().
The function wait_event_killable() was called to wait condition
allow_direct_reclaim(pgdat) for current node to be true.
The allow_direct_reclaim(pgdat) examined the number of free pages
on the node by zone_page_state() which just returns value in
zone->vm_stat[NR_FREE_PAGES].
- On node #1, zone->vm_stat[NR_FREE_PAGES] was 0.
However, the freelist on this node was not empty.
- This inconsistent of vmstat value was caused by percpu vmstat on
nohz_full cpus. Every increment/decrement of vmstat is performed
on percpu vmstat counter at first, then pooled diffs are cumulated
to the zone's vmstat counter in timely manner. However, on nohz_full
cpus (in case of this customer's system, 48 of 52 cpus) these pooled
diffs were not cumulated once the cpu had no event on it so that
the cpu started sleeping infinitely.
I checked percpu vmstat and found there were total 69 counts not
cumulated to the zone's vmstat counter yet.
- In this situation, kswapd did not help the trapped process.
In pgdat_balanced(), zone_wakermark_ok_safe() examined the number
of free pages on the node by zone_page_state_snapshot() which
checks pending counts on percpu vmstat.
Therefore kswapd could know there were 69 free pages correctly.
Since zone->_watermark = {8, 20, 32}, kswapd did not work because
69 was greater than 32 as high watermark.
Change allow_direct_reclaim to use zone_page_state_snapshot, which
allows a more precise version of the vmstat counters to be used.
allow_direct_reclaim will only be called from try_to_free_pages,
which is not a hot path.
Testing: Due to difficulties accessing the system, it has not been
possible for the reproducer to test the patch (however its
clear from available data and analysis that it should fix it).
Link: https://lkml.kernel.org/r/20230530145335.677325196@redhat.com
Reviewed-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Both callers of fuse_perform_write need to updated ki_pos, move it into
common code.
Link: https://lkml.kernel.org/r/20230601145904.1385409-11-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a helper dealing with handling the syncing of a buffered write
fallback for direct I/O.
Link: https://lkml.kernel.org/r/20230601145904.1385409-10-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use the common helpers for direct I/O page invalidation instead of open
coding the logic. This leads to a slight reordering of checks in
__iomap_dio_rw to keep the logic straight.
Link: https://lkml.kernel.org/r/20230601145904.1385409-9-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All callers of iomap_file_buffered_write need to updated ki_pos, move it
into common code.
Link: https://lkml.kernel.org/r/20230601145904.1385409-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a helper to invalidate page cache after a dio write.
Link: https://lkml.kernel.org/r/20230601145904.1385409-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Factor out a helper that calls filemap_write_and_wait_range and
invalidate_inode_pages2_range for the range covered by a write kiocb or
returns -EAGAIN if the kiocb is marked as nowait and there would be pages
to write or invalidate.
Link: https://lkml.kernel.org/r/20230601145904.1385409-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Factor out a helper that does filemap_write_and_wait_range for the range
covered by a read kiocb, or returns -EAGAIN if the kiocb is marked as
nowait and there would be pages to write.
Link: https://lkml.kernel.org/r/20230601145904.1385409-5-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All callers of generic_perform_write need to updated ki_pos, move it into
common code.
Link: https://lkml.kernel.org/r/20230601145904.1385409-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move the ki_pos update down a bit to prepare for a better common helper
that invalidates pages based of an iocb.
Link: https://lkml.kernel.org/r/20230601145904.1385409-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "cleanup the filemap / direct I/O interaction", v4.
This series cleans up some of the generic write helper calling conventions
and the page cache writeback / invalidation for direct I/O. This is a
spinoff from the no-bufferhead kernel project, for which we'll want to an
use iomap based buffered write path in the block layer.
This patch (of 12):
The last user of current->backing_dev_info disappeared in commit
b9b1335e64 ("remove bdi_congested() and wb_congested() and related
functions"). Remove the field and all assignments to it.
Link: https://lkml.kernel.org/r/20230601145904.1385409-1-hch@lst.de
Link: https://lkml.kernel.org/r/20230601145904.1385409-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This update addresses an issue with the zswap reclaim mechanism, which
hinders the efficient offloading of cold pages to disk, thereby
compromising the preservation of the LRU order and consequently
diminishing, if not inverting, its performance benefits.
The functioning of the zswap shrink worker was found to be inadequate, as
shown by basic benchmark test. For the test, a kernel build was utilized
as a reference, with its memory confined to 1G via a cgroup and a 5G swap
file provided. The results are presented below, these are averages of
three runs without the use of zswap:
real 46m26s
user 35m4s
sys 7m37s
With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
system), the results changed to:
real 56m4s
user 35m13s
sys 8m43s
written_back_pages: 18
reject_reclaim_fail: 0
pool_limit_hit:1478
Besides the evident regression, one thing to notice from this data is the
extremely low number of written_back_pages and pool_limit_hit.
The pool_limit_hit counter, which is increased in zswap_frontswap_store
when zswap is completely full, doesn't account for a particular scenario:
once zswap hits his limit, zswap_pool_reached_full is set to true; with
this flag on, zswap_frontswap_store rejects pages if zswap is still above
the acceptance threshold. Once we include the rejections due to
zswap_pool_reached_full && !zswap_can_accept(), the number goes from 1478
to a significant 21578266.
Zswap is stuck in an undesirable state where it rejects pages because it's
above the acceptance threshold, yet fails to attempt memory reclaimation.
This happens because the shrink work is only queued when
zswap_frontswap_store detects that it's full and the work itself only
reclaims one page per run.
This state results in hot pages getting written directly to disk, while
cold ones remain memory, waiting only to be invalidated. The LRU order is
completely broken and zswap ends up being just an overhead without
providing any benefits.
This commit applies 2 changes: a) the shrink worker is set to reclaim
pages until the acceptance threshold is met and b) the task is also
enqueued when zswap is not full but still above the threshold.
Testing this suggested update showed much better numbers:
real 36m37s
user 35m8s
sys 9m32s
written_back_pages: 10459423
reject_reclaim_fail: 12896
pool_limit_hit: 75653
Link: https://lkml.kernel.org/r/20230526183227.793977-1-cerasuolodomenico@gmail.com
Fixes: 45190f01dd ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pageblock_order only needs to be set once, there is no need to initialize
it in every zone/node.
Link: https://lkml.kernel.org/r/20230601063536.26882-1-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In __khugepaged_enter(), if "mm->flags" with MMF_VM_HUGEPAGE bit is set,
the "mm_slot" will be released and return, so we can call mm_slot_alloc()
after test_and_set_bit().
Link: https://lkml.kernel.org/r/20230531095817.11012-1-xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foudation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 73444bc4d8 ("mm, page_alloc: do not wake kswapd with zone lock
held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
using ZONE_BOOSTED_WATERMARK flag.
Only allocation contexts that include ALLOC_KSWAPD (which corresponds to
__GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to
remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a
risk of deadlock. But since zone->flags is a shared variable, a thread
doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag
being set immediately after another thread doing __GFP_KSWAPD_RECLAIM
allocation request set this flag, causing possibility of deadlock.
Link: https://lkml.kernel.org/r/c3c3dacf-dd3b-77c9-f96a-d0982b4b2a4f@I-love.SAKURA.ne.jp
Fixes: 73444bc4d8 ("mm, page_alloc: do not wake kswapd with zone lock held")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
free_area_init_memoryless_node() is just a wrapper of
free_area_init_node(), remove it to clean up.
Link: https://lkml.kernel.org/r/20230528045720.4835-1-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
free_transhuge_page() acquires split queue lock then check whether the THP
was added to deferred list or not. It brings high deferred queue lock
contention.
It's safe to check whether the THP is in deferred list or not without
holding the deferred queue lock in free_transhuge_page() because when code
hit free_transhuge_page(), there is no one tries to add the folio to
_deferred_list.
Running page_fault1 of will-it-scale + order 2 folio for anonymous
mapping with 96 processes on an Ice Lake 48C/96T test box, we could
see the 61% split_queue_lock contention:
- 63.02% 0.01% page_fault1_pro [kernel.kallsyms] [k] free_transhuge_page
- 63.01% free_transhuge_page
+ 62.91% _raw_spin_lock_irqsave
With this patch applied, the split_queue_lock contention is less
than 1%.
Link: https://lkml.kernel.org/r/20230429082759.1600796-2-fengwei.yin@intel.com
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The general rule to use a swap entry is as follows.
When we get a swap entry, if there aren't some other ways to prevent
swapoff, such as the folio in swap cache is locked, page table lock is
held, etc., the swap entry may become invalid because of swapoff.
Then, we need to enclose all swap related functions with
get_swap_device() and put_swap_device(), unless the swap functions
call get/put_swap_device() by themselves.
Add the rule as comments of get_swap_device().
Link: https://lkml.kernel.org/r/20230529061355.125791-6-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
__swap_duplicate() is called by
- swap_shmem_alloc(): the folio in swap cache is locked.
- copy_nonpresent_pte() -> swap_duplicate() and try_to_unmap_one() ->
swap_duplicate(): the page table lock is held.
- __read_swap_cache_async() -> swapcache_prepare(): enclosed with
get/put_swap_device() in __read_swap_cache_async() already.
So, it's safe to remove get/put_swap_device() in __swap_duplicate().
Link: https://lkml.kernel.org/r/20230529061355.125791-5-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
__swp_swapcount() just encloses the calling to swap_swapcount() with
get/put_swap_device(). It is called in __read_swap_cache_async() only,
which encloses the calling with get/put_swap_device() already. So,
__read_swap_cache_async() can call swap_swapcount() directly.
Link: https://lkml.kernel.org/r/20230529061355.125791-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This makes the function a little easier to be understood because we don't
need to consider swapoff. And this makes it possible to remove
get/put_swap_device() calling in some functions called by
__read_swap_cache_async().
Link: https://lkml.kernel.org/r/20230529061355.125791-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "swap: cleanup get/put_swap_device() usage", v3.
The general rule to use a swap entry is as follows.
When we get a swap entry, if there aren't some other ways to prevent
swapoff, such as the folio in swap cache is locked, page table lock is
held, etc., the swap entry may become invalid because of swapoff. Then,
we need to enclose all swap related functions with get_swap_device() and
put_swap_device(), unless the swap functions call get/put_swap_device() by
themselves.
Based on the above rule, all get/put_swap_device() usage are checked and
cleaned up if necessary.
This patch (of 5):
get/put_swap_device() are added to __swap_count() in commit
eb085574a7 ("mm, swap: fix race between swapoff and some swap
operations"). Later, in commit 2799e77529 ("swap: fix
do_swap_page() race with swapoff"), get/put_swap_device() are added to
do_swap_page(). And they enclose the only call site of
__swap_count(). So, it's safe to remove get/put_swap_device() in
__swap_count() now.
Link: https://lkml.kernel.org/r/20230529061355.125791-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20230529061355.125791-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In calculate_node_totalpages(), zone_start_pfn/zone_end_pfn are already
calculated in zone_spanned_pages_in_node(), so use them as parameters
instead of node_start_pfn/node_end_pfn and the duplicated calculation
process can de dropped.
Link: https://lkml.kernel.org/r/20230526085251.1977-2-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Haifeng Xu <haifeng.xu@shopee.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, no matter whether a node actually has memory or not,
calculate_node_totalpages() is used to account number of pages in
zone/node. However, for node without memory, these unnecessary
calculations can be skipped. All the zone/node page counts can be set to
0 directly. So introduce reset_memoryless_node_totalpages() to perform
this action.
Furthermore, calculate_node_totalpages() only gets called for the node
with memory.
Link: https://lkml.kernel.org/r/20230526085251.1977-1-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a section for covering DAMON modules layer to the design document.
Link: https://lkml.kernel.org/r/20230525214314.5204-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a section covering the API of DAMON core layer on the design document.
Link: https://lkml.kernel.org/r/20230525214314.5204-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add sections for advanced features of DAMOS including quotas,
prioritization, watermarks, and filters of DAMOS on the design document.
Link: https://lkml.kernel.org/r/20230525214314.5204-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMOS is an important part of DAMON, but the design doc is not covering
it. Add sections for covering the basic part of DAMOS.
Link: https://lkml.kernel.org/r/20230525214314.5204-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add overall desription of the interface and the relation between the Core
and the Modules layer under 'Overall Architecture' section.
Link: https://lkml.kernel.org/r/20230525214314.5204-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The 'Configurable Operations Set' section is a little bit outdated.
Update the text.
Link: https://lkml.kernel.org/r/20230525214314.5204-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMON design document is describing only the operations set layer and
monitoring part of the core logic. Update the layout based on the DAMON's
layers, so that more parts of DAMON including DAMOS core logic and DAMON
modules can easily be added.
Link: https://lkml.kernel.org/r/20230525214314.5204-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The design doc is missing overall picture of DAMON. Add a section for
overall architeucture and layers.
Link: https://lkml.kernel.org/r/20230525214314.5204-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix a few typos and grammar erros in DAMON Maintainer Profile document.
Link: https://lkml.kernel.org/r/20230525214314.5204-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Docs/mm/damon: Minor fixes and design doc update".
Some of the DAMON documents are outdated, or having minor typos or grammar
erros. Especially, the design doc has not updated for DAMOS, which is an
important part of DAMON. Fix the minor issues and update documents.
This patch (of 10):
The first two questions of DAMON faqs have raised when DAMON patches were
first submitted. More than one year has passed since DAMON patches get
merged in the mainline, and that kind of questions are not asked nowadays.
Remove the questions.
Link: https://lkml.kernel.org/r/20230525214314.5204-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20230525214314.5204-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
On Android app cycle workloads, MGLRU showed a significant reduction in
workingset refaults although pgpgin/pswpin remained relatively unchanged.
This indicated MGLRU may be undercounting workingset refaults.
This has impact on userspace programs, like Android's LMKD, that monitor
workingset refault statistics to detect thrashing.
It was found that refaults were only accounted if the MGLRU shadow entry
was for a recently evicted folio. However, recently evicted folios should
be accounted as workingset activation, and refaults should be accounted
regardless of recency.
Fix MGLRU's workingset refault and activation accounting to more closely
match that of the conventional active/inactive LRU.
Link: https://lkml.kernel.org/r/20230523205922.3852731-1-kaleshsingh@google.com
Fixes: ac35a49023 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Relocate the declaration of mas_empty_area_rev() so that mas_empty_area()
and mas_empty_area_rev() are together.
Link: https://lkml.kernel.org/r/20230524031247.65949-11-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Get whether the two gaps to be overwritten are empty to avoid calling
mas_update_gap() all the time. Also clean up the code and add comments.
Link: https://lkml.kernel.org/r/20230524031247.65949-9-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add comment for mas_wr_append(), move mas_update_gap() into
mas_wr_append(), and other cleanups to make mas_wr_modify() cleaner.
Link: https://lkml.kernel.org/r/20230524031247.65949-8-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>