mainlining shenanigans
Go to file
Huang Ying b99a342d4f NUMA balancing: reduce TLB flush via delaying mapping on hint page fault
With NUMA balancing, in hint page fault handler, the faulting page will be
migrated to the accessing node if necessary.  During the migration, TLB
will be shot down on all CPUs that the process has run on recently.
Because in the hint page fault handler, the PTE will be made accessible
before the migration is tried.  The overhead of TLB shooting down can be
high, so it's better to be avoided if possible.  In fact, if we delay
mapping the page until migration, that can be avoided.  This is what this
patch doing.

For the multiple threads applications, it's possible that a page is
accessed by multiple threads almost at the same time.  In the original
implementation, because the first thread will install the accessible PTE
before migrating the page, the other threads may access the page directly
before the page is made inaccessible again during migration.  While with
the patch, the second thread will go through the page fault handler too.
And because of the PageLRU() checking in the following code path,

  migrate_misplaced_page()
    numamigrate_isolate_page()
      isolate_lru_page()

the migrate_misplaced_page() will return 0, and the PTE will be made
accessible in the second thread.

This will introduce a little more overhead.  But we think the possibility
for a page to be accessed by the multiple threads at the same time is low,
and the overhead difference isn't too large.  If this becomes a problem in
some workloads, we need to consider how to reduce the overhead.

To test the patch, we run a test case as follows on a 2-socket Intel
server (1 NUMA node per socket) with 128GB DRAM (64GB per socket).

1. Run a memory eater on NUMA node 1 to use 40GB memory before running
   pmbench.

2. Run pmbench (normal accessing pattern) with 8 processes, and 8
   threads per process, so there are 64 threads in total.  The
   working-set size of each process is 8960MB, so the total working-set
   size is 8 * 8960MB = 70GB.  The CPU of all pmbench processes is bound
   to node 1.  The pmbench processes will access some DRAM on node 0.

3. After the pmbench processes run for 10 seconds, kill the memory
   eater.  Now, some pages will be migrated from node 0 to node 1 via
   NUMA balancing.

Test results show that, with the patch, the pmbench throughput (page
accesses/s) increases 5.5%.  The number of the TLB shootdowns interrupts
reduces 98% (from ~4.7e7 to ~9.7e5) with about 9.2e6 pages (35.8GB)
migrated.  From the perf profile, it can be found that the CPU cycles
spent by try_to_unmap() and its callees reduces from 6.02% to 0.47%.  That
is, the CPU cycles spent by TLB shooting down decreases greatly.

Link: https://lkml.kernel.org/r/20210408132236.1175607-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Matthew Wilcox" <willy@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:39 -07:00
arch x86/vmemmap: optimize for consecutive sections in partial populated PMDs 2021-04-30 11:20:38 -07:00
block cgroup: rstat: punt root-level optimization to individual controllers 2021-04-30 11:20:37 -07:00
certs certs: add 'x509_revocation_list' to gitignore 2021-04-26 10:48:07 -07:00
crypto for-5.13/drivers-2021-04-27 2021-04-28 14:39:37 -07:00
Documentation mm: gup: remove FOLL_SPLIT 2021-04-30 11:20:37 -07:00
drivers i915: fix remap_io_sg to verify the pgprot 2021-04-30 11:20:39 -07:00
fs iomap: use filemap_range_needs_writeback() for O_DIRECT reads 2021-04-30 11:20:36 -07:00
include mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
init Kconfig updates for v5.13 2021-04-29 14:32:00 -07:00
ipc fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
kernel cgroup: rstat: punt root-level optimization to individual controllers 2021-04-30 11:20:37 -07:00
lib mm/memtest: add ARCH_USE_MEMTEST 2021-04-30 11:20:36 -07:00
LICENSES LICENSES: Add the CC-BY-4.0 license 2020-12-08 10:33:27 -07:00
mm NUMA balancing: reduce TLB flush via delaying mapping on hint page fault 2021-04-30 11:20:39 -07:00
net Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
samples kfifo: fix ternary sign extension bugs 2021-04-30 11:20:35 -07:00
scripts scripts: a new script for checking duplicate struct declaration 2021-04-30 11:20:35 -07:00
security Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
sound - Core Frameworks 2021-04-28 15:59:13 -07:00
tools kselftests: cgroup: update kmem test for new vmstat implementation 2021-04-30 11:20:38 -07:00
usr Kbuild updates for v5.12 2021-02-25 10:17:31 -08:00
virt KVM: x86/mmu: Consider the hva in mmu_notifier retry 2021-02-22 13:16:53 -05:00
.clang-format cxl for 5.12 2021-02-24 09:38:36 -08:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-04-25 05:17:02 +09:00
.mailmap It's been a relatively busy cycle in docsland, though more than usually 2021-04-26 13:22:43 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS - Core Frameworks 2021-04-28 15:59:13 -07:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Kbuild updates for v5.13 2021-04-29 14:24:39 -07:00
Makefile Kconfig updates for v5.13 2021-04-29 14:32:00 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.