mainlining shenanigans
Go to file
Kan Liang 8d97e71811 perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
Current perf can report both virtual addresses and physical addresses,
but not the MMU page size. Without the MMU page size information of the
utilized page, users cannot decide whether to promote/demote large pages
to optimize memory usage.

Add a new sample type for the data MMU page size.

Current perf already has a facility to collect data virtual addresses.
A page walker is required to walk the pages tables and calculate the
MMU page size from a given virtual address.

On some platforms, e.g., X86, the page walker is invoked in an NMI
handler. So the page walker must be NMI-safe and low overhead. Besides,
the page walker should work for both user and kernel virtual address.
The existing generic page walker, e.g., walk_page_range_novma(), is a
little bit complex and doesn't guarantee the NMI-safe. The follow_page()
is only for user-virtual address.

Add a new function perf_get_page_size() to walk the page tables and
calculate the MMU page size. In the function:
- Interrupts have to be disabled to prevent any teardown of the page
  tables.
- For user space threads, the current->mm is used for the page walker.
  For kernel threads and the like, the current->mm is NULL. The init_mm
  is used for the page walker. The active_mm is not used here, because
  it can be NULL.
  Quote from Peter Zijlstra,
  "context_switch() can set prev->active_mm to NULL when it transfers it
   to @next. It does this before @current is updated. So an NMI that
   comes in between this active_mm swizzling and updating @current will
   see !active_mm."
- The MMU page size is calculated from the page table level.

The method should work for all architectures, but it has only been
verified on X86. Should there be some architectures, which support perf,
where the method doesn't work, it can be fixed later separately.
Reporting the wrong page size would not be fatal for the architecture.

Some under discussion features may impact the method in the future.
Quote from Dave Hansen,
  "There are lots of weird things folks are trying to do with the page
   tables, like Address Space Isolation.  For instance, if you get a
   perf NMI when running userspace, current->mm->pgd is *different* than
   the PGD that was in use when userspace was running. It's close enough
   today, but it might not stay that way."
If the case happens later, lots of consecutive page walk errors will
happen. The worst case is that lots of page-size '0' are returned, which
would not be fatal.
In the perf tool, a check is implemented to detect this case. Once it
happens, a kernel patch could be implemented accordingly then.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201001135749.2804-2-kan.liang@linux.intel.com
2020-10-29 11:00:38 +01:00
arch perf/x86: Fix n_metric for cancelled txn 2020-10-06 15:18:17 +02:00
block block-5.9-2020-08-14 2020-08-15 20:36:42 -07:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-08-14 13:09:15 -07:00
Documentation Misc fixes and small updates all around the place: 2020-08-15 10:38:03 -07:00
drivers block-5.9-2020-08-14 2020-08-15 20:36:42 -07:00
fs io_uring-5.9-2020-08-15 2020-08-16 10:55:12 -07:00
include perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE 2020-10-29 11:00:38 +01:00
init OpenRISC updates for 5.9 2020-08-14 14:04:53 -07:00
ipc ipc/shm.c: remove the superfluous break 2020-08-12 10:58:02 -07:00
kernel perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE 2020-10-29 11:00:38 +01:00
lib iomap: constify ioreadX() iomem argument (as in generic implementation) 2020-08-14 19:56:57 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm Merge branch 'akpm' (patches from Andrew) 2020-08-15 08:02:03 -07:00
net 9p pull request for inclusion in 5.9 2020-08-15 08:34:36 -07:00
samples Kbuild updates for v5.9 2020-08-09 14:10:26 -07:00
scripts Kconfig updates for v5.9 2020-08-14 11:04:45 -07:00
security Merge branch 'akpm' (patches from Andrew) 2020-08-12 11:24:12 -07:00
sound sound fixes for 5.9-rc1 2020-08-14 15:58:57 -07:00
tools Cleanup, SECCOMP_FILTER support, message printing fixes, and other 2020-08-15 18:50:32 -07:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt Merge branch 'akpm' (patches from Andrew) 2020-08-12 11:24:12 -07:00
.clang-format block: add bio_for_each_bvec_all() 2020-05-25 11:25:24 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: Add ZSTD-compressed files 2020-07-31 11:50:49 +02:00
.mailmap mailmap: add entry for Greg Kurz 2020-08-14 19:56:56 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS CREDITS: Replace HTTP links with HTTPS ones 2020-07-23 14:53:58 -06:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS perf tools changes for v5.9: 2nd batch 2020-08-15 11:17:15 -07:00
Makefile Linux 5.9-rc1 2020-08-16 13:04:57 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.