mirror of
https://github.com/torvalds/linux.git
synced 2024-11-21 19:41:42 +00:00
Merge tag 'docs-6.3' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a moderately calm cycle for documentation; the significant
  changes include:

   - Some significant additions to the memory-management documentation
   - Some improvements to navigation in the HTML-rendered docs
   - More Spanish and Chinese translations

  ...and the usual set of typo fixes and such"

* tag 'docs-6.3' of git://git.lwn.net/linux: (68 commits)
  Documentation/watchdog/hpwdt: Fix Format
  Documentation/watchdog/hpwdt: Fix Reference
  Documentation: core-api: padata: correct spelling
  docs/mm: Physical Memory: correct spelling in reference to CONFIG_PAGE_EXTENSION
  docs: Use HTML comments for the kernel-toc SPDX line
  docs: Add more information to the HTML sidebar
  Documentation: KVM: Update AMD memory encryption link
  printk: Document that CONFIG_BOOT_PRINTK_DELAY required for boot_delay=
  Documentation: userspace-api: correct spelling
  Documentation: sparc: correct spelling
  Documentation: driver-api: correct spelling
  Documentation: admin-guide: correct spelling
  docs: add workload-tracing document to admin-guide
  docs/admin-guide/mm: remove useless markup
  docs/mm: remove useless markup
  docs/mm: Physical Memory: remove useless markup
  docs/sp_SP: Add process magic-number translation
  docs: ftrace: always use canonical ftrace path
  Doc/damon: fix the data path error
  dma-buf: Add "dma-buf" to title of documentation
  ...
commit 70756b49be
@ -1,6 +1,9 @@
|
||||
if COMPILE_TEST
|
||||
|
||||
menu "Documentation"
|
||||
|
||||
config WARN_MISSING_DOCUMENTS
|
||||
bool "Warn if there's a missing documentation file"
|
||||
depends on COMPILE_TEST
|
||||
help
|
||||
It is not uncommon that a document gets renamed.
|
||||
This option makes the Kernel to check for missing dependencies,
|
||||
@ -11,7 +14,6 @@ config WARN_MISSING_DOCUMENTS
|
||||
|
||||
config WARN_ABI_ERRORS
|
||||
bool "Warn if there are errors at ABI files"
|
||||
depends on COMPILE_TEST
|
||||
help
|
||||
The files under Documentation/ABI should follow what's
|
||||
described at Documentation/ABI/README. Yet, as they're manually
|
||||
@ -20,3 +22,7 @@ config WARN_ABI_ERRORS
|
||||
scripts/get_abi.pl. Add a check to verify them.
|
||||
|
||||
If unsure, select 'N'.
|
||||
|
||||
endmenu
|
||||
|
||||
endif
|
||||
|
@ -1,8 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
Linux PCI Bus Subsystem
|
||||
=======================
|
||||
=================
|
||||
PCI Bus Subsystem
|
||||
=================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
@ -69,7 +69,7 @@ The accelerator devices will be exposed to the user space with the dedicated
|
||||
|
||||
- device char files - /dev/accel/accel*
|
||||
- sysfs - /sys/class/accel/accel*/
|
||||
- debugfs - /sys/kernel/debug/accel/accel*/
|
||||
- debugfs - /sys/kernel/debug/accel/*/
|
||||
|
||||
Getting Started
|
||||
===============
|
||||
|
@ -204,7 +204,7 @@ For example::
|
||||
This should present your unmodified backing device data in /dev/loop0
|
||||
|
||||
If your cache is in writethrough mode, then you can safely discard the
|
||||
cache device without loosing data.
|
||||
cache device without losing data.
|
||||
|
||||
|
||||
E) Wiping a cache device
|
||||
|
@ -106,7 +106,7 @@ Proportional weight policy files
|
||||
see Documentation/block/bfq-iosched.rst.
|
||||
|
||||
blkio.bfq.weight_device
|
||||
Specifes per cgroup per device weights, overriding the default group
|
||||
Specifies per cgroup per device weights, overriding the default group
|
||||
weight. For more details, see Documentation/block/bfq-iosched.rst.
|
||||
|
||||
Following is the format::
|
||||
|
@ -2289,7 +2289,7 @@ Cpuset Interface Files
|
||||
For a valid partition root with the sibling cpu exclusivity
|
||||
rule enabled, changes made to "cpuset.cpus" that violate the
|
||||
exclusivity rule will invalidate the partition as well as its
|
||||
sibiling partitions with conflicting cpuset.cpus values. So
|
||||
sibling partitions with conflicting cpuset.cpus values. So
|
||||
care must be taking in changing "cpuset.cpus".
|
||||
|
||||
A valid non-root parent partition may distribute out all its CPUs
|
||||
|
@ -399,7 +399,7 @@ A partial list of the supported mount options follows:
|
||||
sep
|
||||
if first mount option (after the -o), overrides
|
||||
the comma as the separator between the mount
|
||||
parms. e.g.::
|
||||
parameters. e.g.::
|
||||
|
||||
-o user=myname,password=mypassword,domain=mydom
|
||||
|
||||
@ -765,7 +765,7 @@ cifsFYI If set to non-zero value, additional debug information
|
||||
Some debugging statements are not compiled into the
|
||||
cifs kernel unless CONFIG_CIFS_DEBUG2 is enabled in the
|
||||
kernel configuration. cifsFYI may be set to one or
|
||||
nore of the following flags (7 sets them all)::
|
||||
more of the following flags (7 sets them all)::
|
||||
|
||||
+-----------------------------------------------+------+
|
||||
| log cifs informational messages | 0x01 |
|
||||
|
@ -70,7 +70,7 @@ the entries (each hotspot block covers a larger area than a single
|
||||
cache block).
|
||||
|
||||
All this means smq uses ~25bytes per cache block. Still a lot of
|
||||
memory, but a substantial improvement nontheless.
|
||||
memory, but a substantial improvement nonetheless.
|
||||
|
||||
Level balancing
|
||||
^^^^^^^^^^^^^^^
|
||||
|
@ -31,7 +31,7 @@ Mandatory parameters:
|
||||
|
||||
Optional parameter:
|
||||
|
||||
<underyling sectors>:
|
||||
<underlying sectors>:
|
||||
Number of sectors defining the logical block size of <dev path>.
|
||||
2^N supported, e.g. 8 = emulate 8 sectors of 512 bytes = 4KiB.
|
||||
If not provided, the logical block size of <dev path> will be used.
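
The arithmetic in the example above (8 sectors of 512 bytes = 4KiB) can be sketched in Python. This is only an illustration: the function name ``block_size_bytes`` is hypothetical and not part of dm-ebs, and the sketch only checks the "2^N" rule; any upper limit the target enforces is not stated in this hunk and is not modelled.

```python
SECTOR_BYTES = 512  # sector size used in the documented example


def block_size_bytes(sectors: int) -> int:
    """Return the logical block size in bytes for a 2^N sector count."""
    # A positive power of two has exactly one bit set.
    if sectors <= 0 or sectors & (sectors - 1) != 0:
        raise ValueError("sector count must be a power of two (2^N)")
    return sectors * SECTOR_BYTES
```

With this sketch, ``block_size_bytes(8)`` gives 4096 bytes, matching the 4KiB in the text, while a value such as 3 is rejected.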
|
||||
|
@ -46,7 +46,7 @@ just like conventional zones.
|
||||
The zones of the device(s) are separated into 2 types:
|
||||
|
||||
1) Metadata zones: these are conventional zones used to store metadata.
|
||||
Metadata zones are not reported as useable capacity to the user.
|
||||
Metadata zones are not reported as usable capacity to the user.
|
||||
|
||||
2) Data zones: all remaining zones, the vast majority of which will be
|
||||
sequential zones used exclusively to store user data. The conventional
|
||||
|
@ -111,7 +111,7 @@ Example dmsetup usage
|
||||
=====================
|
||||
|
||||
unstriped on top of Intel NVMe device that has 2 cores
|
||||
-----------------------------------------------------
|
||||
------------------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
@ -125,7 +125,7 @@ respectively::
|
||||
/dev/mapper/nvmset1
|
||||
|
||||
unstriped on top of striped with 4 drives using 128K chunk size
|
||||
--------------------------------------------------------------
|
||||
---------------------------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
|
@ -330,7 +330,7 @@ Examples
|
||||
|
||||
// boot-args example, with newlines and comments for readability
|
||||
Kernel command line: ...
|
||||
// see whats going on in dyndbg=value processing
|
||||
// see what's going on in dyndbg=value processing
|
||||
dynamic_debug.verbose=3
|
||||
// enable pr_debugs in the btrfs module (can be builtin or loadable)
|
||||
btrfs.dyndbg="+p"
|
||||
|
@ -123,7 +123,7 @@ Each simulated GPIO chip creates a separate sysfs group under its device
|
||||
directory for each exposed line
|
||||
(e.g. ``/sys/devices/platform/gpio-sim.X/gpiochipY/``). The name of each group
|
||||
is of the form: ``'sim_gpioX'`` where X is the offset of the line. Inside each
|
||||
group there are two attibutes:
|
||||
group there are two attributes:
|
||||
|
||||
``pull`` - allows to read and set the current simulated pull setting for
|
||||
every line, when writing the value must be one of: ``'pull-up'``,
|
||||
|
@ -64,8 +64,8 @@ architecture section: :ref:`Documentation/x86/mds.rst <mds>`.
|
||||
Attack scenarios
|
||||
----------------
|
||||
|
||||
Attacks against the MDS vulnerabilities can be mounted from malicious non
|
||||
priviledged user space applications running on hosts or guest. Malicious
|
||||
Attacks against the MDS vulnerabilities can be mounted from malicious non-
|
||||
privileged user space applications running on hosts or guest. Malicious
|
||||
guest OSes can obviously mount attacks as well.
|
||||
|
||||
Contrary to other speculation based vulnerabilities the MDS vulnerability
|
||||
|
@ -56,6 +56,17 @@ ABI will be found here.
|
||||
|
||||
sysfs-rules
|
||||
|
||||
This is the beginning of a section with information of interest to
|
||||
application developers and system integrators doing analysis of the
|
||||
Linux kernel for safety critical applications. Documents supporting
|
||||
analysis of kernel interactions with applications, and key kernel
|
||||
subsystems expectations will be found here.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
workload-tracing
|
||||
|
||||
The rest of this manual consists of various unordered guides on how to
|
||||
configure specific aspects of kernel behavior to your liking.
|
||||
|
||||
|
@ -378,18 +378,16 @@
|
||||
autoconf= [IPV6]
|
||||
See Documentation/networking/ipv6.rst.
|
||||
|
||||
show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
|
||||
Limit apic dumping. The parameter defines the maximal
|
||||
number of local apics being dumped. Also it is possible
|
||||
to set it to "all" by meaning -- no limit here.
|
||||
Format: { 1 (default) | 2 | ... | all }.
|
||||
The parameter valid if only apic=debug or
|
||||
apic=verbose is specified.
|
||||
Example: apic=debug show_lapic=all
|
||||
|
||||
apm= [APM] Advanced Power Management
|
||||
See header of arch/x86/kernel/apm_32.c.
|
||||
|
||||
apparmor= [APPARMOR] Disable or enable AppArmor at boot time
|
||||
Format: { "0" | "1" }
|
||||
See security/apparmor/Kconfig help text
|
||||
0 -- disable.
|
||||
1 -- enable.
|
||||
Default value is set via kernel config option.
|
||||
|
||||
arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
|
||||
Format: <io>,<irq>,<nodeID>
|
||||
|
||||
@ -480,8 +478,10 @@
|
||||
See Documentation/block/cmdline-partition.rst
|
||||
|
||||
boot_delay= Milliseconds to delay each printk during boot.
|
||||
Values larger than 10 seconds (10000) are changed to
|
||||
no delay (0).
|
||||
Only works if CONFIG_BOOT_PRINTK_DELAY is enabled,
|
||||
and you may also have to specify "lpj=". Boot_delay
|
||||
values larger than 10 seconds (10000) are assumed
|
||||
erroneous and ignored.
|
||||
Format: integer
|
||||
|
||||
bootconfig [KNL]
|
||||
@ -673,7 +673,7 @@
|
||||
Sets the size of kernel per-numa memory area for
|
||||
contiguous memory allocations. A value of 0 disables
|
||||
per-numa CMA altogether. And If this option is not
|
||||
specificed, the default value is 0.
|
||||
specified, the default value is 0.
|
||||
With per-numa CMA enabled, DMA users on node nid will
|
||||
first try to allocate buffer from the pernuma area
|
||||
which is located in node nid, if the allocation fails,
|
||||
@ -945,7 +945,7 @@
|
||||
driver code when a CPU writes to (or reads from) a
|
||||
random memory location. Note that there exists a class
|
||||
of memory corruptions problems caused by buggy H/W or
|
||||
F/W or by drivers badly programing DMA (basically when
|
||||
F/W or by drivers badly programming DMA (basically when
|
||||
memory is written at bus level and the CPU MMU is
|
||||
bypassed) which are not detectable by
|
||||
CONFIG_DEBUG_PAGEALLOC, hence this option will not help
|
||||
@ -1046,26 +1046,12 @@
|
||||
can be useful when debugging issues that require an SLB
|
||||
miss to occur.
|
||||
|
||||
stress_slb [PPC]
|
||||
Limits the number of kernel SLB entries, and flushes
|
||||
them frequently to increase the rate of SLB faults
|
||||
on kernel addresses.
|
||||
|
||||
stress_hpt [PPC]
|
||||
Limits the number of kernel HPT entries in the hash
|
||||
page table to increase the rate of hash page table
|
||||
faults on kernel addresses.
|
||||
|
||||
disable= [IPV6]
|
||||
See Documentation/networking/ipv6.rst.
|
||||
|
||||
disable_radix [PPC]
|
||||
Disable RADIX MMU mode on POWER9
|
||||
|
||||
radix_hcall_invalidate=on [PPC/PSERIES]
|
||||
Disable RADIX GTSE feature and use hcall for TLB
|
||||
invalidate.
|
||||
|
||||
disable_tlbie [PPC]
|
||||
Disable TLBIE instruction. Currently does not work
|
||||
with KVM, with HASH MMU, or with coherent accelerators.
|
||||
@ -1167,16 +1153,6 @@
|
||||
Documentation/admin-guide/dynamic-debug-howto.rst
|
||||
for details.
|
||||
|
||||
nopku [X86] Disable Memory Protection Keys CPU feature found
|
||||
in some Intel CPUs.
|
||||
|
||||
<module>.async_probe[=<bool>] [KNL]
|
||||
If no <bool> value is specified or if the value
|
||||
specified is not a valid <bool>, enable asynchronous
|
||||
probe on this module. Otherwise, enable/disable
|
||||
asynchronous probe on this module as indicated by the
|
||||
<bool> value. See also: module.async_probe
|
||||
|
||||
early_ioremap_debug [KNL]
|
||||
Enable debug messages in early_ioremap support. This
|
||||
is useful for tracking down temporary early mappings
|
||||
@ -1753,7 +1729,7 @@
|
||||
boot-time allocation of gigantic hugepages is skipped.
|
||||
|
||||
hugetlb_free_vmemmap=
|
||||
[KNL] Reguires CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
|
||||
[KNL] Requires CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
|
||||
enabled.
|
||||
Control if HugeTLB Vmemmap Optimization (HVO) is enabled.
|
||||
Allows heavy hugetlb users to free up some more
|
||||
@ -1792,12 +1768,6 @@
|
||||
which allow the hypervisor to 'idle' the
|
||||
guest on lock contention.
|
||||
|
||||
keep_bootcon [KNL]
|
||||
Do not unregister boot console at start. This is only
|
||||
useful for debugging when something happens in the window
|
||||
between unregistering the boot console and initializing
|
||||
the real console.
|
||||
|
||||
i2c_bus= [HW] Override the default board specific I2C bus speed
|
||||
or register an additional I2C bus that is not
|
||||
registered from board initialization code.
|
||||
@ -2367,17 +2337,18 @@
|
||||
js= [HW,JOY] Analog joystick
|
||||
See Documentation/input/joydev/joystick.rst.
|
||||
|
||||
nokaslr [KNL]
|
||||
When CONFIG_RANDOMIZE_BASE is set, this disables
|
||||
kernel and module base offset ASLR (Address Space
|
||||
Layout Randomization).
|
||||
|
||||
kasan_multi_shot
|
||||
[KNL] Enforce KASAN (Kernel Address Sanitizer) to print
|
||||
report on every invalid memory access. Without this
|
||||
parameter KASAN will print report only for the first
|
||||
invalid access.
|
||||
|
||||
keep_bootcon [KNL]
|
||||
Do not unregister boot console at start. This is only
|
||||
useful for debugging when something happens in the window
|
||||
between unregistering the boot console and initializing
|
||||
the real console.
|
||||
|
||||
keepinitrd [HW,ARM]
|
||||
|
||||
kernelcore= [KNL,X86,IA-64,PPC]
|
||||
@ -3326,6 +3297,13 @@
|
||||
For details see:
|
||||
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
|
||||
|
||||
<module>.async_probe[=<bool>] [KNL]
|
||||
If no <bool> value is specified or if the value
|
||||
specified is not a valid <bool>, enable asynchronous
|
||||
probe on this module. Otherwise, enable/disable
|
||||
asynchronous probe on this module as indicated by the
|
||||
<bool> value. See also: module.async_probe
|
||||
|
||||
module.async_probe=<bool>
|
||||
[KNL] When set to true, modules will use async probing
|
||||
by default. To enable/disable async probing for a
|
||||
@ -3709,7 +3687,7 @@
|
||||
implementation; requires CONFIG_GENERIC_IDLE_POLL_SETUP
|
||||
to be effective. This is useful on platforms where the
|
||||
sleep(SH) or wfi(ARM,ARM64) instructions do not work
|
||||
correctly or when doing power measurements to evalute
|
||||
correctly or when doing power measurements to evaluate
|
||||
the impact of the sleep instructions. This is also
|
||||
useful when using JTAG debugger.
|
||||
|
||||
@ -3780,6 +3758,11 @@
|
||||
|
||||
nojitter [IA-64] Disables jitter checking for ITC timers.
|
||||
|
||||
nokaslr [KNL]
|
||||
When CONFIG_RANDOMIZE_BASE is set, this disables
|
||||
kernel and module base offset ASLR (Address Space
|
||||
Layout Randomization).
|
||||
|
||||
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
|
||||
|
||||
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
|
||||
@ -3825,6 +3808,19 @@
|
||||
|
||||
nopcid [X86-64] Disable the PCID cpu feature.
|
||||
|
||||
nopku [X86] Disable Memory Protection Keys CPU feature found
|
||||
in some Intel CPUs.
|
||||
|
||||
nopv= [X86,XEN,KVM,HYPER_V,VMWARE]
|
||||
Disables the PV optimizations forcing the guest to run
|
||||
as generic guest with no PV drivers. Currently support
|
||||
XEN HVM, KVM, HYPER_V and VMWARE guest.
|
||||
|
||||
nopvspin [X86,XEN,KVM]
|
||||
Disables the qspinlock slow path using PV optimizations
|
||||
which allow the hypervisor to 'idle' the guest on lock
|
||||
contention.
|
||||
|
||||
norandmaps Don't use address space randomization. Equivalent to
|
||||
echo 0 > /proc/sys/kernel/randomize_va_space
|
||||
|
||||
@ -4592,6 +4588,10 @@
|
||||
|
||||
r128= [HW,DRM]
|
||||
|
||||
radix_hcall_invalidate=on [PPC/PSERIES]
|
||||
Disable RADIX GTSE feature and use hcall for TLB
|
||||
invalidate.
|
||||
|
||||
raid= [HW,RAID]
|
||||
See Documentation/admin-guide/md.rst.
|
||||
|
||||
@ -5584,13 +5584,6 @@
|
||||
1 -- enable.
|
||||
Default value is 1.
|
||||
|
||||
apparmor= [APPARMOR] Disable or enable AppArmor at boot time
|
||||
Format: { "0" | "1" }
|
||||
See security/apparmor/Kconfig help text
|
||||
0 -- disable.
|
||||
1 -- enable.
|
||||
Default value is set via kernel config option.
|
||||
|
||||
serialnumber [BUGS=X86-32]
|
||||
|
||||
sev=option[,option...] [X86-64] See Documentation/x86/x86_64/boot-options.rst
|
||||
@ -5598,6 +5591,15 @@
|
||||
shapers= [NET]
|
||||
Maximal number of shapers.
|
||||
|
||||
show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
|
||||
Limit apic dumping. The parameter defines the maximal
|
||||
number of local apics being dumped. Also it is possible
|
||||
to set it to "all" by meaning -- no limit here.
|
||||
Format: { 1 (default) | 2 | ... | all }.
|
||||
The parameter valid if only apic=debug or
|
||||
apic=verbose is specified.
|
||||
Example: apic=debug show_lapic=all
|
||||
|
||||
simeth= [IA-64]
|
||||
simscsi=
|
||||
|
||||
@ -6037,6 +6039,16 @@
|
||||
be used to filter out binaries which have
|
||||
not yet been made aware of AT_MINSIGSTKSZ.
|
||||
|
||||
stress_hpt [PPC]
|
||||
Limits the number of kernel HPT entries in the hash
|
||||
page table to increase the rate of hash page table
|
||||
faults on kernel addresses.
|
||||
|
||||
stress_slb [PPC]
|
||||
Limits the number of kernel SLB entries, and flushes
|
||||
them frequently to increase the rate of SLB faults
|
||||
on kernel addresses.
|
||||
|
||||
sunrpc.min_resvport=
|
||||
sunrpc.max_resvport=
|
||||
[NFS,SUNRPC]
|
||||
@ -6290,7 +6302,7 @@
|
||||
that can be enabled or disabled just as if you were
|
||||
to echo the option name into
|
||||
|
||||
/sys/kernel/debug/tracing/trace_options
|
||||
/sys/kernel/tracing/trace_options
|
||||
|
||||
For example, to enable stacktrace option (to dump the
|
||||
stack trace of each event), add to the command line:
|
||||
@ -6323,7 +6335,7 @@
|
||||
[FTRACE] enable this option to disable tracing when a
|
||||
warning is hit. This turns off "tracing_on". Tracing can
|
||||
be enabled again by echoing '1' into the "tracing_on"
|
||||
file located in /sys/kernel/debug/tracing/
|
||||
file located in /sys/kernel/tracing/
|
||||
|
||||
This option is useful, as it disables the trace before
|
||||
the WARNING dump is called, which prevents the trace to
|
||||
@ -6778,11 +6790,11 @@
|
||||
functions are at fixed addresses, they make nice
|
||||
targets for exploits that can control RIP.
|
||||
|
||||
emulate [default] Vsyscalls turn into traps and are
|
||||
emulated reasonably safely. The vsyscall
|
||||
page is readable.
|
||||
emulate Vsyscalls turn into traps and are emulated
|
||||
reasonably safely. The vsyscall page is
|
||||
readable.
|
||||
|
||||
xonly Vsyscalls turn into traps and are
|
||||
xonly [default] Vsyscalls turn into traps and are
|
||||
emulated reasonably safely. The vsyscall
|
||||
page is not readable.
|
||||
|
||||
@ -6979,16 +6991,6 @@
|
||||
fairer and the number of possible event channels is
|
||||
much higher. Default is on (use fifo events).
|
||||
|
||||
nopv= [X86,XEN,KVM,HYPER_V,VMWARE]
|
||||
Disables the PV optimizations forcing the guest to run
|
||||
as generic guest with no PV drivers. Currently support
|
||||
XEN HVM, KVM, HYPER_V and VMWARE guest.
|
||||
|
||||
nopvspin [X86,XEN,KVM]
|
||||
Disables the qspinlock slow path using PV optimizations
|
||||
which allow the hypervisor to 'idle' the guest on lock
|
||||
contention.
|
||||
|
||||
xirc2ps_cs= [NET,PCMCIA]
|
||||
Format:
|
||||
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
|
||||
|
@ -25,7 +25,7 @@ References
|
||||
|
||||
- In order to locate kernel-generated OS jitter on CPU N:
|
||||
|
||||
cd /sys/kernel/debug/tracing
|
||||
cd /sys/kernel/tracing
|
||||
echo 1 > max_graph_depth # Increase the "1" for more detail
|
||||
echo function_graph > current_tracer
|
||||
# run workload
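
The path change in this hunk can be mirrored in a script that prefers the canonical tracefs location and only falls back to the older debugfs path. The following Python sketch is an assumption-laden illustration, not part of the commit: the helper names ``find_tracing_dir`` and ``start_function_graph`` are hypothetical, and on a real system the writes require root.

```python
import os

# Canonical location first; the debugfs path is the older one this hunk replaces.
TRACEFS_CANDIDATES = ("/sys/kernel/tracing", "/sys/kernel/debug/tracing")


def find_tracing_dir(candidates=TRACEFS_CANDIDATES):
    """Return the first candidate that is an existing directory, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def start_function_graph(tracing_dir, depth=1):
    """Mirror the two echo commands shown above for the given directory."""
    with open(os.path.join(tracing_dir, "max_graph_depth"), "w") as f:
        f.write(f"{depth}\n")  # increase depth for more detail
    with open(os.path.join(tracing_dir, "current_tracer"), "w") as f:
        f.write("function_graph\n")
```

Passing the directory in explicitly keeps the sketch usable against a scratch directory, so the logic can be exercised without touching the live tracing files.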
|
||||
|
@ -1488,7 +1488,7 @@ Example of command to set keyboard language is mentioned below::
|
||||
Text corresponding to keyboard layout to be set in sysfs are: be(Belgian),
|
||||
cz(Czech), da(Danish), de(German), en(English), es(Spain), et(Estonian),
|
||||
fr(French), fr-ch(French(Switzerland)), hu(Hungarian), it(Italy), jp (Japan),
|
||||
nl(Dutch), nn(Norway), pl(Polish), pt(portugese), sl(Slovenian), sv(Sweden),
|
||||
nl(Dutch), nn(Norway), pl(Polish), pt(portuguese), sl(Slovenian), sv(Sweden),
|
||||
tr(Turkey)
|
||||
|
||||
WWAN Antenna type
|
||||
|
@ -317,7 +317,7 @@ All md devices contain:
|
||||
suspended (not supported yet)
|
||||
All IO requests will block. The array can be reconfigured.
|
||||
|
||||
Writing this, if accepted, will block until array is quiessent
|
||||
Writing this, if accepted, will block until array is quiescent
|
||||
|
||||
readonly
|
||||
no resync can happen. no superblocks get written.
|
||||
|
@ -909,7 +909,7 @@ DE hat diverse Treiber fuer diese Modelle (Stand 09/2002):
|
||||
- TVPhone98 (Bt878)
|
||||
- AVerTV und TVCapture98 w/VCR (Bt 878)
|
||||
- AVerTVStudio und TVPhone98 w/VCR (Bt878)
|
||||
- AVerTV GO Serie (Kein SVideo Input)
|
||||
- AVerTV GO Series (Kein SVideo Input)
|
||||
- AVerTV98 (BT-878 chip)
|
||||
- AVerTV98 mit Fernbedienung (BT-878 chip)
|
||||
- AVerTV/FM98 (BT-878 chip)
|
||||
|
@ -137,7 +137,7 @@ The ``LIRC user interface`` option adds enhanced functionality when using the
|
||||
from remote controllers.
|
||||
|
||||
The ``Support for eBPF programs attached to lirc devices`` option allows
|
||||
the usage of special programs (called eBPF) that would allow aplications
|
||||
the usage of special programs (called eBPF) that would allow applications
|
||||
to add extra remote controller decoding functionality to the Linux Kernel.
|
||||
|
||||
The ``Remote controller decoders`` option allows selecting the
|
||||
|
@ -142,7 +142,7 @@ The drivers exposes following files:
|
||||
indicator
|
||||
0x18 lassi Signed Low side adjacent Channel
|
||||
Strength indicator
|
||||
0x19 hassi ditto fpr High side
|
||||
0x19 hassi ditto for High side
|
||||
0x20 mult Multipath indicator
|
||||
0x21 dev Frequency deviation
|
||||
0x24 assi Adjacent channel SSI
|
||||
|
@ -580,7 +580,7 @@ Metadata Capture
|
||||
----------------
|
||||
|
||||
The Metadata capture generates UVC format metadata. The PTS and SCR are
|
||||
transmitted based on the values set in vivid contols.
|
||||
transmitted based on the values set in vivid controls.
|
||||
|
||||
The Metadata device will only work for the Webcam input, it will give
|
||||
back an error for all other inputs.
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _mm_concepts:
|
||||
|
||||
=================
|
||||
Concepts overview
|
||||
=================
|
||||
@ -86,16 +84,15 @@ memory with the huge pages. The first one is `HugeTLB filesystem`, or
|
||||
hugetlbfs. It is a pseudo filesystem that uses RAM as its backing
|
||||
store. For the files created in this filesystem the data resides in
|
||||
the memory and mapped using huge pages. The hugetlbfs is described at
|
||||
:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`.
|
||||
Documentation/admin-guide/mm/hugetlbpage.rst.
|
||||
|
||||
Another, more recent, mechanism that enables use of the huge pages is
|
||||
called `Transparent HugePages`, or THP. Unlike the hugetlbfs that
|
||||
requires users and/or system administrators to configure what parts of
|
||||
the system memory should and can be mapped by the huge pages, THP
|
||||
manages such mappings transparently to the user and hence the
|
||||
name. See
|
||||
:ref:`Documentation/admin-guide/mm/transhuge.rst <admin_guide_transhuge>`
|
||||
for more details about THP.
|
||||
name. See Documentation/admin-guide/mm/transhuge.rst for more details
|
||||
about THP.
|
||||
|
||||
Zones
|
||||
=====
|
||||
@ -125,8 +122,8 @@ processor. Each bank is referred to as a `node` and for each node Linux
|
||||
constructs an independent memory management subsystem. A node has its
|
||||
own set of zones, lists of free and used pages and various statistics
|
||||
counters. You can find more details about NUMA in
|
||||
:ref:`Documentation/mm/numa.rst <numa>` and in
|
||||
:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`.
|
||||
Documentation/mm/numa.rst` and in
|
||||
Documentation/admin-guide/mm/numa_memory_policy.rst.
|
||||
|
||||
Page cache
|
||||
==========
|
||||
|
@ -54,7 +54,7 @@ that is built with ``CONFIG_DAMON_LRU_SORT=y``.
|
||||
To let sysadmins enable or disable it and tune for the given system,
|
||||
DAMON_LRU_SORT utilizes module parameters. That is, you can put
|
||||
``damon_lru_sort.<parameter>=<value>`` on the kernel boot command line or write
|
||||
proper values to ``/sys/modules/damon_lru_sort/parameters/<parameter>`` files.
|
||||
proper values to ``/sys/module/damon_lru_sort/parameters/<parameter>`` files.
|
||||
|
||||
Below are the description of each parameter.
|
||||
|
||||
@ -283,7 +283,7 @@ doesn't make progress and therefore the free memory rate becomes lower than
|
||||
20%, it asks DAMON_LRU_SORT to do nothing again, so that we can fall back to
|
||||
the LRU-list based page granularity reclamation. ::
|
||||
|
||||
# cd /sys/modules/damon_lru_sort/parameters
|
||||
# cd /sys/module/damon_lru_sort/parameters
|
||||
# echo 500 > hot_thres_access_freq
|
||||
# echo 120000000 > cold_min_age
|
||||
# echo 10 > quota_ms
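
The corrected path (``/sys/module/...``, not ``/sys/modules/...``) is easy to get wrong in scripts, so here is a small Python sketch of the same parameter writes. The helper ``set_module_params`` and its ``sysfs_root`` argument are hypothetical names introduced for illustration; the parameter names are the ones shown in the commands above, and writing them on a real system requires root.

```python
import os


def set_module_params(module, params, sysfs_root="/sys"):
    """Write each value to <sysfs_root>/module/<module>/parameters/<name>."""
    base = os.path.join(sysfs_root, "module", module, "parameters")
    for name, value in params.items():
        # Equivalent to: echo <value> > <base>/<name>
        with open(os.path.join(base, name), "w") as f:
            f.write(f"{value}\n")
    return base
```

A call such as ``set_module_params("damon_lru_sort", {"hot_thres_access_freq": 500, "cold_min_age": 120000000, "quota_ms": 10})`` would reproduce the three ``echo`` lines above. The ``sysfs_root`` argument exists only so the helper can be pointed at a scratch directory when testing.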
|
||||
|
@ -46,7 +46,7 @@ that is built with ``CONFIG_DAMON_RECLAIM=y``.
|
||||
To let sysadmins enable or disable it and tune for the given system,
|
||||
DAMON_RECLAIM utilizes module parameters. That is, you can put
|
||||
``damon_reclaim.<parameter>=<value>`` on the kernel boot command line or write
|
||||
proper values to ``/sys/modules/damon_reclaim/parameters/<parameter>`` files.
|
||||
proper values to ``/sys/module/damon_reclaim/parameters/<parameter>`` files.
|
||||
|
||||
Below are the description of each parameter.
|
||||
|
||||
@ -251,7 +251,7 @@ therefore the free memory rate becomes lower than 20%, it asks DAMON_RECLAIM to
|
||||
do nothing again, so that we can fall back to the LRU-list based page
|
||||
granularity reclamation. ::
|
||||
|
||||
# cd /sys/modules/damon_reclaim/parameters
|
||||
# cd /sys/module/damon_reclaim/parameters
|
||||
# echo 30000000 > min_age
|
||||
# echo $((1 * 1024 * 1024 * 1024)) > quota_sz
|
||||
# echo 1000 > quota_reset_interval_ms
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _hugetlbpage:
|
||||
|
||||
=============
|
||||
HugeTLB Pages
|
||||
=============
|
||||
@ -86,7 +84,7 @@ by increasing or decreasing the value of ``nr_hugepages``.
|
||||
|
||||
Note: When the feature of freeing unused vmemmap pages associated with each
|
||||
hugetlb page is enabled, we can fail to free the huge pages triggered by
|
||||
the user when ths system is under memory pressure. Please try again later.
|
||||
the user when the system is under memory pressure. Please try again later.
|
||||
|
||||
Pages that are used as huge pages are reserved inside the kernel and cannot
|
||||
be used for other purposes. Huge pages cannot be swapped out under
|
||||
@ -313,7 +311,7 @@ memory policy mode--bind, preferred, local or interleave--may be used. The
|
||||
resulting effect on persistent huge page allocation is as follows:
|
||||
|
||||
#. Regardless of mempolicy mode [see
|
||||
:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`],
|
||||
Documentation/admin-guide/mm/numa_memory_policy.rst],
|
||||
persistent huge pages will be distributed across the node or nodes
|
||||
specified in the mempolicy as if "interleave" had been specified.
|
||||
However, if a node in the policy does not contain sufficient contiguous
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _idle_page_tracking:
|
||||
|
||||
==================
|
||||
Idle Page Tracking
|
||||
==================
|
||||
@ -70,9 +68,8 @@ If the tool is run initially with the appropriate option, it will mark all the
|
||||
queried pages as idle. Subsequent runs of the tool can then show which pages have
|
||||
their idle flag cleared in the interim.
|
||||
|
||||
See :ref:`Documentation/admin-guide/mm/pagemap.rst <pagemap>` for more
|
||||
information about ``/proc/pid/pagemap``, ``/proc/kpageflags``, and
|
||||
``/proc/kpagecgroup``.
|
||||
See Documentation/admin-guide/mm/pagemap.rst for more information about
|
||||
``/proc/pid/pagemap``, ``/proc/kpageflags``, and ``/proc/kpagecgroup``.
|
||||
|
||||
.. _impl_details:
|
||||
|
||||
|
@ -16,8 +16,7 @@ are described in Documentation/admin-guide/sysctl/vm.rst and in `man 5 proc`_.
|
||||
.. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
|
||||
|
||||
Linux memory management has its own jargon and if you are not yet
|
||||
familiar with it, consider reading
|
||||
:ref:`Documentation/admin-guide/mm/concepts.rst <mm_concepts>`.
|
||||
familiar with it, consider reading Documentation/admin-guide/mm/concepts.rst.
|
||||
|
||||
Here we document in detail how to interact with various mechanisms in
|
||||
the Linux memory management.
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _admin_guide_ksm:
|
||||
|
||||
=======================
|
||||
Kernel Samepage Merging
|
||||
=======================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _admin_guide_memory_hotplug:
|
||||
|
||||
==================
|
||||
Memory Hot(Un)Plug
|
||||
==================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _numa_memory_policy:
|
||||
|
||||
==================
|
||||
NUMA Memory Policy
|
||||
==================
|
||||
@ -246,7 +244,7 @@ MPOL_INTERLEAVED
|
||||
interleaved system default policy works in this mode.
|
||||
|
||||
MPOL_PREFERRED_MANY
|
||||
This mode specifices that the allocation should be preferrably
|
||||
This mode specifies that the allocation should be preferably
|
||||
satisfied from the nodemask specified in the policy. If there is
|
||||
a memory pressure on all nodes in the nodemask, the allocation
|
||||
can fall back to all existing numa nodes. This is effectively
|
||||
@ -360,7 +358,7 @@ and NUMA nodes. "Usage" here means one of the following:
|
||||
2) examination of the policy to determine the policy mode and associated node
|
||||
or node lists, if any, for page allocation. This is considered a "hot
|
||||
path". Note that for MPOL_BIND, the "usage" extends across the entire
|
||||
allocation process, which may sleep during page reclaimation, because the
|
||||
allocation process, which may sleep during page reclamation, because the
|
||||
BIND policy nodemask is used, by reference, to filter ineligible nodes.
|
||||
|
||||
We can avoid taking an extra reference during the usages listed above as
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _numaperf:
|
||||
|
||||
=============
|
||||
NUMA Locality
|
||||
=============
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _pagemap:
|
||||
|
||||
=============================
|
||||
Examining Process Page Tables
|
||||
=============================
|
||||
@ -19,10 +17,10 @@ There are four components to pagemap:
|
||||
* Bits 0-4 swap type if swapped
|
||||
* Bits 5-54 swap offset if swapped
|
||||
* Bit 55 pte is soft-dirty (see
|
||||
:ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
|
||||
Documentation/admin-guide/mm/soft-dirty.rst)
|
||||
* Bit 56 page exclusively mapped (since 4.2)
|
||||
* Bit 57 pte is uffd-wp write-protected (since 5.13) (see
|
||||
:ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
|
||||
Documentation/admin-guide/mm/userfaultfd.rst)
|
||||
* Bits 58-60 zero
|
||||
* Bit 61 page is file-page or shared-anon (since 3.5)
|
||||
* Bit 62 page swapped
|
||||
@ -105,8 +103,7 @@ Short descriptions to the page flags
|
||||
A compound page with order N consists of 2^N physically contiguous pages.
|
||||
A compound page with order 2 takes the form of "HTTT", where H denotes its
|
||||
head page and T denotes its tail page(s). The major consumers of compound
|
||||
pages are hugeTLB pages
|
||||
(:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
|
||||
pages are hugeTLB pages (Documentation/admin-guide/mm/hugetlbpage.rst),
|
||||
the SLUB etc. memory allocators and various device drivers.
|
||||
However in this interface, only huge/giga pages are made visible
|
||||
to end users.
|
||||
@ -128,7 +125,7 @@ Short descriptions to the page flags
|
||||
Zero page for pfn_zero or huge_zero page.
|
||||
25 - IDLE
|
||||
The page has not been accessed since it was marked idle (see
|
||||
:ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
|
||||
Documentation/admin-guide/mm/idle_page_tracking.rst).
|
||||
Note that this flag may be stale in case the page was accessed via
|
||||
a PTE. To make sure the flag is up-to-date one has to read
|
||||
``/sys/kernel/mm/page_idle/bitmap`` first.
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _shrinker_debugfs:
|
||||
|
||||
==========================
|
||||
Shrinker Debugfs Interface
|
||||
==========================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _soft_dirty:
|
||||
|
||||
===============
|
||||
Soft-Dirty PTEs
|
||||
===============
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _swap_numa:
|
||||
|
||||
===========================================
|
||||
Automatically bind swap device to numa node
|
||||
===========================================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _admin_guide_transhuge:
|
||||
|
||||
============================
|
||||
Transparent Hugepage Support
|
||||
============================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _userfaultfd:
|
||||
|
||||
===========
|
||||
Userfaultfd
|
||||
===========
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _zswap:
|
||||
|
||||
=====
|
||||
zswap
|
||||
=====
|
||||
|
@ -53,7 +53,7 @@ two events have same value of bits 0~15 of config, that means they are
|
||||
event pair. And the bit 16 of config indicates getting counter 0 or
|
||||
counter 1 of hardware event.
|
||||
|
||||
After getting two values of event pair in usersapce, the formula of
|
||||
After getting two values of event pair in userspace, the formula of
|
||||
computation to calculate real performance data is:::
|
||||
|
||||
counter 0 / counter 1
|
||||
|
@ -473,7 +473,7 @@ Unit Tests for amd-pstate
|
||||
|
||||
* We can introduce more functional or performance tests to align the result together, it will benefit power and performance scale optimization.
|
||||
|
||||
1. Test case decriptions
|
||||
1. Test case descriptions
|
||||
|
||||
1). Basic tests
|
||||
|
||||
|
@ -712,7 +712,7 @@ it works in the `active mode <Active Mode_>`_.
|
||||
The following sequence of shell commands can be used to enable them and see
|
||||
their output (if the kernel is generally configured to support event tracing)::
|
||||
|
||||
# cd /sys/kernel/debug/tracing/
|
||||
# cd /sys/kernel/tracing/
|
||||
# echo 1 > events/power/pstate_sample/enable
|
||||
# echo 1 > events/power/cpu_frequency/enable
|
||||
# cat trace
|
||||
@ -732,7 +732,7 @@ The ``ftrace`` interface can be used for low-level diagnostics of
|
||||
P-state is called, the ``ftrace`` filter can be set to
|
||||
:c:func:`intel_pstate_set_pstate`::
|
||||
|
||||
# cd /sys/kernel/debug/tracing/
|
||||
# cd /sys/kernel/tracing/
|
||||
# cat available_filter_functions | grep -i pstate
|
||||
intel_pstate_set_pstate
|
||||
intel_pstate_cpu_init
|
||||
|
@ -1105,8 +1105,8 @@ speakup load
|
||||
Alternatively, you can add the above line to your file
|
||||
~/.bashrc or ~/.bash_profile.
|
||||
|
||||
If your system administrator ran himself the script, all the users will be able
|
||||
to change from English to the language choosed by root and do directly
|
||||
If your system administrator himself ran the script, all the users will be able
|
||||
to change from English to the language chosen by root and do directly
|
||||
speakupconf load (or add this to the ~/.bashrc or
|
||||
~/.bash_profile file). If there are several languages to handle, the
|
||||
administrator (or every user) will have to run the first steps until speakupconf
|
||||
|
@ -356,7 +356,7 @@ The lowmem_reserve_ratio is an array. You can see them by reading this file::
|
||||
|
||||
But, these values are not used directly. The kernel calculates # of protection
|
||||
pages for each zones from them. These are shown as array of protection pages
|
||||
in /proc/zoneinfo like followings. (This is an example of x86-64 box).
|
||||
in /proc/zoneinfo like the following. (This is an example of x86-64 box).
|
||||
Each zone has an array of protection pages like this::
|
||||
|
||||
Node 0, zone DMA
|
||||
@ -433,7 +433,7 @@ a 2bit error in a memory module) is detected in the background by hardware
|
||||
that cannot be handled by the kernel. In some cases (like the page
|
||||
still having a valid copy on disk) the kernel will handle the failure
|
||||
transparently without affecting any applications. But if there is
|
||||
no other uptodate copy of the data it will kill to prevent any data
|
||||
no other up-to-date copy of the data it will kill to prevent any data
|
||||
corruptions from propagating.
|
||||
|
||||
1: Kill all processes that have the corrupted and not reloadable page mapped
|
||||
|
@ -138,7 +138,7 @@ Command Function
|
||||
``v`` Forcefully restores framebuffer console
|
||||
``v`` Causes ETM buffer dump [ARM-specific]
|
||||
|
||||
``w`` Dumps tasks that are in uninterruptable (blocked) state.
|
||||
``w`` Dumps tasks that are in uninterruptible (blocked) state.
|
||||
|
||||
``x`` Used by xmon interface on ppc/powerpc platforms.
|
||||
Show global PMU Registers on sparc64.
|
||||
|
@ -87,7 +87,7 @@ migrated, unless the CPU is taken offline. In this case, threads
|
||||
belong to the offlined CPUs will be terminated immediately.
|
||||
|
||||
Running as SCHED_FIFO and relatively high priority, also allows such
|
||||
scheme to work for both preemptable and non-preemptable kernels.
|
||||
scheme to work for both preemptible and non-preemptible kernels.
|
||||
Alignment of idle time around jiffies ensures scalability for HZ
|
||||
values. This effect can be better visualized using a Perf timechart.
|
||||
The following diagram shows the behavior of kernel thread
|
||||
|
606
Documentation/admin-guide/workload-tracing.rst
Normal file
@ -0,0 +1,606 @@
|
||||
.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
|
||||
|
||||
======================================================
|
||||
Discovering Linux kernel subsystems used by a workload
|
||||
======================================================
|
||||
|
||||
:Authors: - Shuah Khan <skhan@linuxfoundation.org>
|
||||
- Shefali Sharma <sshefali021@gmail.com>
|
||||
:maintained-by: Shuah Khan <skhan@linuxfoundation.org>
|
||||
|
||||
Key Points
|
||||
==========
|
||||
|
||||
* Understanding system resources necessary to build and run a workload
|
||||
is important.
|
||||
* Linux tracing and strace can be used to discover the system resources
|
||||
in use by a workload. The completeness of the system usage information
|
||||
depends on the completeness of coverage of a workload.
|
||||
* Performance and security of the operating system can be analyzed with
|
||||
the help of tools such as:
|
||||
`perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_,
|
||||
`stress-ng <https://www.mankier.com/1/stress-ng>`_,
|
||||
`paxtest <https://github.com/opntr/paxtest-freebsd>`_.
|
||||
* Once we discover and understand the workload needs, we can focus on them
|
||||
to avoid regressions and use it to evaluate safety considerations.
|
||||
|
||||
Methodology
|
||||
===========
|
||||
|
||||
`strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_ is a
|
||||
diagnostic, instructional, and debugging tool and can be used to discover
|
||||
the system resources in use by a workload. Once we discover and understand
|
||||
the workload needs, we can focus on them to avoid regressions and use it
|
||||
to evaluate safety considerations. We use the strace tool to trace workloads.
|
||||
|
||||
This method of tracing using strace tells us the system calls invoked by
|
||||
the workload and doesn't include all the system calls that can be invoked
|
||||
by it. In addition, this tracing method tells us just the code paths within
|
||||
these system calls that are invoked. As an example, if a workload opens a
|
||||
file and reads from it successfully, then the success path is the one that
|
||||
is traced. Any error paths in that system call will not be traced. If there
|
||||
is a run that provides full coverage of a workload, then the method
|
||||
outlined here will trace and find all possible code paths. The completeness
|
||||
of the system usage information depends on the completeness of coverage of a
|
||||
workload.
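The success-path versus error-path point above can be illustrated with a minimal sketch. The helper names (``open_and_read``, ``demo``) and the temporary file names are hypothetical and not part of the original document. Tracing only the first call would exercise the success path of the open system call; the failing path is exercised only because the sketch also opens a missing file on purpose.

```python
import errno
import os
import tempfile


def open_and_read(path):
    """Return (data, None) on success or (None, errno) if open() fails."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # Error path: the kernel rejected the open request.
        return None, exc.errno
    try:
        # Success path: the file was opened and can be read.
        return os.read(fd, 4096), None
    finally:
        os.close(fd)


def demo():
    """Exercise both the success path and one error path of open()."""
    with tempfile.TemporaryDirectory() as tmp:
        present = os.path.join(tmp, "present.txt")
        with open(present, "wb") as f:
            f.write(b"hello")
        ok = open_and_read(present)
        missing = open_and_read(os.path.join(tmp, "missing.txt"))
    return ok, missing


if __name__ == "__main__":
    print(demo())
```

Running this script under strace would show both a successful and a failing open call, whereas a workload that only reads existing files would never reach the failing branch.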
|
||||
|
||||
The goal is tracing a workload on a system running a default kernel without
|
||||
requiring custom kernel installs.
|
||||
|
||||
How do we gather fine-grained system information?
|
||||
=================================================
|
||||
|
||||
The strace tool can be used to trace system calls made by a process and signals
|
||||
it receives. System calls are the fundamental interface between an
|
||||
application and the operating system kernel. They enable a program to
|
||||
request services from the kernel. For instance, the open() system call in
|
||||
Linux is used to provide access to a file in the file system. strace enables
|
||||
us to track all the system calls made by an application. It lists all the
|
||||
system calls made by a process and their resulting output.
|
||||
|
||||
You can generate profiling data combining strace and perf record tools to
|
||||
record the events and information associated with a process. This provides
|
||||
insight into the process. The "perf annotate" tool generates the statistics of
|
||||
each instruction of the program. This document goes over the details of how
|
||||
to gather fine-grained information on a workload's usage of system resources.
|
||||
|
||||
We used strace to trace the perf, stress-ng, paxtest workloads to illustrate
|
||||
our methodology to discover resources used by a workload. This process can
|
||||
be applied to trace other workloads.
|
||||
|
||||
Getting the system ready for tracing
|
||||
====================================
|
||||
|
||||
Before we can get started we will show you how to get your system ready.
|
||||
We assume that you have a Linux distribution running on a physical system
|
||||
or a virtual machine. Most distributions will include the strace command. Let’s
|
||||
install the other tools needed to build the Linux kernel that aren’t usually included.
|
||||
Please note that the following works on Debian based distributions. You
|
||||
might have to find equivalent packages on other Linux distributions.
|
||||
|
||||
Install the tools needed to build the Linux kernel and the tools in the kernel repository.
|
||||
scripts/ver_linux is a good way to check if your system already has
|
||||
the necessary tools::
|
||||
|
||||
sudo apt-get install build-essential flex bison yacc
|
||||
sudo apt install libelf-dev systemtap-sdt-dev libaudit-dev libslang2-dev libperl-dev libdw-dev
|
||||
|
||||
cscope is a good tool to browse kernel sources. Let's install it now::
|
||||
|
||||
sudo apt-get install cscope
|
||||
|
||||
Install stress-ng and paxtest::
|
||||
|
||||
apt-get install stress-ng
|
||||
apt-get install paxtest
|
||||
|
||||
Workload overview
|
||||
=================
|
||||
|
||||
As mentioned earlier, we used strace to trace perf bench, stress-ng and
|
||||
paxtest workloads to show how to analyze a workload and identify Linux
|
||||
subsystems used by these workloads. Let's start with an overview of these
|
||||
three workloads to get a better understanding of what they do and how to
|
||||
use them.
|
||||
|
||||
perf bench (all) workload
|
||||
-------------------------
|
||||
|
||||
The perf bench command contains multiple multi-threaded microkernel
|
||||
benchmarks for executing different subsystems in the Linux kernel and
|
||||
system calls. This allows us to easily measure the impact of changes,
|
||||
which can help mitigate performance regressions. It also acts as a common
|
||||
benchmarking framework, enabling developers to easily create test cases,
|
||||
integrate transparently, and use performance-rich tooling subsystems.
|
||||
|
||||
Stress-ng netdev stressor workload
|
||||
----------------------------------
|
||||
|
||||
stress-ng is used for performing stress testing on the kernel. It allows
|
||||
you to exercise various physical subsystems of the computer, as well as
|
||||
interfaces of the OS kernel, using "stressor-s". They are available for
|
||||
CPU, CPU cache, devices, I/O, interrupts, file system, memory, network,
|
||||
operating system, pipelines, schedulers, and virtual machines. Please refer
|
||||
to the `stress-ng man-page <https://www.mankier.com/1/stress-ng>`_ to
|
||||
find the description of all the available stressor-s. The netdev stressor
|
||||
starts a specified number (N) of workers that exercise various netdevice
|
||||
ioctl commands across all the available network devices.
|
||||
|
||||
paxtest kiddie workload
|
||||
-----------------------
|
||||
|
||||
paxtest is a program that tests buffer overflows in the kernel. It tests
|
||||
kernel enforcements over memory usage. Generally, execution in some memory
|
||||
segments makes buffer overflows possible. It runs a set of programs that
|
||||
attempt to subvert memory usage. It is used as a regression test suite for
|
||||
PaX, but might be useful to test other memory protection patches for the
|
||||
kernel. We used paxtest kiddie mode which looks for simple vulnerabilities.
|
||||
|
||||
What is strace and how do we use it?
|
||||
====================================
|
||||
|
||||
As mentioned earlier, strace is a useful diagnostic, instructional,
|
||||
and debugging tool and can be used to discover the system resources in use
|
||||
by a workload. It can be used:
|
||||
|
||||
* To see how a process interacts with the kernel.
|
||||
* To see why a process is failing or hanging.
|
||||
* For reverse engineering a process.
|
||||
* To find the files on which a program depends.
|
||||
* For analyzing the performance of an application.
|
||||
* For troubleshooting various problems related to the operating system.
|
||||
|
||||
In addition, strace can generate run-time statistics on times, calls, and
|
||||
errors for each system call and report a summary when the program exits,
|
||||
suppressing the regular output. This attempts to show system time (CPU time
|
||||
spent running in the kernel) independent of wall clock time. We plan to use
|
||||
these features to get information on workload system usage.
|
||||
|
||||
The strace command supports basic, verbose, and stats modes. The strace command when
|
||||
run in verbose mode gives more detailed information about the system calls
|
||||
invoked by a process.
|
||||
|
||||
Running strace -c generates a report of the percentage of time spent in each
|
||||
system call, the total time in seconds, the microseconds per call, the total
|
||||
number of calls, the count of each system call that has failed with an error
|
||||
and the type of system call made.
|
||||
|
||||
* Usage: strace <command we want to trace>
|
||||
* Verbose mode usage: strace -v <command>
|
||||
* Gather statistics: strace -c <command>
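The summary printed by ``strace -c`` can be post-processed to extract the per-call counts. The following is a minimal sketch. It assumes the column layout described above (% time, seconds, usecs/call, calls, errors, syscall); the function name ``parse_strace_summary`` is hypothetical, and the ``SAMPLE`` text is made-up illustrative input, not a real measurement.

```python
# Made-up sample in the shape of an "strace -c" summary (not real data).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.000600           6       100           read
 40.00    0.000400          20        20         3 openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.001000                   120         3 total
"""


def parse_strace_summary(text):
    """Return one dict per system call row of an strace -c style summary."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a percentage; header and separator rows do not.
        if len(fields) < 5 or not fields[0].replace(".", "", 1).isdigit():
            continue
        # The trailing "total" row is a summary, not a system call.
        if fields[-1] == "total":
            continue
        rows.append({
            "syscall": fields[-1],
            "calls": int(fields[3]),
            # The errors column is blank when no call failed.
            "errors": int(fields[4]) if len(fields) == 6 else 0,
        })
    return rows


if __name__ == "__main__":
    for row in parse_strace_summary(SAMPLE):
        print(row)
```

In practice the text would come from a saved strace report rather than from an inline string.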
|
||||
|
||||
We used the “-c” option to gather fine-grained run-time statistics in use
|
||||
by the three workloads we have chosen for this analysis.
|
||||
|
||||
* perf
|
||||
* stress-ng
|
||||
* paxtest
|
||||
|
||||
What is cscope and how do we use it?
|
||||
====================================
|
||||
|
||||
Now let’s look at `cscope <https://cscope.sourceforge.net/>`_, a command
|
||||
line tool for browsing C, C++ or Java code-bases. We can use it to find
|
||||
all the references to a symbol, global definitions, functions called by a
|
||||
function, functions calling a function, text strings, regular expression
|
||||
patterns, and files including a file.
|
||||
|
||||
We can use cscope to find which system call belongs to which subsystem.
|
||||
This way we can find the kernel subsystems used by a process when it is
|
||||
executed.
|
||||
|
||||
Let’s check out the latest Linux repository and build the cscope database::
|
||||
|
||||
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux
|
||||
cd linux
|
||||
cscope -R -p10 # builds cscope.out database before starting browse session
|
||||
cscope -d -p10 # starts browse session on cscope.out database
|
||||
|
||||
Note: Run "cscope -R -p10" to build the database and "cscope -d -p10" to
|
||||
enter into the browsing session. cscope by default uses the cscope.out database.
|
||||
To get out of this mode press ctrl+d. The -p option is used to specify the
|
||||
number of file path components to display. -p10 is optimal for browsing
|
||||
kernel sources.
|
||||
|
||||
What is perf and how do we use it?
|
||||
==================================
|
||||
|
||||
Perf is an analysis tool based on Linux 2.6+ systems, which abstracts the
|
||||
CPU hardware difference in performance measurement in Linux, and provides
|
||||
a simple command line interface. Perf is based on the perf_events interface
|
||||
exported by the kernel. It is very useful for profiling the system and
|
||||
finding performance bottlenecks in an application.
|
||||
|
||||
If you haven't already checked out the Linux mainline repository, you can do
|
||||
so and then build the kernel and the perf tool::
|
||||
|
||||
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux
|
||||
cd linux
|
||||
make -j3 all
|
||||
cd tools/perf
|
||||
make
|
||||
|
||||
Note: The perf command can be built without building the kernel in the
|
||||
repository and can be run on older kernels. However matching the kernel
|
||||
and perf revisions gives more accurate information on the subsystem usage.
|
||||
|
||||
We used "perf stat" and "perf bench" options. For detailed information on
|
||||
the perf tool, run "perf -h".
|
||||
|
||||
perf stat
|
||||
---------
|
||||
The perf stat command generates a report of various hardware and software
|
||||
events. It does so with the help of hardware counter registers found in
|
||||
modern CPUs that keep the count of these activities. "perf stat cal" shows
|
||||
stats for the cal command.
|
||||
|
||||
Perf bench
|
||||
----------
|
||||
The perf bench command contains multiple multi-threaded microkernel
|
||||
benchmarks for executing different subsystems in the Linux kernel and
|
||||
system calls. This allows us to easily measure the impact of changes,
|
||||
which can help mitigate performance regressions. It also acts as a common
|
||||
benchmarking framework, enabling developers to easily create test cases,
|
||||
integrate transparently, and use performance-rich tooling.
|
||||
|
||||
The "perf bench all" command runs the following benchmarks:
|
||||
|
||||
* sched/messaging
|
||||
* sched/pipe
|
||||
* syscall/basic
|
||||
* mem/memcpy
|
||||
* mem/memset
|
||||
|
||||
What is stress-ng and how do we use it?
|
||||
=======================================
|
||||
|
||||
As mentioned earlier, stress-ng is used for performing stress testing on
|
||||
the kernel. It allows you to exercise various physical subsystems of the
|
||||
computer, as well as interfaces of the OS kernel, using stressor-s. They
|
||||
are available for CPU, CPU cache, devices, I/O, interrupts, file system,
|
||||
memory, network, operating system, pipelines, schedulers, and virtual
|
||||
machines.
|
||||
|
||||
The netdev stressor starts N workers that exercise various netdevice ioctl
|
||||
commands across all the available network devices. The following ioctls are
|
||||
exercised:
|
||||
|
||||
* SIOCGIFCONF, SIOCGIFINDEX, SIOCGIFNAME, SIOCGIFFLAGS
|
||||
* SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFMETRIC, SIOCGIFMTU
|
||||
* SIOCGIFHWADDR, SIOCGIFMAP, SIOCGIFTXQLEN
|
||||
|
||||
The following command runs the stressor::
|
||||
|
||||
stress-ng --netdev 1 -t 60 --metrics
|
||||
|
||||
We can use the perf record command to record the events and information
|
||||
associated with a process. This command records the profiling data in the
|
||||
perf.data file in the same directory.
|
||||
|
||||
Using the following commands you can record the events associated with the
|
||||
netdev stressor, view the generated report perf.data and annotate it to
|
||||
view the statistics of each instruction of the program::
|
||||
|
||||
perf record stress-ng --netdev 1 -t 60 --metrics
|
||||
perf report
|
||||
perf annotate
|
||||
|
||||
What is paxtest and how do we use it?
|
||||
=====================================
|
||||
|
||||
paxtest is a program that tests buffer overflows in the kernel. It tests
|
||||
kernel enforcements over memory usage. Generally, execution in some memory
|
||||
segments makes buffer overflows possible. It runs a set of programs that
|
||||
attempt to subvert memory usage. It is used as a regression test suite for
|
||||
PaX, and will be useful to test other memory protection patches for the
|
||||
kernel.
|
||||
|
||||
paxtest provides kiddie and blackhat modes. The paxtest kiddie mode runs
|
||||
in normal mode, whereas the blackhat mode tries to get around the protection
|
||||
of the kernel testing for vulnerabilities. We focus on the kiddie mode here
|
||||
and combine "paxtest kiddie" run with "perf record" to collect CPU stack
|
||||
traces for the paxtest kiddie run to see which function is calling other
|
||||
functions in the performance profile. Then the "dwarf" (DWARF's Call Frame
|
||||
Information) mode can be used to unwind the stack.
|
||||
|
||||
The following commands can be used to view the resulting report in call-graph
|
||||
format::
|
||||
|
||||
perf record --call-graph dwarf paxtest kiddie
|
||||
perf report --stdio
|
||||
|
||||
Tracing workloads
|
||||
=================
|
||||
|
||||
Now that we understand the workloads, let's start tracing them.
|
||||
|
||||
Tracing perf bench all workload
|
||||
-------------------------------
|
||||
|
||||
Run the following command to trace the perf bench all workload::
|
||||
|
||||
strace -c perf bench all
|
||||
|
||||
**System Calls made by the workload**
|
||||
|
||||
The below table shows the system calls invoked by the workload, number of
|
||||
times each system call is invoked, and the corresponding Linux subsystem.
|
||||
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| System Call | # calls | Linux Subsystem | System Call (API) |
|
||||
+===================+===========+=================+=========================+
|
||||
| getppid | 10000001 | Process Mgmt. | sys_getppid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| clone | 1077 | Process Mgmt. | sys_clone() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| prctl | 23 | Process Mgmt. | sys_prctl() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| prlimit64 | 7 | Process Mgmt. | sys_prlimit64() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getpid | 10 | Process Mgmt. | sys_getpid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| uname | 3 | Process Mgmt. | sys_uname() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| sysinfo | 1 | Process Mgmt. | sys_sysinfo() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getuid | 1 | Process Mgmt. | sys_getuid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getgid | 1 | Process Mgmt. | sys_getgid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| geteuid | 1 | Process Mgmt. | sys_geteuid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getegid | 1 | Process Mgmt. | sys_getegid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| close | 49951 | Filesystem | sys_close() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| pipe | 604 | Filesystem | sys_pipe() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| openat | 48560 | Filesystem | sys_openat() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| fstat | 8338 | Filesystem | sys_fstat() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| stat | 1573 | Filesystem | sys_stat() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| pread64 | 9646 | Filesystem | sys_pread64() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getdents64 | 1873 | Filesystem | sys_getdents64() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| access | 3 | Filesystem | sys_access() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| lstat | 1880 | Filesystem | sys_lstat() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| lseek | 6 | Filesystem | sys_lseek() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| ioctl | 3 | Filesystem | sys_ioctl() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| dup2 | 1 | Filesystem | sys_dup2() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| execve | 2 | Filesystem | sys_execve() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| fcntl | 8779 | Filesystem | sys_fcntl() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| statfs | 1 | Filesystem | sys_statfs() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| epoll_create | 2 | Filesystem | sys_epoll_create() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| epoll_ctl | 64 | Filesystem | sys_epoll_ctl() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| newfstatat | 8318 | Filesystem | sys_newfstatat() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| eventfd2 | 192 | Filesystem | sys_eventfd2() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| mmap | 243 | Memory Mgmt. | sys_mmap() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| mprotect | 32 | Memory Mgmt. | sys_mprotect() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| brk | 21 | Memory Mgmt. | sys_brk() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| munmap | 128 | Memory Mgmt. | sys_munmap() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| set_mempolicy | 156 | Memory Mgmt. | sys_set_mempolicy() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| set_tid_address | 1 | Process Mgmt. | sys_set_tid_address() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| set_robust_list | 1 | Futex | sys_set_robust_list() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| futex | 341 | Futex | sys_futex() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| sched_getaffinity | 79 | Scheduler | sys_sched_getaffinity() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| sched_setaffinity | 223 | Scheduler | sys_sched_setaffinity() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| socketpair | 202 | Network | sys_socketpair() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| rt_sigprocmask | 21 | Signal | sys_rt_sigprocmask() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| rt_sigaction | 36 | Signal | sys_rt_sigaction() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| rt_sigreturn | 2 | Signal | sys_rt_sigreturn() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| wait4 | 889 | Time | sys_wait4() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| clock_nanosleep | 37 | Time | sys_clock_nanosleep() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| capget | 4 | Capability | sys_capget() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
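To see which subsystems dominate a trace, the per-call counts can be summed per subsystem. The sketch below copies a few rows (system call, number of calls, subsystem) from the table above. The ``ROWS`` list covers only those rows, and the function name ``calls_per_subsystem`` is hypothetical.

```python
from collections import defaultdict

# (system call, number of calls, subsystem), copied from a subset of the
# perf bench table above; the remaining rows are omitted for brevity.
ROWS = [
    ("mmap", 243, "Memory Mgmt."),
    ("mprotect", 32, "Memory Mgmt."),
    ("brk", 21, "Memory Mgmt."),
    ("munmap", 128, "Memory Mgmt."),
    ("set_mempolicy", 156, "Memory Mgmt."),
    ("set_robust_list", 1, "Futex"),
    ("futex", 341, "Futex"),
    ("sched_getaffinity", 79, "Scheduler"),
    ("sched_setaffinity", 223, "Scheduler"),
]


def calls_per_subsystem(rows):
    """Sum the number of calls for each subsystem."""
    totals = defaultdict(int)
    for _syscall, calls, subsystem in rows:
        totals[subsystem] += calls
    return dict(totals)


if __name__ == "__main__":
    for subsystem, calls in sorted(calls_per_subsystem(ROWS).items()):
        print(f"{subsystem}: {calls}")
```

Extending ``ROWS`` with the remaining table entries gives the totals for the whole workload.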
|
||||
|
||||
Tracing stress-ng netdev stressor workload
|
||||
------------------------------------------
|
||||
|
||||
Run the following command to trace the stress-ng netdev stressor workload::
|
||||
|
||||
strace -c stress-ng --netdev 1 -t 60 --metrics
|
||||
|
||||
**System Calls made by the workload**
|
||||
|
||||
The below table shows the system calls invoked by the workload, number of
|
||||
times each system call is invoked, and the corresponding Linux subsystem.
|
||||
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| System Call | # calls | Linux Subsystem | System Call (API) |
|
||||
+===================+===========+=================+=========================+
|
||||
| openat | 74 | Filesystem | sys_openat() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| close | 75 | Filesystem | sys_close() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| read | 58 | Filesystem | sys_read() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| fstat | 20 | Filesystem | sys_fstat() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| flock | 10 | Filesystem | sys_flock() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| write | 7 | Filesystem | sys_write() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getdents64 | 8 | Filesystem | sys_getdents64() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| pread64 | 8 | Filesystem | sys_pread64() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| lseek | 1 | Filesystem | sys_lseek() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| access | 2 | Filesystem | sys_access() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getcwd | 1 | Filesystem | sys_getcwd() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| execve | 1 | Filesystem | sys_execve() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| mmap | 61 | Memory Mgmt. | sys_mmap() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| munmap | 3 | Memory Mgmt. | sys_munmap() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| mprotect | 20 | Memory Mgmt. | sys_mprotect() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| mlock | 2 | Memory Mgmt. | sys_mlock() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| brk | 3 | Memory Mgmt. | sys_brk() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| rt_sigaction | 21 | Signal | sys_rt_sigaction() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| rt_sigprocmask | 1 | Signal | sys_rt_sigprocmask() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| sigaltstack | 1 | Signal | sys_sigaltstack() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| rt_sigreturn | 1 | Signal | sys_rt_sigreturn() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getpid | 8 | Process Mgmt. | sys_getpid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| prlimit64 | 5 | Process Mgmt. | sys_prlimit64() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| arch_prctl | 2 | Process Mgmt. | sys_arch_prctl() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| sysinfo | 2 | Process Mgmt. | sys_sysinfo() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getuid | 2 | Process Mgmt. | sys_getuid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| uname | 1 | Process Mgmt. | sys_uname() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| setpgid | 1 | Process Mgmt. | sys_setpgid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getrusage | 1 | Process Mgmt. | sys_getrusage() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| geteuid | 1 | Process Mgmt. | sys_geteuid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| getppid | 1 | Process Mgmt. | sys_getppid() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| sendto | 3 | Network | sys_sendto() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| connect | 1 | Network | sys_connect() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| socket | 1 | Network | sys_socket() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| clone | 1 | Process Mgmt. | sys_clone() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| set_tid_address | 1 | Process Mgmt. | sys_set_tid_address() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| wait4 | 2 | Time | sys_wait4() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| alarm | 1 | Time | sys_alarm() |
|
||||
+-------------------+-----------+-----------------+-------------------------+
|
||||
| set_robust_list | 1 | Futex | sys_set_robust_list() |
|
||||
+-------------------+-----------+-----------------+-------------------------+

Tracing paxtest kiddie workload
-------------------------------

Run the following command to trace paxtest kiddie workload::

  strace -c paxtest kiddie

**System Calls made by the workload**

The below table shows the system calls invoked by the workload, number of
times each system call is invoked, and the corresponding Linux subsystem.

+-------------------+-----------+-----------------+----------------------+
| System Call       | # calls   | Linux Subsystem | System Call (API)    |
+===================+===========+=================+======================+
| read              | 3         | Filesystem      | sys_read()           |
+-------------------+-----------+-----------------+----------------------+
| write             | 11        | Filesystem      | sys_write()          |
+-------------------+-----------+-----------------+----------------------+
| close             | 41        | Filesystem      | sys_close()          |
+-------------------+-----------+-----------------+----------------------+
| stat              | 24        | Filesystem      | sys_stat()           |
+-------------------+-----------+-----------------+----------------------+
| fstat             | 2         | Filesystem      | sys_fstat()          |
+-------------------+-----------+-----------------+----------------------+
| pread64           | 6         | Filesystem      | sys_pread64()        |
+-------------------+-----------+-----------------+----------------------+
| access            | 1         | Filesystem      | sys_access()         |
+-------------------+-----------+-----------------+----------------------+
| pipe              | 1         | Filesystem      | sys_pipe()           |
+-------------------+-----------+-----------------+----------------------+
| dup2              | 24        | Filesystem      | sys_dup2()           |
+-------------------+-----------+-----------------+----------------------+
| execve            | 1         | Filesystem      | sys_execve()         |
+-------------------+-----------+-----------------+----------------------+
| fcntl             | 26        | Filesystem      | sys_fcntl()          |
+-------------------+-----------+-----------------+----------------------+
| openat            | 14        | Filesystem      | sys_openat()         |
+-------------------+-----------+-----------------+----------------------+
| rt_sigaction      | 7         | Signal          | sys_rt_sigaction()   |
+-------------------+-----------+-----------------+----------------------+
| rt_sigreturn      | 38        | Signal          | sys_rt_sigreturn()   |
+-------------------+-----------+-----------------+----------------------+
| clone             | 38        | Process Mgmt.   | sys_clone()          |
+-------------------+-----------+-----------------+----------------------+
| wait4             | 44        | Time            | sys_wait4()          |
+-------------------+-----------+-----------------+----------------------+
| mmap              | 7         | Memory Mgmt.    | sys_mmap()           |
+-------------------+-----------+-----------------+----------------------+
| mprotect          | 3         | Memory Mgmt.    | sys_mprotect()       |
+-------------------+-----------+-----------------+----------------------+
| munmap            | 1         | Memory Mgmt.    | sys_munmap()         |
+-------------------+-----------+-----------------+----------------------+
| brk               | 3         | Memory Mgmt.    | sys_brk()            |
+-------------------+-----------+-----------------+----------------------+
| getpid            | 1         | Process Mgmt.   | sys_getpid()         |
+-------------------+-----------+-----------------+----------------------+
| getuid            | 1         | Process Mgmt.   | sys_getuid()         |
+-------------------+-----------+-----------------+----------------------+
| getgid            | 1         | Process Mgmt.   | sys_getgid()         |
+-------------------+-----------+-----------------+----------------------+
| geteuid           | 2         | Process Mgmt.   | sys_geteuid()        |
+-------------------+-----------+-----------------+----------------------+
| getegid           | 1         | Process Mgmt.   | sys_getegid()        |
+-------------------+-----------+-----------------+----------------------+
| getppid           | 1         | Process Mgmt.   | sys_getppid()        |
+-------------------+-----------+-----------------+----------------------+
| arch_prctl        | 2         | Process Mgmt.   | sys_arch_prctl()     |
+-------------------+-----------+-----------------+----------------------+
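
Tables like the ones above can be derived from the summary that ``strace -c``
prints when the traced command exits. The following is a minimal sketch, not
part of the original document, of how that summary might be reduced to
per-syscall call counts and matched to a subsystem. The ``SAMPLE`` text is a
hypothetical summary: its call counts are borrowed from three rows of the
paxtest table, while the time columns are invented. The ``SUBSYSTEM`` mapping
and both function names are illustrative assumptions as well.

```python
# Hypothetical mapping, copied from a few rows of the tables above.
SUBSYSTEM = {
    "read": "Filesystem",
    "write": "Filesystem",
    "close": "Filesystem",
    "access": "Filesystem",
    "mmap": "Memory Mgmt.",
    "rt_sigaction": "Signal",
    "clone": "Process Mgmt.",
    "wait4": "Time",
}

# Invented sample of "strace -c" summary output (timings are made up).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.000120           2        41           close
 30.00    0.000060           5        11           write
 10.00    0.000020          20         1         1 access
------ ----------- ----------- --------- --------- ----------------
100.00    0.000200                    53         1 total
"""


def parse_strace_summary(text):
    """Return {syscall: calls} from the summary printed by strace -c."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the "errors" column is filled.
        if len(fields) not in (5, 6):
            continue
        try:
            float(fields[0])        # "% time" column
            calls = int(fields[3])  # "calls" column
        except ValueError:
            continue                # header or separator line
        name = fields[-1]
        if name != "total":         # skip the totals row
            counts[name] = calls
    return counts


def summarize(text):
    """Return (syscall, calls, subsystem) tuples, most frequent first."""
    counts = parse_strace_summary(text)
    rows = [(name, n, SUBSYSTEM.get(name, "unknown")) for name, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, calls, subsystem in summarize(SAMPLE):
        print(f"{name:<18} {calls:<10} {subsystem}")
```

Since ``strace`` writes the summary to standard error by default, its ``-o``
option can be used to capture it in a file, which can then be passed to
``parse_strace_summary()`` in place of ``SAMPLE``.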

Conclusion
==========

This document is intended to be used as a guide on how to gather fine-grained
information on the resources in use by workloads using strace.

References
==========

* `Discovery Linux Kernel Subsystems used by OpenAPS <https://elisa.tech/blog/2022/02/02/discovery-linux-kernel-subsystems-used-by-openaps>`_
* `ELISA-White-Papers-Discovering Linux kernel subsystems used by a workload <https://github.com/elisa-tech/ELISA-White-Papers/blob/master/Processes/Discovering_Linux_kernel_subsystems_used_by_a_workload.md>`_
* `strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_
* `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_
* `paxtest README <https://github.com/opntr/paxtest-freebsd/blob/hardenedbsd/0.9.14-hbsd/README>`_
* `stress-ng <https://www.mankier.com/1/stress-ng>`_
* `Monitoring and managing system status and performance <https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/index>`_
|
@ -156,7 +156,7 @@ else:
|
||||
math_renderer = 'mathjax'
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ['_templates']
|
||||
templates_path = ['sphinx/templates']
|
||||
|
||||
# The suffix(es) of source filenames.
|
||||
# You can specify multiple suffix as a list of string:
|
||||
@ -331,6 +331,7 @@ if html_theme == 'alabaster':
|
||||
'description': get_cline_version(),
|
||||
'page_width': '65em',
|
||||
'sidebar_width': '15em',
|
||||
'fixed_sidebar': 'true',
|
||||
'font_size': 'inherit',
|
||||
'font_family': 'serif',
|
||||
}
|
||||
@ -348,7 +349,7 @@ html_use_smartypants = False
|
||||
|
||||
# Custom sidebar templates, maps document names to template names.
|
||||
# Note that the RTD theme ignores this
|
||||
html_sidebars = { '**': ['searchbox.html', 'localtoc.html', 'sourcelink.html']}
|
||||
html_sidebars = { '**': ['searchbox.html', 'kernel-toc.html', 'sourcelink.html']}
|
||||
|
||||
# about.html is available for alabaster theme. Add it at the front.
|
||||
if html_theme == 'alabaster':
|
||||
|
@ -42,7 +42,7 @@ padata_shells associated with it, each allowing a separate series of jobs.
|
||||
Modifying cpumasks
|
||||
------------------
|
||||
|
||||
The CPUs used to run jobs can be changed in two ways, programatically with
|
||||
The CPUs used to run jobs can be changed in two ways, programmatically with
|
||||
padata_set_cpumask() or via sysfs. The former is defined::
|
||||
|
||||
int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
|
||||
|
@ -370,8 +370,8 @@ of possible problems:
|
||||
|
||||
The first one can be tracked using tracing: ::
|
||||
|
||||
$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
|
||||
$ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
|
||||
$ echo workqueue:workqueue_queue_work > /sys/kernel/tracing/set_event
|
||||
$ cat /sys/kernel/tracing/trace_pipe > out.txt
|
||||
(wait a few secs)
|
||||
^C
|
||||
|
||||
|
@ -1,8 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============================================================================
|
||||
Linux CPUFreq - CPU frequency and voltage scaling code in the Linux(TM) kernel
|
||||
==============================================================================
|
||||
========================================================================
|
||||
CPUFreq - CPU frequency and voltage scaling code in the Linux(TM) kernel
|
||||
========================================================================
|
||||
|
||||
Author: Dominik Brodowski <linux@brodo.de>
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
=======================
|
||||
Linux Kernel Crypto API
|
||||
=======================
|
||||
==========
|
||||
Crypto API
|
||||
==========
|
||||
|
||||
:Author: Stephan Mueller
|
||||
:Author: Marek Vasut
|
||||
|
@ -219,7 +219,7 @@ instance::
|
||||
cat cocci.err
|
||||
|
||||
You can use SPFLAGS to add debugging flags; for instance you may want to
|
||||
add both --profile --show-trying to SPFLAGS when debugging. For example
|
||||
add both ``--profile --show-trying`` to SPFLAGS when debugging. For example
|
||||
you may want to use::
|
||||
|
||||
rm -f err.log
|
||||
@ -248,7 +248,7 @@ variables for .cocciconfig is as follows:
|
||||
|
||||
- Your current user's home directory is processed first
|
||||
- Your directory from which spatch is called is processed next
|
||||
- The directory provided with the --dir option is processed last, if used
|
||||
- The directory provided with the ``--dir`` option is processed last, if used
|
||||
|
||||
Since coccicheck runs through make, it naturally runs from the kernel
|
||||
proper dir; as such the second rule above would be implied for picking up a
|
||||
@ -265,8 +265,8 @@ The kernel coccicheck script has::
|
||||
fi
|
||||
|
||||
KBUILD_EXTMOD is set when an explicit target with M= is used. For both cases
|
||||
the spatch --dir argument is used, as such third rule applies when whether M=
|
||||
is used or not, and when M= is used the target directory can have its own
|
||||
the spatch ``--dir`` argument is used, as such third rule applies when whether
|
||||
M= is used or not, and when M= is used the target directory can have its own
|
||||
.cocciconfig file. When M= is not passed as an argument to coccicheck the
|
||||
target directory is the same as the directory from where spatch was called.
|
||||
|
||||
|
@ -39,6 +39,10 @@ Setup
|
||||
this mode. In this case, you should build the kernel with
|
||||
CONFIG_RANDOMIZE_BASE disabled if the architecture supports KASLR.
|
||||
|
||||
- Build the gdb scripts (required on kernels v5.1 and above)::
|
||||
|
||||
make scripts_gdb
|
||||
|
||||
- Enable the gdb stub of QEMU/KVM, either
|
||||
|
||||
- at VM startup time by appending "-s" to the QEMU command line
|
||||
|
@ -1,5 +1,5 @@
|
||||
Buffer Sharing and Synchronization
|
||||
==================================
|
||||
Buffer Sharing and Synchronization (dma-buf)
|
||||
============================================
|
||||
|
||||
The dma-buf subsystem provides the framework for sharing buffers for
|
||||
hardware (DMA) access across multiple device drivers and subsystems, and
|
||||
@ -264,7 +264,7 @@ through memory management dependencies which userspace is unaware of, which
|
||||
randomly hangs workloads until the timeout kicks in. Workloads, which from
|
||||
userspace's perspective, do not contain a deadlock. In such a mixed fencing
|
||||
architecture there is no single entity with knowledge of all dependencies.
|
||||
Thefore preventing such deadlocks from within the kernel is not possible.
|
||||
Therefore preventing such deadlocks from within the kernel is not possible.
|
||||
|
||||
The only solution to avoid dependencies loops is by not allowing indefinite
|
||||
fences in the kernel. This means:
|
||||
|
@ -175,7 +175,7 @@ The details of these operations are:
|
||||
driver can ask for the pointer, maximum size and the currently used size of
|
||||
the metadata and can directly update or read it.
|
||||
|
||||
Becasue the DMA driver manages the memory area containing the metadata,
|
||||
Because the DMA driver manages the memory area containing the metadata,
|
||||
clients must make sure that they do not try to access or get the pointer
|
||||
after their transfer completion callback has run for the descriptor.
|
||||
If no completion callback has been defined for the transfer, then the
|
||||
|
@ -89,7 +89,7 @@ The following command returns the state of the test. ::
|
||||
|
||||
% cat /sys/module/dmatest/parameters/run
|
||||
|
||||
To wait for test completion userpace can poll 'run' until it is false, or use
|
||||
To wait for test completion userspace can poll 'run' until it is false, or use
|
||||
the wait parameter. Specifying 'wait=1' when loading the module causes module
|
||||
initialization to pause until a test run has completed, while reading
|
||||
/sys/module/dmatest/parameters/wait waits for any running test to complete
|
||||
|
@ -4,7 +4,7 @@ High Speed Synchronous Serial Interface (HSI)
|
||||
Introduction
|
||||
---------------
|
||||
|
||||
High Speed Syncronous Interface (HSI) is a fullduplex, low latency protocol,
|
||||
High Speed Synchronous Interface (HSI) is a full duplex, low latency protocol,
|
||||
that is optimized for die-level interconnect between an Application Processor
|
||||
and a Baseband chipset. It has been specified by the MIPI alliance in 2003 and
|
||||
implemented by multiple vendors since then.
|
||||
@ -52,7 +52,7 @@ hsi-char Device
|
||||
------------------
|
||||
|
||||
Each port automatically registers a generic client driver called hsi_char,
|
||||
which provides a charecter device for userspace representing the HSI port.
|
||||
which provides a character device for userspace representing the HSI port.
|
||||
It can be used to communicate via HSI from userspace. Userspace may
|
||||
configure the hsi_char device using the following ioctl commands:
|
||||
|
||||
|
@ -1,6 +1,8 @@
|
||||
========================================
|
||||
The Linux driver implementer's API guide
|
||||
========================================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============================
|
||||
Driver implementer's API guide
|
||||
==============================
|
||||
|
||||
The kernel offers a wide variety of interfaces to support the development
|
||||
of device drivers. This document is an only somewhat organized collection
|
||||
|
@ -44,7 +44,7 @@ This _wc variant returns a write-combining map to the page and may only be
|
||||
used with mappings created by io_mapping_create_wc()
|
||||
|
||||
Temporary mappings are only valid in the context of the caller. The mapping
|
||||
is not guaranteed to be globaly visible.
|
||||
is not guaranteed to be globally visible.
|
||||
|
||||
io_mapping_map_local_wc() has a side effect on X86 32bit as it disables
|
||||
migration to make the mapping code work. No caller can rely on this side
|
||||
@ -78,7 +78,7 @@ variant, although this may be significantly slower::
|
||||
unsigned long offset)
|
||||
|
||||
This works like io_mapping_map_atomic/local_wc() except it has no side
|
||||
effects and the pointer is globaly visible.
|
||||
effects and the pointer is globally visible.
|
||||
|
||||
The mappings are released with::
|
||||
|
||||
|
@ -65,7 +65,7 @@ There are three groups of locks for managing the device:
|
||||
2.3 new-device management
|
||||
-------------------------
|
||||
|
||||
A single lock: "no-new-dev" is used to co-ordinate the addition of
|
||||
A single lock: "no-new-dev" is used to coordinate the addition of
|
||||
new devices - this must be synchronized across the array.
|
||||
Normally all nodes hold a concurrent-read lock on this device.
|
||||
|
||||
|
@ -81,7 +81,7 @@ The write-through and write-back cache use the same disk format. The cache disk
|
||||
is organized as a simple write log. The log consists of 'meta data' and 'data'
|
||||
pairs. The meta data describes the data. It also includes checksum and sequence
|
||||
ID for recovery identification. Data can be IO data and parity data. Data is
|
||||
checksumed too. The checksum is stored in the meta data ahead of the data. The
|
||||
checksummed too. The checksum is stored in the meta data ahead of the data. The
|
||||
checksum is an optimization because MD can write meta and data freely without
|
||||
worry about the order. MD superblock has a field pointed to the valid meta data
|
||||
of log head.
|
||||
|
@ -28,7 +28,7 @@ Currently, it consists of:
|
||||
takes parameters at initialization that will dictate how the simulation
|
||||
behaves.
|
||||
|
||||
- Code reponsible for encoding a valid MPEG Transport Stream, which is then
|
||||
- Code responsible for encoding a valid MPEG Transport Stream, which is then
|
||||
passed to the bridge driver. This fake stream contains some hardcoded content.
|
||||
For now, we have a single, audio-only channel containing a single MPEG
|
||||
Elementary Stream, which in turn contains a SMPTE 302m encoded sine-wave.
|
||||
|
@ -24,7 +24,7 @@ unless this is fixed in the HW platform.
|
||||
|
||||
The demux kABI only controls front-ends regarding to their connections with
|
||||
demuxes; the kABI used to set the other front-end parameters, such as
|
||||
tuning, are devined via the Digital TV Frontend kABI.
|
||||
tuning, are defined via the Digital TV Frontend kABI.
|
||||
|
||||
The functions that implement the abstract interface demux should be defined
|
||||
static or module private and registered to the Demux core for external
|
||||
|
@ -321,7 +321,7 @@ response to video node operations. This hides the complexity of the underlying
|
||||
hardware from applications. For complex devices, finer-grained control of the
|
||||
device than what the video nodes offer may be required. In those cases, bridge
|
||||
drivers that implement :ref:`the media controller API <media_controller>` may
|
||||
opt for making the subdevice operations directly accessible from userpace.
|
||||
opt for making the subdevice operations directly accessible from userspace.
|
||||
|
||||
Device nodes named ``v4l-subdev``\ *X* can be created in ``/dev`` to access
|
||||
sub-devices directly. If a sub-device supports direct userspace configuration
|
||||
@ -574,7 +574,7 @@ issues with subdevice drivers that let the V4L2 core manage the active state,
|
||||
as they expect to receive the appropriate state as a parameter. To help the
|
||||
conversion of subdevice drivers to a managed active state without having to
|
||||
convert all callers at the same time, an additional wrapper layer has been
|
||||
added to v4l2_subdev_call(), which handles the NULL case by geting and locking
|
||||
added to v4l2_subdev_call(), which handles the NULL case by getting and locking
|
||||
the callee's active state with :c:func:`v4l2_subdev_lock_and_get_active_state()`,
|
||||
and unlocking the state after the call.
|
||||
|
||||
|
@ -3,7 +3,7 @@
|
||||
MEI NFC
|
||||
-------
|
||||
|
||||
Some Intel 8 and 9 Serieses chipsets supports NFC devices connected behind
|
||||
Some Intel 8 and 9 Series chipsets support NFC devices connected behind
|
||||
the Intel Management Engine controller.
|
||||
MEI client bus exposes the NFC chips as NFC phy devices and enables
|
||||
binding with Microread and NXP PN544 NFC device driver from the Linux NFC
|
||||
|
@ -150,7 +150,7 @@ LLC
|
||||
|
||||
Communication between the CPU and the chip often requires some link layer
|
||||
protocol. Those are isolated as modules managed by the HCI layer. There are
|
||||
currently two modules : nop (raw transfert) and shdlc.
|
||||
currently two modules : nop (raw transfer) and shdlc.
|
||||
A new llc must implement the following functions::
|
||||
|
||||
struct nfc_llc_ops {
|
||||
|
@ -82,7 +82,7 @@ LABEL:
|
||||
Metadata stored on a DIMM device that partitions and identifies
|
||||
(persistently names) capacity allocated to different PMEM namespaces. It
|
||||
also indicates whether an address abstraction like a BTT is applied to
|
||||
the namepsace. Note that traditional partition tables, GPT/MBR, are
|
||||
the namespace. Note that traditional partition tables, GPT/MBR, are
|
||||
layered on top of a PMEM namespace, or an address abstraction like BTT
|
||||
if present, but partition support is deprecated going forward.
|
||||
|
||||
|
@ -83,7 +83,7 @@ passed in.
|
||||
6. Freeze
|
||||
---------
|
||||
The freeze operation does not require any keys. The security config can be
|
||||
frozen by a user with root privelege.
|
||||
frozen by a user with root privilege.
|
||||
|
||||
7. Disable
|
||||
----------
|
||||
|
@ -836,7 +836,7 @@ hardware and shall be put into different subsystems:
|
||||
|
||||
Depending on the exact HW register design, some functions exposed by the
|
||||
GPIO subsystem may call into the pinctrl subsystem in order to
|
||||
co-ordinate register settings across HW modules. In particular, this may
|
||||
coordinate register settings across HW modules. In particular, this may
|
||||
be needed for HW with separate GPIO and pin controller HW modules, where
|
||||
e.g. GPIO direction is determined by a register in the pin controller HW
|
||||
module rather than the GPIO HW module.
|
||||
|
@ -20,7 +20,7 @@ Overview of the ``pldmfw`` library
|
||||
|
||||
The ``pldmfw`` library is intended to be used by device drivers for
|
||||
implementing device flash update based on firmware files following the PLDM
|
||||
firwmare file format.
|
||||
firmware file format.
|
||||
|
||||
It is implemented using an ops table that allows device drivers to provide
|
||||
the underlying device specific functionality.
|
||||
|
@ -24,7 +24,7 @@ console support.
|
||||
Console Support
|
||||
---------------
|
||||
|
||||
The serial core provides a few helper functions. This includes identifing
|
||||
The serial core provides a few helper functions. This includes identifying
|
||||
the correct port structure (via uart_get_console()) and decoding command line
|
||||
arguments (uart_parse_options()).
|
||||
|
||||
|
@ -77,7 +77,7 @@ after the frame structure and before the payload. The payload is followed by
|
||||
its own CRC (over all payload bytes). If the payload is not present (i.e.
|
||||
the frame has ``LEN=0``), the CRC of the payload is still present and will
|
||||
evaluate to ``0xffff``. The |LEN| field does not include any of the CRCs, it
|
||||
equals the number of bytes inbetween the CRC of the frame and the CRC of the
|
||||
equals the number of bytes between the CRC of the frame and the CRC of the
|
||||
payload.
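
The ``0xffff`` result for an empty payload is what a CRC-16 with an initial
value of ``0xffff`` and no final XOR yields, because processing zero bytes
leaves the initial value unchanged. The sketch below illustrates this under
the assumption that the CRC uses the CRC-16/CCITT-FALSE parameters (polynomial
``0x1021``, initial value ``0xffff``, no bit reflection). That parameter set
is not stated in the text above, and the function name is illustrative.

```python
def crc16_ccitt_false(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, MSB first, no final XOR.

    Assumed parameters (CRC-16/CCITT-FALSE); not taken from the text above.
    """
    for byte in data:
        crc ^= byte << 8                 # feed the next byte into the top bits
        for _ in range(8):
            if crc & 0x8000:             # top bit set: shift and apply polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    # A frame with LEN=0 has no payload bytes, so the CRC stays at its seed.
    print(hex(crc16_ccitt_false(b"")))
```

With no payload bytes the loop body never runs, so the function returns its
seed value unchanged.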
|
||||
|
||||
Additionally, the following fixed two-byte sequences are used:
|
||||
|
@ -18,7 +18,7 @@ controller which can be configured in one of 4 ways:
|
||||
4. Hub configuration
|
||||
|
||||
Linux currently supports several versions of this controller. In all
|
||||
likelyhood, the version in your SoC is already supported. At the time
|
||||
likelihood, the version in your SoC is already supported. At the time
|
||||
of this writing, known tested versions range from 2.02a to 3.10a. As a
|
||||
rule of thumb, anything above 2.02a should work reliably well.
|
||||
|
||||
|
@ -48,7 +48,7 @@ kernel boot parameter::
|
||||
"earlyprintk=xdbc"
|
||||
|
||||
If there are multiple xHCI controllers in your system, you can
|
||||
append a host contoller index to this kernel parameter. This
|
||||
append a host controller index to this kernel parameter. This
|
||||
index starts from 0.
|
||||
|
||||
Current design doesn't support DbC runtime suspend/resume. As
|
||||
|
@ -1284,6 +1284,7 @@ support this. Table 1-9 lists the files and their meaning.
|
||||
rt_cache Routing cache
|
||||
snmp SNMP data
|
||||
sockstat Socket statistics
|
||||
softnet_stat Per-CPU incoming packets queues statistics of online CPUs
|
||||
tcp TCP sockets
|
||||
udp UDP sockets
|
||||
unix UNIX domain sockets
|
||||
|
@ -1,6 +1,6 @@
|
||||
==================================
|
||||
Linux GPU Driver Developer's Guide
|
||||
==================================
|
||||
============================
|
||||
GPU Driver Developer's Guide
|
||||
============================
|
||||
|
||||
.. toctree::
|
||||
|
||||
|
@ -344,8 +344,8 @@ Documentation/ABI/testing/sysfs-bus-iio for IIO ABIs to user space.
|
||||
|
||||
To debug ISH, event tracing mechanism is used. To enable debug logs::
|
||||
|
||||
echo 1 > /sys/kernel/debug/tracing/events/intel_ish/enable
|
||||
cat /sys/kernel/debug/tracing/trace
|
||||
echo 1 > /sys/kernel/tracing/events/intel_ish/enable
|
||||
cat /sys/kernel/tracing/trace
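
The two commands above can also be issued from a small script. This is a
minimal sketch that assumes tracefs is mounted at ``/sys/kernel/tracing`` and
that the script runs with sufficient privileges. The ``enable_event()`` helper
and its ``tracefs`` parameter are hypothetical names introduced for
illustration, not part of any kernel interface.

```python
from pathlib import Path


def enable_event(group: str, tracefs: Path = Path("/sys/kernel/tracing")) -> Path:
    """Enable every event in an event group and return the trace file path.

    Equivalent to: echo 1 > <tracefs>/events/<group>/enable
    The tracefs argument exists only so the mount point can be overridden.
    """
    enable = tracefs / "events" / group / "enable"
    enable.write_text("1\n")
    return tracefs / "trace"


if __name__ == "__main__":
    # Requires root and a kernel that exposes the intel_ish events.
    trace = enable_event("intel_ish")
    print(trace.read_text())
```

Keeping the mount point as a parameter lets the same helper work on systems
where tracefs is only reachable through the older
``/sys/kernel/debug/tracing`` path.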
|
||||
|
||||
3.8 ISH IIO sysfs Example on Lenovo thinkpad Yoga 260
|
||||
-----------------------------------------------------
|
||||
|
@ -1,8 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=========================
|
||||
Linux Hardware Monitoring
|
||||
=========================
|
||||
===================
|
||||
Hardware Monitoring
|
||||
===================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
@ -1,6 +1,6 @@
|
||||
=============================
|
||||
The Linux Input Documentation
|
||||
=============================
|
||||
===================
|
||||
Input Documentation
|
||||
===================
|
||||
|
||||
Contents:
|
||||
|
||||
|
@ -26,3 +26,4 @@ LEDs
|
||||
leds-lp55xx
|
||||
leds-mlxcpld
|
||||
leds-sc27xx
|
||||
leds-qcom-lpg
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _active_mm:
|
||||
|
||||
=========
|
||||
Active MM
|
||||
=========
|
||||
|
@ -1,7 +1,5 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
.. _arch_page_table_helpers:
|
||||
|
||||
===============================
|
||||
Architecture Page Table Helpers
|
||||
===============================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _balance:
|
||||
|
||||
================
|
||||
Memory Balancing
|
||||
================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _free_page_reporting:
|
||||
|
||||
=====================
|
||||
Free Page Reporting
|
||||
=====================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _frontswap:
|
||||
|
||||
=========
|
||||
Frontswap
|
||||
=========
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _highmem:
|
||||
|
||||
====================
|
||||
High Memory Handling
|
||||
====================
|
||||
|
@ -1,5 +1,3 @@
|
||||
.. _hmm:
|
||||
|
||||
=====================================
|
||||
Heterogeneous Memory Management (HMM)
|
||||
=====================================
|
||||
@ -304,7 +302,7 @@ devm_memunmap_pages(), and devm_release_mem_region() when the resources can
|
||||
be tied to a ``struct device``.
|
||||
|
||||
The overall migration steps are similar to migrating NUMA pages within system
|
||||
memory (see :ref:`Page migration <page_migration>`) but the steps are split
|
||||
memory (see Documentation/mm/page_migration.rst) but the steps are split
|
||||
between device driver specific code and shared common code:
|
||||
|
||||
1. ``mmap_read_lock()``
|
||||
@@ -1,5 +1,3 @@
-.. _hugetlbfs_reserve:
-
 =====================
 Hugetlbfs Reservation
 =====================
@@ -7,7 +5,7 @@ Hugetlbfs Reservation
 Overview
 ========

-Huge pages as described at :ref:`hugetlbpage` are typically
+Huge pages as described at Documentation/mm/hugetlbpage.rst are typically
 preallocated for application use. These huge pages are instantiated in a
 task's address space at page fault time if the VMA indicates huge pages are
 to be used. If no huge page exists at page fault time, the task is sent
@@ -1,5 +1,3 @@
-.. hwpoison:
-
 ========
 hwpoison
 ========
@@ -1,6 +1,6 @@
-=====================================
-Linux Memory Management Documentation
-=====================================
+===============================
+Memory Management Documentation
+===============================

 Memory Management Guide
 =======================
@@ -1,5 +1,3 @@
-.. _ksm:
-
 =======================
 Kernel Samepage Merging
 =======================
@@ -8,7 +6,7 @@ KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y,
 added to the Linux kernel in 2.6.32. See ``mm/ksm.c`` for its implementation,
 and http://lwn.net/Articles/306704/ and https://lwn.net/Articles/330589/

-The userspace interface of KSM is described in :ref:`Documentation/admin-guide/mm/ksm.rst <admin_guide_ksm>`
+The userspace interface of KSM is described in Documentation/admin-guide/mm/ksm.rst

 Design
 ======
@@ -1,7 +1,5 @@
 .. SPDX-License-Identifier: GPL-2.0

-.. _physical_memory_model:
-
 =====================
 Physical Memory Model
 =====================
@@ -1,5 +1,3 @@
-.. _mmu_notifier:
-
 When do you need to notify inside page table lock ?
 ===================================================

@@ -1,5 +1,3 @@
-.. _numa:
-
 Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>

 =============
@@ -64,7 +62,7 @@ In addition, for some architectures, again x86 is an example, Linux supports
 the emulation of additional nodes. For NUMA emulation, linux will carve up
 the existing nodes--or the system memory for non-NUMA platforms--into multiple
 nodes. Each emulated node will manage a fraction of the underlying cells'
-physical memory. NUMA emluation is useful for testing NUMA kernel and
+physical memory. NUMA emulation is useful for testing NUMA kernel and
 application features on non-NUMA platforms, and as a sort of memory resource
 management mechanism when used together with cpusets.
 [see Documentation/admin-guide/cgroup-v1/cpusets.rst]
@@ -110,7 +108,7 @@ to improve NUMA locality using various CPU affinity command line interfaces,
 such as taskset(1) and numactl(1), and program interfaces such as
 sched_setaffinity(2). Further, one can modify the kernel's default local
 allocation behavior using Linux NUMA memory policy. [see
-:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`].
+Documentation/admin-guide/mm/numa_memory_policy.rst].

 System administrators can restrict the CPUs and nodes' memories that a non-
 privileged user can specify in the scheduling or NUMA commands and functions
@@ -1,5 +1,3 @@
-.. _page_frags:
-
 ==============
 Page fragments
 ==============
@@ -1,5 +1,3 @@
-.. _page_migration:
-
 ==============
 Page migration
 ==============
@@ -9,8 +7,8 @@ nodes in a NUMA system while the process is running. This means that the
 virtual addresses that the process sees do not change. However, the
 system rearranges the physical location of those pages.

-Also see :ref:`Heterogeneous Memory Management (HMM) <hmm>`
-for migrating pages to or from device private memory.
+Also see Documentation/mm/hmm.rst for migrating pages to or from device
+private memory.

 The main intent of page migration is to reduce the latency of memory accesses
 by moving pages near to the processor where the process accessing that memory
@@ -1,5 +1,3 @@
-.. _page_owner:
-
 ==================================================
 page owner: Tracking about who allocated each page
 ==================================================
@@ -52,7 +50,7 @@ pages are investigated and marked as allocated in initialization phase.
 Although it doesn't mean that they have the right owner information,
 at least, we can tell whether the page is allocated or not,
 more accurately. On 2GB memory x86-64 VM box, 13343 early allocated pages
-are catched and marked, although they are mostly allocated from struct
+are caught and marked, although they are mostly allocated from struct
 page extension feature. Anyway, after that, no page is left in
 un-tracking state.

@@ -178,7 +176,7 @@ STANDARD FORMAT SPECIFIERS
 	at		alloc_ts	timestamp of the page when it was allocated
 	ator		allocator	memory allocator for pages

-For --curl option:
+For --cull option:

 	KEY		LONG		DESCRIPTION
 	p		pid		process ID
@@ -1,7 +1,5 @@
 .. SPDX-License-Identifier: GPL-2.0

-.. _page_table_check:
-
 ================
 Page Table Check
 ================
Some files were not shown because too many files have changed in this diff