Merge branches 'for-5.2/fixes', 'for-5.3/doc', 'for-5.3/ish', 'for-5.3/logitech' and 'for-5.3/wacom' into for-linus

Jiri Kosina 2019-07-10 01:39:57 +02:00
29558 changed files with 552480 additions and 481209 deletions
.clang-format
.get_maintainer.ignore
.gitignore
.mailmap
CREDITS
Documentation
ABI
DMA-API-HOWTO.txt
Makefile
accounting
acpi/dsd
admin-guide
arm64
atomic_bitops.txt
block
bpf
cgroup-v1
clearing-warn-once.txt
conf.py
core-api
dev-tools
device-mapper
devicetree/bindings


@ -387,14 +387,14 @@ ForEachMacros:
- 'rhl_for_each_entry_rcu'
- 'rhl_for_each_rcu'
- 'rht_for_each'
- 'rht_for_each_continue'
- 'rht_for_each_from'
- 'rht_for_each_entry'
- 'rht_for_each_entry_continue'
- 'rht_for_each_entry_from'
- 'rht_for_each_entry_rcu'
- 'rht_for_each_entry_rcu_continue'
- 'rht_for_each_entry_rcu_from'
- 'rht_for_each_entry_safe'
- 'rht_for_each_rcu'
- 'rht_for_each_rcu_continue'
- 'rht_for_each_rcu_from'
- '__rq_for_each_bio'
- 'rq_for_each_bvec'
- 'rq_for_each_segment'


@ -1 +1,2 @@
Christoph Hellwig <hch@lst.de>
Marc Gonzalez <marc.w.gonzalez@free.fr>

.gitignore

@ -58,6 +58,7 @@ modules.builtin
/vmlinuz
/System.map
/Module.markers
/modules.builtin.modinfo
#
# RPM spec file (make rpm-pkg)
@ -80,20 +81,22 @@ modules.builtin
/tar-install/
#
# git files that we don't want to ignore even if they are dot-files
# We don't want to ignore the following even if they are dot-files
#
!.clang-format
!.cocciconfig
!.get_maintainer.ignore
!.gitattributes
!.gitignore
!.mailmap
!.cocciconfig
!.clang-format
#
# Generated include files
#
include/config
include/generated
include/ksym
arch/*/include/generated
/include/config/
/include/generated/
/include/ksym/
/arch/*/include/generated/
# stgit generated dirs
patches-*
@ -129,7 +132,12 @@ signing_key.x509
x509.genkey
# Kconfig presets
all.config
/all.config
/alldef.config
/allmod.config
/allno.config
/allrandom.config
/allyes.config
# Kdevelop4
*.kdev4


@ -16,6 +16,11 @@ Alan Cox <alan@lxorguk.ukuu.org.uk>
Alan Cox <root@hraefn.swansea.linux.org.uk>
Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Aleksandar Markovic <aleksandar.markovic@mips.com> <aleksandar.markovic@imgtec.com>
Alex Shi <alex.shi@linux.alibaba.com> <alex.shi@intel.com>
Alex Shi <alex.shi@linux.alibaba.com> <alex.shi@linaro.org>
Alexei Starovoitov <ast@kernel.org> <ast@plumgrid.com>
Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com>
Alexei Starovoitov <ast@kernel.org> <ast@fb.com>
Al Viro <viro@ftp.linux.org.uk>
Al Viro <viro@zenIV.linux.org.uk>
Andi Shyti <andi@etezian.org> <andi.shyti@samsung.com>
@ -46,6 +51,12 @@ Christoph Hellwig <hch@lst.de>
Christophe Ricard <christophe.ricard@gmail.com>
Corey Minyard <minyard@acm.org>
Damian Hobson-Garcia <dhobsong@igel.co.jp>
Daniel Borkmann <daniel@iogearbox.net> <dborkman@redhat.com>
Daniel Borkmann <daniel@iogearbox.net> <dborkmann@redhat.com>
Daniel Borkmann <daniel@iogearbox.net> <danborkmann@iogearbox.net>
Daniel Borkmann <daniel@iogearbox.net> <daniel.borkmann@tik.ee.ethz.ch>
Daniel Borkmann <daniel@iogearbox.net> <danborkmann@googlemail.com>
Daniel Borkmann <daniel@iogearbox.net> <dxchgb@gmail.com>
David Brownell <david-b@pacbell.net>
David Woodhouse <dwmw2@shinybook.infradead.org>
Dengcheng Zhu <dzhu@wavecomp.com> <dengcheng.zhu@mips.com>
@ -70,6 +81,7 @@ Greg Kroah-Hartman <greg@echidna.(none)>
Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman <greg@kroah.com>
Gregory CLEMENT <gregory.clement@bootlin.com> <gregory.clement@free-electrons.com>
Hanjun Guo <guohanjun@huawei.com> <hanjun.guo@linaro.org>
Henk Vergonet <Henk.Vergonet@gmail.com>
Henrik Kretzschmar <henne@nachtwindheim.de>
Henrik Rydberg <rydberg@bitmath.org>
@ -117,6 +129,8 @@ Leonid I Ananiev <leonid.i.ananiev@intel.com>
Linas Vepstas <linas@austin.ibm.com>
Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de>
Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@ascom.ch>
Li Yang <leoyang.li@nxp.com> <leo@zh-kernel.org>
Li Yang <leoyang.li@nxp.com> <leoli@freescale.com>
Maciej W. Rozycki <macro@mips.com> <macro@imgtec.com>
Marcin Nowakowski <marcin.nowakowski@mips.com> <marcin.nowakowski@imgtec.com>
Mark Brown <broonie@sirena.org.uk>
@ -189,6 +203,7 @@ Santosh Shilimkar <ssantosh@kernel.org>
Santosh Shilimkar <santosh.shilimkar@oracle.org>
Sascha Hauer <s.hauer@pengutronix.de>
S.Çağlar Onur <caglar@pardus.org.tr>
Sean Nyekjaer <sean@geanix.com> <sean.nyekjaer@prevas.dk>
Sebastian Reichel <sre@kernel.org> <sre@debian.org>
Sebastian Reichel <sre@kernel.org> <sebastian.reichel@collabora.co.uk>
Shiraz Hashim <shiraz.linux.kernel@gmail.com> <shiraz.hashim@st.com>
@ -207,6 +222,8 @@ Tejun Heo <htejun@gmail.com>
Thomas Graf <tgraf@suug.ch>
Thomas Pedersen <twp@codeaurora.org>
Tony Luck <tony.luck@intel.com>
TripleX Chung <xxx.phy@gmail.com> <zhongyu@18mail.cn>
TripleX Chung <xxx.phy@gmail.com> <triplex@zh-kernel.org>
Tsuneo Yoshioka <Tsuneo.Yoshioka@f-secure.com>
Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Uwe Kleine-König <ukl@pengutronix.de>
@ -222,6 +239,7 @@ Vlad Dogaru <ddvlad@gmail.com> <vlad.dogaru@intel.com>
Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@virtuozzo.com>
Vladimir Davydov <vdavydov.dev@gmail.com> <vdavydov@parallels.com>
Takashi YOSHII <takashi.yoshii.zj@renesas.com>
Will Deacon <will@kernel.org> <will.deacon@arm.com>
Yakir Yang <kuankuan.y@gmail.com> <ykk@rock-chips.com>
Yusuke Goda <goda.yusuke@renesas.com>
Gustavo Padovan <gustavo@las.ic.unicamp.br>


@ -3364,6 +3364,14 @@ S: Braunschweiger Strasse 79
S: 31134 Hildesheim
S: Germany
N: Martin Schwidefsky
D: Martin was the most significant contributor to the initial s390
D: port of the Linux Kernel and later the maintainer of the s390
D: architecture backend for almost two decades.
D: He passed away in 2019, and will be greatly missed.
S: Germany
W: https://lwn.net/Articles/789028/
N: Marcel Selhorst
E: tpmdd@selhorst.net
D: TPM driver


@ -0,0 +1,32 @@
This ABI is deprecated and will be removed after 2021. It is
replaced with the batadv generic netlink family.
What: /sys/class/net/<iface>/batman-adv/elp_interval
Date: Feb 2014
Contact: Linus Lüssing <linus.luessing@web.de>
Description:
Defines the interval in milliseconds in which batman
emits probing packets for neighbor sensing (ELP).
What: /sys/class/net/<iface>/batman-adv/iface_status
Date: May 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Indicates the status of <iface> as it is seen by batman.
What: /sys/class/net/<iface>/batman-adv/mesh_iface
Date: May 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
The /sys/class/net/<iface>/batman-adv/mesh_iface file
displays the batman mesh interface this <iface>
currently is associated with.
What: /sys/class/net/<iface>/batman-adv/throughput_override
Date: Feb 2014
Contact: Antonio Quartulli <a@unstable.cc>
Description:
Defines the throughput value to be used by B.A.T.M.A.N. V
when estimating the link throughput using this interface.
If the value is set to 0 then batman-adv will try to
estimate the throughput by itself.


@ -0,0 +1,110 @@
This ABI is deprecated and will be removed after 2021. It is
replaced with the batadv generic netlink family.
What: /sys/class/net/<mesh_iface>/mesh/aggregated_ogms
Date: May 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Indicates whether the batman protocol messages of the
mesh <mesh_iface> shall be aggregated or not.
What: /sys/class/net/<mesh_iface>/mesh/<vlan_subdir>/ap_isolation
Date: May 2011
Contact: Antonio Quartulli <a@unstable.cc>
Description:
Indicates whether the data traffic going from a
wireless client to another wireless client will be
silently dropped. <vlan_subdir> is empty when referring
to the untagged lan.
What: /sys/class/net/<mesh_iface>/mesh/bonding
Date: June 2010
Contact: Simon Wunderlich <sw@simonwunderlich.de>
Description:
Indicates whether the data traffic going through the
mesh will be sent using multiple interfaces at the
same time (if available).
What: /sys/class/net/<mesh_iface>/mesh/bridge_loop_avoidance
Date: November 2011
Contact: Simon Wunderlich <sw@simonwunderlich.de>
Description:
Indicates whether the bridge loop avoidance feature
is enabled. This feature detects and avoids loops
between the mesh and devices bridged with the soft
interface <mesh_iface>.
What: /sys/class/net/<mesh_iface>/mesh/fragmentation
Date: October 2010
Contact: Andreas Langer <an.langer@gmx.de>
Description:
Indicates whether the data traffic going through the
mesh will be fragmented or silently discarded if the
packet size exceeds the outgoing interface MTU.
What: /sys/class/net/<mesh_iface>/mesh/gw_bandwidth
Date: October 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the bandwidth which is propagated by this
node if gw_mode was set to 'server'.
What: /sys/class/net/<mesh_iface>/mesh/gw_mode
Date: October 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the state of the gateway features. Can be
either 'off', 'client' or 'server'.
What: /sys/class/net/<mesh_iface>/mesh/gw_sel_class
Date: October 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the selection criteria this node will use
to choose a gateway if gw_mode was set to 'client'.
What: /sys/class/net/<mesh_iface>/mesh/hop_penalty
Date: Oct 2010
Contact: Linus Lüssing <linus.luessing@web.de>
Description:
Defines the penalty which will be applied to an
originator message's tq-field on every hop.
What: /sys/class/net/<mesh_iface>/mesh/isolation_mark
Date: Nov 2013
Contact: Antonio Quartulli <a@unstable.cc>
Description:
Defines the isolation mark (and its bitmask) which
is used to classify clients as "isolated" by the
Extended Isolation feature.
What: /sys/class/net/<mesh_iface>/mesh/multicast_mode
Date: Feb 2014
Contact: Linus Lüssing <linus.luessing@web.de>
Description:
Indicates whether multicast optimizations are enabled
or disabled. If set to zero then all nodes in the
mesh are going to use classic flooding for any
multicast packet with no optimizations.
What: /sys/class/net/<mesh_iface>/mesh/network_coding
Date: Nov 2012
Contact: Martin Hundeboll <martin@hundeboll.net>
Description:
Controls whether Network Coding (using some magic
to send fewer wifi packets but still the same
content) is enabled or not.
What: /sys/class/net/<mesh_iface>/mesh/orig_interval
Date: May 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the interval in milliseconds in which batman
sends its protocol messages.
What: /sys/class/net/<mesh_iface>/mesh/routing_algo
Date: Dec 2011
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the routing protocol this mesh instance
uses to find the optimal paths through the mesh.


@ -6,6 +6,8 @@ Description:
This file allows user to read/write the raw NVMEM contents.
Permissions for write to this file depends on the nvmem
provider configuration.
Note: This file is only present if CONFIG_NVMEM_SYSFS
is enabled
ex:
hexdump /sys/bus/nvmem/devices/qfprom0/nvmem


@ -81,7 +81,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/latency
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Channel signaling latency
Description: Channel signaling latency. This file is available only for
performance critical channels (storage, network, etc.) that use
the monitor page mechanism.
Users: Debugging tools
What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/out_mask
@ -95,7 +97,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/pending
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Channel interrupt pending state
Description: Channel interrupt pending state. This file is available only for
performance critical channels (storage, network, etc.) that use
the monitor page mechanism.
Users: Debugging tools
What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/read_avail
@ -137,7 +141,9 @@ What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/monitor_id
Date: January. 2018
KernelVersion: 4.16
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Monitor bit associated with channel
Description: Monitor bit associated with channel. This file is available only
for performance critical channels (storage, network, etc.) that
use the monitor page mechanism.
Users: Debugging tools and userspace drivers
What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/ring


@ -90,4 +90,89 @@ Date: December 2009
Contact: Lee Schermerhorn <lee.schermerhorn@hp.com>
Description:
The node's huge page size control/query attributes.
See Documentation/admin-guide/mm/hugetlbpage.rst
See Documentation/admin-guide/mm/hugetlbpage.rst
What: /sys/devices/system/node/nodeX/accessY/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The node's relationship to other nodes for access class "Y".
What: /sys/devices/system/node/nodeX/accessY/initiators/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The directory containing symlinks to memory initiator
nodes that have class "Y" access to this target node's
memory. CPUs and other memory initiators in nodes not in
the list accessing this node's memory may have different
performance.
What: /sys/devices/system/node/nodeX/accessY/targets/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The directory containing symlinks to memory targets that
this initiator node has class "Y" access.
What: /sys/devices/system/node/nodeX/accessY/initiators/read_bandwidth
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
This node's read bandwidth in MB/s when accessed from
nodes found in this access class's linked initiators.
What: /sys/devices/system/node/nodeX/accessY/initiators/read_latency
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
This node's read latency in nanoseconds when accessed
from nodes found in this access class's linked initiators.
What: /sys/devices/system/node/nodeX/accessY/initiators/write_bandwidth
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
This node's write bandwidth in MB/s when accessed from
nodes found in this access class's linked initiators.
What: /sys/devices/system/node/nodeX/accessY/initiators/write_latency
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
This node's write latency in nanoseconds when accessed
from nodes found in this class's linked initiators.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The directory containing attributes for the memory-side cache
level 'Y'.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/indexing
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The cache's associativity indexing: 0 for direct mapped,
non-zero if indexed.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/line_size
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The number of bytes accessed from the next cache level on a
cache miss.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/size
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The size of this memory side cache in bytes.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/write_policy
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The cache write policy: 0 for write-back, 1 for write-through,
other or unknown.
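
The numeric encodings documented for the memory_side_cache attributes above can be decoded with a small helper. This is a minimal sketch in Python, not kernel code: the function names are hypothetical, and it only assumes that each attribute file holds a single decimal integer, optionally followed by a newline.

```python
def describe_indexing(raw: str) -> str:
    """Decode the 'indexing' attribute: 0 is direct mapped, non-zero is indexed."""
    return "direct mapped" if int(raw) == 0 else "indexed"


def describe_write_policy(raw: str) -> str:
    """Decode the 'write_policy' attribute as documented above."""
    policies = {0: "write-back", 1: "write-through"}
    # Any value other than 0 or 1 is documented as "other or unknown".
    return policies.get(int(raw), "other or unknown")


if __name__ == "__main__":
    # Strings as they might be read from the sysfs files (assumed format).
    print(describe_indexing("0\n"))
    print(describe_write_policy("1\n"))
```

In practice the input strings would come from reading the files under /sys/devices/system/node/nodeX/memory_side_cache/indexY/ on a system that exposes them.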


@ -1,23 +1,46 @@
What: /sys/kernel/debug/wilco_ec/h1_gpio
Date: April 2019
KernelVersion: 5.2
Description:
As part of Chrome OS's FAFT (Fully Automated Firmware Testing)
tests, we need to ensure that the H1 chip is properly setting
some GPIO lines. The h1_gpio attribute exposes the state
of the lines:
- ENTRY_TO_FACT_MODE in BIT(0)
- SPI_CHROME_SEL in BIT(1)
Output will be formatted with "0x%02x\n".
What: /sys/kernel/debug/wilco_ec/raw
Date: January 2019
KernelVersion: 5.1
Description:
Write and read raw mailbox commands to the EC.
For writing:
Bytes 0-1 indicate the message type:
00 F0 = Execute Legacy Command
00 F2 = Read/Write NVRAM Property
Byte 2 provides the command code
Bytes 3+ consist of the data passed in the request
You can write a hexadecimal sentence to raw, and that series of
bytes will be sent to the EC. Then, you can read the bytes of
response by reading from raw.
At least three bytes are required, for the msg type and command,
with additional bytes optional for additional data.
For writing, bytes 0-1 indicate the message type, one of enum
wilco_ec_msg_type. Byte 2+ consist of the data passed in the
request, starting at MBOX[0]
At least three bytes are required for writing, two for the type
and at least a single byte of data. Only the first
EC_MAILBOX_DATA_SIZE bytes of MBOX will be used.
Example:
// Request EC info type 3 (EC firmware build date)
$ echo 00 f0 38 00 03 00 > raw
// Corresponds with sending type 0x00f0 with
// MBOX = [38, 00, 03, 00]
$ echo 00 f0 38 00 03 00 > /sys/kernel/debug/wilco_ec/raw
// View the result. The decoded ASCII result "12/21/18" is
// included after the raw hex.
$ cat raw
00 31 32 2f 32 31 2f 31 38 00 38 00 01 00 2f 00 .12/21/18.8...
// Corresponds with MBOX = [00, 00, 31, 32, 2f, 32, 31, 38, ...]
$ cat /sys/kernel/debug/wilco_ec/raw
00 00 31 32 2f 32 31 2f 31 38 00 38 00 01 00 2f 00 ..12/21/18.8...
Note that the first 32 bytes of the received MBOX[] will be
printed, even if some of the data is junk. It is up to you to
know how many of the first bytes of data are the actual
response.
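
The request layout described above (bytes 0-1 carry the message type, bytes 2+ carry the MBOX data) can be illustrated with a short Python sketch. The helper names are hypothetical and the code only formats and decodes hex strings; it does not touch debugfs. It assumes the two type bytes are written most significant byte first, which matches the documented example where type 0x00f0 is written as "00 f0".

```python
def build_raw_command(msg_type: int, mbox: bytes) -> str:
    """Format a request as the hexadecimal sentence written to 'raw'."""
    if not 0 <= msg_type <= 0xFFFF:
        raise ValueError("message type must fit in two bytes")
    if len(mbox) < 1:
        # The text requires two type bytes plus at least one data byte.
        raise ValueError("at least one data byte is required")
    payload = msg_type.to_bytes(2, "big") + bytes(mbox)
    return " ".join(f"{b:02x}" for b in payload)


def printable(data: bytes) -> str:
    """Render bytes as ASCII, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


if __name__ == "__main__":
    # Request from the documented example: type 0x00f0, MBOX = [38, 00, 03, 00].
    print(build_raw_command(0x00F0, bytes([0x38, 0x00, 0x03, 0x00])))
    # Leading bytes of the documented response, hex portion only.
    response = bytes.fromhex("00 00 31 32 2f 32 31 2f 31 38 00")
    print(printable(response))
```

As the note above says, the caller still has to know how many of the returned bytes are the actual response; the sketch makes no attempt to guess that.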


@ -0,0 +1,230 @@
What: /sys/bus/counter/devices/counterX/countY/count
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Count data of Count Y represented as a string.
What: /sys/bus/counter/devices/counterX/countY/ceiling
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Count value ceiling for Count Y. This is the upper limit for the
respective counter.
What: /sys/bus/counter/devices/counterX/countY/floor
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Count value floor for Count Y. This is the lower limit for the
respective counter.
What: /sys/bus/counter/devices/counterX/countY/count_mode
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Count mode for channel Y. The ceiling and floor values for
Count Y are used by the count mode where required. The following
count modes are available:
normal:
Counting is continuous in either direction.
range limit:
An upper or lower limit is set, mimicking limit switches
in the mechanical counterpart. The upper limit is set to
the Count Y ceiling value, while the lower limit is set
to the Count Y floor value. The counter freezes at
count = ceiling when counting up, and at count = floor
when counting down. At either of these limits, the
counting is resumed only when the count direction is
reversed.
non-recycle:
The counter is disabled whenever a counter overflow or
underflow takes place. The counter is re-enabled when a
new count value is loaded to the counter via a preset
operation or direct write.
modulo-n:
A count value boundary is set between the Count Y floor
value and the Count Y ceiling value. The counter is
reset to the Count Y floor value at count = ceiling when
counting up, while the counter is set to the Count Y
ceiling value at count = floor when counting down; the
counter does not freeze at the boundary points, but
counts continuously throughout.
What: /sys/bus/counter/devices/counterX/countY/count_mode_available
What: /sys/bus/counter/devices/counterX/countY/error_noise_available
What: /sys/bus/counter/devices/counterX/countY/function_available
What: /sys/bus/counter/devices/counterX/countY/signalZ_action_available
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Discrete set of available values for the respective Count Y
configuration are listed in this file. Values are delimited by
newline characters.
What: /sys/bus/counter/devices/counterX/countY/direction
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates the count direction of Count
Y. Two count directions are available: forward and backward.
Some counter devices are able to determine the direction of
their counting. For example, quadrature encoding counters can
determine the direction of movement by evaluating the leading
phase of the respective A and B quadrature encoding signals.
This attribute exposes such count directions.
What: /sys/bus/counter/devices/counterX/countY/enable
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Whether channel Y counter is enabled. Valid attribute values are
boolean.
This attribute is intended to serve as a pause/unpause mechanism
for Count Y. Suppose a counter device is used to count the total
movement of a conveyor belt: this attribute allows an operator
to temporarily pause the counter, service the conveyor belt,
and then finally unpause the counter to continue where it had
left off.
What: /sys/bus/counter/devices/counterX/countY/error_noise
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates whether excessive noise is
present at the channel Y counter inputs.
What: /sys/bus/counter/devices/counterX/countY/function
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Count function mode of Count Y; count function evaluation is
triggered by conditions specified by the Count Y signalZ_action
attributes. The following count functions are available:
increase:
Accumulated count is incremented.
decrease:
Accumulated count is decremented.
pulse-direction:
Rising edges on signal A update the respective count.
The input level of signal B determines direction.
quadrature x1 a:
If direction is forward, rising edges on quadrature pair
signal A update the respective count; if the direction
is backward, falling edges on quadrature pair signal A
update the respective count. Quadrature encoding
determines the direction.
quadrature x1 b:
If direction is forward, rising edges on quadrature pair
signal B update the respective count; if the direction
is backward, falling edges on quadrature pair signal B
update the respective count. Quadrature encoding
determines the direction.
quadrature x2 a:
Any state transition on quadrature pair signal A updates
the respective count. Quadrature encoding determines the
direction.
quadrature x2 b:
Any state transition on quadrature pair signal B updates
the respective count. Quadrature encoding determines the
direction.
quadrature x4:
Any state transition on either quadrature pair signal
updates the respective count. Quadrature encoding
determines the direction.
What: /sys/bus/counter/devices/counterX/countY/name
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates the device-specific name of
Count Y. If possible, this should match the name of the
respective channel as it appears in the device datasheet.
What: /sys/bus/counter/devices/counterX/countY/preset
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
If the counter device supports preset registers -- registers
used to load counter channels to a set count upon device-defined
preset operation trigger events -- the preset count for channel
Y is provided by this attribute.
What: /sys/bus/counter/devices/counterX/countY/preset_enable
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Whether channel Y counter preset operation is enabled. Valid
attribute values are boolean.
What: /sys/bus/counter/devices/counterX/countY/signalZ_action
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Action mode of Count Y for Signal Z. This attribute indicates
the condition of Signal Z that triggers the count function
evaluation for Count Y. The following action modes are
available:
none:
Signal does not trigger the count function. In
Pulse-Direction count function mode, this Signal is
evaluated as Direction.
rising edge:
Low state transitions to high state.
falling edge:
High state transitions to low state.
both edges:
Any state transition.
What: /sys/bus/counter/devices/counterX/name
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates the device-specific name of
the Counter. This should match the name of the device as it
appears in its respective datasheet.
What: /sys/bus/counter/devices/counterX/num_counts
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates the total number of Counts
belonging to the Counter.
What: /sys/bus/counter/devices/counterX/num_signals
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates the total number of Signals
belonging to the Counter.
What: /sys/bus/counter/devices/counterX/signalY/signal
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Signal data of Signal Y represented as a string.
What: /sys/bus/counter/devices/counterX/signalY/name
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates the device-specific name of
Signal Y. If possible, this should match the name of the
respective signal as it appears in the device datasheet.
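
The four count modes documented for count_mode above can be modeled as a single-step function. This is an illustrative Python sketch, not driver code: the function name and the (count, enabled) return shape are hypothetical, and it assumes that "overflow" and "underflow" in non-recycle mode mean stepping past the ceiling or floor. The normal mode here simply keeps counting and ignores any hardware register width.

```python
def step(count: int, direction: str, mode: str, floor: int, ceiling: int,
         enabled: bool = True) -> tuple[int, bool]:
    """Advance the count by one in the given direction; return (count, enabled)."""
    if not enabled:
        return count, False
    delta = 1 if direction == "forward" else -1
    # True when the next step would pass the limit in the current direction.
    at_limit = (delta == 1 and count == ceiling) or (delta == -1 and count == floor)

    if mode == "normal":
        return count + delta, True
    if mode == "range limit":
        # Freeze at the limit until the direction is reversed.
        return (count, True) if at_limit else (count + delta, True)
    if mode == "non-recycle":
        # Disable the counter on overflow/underflow (modeled assumption).
        return (count, False) if at_limit else (count + delta, True)
    if mode == "modulo-n":
        # Wrap to the opposite boundary instead of freezing.
        if at_limit:
            return (floor if delta == 1 else ceiling), True
        return count + delta, True
    raise ValueError(f"unknown count mode: {mode}")


if __name__ == "__main__":
    print(step(9, "forward", "range limit", 0, 9))
    print(step(9, "forward", "modulo-n", 0, 9))
```

Modeling the modes this way makes the difference between range limit (freeze) and modulo-n (wrap) explicit at the two boundary points.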


@ -0,0 +1,36 @@
What: /sys/bus/counter/devices/counterX/signalY/index_polarity
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Active level of index input Signal Y; irrelevant in
non-synchronous load mode.
What: /sys/bus/counter/devices/counterX/signalY/index_polarity_available
What: /sys/bus/counter/devices/counterX/signalY/synchronous_mode_available
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Discrete set of available values for the respective Signal Y
configuration are listed in this file.
What: /sys/bus/counter/devices/counterX/signalY/synchronous_mode
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Configure the counter associated with Signal Y for
non-synchronous or synchronous load mode. Synchronous load mode
cannot be selected in non-quadrature (Pulse-Direction) clock
mode.
non-synchronous:
A logic low level is the active level at this index
input. The index function (as enabled via preset_enable)
is performed directly on the active level of the index
input.
synchronous:
Intended for interfacing with encoder Index output in
quadrature clock mode. The active level is configured
via index_polarity. The index function (as enabled via
preset_enable) is performed synchronously with the
quadrature clock on the active level of the index input.


@ -0,0 +1,16 @@
What: /sys/bus/counter/devices/counterX/countY/prescaler_available
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Discrete set of available values for the respective Count Y
configuration are listed in this file. Values are delimited by
newline characters.
What: /sys/bus/counter/devices/counterX/countY/prescaler
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
Description:
Configure the prescaler value associated with Count Y.
On the FlexTimer, the counter clock source passes through a
prescaler (i.e. a counter). This acts like a clock
divider.


@ -0,0 +1,20 @@
What: /sys/bus/i2c/.../idle_state
Date: January 2019
KernelVersion: 5.2
Contact: Robert Shearman <robert.shearman@att.com>
Description:
Value that exists only for mux devices that can be
written to control the behaviour of the multiplexer on
idle. Possible values:
-2 - disconnect on idle, i.e. deselect the last used
channel, which is useful when there is a device
with an address that conflicts with another
device on another mux on the same parent bus.
-1 - leave the mux as-is, which is the most optimal
setting in terms of I2C operations and is the
default mode.
0..<nchans> - set the mux to a predetermined channel,
which is useful if there is one channel that is
used almost always, and you want to reduce the
latency for normal operations after rare
transactions on other channels
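
The idle_state values listed above can be summarized with a small decoder. This is a hedged Python sketch: the constant and function names are hypothetical, and it assumes valid channel numbers run from 0 up to, but not including, the number of channels, since the text does not state whether the upper bound is inclusive.

```python
IDLE_DISCONNECT = -2  # deselect the last used channel on idle
IDLE_AS_IS = -1       # leave the mux as-is (documented default)


def describe_idle_state(value: int, nchans: int) -> str:
    """Return a human-readable description of an idle_state value."""
    if value == IDLE_DISCONNECT:
        return "disconnect on idle"
    if value == IDLE_AS_IS:
        return "leave the mux as-is"
    if 0 <= value < nchans:  # upper bound exclusive: an assumption
        return f"set the mux to channel {value} on idle"
    raise ValueError(f"invalid idle_state {value} for {nchans} channels")


if __name__ == "__main__":
    for v in (-2, -1, 0):
        print(v, "->", describe_idle_state(v, nchans=4))
```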


@ -1656,6 +1656,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_countY_raw
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Raw counter device counts from channel Y. For quadrature
counters, multiplication by an available [Y]_scale results in
the counts of a single quadrature signal phase from channel Y.
@ -1664,6 +1666,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_indexY_raw
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Raw counter device index value from channel Y. This attribute
provides an absolute positional reference (e.g. a pulse once per
revolution) which may be used to home positional systems as
@ -1673,6 +1677,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_count_count_direction_available
KernelVersion: 4.12
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
A list of possible counting directions which are:
- "up" : counter device is increasing.
- "down": counter device is decreasing.
@ -1681,6 +1687,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_countY_count_direction
KernelVersion: 4.12
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Raw counter device counters direction for channel Y.
What: /sys/bus/iio/devices/iio:deviceX/in_phaseY_raw


@ -6,6 +6,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_index_synchronous_mode_available
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Discrete set of available values for the respective counter
configuration are listed in this file.
@ -13,6 +15,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_countY_count_mode
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Count mode for channel Y. Four count modes are available:
normal, range limit, non-recycle, and modulo-n. The preset value
for channel Y is used by the count mode where required.
@ -47,6 +51,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_countY_noise_error
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Read-only attribute that indicates whether excessive noise is
present at the channel Y count inputs in quadrature clock mode;
irrelevant in non-quadrature clock mode.
@ -55,6 +61,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_countY_preset
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
If the counter device supports preset registers, the preset
count for channel Y is provided by this attribute.
@ -62,6 +70,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_countY_quadrature_mode
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Configure channel Y counter for non-quadrature or quadrature
clock mode. Selecting non-quadrature clock mode will disable
synchronous load mode. In quadrature clock mode, the channel Y
@ -83,6 +93,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_countY_set_to_preset_on_index
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Whether to set channel Y counter with channel Y preset value
when channel Y index input is active, or continuously count.
Valid attribute values are boolean.
@ -91,6 +103,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_indexY_index_polarity
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Active level of channel Y index input; irrelevant in
non-synchronous load mode.
@ -98,6 +112,8 @@ What: /sys/bus/iio/devices/iio:deviceX/in_indexY_synchronous_mode
KernelVersion: 4.10
Contact: linux-iio@vger.kernel.org
Description:
This interface is deprecated; please use the Counter subsystem.
Configure channel Y counter for non-synchronous or synchronous
load mode. Synchronous load mode cannot be selected in
non-quadrature clock mode.

View File

@ -0,0 +1,35 @@
What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_frequency_start
Date: March 2019
KernelVersion: 3.1.0
Contact: linux-iio@vger.kernel.org
Description:
Frequency sweep start frequency in Hz.
What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_frequency_increment
Date: March 2019
KernelVersion: 3.1.0
Contact: linux-iio@vger.kernel.org
Description:
Frequency increment in Hz (step size) between consecutive
frequency points along the sweep.
What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_frequency_points
Date: March 2019
KernelVersion: 3.1.0
Contact: linux-iio@vger.kernel.org
Description:
Number of frequency points (steps) in the frequency sweep.
This value, in conjunction with the
out_altvoltageY_frequency_start and the
out_altvoltageY_frequency_increment, determines the frequency
sweep range for the sweep operation.
What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_settling_cycles
Date: March 2019
KernelVersion: 3.1.0
Contact: linux-iio@vger.kernel.org
Description:
Number of output excitation cycles (settling time cycles)
that are allowed to pass through the unknown impedance,
after each frequency increment, and before the ADC is triggered
to perform a conversion sequence of the response signal.
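The sweep attributes above combine arithmetically. The following is a minimal sketch only, assuming that step k of the sweep sits at start + k * increment; the helper name sweep_fill_hz is hypothetical and not part of any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill out[0..n-1] with the first n sweep
 * frequencies in Hz, assuming step k is start + k * increment.
 * 64-bit results avoid overflow for large start/increment values.
 */
void sweep_fill_hz(uint32_t start_hz, uint32_t increment_hz,
		   uint64_t *out, size_t n)
{
	assert(out != NULL || n == 0);
	for (size_t k = 0; k < n; k++)
		out[k] = (uint64_t)start_hz + (uint64_t)increment_hz * k;
}
```

A caller would read the three sysfs attributes, convert them to integers and pass them in; whether the last point is included in the device's sweep depends on the hardware and is not modelled here.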

View File

@ -1,6 +1,6 @@
What: /sys/bus/iio/devices/iio:deviceX/start_cleaning
Date: December 2018
KernelVersion: 4.22
KernelVersion: 5.0
Contact: linux-iio@vger.kernel.org
Description:
Writing 1 starts sensor self cleaning. Internal fan accelerates

View File

@ -0,0 +1,24 @@
What: /sys/bus/iio/devices/iio:deviceX/fault_oc
KernelVersion: 5.1
Contact: linux-iio@vger.kernel.org
Description:
Open-circuit fault. The detection of open-circuit faults,
such as those caused by broken thermocouple wires.
Reading returns either '1' or '0'.
'1' = An open circuit such as broken thermocouple wires
has been detected.
'0' = No open circuit or broken thermocouple wires are detected
What: /sys/bus/iio/devices/iio:deviceX/fault_ovuv
KernelVersion: 5.1
Contact: linux-iio@vger.kernel.org
Description:
Overvoltage or Undervoltage Input Fault. The internal circuitry
is protected from excessive voltages applied to the thermocouple
cables by integrated MOSFETs at the T+ and T- inputs, and the
BIAS output. These MOSFETs turn off when the input voltage is
negative or greater than VDD.
Reading returns either '1' or '0'.
'1' = The input voltage is negative or greater than VDD.
'0' = The input voltage is positive and less than VDD (normal
state).

View File

@ -30,4 +30,12 @@ Description: (RW) Configure MSC buffer size for "single" or "multi" modes.
there are no active users and tracing is not enabled) and then
allocates a new one.
What: /sys/bus/intel_th/devices/<intel_th_id>-msc<msc-id>/win_switch
Date: May 2019
KernelVersion: 5.2
Contact: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Description: (RW) Trigger window switch for the MSC's buffer, in
multi-window mode. In "multi" mode, accepts writes of "1", thereby
triggering a window switch for the buffer. Returns an error in any
other operating mode or on attempts to write something other than "1".

View File

@ -1,29 +0,0 @@
What: /sys/bus/mdio_bus/devices/.../phy_id
Date: November 2012
KernelVersion: 3.8
Contact: netdev@vger.kernel.org
Description:
This attribute contains the 32-bit PHY Identifier as reported
by the device during bus enumeration, encoded in hexadecimal.
This ID is used to match the device with the appropriate
driver.
What: /sys/bus/mdio_bus/devices/.../phy_interface
Date: February 2014
KernelVersion: 3.15
Contact: netdev@vger.kernel.org
Description:
This attribute contains the PHY interface as configured by the
Ethernet driver during bus enumeration, encoded in string.
This interface mode is used to configure the Ethernet MAC with the
appropriate mode for its data lines to the PHY hardware.
What: /sys/bus/mdio_bus/devices/.../phy_has_fixups
Date: February 2014
KernelVersion: 3.15
Contact: netdev@vger.kernel.org
Description:
This attribute contains the boolean value whether a given PHY
device has had any "fixup" workaround running on it, encoded as
a boolean. This information is provided to help troubleshooting
PHY configurations.

View File

@ -1,6 +1,6 @@
What: /sys/bus/siox/devices/siox-X/active
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
On reading represents the current state of the bus. If it
contains a "0" the bus is stopped and connected devices are
@ -12,7 +12,7 @@ Description:
What: /sys/bus/siox/devices/siox-X/device_add
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Write-only file. Write
@ -27,13 +27,13 @@ Description:
What: /sys/bus/siox/devices/siox-X/device_remove
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Write-only file. A single write removes the last device in the siox chain.
What: /sys/bus/siox/devices/siox-X/poll_interval_ns
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Defines the interval between two poll cycles in nanoseconds.
Note this is rounded to jiffies on writing. On reading the current value
@ -41,33 +41,33 @@ Description:
What: /sys/bus/siox/devices/siox-X-Y/connected
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value. "0" means the Yth device on siox bus X isn't "connected" i.e.
communication with it is not ensured. "1" signals a working connection.
What: /sys/bus/siox/devices/siox-X-Y/inbytes
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting the inbytes value provided to siox-X/device_add
What: /sys/bus/siox/devices/siox-X-Y/status_errors
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Counts the number of time intervals when the read status byte doesn't yield the
expected value.
What: /sys/bus/siox/devices/siox-X-Y/type
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting the type value provided to siox-X/device_add.
What: /sys/bus/siox/devices/siox-X-Y/watchdog
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting if the watchdog of the siox device is
active. "0" means the watchdog is not active and the device is expected to
@ -75,13 +75,13 @@ Description:
What: /sys/bus/siox/devices/siox-X-Y/watchdog_errors
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting the number of time intervals when the
watchdog was active.
What: /sys/bus/siox/devices/siox-X-Y/outbytes
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting the outbytes value provided to siox-X/device_add.

View File

@ -65,3 +65,18 @@ Description: Display the ME firmware version.
<platform>:<major>.<minor>.<milestone>.<build_no>.
There can be up to three such blocks for different
FW components.
What: /sys/class/mei/meiN/dev_state
Date: Mar 2019
KernelVersion: 5.1
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Display the ME device state.
The device state can have the following values:
INITIALIZING
INIT_CLIENTS
ENABLED
RESETTING
DISABLED
POWER_DOWN
POWER_UP

View File

@ -1,30 +0,0 @@
What: /sys/class/net/<iface>/batman-adv/elp_interval
Date: Feb 2014
Contact: Linus Lüssing <linus.luessing@web.de>
Description:
Defines the interval in milliseconds in which batman
emits probing packets for neighbor sensing (ELP).
What: /sys/class/net/<iface>/batman-adv/iface_status
Date: May 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Indicates the status of <iface> as it is seen by batman.
What: /sys/class/net/<iface>/batman-adv/mesh_iface
Date: May 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
The /sys/class/net/<iface>/batman-adv/mesh_iface file
displays the batman mesh interface this <iface>
currently is associated with.
What: /sys/class/net/<iface>/batman-adv/throughput_override
Date: Feb 2014
Contact: Antonio Quartulli <a@unstable.cc>
description:
Defines the throughput value to be used by B.A.T.M.A.N. V
when estimating the link throughput using this interface.
If the value is set to 0 then batman-adv will try to
estimate the throughput by itself.

View File

@ -1,108 +0,0 @@
What: /sys/class/net/<mesh_iface>/mesh/aggregated_ogms
Date: May 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Indicates whether the batman protocol messages of the
mesh <mesh_iface> shall be aggregated or not.
What: /sys/class/net/<mesh_iface>/mesh/<vlan_subdir>/ap_isolation
Date: May 2011
Contact: Antonio Quartulli <a@unstable.cc>
Description:
Indicates whether the data traffic going from a
wireless client to another wireless client will be
silently dropped. <vlan_subdir> is empty when referring
to the untagged lan.
What: /sys/class/net/<mesh_iface>/mesh/bonding
Date: June 2010
Contact: Simon Wunderlich <sw@simonwunderlich.de>
Description:
Indicates whether the data traffic going through the
mesh will be sent using multiple interfaces at the
same time (if available).
What: /sys/class/net/<mesh_iface>/mesh/bridge_loop_avoidance
Date: November 2011
Contact: Simon Wunderlich <sw@simonwunderlich.de>
Description:
Indicates whether the bridge loop avoidance feature
is enabled. This feature detects and avoids loops
between the mesh and devices bridged with the soft
interface <mesh_iface>.
What: /sys/class/net/<mesh_iface>/mesh/fragmentation
Date: October 2010
Contact: Andreas Langer <an.langer@gmx.de>
Description:
Indicates whether the data traffic going through the
mesh will be fragmented or silently discarded if the
packet size exceeds the outgoing interface MTU.
What: /sys/class/net/<mesh_iface>/mesh/gw_bandwidth
Date: October 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the bandwidth which is propagated by this
node if gw_mode was set to 'server'.
What: /sys/class/net/<mesh_iface>/mesh/gw_mode
Date: October 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the state of the gateway features. Can be
either 'off', 'client' or 'server'.
What: /sys/class/net/<mesh_iface>/mesh/gw_sel_class
Date: October 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the selection criteria this node will use
to choose a gateway if gw_mode was set to 'client'.
What: /sys/class/net/<mesh_iface>/mesh/hop_penalty
Date: Oct 2010
Contact: Linus Lüssing <linus.luessing@web.de>
Description:
Defines the penalty which will be applied to an
originator message's tq-field on every hop.
What: /sys/class/net/<mesh_iface>/mesh/isolation_mark
Date: Nov 2013
Contact: Antonio Quartulli <a@unstable.cc>
Description:
Defines the isolation mark (and its bitmask) which
is used to classify clients as "isolated" by the
Extended Isolation feature.
What: /sys/class/net/<mesh_iface>/mesh/multicast_mode
Date: Feb 2014
Contact: Linus Lüssing <linus.luessing@web.de>
Description:
Indicates whether multicast optimizations are enabled
or disabled. If set to zero then all nodes in the
mesh are going to use classic flooding for any
multicast packet with no optimizations.
What: /sys/class/net/<mesh_iface>/mesh/network_coding
Date: Nov 2012
Contact: Martin Hundeboll <martin@hundeboll.net>
Description:
Controls whether Network Coding (using some magic
to send fewer wifi packets but still the same
content) is enabled or not.
What: /sys/class/net/<mesh_iface>/mesh/orig_interval
Date: May 2010
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the interval in milliseconds in which batman
sends its protocol messages.
What: /sys/class/net/<mesh_iface>/mesh/routing_algo
Date: Dec 2011
Contact: Marek Lindner <mareklindner@neomailbox.ch>
Description:
Defines the routing protocol this mesh instance
uses to find the optimal paths through the mesh.

View File

@ -11,24 +11,31 @@ Date: February 2014
KernelVersion: 3.15
Contact: netdev@vger.kernel.org
Description:
Boolean value indicating whether the PHY device has
any fixups registered against it (phy_register_fixup)
This attribute contains the boolean value whether a given PHY
device has had any "fixup" workaround running on it, encoded as
a boolean. This information is provided to help troubleshooting
PHY configurations.
What: /sys/class/mdio_bus/<bus>/<device>/phy_id
Date: November 2012
KernelVersion: 3.8
Contact: netdev@vger.kernel.org
Description:
32-bit hexadecimal value corresponding to the PHY device's OUI,
model and revision number.
This attribute contains the 32-bit PHY Identifier as reported
by the device during bus enumeration, encoded in hexadecimal.
This ID is used to match the device with the appropriate
driver.
What: /sys/class/mdio_bus/<bus>/<device>/phy_interface
Date: February 2014
KernelVersion: 3.15
Contact: netdev@vger.kernel.org
Description:
String value indicating the PHY interface, possible
values are:.
This attribute contains the PHY interface as configured by the
Ethernet driver during bus enumeration, encoded in string.
This interface mode is used to configure the Ethernet MAC with the
appropriate mode for its data lines to the PHY hardware.
Possible values are:
<empty> (not available), mii, gmii, sgmii, tbi, rev-mii,
rmii, rgmii, rgmii-id, rgmii-rxid, rgmii-txid, rtbi, smii
xgmii, moca, qsgmii, trgmii, 1000base-x, 2500base-x, rxaui,

View File

@ -29,7 +29,7 @@ Contact: Bjørn Mork <bjorn@mork.no>
Description:
Unsigned integer.
Write a number ranging from 1 to 127 to add a qmap mux
Write a number ranging from 1 to 254 to add a qmap mux
based network device, supported by recent Qualcomm based
modems.
@ -46,5 +46,5 @@ Contact: Bjørn Mork <bjorn@mork.no>
Description:
Unsigned integer.
Write a number ranging from 1 to 127 to delete a previously
Write a number ranging from 1 to 254 to delete a previously
created qmap mux based network device.
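A userspace tool may want to validate a mux ID before writing it to these attributes. This is a hedged sketch, assuming the 1 to 254 range stated above; qmap_parse_mux_id is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal qmap mux ID from a string.
 * Returns the ID (1..254) on success, -1 on malformed or
 * out-of-range input. A single trailing newline is tolerated.
 */
int qmap_parse_mux_id(const char *s)
{
	char *end;
	unsigned long v;

	assert(s != NULL);
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || end == s)
		return -1;	/* overflow or no digits at all */
	if (*end != '\0' && !(*end == '\n' && end[1] == '\0'))
		return -1;	/* trailing garbage */
	if (v < 1 || v > 254)
		return -1;	/* outside the documented range */
	return (int)v;
}
```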

View File

@ -114,15 +114,60 @@ Description:
Access: Read
Valid values: Represented in microamps
What: /sys/class/power_supply/<supply_name>/charge_control_limit
Date: Oct 2012
Contact: linux-pm@vger.kernel.org
Description:
Maximum allowable charging current. Used for charge rate
throttling for thermal cooling or improving battery health.
Access: Read, Write
Valid values: Represented in microamps
What: /sys/class/power_supply/<supply_name>/charge_control_limit_max
Date: Oct 2012
Contact: linux-pm@vger.kernel.org
Description:
Maximum legal value for the charge_control_limit property.
Access: Read
Valid values: Represented in microamps
What: /sys/class/power_supply/<supply_name>/charge_control_start_threshold
Date: April 2019
Contact: linux-pm@vger.kernel.org
Description:
Represents a battery percentage level, below which charging will
begin.
Access: Read, Write
Valid values: 0 - 100 (percent)
What: /sys/class/power_supply/<supply_name>/charge_control_end_threshold
Date: April 2019
Contact: linux-pm@vger.kernel.org
Description:
Represents a battery percentage level, above which charging will
stop.
Access: Read, Write
Valid values: 0 - 100 (percent)
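The two thresholds together describe a hysteresis band. The sketch below illustrates one plausible reading, assuming that the charging state is simply kept while the level sits between the thresholds; that behaviour, and the name should_charge, are assumptions rather than anything the ABI text guarantees.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy sketch. All values are battery percentages.
 * Below the start threshold charging begins, above the end threshold
 * it stops; in between, the current state is kept (assumption).
 */
bool should_charge(int level, int start_threshold, int end_threshold,
		   bool charging_now)
{
	assert(level >= 0 && level <= 100);
	assert(start_threshold >= 0 && end_threshold <= 100);
	assert(start_threshold <= end_threshold);

	if (level < start_threshold)
		return true;	/* below start threshold: begin charging */
	if (level > end_threshold)
		return false;	/* above end threshold: stop charging */
	return charging_now;	/* inside the band: no change */
}
```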
What: /sys/class/power_supply/<supply_name>/charge_type
Date: July 2009
Contact: linux-pm@vger.kernel.org
Description:
Represents the type of charging currently being applied to the
battery.
battery. "Trickle", "Fast", and "Standard" all mean different
charging speeds. "Adaptive" means that the charger uses some
algorithm to adjust the charge rate dynamically, without
any user configuration required. "Custom" means that the charger
uses the charge_control_* properties as configuration for some
different algorithm.
Access: Read
Valid values: "Unknown", "N/A", "Trickle", "Fast"
Access: Read, Write
Valid values: "Unknown", "N/A", "Trickle", "Fast", "Standard",
"Adaptive", "Custom"
What: /sys/class/power_supply/<supply_name>/charge_term_current
Date: July 2014

View File

@ -212,7 +212,7 @@ Description:
Messages may be broken into parts if
they are long.
receieved_messages: (RO) Number of message responses
received_messages: (RO) Number of message responses
received.
received_message_parts: (RO) Number of message fragments

View File

@ -484,6 +484,7 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/spectre_v2
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/l1tf
/sys/devices/system/cpu/vulnerabilities/mds
Date: January 2018
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Information about CPU vulnerabilities
@ -496,8 +497,7 @@ Description: Information about CPU vulnerabilities
"Vulnerable" CPU is affected and no mitigation in effect
"Mitigation: $M" CPU is affected and mitigation $M is in effect
Details about the l1tf file can be found in
Documentation/admin-guide/l1tf.rst
See also: Documentation/admin-guide/hw-vuln/index.rst
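A monitoring tool could classify the contents of these files by prefix. This is a minimal sketch, assuming the status strings "Not affected", "Vulnerable" and "Mitigation: $M" used by these files; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum vuln_state {
	VULN_NOT_AFFECTED,
	VULN_VULNERABLE,
	VULN_MITIGATED,
	VULN_UNKNOWN,
};

/*
 * Hypothetical helper: classify one line read from a file under
 * /sys/devices/system/cpu/vulnerabilities/. If mitigation is not
 * NULL, it is set to the text after "Mitigation: " or to NULL.
 */
enum vuln_state classify_vuln(const char *line, const char **mitigation)
{
	static const char not_affected[] = "Not affected";
	static const char vulnerable[] = "Vulnerable";
	static const char mitigated[] = "Mitigation: ";

	assert(line != NULL);
	if (mitigation)
		*mitigation = NULL;
	if (strncmp(line, not_affected, sizeof(not_affected) - 1) == 0)
		return VULN_NOT_AFFECTED;
	if (strncmp(line, mitigated, sizeof(mitigated) - 1) == 0) {
		if (mitigation)
			*mitigation = line + sizeof(mitigated) - 1;
		return VULN_MITIGATED;
	}
	if (strncmp(line, vulnerable, sizeof(vulnerable) - 1) == 0)
		return VULN_VULNERABLE;
	return VULN_UNKNOWN;
}
```

Prefix matching is used because a file may append details after the status keyword.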
What: /sys/devices/system/cpu/smt
/sys/devices/system/cpu/smt/active

View File

@ -0,0 +1,6 @@
What: /sys/bus/i2c/drivers/ucsi_ccg/.../do_flash
Date: May 2019
Contact: Ajay Gupta <ajayg@nvidia.com>
Description:
Tell the driver for Cypress CCGx Type-C controller to attempt
firmware upgrade by writing [Yy1] to the file.

View File

@ -45,7 +45,7 @@ Description:
use this feature without a clearance from a patch
distributor. Removal (rmmod) of patch modules is permanently
disabled when the feature is used. See
Documentation/livepatch/livepatch.txt for more information.
Documentation/livepatch/livepatch.rst for more information.
What: /sys/kernel/livepatch/<patch>/<object>
Date: Nov 2014

View File

@ -0,0 +1,27 @@
What: Raise a uevent when a USB Host Controller has died
Date: 2019-04-17
KernelVersion: 5.2
Contact: linux-usb@vger.kernel.org
Description: When the USB Host Controller has entered a state where it is no
longer functional, a uevent will be raised. The uevent will
contain ACTION=offline and ERROR=DEAD.
Here is an example taken using udevadm monitor -p:
KERNEL[130.428945] offline /devices/pci0000:00/0000:00:10.0/usb2 (usb)
ACTION=offline
BUSNUM=002
DEVNAME=/dev/bus/usb/002/001
DEVNUM=001
DEVPATH=/devices/pci0000:00/0000:00:10.0/usb2
DEVTYPE=usb_device
DRIVER=usb
ERROR=DEAD
MAJOR=189
MINOR=128
PRODUCT=1d6b/2/414
SEQNUM=2168
SUBSYSTEM=usb
TYPE=9/0/1
Users: chromium-os-dev@chromium.org
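A uevent consumer can detect this condition by looking for both properties. The sketch below assumes the event's properties are available as newline-separated KEY=VALUE text, as in the udevadm output above; usb_hc_died and has_property are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if text contains a line that is exactly prop ("KEY=VALUE"). */
static bool has_property(const char *text, const char *prop)
{
	size_t len = strlen(prop);
	const char *p = text;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t n = eol ? (size_t)(eol - p) : strlen(p);

		if (n == len && strncmp(p, prop, len) == 0)
			return true;
		if (!eol)
			break;
		p = eol + 1;
	}
	return false;
}

/* Hypothetical check for the "host controller died" uevent. */
bool usb_hc_died(const char *uevent_text)
{
	assert(uevent_text != NULL);
	return has_property(uevent_text, "ACTION=offline") &&
	       has_property(uevent_text, "ERROR=DEAD");
}
```

Whole-line comparison avoids false matches on keys or values that merely share a prefix.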

View File

@ -147,7 +147,7 @@ networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.
DMA addressing capabilities
==========================
===========================
By default, the kernel assumes that your device can address 32-bits of DMA
addressing. For a 64-bit capable device, this needs to be increased, and for
@ -365,13 +365,12 @@ __get_free_pages() (but takes size instead of a page order). If your
driver needs regions sized smaller than a page, you may prefer using
the dma_pool interface, described below.
The consistent DMA mapping interfaces, for non-NULL dev, will by
default return a DMA address which is 32-bit addressable. Even if the
device indicates (via DMA mask) that it may address the upper 32-bits,
consistent allocation will only return > 32-bit addresses for DMA if
the consistent DMA mask has been explicitly changed via
dma_set_coherent_mask(). This is true of the dma_pool interface as
well.
The consistent DMA mapping interfaces will by default return a DMA address
which is 32-bit addressable. Even if the device indicates (via the DMA mask)
that it may address the upper 32-bits, consistent allocation will only
return > 32-bit addresses for DMA if the consistent DMA mask has been
explicitly changed via dma_set_coherent_mask(). This is true of the
dma_pool interface as well.
dma_alloc_coherent() returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the

View File

@ -28,8 +28,13 @@ ifeq ($(HAVE_SPHINX),0)
else # HAVE_SPHINX
# User-friendly check for pdflatex
# User-friendly check for pdflatex and latexmk
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
ifeq ($(HAVE_LATEXMK),1)
PDFLATEX := latexmk -$(PDFLATEX)
endif #HAVE_LATEXMK
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
@ -82,7 +87,7 @@ pdfdocs:
else # HAVE_PDFLATEX
pdfdocs: latexdocs
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
endif # HAVE_PDFLATEX

View File

@ -63,6 +63,110 @@ as well as medium and long term trends. The total absolute stall time
spikes which wouldn't necessarily make a dent in the time averages,
or to average trends over custom time frames.
Monitoring for pressure thresholds
==================================
Users can register triggers and use poll() to be woken up when resource
pressure exceeds certain thresholds.
A trigger describes the maximum cumulative stall time over a specific
time window, e.g. 100ms of total stall time within any 500ms window to
generate a wakeup event.
To register a trigger, the user has to open the psi interface file under
/proc/pressure/ representing the resource to be monitored and write the
desired threshold and time window to it. The open file descriptor should
then be used to wait for trigger events using select(), poll() or epoll().
The following format is used:
<some|full> <stall amount in us> <time window in us>
For example, writing "some 150000 1000000" into /proc/pressure/memory
would add a 150ms threshold for partial memory stall measured within a
1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
would add a 50ms threshold for full io stall measured within a 1sec time window.
Triggers can be set on more than one psi metric and more than one trigger
for the same psi metric can be specified. However for each trigger a separate
file descriptor is required to be able to poll it separately from others,
therefore for each trigger a separate open() syscall should be made even
when opening the same psi interface file.
Monitors activate only when the system enters a stall state for the monitored
psi metric and deactivate upon exit from the stall state. While the system is
in the stall state, psi signal growth is monitored at a rate of 10 times per
tracking window.
The kernel accepts window sizes ranging from 500ms to 10s, so the minimum
monitoring update interval is 50ms and the maximum is 1s. The minimum is set to
prevent overly frequent polling. The maximum is chosen as a high enough number
after which monitors are most likely not needed and psi averages can be used
instead.
When activated, a psi monitor stays active for at least the duration of one
tracking window to avoid repeated activations/deactivations when the system is
bouncing in and out of the stall state.
Notifications to the userspace are rate-limited to one per tracking window.
The trigger will de-register when the file descriptor used to define the
trigger is closed.
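The trigger format and window limits described above can be checked before anything is written to a psi file. This is a sketch under stated assumptions: the helper name psi_format_trigger is hypothetical, and the rule that the stall threshold must be non-zero and no larger than the window is an assumption not spelled out in this text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PSI_WINDOW_MIN_US   500000UL	/* 500ms, from the text above */
#define PSI_WINDOW_MAX_US 10000000UL	/* 10s, from the text above */

/*
 * Hypothetical helper: format "<some|full> <stall us> <window us>"
 * into buf. Returns the string length on success, or -1 if the
 * arguments are out of range or the buffer is too small.
 */
int psi_format_trigger(char *buf, size_t size, int full,
		       unsigned long stall_us, unsigned long window_us)
{
	int n;

	assert(buf != NULL);
	if (window_us < PSI_WINDOW_MIN_US || window_us > PSI_WINDOW_MAX_US)
		return -1;
	if (stall_us == 0 || stall_us > window_us)
		return -1;	/* assumption: threshold must fit the window */

	n = snprintf(buf, size, "%s %lu %lu", full ? "full" : "some",
		     stall_us, window_us);
	if (n < 0 || (size_t)n >= size)
		return -1;	/* output error or truncation */
	return n;
}
```

The resulting string is what would be passed to write() on a freshly opened /proc/pressure/ file, one open file descriptor per trigger.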
Userspace monitor usage example
===============================
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/*
 * Monitor memory partial stall with 1s tracking window size
 * and 150ms threshold.
 */
int main() {
	const char trig[] = "some 150000 1000000";
	struct pollfd fds;
	int n;

	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	if (fds.fd < 0) {
		printf("/proc/pressure/memory open error: %s\n",
			strerror(errno));
		return 1;
	}
	fds.events = POLLPRI;

	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
		printf("/proc/pressure/memory write error: %s\n",
			strerror(errno));
		return 1;
	}

	printf("waiting for events...\n");
	while (1) {
		n = poll(&fds, 1, -1);
		if (n < 0) {
			printf("poll error: %s\n", strerror(errno));
			return 1;
		}
		if (fds.revents & POLLERR) {
			printf("got POLLERR, event source is gone\n");
			return 0;
		}
		if (fds.revents & POLLPRI) {
			printf("event triggered!\n");
		} else {
			printf("unknown event received: 0x%x\n", fds.revents);
			return 1;
		}
	}

	return 0;
}
Cgroup2 interface
=================
@ -71,3 +175,6 @@ mounted, pressure stall information is also tracked for tasks grouped
into cgroups. Each subdirectory in the cgroupfs mountpoint contains
cpu.pressure, memory.pressure, and io.pressure files; the format is
the same as the /proc/pressure/ files.
Per-cgroup psi monitors can be specified and used the same way as
system-wide ones.

View File

@ -0,0 +1,99 @@
Describing and referring to LEDs in ACPI
Individual LEDs are described by hierarchical data extension [6] nodes under the
device node, the LED driver chip. The "reg" property in the LED specific nodes
tells the numerical ID of each individual LED output to which the LEDs are
connected. [3] The hierarchical data nodes are named "led@X", where X is the
number of the LED output.
Referring to LEDs in Device tree is documented in [4], in "flash-leds" property
documentation. In short, LEDs are directly referred to by using phandles.
While Device tree allows referring to any node in the tree[1], in ACPI
references are limited to device nodes only [2]. For this reason using the same
mechanism on ACPI is not possible. A mechanism to refer to non-device ACPI nodes
is documented in [7].
ACPI allows (as does DT) using integer arguments after the reference. A
combination of the LED driver device reference and an integer argument,
referring to the "reg" property of the relevant LED, is used to identify
individual LEDs. The value of the "reg" property is a contract between the
firmware and software, it uniquely identifies the LED driver outputs.
Under the LED driver device, the first hierarchical data extension package list
entry shall contain the string "led@" followed by the number of the LED,
followed by the referred object name. That object shall be named "LED" followed
by the number of the LED.
An ASL example of a camera sensor device and a LED driver device for two LEDs.
Objects not relevant for LEDs or the references to them have been omitted.
Device (LED)
{
	Name (_DSD, Package () {
		ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
		Package () {
			Package () { "led@0", LED0 },
			Package () { "led@1", LED1 },
		}
	})
	Name (LED0, Package () {
		ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
		Package () {
			Package () { "reg", 0 },
			Package () { "flash-max-microamp", 1000000 },
			Package () { "flash-timeout-us", 200000 },
			Package () { "led-max-microamp", 100000 },
			Package () { "label", "white:flash" },
		}
	})
	Name (LED1, Package () {
		ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
		Package () {
			Package () { "reg", 1 },
			Package () { "led-max-microamp", 10000 },
			Package () { "label", "red:indicator" },
		}
	})
}

Device (SEN)
{
	Name (_DSD, Package () {
		ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
		Package () {
			Package () {
				"flash-leds",
				Package () { ^LED, "led@0", ^LED, "led@1" },
			}
		}
	})
}
where
LED LED driver device
LED0 First LED
LED1 Second LED
SEN Camera sensor device (or another device the LED is
related to)
[1] Device tree. <URL:http://www.devicetree.org>, referenced 2019-02-21.
[2] Advanced Configuration and Power Interface Specification.
<URL:https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf>,
referenced 2019-02-21.
[3] Documentation/devicetree/bindings/leds/common.txt
[4] Documentation/devicetree/bindings/media/video-interfaces.txt
[5] Device Properties UUID For _DSD.
<URL:http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf>,
referenced 2019-02-21.
[6] Hierarchical Data Extension UUID For _DSD.
<URL:http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
referenced 2019-02-21.
[7] Documentation/acpi/dsd/data-node-reference.txt

View File

@ -177,6 +177,15 @@ cgroup v2 currently supports the following mount options.
ignored on non-init namespace mounts. Please refer to the
Delegation section for details.
memory_localevents
Only populate memory.events with data for the current cgroup,
and not any subtrees. This is legacy behaviour; the default
behaviour without this option is to include subtree counts.
This option is system wide and can only be set on mount or
modified through remount from the init namespace. The mount
option is ignored on non-init namespace mounts.
Organizing Processes and Threads
--------------------------------
@ -864,6 +873,8 @@ All cgroup core files are prefixed with "cgroup."
populated
1 if the cgroup or its descendants contains any live
processes; otherwise, 0.
frozen
1 if the cgroup is frozen; otherwise, 0.
cgroup.max.descendants
A read-write single value file. The default is "max".
@ -897,6 +908,31 @@ All cgroup core files are prefixed with "cgroup."
A dying cgroup can consume system resources not exceeding
limits, which were active at the moment of cgroup deletion.
cgroup.freeze
A read-write single value file which exists on non-root cgroups.
Allowed values are "0" and "1". The default is "0".
Writing "1" to the file causes freezing of the cgroup and all
descendant cgroups. This means that all processes belonging to them will
be stopped and will not run until the cgroup is explicitly
unfrozen. Freezing of the cgroup may take some time; when this action
is completed, the "frozen" value in the cgroup.events control file
will be updated to "1" and the corresponding notification will be
issued.
A cgroup can be frozen either by its own settings, or by settings
of any ancestor cgroups. If any ancestor cgroup is frozen, the
cgroup will remain frozen.
Processes in the frozen cgroup can be killed by a fatal signal.
They also can enter and leave a frozen cgroup: either by an explicit
move by a user, or if freezing of the cgroup races with fork().
If a process is moved to a frozen cgroup, it stops. If a process is
moved out of a frozen cgroup, it becomes running.
Frozen status of a cgroup doesn't affect any cgroup tree operations:
it's possible to delete a frozen (and empty) cgroup, as well as
create new sub-cgroups.
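The freeze sequence described above can be sketched as follows. This is a minimal illustration, not kernel-provided tooling: the helper names `parse_events` and `set_frozen` are hypothetical, the cgroup directory is whatever path the caller passes in, and polling is used only for simplicity where a real tool would more likely wait for the notification on cgroup.events.

```python
import os
import time


def parse_events(text):
    """Parse the flat "key value" lines of cgroup.events into a dict."""
    events = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key and value:
            events[key] = int(value)
    return events


def set_frozen(cgroup_dir, freeze, timeout=5.0, poll=0.05):
    """Write "1" or "0" to cgroup.freeze, then poll cgroup.events.

    Returns True once the "frozen" value matches the request, or False
    if that did not happen within `timeout` seconds. Freezing may take
    some time, hence the wait loop.
    """
    wanted = 1 if freeze else 0
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write(str(wanted))
    deadline = time.monotonic() + timeout
    while True:
        with open(os.path.join(cgroup_dir, "cgroup.events")) as f:
            if parse_events(f.read()).get("frozen") == wanted:
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Because a cgroup can also be frozen by an ancestor, writing "0" may legitimately leave "frozen" at 1; in that case `set_frozen(path, False)` returns False rather than raising.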
Controllers
===========

View File

@ -91,10 +91,48 @@ Currently Available
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
the ordering)
* Case-insensitive file name lookups
[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.
case-insensitive file name lookups
======================================================
The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem. It is enabled by
flipping the +F inode attribute of an empty directory. The
case-insensitive string match operation is only defined when we know how
text is encoded in a byte sequence. For that reason, in order to enable
case-insensitive directories, the filesystem must have the
casefold feature, which stores the filesystem-wide encoding
model used. By default, the charset adopted is the latest version of
Unicode (12.1.0, by the time of this writing), encoded in the UTF-8
form. The comparison algorithm is implemented by normalizing the
strings to the Canonical decomposition form, as defined by Unicode,
followed by a byte per byte comparison.
The case-awareness is name-preserving on the disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written to the disk. The Unicode normalization format used by the
kernel is thus an internal representation, and not exposed to the
userspace nor to the disk, with the important exception of disk hashes,
used on large case-insensitive directories with DX feature. On DX
directories, the hash must be calculated using the casefolded version of
the filename, meaning that the normalization format used actually has an
impact on where the directory entry is stored.
When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings we need to address what happens when a program
tries to create a file with an invalid name. The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which selects its preferred behavior by enabling/disabling
the strict mode. When Ext4 encounters one of those strings and the
filesystem did not require strict mode, it falls back to considering the
entire string as an opaque byte sequence, which still allows the user to
operate on that file, but the case-insensitive lookups won't work.
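The comparison and the non-strict fallback can be approximated in user space as shown below. This is only a sketch under stated assumptions: it uses Python's `unicodedata` tables and `str.casefold()`, which are not the kernel's own UTF-8 tables and may be based on a different Unicode version, and the helper names `fold_name` and `names_match` are made up for illustration.

```python
import unicodedata


def fold_name(name: bytes) -> bytes:
    """Return a comparison key for a file name given as raw bytes."""
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid encoding: fall back to treating the whole name as an
        # opaque byte sequence, so only an exact byte match succeeds.
        return name
    # Canonical decomposition, case folding, then decomposition again in
    # case folding produced characters that are not in NFD.
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: bytes, b: bytes) -> bool:
    """Byte-per-byte comparison of the folded, normalized names."""
    return fold_name(a) == fold_name(b)
```

The name stored on disk stays exactly as userspace supplied it; only the key used for comparison is folded.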
Options
=======

View File

@ -0,0 +1,13 @@
========================
Hardware vulnerabilities
========================
This section describes CPU vulnerabilities and provides an overview of the
possible mitigations along with guidance for selecting mitigations if they
are configurable at compile, boot or run time.
.. toctree::
   :maxdepth: 1

   l1tf
   mds

View File

@ -0,0 +1,615 @@
L1TF - L1 Terminal Fault
========================
L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data which is available in the Level 1 Data Cache
when the page table entry controlling the virtual address, which is used
for the access, has the Present bit cleared or other reserved bits set.
Affected processors
-------------------
This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:
- Processors from AMD, Centaur and other non Intel vendors
- Older processor models, where the CPU family is < 6
- A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
Penwell, Pineview, Silvermont, Airmont, Merrifield)
- The Intel XEON PHI family
- Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
by the Meltdown vulnerability either. These CPUs should become
available by end of 2018.
Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
Related CVEs
------------
The following CVE entries are related to the L1TF vulnerability:
=============  =================  ==============================
CVE-2018-3615  L1 Terminal Fault  SGX related aspects
CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
=============  =================  ==============================
Problem
-------
If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE was still present and accessible.
While this is a purely speculative mechanism and the instruction will raise
a page fault when it is retired eventually, the pure act of loading the
data and making it available to other speculative instructions opens up the
opportunity for side channel attacks to unprivileged malicious code,
similar to the Meltdown attack.
While Meltdown breaks the user space to kernel space protection, L1TF
allows attacks on any physical memory address in the system, and the attack
works across all protection domains. It allows an attack on SGX and also
works from inside virtual machines because the speculation bypasses the
extended page table (EPT) protection mechanism.
Attack scenarios
----------------
1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^
Operating Systems store arbitrary information in the address bits of a
PTE which is marked non present. This allows a malicious user space
application to attack the physical memory to which these PTEs resolve.
In some cases user-space can maliciously influence the information
encoded in the address bits of the PTE, thus making attacks more
deterministic and more practical.
The Linux kernel contains a mitigation for this attack vector, PTE
inversion, which is permanently enabled and has no performance
impact. The kernel ensures that the address bits of PTEs, which are not
marked present, never point to cacheable physical memory space.
A system with an up to date kernel is protected against attacks from
malicious user space applications.
2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The fact that L1TF breaks all domain protections allows malicious guest
OSes, which can control the PTEs directly, and malicious guest user
space applications, which run on an unprotected guest kernel lacking the
PTE inversion mitigation for L1TF, to attack physical host memory.
A special aspect of L1TF in the context of virtualization is symmetric
multi threading (SMT). The Intel implementation of SMT is called
HyperThreading. The fact that Hyperthreads on the affected processors
share the L1 Data Cache (L1D) is important for this. As the flaw only
allows attacks on data which is present in L1D, a malicious guest running
on one Hyperthread can attack the data which is brought into the L1D by
the context which runs on the sibling Hyperthread of the same physical
core. This context can be host OS, host user space or a different guest.
If the processor does not support Extended Page Tables, the attack is
only possible, when the hypervisor does not sanitize the content of the
effective (shadow) page tables.
While solutions exist to mitigate these attack vectors fully, these
mitigations are not enabled by default in the Linux kernel because they
can affect performance significantly. The kernel provides several
mechanisms which can be utilized to address the problem depending on the
deployment scenario. The mitigations, their protection scope and impact
are described in the next sections.
The default mitigations and the rationale for choosing them are explained
at the end of this document. See :ref:`default_mitigations`.
.. _l1tf_sys_info:
L1TF system information
-----------------------
The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:
/sys/devices/system/cpu/vulnerabilities/l1tf
The possible values in this file are:
===========================  ===============================
'Not affected'               The processor is not vulnerable
'Mitigation: PTE Inversion'  The host protection is active
===========================  ===============================
If KVM/VMX is enabled and the processor is vulnerable then the following
information is appended to the 'Mitigation: PTE Inversion' part:
- SMT status:
=====================  ================
'VMX: SMT vulnerable'  SMT is enabled
'VMX: SMT disabled'    SMT is disabled
=====================  ================
- L1D Flush mode:
================================  ====================================
'L1D vulnerable'                  L1D flushing is disabled
'L1D conditional cache flushes'   L1D flush is conditionally enabled
'L1D cache flushes'               L1D flush is unconditionally enabled
================================  ====================================
The resulting grade of protection is discussed in the following sections.
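A monitoring script could summarize the L1TF sysfs file roughly as follows. This is a hedged sketch: the function names are hypothetical, and because the exact punctuation and ordering of the appended fields is not spelled out above, the code only looks for the documented phrases as substrings instead of parsing a fixed layout.

```python
L1TF_SYSFS = "/sys/devices/system/cpu/vulnerabilities/l1tf"


def summarize_l1tf(status: str) -> dict:
    """Extract the documented pieces of information from the status text."""
    status = status.strip()
    # Test the longer phrase first: "cache flushes" is a substring of
    # "conditional cache flushes".
    if "conditional cache flushes" in status:
        flush = "cond"
    elif "cache flushes" in status:
        flush = "always"
    else:
        flush = None  # not reported, or L1D flushing disabled
    if "SMT vulnerable" in status:
        smt = "vulnerable"
    elif "SMT disabled" in status:
        smt = "disabled"
    else:
        smt = None  # KVM/VMX information not appended
    return {
        "affected": status != "Not affected",
        "pte_inversion": "Mitigation: PTE Inversion" in status,
        "smt": smt,
        "l1d_flush": flush,
    }


def read_l1tf(path: str = L1TF_SYSFS) -> dict:
    """Read the sysfs file and summarize it."""
    with open(path) as f:
        return summarize_l1tf(f.read())
```

For example, a line such as `Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable` (an assumed layout, used here only for illustration) would be summarized as PTE inversion active, conditional L1D flushing, SMT vulnerable.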
Host mitigation mechanism
-------------------------
The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.
Guest mitigation mechanisms
---------------------------
.. _l1d_flush:
1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^
To make sure that a guest cannot attack data which is present in the L1D
the hypervisor flushes the L1D before entering the guest.
Flushing the L1D evicts not only the data which should not be accessed
by a potentially malicious guest, it also flushes the guest
data. Flushing the L1D has a performance impact as the processor has to
bring the flushed guest data back into the L1D. Depending on the
frequency of VMEXIT/VMENTER and the type of computations in the guest
performance degradation in the range of 1% to 50% has been observed. For
scenarios where guest VMEXIT/VMENTER are rare the performance impact is
minimal. Virtio and mechanisms like posted interrupts are designed to
confine the VMEXITs to a bare minimum, but specific configurations and
application scenarios might still suffer from a high VMEXIT rate.
The kernel provides two L1D flush modes:
- conditional ('cond')
- unconditional ('always')
The conditional mode avoids L1D flushing after VMEXITs which execute
only audited code paths before the corresponding VMENTER. These code
paths have been verified not to expose secrets or other
interesting data to an attacker, but they can leak information about the
address space layout of the hypervisor.
Unconditional mode flushes L1D on all VMENTER invocations and provides
maximum protection. It has a higher overhead than the conditional
mode. The overhead cannot be quantified correctly as it depends on the
workload scenario and the resulting number of VMEXITs.
The general recommendation is to enable L1D flush on VMENTER. The kernel
defaults to conditional mode on affected processors.
**Note**, that L1D flush does not prevent the SMT problem because the
sibling thread will also bring back its data into the L1D which makes it
attackable again.
L1D flush can be controlled by the administrator via the kernel command
line and sysfs control files. See :ref:`mitigation_control_command_line`
and :ref:`mitigation_control_kvm`.
.. _guest_confinement:
2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To address the SMT problem, it is possible to make a guest or a group of
guests affine to one or more physical cores. The proper mechanism for
that is to utilize exclusive cpusets to ensure that no other guest or
host tasks can run on these cores.
If only a single guest or related guests run on sibling SMT threads on
the same physical core then they can only attack their own memory and
restricted parts of the host memory.
Host memory is attackable, when one of the sibling SMT threads runs in
host OS (hypervisor) context and the other in guest context. The amount
of valuable information from the host OS context depends on the context
which the host OS executes, i.e. interrupts, soft interrupts and kernel
threads. The amount of valuable data from these contexts cannot be
declared as non-interesting for an attacker without deep inspection of
the code.
**Note**, that assigning guests to a fixed set of physical cores affects
the ability of the scheduler to do load balancing and might have
negative effects on CPU utilization depending on the hosting
scenario. Disabling SMT might be a viable alternative for particular
scenarios.
For further information about confining guests to a single or to a group
of cores consult the cpusets documentation:
https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
.. _interrupt_isolation:
3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^
Interrupts can be made affine to logical CPUs. This is not universally
true because there are types of interrupts which are truly per CPU
interrupts, e.g. the local timer interrupt. Aside from that, multi-queue
devices affine their interrupts to single CPUs or groups of CPUs per
queue without allowing the administrator to control the affinities.
Moving the interrupts, which can be affinity controlled, away from CPUs
which run untrusted guests, reduces the attack vector space.
Whether the interrupts which are affine to CPUs, which run untrusted
guests, provide interesting data for an attacker depends on the system
configuration and the scenarios which run on the system. While for some
of the interrupts it can be assumed that they won't expose interesting
information beyond exposing hints about the host OS memory layout, there
is no way to make general assumptions.
Interrupt affinity can be controlled by the administrator via the
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
available at:
https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
.. _smt_control:
4. SMT control
^^^^^^^^^^^^^^
To prevent the SMT issues of L1TF it might be necessary to disable SMT
completely. Disabling SMT can have a significant performance impact, but
the impact depends on the hosting scenario and the type of workloads.
The impact of disabling SMT also needs to be weighed against the impact
of other mitigation solutions like confining guests to dedicated cores.
The kernel provides a sysfs interface to retrieve the status of SMT and
to control it. It also provides a kernel command line interface to
control SMT.
The kernel command line interface consists of the following options:
===========  ==========================================================
nosmt        Affects the bring up of the secondary CPUs during boot. The
             kernel tries to bring all present CPUs online during the
             boot process. "nosmt" makes sure that from each physical
             core only one - the so called primary (hyper) thread - is
             activated. Due to a design flaw of Intel processors related
             to Machine Check Exceptions the non primary siblings have
             to be brought up at least partially and are then shut down
             again. "nosmt" can be undone via the sysfs interface.

nosmt=force  Has the same effect as "nosmt" but it does not allow
             undoing the SMT disable via the sysfs interface.
===========  ==========================================================
The sysfs interface provides two files:
- /sys/devices/system/cpu/smt/control
- /sys/devices/system/cpu/smt/active
/sys/devices/system/cpu/smt/control:
This file exposes the SMT control state and provides the
ability to disable or (re)enable SMT. The possible states are:
==============  ===================================================
on              SMT is supported by the CPU and enabled. All
                logical CPUs can be onlined and offlined without
                restrictions.

off             SMT is supported by the CPU and disabled. Only
                the so called primary SMT threads can be onlined
                and offlined without restrictions. An attempt to
                online a non-primary sibling is rejected.

forceoff        Same as 'off' but the state cannot be controlled.
                Attempts to write to the control file are rejected.

notsupported    The processor does not support SMT. It's therefore
                not affected by the SMT implications of L1TF.
                Attempts to write to the control file are rejected.
==============  ===================================================
The possible states which can be written into this file to control SMT
state are:
- on
- off
- forceoff
/sys/devices/system/cpu/smt/active:
This file reports whether SMT is enabled and active, i.e. if on any
physical core two or more sibling threads are online.
SMT control is also possible at boot time via the l1tf kernel command
line parameter in combination with L1D flush control. See
:ref:`mitigation_control_command_line`.
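The rules for the SMT control file can be captured in a small helper. The sketch below is illustrative only; `smt_write_allowed` and `set_smt` are hypothetical names, and the pre-check merely mirrors the state descriptions above, since the kernel itself is the final authority on whether a write is accepted.

```python
SMT_DIR = "/sys/devices/system/cpu/smt"

# States which may be written into the control file.
WRITABLE_STATES = ("on", "off", "forceoff")
# States in which writes to the control file are rejected.
LOCKED_STATES = ("forceoff", "notsupported")


def smt_write_allowed(current: str, requested: str) -> bool:
    """Return True if writing `requested` is expected to be accepted."""
    return requested in WRITABLE_STATES and current not in LOCKED_STATES


def set_smt(requested: str, smt_dir: str = SMT_DIR) -> None:
    """Change the SMT control state after a local sanity check."""
    with open(f"{smt_dir}/control") as f:
        current = f.read().strip()
    if not smt_write_allowed(current, requested):
        raise ValueError(f"cannot switch SMT from {current!r} to {requested!r}")
    with open(f"{smt_dir}/control", "w") as f:
        f.write(requested)


def smt_active(smt_dir: str = SMT_DIR) -> bool:
    """Report whether any core has two or more sibling threads online."""
    with open(f"{smt_dir}/active") as f:
        return f.read().strip() == "1"
```

Note that `smt_active` assumes the active file contains "1" or "0"; the text above only says that it reports whether SMT is enabled and active.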
5. Disabling EPT
^^^^^^^^^^^^^^^^
Disabling EPT for virtual machines provides full mitigation for L1TF even
with SMT enabled, because the effective page tables for guests are
managed and sanitized by the hypervisor. However, disabling EPT has a
significant performance impact, especially when the Meltdown mitigation
KPTI is enabled.
EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.
.. _mitigation_control_command_line:
Mitigation control on the kernel command line
---------------------------------------------
The kernel command line allows controlling the L1TF mitigations at boot
time with the option "l1tf=". The valid arguments for this option are:
============  =============================================================
full          Provides all available mitigations for the L1TF
              vulnerability. Disables SMT and enables all mitigations in
              the hypervisors, i.e. unconditional L1D flushing.

              SMT control and L1D flush control via the sysfs interface
              is still possible after boot. Hypervisors will issue a
              warning when the first VM is started in a potentially
              insecure configuration, i.e. SMT enabled or L1D flush
              disabled.

full,force    Same as 'full', but disables SMT and L1D flush runtime
              control. Implies the 'nosmt=force' command line option.
              (i.e. sysfs control of SMT is disabled.)

flush         Leaves SMT enabled and enables the default hypervisor
              mitigation, i.e. conditional L1D flushing.

              SMT control and L1D flush control via the sysfs interface
              is still possible after boot. Hypervisors will issue a
              warning when the first VM is started in a potentially
              insecure configuration, i.e. SMT enabled or L1D flush
              disabled.

flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
              i.e. conditional L1D flushing.

              SMT control and L1D flush control via the sysfs interface
              is still possible after boot. Hypervisors will issue a
              warning when the first VM is started in a potentially
              insecure configuration, i.e. SMT enabled or L1D flush
              disabled.

flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
              started in a potentially insecure configuration.

off           Disables hypervisor mitigations and doesn't emit any
              warnings.
              It also drops the swap size and available RAM limit restrictions
              on both hypervisor and bare metal.
============  =============================================================
The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
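The option table can be condensed into a lookup, sketched below. Several things here are assumptions for illustration and not kernel behaviour: the helper name `l1tf_option`, the choice to let the last `l1tf=` token win, raising on unknown arguments, and recording 'off' as leaving SMT untouched (the table does not mention SMT for that option).

```python
# Option -> (SMT disabled at boot, L1D flush mode), as read from the table.
# None as flush mode means the hypervisor mitigation is disabled.
L1TF_BOOT_EFFECT = {
    "full": (True, "always"),
    "full,force": (True, "always"),
    "flush": (False, "cond"),
    "flush,nosmt": (True, "cond"),
    "flush,nowarn": (False, "cond"),
    "off": (False, None),  # SMT handling assumed, not stated in the table
}
DEFAULT_L1TF = "flush"


def l1tf_option(cmdline: str) -> str:
    """Return the l1tf= argument found on a kernel command line string."""
    value = DEFAULT_L1TF
    for token in cmdline.split():
        if token.startswith("l1tf="):
            value = token[len("l1tf="):]  # assumption: last one wins
    if value not in L1TF_BOOT_EFFECT:
        raise ValueError(f"unknown l1tf= argument: {value!r}")
    return value
```

A caller could pass the contents of /proc/cmdline to `l1tf_option` and look up the result in `L1TF_BOOT_EFFECT` to see what the boot-time selection implies.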
.. _mitigation_control_kvm:
Mitigation control for KVM - module parameter
-------------------------------------------------------------
The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.
The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:
============  ==============================================================
always        L1D cache flush on every VMENTER.

cond          Flush L1D on VMENTER only when the code between VMEXIT and
              VMENTER can leak host memory which is considered
              interesting for an attacker. This still can leak host memory
              which allows e.g. determining the host's address space layout.

never         Disables the mitigation.
============  ==============================================================
The parameter can be provided on the kernel command line, as a module
parameter when loading the modules and at runtime modified via the sysfs
file:
/sys/module/kvm_intel/parameters/vmentry_l1d_flush
The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.
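The interaction between the module parameter and 'l1tf=full,force' can be sketched as below. The function names are hypothetical, and only the two rules stated above are modelled (the 'cond' default and the enforcement of 'always'); how the other l1tf= arguments influence the default is deliberately left out.

```python
VMENTRY_L1D_FLUSH = "/sys/module/kvm_intel/parameters/vmentry_l1d_flush"
FLUSH_MODES = ("always", "cond", "never")


def effective_flush_mode(param=None, l1tf_full_force=False):
    """Return the flush mode implied by the two documented rules.

    `param` is the requested kvm-intel.vmentry_l1d_flush value or None
    if it was not given; `l1tf_full_force` says whether 'l1tf=full,force'
    is on the kernel command line.
    """
    if param is not None and param not in FLUSH_MODES:
        raise ValueError(f"invalid vmentry_l1d_flush value: {param!r}")
    if l1tf_full_force:
        return "always"  # the module parameter is ignored
    return param if param is not None else "cond"


def cmdline_fragment(mode: str) -> str:
    """Build the kernel command line form of the parameter."""
    if mode not in FLUSH_MODES:
        raise ValueError(f"invalid vmentry_l1d_flush value: {mode!r}")
    return f"kvm-intel.vmentry_l1d_flush={mode}"
```

At run time the current value can be read from, or written to, the sysfs file named in `VMENTRY_L1D_FLUSH`, subject to the rejection rule for 'l1tf=full,force'.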
.. _mitigation_selection:
Mitigation selection guide
--------------------------
1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The system is protected by the kernel unconditionally and no further
action is required.
2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the guest comes from a trusted source and the guest OS kernel is
guaranteed to have the L1TF mitigations in place the system is fully
protected against L1TF and no further action is required.
To avoid the overhead of the default L1D flushing on VMENTER the
administrator can disable the flushing via the kernel command line and
sysfs control files. See :ref:`mitigation_control_command_line` and
:ref:`mitigation_control_kvm`.
3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""
If SMT is not supported by the processor or disabled in the BIOS or by
the kernel, it's only required to enforce L1D flushing on VMENTER.
Conditional L1D flushing is the default behaviour and can be tuned. See
:ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""
If EPT is not supported by the processor or disabled in the hypervisor,
the system is fully protected. SMT can stay enabled and L1D flushing on
VMENTER is not required.
EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""
If SMT and EPT are supported and active then various degrees of
mitigations can be employed:
- L1D flushing on VMENTER:
L1D flushing on VMENTER is the minimal protection requirement, but it
is only potent in combination with other mitigation methods.
Conditional L1D flushing is the default behaviour and can be tuned. See
:ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
- Guest confinement:
Confinement of guests to a single or a group of physical cores which
are not running any other processes, can reduce the attack surface
significantly, but interrupts, soft interrupts and kernel threads can
still expose valuable data to a potential attacker. See
:ref:`guest_confinement`.
- Interrupt isolation:
Isolating the guest CPUs from interrupts can reduce the attack surface
further, but still allows a malicious guest to explore a limited amount
of host physical memory. This can at least be used to gain knowledge
about the host address space layout. The interrupts which have a fixed
affinity to the CPUs which run the untrusted guests can depending on
the scenario still trigger soft interrupts and schedule kernel threads
which might expose valuable information. See
:ref:`interrupt_isolation`.
The above three mitigation methods combined can provide protection to a
certain degree, but the risk of the remaining attack surface has to be
carefully analyzed. For full protection the following methods are
available:
- Disabling SMT:
Disabling SMT and enforcing the L1D flushing provides the maximum
amount of protection. This mitigation does not depend on any of the
above mitigation methods.
SMT control and L1D flushing can be tuned by the command line
parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
time with the matching sysfs control files. See :ref:`smt_control`,
:ref:`mitigation_control_command_line` and
:ref:`mitigation_control_kvm`.
- Disabling EPT:
Disabling EPT provides the maximum amount of protection as well. It does
not depend on any of the above mitigation methods. SMT can stay
enabled and L1D flushing is not required, but the performance impact is
significant.
EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
parameter.
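The decision flow of selection guide items 1 through 3.3 can be written down as a small function. This is a reading aid under stated assumptions, not an authoritative policy: the function name and the returned labels are invented for illustration, and `guests_trusted` is meant to include the condition that the guest kernel has the L1TF mitigations in place.

```python
def l1tf_guidance(virtualization, guests_trusted=True,
                  smt_active=False, ept_active=True):
    """Map a deployment scenario to the guide's recommendation label."""
    if not virtualization:
        # 1. The kernel protects the system unconditionally.
        return "no-action"
    if guests_trusted:
        # 2. Fully protected; L1D flushing may be disabled to save overhead.
        return "no-action-flush-optional"
    if not ept_active:
        # 3.2 Fully protected; SMT can stay on, no L1D flush required.
        return "protected-without-ept"
    if not smt_active:
        # 3.1 Only L1D flushing on VMENTER has to be enforced.
        return "enforce-l1d-flush"
    # 3.3 L1D flush is the minimum; full protection needs SMT or EPT off.
    # Confinement and interrupt isolation only reduce the attack surface.
    return "l1d-flush-plus-disable-smt-or-ept"
```

For instance, `l1tf_guidance(True, guests_trusted=False, smt_active=True)` lands in case 3.3, where the remaining risk has to be analyzed by the administrator.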
3.4. Nested virtual machines
""""""""""""""""""""""""""""
When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor it will:
- Flush the L1D cache on every switch from the nested hypervisor to the
nested virtual machine, so that the nested hypervisor's secrets are not
exposed to the nested virtual machine;
- Flush the L1D cache on every switch from the nested virtual machine to
the nested hypervisor; this is a complex operation, and flushing the L1D
cache prevents the bare metal hypervisor's secrets from being exposed to
the nested virtual machine;
- Instruct the nested hypervisor to not perform any L1D cache flush. This
is an optimization to avoid double L1D flushing.
.. _default_mitigations:
Default mitigations
-------------------
The kernel default mitigations for vulnerable processors are:
- PTE inversion to protect against malicious user space. This is done
unconditionally and cannot be controlled. The swap storage is limited
to ~16TB.
- L1D conditional flushing on VMENTER when EPT is enabled for
a guest.
The kernel does not by default enforce the disabling of SMT, which leaves
SMT systems vulnerable when running untrusted guests with EPT enabled.
The rationale for this choice is:
- Force disabling SMT can break existing setups, especially with
unattended updates.
- If regular users run untrusted guests on their machine, then L1TF is
just an add on to other malware which might be embedded in an untrusted
guest, e.g. spam-bots or attacks on the local network.
There is no technical way to prevent a user from running untrusted code
on their machines blindly.
- It's technically extremely unlikely and from today's knowledge even
impossible that L1TF can be exploited via the most popular attack
mechanisms like JavaScript because these mechanisms have no way to
control PTEs. If this were possible and no other mitigation were
available, then the default might be different.
- The administrators of cloud and hosting setups have to carefully
analyze the risk for their scenarios and make the appropriate
mitigation choices, which might even vary across their deployed
machines and also result in other changes of their overall setup.
There is no way for the kernel to provide a sensible default for this
kind of scenarios.

View File

@ -0,0 +1,308 @@
MDS - Microarchitectural Data Sampling
======================================
Microarchitectural Data Sampling is a hardware vulnerability which allows
unprivileged speculative access to data which is available in various CPU
internal buffers.
Affected processors
-------------------
This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:
- Processors from AMD, Centaur and other non Intel vendors
- Older processor models, where the CPU family is < 6
- Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus)
- Intel processors which have the ARCH_CAP_MDS_NO bit set in the
IA32_ARCH_CAPABILITIES MSR.
Whether a processor is affected or not can be read out from the MDS
vulnerability file in sysfs. See :ref:`mds_sys_info`.
Not all processors are affected by all variants of MDS, but the mitigation
is identical for all of them so the kernel treats them as a single
vulnerability.
Related CVEs
------------
The following CVE entries are related to the MDS vulnerability:
==============  =====  ===================================================
CVE-2018-12126  MSBDS  Microarchitectural Store Buffer Data Sampling
CVE-2018-12130  MFBDS  Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127  MLPDS  Microarchitectural Load Port Data Sampling
CVE-2019-11091  MDSUM  Microarchitectural Data Sampling Uncacheable Memory
==============  =====  ===================================================
Problem
-------
When performing store, load, or L1 refill operations, processors write data
into temporary microarchitectural structures (buffers). The data in the
buffer can be forwarded to load operations as an optimization.
Under certain conditions, usually a fault/assist caused by a load
operation, data unrelated to the load memory address can be speculatively
forwarded from the buffers. Because the load operation causes a fault or
assist and its result will be discarded, the forwarded data will not cause
incorrect program execution or state changes. But a malicious operation
may be able to forward this speculative data to a disclosure gadget, which
in turn allows the value to be inferred via a cache side channel attack.
Because the buffers are potentially shared between Hyper-Threads, cross
Hyper-Thread attacks are possible.
Deeper technical information is available in the MDS specific x86
architecture section: :ref:`Documentation/x86/mds.rst <mds>`.
Attack scenarios
----------------
Attacks against the MDS vulnerabilities can be mounted from malicious
non-privileged user space applications running on hosts or guests. Malicious
guest OSes can obviously mount attacks as well.
Contrary to other speculation based vulnerabilities the MDS vulnerability
does not allow the attacker to control the memory target address. As a
consequence the attacks are purely sampling based, but as demonstrated with
the TLBleed attack, samples can be postprocessed successfully.
Web-Browsers
^^^^^^^^^^^^
It's unclear whether attacks through Web-Browsers are possible at
all. The exploitation through JavaScript is considered very unlikely,
but other widely used web technologies like WebAssembly could possibly be
abused.
.. _mds_sys_info:
MDS system information
-----------------------
The Linux kernel provides a sysfs interface to enumerate the current MDS
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:
/sys/devices/system/cpu/vulnerabilities/mds
The possible values in this file are:
.. list-table::

   * - 'Not affected'
     - The processor is not vulnerable
   * - 'Vulnerable'
     - The processor is vulnerable, but no mitigation is enabled
   * - 'Vulnerable: Clear CPU buffers attempted, no microcode'
     - The processor is vulnerable but microcode is not updated.

       The mitigation is enabled on a best effort basis. See :ref:`vmwerv`
   * - 'Mitigation: Clear CPU buffers'
     - The processor is vulnerable and the CPU buffer clearing mitigation is
       enabled.
If the processor is vulnerable then the following information is appended
to the above information:
========================  ============================================
'SMT vulnerable'          SMT is enabled
'SMT mitigated'           SMT is enabled and mitigated
'SMT disabled'            SMT is disabled
'SMT Host state unknown'  Kernel runs in a VM, Host SMT state unknown
========================  ============================================
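The MDS sysfs file can be summarized in the same spirit. The sketch below is an illustration with a hypothetical function name and invented state labels; since the separator between the main state and the appended SMT information is not specified above, the SMT part is detected by substring search and not by splitting on a fixed delimiter.

```python
MDS_SYSFS = "/sys/devices/system/cpu/vulnerabilities/mds"

# Checked in order: the longer "Vulnerable: ..." string must come before
# the plain "Vulnerable" prefix.
_MDS_STATES = (
    ("Not affected", "not_affected"),
    ("Mitigation: Clear CPU buffers", "mitigated"),
    ("Vulnerable: Clear CPU buffers attempted, no microcode", "best_effort"),
    ("Vulnerable", "vulnerable"),
)
_SMT_STATES = ("SMT vulnerable", "SMT mitigated", "SMT disabled",
               "SMT Host state unknown")


def summarize_mds(status: str) -> dict:
    """Return the main MDS state and the appended SMT information."""
    status = status.strip()
    state = "unknown"
    for prefix, label in _MDS_STATES:
        if status.startswith(prefix):
            state = label
            break
    smt = None
    for phrase in _SMT_STATES:
        if phrase in status:
            smt = phrase[len("SMT "):]
            break
    return {"state": state, "smt": smt}
```

Reading `MDS_SYSFS` and passing its contents to `summarize_mds` yields, for example, the state "best_effort" when the microcode based mitigation is not advertised.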
.. _vmwerv:
Best effort mitigation mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the processor is vulnerable, but the availability of the microcode based
mitigation mechanism is not advertised via CPUID the kernel selects a best
effort mitigation mode. This mode invokes the mitigation instructions
without a guarantee that they clear the CPU buffers.
This is done to address virtualization scenarios where the host has the
microcode update applied, but the hypervisor is not yet updated to expose
the CPUID to the guest. If the host has updated microcode the protection
takes effect; otherwise a few CPU cycles are wasted pointlessly.
The state in the mds sysfs file reflects this situation accordingly.
Mitigation mechanism
-------------------------
The kernel detects the affected CPUs and the presence of the microcode
which is required.
If a CPU is affected and the microcode is available, then the kernel
enables the mitigation by default. The mitigation can be controlled at boot
time via a kernel command line option. See
:ref:`mds_mitigation_control_command_line`.
.. _cpu_buffer_clear:
CPU buffer clearing
^^^^^^^^^^^^^^^^^^^
The mitigation for MDS clears the affected CPU buffers on return to user
space and when entering a guest.
If SMT is enabled it also clears the buffers on idle entry when the CPU
is only affected by MSBDS and not any other MDS variant, because the
other variants cannot be protected against cross Hyper-Thread attacks.
For CPUs which are only affected by MSBDS the user space, guest and idle
transition mitigations are sufficient and SMT is not affected.
.. _virt_mechanism:
Virtualization mitigation
^^^^^^^^^^^^^^^^^^^^^^^^^
The protection for host to guest transition depends on the L1TF
vulnerability of the CPU:
- CPU is affected by L1TF:
If the L1D flush mitigation is enabled and up to date microcode is
available, the L1D flush mitigation automatically protects the
guest transition.
If the L1D flush mitigation is disabled, then the MDS mitigation is
invoked explicitly when the host MDS mitigation is enabled.
For details on L1TF and virtualization see:
:ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <mitigation_control_kvm>`.
- CPU is not affected by L1TF:
CPU buffers are flushed before entering the guest when the host MDS
mitigation is enabled.
The resulting MDS protection matrix for the host to guest transition:
============ ===== ============= ============ =================
L1TF MDS VMX-L1FLUSH Host MDS MDS-State
Don't care No Don't care N/A Not affected
Yes Yes Disabled Off Vulnerable
Yes Yes Disabled Full Mitigated
Yes Yes Enabled Don't care Mitigated
No Yes N/A Off Vulnerable
No Yes N/A Full Mitigated
============ ===== ============= ============ =================
This only covers the host to guest transition, i.e. prevents leakage from
host to guest, but does not protect the guest internally. Guests need to
have their own protections.
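The protection matrix above can be expressed as a small decision function.
This is only a sketch: the function name ``guest_entry_mds_state`` and its
parameters are hypothetical, and it assumes that an enabled L1D flush implies
up to date microcode, as required by the description above.

```python
def guest_entry_mds_state(l1tf, mds, vmx_l1d_flush, host_mds):
    """Return the MDS state for the host to guest transition.

    l1tf, mds:      True if the CPU is affected by the vulnerability
    vmx_l1d_flush:  True if the L1D flush on VMENTER is enabled
                    (assumed to imply up to date microcode)
    host_mds:       "full" or "off", the host MDS mitigation
    """
    if not mds:
        return "Not affected"
    # On L1TF affected CPUs an enabled L1D flush covers the transition.
    if l1tf and vmx_l1d_flush:
        return "Mitigated"
    # Otherwise the explicit CPU buffer clearing has to be enabled.
    return "Mitigated" if host_mds == "full" else "Vulnerable"
```

For example, ``guest_entry_mds_state(True, True, False, "off")`` corresponds
to the second row of the matrix. As stated above, the result only describes
the host to guest transition and says nothing about protection inside the
guest.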
.. _xeon_phi:
XEON PHI specific considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The XEON PHI processor family is affected by MSBDS which can be exploited
across Hyper-Threads when entering idle states. Some XEON PHI variants allow
the use of MWAIT in user space (Ring 3), which opens a potential attack vector
for malicious user space. The exposure can be disabled on the kernel
command line with the 'ring3mwait=disable' command line option.
XEON PHI is not affected by the other MDS variants and MSBDS is mitigated
before the CPU enters an idle state. As XEON PHI is not affected by L1TF
either, disabling SMT is not required for full protection.
.. _mds_smt_control:
SMT control
^^^^^^^^^^^
All MDS variants except MSBDS can be attacked across Hyper-Threads. That
means on CPUs which are affected by MFBDS or MLPDS it is necessary to
disable SMT for full protection. These are most of the affected CPUs; the
exception is XEON PHI, see :ref:`xeon_phi`.
Disabling SMT can have a significant performance impact, but the impact
depends on the type of workloads.
See the relevant chapter in the L1TF mitigation documentation for details:
:ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
.. _mds_mitigation_control_command_line:
Mitigation control on the kernel command line
---------------------------------------------
The kernel command line allows the MDS mitigations to be controlled at boot
time with the option "mds=". The valid arguments for this option are:
============ =============================================================
full If the CPU is vulnerable, enable all available mitigations
for the MDS vulnerability, CPU buffer clearing on exit to
userspace and when entering a VM. Idle transitions are
protected as well if SMT is enabled.
It does not automatically disable SMT.
full,nosmt The same as mds=full, with SMT disabled on vulnerable
CPUs. This is the complete mitigation.
off Disables MDS mitigations completely.
============ =============================================================
Not specifying this option is equivalent to "mds=full".
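The effect of the "mds=" arguments listed above can be sketched in Python as
follows. The function name ``mds_settings`` and the returned keys are
hypothetical. The sketch assumes that the last "mds=" occurrence on the
command line wins, and it raises an error on unknown arguments, which is a
choice of this sketch and not necessarily what the kernel does. The result is
only meaningful on vulnerable CPUs, and the "mitigations=" option is not
taken into account.

```python
VALID_MDS_ARGS = ("full", "full,nosmt", "off")


def mds_settings(cmdline):
    """Derive the requested MDS mitigation from a kernel command line."""
    value = "full"  # Not specifying the option is equivalent to mds=full
    for token in cmdline.split():
        if token.startswith("mds="):
            # Assumption: a later occurrence overrides an earlier one.
            value = token[len("mds="):]
    if value not in VALID_MDS_ARGS:
        raise ValueError("unknown mds= argument: " + value)
    return {
        "mds": value,
        # CPU buffer clearing is enabled for everything except "off".
        "clear_cpu_buffers": value != "off",
        # Only "full,nosmt" disables SMT on vulnerable CPUs.
        "disable_smt": value == "full,nosmt",
    }
```

A typical use would be ``mds_settings(open("/proc/cmdline").read())`` on a
running system.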
Mitigation selection guide
--------------------------
1. Trusted userspace
^^^^^^^^^^^^^^^^^^^^
If all userspace applications are from a trusted source and do not
execute untrusted code which is supplied externally, then the mitigation
can be disabled.
2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The same considerations as above versus trusted user space apply.
3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The protection depends on the state of the L1TF mitigations.
See :ref:`virt_mechanism`.
If the MDS mitigation is enabled and SMT is disabled, guest to host and
guest to guest attacks are prevented.
.. _mds_default_mitigations:
Default mitigations
-------------------
The kernel default mitigations for vulnerable processors are:
- Enable CPU buffer clearing
The kernel does not by default enforce the disabling of SMT, which leaves
SMT systems vulnerable when running untrusted code. The same rationale as
for L1TF applies.
See :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <default_mitigations>`.


@ -17,14 +17,12 @@ etc.
kernel-parameters
devices
This section describes CPU vulnerabilities and provides an overview of the
possible mitigations along with guidance for selecting mitigations if they
are configurable at compile, boot or run time.
This section describes CPU vulnerabilities and their mitigations.
.. toctree::
:maxdepth: 1
l1tf
hw-vuln/index
Here is a set of documents aimed at users who are trying to track down
problems and bugs in particular.


@ -1588,7 +1588,7 @@
Format: { "off" | "enforce" | "fix" | "log" }
default: "enforce"
ima_appraise_tcb [IMA]
ima_appraise_tcb [IMA] Deprecated. Use ima_policy= instead.
The builtin appraise policy appraises all files
owned by uid=0.
@ -1615,8 +1615,7 @@
uid=0.
The "appraise_tcb" policy appraises the integrity of
all files owned by root. (This is the equivalent
of ima_appraise_tcb.)
all files owned by root.
The "secure_boot" policy appraises the integrity
of files (eg. kexec kernel image, kernel modules,
@ -1831,6 +1830,9 @@
ip= [IP_PNP]
See Documentation/filesystems/nfs/nfsroot.txt.
ipcmni_extend [KNL] Extend the maximum number of unique System V
IPC identifiers from 32,768 to 16,777,216.
irqaffinity= [SMP] Set the default irq affinity mask
The argument is a cpu list, as described above.
@ -2144,7 +2146,7 @@
Default is 'flush'.
For details see: Documentation/admin-guide/l1tf.rst
For details see: Documentation/admin-guide/hw-vuln/l1tf.rst
l2cr= [PPC]
@ -2390,6 +2392,32 @@
Format: <first>,<last>
Specifies range of consoles to be captured by the MDA.
mds= [X86,INTEL]
Control mitigation for the Micro-architectural Data
Sampling (MDS) vulnerability.
Certain CPUs are vulnerable to an exploit against CPU
internal buffers which can forward information to a
disclosure gadget under certain conditions.
In vulnerable processors, the speculatively
forwarded data can be used in a cache side channel
attack, to access data to which the attacker does
not have direct access.
This parameter controls the MDS mitigation. The
options are:
full - Enable MDS mitigation on vulnerable CPUs
full,nosmt - Enable MDS mitigation and disable
SMT on vulnerable CPUs
off - Unconditionally disable MDS mitigation
Not specifying this option is equivalent to
mds=full.
For details see: Documentation/admin-guide/hw-vuln/mds.rst
mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
Amount of memory to be used when the kernel is not able
to see the whole system memory or for test.
@ -2566,6 +2594,7 @@
spec_store_bypass_disable=off [X86,PPC]
ssbd=force-off [ARM64]
l1tf=off [X86]
mds=off [X86]
auto (default)
Mitigate all CPU vulnerabilities, but leave SMT
@ -2580,6 +2609,7 @@
if needed. This is for users who always want to
be fully mitigated, even if it means losing SMT.
Equivalent to: l1tf=flush,nosmt [X86]
mds=full,nosmt [X86]
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
@ -2876,11 +2906,11 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings
nosmap [X86]
nosmap [X86,PPC]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
nosmep [X86]
nosmep [X86,PPC]
Disable SMEP (Supervisor Mode Execution Prevention)
even if it is supported by processor.
@ -3147,6 +3177,16 @@
This will also cause panics on machine check exceptions.
Useful together with panic=30 to trigger a reboot.
page_alloc.shuffle=
[KNL] Boolean flag to control whether the page allocator
should randomize its free lists. The randomization may
be automatically enabled if the kernel detects it is
running on a platform with a direct-mapped memory-side
cache, and this parameter can be used to
override/disable that behavior. The state of the flag
can be read from sysfs at:
/sys/module/page_alloc/parameters/shuffle.
page_owner= [KNL] Boot-time page_owner enabling option.
Storage of the information about who allocated
each page is disabled in default. With this switch,
@ -3172,6 +3212,7 @@
bit 2: print timer info
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer
bit 5: print all printk messages in buffer
panic_on_warn panic() instead of WARN(). Useful to cause kdump
on a WARN().
@ -4027,7 +4068,9 @@
[[,]s[mp]#### \
[[,]b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
[[,]f[orce]
Where reboot_mode is one of warm (soft) or cold (hard) or gpio,
Where reboot_mode is one of warm (soft) or cold (hard) or gpio
(prefix with 'panic_' to set mode for panic
reboot only),
reboot_type is one of bios, acpi, kbd, triple, efi, or pci,
reboot_force is either force or not specified,
reboot_cpu is s[mp]#### with #### being the processor
@ -5218,6 +5261,13 @@
with /sys/devices/system/xen_memory/xen_memory0/scrub_pages.
Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT.
xen_timer_slop= [X86-64,XEN]
Set the timer slop (in nanoseconds) for the virtual Xen
timers (default is 100000). This adjusts the minimum
delta of virtualized Xen timers, where lower values
improve timer resolution at the expense of processing
more timer interrupts.
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]


@ -1,614 +0,0 @@
L1TF - L1 Terminal Fault
========================
L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data which is available in the Level 1 Data Cache
when the page table entry controlling the virtual address, which is used
for the access, has the Present bit cleared or other reserved bits set.
Affected processors
-------------------
This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:
- Processors from AMD, Centaur and other non Intel vendors
- Older processor models, where the CPU family is < 6
- A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
Penwell, Pineview, Silvermont, Airmont, Merrifield)
- The Intel XEON PHI family
- Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
by the Meltdown vulnerability either. These CPUs should become
available by end of 2018.
Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
Related CVEs
------------
The following CVE entries are related to the L1TF vulnerability:
============= ================= ==============================
CVE-2018-3615 L1 Terminal Fault SGX related aspects
CVE-2018-3620 L1 Terminal Fault OS, SMM related aspects
CVE-2018-3646 L1 Terminal Fault Virtualization related aspects
============= ================= ==============================
Problem
-------
If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE was still present and accessible.
While this is a purely speculative mechanism and the instruction will raise
a page fault when it is retired eventually, the pure act of loading the
data and making it available to other speculative instructions opens up the
opportunity for side channel attacks to unprivileged malicious code,
similar to the Meltdown attack.
While Meltdown breaks the user space to kernel space protection, L1TF
allows attacks on any physical memory address in the system, and the attack
works across all protection domains. It allows an attack of SGX and also
works from inside virtual machines because the speculation bypasses the
extended page table (EPT) protection mechanism.
Attack scenarios
----------------
1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^
Operating Systems store arbitrary information in the address bits of a
PTE which is marked non present. This allows a malicious user space
application to attack the physical memory to which these PTEs resolve.
In some cases user-space can maliciously influence the information
encoded in the address bits of the PTE, thus making attacks more
deterministic and more practical.
The Linux kernel contains a mitigation for this attack vector, PTE
inversion, which is permanently enabled and has no performance
impact. The kernel ensures that the address bits of PTEs, which are not
marked present, never point to cacheable physical memory space.
A system with an up to date kernel is protected against attacks from
malicious user space applications.
2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The fact that L1TF breaks all domain protections allows malicious guest
OSes, which can control the PTEs directly, and malicious guest user
space applications, which run on an unprotected guest kernel lacking the
PTE inversion mitigation for L1TF, to attack physical host memory.
A special aspect of L1TF in the context of virtualization is symmetric
multi threading (SMT). The Intel implementation of SMT is called
HyperThreading. The fact that Hyperthreads on the affected processors
share the L1 Data Cache (L1D) is important for this. As the flaw only
allows attacks on data which is present in L1D, a malicious guest running
on one Hyperthread can attack the data which is brought into the L1D by
the context which runs on the sibling Hyperthread of the same physical
core. This context can be host OS, host user space or a different guest.
If the processor does not support Extended Page Tables, the attack is
only possible when the hypervisor does not sanitize the content of the
effective (shadow) page tables.
While solutions exist to mitigate these attack vectors fully, these
mitigations are not enabled by default in the Linux kernel because they
can affect performance significantly. The kernel provides several
mechanisms which can be utilized to address the problem depending on the
deployment scenario. The mitigations, their protection scope and impact
are described in the next sections.
The default mitigations and the rationale for choosing them are explained
at the end of this document. See :ref:`default_mitigations`.
.. _l1tf_sys_info:
L1TF system information
-----------------------
The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:
/sys/devices/system/cpu/vulnerabilities/l1tf
The possible values in this file are:
=========================== ===============================
'Not affected' The processor is not vulnerable
'Mitigation: PTE Inversion' The host protection is active
=========================== ===============================
If KVM/VMX is enabled and the processor is vulnerable then the following
information is appended to the 'Mitigation: PTE Inversion' part:
- SMT status:
===================== ================
'VMX: SMT vulnerable' SMT is enabled
'VMX: SMT disabled' SMT is disabled
===================== ================
- L1D Flush mode:
================================ ====================================
'L1D vulnerable' L1D flushing is disabled
'L1D conditional cache flushes' L1D flush is conditionally enabled
'L1D cache flushes' L1D flush is unconditionally enabled
================================ ====================================
The resulting grade of protection is discussed in the following sections.
Host mitigation mechanism
-------------------------
The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.
Guest mitigation mechanisms
---------------------------
.. _l1d_flush:
1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^
To make sure that a guest cannot attack data which is present in the L1D
the hypervisor flushes the L1D before entering the guest.
Flushing the L1D evicts not only the data which should not be accessed
by a potentially malicious guest, it also flushes the guest
data. Flushing the L1D has a performance impact as the processor has to
bring the flushed guest data back into the L1D. Depending on the
frequency of VMEXIT/VMENTER and the type of computations in the guest
performance degradation in the range of 1% to 50% has been observed. For
scenarios where guest VMEXIT/VMENTER are rare the performance impact is
minimal. Virtio and mechanisms like posted interrupts are designed to
confine the VMEXITs to a bare minimum, but specific configurations and
application scenarios might still suffer from a high VMEXIT rate.
The kernel provides two L1D flush modes:
- conditional ('cond')
- unconditional ('always')
The conditional mode avoids L1D flushing after VMEXITs which execute
only audited code paths before the corresponding VMENTER. These code
paths have been verified not to expose secrets or other
interesting data to an attacker, but they can leak information about the
address space layout of the hypervisor.
Unconditional mode flushes L1D on all VMENTER invocations and provides
maximum protection. It has a higher overhead than the conditional
mode. The overhead cannot be quantified correctly as it depends on the
workload scenario and the resulting number of VMEXITs.
The general recommendation is to enable L1D flush on VMENTER. The kernel
defaults to conditional mode on affected processors.
**Note**, that L1D flush does not prevent the SMT problem because the
sibling thread will also bring back its data into the L1D which makes it
attackable again.
L1D flush can be controlled by the administrator via the kernel command
line and sysfs control files. See :ref:`mitigation_control_command_line`
and :ref:`mitigation_control_kvm`.
.. _guest_confinement:
2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To address the SMT problem, it is possible to make a guest or a group of
guests affine to one or more physical cores. The proper mechanism for
that is to utilize exclusive cpusets to ensure that no other guest or
host tasks can run on these cores.
If only a single guest or related guests run on sibling SMT threads on
the same physical core then they can only attack their own memory and
restricted parts of the host memory.
Host memory is attackable, when one of the sibling SMT threads runs in
host OS (hypervisor) context and the other in guest context. The amount
of valuable information from the host OS context depends on the context
which the host OS executes, i.e. interrupts, soft interrupts and kernel
threads. The amount of valuable data from these contexts cannot be
declared as non-interesting for an attacker without deep inspection of
the code.
**Note**, that assigning guests to a fixed set of physical cores affects
the ability of the scheduler to do load balancing and might have
negative effects on CPU utilization depending on the hosting
scenario. Disabling SMT might be a viable alternative for particular
scenarios.
For further information about confining guests to a single or to a group
of cores consult the cpusets documentation:
https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
.. _interrupt_isolation:
3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^
Interrupts can be made affine to logical CPUs. This is not universally
true because there are types of interrupts which are truly per CPU
interrupts, e.g. the local timer interrupt. Aside of that multi queue
devices affine their interrupts to single CPUs or groups of CPUs per
queue without allowing the administrator to control the affinities.
Moving the interrupts, which can be affinity controlled, away from CPUs
which run untrusted guests, reduces the attack vector space.
Whether the interrupts which are affine to CPUs, which run untrusted
guests, provide interesting data for an attacker depends on the system
configuration and the scenarios which run on the system. While for some
of the interrupts it can be assumed that they won't expose interesting
information beyond exposing hints about the host OS memory layout, there
is no way to make general assumptions.
Interrupt affinity can be controlled by the administrator via the
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
available at:
https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
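The relation between the two file formats can be sketched in Python. The
helper name ``cpulist_to_mask`` is hypothetical. The sketch assumes that
smp_affinity_list takes a CPU list such as "0-3,8", while smp_affinity takes
the equivalent hexadecimal bitmask written in comma separated 32-bit groups
with the most significant group first.

```python
def cpulist_to_mask(cpulist):
    """Convert a CPU list like "0-3,8" into a hexadecimal affinity mask."""
    mask = 0
    for part in cpulist.strip().split(","):
        first, _, last = part.partition("-")
        # A single CPU number is treated as a range of length one.
        for cpu in range(int(first), int(last or first) + 1):
            mask |= 1 << cpu
    # Emit 32-bit groups, most significant group first.
    groups = []
    while True:
        groups.append("%08x" % (mask & 0xFFFFFFFF))
        mask >>= 32
        if not mask:
            break
    return ",".join(reversed(groups))
```

For instance, moving an interrupt away from CPUs 4-7 on an eight CPU system
would mean writing the mask for "0-3" to the smp_affinity file of that
interrupt, provided the interrupt allows its affinity to be changed.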
.. _smt_control:
4. SMT control
^^^^^^^^^^^^^^
To prevent the SMT issues of L1TF it might be necessary to disable SMT
completely. Disabling SMT can have a significant performance impact, but
the impact depends on the hosting scenario and the type of workloads.
The impact of disabling SMT also needs to be weighed against the impact
of other mitigation solutions like confining guests to dedicated cores.
The kernel provides a sysfs interface to retrieve the status of SMT and
to control it. It also provides a kernel command line interface to
control SMT.
The kernel command line interface consists of the following options:
=========== ==========================================================
nosmt Affects the bring up of the secondary CPUs during boot. The
kernel tries to bring all present CPUs online during the
boot process. "nosmt" makes sure that from each physical
core only one - the so called primary (hyper) thread is
activated. Due to a design flaw of Intel processors related
to Machine Check Exceptions the non primary siblings have
to be brought up at least partially and are then shut down
again. "nosmt" can be undone via the sysfs interface.
nosmt=force Has the same effect as "nosmt" but it does not allow to
undo the SMT disable via the sysfs interface.
=========== ==========================================================
The sysfs interface provides two files:
- /sys/devices/system/cpu/smt/control
- /sys/devices/system/cpu/smt/active
/sys/devices/system/cpu/smt/control:
This file allows to read out the SMT control state and provides the
ability to disable or (re)enable SMT. The possible states are:
============== ===================================================
on SMT is supported by the CPU and enabled. All
logical CPUs can be onlined and offlined without
restrictions.
off SMT is supported by the CPU and disabled. Only
the so called primary SMT threads can be onlined
and offlined without restrictions. An attempt to
online a non-primary sibling is rejected
forceoff Same as 'off' but the state cannot be controlled.
Attempts to write to the control file are rejected.
notsupported The processor does not support SMT. It's therefore
not affected by the SMT implications of L1TF.
Attempts to write to the control file are rejected.
============== ===================================================
The possible states which can be written into this file to control SMT
state are:
- on
- off
- forceoff
/sys/devices/system/cpu/smt/active:
This file reports whether SMT is enabled and active, i.e. if on any
physical core two or more sibling threads are online.
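The rules for the control file described above can be summarized in a short
Python sketch. The function name ``smt_write_accepted`` is hypothetical, and
the sketch only models the rules as documented here. It does not touch
sysfs.

```python
# States which can be read from /sys/devices/system/cpu/smt/control.
READABLE_STATES = ("on", "off", "forceoff", "notsupported")
# States which can be written into the control file.
WRITABLE_STATES = ("on", "off", "forceoff")


def smt_write_accepted(current, requested):
    """Return True if writing 'requested' is accepted in state 'current'."""
    if current not in READABLE_STATES:
        raise ValueError("unknown SMT control state: " + current)
    if requested not in WRITABLE_STATES:
        return False
    # In 'forceoff' and 'notsupported' all writes are rejected.
    return current in ("on", "off")
```

This reflects that "forceoff" is a one-way transition: once it has been
written, the state cannot be changed again through the control file.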
SMT control is also possible at boot time via the l1tf kernel command
line parameter in combination with L1D flush control. See
:ref:`mitigation_control_command_line`.
5. Disabling EPT
^^^^^^^^^^^^^^^^
Disabling EPT for virtual machines provides full mitigation for L1TF even
with SMT enabled, because the effective page tables for guests are
managed and sanitized by the hypervisor. Though disabling EPT has a
significant performance impact especially when the Meltdown mitigation
KPTI is enabled.
EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.
.. _mitigation_control_command_line:
Mitigation control on the kernel command line
---------------------------------------------
The kernel command line allows to control the L1TF mitigations at boot
time with the option "l1tf=". The valid arguments for this option are:
============ =============================================================
full Provides all available mitigations for the L1TF
vulnerability. Disables SMT and enables all mitigations in
the hypervisors, i.e. unconditional L1D flushing.
SMT control and L1D flush control via the sysfs interface
is still possible after boot. Hypervisors will issue a
warning when the first VM is started in a potentially
insecure configuration, i.e. SMT enabled or L1D flush
disabled.
full,force Same as 'full', but disables SMT and L1D flush runtime
control. Implies the 'nosmt=force' command line option.
(i.e. sysfs control of SMT is disabled.)
flush Leaves SMT enabled and enables the default hypervisor
mitigation, i.e. conditional L1D flushing.
SMT control and L1D flush control via the sysfs interface
is still possible after boot. Hypervisors will issue a
warning when the first VM is started in a potentially
insecure configuration, i.e. SMT enabled or L1D flush
disabled.
flush,nosmt Disables SMT and enables the default hypervisor mitigation,
i.e. conditional L1D flushing.
SMT control and L1D flush control via the sysfs interface
is still possible after boot. Hypervisors will issue a
warning when the first VM is started in a potentially
insecure configuration, i.e. SMT enabled or L1D flush
disabled.
flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is
started in a potentially insecure configuration.
off Disables hypervisor mitigations and doesn't emit any
warnings.
It also drops the swap size and available RAM limit restrictions
on both hypervisor and bare metal.
============ =============================================================
The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
.. _mitigation_control_kvm:
Mitigation control for KVM - module parameter
-------------------------------------------------------------
The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.
The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:
============ ==============================================================
always L1D cache flush on every VMENTER.
cond Flush L1D on VMENTER only when the code between VMEXIT and
VMENTER can leak host memory which is considered
interesting for an attacker. This still can leak host memory
which allows e.g. to determine the host's address space layout.
never Disables the mitigation
============ ==============================================================
The parameter can be provided on the kernel command line, as a module
parameter when loading the modules and at runtime modified via the sysfs
file:
/sys/module/kvm_intel/parameters/vmentry_l1d_flush
The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.
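How the "l1tf=" command line option and the module parameter combine can be
sketched in Python. The function name ``effective_l1d_flush`` is
hypothetical. The sketch follows the descriptions above for the defaults and
for 'l1tf=full,force'. That an explicitly given module parameter overrides
the "l1tf=" derived default in all other cases is an assumption of this
sketch.

```python
# L1D flush mode implied by each l1tf= argument, per the table above.
L1TF_FLUSH_DEFAULT = {
    "full": "always",
    "full,force": "always",
    "flush": "cond",
    "flush,nosmt": "cond",
    "flush,nowarn": "cond",
    "off": "never",
}
FLUSH_MODES = ("always", "cond", "never")


def effective_l1d_flush(l1tf="flush", module_param=None):
    """Return the L1D flush mode used on VMENTER."""
    if l1tf not in L1TF_FLUSH_DEFAULT:
        raise ValueError("unknown l1tf= argument: " + l1tf)
    # 'full,force' enforces 'always' and ignores the module parameter.
    if l1tf == "full,force":
        return "always"
    if module_param is not None:
        if module_param not in FLUSH_MODES:
            raise ValueError("unknown flush mode: " + module_param)
        # Assumption: an explicit module parameter takes precedence.
        return module_param
    return L1TF_FLUSH_DEFAULT[l1tf]
```

With no arguments the function returns 'cond', which matches the documented
default.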
Mitigation selection guide
--------------------------
1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The system is protected by the kernel unconditionally and no further
action is required.
2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the guest comes from a trusted source and the guest OS kernel is
guaranteed to have the L1TF mitigations in place the system is fully
protected against L1TF and no further action is required.
To avoid the overhead of the default L1D flushing on VMENTER the
administrator can disable the flushing via the kernel command line and
sysfs control files. See :ref:`mitigation_control_command_line` and
:ref:`mitigation_control_kvm`.
3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""
If SMT is not supported by the processor or disabled in the BIOS or by
the kernel, it's only required to enforce L1D flushing on VMENTER.
Conditional L1D flushing is the default behaviour and can be tuned. See
:ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""
If EPT is not supported by the processor or disabled in the hypervisor,
the system is fully protected. SMT can stay enabled and L1D flushing on
VMENTER is not required.
EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""
If SMT and EPT are supported and active then various degrees of
mitigations can be employed:
- L1D flushing on VMENTER:
L1D flushing on VMENTER is the minimal protection requirement, but it
is only potent in combination with other mitigation methods.
Conditional L1D flushing is the default behaviour and can be tuned. See
:ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
- Guest confinement:
Confinement of guests to a single or a group of physical cores which
are not running any other processes, can reduce the attack surface
significantly, but interrupts, soft interrupts and kernel threads can
still expose valuable data to a potential attacker. See
:ref:`guest_confinement`.
- Interrupt isolation:
Isolating the guest CPUs from interrupts can reduce the attack surface
further, but still allows a malicious guest to explore a limited amount
of host physical memory. This can at least be used to gain knowledge
about the host address space layout. The interrupts which have a fixed
affinity to the CPUs which run the untrusted guests can depending on
the scenario still trigger soft interrupts and schedule kernel threads
which might expose valuable information. See
:ref:`interrupt_isolation`.
The above three mitigation methods combined can provide protection to a
certain degree, but the risk of the remaining attack surface has to be
carefully analyzed. For full protection the following methods are
available:
- Disabling SMT:
Disabling SMT and enforcing the L1D flushing provides the maximum
amount of protection. This mitigation does not depend on any of the
above mitigation methods.
SMT control and L1D flushing can be tuned by the command line
parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
time with the matching sysfs control files. See :ref:`smt_control`,
:ref:`mitigation_control_command_line` and
:ref:`mitigation_control_kvm`.
- Disabling EPT:
Disabling EPT provides the maximum amount of protection as well. It does
not depend on any of the above mitigation methods. SMT can stay
enabled and L1D flushing is not required, but the performance impact is
significant.
EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
parameter.
3.4. Nested virtual machines
""""""""""""""""""""""""""""
When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor it will:
- Flush the L1D cache on every switch from the nested hypervisor to the
nested virtual machine, so that the nested hypervisor's secrets are not
exposed to the nested virtual machine;
- Flush the L1D cache on every switch from the nested virtual machine to
the nested hypervisor; this is a complex operation, and flushing the L1D
cache prevents the bare metal hypervisor's secrets from being exposed to
the nested virtual machine;
- Instruct the nested hypervisor to not perform any L1D cache flush. This
is an optimization to avoid double L1D flushing.
.. _default_mitigations:
Default mitigations
-------------------
The kernel default mitigations for vulnerable processors are:
- PTE inversion to protect against malicious user space. This is done
unconditionally and cannot be controlled. The swap storage is limited
to ~16TB.
- L1D conditional flushing on VMENTER when EPT is enabled for
a guest.
The kernel does not by default enforce the disabling of SMT, which leaves
SMT systems vulnerable when running untrusted guests with EPT enabled.
The rationale for this choice is:
- Force disabling SMT can break existing setups, especially with
unattended updates.
- If regular users run untrusted guests on their machine, then L1TF is
just an add on to other malware which might be embedded in an untrusted
guest, e.g. spam-bots or attacks on the local network.
There is no technical way to prevent a user from running untrusted code
on their machines blindly.
- It's technically extremely unlikely and from today's knowledge even
impossible that L1TF can be exploited via the most popular attack
mechanisms like JavaScript because these mechanisms have no way to
control PTEs. If this were possible and no other mitigation were
available, then the default might be different.
- The administrators of cloud and hosting setups have to carefully
analyze the risk for their scenarios and make the appropriate
mitigation choices, which might even vary across their deployed
machines and also result in other changes of their overall setup.
There is no way for the kernel to provide a sensible default for these
kinds of scenarios.


@ -31,6 +31,7 @@ the Linux memory management.
ksm
memory-hotplug
numa_memory_policy
numaperf
pagemap
soft-dirty
transhuge


@ -0,0 +1,169 @@
.. _numaperf:
=============
NUMA Locality
=============
Some platforms may have multiple types of memory attached to a compute
node. These disparate memory ranges may share some characteristics, such
as CPU cache coherence, but may have different performance. For example,
different media types and buses affect bandwidth and latency.
A system supports such heterogeneous memory by grouping each memory type
under different domains, or "nodes", based on locality and performance
characteristics. Some memory may share the same node as a CPU, and others
are provided as memory only nodes. While memory only nodes do not provide
CPUs, they may still be local to one or more compute nodes relative to
other nodes. The following diagram shows one such example of two compute
nodes with local memory and a memory only node for each compute node::
+------------------+     +------------------+
| Compute Node 0   +-----+ Compute Node 1   |
| Local Node0 Mem  |     | Local Node1 Mem  |
+--------+---------+     +--------+---------+
         |                        |
+--------+---------+     +--------+---------+
| Slower Node2 Mem |     | Slower Node3 Mem |
+------------------+     +------------------+
A "memory initiator" is a node containing one or more devices such as
CPUs or separate memory I/O devices that can initiate memory requests.
A "memory target" is a node containing one or more physical address
ranges accessible from one or more memory initiators.
When multiple memory initiators exist, they may not all have the same
performance when accessing a given memory target. Each initiator-target
pair may be organized into different ranked access classes to represent
this relationship. The highest performing initiator to a given target
is considered to be one of that target's local initiators, and given
the highest access class, 0. Any given target may have one or more
local initiators, and any given initiator may have multiple local
memory targets.
To aid applications matching memory targets with their initiators, the
kernel provides symlinks to each other. The following example lists the
relationship for the access class "0" memory initiators and targets::
# symlinks -v /sys/devices/system/node/nodeX/access0/targets/
relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
# symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate the
nodes' access characteristics share the same performance relative to other
linked initiator nodes. The targets within an initiator's access class,
though, do not necessarily perform the same as each other.
================
NUMA Performance
================
Applications may wish to consider which node they want their memory to
be allocated from based on the node's performance characteristics. If
the system provides these attributes, the kernel exports them under the
node sysfs hierarchy by appending the attributes directory under the
memory node's access class 0 initiators as follows::
/sys/devices/system/node/nodeY/access0/initiators/
These attributes apply only when the memory is accessed from nodes that
are linked under this access class's initiators.
The performance characteristics the kernel provides for the local initiators
are exported as follows::
# tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
/sys/devices/system/node/nodeY/access0/initiators/
|-- read_bandwidth
|-- read_latency
|-- write_bandwidth
`-- write_latency
The bandwidth attributes are provided in MiB/second.
The latency attributes are provided in nanoseconds.
The values reported here correspond to the rated latency and bandwidth
for the platform.
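As a rough illustration of how an application might consume these attributes, the following C sketch builds the sysfs path described above and reads one value. The function names are hypothetical, and the sketch assumes each attribute file holds a single unsigned decimal number; a missing file is treated as "not reported by this platform".

```c
#include <stdio.h>

/*
 * Build the sysfs path of one access class 0 performance attribute of a
 * memory target node. Returns what snprintf() returns: the length the full
 * path needs, or a negative value on error.
 * (Hypothetical helper, not a kernel or libnuma API.)
 */
int numa_perf_attr_path(char *buf, size_t len, int node, const char *attr)
{
	return snprintf(buf, len,
			"/sys/devices/system/node/node%d/access0/initiators/%s",
			node, attr);
}

/*
 * Read one attribute (bandwidth in MiB/second, latency in nanoseconds).
 * Returns 0 on success and -1 if the file is missing or cannot be parsed.
 * Assumption: the file contains a single unsigned decimal number.
 */
int numa_perf_attr_read(int node, const char *attr, unsigned long long *value)
{
	char path[128];
	FILE *f;
	int n, ok;

	n = numa_perf_attr_path(path, sizeof(path), node, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = fscanf(f, "%llu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller could, for instance, read ``read_latency`` for every candidate node and prefer the node with the lowest value, falling back to the default policy when the read fails.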
==========
NUMA Cache
==========
System memory may be constructed in a hierarchy of elements with various
performance characteristics in order to provide a large address space of
slower performing memory cached by a smaller higher performing memory. The
system physical addresses memory initiators are aware of are provided
by the last memory level in the hierarchy. The system meanwhile uses
higher performing memory to transparently cache access to progressively
slower levels.
The term "far memory" is used to denote the last level memory in the
hierarchy. Each increasing cache level provides higher performing
initiator access, and the term "near memory" represents the fastest
cache provided by the system.
This numbering is different than CPU caches where the cache level (ex:
L1, L2, L3) uses the CPU-side view where each increased level is lower
performing. In contrast, the memory cache level is centric to the last
level memory, so the higher numbered cache level corresponds to memory
nearer to the CPU, and further from far memory.
The memory-side caches are not directly addressable by software. When
software accesses a system address, the system will return it from the
near memory cache if it is present. If it is not present, the system
accesses the next level of memory until there is either a hit in that
cache level, or it reaches far memory.
An application does not need to know about caching attributes in order
to use the system. Software may optionally query the memory cache
attributes in order to maximize the performance out of such a setup.
If the system provides a way for the kernel to discover this information,
for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
the kernel will append these attributes to the NUMA node memory target.
When the kernel first registers a memory cache with a node, the kernel
will create the following directory::
/sys/devices/system/node/nodeX/memory_side_cache/
If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.
The attributes for each level of cache are provided under its cache
level index::
/sys/devices/system/node/nodeX/memory_side_cache/indexA/
/sys/devices/system/node/nodeX/memory_side_cache/indexB/
/sys/devices/system/node/nodeX/memory_side_cache/indexC/
Each cache level's directory provides its attributes. For example, the
following shows a single cache level and the attributes available for
software to query::
# tree /sys/devices/system/node/node0/memory_side_cache/
/sys/devices/system/node/node0/memory_side_cache/
|-- index1
| |-- indexing
| |-- line_size
| |-- size
| `-- write_policy
The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
for any other index-based, multi-way associativity.
The "line_size" is the number of bytes accessed from the next cache
level on a miss.
The "size" is the number of bytes provided by this cache level.
The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.
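The attribute encodings above can be turned into human-readable form with a few small helpers. This is only a sketch: the function names are hypothetical, and the line count is a derived figure (size divided by line_size) that the kernel does not export itself.

```c
/*
 * Interpret the memory-side cache attributes as described in the text.
 * All names here are illustrative, not part of any kernel interface.
 */

/* "indexing": 0 is direct-mapped, anything else is some other
 * index-based, multi-way associativity. */
const char *msc_indexing_name(unsigned int indexing)
{
	return indexing == 0 ? "direct-mapped" : "multi-way associative";
}

/* "write_policy": 0 is write-back, non-zero is write-through. */
const char *msc_write_policy_name(unsigned int write_policy)
{
	return write_policy == 0 ? "write-back" : "write-through";
}

/* Derived value: how many lines of "line_size" bytes fit in "size" bytes.
 * Returns 0 when line_size is 0 to avoid dividing by zero. */
unsigned long long msc_line_count(unsigned long long size,
				  unsigned long long line_size)
{
	return line_size ? size / line_size : 0;
}
```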
========
See Also
========
.. [1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
Section 5.2.27


@ -0,0 +1,85 @@
Perf Event Attributes
=====================
Author: Andrew Murray <andrew.murray@arm.com>
Date: 2019-03-06
exclude_user
------------
This attribute excludes userspace.
Userspace always runs at EL0 and thus this attribute will exclude EL0.
exclude_kernel
--------------
This attribute excludes the kernel.
The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
at EL1.
For the host this attribute will exclude EL1 and additionally EL2 on a VHE
system.
For the guest this attribute will exclude EL1. Please note that EL2 is
never counted within a guest.
exclude_hv
----------
This attribute excludes the hypervisor.
For a VHE host this attribute is ignored as we consider the host kernel to
be the hypervisor.
For a non-VHE host this attribute will exclude EL2 as we consider the
hypervisor to be any code that runs at EL2 which is predominantly used for
guest/host transitions.
For the guest this attribute has no effect. Please note that EL2 is
never counted within a guest.
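The rules for exclude_user, exclude_kernel and exclude_hv given so far can be summarised as a small truth function. The C sketch below is a model of the prose only, with hypothetical names; it is not the arm64 PMU driver code, and it does not cover exclude_host or exclude_guest.

```c
#include <stdbool.h>

#define SKETCH_EL0 (1u << 0)
#define SKETCH_EL1 (1u << 1)
#define SKETCH_EL2 (1u << 2)

/* The three attributes discussed so far (illustrative struct). */
struct sketch_excl {
	bool user;
	bool kernel;
	bool hv;
};

/*
 * Return the set of exception levels that the attributes exclude.
 * "vhe" describes the host; "guest" is true when the event is opened
 * inside a guest. EL2 is never counted within a guest in any case.
 */
unsigned int sketch_excluded_els(bool vhe, bool guest, struct sketch_excl a)
{
	unsigned int mask = 0;

	if (a.user)
		mask |= SKETCH_EL0;		/* userspace always runs at EL0 */

	if (a.kernel) {
		mask |= SKETCH_EL1;		/* host and guest: EL1 */
		if (!guest && vhe)
			mask |= SKETCH_EL2;	/* VHE host kernel runs at EL2 */
	}

	/* exclude_hv only has an effect on a non-VHE host. */
	if (a.hv && !guest && !vhe)
		mask |= SKETCH_EL2;

	return mask;
}
```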
exclude_host / exclude_guest
----------------------------
These attributes exclude the KVM host and guest, respectively.
The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE
kernel or non-VHE hypervisor).
The KVM guest may run at EL0 (userspace) and EL1 (kernel).
Due to the overlapping exception levels between host and guests we cannot
exclusively rely on the PMU's hardware exception filtering - therefore we
must enable/disable counting on the entry and exit to the guest. This is
performed differently on VHE and non-VHE systems.
For non-VHE systems we exclude EL2 for exclude_host - upon entering and
exiting the guest we disable/enable the event as appropriate based on the
exclude_host and exclude_guest attributes.
For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2
for exclude_host. Upon entering and exiting the guest we modify the event
to include/exclude EL0 as appropriate based on the exclude_host and
exclude_guest attributes.
The statements above also apply when these attributes are used within a
non-VHE guest; however, please note that EL2 is never counted within a guest.
Accuracy
--------
On non-VHE hosts we enable/disable counters on the entry/exit of host/guest
transition at EL2 - however there is a period of time between
enabling/disabling the counters and entering/exiting the guest. We are
able to eliminate counters counting host events on the boundaries of guest
entry/exit when counting guest events by filtering out EL2 for
exclude_host. However when using !exclude_hv there is a small blackout
window at the guest entry/exit where host events are not captured.
On VHE systems there are no blackout windows.


@ -87,7 +87,21 @@ used to get and set the keys for a thread.
Virtualization
--------------
Pointer authentication is not currently supported in KVM guests. KVM
will mask the feature bits from ID_AA64ISAR1_EL1, and attempted use of
the feature will result in an UNDEFINED exception being injected into
the guest.
Pointer authentication is enabled in a KVM guest when each virtual cpu is
initialised by passing the flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC],
requesting that these two separate cpu features be enabled. The current KVM
guest implementation works by enabling both features together, so both
of these userspace flags are checked before enabling pointer authentication.
Keeping the userspace flags separate means that no userspace ABI changes
are needed if support is added in the future to allow these two features
to be enabled independently of one another.
Because the Arm architecture specifies that the Pointer Authentication
feature is implemented along with the VHE feature, the KVM arm64 ptrauth
code relies on VHE mode being present.
Additionally, when these vcpu feature flags are not set, KVM will
filter out the Pointer Authentication system key registers from the
KVM_GET/SET_REG_* ioctls and mask those features from the cpufeature ID
register. Any attempt to use the Pointer Authentication instructions will
result in an UNDEFINED exception being injected into the guest.


@ -58,13 +58,14 @@ stable kernels.
| ARM | Cortex-A72 | #853709 | N/A |
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
| ARM | Cortex-A76 | #1188873 | ARM64_ERRATUM_1188873 |
| ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 |
| ARM | Cortex-A76 | #1165522 | ARM64_ERRATUM_1165522 |
| ARM | Cortex-A76 | #1286807 | ARM64_ERRATUM_1286807 |
| ARM | Neoverse-N1 | #1188873 | ARM64_ERRATUM_1188873 |
| ARM | MMU-500 | #841119,#826419 | N/A |
| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 |
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
| ARM | MMU-500 | #841119,826419 | N/A |
| | | | |
| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 |
| Cavium | ThunderX ITS | #22375,24313 | CAVIUM_ERRATUM_22375 |
| Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 |
| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 |
| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |


@ -56,6 +56,18 @@ model features for SVE is included in Appendix A.
is to connect to a target process first and then attempt a
ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
between userspace and the kernel, the register value is encoded in memory in
an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
byte offset i from the start of the memory representation. This affects for
example the signal frame (struct sve_context) and ptrace interface
(struct user_sve_header) and associated data.
Beware that on big-endian systems this results in a different byte order than
for the FPSIMD V-registers, which are stored as single host-endian 128-bit
values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
byte offset i. (struct fpsimd_context, struct user_fpsimd_state).
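The difference between the two layouts can be modelled in a few lines of C. In this sketch a register value is given as an array where element k holds bits [(8 * k + 7) : (8 * k)]; the function names are hypothetical and the code only models the byte placement described above, not the actual kernel copy routines.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * val[k] holds register bits [(8 * k + 7) : (8 * k)], so val[0] is the
 * least significant byte of the register.
 */

/* SVE scalable registers: endianness-invariant, byte i of memory always
 * holds bits [(8 * i + 7) : (8 * i)]. */
void sketch_sve_encode(const uint8_t *val, uint8_t *mem, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		mem[i] = val[i];
}

/* FPSIMD V-registers: one host-endian 128-bit value. On a big-endian
 * host, byte i of memory holds bits [(127 - 8 * i) : (120 - 8 * i)],
 * which is val[15 - i]. */
void sketch_fpsimd_encode(const uint8_t val[16], uint8_t mem[16],
			  int big_endian)
{
	for (size_t i = 0; i < 16; i++)
		mem[i] = big_endian ? val[15 - i] : val[i];
}
```

On a little-endian host both functions produce the same bytes for a 128-bit value; on a big-endian host the FPSIMD representation is the byte-reversed one.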
2. Vector length terminology
-----------------------------
@ -124,6 +136,10 @@ the SVE instruction set architecture.
size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to
the members.
* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
start of the register's representation in memory.
* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
space is allocated on the stack, an extra_context record is written in
__reserved[] referencing this space. sve_context is then written in the


@ -1,6 +1,6 @@
On atomic bitops.
=============
Atomic bitops
=============
While our bitmap_{}() functions are non-atomic, we have a number of operations
operating on single bits in a bitmap that are atomic.


@ -20,13 +20,26 @@ for that device, by setting low_latency to 0. See Section 3 for
details on how to configure BFQ for the desired tradeoff between
latency and throughput, or on how to maximize throughput.
BFQ has a non-null overhead, which limits the maximum IOPS that a CPU
can process for a device scheduled with BFQ. To give an idea of the
limits on slow or average CPUs, here are, first, the limits of BFQ for
three different CPUs, on, respectively, an average laptop, an old
desktop, and a cheap embedded system, in case full hierarchical
support is enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set), but
CONFIG_DEBUG_BLK_CGROUP is not set (Section 4-2):
Like every I/O scheduler, BFQ adds some overhead to per-I/O-request
processing. To give an idea of this overhead, the total,
single-lock-protected, per-request processing time of BFQ---i.e., the
sum of the execution times of the request insertion, dispatch and
completion hooks---is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
(dated CPU for notebooks; time measured with simple code
instrumentation, and using the throughput-sync.sh script of the S
suite [3], in performance-profiling mode). To put this result into
context, the total, single-lock-protected, per-request execution time
of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7
us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ).
Scheduling overhead further limits the maximum IOPS that a CPU can
process (already limited by the execution of the rest of the I/O
stack). To give an idea of the limits with BFQ, on slow or average
CPUs, here are, first, the limits of BFQ for three different CPUs, on,
respectively, an average laptop, an old desktop, and a cheap embedded
system, in case full hierarchical support is enabled (i.e.,
CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_DEBUG_BLK_CGROUP is not
set (Section 4-2):
- Intel i7-4850HQ: 400 KIOPS
- AMD A8-3850: 250 KIOPS
- ARM CortexTM-A53 Octa-core: 80 KIOPS
@ -566,3 +579,5 @@ applications. Unset this tunable if you need/want to control weights.
Slightly extended version:
http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-
results.pdf
[3] https://github.com/Algodev-github/S


@ -93,3 +93,7 @@ zoned=[0/1]: Default: 0
zone_size=[MB]: Default: 256
Per zone size when exposed as a zoned block device. Must be a power of two.
zone_nr_conv=[nr_conv]: Default: 0
The number of conventional zones to create when block device is zoned. If
zone_nr_conv >= nr_zones, it will be reduced to nr_zones - 1.
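The adjustment described for zone_nr_conv amounts to a simple clamp. The following C sketch uses a hypothetical function name and adds a guard for nr_zones == 0, which is an assumption of this example rather than something stated above.

```c
/*
 * Clamp the requested number of conventional zones so that at least one
 * non-conventional zone remains (illustrative helper, not driver code).
 */
unsigned int sketch_clamp_zone_nr_conv(unsigned int zone_nr_conv,
				       unsigned int nr_zones)
{
	if (nr_zones == 0)
		return 0;		/* assumption: nothing to clamp against */
	if (zone_nr_conv >= nr_zones)
		return nr_zones - 1;	/* reduced as described above */
	return zone_nr_conv;
}
```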


@ -13,11 +13,9 @@ you can do so by typing:
# mount none /sys -t sysfs
As of the Linux 2.6.10 kernel, it is now possible to change the
IO scheduler for a given block device on the fly (thus making it possible,
for instance, to set the CFQ scheduler for the system default, but
set a specific device to use the deadline or noop schedulers - which
can improve that device's throughput).
It is possible to change the IO scheduler for a given block device on
the fly to select one of mq-deadline, none, bfq, or kyber schedulers -
which can improve that device's throughput.
To set a specific scheduler, simply do this:
@ -30,8 +28,8 @@ The list of defined schedulers can be found by simply doing
a "cat /sys/block/DEV/queue/scheduler" - the list of valid names
will be displayed, with the currently selected scheduler in brackets:
# cat /sys/block/hda/queue/scheduler
noop deadline [cfq]
# echo deadline > /sys/block/hda/queue/scheduler
# cat /sys/block/hda/queue/scheduler
noop [deadline] cfq
# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
# echo none >/sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
[none] mq-deadline kyber bfq


@ -85,8 +85,33 @@ Q: Can loops be supported in a safe way?
A: It's not clear yet.
BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in less than 4096 instructions.
support bounded loops.
Q: What are the verifier limits?
--------------------------------
A: The only limit known to the user space is BPF_MAXINSNS (4096).
It's the maximum number of instructions that the unprivileged bpf
program can have. The verifier has various internal limits,
such as the maximum number of instructions that can be explored during
program analysis. Currently, that limit is set to 1 million,
which essentially means that the largest program can consist
of 1 million NOP instructions. There is a limit to the maximum number
of subsequent branches, a limit to the number of nested bpf-to-bpf
calls, a limit to the number of the verifier states per instruction,
and a limit to the number of maps used by the program.
All these limits can be hit with a sufficiently complex program.
There are also non-numerical limits that can cause the program
to be rejected. The verifier used to recognize only pointer + constant
expressions. Now it can recognize pointer + bounded_register.
bpf_map_lookup_elem(key) had a requirement that 'key' must be
a pointer to the stack. Now, 'key' can be a pointer to map value.
The verifier is steadily getting 'smarter'. The limits are
being removed. The only way to know that the program is going to
be accepted by the verifier is to try to load it.
The bpf development process guarantees that the future kernel
versions will accept all bpf programs that were accepted by
the earlier versions.
Instruction level questions
---------------------------


@ -82,6 +82,8 @@ sequentially and type id is assigned to each recognized type starting from id
#define BTF_KIND_RESTRICT 11 /* Restrict */
#define BTF_KIND_FUNC 12 /* Function */
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
#define BTF_KIND_VAR 14 /* Variable */
#define BTF_KIND_DATASEC 15 /* Section */
Note that the type section encodes debug info, not just pure types.
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
@ -129,7 +131,7 @@ The following sections detail encoding of each kind.
``btf_type`` is followed by a ``u32`` with the following bits arrangement::
#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
#define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16)
#define BTF_INT_OFFSET(VAL) (((VAL) & 0x00ff0000) >> 16)
#define BTF_INT_BITS(VAL) ((VAL) & 0x000000ff)
The ``BTF_INT_ENCODING`` has the following attributes::
@ -393,6 +395,61 @@ refers to parameter type.
If the function has variable arguments, the last parameter is encoded with
``name_off = 0`` and ``type = 0``.
2.2.14 BTF_KIND_VAR
~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: offset to a valid C identifier
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_VAR
* ``info.vlen``: 0
* ``type``: the type of the variable
``btf_type`` is followed by a single ``struct btf_var`` with the
following data::
struct btf_var {
__u32 linkage;
};
``struct btf_var`` encoding:
* ``linkage``: currently only static variable 0, or globally allocated
variable in ELF sections 1
Not all types of global variables are supported by LLVM at this point.
The following is currently available:
* static variables with or without section attributes
* global variables with section attributes
The latter is for future extraction of map key/value type id's from a
map definition.
2.2.15 BTF_KIND_DATASEC
~~~~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: offset to a valid name associated with a variable or
one of .data/.bss/.rodata
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_DATASEC
* ``info.vlen``: # of variables
* ``size``: total section size in bytes (0 at compilation time, patched
to actual size by BPF loaders such as libbpf)
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_var_secinfo``::
struct btf_var_secinfo {
__u32 type;
__u32 offset;
__u32 size;
};
``struct btf_var_secinfo`` encoding:
* ``type``: the type of the BTF_KIND_VAR variable
* ``offset``: the in-section offset of the variable
* ``size``: the size of the variable in bytes
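To make the relationship between ``info.vlen``, ``size`` and the ``btf_var_secinfo`` entries concrete, here is a small C sketch that checks whether a list of entries fits inside a section. The struct is redeclared locally with fixed-width types so the example is self-contained, and the rules it applies (ascending offsets, no overlap, everything inside the section) are assumptions of this illustration, not a statement of what the kernel verifier enforces.

```c
#include <stdint.h>

/* Local mirror of struct btf_var_secinfo as shown above. */
struct sketch_var_secinfo {
	uint32_t type;		/* type id of the BTF_KIND_VAR */
	uint32_t offset;	/* in-section offset of the variable */
	uint32_t size;		/* size of the variable in bytes */
};

/*
 * Return 1 if all "vlen" entries lie inside a section of "sec_size"
 * bytes, in ascending order and without overlapping; 0 otherwise.
 */
int sketch_datasec_layout_ok(const struct sketch_var_secinfo *v,
			     uint32_t vlen, uint32_t sec_size)
{
	uint32_t prev_end = 0;

	for (uint32_t i = 0; i < vlen; i++) {
		if (v[i].offset < prev_end)
			return 0;	/* out of order or overlapping */
		if (v[i].offset > sec_size ||
		    v[i].size > sec_size - v[i].offset)
			return 0;	/* runs past the end of the section */
		prev_end = v[i].offset + v[i].size;
	}
	return 1;
}
```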
3. BTF Kernel API
*****************
@ -521,6 +578,7 @@ For line_info, the line number and column number are defined as below:
#define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff)
3.4 BPF_{PROG,MAP}_GET_NEXT_ID
==============================
In kernel, every loaded program, map or btf has a unique id. The id won't
change during the lifetime of a program, map, or btf.
@ -530,6 +588,7 @@ each command, to user space, for bpf program or maps, respectively, so an
inspection tool can inspect all programs and maps.
3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
===============================
An introspection tool cannot use id to get details about program or maps.
A file descriptor needs to be obtained first for reference-counting purpose.


@ -36,6 +36,16 @@ Two sets of Questions and Answers (Q&A) are maintained.
bpf_devel_QA
Program types
=============
.. toctree::
:maxdepth: 1
prog_cgroup_sysctl
prog_flow_dissector
.. Links:
.. _Documentation/networking/filter.txt: ../networking/filter.txt
.. _man-pages: https://www.kernel.org/doc/man-pages/


@ -0,0 +1,125 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
===========================
BPF_PROG_TYPE_CGROUP_SYSCTL
===========================
This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type that
provides a cgroup-bpf hook for sysctl.
The hook has to be attached to a cgroup and will be called every time a
process inside that cgroup tries to read from or write to a sysctl knob in proc.
1. Attach type
**************
``BPF_CGROUP_SYSCTL`` attach type has to be used to attach
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.
2. Context
**********
``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
BPF program::
struct bpf_sysctl {
__u32 write;
__u32 file_pos;
};
* ``write`` indicates whether sysctl value is being read (``0``) or written
(``1``). This field is read-only.
* ``file_pos`` indicates file position sysctl is being accessed at, read
or written. This field is read-write. Writing to the field sets the starting
position in sysctl proc file ``read(2)`` will be reading from or ``write(2)``
will be writing to. Writing zero to the field can be used e.g. to override
whole sysctl value by ``bpf_sysctl_set_new_value()`` on ``write(2)`` even
when it's called by user space on ``file_pos > 0``. Writing non-zero
value to the field can be used to access part of sysctl value starting from
specified ``file_pos``. Not all sysctl support access with ``file_pos !=
0``, e.g. writes to numeric sysctl entries must always be at file position
``0``. See also ``kernel.sysctl_writes_strict`` sysctl.
See `linux/bpf.h`_ for more details on how context field can be accessed.
3. Return code
**************
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
return codes:
* ``0`` means "reject access to sysctl";
* ``1`` means "proceed with access".
If program returns ``0`` user space will get ``-1`` from ``read(2)`` or
``write(2)`` and ``errno`` will be set to ``EPERM``.
4. Helpers
**********
Since sysctl knob is represented by a name and a value, sysctl specific BPF
helpers focus on providing access to these properties:
* ``bpf_sysctl_get_name()`` to get the sysctl name as it is visible in
``/proc/sys`` into a buffer provided by the BPF program;
* ``bpf_sysctl_get_current_value()`` to get the string value currently held by
sysctl into a buffer provided by the BPF program. This helper is available on
both ``read(2)`` from and ``write(2)`` to sysctl;
* ``bpf_sysctl_get_new_value()`` to get new string value currently being
written to sysctl before actual write happens. This helper can be used only
on ``ctx->write == 1``;
* ``bpf_sysctl_set_new_value()`` to override new string value currently being
written to sysctl before actual write happens. Sysctl value will be
overridden starting from the current ``ctx->file_pos``. If the whole value
has to be overridden BPF program can set ``file_pos`` to zero before calling
to the helper. This helper can be used only on ``ctx->write == 1``. New
string value set by the helper is treated and verified by kernel same way as
an equivalent string passed by user space.
BPF program sees sysctl value same way as user space does in proc filesystem,
i.e. as a string. Since many sysctl values represent an integer or a vector
of integers, the following helpers can be used to get numeric value from the
string:
* ``bpf_strtol()`` to convert initial part of the string to long integer
similar to user space `strtol(3)`_;
* ``bpf_strtoul()`` to convert initial part of the string to unsigned long
integer similar to user space `strtoul(3)`_;
See `linux/bpf.h`_ for more details on helpers described here.
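The decision logic such a program typically implements (parse the new value, then return ``0`` or ``1``) can be modelled in ordinary user space C. The sketch below is not a BPF program: it uses ``strtol(3)`` in place of ``bpf_strtol()``, takes the new value as a plain string instead of calling ``bpf_sysctl_get_new_value()``, and the function name and the min/max policy are made up for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Model of a sysctl policy: return 1 ("proceed with access") or
 * 0 ("reject access to sysctl").
 *
 * write     - mirrors ctx->write: 0 for a read, 1 for a write
 * new_value - the string being written (only inspected on writes)
 * min, max  - hypothetical inclusive bounds for acceptable values
 */
int sketch_sysctl_policy(int write, const char *new_value, long min, long max)
{
	char *end;
	long v;

	if (!write)
		return 1;	/* this sketch allows all reads */

	errno = 0;
	v = strtol(new_value, &end, 10);
	if (end == new_value || errno == ERANGE)
		return 0;	/* not a number, or out of range for long */

	return v >= min && v <= max;
}
```

With bounds of 0 and 10, a write of "5" is allowed while writes of "50" or "abc" are rejected, in which case user space would see ``-1`` from ``write(2)`` with ``errno`` set to ``EPERM``.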
5. Examples
***********
See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses
the sysctl name and value, parses the string value to get a vector of integers
and uses the result to decide whether to allow or deny access to sysctl.
6. Notes
********
``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in **trusted** root
environment, for example to monitor sysctl usage or catch unreasonable values
an application, running as root in a separate cgroup, is trying to set.
Since `task_dfl_cgroup(current)` is called at `sys_read` / `sys_write` time it
may return results different from that at `sys_open` time, i.e. process that
opened sysctl file in proc filesystem may differ from process that is trying
to read from / write to it and two such processes may run in different
cgroups, which means ``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not be used as a
security mechanism to limit sysctl usage.
As with any cgroup-bpf program additional care should be taken if an
application running as root in a cgroup should not be allowed to
detach/replace BPF program attached by administrator.
.. Links
.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
.. _test_sysctl_prog.c:
../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c


@ -0,0 +1,126 @@
.. SPDX-License-Identifier: GPL-2.0
============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================
Overview
========
Flow dissector is a routine that parses metadata out of the packets. It's
used in various places in the networking subsystem (RFS, flow hash, etc).
BPF flow dissector is an attempt to reimplement C-based flow dissector logic
in BPF to gain all the benefits of BPF verifier (namely, limits on the
number of instructions and tail calls).
API
===
BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
and output arguments.
The inputs are:
* ``nhoff`` - initial offset of the networking header
* ``thoff`` - initial offset of the transport header, initialized to nhoff
* ``n_proto`` - L3 protocol type, parsed out of L2 header
A flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should
also be adjusted accordingly.
The return code of the BPF program is either BPF_OK to indicate successful
dissection, or BPF_DROP to indicate parsing error.
__sk_buff->data
===============
In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::
+------+------+------------+-----------+
| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
+------+------+------------+-----------+
^
|
+-- flow dissector starts here
.. code:: c
skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
flow_keys->thoff = nhoff
flow_keys->n_proto = ETHER_TYPE
In case of VLAN, the flow dissector can be called in one of two different states.
Pre-VLAN parsing::
+------+------+------+-----+-----------+-----------+
| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
+------+------+------+-----+-----------+-----------+
^
|
+-- flow dissector starts here
.. code:: c
skb->data + flow_keys->nhoff point to the first byte of TCI
flow_keys->thoff = nhoff
flow_keys->n_proto = TPID
Please note that TPID can be 802.1AD and, hence, BPF program would
have to parse VLAN information twice for double tagged packets.
Post-VLAN parsing::
+------+------+------+-----+-----------+-----------+
| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
+------+------+------+-----+-----------+-----------+
^
|
+-- flow dissector starts here
.. code:: c
skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
flow_keys->thoff = nhoff
flow_keys->n_proto = ETHER_TYPE
In this case VLAN information has been processed before the flow dissector
and BPF flow dissector is not required to handle it.
The takeaway here is as follows: the BPF flow dissector program can be called
with the optional VLAN header still present, and the same program is used in
every case. It therefore has to be written carefully to handle both
situations: when a single or double VLAN tag is present and when no VLAN tag
is present.
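As an illustration only, the VLAN handling described above can be modelled in
user space. The following Python sketch is not the BPF reference program; the
helper name ``skip_vlan`` and the constant names are assumptions made for this
example. It takes the initial ``nhoff`` and ``n_proto`` and advances them past
up to two VLAN tags, returning ``None`` where a real program would return
BPF_DROP.

```python
import struct

VLAN_HLEN = 4          # TCI (2 bytes) + encapsulated protocol (2 bytes)
ETH_P_IP = 0x0800
ETH_P_8021Q = 0x8100   # single VLAN tag
ETH_P_8021AD = 0x88A8  # outer tag of a double-tagged frame


def skip_vlan(data, nhoff, n_proto):
    """Advance (nhoff, n_proto) past up to two VLAN tags.

    Hypothetical helper: returns the adjusted (nhoff, n_proto), or None
    when the packet is too short, which stands in for BPF_DROP.
    """
    for _ in range(2):  # at most two tags: 802.1AD followed by 802.1Q
        if n_proto not in (ETH_P_8021Q, ETH_P_8021AD):
            break  # post-VLAN or VLAN-less state: nothing to skip
        if nhoff + VLAN_HLEN > len(data):
            return None
        # In the pre-VLAN state nhoff points at TCI; the next protocol
        # field follows it directly.
        _tci, n_proto = struct.unpack_from("!HH", data, nhoff)
        nhoff += VLAN_HLEN
    return nhoff, n_proto
```

With a 14-byte offset (DMAC, SMAC and the first protocol field), the same
function leaves a VLAN-less packet untouched and moves ``nhoff`` forward by
4 bytes per tag otherwise, which is the "one program, all cases" property
the text asks for.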
Reference Implementation
========================
See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can also be used to load a BPF flow dissector program.
The reference implementation is organized as follows:
* ``jmp_table`` map that contains sub-programs for each supported L3 protocol
* ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
does ``bpf_tail_call`` to the appropriate L3 handler
Since BPF at this point doesn't support looping (or any jumping back),
jmp_table is used instead to handle multiple levels of encapsulation (and
IPv6 options).
Current Limitations
===================
BPF flow dissector doesn't support exporting all the metadata that the in-kernel
C-based implementation can export. A notable example is single VLAN (802.1Q)
and double VLAN (802.1AD) tags. Please refer to ``struct bpf_flow_keys``
for the set of information that can currently be exported from the BPF context.


@ -8,61 +8,13 @@ both at leaf nodes as well as at intermediate nodes in a storage hierarchy.
Plan is to use the same cgroup based management interface for blkio controller
and based on user options switch IO policies in the background.
Currently two IO control policies are implemented. First one is proportional
weight time based division of disk policy. It is implemented in CFQ. Hence
this policy takes effect only on leaf nodes when CFQ is being used. The second
one is throttling policy which can be used to specify upper IO rate limits
on devices. This policy is implemented in generic block layer and can be
used on leaf nodes as well as higher level logical devices like device mapper.
One IO control policy is throttling policy which can be used to
specify upper IO rate limits on devices. This policy is implemented in
generic block layer and can be used on leaf nodes as well as higher
level logical devices like device mapper.
HOWTO
=====
Proportional Weight division of bandwidth
-----------------------------------------
You can do a very simple testing of running two dd threads in two different
cgroups. Here is what you can do.
- Enable Block IO controller
CONFIG_BLK_CGROUP=y
- Enable group scheduling in CFQ
CONFIG_CFQ_GROUP_IOSCHED=y
- Compile and boot into kernel and mount IO controller (blkio); see
cgroups.txt, Why are cgroups needed?.
mount -t tmpfs cgroup_root /sys/fs/cgroup
mkdir /sys/fs/cgroup/blkio
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
- Create two cgroups
mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2
- Set weights of group test1 and test2
echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight
echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight
- Create two same size files (say 512MB each) on same disk (file1, file2) and
launch two dd threads in different cgroup to read those files.
sync
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/sdb/zerofile1 of=/dev/null &
echo $! > /sys/fs/cgroup/blkio/test1/tasks
cat /sys/fs/cgroup/blkio/test1/tasks
dd if=/mnt/sdb/zerofile2 of=/dev/null &
echo $! > /sys/fs/cgroup/blkio/test2/tasks
cat /sys/fs/cgroup/blkio/test2/tasks
- At macro level, first dd should finish first. To get more precise data, keep
on looking at (with the help of script), at blkio.disk_time and
blkio.disk_sectors files of both test1 and test2 groups. This will tell how
much disk time (in milliseconds), each group got and how many sectors each
group dispatched to the disk. We provide fairness in terms of disk time, so
ideally io.disk_time of cgroups should be in proportion to the weight.
Throttling/Upper Limit policy
-----------------------------
- Enable Block IO controller
@ -94,7 +46,7 @@ Throttling/Upper Limit policy
Hierarchical Cgroups
====================
Both CFQ and throttling implement hierarchy support; however,
Throttling implements hierarchy support; however,
throttling's hierarchy support is enabled iff "sane_behavior" is
enabled from cgroup side, which currently is a development option and
not publicly available.
@ -107,9 +59,8 @@ If somebody created a hierarchy like as follows.
|
test3
CFQ by default and throttling with "sane_behavior" will handle the
hierarchy correctly. For details on CFQ hierarchy support, refer to
Documentation/block/cfq-iosched.txt. For throttling, all limits apply
Throttling with "sane_behavior" will handle the
hierarchy correctly. For throttling, all limits apply
to the whole subtree while all statistics are local to the IOs
directly generated by tasks in that cgroup.
@ -130,10 +81,6 @@ CONFIG_DEBUG_BLK_CGROUP
- Debug help. Right now some additional stats file show up in cgroup
if this option is enabled.
CONFIG_CFQ_GROUP_IOSCHED
- Enables group scheduling in CFQ. Currently only 1 level of group
creation is allowed.
CONFIG_BLK_DEV_THROTTLING
- Enable block device throttling support in block layer.
@ -344,32 +291,3 @@ Common files among various policies
- blkio.reset_stats
- Writing an int to this file will result in resetting all the stats
for that cgroup.
CFQ sysfs tunable
=================
/sys/block/<disk>/queue/iosched/slice_idle
------------------------------------------
On a faster hardware CFQ can be slow, especially with sequential workload.
This happens because CFQ idles on a single queue and single queue might not
drive deeper request queue depths to keep the storage busy. In such scenarios
one can try setting slice_idle=0 and that would switch CFQ to IOPS
(IO operations per second) mode on NCQ supporting hardware.
That means CFQ will not idle between cfq queues of a cfq group and hence be
able to driver higher queue depth and achieve better throughput. That also
means that cfq provides fairness among groups in terms of IOPS and not in
terms of disk time.
/sys/block/<disk>/queue/iosched/group_idle
------------------------------------------
If one disables idling on individual cfq queues and cfq service trees by
setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
on the group in an attempt to provide fairness among groups.
By default group_idle is same as slice_idle and does not do anything if
slice_idle is enabled.
One can experience an overall throughput drop if you have created multiple
groups and put applications in that group which are not driving enough
IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
on individual groups and throughput should improve.


@ -32,14 +32,18 @@ Brief summary of control files
hugetlb.<hugepagesize>.usage_in_bytes # show current usage for "hugepagesize" hugetlb
hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB limit
For a system supporting two hugepage size (16M and 16G) the control
For a system supporting three hugepage sizes (64k, 32M and 1G), the control
files include:
hugetlb.16GB.limit_in_bytes
hugetlb.16GB.max_usage_in_bytes
hugetlb.16GB.usage_in_bytes
hugetlb.16GB.failcnt
hugetlb.16MB.limit_in_bytes
hugetlb.16MB.max_usage_in_bytes
hugetlb.16MB.usage_in_bytes
hugetlb.16MB.failcnt
hugetlb.1GB.limit_in_bytes
hugetlb.1GB.max_usage_in_bytes
hugetlb.1GB.usage_in_bytes
hugetlb.1GB.failcnt
hugetlb.64KB.limit_in_bytes
hugetlb.64KB.max_usage_in_bytes
hugetlb.64KB.usage_in_bytes
hugetlb.64KB.failcnt
hugetlb.32MB.limit_in_bytes
hugetlb.32MB.max_usage_in_bytes
hugetlb.32MB.usage_in_bytes
hugetlb.32MB.failcnt
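The naming pattern of the control files listed above can be sketched in a few
lines of Python. This is only an illustration: the function names are
hypothetical, and the unit thresholds (KB below 1 MiB, MB below 1 GiB, GB
otherwise) are an assumption inferred from the file names shown here, not
taken from the kernel source.

```python
SUFFIXES = ("limit_in_bytes", "max_usage_in_bytes", "usage_in_bytes", "failcnt")


def hugetlb_size_label(size_bytes):
    """Format a huge page size as it appears in the control file names."""
    if size_bytes >= 1 << 30:
        return "%dGB" % (size_bytes >> 30)
    if size_bytes >= 1 << 20:
        return "%dMB" % (size_bytes >> 20)
    return "%dKB" % (size_bytes >> 10)


def hugetlb_control_files(sizes):
    """List the hugetlb.<hugepagesize>.* files for the given page sizes."""
    return [
        "hugetlb.%s.%s" % (hugetlb_size_label(size), suffix)
        for size in sizes
        for suffix in SUFFIXES
    ]
```

Calling ``hugetlb_control_files([64 << 10, 32 << 20, 1 << 30])`` yields twelve
names, four per supported huge page size.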


@ -1,5 +1,7 @@
Clearing WARN_ONCE
------------------
WARN_ONCE / WARN_ON_ONCE only print a warning once.
WARN_ONCE / WARN_ON_ONCE / printk_once only emit a message once.
echo 1 > /sys/kernel/debug/clear_warn_once


@ -37,7 +37,7 @@ needs_sphinx = '1.3'
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
# The name of the math extension changed on Sphinx 1.4
if major == 1 and minor > 3:
if (major == 1 and minor > 3) or (major > 1):
extensions.append("sphinx.ext.imgmath")
else:
extensions.append("sphinx.ext.pngmath")
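For comparison, the same version test can be written as a tuple comparison,
which avoids the case the old condition missed (any major version above 1).
This is a sketch only; ``math_extension`` is a hypothetical helper and not
part of conf.py.

```python
def math_extension(major, minor):
    """Pick the Sphinx math extension name for a given Sphinx version."""
    # Tuples compare element by element, so (2, 0) >= (1, 4) is True,
    # just like (1, 4) >= (1, 4).
    if (major, minor) >= (1, 4):
        return "sphinx.ext.imgmath"
    return "sphinx.ext.pngmath"
```

For integer version components this is equivalent to
``(major == 1 and minor > 3) or (major > 1)``.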


@ -22,7 +22,6 @@ Core utilities
workqueue
genericirq
xarray
flexible-arrays
librs
genalloc
errseq


@ -147,10 +147,10 @@ Division Functions
.. kernel-doc:: include/linux/math64.h
:internal:
.. kernel-doc:: lib/div64.c
.. kernel-doc:: lib/math/div64.c
:functions: div_s64_rem div64_u64_rem div64_u64 div64_s64
.. kernel-doc:: lib/gcd.c
.. kernel-doc:: lib/math/gcd.c
:export:
UUID/GUID


@ -58,6 +58,14 @@ A raw pointer value may be printed with %p which will hash the address
before printing. The kernel also supports extended specifiers for printing
pointers of different types.
Some of the extended specifiers print the data on the given address instead
of printing the address itself. In this case, the following error messages
might be printed instead of the unreachable information::
(null) data on plain NULL address
(efault) data on invalid address
(einval) invalid data on a valid address
Plain Pointers
--------------


@ -34,10 +34,6 @@ Configure the kernel with::
CONFIG_DEBUG_FS=y
CONFIG_GCOV_KERNEL=y
select the gcc's gcov format, default is autodetect based on gcc version::
CONFIG_GCOV_FORMAT_AUTODETECT=y
and to get coverage data for the entire kernel::
CONFIG_GCOV_PROFILE_ALL=y
@ -169,6 +165,20 @@ b) gcov is run on the BUILD machine
[user@build] gcov -o /tmp/coverage/tmp/out/init main.c
Note on compilers
-----------------
GCC and LLVM gcov tools are not necessarily compatible. Use gcov_ to work with
GCC-generated .gcno and .gcda files, and use llvm-cov_ for Clang.
.. _gcov: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
.. _llvm-cov: https://llvm.org/docs/CommandGuide/llvm-cov.html
Build differences between GCC and Clang gcov are handled by Kconfig. It
automatically selects the appropriate gcov format depending on the detected
toolchain.
Troubleshooting
---------------


@ -7,6 +7,11 @@ directory. These are intended to be small tests to exercise individual code
paths in the kernel. Tests are intended to be run after building, installing
and booting a kernel.
You can find additional information on the Kselftest framework, including how
to write new tests using the framework, on the Kselftest wiki:
https://kselftest.wiki.kernel.org/
On some systems, hot-plug tests could hang forever waiting for cpu and
memory to be ready to be offlined. A special hot-plug target is created
to run the full range of hot-plug tests. In default mode, hot-plug tests run
@ -35,17 +40,32 @@ To build and run the tests with a single command, use::
Note that some tests will require root privileges.
Build and run from user specific object directory (make O=dir)::
Kselftest supports saving output files in a separate directory and then
running tests. Two syntaxes are supported for placing output files in a
separate directory. In both cases the working directory must be the root of
the kernel source tree. This also applies to the "Running a subset of
selftests" section below.
To build, save output files in a separate directory with O= ::
$ make O=/tmp/kselftest kselftest
Build and run KBUILD_OUTPUT directory (make KBUILD_OUTPUT=)::
To build, save output files in a separate directory with KBUILD_OUTPUT ::
$ make KBUILD_OUTPUT=/tmp/kselftest kselftest
$ export KBUILD_OUTPUT=/tmp/kselftest; make kselftest
The above commands run the tests and print pass/fail summary to make it
easier to understand the test results. Please find the detailed individual
test results for each test in /tmp/testname file(s).
The O= assignment takes precedence over the KBUILD_OUTPUT environment
variable.
By default, the above commands run the tests and print a full pass/fail report.
Kselftest supports a "summary" option to make it easier to understand the test
results. When the summary option is specified, the detailed individual test
results for each test can be found in the /tmp/testname file(s). This also
applies to the "Running a subset of selftests" section below.
To run kselftest with summary option enabled ::
$ make summary=1 kselftest
Running a subset of selftests
=============================
@ -61,17 +81,13 @@ You can specify multiple tests to build and run::
$ make TARGETS="size timers" kselftest
Build and run from user specific object directory (make O=dir)::
To build, save output files in a separate directory with O= ::
$ make O=/tmp/kselftest TARGETS="size timers" kselftest
Build and run KBUILD_OUTPUT directory (make KBUILD_OUTPUT=)::
To build, save output files in a separate directory with KBUILD_OUTPUT ::
$ make KBUILD_OUTPUT=/tmp/kselftest TARGETS="size timers" kselftest
The above commands run the tests and print pass/fail summary to make it
easier to understand the test results. Please find the detailed individual
test results for each test in /tmp/testname file(s).
$ export KBUILD_OUTPUT=/tmp/kselftest; make TARGETS="size timers" kselftest
See the top-level tools/testing/selftests/Makefile for the list of all
possible targets.


@ -0,0 +1,272 @@
dm-dust
=======
This target emulates the behavior of bad sectors at arbitrary
locations, and the ability to enable the emulation of the failures
at an arbitrary time.
This target behaves similarly to a linear target. At a given time,
the user can send a message to the target to start failing read
requests on specific blocks (to emulate the behavior of a hard disk
drive with bad sectors).
When the failure behavior is enabled (i.e.: when the output of
"dmsetup status" displays "fail_read_on_bad_block"), reads of blocks
in the "bad block list" will fail with EIO ("Input/output error").
Writes of blocks in the "bad block list" will result in the following:
1. Remove the block from the "bad block list".
2. Successfully complete the write.
This emulates the "remapped sector" behavior of a drive with bad
sectors.
Normally, a drive that is encountering bad sectors will most likely
encounter more bad sectors, at an unknown time or location.
With dm-dust, the user can use the "addbadblock" and "removebadblock"
messages to add arbitrary bad blocks at new locations, and the
"enable" and "disable" messages to modulate the state of whether the
configured "bad blocks" will be treated as bad, or bypassed.
This allows the pre-writing of test data and metadata prior to
simulating a "failure" event where bad sectors start to appear.
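The behavior described above can be summarized with a small in-memory model.
This is a hedged sketch and not the dm-dust implementation: the class name
``DustModel`` and its method names are made up for this example (they mirror
the message names), no real I/O is performed, and it assumes that a write only
removes a block from the list while read failures are enabled.

```python
import errno


class DustModel:
    """In-memory model of the dm-dust bad block list (hypothetical)."""

    def __init__(self):
        self.bad_blocks = set()
        self.fail_read_on_bad_block = False  # False models "bypass"

    def addbadblock(self, block):
        if block in self.bad_blocks:
            raise OSError(errno.EINVAL, "block already in badblocklist")
        self.bad_blocks.add(block)

    def removebadblock(self, block):
        if block not in self.bad_blocks:
            raise OSError(errno.EINVAL, "block not found in badblocklist")
        self.bad_blocks.remove(block)

    def enable(self):
        self.fail_read_on_bad_block = True

    def disable(self):
        self.fail_read_on_bad_block = False

    def read(self, block):
        # Reads of listed blocks fail with EIO only while enabled.
        if self.fail_read_on_bad_block and block in self.bad_blocks:
            raise OSError(errno.EIO, "Input/output error")

    def write(self, block):
        # Assumption: a write "remaps" the block only while enabled.
        if self.fail_read_on_bad_block:
            self.bad_blocks.discard(block)
```

In this model a block added while in "bypass" stays readable until ``enable()``
is called, fails with EIO afterwards, and becomes readable again after a write.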
Table parameters:
-----------------
<device_path> <offset> <blksz>
Mandatory parameters:
<device_path>: path to the block device.
<offset>: offset to data area from start of device_path
<blksz>: block size in bytes
(minimum 512, maximum 1073741824, must be a power of 2)
Usage instructions:
-------------------
First, find the size (in 512-byte sectors) of the device to be used:
$ sudo blockdev --getsz /dev/vdb1
33552384
Create the dm-dust device:
(For a device with a block size of 512 bytes)
$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 512'
(For a device with a block size of 4096 bytes)
$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 4096'
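When generating such table lines from a script, the block size constraints
listed under "Table parameters" can be checked up front. The helper below is a
sketch for illustration; ``dust_table`` is a hypothetical name and the function
only builds the string, it does not call dmsetup.

```python
def dust_table(num_sectors, device_path, offset, blksz):
    """Build a dm-dust table line; num_sectors is in 512-byte sectors."""
    if not 512 <= blksz <= 1073741824:
        raise ValueError("blksz must be between 512 and 1073741824")
    if blksz & (blksz - 1):
        raise ValueError("blksz must be a power of 2")
    # Same layout as the examples: <start> <length> dust <dev> <offset> <blksz>
    return "0 %d dust %s %d %d" % (num_sectors, device_path, offset, blksz)
```

The returned string is what would be passed to ``dmsetup create ... --table``.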
Check the status of the read behavior ("bypass" indicates that all I/O
will be passed through to the underlying device):
$ sudo dmsetup status dust1
0 33552384 dust 252:17 bypass
$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=128 iflag=direct
128+0 records in
128+0 records out
$ sudo dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct
128+0 records in
128+0 records out
Adding and removing bad blocks:
-------------------------------
At any time (i.e.: whether the device has the "bad block" emulation
enabled or disabled), bad blocks may be added or removed from the
device via the "addbadblock" and "removebadblock" messages:
$ sudo dmsetup message dust1 0 addbadblock 60
kernel: device-mapper: dust: badblock added at block 60
$ sudo dmsetup message dust1 0 addbadblock 67
kernel: device-mapper: dust: badblock added at block 67
$ sudo dmsetup message dust1 0 addbadblock 72
kernel: device-mapper: dust: badblock added at block 72
These bad blocks will be stored in the "bad block list".
While the device is in "bypass" mode, reads and writes will succeed:
$ sudo dmsetup status dust1
0 33552384 dust 252:17 bypass
Enabling block read failures:
-----------------------------
To enable the "fail read on bad block" behavior, send the "enable" message:
$ sudo dmsetup message dust1 0 enable
kernel: device-mapper: dust: enabling read failures on bad sectors
$ sudo dmsetup status dust1
0 33552384 dust 252:17 fail_read_on_bad_block
With the device in "fail read on bad block" mode, attempting to read a
block will encounter an "Input/output error":
$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=1 skip=67 iflag=direct
dd: error reading '/dev/mapper/dust1': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 0.00040651 s, 0.0 kB/s
...and writing to the bad blocks will remove the blocks from the list,
therefore emulating the "remap" behavior of hard disk drives:
$ sudo dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct
128+0 records in
128+0 records out
kernel: device-mapper: dust: block 60 removed from badblocklist by write
kernel: device-mapper: dust: block 67 removed from badblocklist by write
kernel: device-mapper: dust: block 72 removed from badblocklist by write
kernel: device-mapper: dust: block 87 removed from badblocklist by write
Bad block add/remove error handling:
------------------------------------
Attempting to add a bad block that already exists in the list will
result in an "Invalid argument" error, as well as a helpful message:
$ sudo dmsetup message dust1 0 addbadblock 88
device-mapper: message ioctl on dust1 failed: Invalid argument
kernel: device-mapper: dust: block 88 already in badblocklist
Attempting to remove a bad block that doesn't exist in the list will
result in an "Invalid argument" error, as well as a helpful message:
$ sudo dmsetup message dust1 0 removebadblock 87
device-mapper: message ioctl on dust1 failed: Invalid argument
kernel: device-mapper: dust: block 87 not found in badblocklist
Counting the number of bad blocks in the bad block list:
--------------------------------------------------------
To count the number of bad blocks configured in the device, run the
following message command:
$ sudo dmsetup message dust1 0 countbadblocks
A message will print with the number of bad blocks currently
configured on the device:
kernel: device-mapper: dust: countbadblocks: 895 badblock(s) found
Querying for specific bad blocks:
---------------------------------
To find out if a specific block is in the bad block list, run the
following message command:
$ sudo dmsetup message dust1 0 queryblock 72
The following message will print if the block is in the list:
device-mapper: dust: queryblock: block 72 found in badblocklist
The following message will print if the block is not in the list:
device-mapper: dust: queryblock: block 72 not found in badblocklist
The "queryblock" message command will work in both the "enabled"
and "disabled" modes, allowing the verification of whether a block
will be treated as "bad" without having to issue I/O to the device,
or having to "enable" the bad block emulation.
Clearing the bad block list:
----------------------------
To clear the bad block list (without needing to individually run
a "removebadblock" message command for every block), run the
following message command:
$ sudo dmsetup message dust1 0 clearbadblocks
After clearing the bad block list, the following message will appear:
kernel: device-mapper: dust: clearbadblocks: badblocks cleared
If there were no bad blocks to clear, the following message will
appear:
kernel: device-mapper: dust: clearbadblocks: no badblocks found
Message commands list:
----------------------
Below is a list of the messages that can be sent to a dust device:
Operations on blocks (requires a <blknum> argument):
addbadblock <blknum>
queryblock <blknum>
removebadblock <blknum>
...where <blknum> is a block number within range of the device
(corresponding to the block size of the device.)
Single argument message commands:
countbadblocks
clearbadblocks
disable
enable
quiet
Device removal:
---------------
When finished, remove the device via the "dmsetup remove" command:
$ sudo dmsetup remove dust1
Quiet mode:
-----------
On test runs with many bad blocks, it may be desirable to avoid
excessive logging (from bad blocks added, removed, or "remapped").
This can be done by enabling "quiet mode" via the following message:
$ sudo dmsetup message dust1 0 quiet
This will suppress log messages from add / remove / removed by write
operations. Log messages from "countbadblocks" or "queryblock"
message commands will still print in quiet mode.
The status of quiet mode can be seen by running "dmsetup status":
$ sudo dmsetup status dust1
0 33552384 dust 252:17 fail_read_on_bad_block quiet
To disable quiet mode, send the "quiet" message again:
$ sudo dmsetup message dust1 0 quiet
$ sudo dmsetup status dust1
0 33552384 dust 252:17 fail_read_on_bad_block verbose
(The presence of "verbose" indicates normal logging.)
"Why not...?"
-------------
scsi_debug has a "medium error" mode that can fail reads on one
specified sector (sector 0x1234, hardcoded in the source code), but
it uses RAM for the persistent storage, which drastically decreases
the potential device size.
dm-flakey fails all I/O from all block locations at a specified time
frequency, and not a given point in time.
When a bad sector occurs on a hard disk drive, reads to that sector
are failed by the device, usually resulting in an error code of EIO
("I/O error") or ENODATA ("No data available"). However, a write to
the sector may succeed, and result in the sector becoming readable
after the device controller no longer experiences errors reading the
sector (or after a reallocation of the sector). However, there may
be bad sectors that occur on the device in the future, in a different,
unpredictable location.
This target seeks to provide a device that can exhibit the behavior
of a bad sector at a known sector location, at a known time, based
on a large storage device (at least tens of gigabytes, not occupying
system memory).


@ -21,6 +21,13 @@ mode it calculates and verifies the integrity tag internally. In this
mode, the dm-integrity target can be used to detect silent data
corruption on the disk or in the I/O path.
There's an alternate mode of operation where dm-integrity uses a bitmap
instead of a journal. If a bit in the bitmap is 1, the corresponding
region's data and integrity tags are not synchronized - if the machine
crashes, the unsynchronized regions will be recalculated. The bitmap mode
is faster than the journal mode, because we don't have to write the data
twice, but it is also less reliable, because if data corruption happens
when the machine crashes, it may not be detected.
When loading the target for the first time, the kernel driver will format
the device. But it will only format the device if the superblock contains
@ -59,6 +66,10 @@ Target arguments:
either both data and tag or none of them are written. The
journaled mode degrades write throughput twice because the
data have to be written twice.
B - bitmap mode - data and metadata are written without any
synchronization, the driver maintains a bitmap of dirty
regions where data and metadata don't match. This mode can
only be used with internal hash.
R - recovery mode - in this mode, journal is not replayed,
checksums are not checked and writes to the device are not
allowed. This mode is useful for data recovery if the
@ -79,6 +90,10 @@ interleave_sectors:number
a power of two. If the device is already formatted, the value from
the superblock is used.
meta_device:device
Don't interleave the data and metadata on one device. Use a
separate device for metadata.
buffer_sectors:number
The number of sectors in one buffer. The value is rounded down to
a power of two.
@ -146,6 +161,15 @@ block_size:number
Supported values are 512, 1024, 2048 and 4096 bytes. If not
specified the default block size is 512 bytes.
sectors_per_bit:number
In the bitmap mode, this parameter specifies the number of
512-byte sectors that corresponds to one bitmap bit.
bitmap_flush_interval:number
The bitmap flush interval in milliseconds. The metadata buffers
are synchronized when this interval expires.
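The relationship between sectors and bitmap bits implied by sectors_per_bit is
plain integer arithmetic. The sketch below only illustrates that arithmetic;
the function names are hypothetical and any rounding or limits applied by the
driver itself are not modelled.

```python
def bitmap_bit_for_sector(sector, sectors_per_bit):
    """Index of the bitmap bit that covers a given 512-byte sector."""
    return sector // sectors_per_bit


def bitmap_bits_needed(device_sectors, sectors_per_bit):
    """Number of bitmap bits needed to cover the whole device."""
    # Ceiling division: a partially covered last region still needs a bit.
    return -(-device_sectors // sectors_per_bit)
```

A larger sectors_per_bit gives a smaller bitmap, at the cost of marking a
larger region as unsynchronized for every write.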
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
be changed when reloading the target (load an inactive table and swap the
tables with suspend and resume). The other arguments should not be changed
@ -167,7 +191,13 @@ The layout of the formatted block device:
provides (i.e. the size of the device minus the size of all
metadata and padding). The user of this target should not send
bios that access data beyond the "provided data sectors" limit.
* flags - a flag is set if journal_mac is used
* flags
SB_FLAG_HAVE_JOURNAL_MAC - a flag is set if journal_mac is used
SB_FLAG_RECALCULATING - recalculating is in progress
SB_FLAG_DIRTY_BITMAP - journal area contains the bitmap of dirty
blocks
* log2(sectors per block)
* a position where recalculating finished
* journal
The journal is divided into sections, each section contains:
* metadata area (4kiB), it contains journal entries


@ -5,7 +5,7 @@ DT_MK_SCHEMA ?= dt-mk-schema
DT_MK_SCHEMA_FLAGS := $(if $(DT_SCHEMA_FILES), -u)
quiet_cmd_chk_binding = CHKDT $(patsubst $(srctree)/%,%,$<)
cmd_chk_binding = $(DT_DOC_CHECKER) $< ; \
cmd_chk_binding = $(DT_DOC_CHECKER) -u $(srctree)/$(src) $< ; \
$(DT_EXTRACT_EX) $< > $@
$(obj)/%.example.dts: $(src)/%.yaml FORCE


@ -11,3 +11,15 @@ Example:
reg = <0xffd08000 0x1000>;
cpu1-start-addr = <0xffd080c4>;
};
ARM64 - Stratix10
Required properties:
- compatible : "altr,sys-mgr-s10"
- reg : Should contain 1 register range(address and length)
for system manager register.
Example:
sysmgr@ffd12000 {
compatible = "altr,sys-mgr-s10";
reg = <0xffd12000 0x228>;
};


@ -110,6 +110,7 @@ Board compatible values (alphabetically, grouped by SoC):
- "amlogic,u200" (Meson g12a s905d2)
- "amediatech,x96-max" (Meson g12a s905x2)
- "seirobotics,sei510" (Meson g12a s905x2)
Amlogic Meson Firmware registers Interface
------------------------------------------


@ -216,7 +216,7 @@ Example:
#size-cells = <0>;
A57_0: cpu@0 {
compatible = "arm,cortex-a57","arm,armv8";
compatible = "arm,cortex-a57";
reg = <0x0 0x0>;
device_type = "cpu";
enable-method = "psci";
@ -225,7 +225,7 @@ Example:
.....
A53_0: cpu@100 {
compatible = "arm,cortex-a53","arm,armv8";
compatible = "arm,cortex-a53";
reg = <0x0 0x100>;
device_type = "cpu";
enable-method = "psci";


@ -25,6 +25,7 @@ compatible: must be one of:
o "atmel,at91sam9n12"
o "atmel,at91sam9rl"
o "atmel,at91sam9xe"
o "microchip,sam9x60"
* "atmel,sama5" for SoCs using a Cortex-A5, shall be extended with the specific
SoC family:
o "atmel,sama5d2" shall be extended with the specific SoC compatible:


@ -84,7 +84,7 @@ SHDWC SAMA5D2-Compatible Shutdown Controller
1) shdwc node
required properties:
- compatible: should be "atmel,sama5d2-shdwc".
- compatible: should be "atmel,sama5d2-shdwc" or "microchip,sam9x60-shdwc".
- reg: should contain registers location and length
- clocks: phandle to input clock.
- #address-cells: should be one. The cell is the wake-up input index.
@ -96,6 +96,9 @@ optional properties:
microseconds. It's usually a board-related property.
- atmel,wakeup-rtc-timer: boolean to enable Real-Time Clock wake-up.
optional microchip,sam9x60-shdwc properties:
- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up.
The node contains child nodes for each wake-up input that the platform uses.
2) input nodes

View File

@ -8,7 +8,8 @@ through the intermediate links connecting the source to the currently selected
sink. Each CoreSight component device should use these properties to describe
its hardware characteristcs.
* Required properties for all components *except* non-configurable replicators:
* Required properties for all components *except* non-configurable replicators
and non-configurable funnels:
* compatible: These have to be supplemented with "arm,primecell" as
drivers are using the AMBA bus interface. Possible values include:
@ -24,8 +25,10 @@ its hardware characteristcs.
discovered at boot time when the device is probed.
"arm,coresight-tmc", "arm,primecell";
- Trace Funnel:
"arm,coresight-funnel", "arm,primecell";
- Trace Programmable Funnel:
"arm,coresight-dynamic-funnel", "arm,primecell";
"arm,coresight-funnel", "arm,primecell"; (OBSOLETE. For
backward compatibility and will be removed)
- Embedded Trace Macrocell (version 3.x) and
Program Flow Trace Macrocell:
@ -65,11 +68,17 @@ its hardware characteristcs.
"stm-stimulus-base", each corresponding to the areas defined in "reg".
* Required properties for devices that don't show up on the AMBA bus, such as
non-configurable replicators:
non-configurable replicators and non-configurable funnels:
* compatible: Currently supported value is (note the absence of the
AMBA markee):
- "arm,coresight-replicator"
- Coresight Non-configurable Replicator:
"arm,coresight-static-replicator";
"arm,coresight-replicator"; (OBSOLETE. For backward
compatibility and will be removed)
- Coresight Non-configurable Funnel:
"arm,coresight-static-funnel";
* port or ports: see "Graph bindings for Coresight" below.
@ -169,7 +178,7 @@ Example:
/* non-configurable replicators don't show up on the
* AMBA bus. As such no need to add "arm,primecell".
*/
compatible = "arm,coresight-replicator";
compatible = "arm,coresight-static-replicator";
out-ports {
#address-cells = <1>;
@ -200,8 +209,45 @@ Example:
};
};
funnel {
/*
* Non-configurable funnels don't show up on the AMBA
* bus. As such no need to add "arm,primecell".
*/
compatible = "arm,coresight-static-funnel";
clocks = <&crg_ctrl HI3660_PCLK>;
clock-names = "apb_pclk";
out-ports {
port {
combo_funnel_out: endpoint {
remote-endpoint = <&top_funnel_in>;
};
};
};
in-ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
combo_funnel_in0: endpoint {
remote-endpoint = <&cluster0_etf_out>;
};
};
port@1 {
reg = <1>;
combo_funnel_in1: endpoint {
remote-endpoint = <&cluster1_etf_out>;
};
};
};
};
funnel@20040000 {
compatible = "arm,coresight-funnel", "arm,primecell";
compatible = "arm,coresight-dynamic-funnel", "arm,primecell";
reg = <0 0x20040000 0 0x1000>;
clocks = <&oscclk6a>;


@ -118,7 +118,7 @@ cpus {
};
A57_0: cpu@0 {
compatible = "arm,cortex-a57","arm,armv8";
compatible = "arm,cortex-a57";
reg = <0x0 0x0>;
device_type = "cpu";
enable-method = "psci";
@ -129,7 +129,7 @@ cpus {
};
A57_1: cpu@1 {
compatible = "arm,cortex-a57","arm,armv8";
compatible = "arm,cortex-a57";
reg = <0x0 0x1>;
device_type = "cpu";
enable-method = "psci";
@ -140,7 +140,7 @@ cpus {
};
A53_0: cpu@100 {
compatible = "arm,cortex-a53","arm,armv8";
compatible = "arm,cortex-a53";
reg = <0x0 0x100>;
device_type = "cpu";
enable-method = "psci";
@ -151,7 +151,7 @@ cpus {
};
A53_1: cpu@101 {
compatible = "arm,cortex-a53","arm,armv8";
compatible = "arm,cortex-a53";
reg = <0x0 0x101>;
device_type = "cpu";
enable-method = "psci";
@ -162,7 +162,7 @@ cpus {
};
A53_2: cpu@102 {
compatible = "arm,cortex-a53","arm,armv8";
compatible = "arm,cortex-a53";
reg = <0x0 0x102>;
device_type = "cpu";
enable-method = "psci";
@ -173,7 +173,7 @@ cpus {
};
A53_3: cpu@103 {
compatible = "arm,cortex-a53","arm,armv8";
compatible = "arm,cortex-a53";
reg = <0x0 0x103>;
device_type = "cpu";
enable-method = "psci";


@ -67,6 +67,7 @@ properties:
patternProperties:
'^cpu@[0-9a-f]+$':
type: object
properties:
device_type:
const: cpu


@ -22,9 +22,11 @@ Required properties:
-------------------
- compatible: should be "fsl,imx-scu".
- mbox-names: should include "tx0", "tx1", "tx2", "tx3",
"rx0", "rx1", "rx2", "rx3".
- mboxes: List of phandle of 4 MU channels for tx and 4 MU channels
for rx. All 8 MU channels must be in the same MU instance.
"rx0", "rx1", "rx2", "rx3";
include "gip3" to support the general MU interrupt.
- mboxes: List of phandle of 4 MU channels for tx, 4 MU channels for
rx, and 1 optional MU channel for general interrupt.
All MU channels must be in the same MU instance.
Cross instances are not allowed. The MU instance can only
be one of LSIO MU0~M4 for imx8qxp and imx8qm. Users need
to make sure use the one which is not conflict with other
@ -34,6 +36,7 @@ Required properties:
Channel 1 must be "tx1" or "rx1".
Channel 2 must be "tx2" or "rx2".
Channel 3 must be "tx3" or "rx3".
General interrupt rx channel must be "gip3".
e.g.
mboxes = <&lsio_mu1 0 0
&lsio_mu1 0 1
@ -42,10 +45,18 @@ Required properties:
&lsio_mu1 1 0
&lsio_mu1 1 1
&lsio_mu1 1 2
&lsio_mu1 1 3>;
&lsio_mu1 1 3
&lsio_mu1 3 3>;
See Documentation/devicetree/bindings/mailbox/fsl,mu.txt
for detailed mailbox binding.
Note: Each mu which supports general interrupt should have an alias correctly
numbered in "aliases" node.
e.g.
aliases {
mu1 = &lsio_mu1;
};
i.MX SCU Client Device Node:
============================================================
@ -124,6 +135,10 @@ Required properties:
Example (imx8qxp):
-------------
aliases {
mu1 = &lsio_mu1;
};
lsio_mu1: mailbox@5d1c0000 {
...
#mbox-cells = <2>;
@ -133,7 +148,8 @@ firmware {
scu {
compatible = "fsl,imx-scu";
mbox-names = "tx0", "tx1", "tx2", "tx3",
"rx0", "rx1", "rx2", "rx3";
"rx0", "rx1", "rx2", "rx3",
"gip3";
mboxes = <&lsio_mu1 0 0
&lsio_mu1 0 1
&lsio_mu1 0 2
@ -141,7 +157,8 @@ firmware {
&lsio_mu1 1 0
&lsio_mu1 1 1
&lsio_mu1 1 2
&lsio_mu1 1 3>;
&lsio_mu1 1 3
&lsio_mu1 3 3>;
clk: clk {
compatible = "fsl,imx8qxp-clk", "fsl,scu-clk";


@ -51,6 +51,13 @@ properties:
- const: i2se,duckbill-2
- const: fsl,imx28
- description: i.MX50 based Boards
items:
- enum:
- fsl,imx50-evk
- kobo,aura
- const: fsl,imx50
- description: i.MX51 Babbage Board
items:
- enum:
@ -67,6 +74,7 @@ properties:
- fsl,imx53-evk
- fsl,imx53-qsb
- fsl,imx53-smd
- menlo,m53menlo
- const: fsl,imx53
- description: i.MX6Q based Boards
@ -90,6 +98,7 @@ properties:
- description: i.MX6DL based Boards
items:
- enum:
- eckelmann,imx6dl-ci4x10
- fsl,imx6dl-sabreauto # i.MX6 DualLite/Solo SABRE Automotive Board
- fsl,imx6dl-sabresd # i.MX6 DualLite SABRE Smart Device Board
- technologic,imx6dl-ts4900
@ -137,10 +146,18 @@ properties:
- const: fsl,imx6ull # This seems odd. Should be last?
- const: fsl,imx6ulz
- description: i.MX7S based Boards
items:
- enum:
- tq,imx7s-mba7 # i.MX7S TQ MBa7 with TQMa7S SoM
- const: fsl,imx7s
- description: i.MX7D based Boards
items:
- enum:
- fsl,imx7d-sdb # i.MX7 SabreSD Board
- tq,imx7d-mba7 # i.MX7D TQ MBa7 with TQMa7D SoM
- zii,imx7d-rpu2 # ZII RPU2 Board
- const: fsl,imx7d
- description:
@ -154,6 +171,12 @@ properties:
- const: compulab,cl-som-imx7
- const: fsl,imx7d
- description: i.MX8MM based Boards
items:
- enum:
- fsl,imx8mm-evk # i.MX8MM EVK Board
- const: fsl,imx8mm
- description: i.MX8QXP based Boards
items:
- enum:
@ -176,6 +199,19 @@ properties:
- fsl,vf610
- fsl,vf610m4
- description: ZII's VF610 based Boards
items:
- enum:
- zii,vf610cfu1 # ZII VF610 CFU1 Board
- zii,vf610dev-c # ZII VF610 Development Board, Rev C
- zii,vf610dev-b # ZII VF610 Development Board, Rev B
- zii,vf610scu4-aib # ZII VF610 SCU4 AIB
- zii,vf610dtu # ZII VF610 SSMB DTU Board
- zii,vf610spu3 # ZII VF610 SSMB SPU3 Board
- zii,vf610spb4 # ZII VF610 SPB4 Board
- const: zii,vf610dev
- const: fsl,vf610
- description: LS1012A based Boards
items:
- enum:


@ -0,0 +1,22 @@
# SPDX-License-Identifier: GPL-2.0
%YAML 1.2
---
$id: http://devicetree.org/schemas/arm/intel-ixp4xx.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Intel IXP4xx Device Tree Bindings
maintainers:
- Linus Walleij <linus.walleij@linaro.org>
properties:
compatible:
oneOf:
- items:
- enum:
- linksys,nslu2
- const: intel,ixp42x
- items:
- enum:
- gateworks,gw2358
- const: intel,ixp43x
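
As a rough illustration of what the schema above accepts, a board root node matching the first `oneOf` entry might look like the sketch below. The `model` string is an assumption added for readability and is not part of the binding; only the `compatible` ordering (board-specific string first, SoC family second) comes from the schema.

```dts
/dts-v1/;

/ {
	/* Illustrative model string; not defined by the binding. */
	model = "Linksys NSLU2 (example)";

	/*
	 * Board-specific compatible first, followed by the SoC family
	 * fallback, as required by the first "items" list in the schema.
	 */
	compatible = "linksys,nslu2", "intel,ixp42x";
};
```

A Gateworks GW2358 board would follow the same pattern with "gateworks,gw2358" followed by "intel,ixp43x".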


@ -24,7 +24,8 @@ relationship between the TI-SCI parent node to the child node.
Required properties:
-------------------
- compatible: should be "ti,k2g-sci"
- compatible: should be "ti,k2g-sci" for TI 66AK2G SoC
should be "ti,am654-sci" for TI AM654 SoC
- mbox-names:
"rx" - Mailbox corresponding to receive path
"tx" - Mailbox corresponding to transmit path


@ -14,6 +14,8 @@ Required Properties:
- "mediatek,mt7629-apmixedsys"
- "mediatek,mt8135-apmixedsys"
- "mediatek,mt8173-apmixedsys"
- "mediatek,mt8183-apmixedsys", "syscon"
- "mediatek,mt8516-apmixedsys"
- #clock-cells: Must be 1
The apmixedsys controller uses the common clk binding from


@ -9,6 +9,7 @@ Required Properties:
- "mediatek,mt2701-audsys", "syscon"
- "mediatek,mt7622-audsys", "syscon"
- "mediatek,mt7623-audsys", "mediatek,mt2701-audsys", "syscon"
- "mediatek,mt8183-audiosys", "syscon"
- #clock-cells: Must be 1
The AUDSYS controller uses the common clk binding from


@ -0,0 +1,22 @@
MediaTek CAMSYS controller
============================
The MediaTek camsys controller provides various clocks to the system.
Required Properties:
- compatible: Should be one of:
- "mediatek,mt8183-camsys", "syscon"
- #clock-cells: Must be 1
The camsys controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
camsys: camsys@1a000000 {
compatible = "mediatek,mt8183-camsys", "syscon";
reg = <0 0x1a000000 0 0x1000>;
#clock-cells = <1>;
};


@ -11,6 +11,7 @@ Required Properties:
- "mediatek,mt6797-imgsys", "syscon"
- "mediatek,mt7623-imgsys", "mediatek,mt2701-imgsys", "syscon"
- "mediatek,mt8173-imgsys", "syscon"
- "mediatek,mt8183-imgsys", "syscon"
- #clock-cells: Must be 1
The imgsys controller uses the common clk binding from


@ -15,6 +15,8 @@ Required Properties:
- "mediatek,mt7629-infracfg", "syscon"
- "mediatek,mt8135-infracfg", "syscon"
- "mediatek,mt8173-infracfg", "syscon"
- "mediatek,mt8183-infracfg", "syscon"
- "mediatek,mt8516-infracfg", "syscon"
- #clock-cells: Must be 1
- #reset-cells: Must be 1


@ -0,0 +1,43 @@
MediaTek IPU controller
============================
The MediaTek ipu controller provides various clocks to the system.
Required Properties:
- compatible: Should be one of:
- "mediatek,mt8183-ipu_conn", "syscon"
- "mediatek,mt8183-ipu_adl", "syscon"
- "mediatek,mt8183-ipu_core0", "syscon"
- "mediatek,mt8183-ipu_core1", "syscon"
- #clock-cells: Must be 1
The ipu controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
ipu_conn: syscon@19000000 {
compatible = "mediatek,mt8183-ipu_conn", "syscon";
reg = <0 0x19000000 0 0x1000>;
#clock-cells = <1>;
};
ipu_adl: syscon@19010000 {
compatible = "mediatek,mt8183-ipu_adl", "syscon";
reg = <0 0x19010000 0 0x1000>;
#clock-cells = <1>;
};
ipu_core0: syscon@19180000 {
compatible = "mediatek,mt8183-ipu_core0", "syscon";
reg = <0 0x19180000 0 0x1000>;
#clock-cells = <1>;
};
ipu_core1: syscon@19280000 {
compatible = "mediatek,mt8183-ipu_core1", "syscon";
reg = <0 0x19280000 0 0x1000>;
#clock-cells = <1>;
};


@ -7,6 +7,7 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2712-mcucfg", "syscon"
- "mediatek,mt8183-mcucfg", "syscon"
- #clock-cells: Must be 1
The mcucfg controller uses the common clk binding from


@ -7,6 +7,7 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2712-mfgcfg", "syscon"
- "mediatek,mt8183-mfgcfg", "syscon"
- #clock-cells: Must be 1
The mfgcfg controller uses the common clk binding from


@ -11,6 +11,7 @@ Required Properties:
- "mediatek,mt6797-mmsys", "syscon"
- "mediatek,mt7623-mmsys", "mediatek,mt2701-mmsys", "syscon"
- "mediatek,mt8173-mmsys", "syscon"
- "mediatek,mt8183-mmsys", "syscon"
- #clock-cells: Must be 1
The mmsys controller uses the common clk binding from


@ -14,6 +14,8 @@ Required Properties:
- "mediatek,mt7629-topckgen"
- "mediatek,mt8135-topckgen"
- "mediatek,mt8173-topckgen"
- "mediatek,mt8183-topckgen", "syscon"
- "mediatek,mt8516-topckgen"
- #clock-cells: Must be 1
The topckgen controller uses the common clk binding from


@ -11,6 +11,7 @@ Required Properties:
- "mediatek,mt6797-vdecsys", "syscon"
- "mediatek,mt7623-vdecsys", "mediatek,mt2701-vdecsys", "syscon"
- "mediatek,mt8173-vdecsys", "syscon"
- "mediatek,mt8183-vdecsys", "syscon"
- #clock-cells: Must be 1
The vdecsys controller uses the common clk binding from


@ -9,6 +9,7 @@ Required Properties:
- "mediatek,mt2712-vencsys", "syscon"
- "mediatek,mt6797-vencsys", "syscon"
- "mediatek,mt8173-vencsys", "syscon"
- "mediatek,mt8183-vencsys", "syscon"
- #clock-cells: Must be 1
The vencsys controller uses the common clk binding from


@ -41,7 +41,7 @@ Examples:
Consumer:
========
See Documentation/devicetree/bindings/interrupt-controller/interrupts.txt and
Documentation/devicetree/bindings/interrupt-controller/arm,gic.txt for
Documentation/devicetree/bindings/interrupt-controller/arm,gic.yaml for
further details.
An interrupt consumer on an SoC using crossbar will use:


@ -92,6 +92,9 @@ SoCs:
- DRA718
compatible = "ti,dra718", "ti,dra722", "ti,dra72", "ti,dra7"
- AM5748
compatible = "ti,am5748", "ti,dra762", "ti,dra7"
- AM5728
compatible = "ti,am5728", "ti,dra742", "ti,dra74", "ti,dra7"
@ -184,6 +187,9 @@ Boards:
- AM57XX SBC-AM57x
compatible = "compulab,sbc-am57x", "compulab,cl-som-am57x", "ti,am5728", "ti,dra742", "ti,dra74", "ti,dra7"
- AM5748 IDK
compatible = "ti,am5748-idk", "ti,am5748", "ti,dra762", "ti,dra7";
- AM5728 IDK
compatible = "ti,am5728-idk", "ti,am5728", "ti,dra742", "ti,dra74", "ti,dra7"


@ -97,6 +97,7 @@ properties:
- enum:
- friendlyarm,nanopc-t4
- friendlyarm,nanopi-m4
- friendlyarm,nanopi-neo4
- const: rockchip,rk3399
- description: GeekBuying GeekBox
@ -146,7 +147,7 @@ properties:
- const: google,gru
- const: rockchip,rk3399
- description: Google Jaq (Haier Chromebook 11 and more)
- description: Google Jaq (Haier Chromebook 11 and more w/ uSD)
items:
- const: google,veyron-jaq-rev5
- const: google,veyron-jaq-rev4
@ -159,6 +160,12 @@ properties:
- description: Google Jerry (Hisense Chromebook C11 and more)
items:
- const: google,veyron-jerry-rev15
- const: google,veyron-jerry-rev14
- const: google,veyron-jerry-rev13
- const: google,veyron-jerry-rev12
- const: google,veyron-jerry-rev11
- const: google,veyron-jerry-rev10
- const: google,veyron-jerry-rev7
- const: google,veyron-jerry-rev6
- const: google,veyron-jerry-rev5
@ -199,6 +206,17 @@ properties:
- const: google,veyron
- const: rockchip,rk3288
- description: Google Mighty (Haier Chromebook 11 and more w/ SD)
items:
- const: google,veyron-mighty-rev5
- const: google,veyron-mighty-rev4
- const: google,veyron-mighty-rev3
- const: google,veyron-mighty-rev2
- const: google,veyron-mighty-rev1
- const: google,veyron-mighty
- const: google,veyron
- const: rockchip,rk3288
- description: Google Minnie (Asus Chromebook Flip C100P)
items:
- const: google,veyron-minnie-rev4
@ -308,6 +326,11 @@ properties:
- const: netxeon,r89
- const: rockchip,rk3288
- description: Orange Pi RK3399 board
items:
- const: rockchip,rk3399-orangepi
- const: rockchip,rk3399
- description: Phytec phyCORE-RK3288 Rapid Development Kit
items:
- const: phytec,rk3288-pcm-947

Some files were not shown because too many files have changed in this diff