mirror of
https://github.com/torvalds/linux.git
synced 2024-11-22 04:02:20 +00:00
Networking changes for 6.2.
Core
----

 - Allow live renaming when an interface is up.
 - Add retpoline wrappers for tc, considerably improving the performance
   of complex queue discipline configurations.
 - Add inet drop monitor support.
 - A few GRO performance improvements.
 - Add infrastructure for atomic dev stats, addressing long standing
   data races.
 - De-duplicate common code between OVS and conntrack offloading
   infrastructure.
 - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements.
 - Netfilter: introduce packet parser for tunneled packets.
 - Replace IPVS timer-based estimators with kthreads to scale up the
   workload with the number of available CPUs.
 - Add the helper support for connection-tracking OVS offload.

BPF
---

 - Support for user defined BPF objects: the use case is to allocate own
   objects, build own object hierarchies and use the building blocks to
   build own data structures flexibly, for example, linked lists in BPF.
 - Make cgroup local storage available to non-cgroup attached BPF
   programs.
 - Avoid unnecessary deadlock detection and failures wrt BPF task
   storage helpers.
 - A good number of BPF verifier fixes and improvements.
 - Veristat tool improvements to support custom filtering, sorting, and
   replay of results.
 - Add LLVM disassembler as default library for dumping JITed code.
 - Lots of new BPF documentation for various BPF maps.
 - Add bpf_rcu_read_{,un}lock() support for sleepable programs.
 - Add RCU grace period chaining to BPF to wait for the completion of
   access from both sleepable and non-sleepable BPF programs.
 - Add support for storing struct task_struct objects as kptrs in maps.
 - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
   values.
 - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions.

Protocols
---------

 - TCP: implement Protective Load Balancing across switch links.
 - TCP: allow dynamically disabling TCP-MD5 static key, reverting back
   to fast[er]-path.
 - UDP: Introduce optional per-netns hash lookup table.
 - IPv6: simplify and cleanup sockets disposal.
 - Netlink: support different type policies for each generic netlink
   operation.
 - MPTCP: add MSG_FASTOPEN and FastOpen listener side support.
 - MPTCP: add netlink notification support for listener sockets events.
 - SCTP: add VRF support, allowing sctp sockets binding to VRF devices.
 - Add bridging MAC Authentication Bypass (MAB) support.
 - Extensions for Ethernet VPN bridging implementation to better support
   multicast scenarios.
 - More work for Wi-Fi 7 support, comprising conversion of all the
   existing drivers to internal TX queue usage.
 - IPSec: introduce a new offload type (packet offload) allowing
   complete header processing and crypto offloading.
 - IPSec: extended ack support for more descriptive XFRM error
   reporting.
 - RXRPC: increase SACK table size and move processing into a per-local
   endpoint kernel thread, considerably reducing the required locking.
 - IEEE 802.15.4: synchronous send frame and extended filtering support,
   initial support for scanning available 15.4 networks.
 - Tun: bump the link speed from 10Mbps to 10Gbps.
 - Tun/VirtioNet: implement UDP segmentation offload support.

Driver API
----------

 - PHY/SFP: improve power level switching between standard level 1 and
   the higher power levels.
 - New API for netdev <-> devlink_port linkage.
 - PTP: convert existing drivers to new frequency adjustment
   implementation.
 - DSA: add support for rx offloading.
 - Autoload DSA tagging driver when dynamically changing protocol.
 - Add new PCP and APPTRUST attributes to Data Center Bridging.
 - Add configuration support for 800Gbps link speed.
 - Add devlink port function attribute to enable/disable RoCE and
   migratable.
 - Extend devlink-rate to support strict priority and weighted fair
   queuing.
 - Add devlink support for directly reading from region memory.
 - New device tree helper to fetch MAC address from nvmem.
 - New big TCP helper to simplify temporary header stripping.

New hardware / drivers
----------------------

 - Ethernet:
    - Marvell Octeon CNF95N and CN10KB Ethernet Switches.
    - Marvell Prestera AC5X Ethernet Switch.
    - WangXun 10 Gigabit NIC.
    - Motorcomm yt8521 Gigabit Ethernet.
    - Microchip ksz9563 Gigabit Ethernet Switch.
    - Microsoft Azure Network Adapter.
    - Linux Automation 10Base-T1L adapter.
 - PHY:
    - Aquantia AQR112 and AQR412.
    - Motorcomm YT8531S.
 - PTP:
    - Orolia ART-CARD.
 - WiFi:
    - MediaTek Wi-Fi 7 (802.11be) devices.
    - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB devices.
 - Bluetooth:
    - Broadcom BCM4377/4378/4387 Bluetooth chipsets.
    - Realtek RTL8852BE and RTL8723DS.
    - Cypress CYW4373A0 WiFi + Bluetooth combo device.

Drivers
-------

 - CAN:
    - gs_usb: bus error reporting support.
    - kvaser_usb: listen only and bus error reporting support.
 - Ethernet NICs:
    - Intel (100G):
       - extend action skbedit to RX queue mapping.
       - implement devlink-rate support.
       - support direct read from memory.
    - nVidia/Mellanox (mlx5):
       - SW steering improvements, increasing rules update rate.
       - Support for enhanced events compression.
       - extend H/W offload packet manipulation capabilities.
       - implement IPSec packet offload mode.
    - nVidia/Mellanox (mlx4):
       - better big TCP support.
    - Netronome Ethernet NICs (nfp):
       - IPsec offload support.
       - add support for multicast filter.
    - Broadcom:
       - RSS and PTP support improvements.
    - AMD/SolarFlare:
       - netlink extended ack improvements.
       - add basic flower matches to offload, and related stats.
    - Virtual NICs:
       - ibmvnic: introduce affinity hint support.
    - small / embedded:
       - Freescale fec: add initial XDP support.
       - Marvell mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood.
       - TI am65-cpsw: add suspend/resume support.
       - Mediatek MT7986: add RX Wireless Ethernet Dispatch support.
       - Realtek 8169: enable GRO software interrupt coalescing by
         default.
 - Ethernet high-speed switches:
    - Microchip (sparx5):
       - add support for Sparx5 TC/flower H/W offload via VCAP.
    - Mellanox mlxsw:
       - add 802.1X and MAC Authentication Bypass offload support.
       - add ip6gre support.
 - Embedded Ethernet switches:
    - Mediatek (mtk_eth_soc):
       - improve PCS implementation, add DSA untag support.
       - enable flow offload support.
    - Renesas:
       - add rswitch R-Car Gen4 gPTP support.
    - Microchip (lan966x):
       - add full XDP support.
       - add TC H/W offload via VCAP.
       - enable PTP on bridge interfaces.
    - Microchip (ksz8):
       - add MTU support for KSZ8 series.
 - Qualcomm 802.11ax WiFi (ath11k):
    - support configuring channel dwell time during scan.
 - MediaTek WiFi (mt76):
    - enable Wireless Ethernet Dispatch (WED) offload support.
    - add ack signal support.
    - enable coredump support.
    - remain_on_channel support.
 - Intel WiFi (iwlwifi):
    - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities.
    - 320 MHz channels support.
 - RealTek WiFi (rtw89):
    - new dynamic header firmware format support.
    - wake-over-WLAN support.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni.

* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
  ipvs: fix type warning in do_div() on 32 bit
  net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
  net: ipa: add IPA v4.7 support
  dt-bindings: net: qcom,ipa: Add SM6350 compatible
  bnxt: Use generic HBH removal helper in tx path
  IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
  selftests: forwarding: Add bridge MDB test
  selftests: forwarding: Rename bridge_mdb test
  bridge: mcast: Support replacement of MDB port group entries
  bridge: mcast: Allow user space to specify MDB entry routing protocol
  bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
  bridge: mcast: Add support for (*, G) with a source list and filter mode
  bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
  bridge: mcast: Add a flag for user installed source entries
  bridge: mcast: Expose __br_multicast_del_group_src()
  bridge: mcast: Expose br_multicast_new_group_src()
  bridge: mcast: Add a centralized error path
  bridge: mcast: Place netlink policy before validation functions
  bridge: mcast: Split (*, G) and (S, G) addition into different functions
  bridge: mcast: Do not derive entry type from its filter mode
  ...
commit
7e68dd7d07
@@ -298,3 +298,48 @@ A: NO.

The BTF_ID macro does not cause a function to become part of the ABI
any more than does the EXPORT_SYMBOL_GPL macro.

Q: What is the compatibility story for special BPF types in map values?
-----------------------------------------------------------------------
Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map
values (when using BTF support for BPF maps). This allows helpers for such
objects to be used on these fields inside map values. Users are also allowed to
embed pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will
the kernel preserve backwards compatibility for these features?

A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else:
NO, but see below.

For struct types that have been added already, like bpf_spin_lock and bpf_timer,
the kernel will preserve backwards compatibility, as they are part of UAPI.

For kptrs, they are also part of UAPI, but only with respect to the kptr
mechanism. The types that you can use with a __kptr and __kptr_ref tagged
pointer in your struct are NOT part of the UAPI contract. The supported types can
and will change across kernel releases. However, operations like accessing kptr
fields and the bpf_kptr_xchg() helper will continue to be supported across kernel
releases for the supported types.

For any other supported struct type, unless explicitly stated in this document
and added to bpf.h UAPI header, such types can and will arbitrarily change their
size, type, and alignment, or any other user visible API or ABI detail across
kernel releases. The users must adapt their BPF programs to the new changes and
update them to make sure their programs continue to work correctly.

NOTE: BPF subsystem specially reserves the 'bpf\_' prefix for type names, in
order to introduce more special fields in the future. Hence, user programs must
avoid defining types with 'bpf\_' prefix to not be broken in future releases.
In other words, no backwards compatibility is guaranteed if one uses a type
in BTF with 'bpf\_' prefix.

Q: What is the compatibility story for special BPF types in allocated objects?
------------------------------------------------------------------------------
Q: Same as above, but for allocated objects (i.e. objects allocated using
bpf_obj_new for user defined types). Will the kernel preserve backwards
compatibility for these features?

A: NO.

Unlike map value types, there are no stability guarantees for this case. The
whole API to work with allocated objects and any support for special fields
inside them is unstable (since it is exposed through kfuncs).

@@ -44,6 +44,33 @@ is a guarantee that the reported issue will be overlooked.**
Submitting patches
==================

Q: How do I run BPF CI on my changes before sending them out for review?
------------------------------------------------------------------------
A: BPF CI is GitHub based and hosted at https://github.com/kernel-patches/bpf.
While GitHub also provides a CLI that can be used to accomplish the same
results, here we focus on the UI based workflow.

The following steps lay out how to start a CI run for your patches:

- Create a fork of the aforementioned repository in your own account (one time
  action)

- Clone the fork locally, check out a new branch tracking either the bpf-next
  or bpf branch, and apply your to-be-tested patches on top of it

- Push the local branch to your fork and create a pull request against
  kernel-patches/bpf's bpf-next_base or bpf_base branch, respectively

Shortly after the pull request has been created, the CI workflow will run. Note
that capacity is shared with patches submitted upstream being checked and so
depending on utilization the run can take a while to finish.

Note furthermore that both base branches (bpf-next_base and bpf_base) will be
updated as patches are pushed to the respective upstream branches they track. As
such, an automatic rebase of your patch set will be attempted as well.
This behavior can result in a CI run being aborted and restarted with the new
baseline.

Q: To which mailing list do I need to submit my BPF patches?
------------------------------------------------------------
A: Please submit your BPF patches to the bpf kernel mailing list:

485
Documentation/bpf/bpf_iterators.rst
Normal file
@@ -0,0 +1,485 @@
=============
BPF Iterators
=============


----------
Motivation
----------

There are a few existing ways to dump kernel data into user space. The most
popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps
all tcp6 sockets in the system, and ``cat /proc/net/netlink`` dumps all netlink
sockets in the system. However, their output format tends to be fixed, and if
users want more information about these sockets, they have to patch the kernel,
which often takes time to publish upstream and release. The same is true for popular
tools like `ss <https://man7.org/linux/man-pages/man8/ss.8.html>`_ where any
additional information needs a kernel patch.

To solve this problem, the `drgn
<https://www.kernel.org/doc/html/latest/bpf/drgn.html>`_ tool is often used to
dig out the kernel data with no kernel change. However, the main drawback of
drgn is performance, as it cannot do pointer tracing inside the kernel. In
addition, drgn cannot validate a pointer value and may read invalid data if the
pointer becomes invalid inside the kernel.

The BPF iterator solves the above problem by providing flexibility on what data
(e.g., tasks, bpf_maps, etc.) to collect by calling BPF programs for each kernel
data object.

----------------------
How BPF Iterators Work
----------------------

A BPF iterator is a type of BPF program that allows users to iterate over
specific types of kernel objects. Unlike traditional BPF tracing programs that
allow users to define callbacks that are invoked at particular points of
execution in the kernel, BPF iterators allow users to define callbacks that
should be executed for every entry in a variety of kernel data structures.

For example, users can define a BPF iterator that iterates over every task on
the system and dumps the total amount of CPU runtime currently used by each of
them. Another BPF task iterator may instead dump the cgroup information for each
task. Such flexibility is the core value of BPF iterators.

A BPF program is always loaded into the kernel at the behest of a user space
process. A user space process loads a BPF program by opening and initializing
the program skeleton as required and then invoking a syscall to have the BPF
program verified and loaded by the kernel.

In traditional tracing programs, a program is activated by having user space
obtain a ``bpf_link`` to the program with ``bpf_program__attach()``. Once
activated, the program callback will be invoked whenever the tracepoint is
triggered in the main kernel. For BPF iterator programs, a ``bpf_link`` to the
program is obtained using ``bpf_link_create()``, and the program callback is
invoked by issuing system calls from user space.

Next, let us see how you can use the iterators to iterate on kernel objects and
read data.

------------------------
How to Use BPF Iterators
------------------------

BPF selftests are a great resource to illustrate how to use the iterators. In
this section, we’ll walk through a BPF selftest which shows how to load and use
a BPF iterator program. To begin, we’ll look at `bpf_iter.c
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/prog_tests/bpf_iter.c>`_,
which illustrates how to load and trigger BPF iterators on the user space side.
Later, we’ll look at a BPF program that runs in kernel space.

Loading a BPF iterator in the kernel from user space typically involves the
following steps:

* The BPF program is loaded into the kernel through ``libbpf``. Once the kernel
  has verified and loaded the program, it returns a file descriptor (fd) to user
  space.
* Obtain a ``link_fd`` to the BPF program by calling ``bpf_link_create()``
  with the BPF program file descriptor received from the kernel.
* Next, obtain a BPF iterator file descriptor (``bpf_iter_fd``) by calling
  ``bpf_iter_create()`` with the ``link_fd`` obtained in the previous step.
* Trigger the iteration by calling ``read(bpf_iter_fd)`` until no data is
  available.
* Close the iterator fd using ``close(bpf_iter_fd)``.
* To reread the data, obtain a new ``bpf_iter_fd`` and perform the read again.

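The read step in the list above can be sketched in plain C. This is only a minimal illustration, not the selftest's code: ``drain_iter_fd()`` is a hypothetical helper name, and in a real program ``fd`` would be the ``bpf_iter_fd`` returned by ``bpf_iter_create()``. The loop relies on nothing beyond ``read()`` returning 0 once no more data is available, so any readable descriptor (a pipe, for instance) exercises it the same way.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd until read() returns 0 (no more data),
 * storing at most cap bytes in out. Returns the number of bytes stored, or
 * -1 on a read error. If cap is reached first, the remaining data is simply
 * left unread.
 */
static ssize_t drain_iter_fd(int fd, char *out, size_t cap)
{
	size_t total = 0;

	assert(out != NULL);
	while (total < cap) {
		ssize_t n = read(fd, out + total, cap - total);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			return -1;
		}
		if (n == 0)
			break;			/* iteration is complete */
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

After the loop finishes, the caller closes the descriptor; rereading the data requires a fresh iterator fd, as described in the last step.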
The following are a few examples of selftest BPF iterator programs:

* `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_
* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_
* `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_

Let us look at ``bpf_iter_task_file.c``, which runs in kernel space.

Here is the definition of ``bpf_iter__task_file`` in `vmlinux.h
<https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#btf>`_.
Any struct name in ``vmlinux.h`` in the format ``bpf_iter__<iter_name>``
represents a BPF iterator. The suffix ``<iter_name>`` represents the type of
iterator.

::

    struct bpf_iter__task_file {
            union {
                    struct bpf_iter_meta *meta;
            };
            union {
                    struct task_struct *task;
            };
            u32 fd;
            union {
                    struct file *file;
            };
    };

In the above code, the field 'meta' contains the metadata, which is the same for
all BPF iterator programs. The rest of the fields are specific to different
iterators. For example, for task_file iterators, the kernel layer provides the
'task', 'fd' and 'file' field values. The 'task' and 'file' are `reference
counted
<https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#file-descriptors-and-reference-counters>`_,
so they won't go away when the BPF program runs.

Here is a snippet from the ``bpf_iter_task_file.c`` file:

::

    /* count, tgid, last_tgid and unique_tgid_count are file-scope
     * variables not shown in this snippet.
     */
    SEC("iter/task_file")
    int dump_task_file(struct bpf_iter__task_file *ctx)
    {
            struct seq_file *seq = ctx->meta->seq;
            struct task_struct *task = ctx->task;
            struct file *file = ctx->file;
            __u32 fd = ctx->fd;

            if (task == NULL || file == NULL)
                    return 0;

            if (ctx->meta->seq_num == 0) {
                    count = 0;
                    BPF_SEQ_PRINTF(seq, "    tgid      gid       fd      file\n");
            }

            if (tgid == task->tgid && task->tgid != task->pid)
                    count++;

            if (last_tgid != task->tgid) {
                    last_tgid = task->tgid;
                    unique_tgid_count++;
            }

            BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                           (long)file->f_op);
            return 0;
    }

In the above example, the section name ``SEC("iter/task_file")`` indicates that
the program is a BPF iterator program to iterate all files from all tasks. The
context of the program is the ``bpf_iter__task_file`` struct.

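On the consuming side, each data line emitted with the ``"%8d %8d %8d %lx\n"`` format above can be split back into its fields. The following is a minimal user space sketch under stated assumptions: ``struct task_file_row`` and ``parse_task_file_row()`` are hypothetical names introduced for illustration and are not part of the selftest or of libbpf.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one row of the task_file iterator's text output. */
struct task_file_row {
	int tgid;
	int pid;
	int fd;
	unsigned long f_op;	/* kernel address printed with %lx */
};

/*
 * Parse one line produced by the "%8d %8d %8d %lx\n" format.
 * Returns 0 on success, or -1 if the line does not contain four numeric
 * fields (which is the case for the column header line).
 */
static int parse_task_file_row(const char *line, struct task_file_row *row)
{
	assert(line != NULL && row != NULL);
	if (sscanf(line, "%d %d %d %lx",
		   &row->tgid, &row->pid, &row->fd, &row->f_op) != 4)
		return -1;
	return 0;
}
```

Because the header line fails the numeric conversion, a caller can feed every line of the output to the parser and skip the ones that return -1.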
The user space program invokes the BPF iterator program running in the kernel
by issuing a ``read()`` syscall. Once invoked, the BPF
program can export data to user space using a variety of BPF helper functions.
You can use either ``bpf_seq_printf()`` (and BPF_SEQ_PRINTF helper macro) or
``bpf_seq_write()`` function based on whether you need formatted output or just
binary data, respectively. For binary-encoded data, the user space applications
can process the data from ``bpf_seq_write()`` as needed. For the formatted data,
you can use ``cat <path>`` to print the results similar to ``cat
/proc/net/netlink`` after pinning the BPF iterator to the bpffs mount. Later,
use ``rm -f <path>`` to remove the pinned iterator.

For example, you can use the following command to create a BPF iterator from the
``bpf_iter_ipv6_route.o`` object file and pin it to the ``/sys/fs/bpf/my_route``
path:

::

  $ bpftool iter pin ./bpf_iter_ipv6_route.o /sys/fs/bpf/my_route

And then print out the results using the following command:

::

  $ cat /sys/fs/bpf/my_route


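For binary output, user space has to know the layout that the BPF program passes to ``bpf_seq_write()``. The sketch below assumes a hypothetical fixed-size record, ``struct task_rec``, that an iterator program might write once per object; neither the struct nor ``decode_task_recs()`` comes from the kernel sources, and the layout must match whatever the BPF side actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, assumed to be written by the BPF program with
 * bpf_seq_write(seq, &rec, sizeof(rec)) once per visited object.
 */
struct task_rec {
	uint32_t tgid;
	uint32_t pid;
	uint64_t runtime_ns;
};

/*
 * Decode up to max complete records from buf, which holds len bytes as
 * returned by read() on the iterator fd. memcpy() is used so that buf does
 * not need any particular alignment. Returns the number of records decoded;
 * trailing bytes that do not form a complete record are ignored and would
 * have to be combined with the next read() by the caller.
 */
static size_t decode_task_recs(const void *buf, size_t len,
			       struct task_rec *out, size_t max)
{
	size_t n = len / sizeof(struct task_rec);

	assert(buf != NULL && out != NULL);
	if (n > max)
		n = max;
	memcpy(out, buf, n * sizeof(struct task_rec));
	return n;
}
```

Sharing the record definition between the BPF program and the user space reader through a common header is one way to keep the two sides in agreement.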
-------------------------------------------------------
|
||||
Implement Kernel Support for BPF Iterator Program Types
|
||||
-------------------------------------------------------
|
||||
|
||||
To implement a BPF iterator in the kernel, the developer must make a one-time
|
||||
change to the following key data structure defined in the `bpf.h
|
||||
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/bpf.h>`_
|
||||
file.
|
||||
|
||||
::
|
||||
|
||||
struct bpf_iter_reg {
|
||||
const char *target;
|
||||
bpf_iter_attach_target_t attach_target;
|
||||
bpf_iter_detach_target_t detach_target;
|
||||
bpf_iter_show_fdinfo_t show_fdinfo;
|
||||
bpf_iter_fill_link_info_t fill_link_info;
|
||||
bpf_iter_get_func_proto_t get_func_proto;
|
||||
u32 ctx_arg_info_size;
|
||||
u32 feature;
|
||||
struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
|
||||
const struct bpf_iter_seq_info *seq_info;
|
||||
};
|
||||
|
||||
After filling the data structure fields, call ``bpf_iter_reg_target()`` to
|
||||
register the iterator to the main BPF iterator subsystem.
|
||||
|
||||
The following is the breakdown for each field in struct ``bpf_iter_reg``.
|
||||
|
||||
.. list-table::
|
||||
:widths: 25 50
|
||||
:header-rows: 1
|
||||
|
||||
* - Fields
|
||||
- Description
|
||||
* - target
|
||||
- Specifies the name of the BPF iterator. For example: ``bpf_map``,
|
||||
``bpf_map_elem``. The name should be different from other ``bpf_iter`` target names in the kernel.
|
||||
* - attach_target and detach_target
|
||||
- Allows for target specific ``link_create`` action since some targets
|
||||
may need special processing. Called during the user space link_create stage.
|
||||
* - show_fdinfo and fill_link_info
|
||||
- Called to fill target specific information when user tries to get link
|
||||
info associated with the iterator.
|
||||
* - get_func_proto
|
||||
- Permits a BPF iterator to access BPF helpers specific to the iterator.
|
||||
* - ctx_arg_info_size and ctx_arg_info
|
||||
- Specifies the verifier states for BPF program arguments associated with
|
||||
the bpf iterator.
|
||||
* - feature
|
||||
- Specifies certain action requests in the kernel BPF iterator
  infrastructure. Currently, only ``BPF_ITER_RESCHED`` is supported. This means
  that the kernel function ``cond_resched()`` is called to prevent other kernel
  subsystems (e.g., RCU) from misbehaving.
|
||||
* - seq_info
|
||||
- Specifies the set of seq operations for the BPF iterator and helpers to
  initialize/free the private data for the corresponding ``seq_file``.
|
||||
|
||||
|
||||
`Click here
|
||||
<https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/>`_
|
||||
to see an implementation of the ``task_vma`` BPF iterator in the kernel.
|
||||
|
||||
---------------------------------
|
||||
Parameterizing BPF Task Iterators
|
||||
---------------------------------
|
||||
|
||||
By default, BPF iterators walk through all the objects of the specified types
|
||||
(processes, cgroups, maps, etc.) across the entire system to read relevant
|
||||
kernel data. But often, there are cases where we only care about a much smaller
|
||||
subset of iterable kernel objects, such as only iterating tasks within a
|
||||
specific process. Therefore, BPF iterator programs support filtering out objects
|
||||
from iteration by allowing user space to configure the iterator program when it
|
||||
is attached.
|
||||
|
||||
--------------------------
|
||||
BPF Task Iterator Program
|
||||
--------------------------
|
||||
|
||||
The following code is a BPF iterator program to print files and task information
|
||||
through the ``seq_file`` of the iterator. It is a standard BPF iterator program
|
||||
that visits every file of an iterator. We will use this BPF program in our
|
||||
example later.
|
||||
|
||||
::
|
||||
|
||||
  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h> /* BPF_SEQ_PRINTF() */

  char _license[] SEC("license") = "GPL";

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
      struct seq_file *seq = ctx->meta->seq;
      struct task_struct *task = ctx->task;
      struct file *file = ctx->file;
      __u32 fd = ctx->fd;

      /* Nothing to print when no task or file is passed in. */
      if (task == NULL || file == NULL)
          return 0;
      /* Print the column header once, before the first row. */
      if (ctx->meta->seq_num == 0) {
          BPF_SEQ_PRINTF(seq, "    tgid      pid       fd      file\n");
      }
      BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                     (long)file->f_op);
      return 0;
  }
|
||||
|
||||
----------------------------------------
|
||||
Creating a File Iterator with Parameters
|
||||
----------------------------------------
|
||||
|
||||
Now, let us look at how to create an iterator that includes only files of a
|
||||
process.
|
||||
|
||||
First, fill the ``bpf_iter_attach_opts`` struct as shown below:
|
||||
|
||||
::
|
||||
|
||||
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
|
||||
union bpf_iter_link_info linfo;
|
||||
memset(&linfo, 0, sizeof(linfo));
|
||||
linfo.task.pid = getpid();
|
||||
opts.link_info = &linfo;
|
||||
opts.link_info_len = sizeof(linfo);
|
||||
|
||||
``linfo.task.pid``, if it is non-zero, directs the kernel to create an iterator
|
||||
that only includes opened files for the process with the specified ``pid``. In
|
||||
this example, we will only be iterating files for our process. If
|
||||
``linfo.task.pid`` is zero, the iterator will visit every opened file of every
|
||||
process. Similarly, ``linfo.task.tid`` directs the kernel to create an iterator
|
||||
that visits opened files of a specific thread, not a process. In this example,
|
||||
``linfo.task.tid`` is different from ``linfo.task.pid`` only if the thread has a
|
||||
separate file descriptor table. In most circumstances, all process threads share
|
||||
a single file descriptor table.
|
||||
|
||||
Now, in the user space program, pass a pointer to the struct to
``bpf_program__attach_iter()``.
|
||||
|
||||
::
|
||||
|
||||
  link = bpf_program__attach_iter(prog, &opts);
  iter_fd = bpf_iter_create(bpf_link__fd(link));
|
||||
|
||||
If both *tid* and *pid* are zero, an iterator created from this struct
``bpf_iter_attach_opts`` will include every opened file of every task in the
system (more precisely, in the current *pid* namespace). It is the same as
passing NULL as the second argument to ``bpf_program__attach_iter()``.
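
Once ``iter_fd`` exists, it is consumed with plain ``read()`` calls until
``read()`` returns 0. The following is a minimal sketch of such a loop, assuming
the iterator file descriptor behaves like any other readable descriptor. The
helper name ``drain_fd()`` and its buffer-based interface are hypothetical and
chosen only so the loop can be exercised without a BPF program; the 15-byte
chunk size mirrors the small buffer used in this document and is not a
requirement.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read fd until end-of-file, appending to out
 * (capacity cap bytes). Returns the number of bytes stored, or -1 if
 * read() failed or the data did not fit into out.
 */
static long drain_fd(int fd, char *out, size_t cap)
{
	char chunk[16];
	size_t used = 0;
	ssize_t len;

	for (;;) {
		len = read(fd, chunk, sizeof(chunk) - 1);
		if (len == 0)
			break;			/* end of output */
		if (len < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;		/* real read error */
		}
		if (used + (size_t)len > cap)
			return -1;		/* caller's buffer is too small */
		memcpy(out + used, chunk, (size_t)len);
		used += (size_t)len;
	}
	return (long)used;
}
```

Because the helper only needs a readable descriptor, it can be tried on the
read end of a ``pipe()`` before pointing it at a real iterator.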
|
||||
|
||||
The whole program looks like the following code:
|
||||
|
||||
::
|
||||
|
||||
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <bpf/bpf.h>
  #include <bpf/libbpf.h>
  #include "bpf_iter_task_ex.skel.h"

  static int do_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
  {
      struct bpf_link *link;
      char buf[16] = {};
      int iter_fd = -1, len;
      int ret = 0;

      link = bpf_program__attach_iter(prog, opts);
      if (!link) {
          fprintf(stderr, "bpf_program__attach_iter() fails\n");
          return -1;
      }
      iter_fd = bpf_iter_create(bpf_link__fd(link));
      if (iter_fd < 0) {
          fprintf(stderr, "bpf_iter_create() fails\n");
          ret = -1;
          goto free_link;
      }
      /* Do not check the contents, but ensure read() ends without error. */
      while ((len = read(iter_fd, buf, sizeof(buf) - 1)) > 0) {
          buf[len] = 0;
          printf("%s", buf);
      }
      if (len < 0)
          ret = -1;
      printf("\n");
  free_link:
      if (iter_fd >= 0)
          close(iter_fd);
      bpf_link__destroy(link);
      return ret;
  }

  static void test_task_file(void)
  {
      LIBBPF_OPTS(bpf_iter_attach_opts, opts);
      struct bpf_iter_task_ex *skel;
      union bpf_iter_link_info linfo;

      skel = bpf_iter_task_ex__open_and_load();
      if (skel == NULL)
          return;
      /* Restrict the iterator to the files of this process. */
      memset(&linfo, 0, sizeof(linfo));
      linfo.task.pid = getpid();
      opts.link_info = &linfo;
      opts.link_info_len = sizeof(linfo);
      printf("PID %d\n", getpid());
      do_read_opts(skel->progs.dump_task_file, &opts);
      bpf_iter_task_ex__destroy(skel);
  }

  int main(int argc, const char * const * argv)
  {
      test_task_file();
      return 0;
  }
|
||||
|
||||
The following lines are the output of the program.
|
||||
::
|
||||
|
||||
  PID 1859

      tgid      pid       fd      file
      1859     1859        0 ffffffff82270aa0
      1859     1859        1 ffffffff82270aa0
      1859     1859        2 ffffffff82270aa0
      1859     1859        3 ffffffff82272980
      1859     1859        4 ffffffff8225e120
      1859     1859        5 ffffffff82255120
      1859     1859        6 ffffffff82254f00
      1859     1859        7 ffffffff82254d80
      1859     1859        8 ffffffff8225abe0
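
Each data row above is produced by the format string ``"%8d %8d %8d %lx\n"``,
so a consumer can turn a row back into numbers with ``sscanf()``. The sketch
below is one way to do that; the struct and function names are hypothetical and
not part of libbpf or the kernel. Note that the last column is the address of
the file's ``f_op`` table, not a file name.

```c
#include <stdio.h>

/* Hypothetical container for one row of the iterator output. */
struct task_file_row {
	int tgid;			/* process ID as seen from user space */
	int pid;			/* thread ID */
	int fd;				/* file descriptor number */
	unsigned long long f_op;	/* address printed by %lx */
};

/* Returns 1 if line is a data row, 0 otherwise (for example the header). */
static int parse_task_file_row(const char *line, struct task_file_row *row)
{
	return sscanf(line, "%d %d %d %llx",
		      &row->tgid, &row->pid, &row->fd, &row->f_op) == 4;
}
```

Lines that do not start with three decimal numbers, such as the header or the
``PID`` line printed by the user space program, are rejected rather than
partially parsed.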
|
||||
|
||||
------------------
|
||||
Without Parameters
|
||||
------------------
|
||||
|
||||
Let us look at how a BPF iterator without parameters skips files of other
|
||||
processes in the system. In this case, the BPF program has to check the pid or
|
||||
the tid of tasks, or it will receive every opened file in the system (in the
|
||||
current *pid* namespace, actually). So, we usually add a global variable in the
|
||||
BPF program to pass a *pid* to the BPF program.
|
||||
|
||||
The BPF program would look like the following block.
|
||||
|
||||
::
|
||||
|
||||
  ......
  int target_pid = 0;

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
      ......
      if (task->tgid != target_pid) /* Check task->pid instead to check thread IDs */
          return 0;
      BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                     (long)file->f_op);
      return 0;
  }
|
||||
|
||||
The user space program would look like the following block:
|
||||
|
||||
::
|
||||
|
||||
  ......
  static void test_task_file(void)
  {
      ......
      skel = bpf_iter_task_ex__open_and_load();
      if (skel == NULL)
          return;
      skel->bss->target_pid = getpid(); /* process ID. For thread id, use gettid() */
      memset(&linfo, 0, sizeof(linfo));
      linfo.task.pid = getpid();
      opts.link_info = &linfo;
      opts.link_info_len = sizeof(linfo);
      ......
  }
|
||||
|
||||
``target_pid`` is a global variable in the BPF program. The user space program
|
||||
should initialize the variable with a process ID to skip opened files of other
|
||||
processes in the BPF program. When you parametrize a BPF iterator, the iterator
|
||||
calls the BPF program fewer times which can save significant resources.
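
The choice between ``task->tgid`` and ``task->pid`` mirrors the difference
between ``getpid()`` and ``gettid()`` in user space: ``getpid()`` reports the
thread group ID, while ``gettid()`` reports the per-thread ID that the kernel
stores in ``task->pid``. The following Linux-only sketch illustrates this
relationship; the helper names are hypothetical, and the raw ``SYS_gettid``
system call is used on the assumption that the C library may not provide a
``gettid()`` wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread ID of the caller; this is the value the kernel keeps in task->pid. */
static pid_t current_tid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* 1 if the caller is the thread group leader, i.e. task->pid == task->tgid. */
static int is_group_leader(void)
{
	return current_tid() == getpid();
}
```

In a single-threaded process the two IDs are equal, so filtering on either
field selects the same task. They differ only for additional threads of a
process.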
|
||||
|
||||
---------------------------
|
||||
Parametrizing VMA Iterators
|
||||
---------------------------
|
||||
|
||||
By default, a BPF VMA iterator includes every VMA in every process. However,
|
||||
you can still specify a process or a thread to include only its VMAs. Unlike
|
||||
files, a thread cannot have a separate address space (since Linux 2.6.0-test6).
|
||||
Here, using *tid* makes no difference from using *pid*.
|
||||
|
||||
----------------------------
|
||||
Parametrizing Task Iterators
|
||||
----------------------------
|
||||
|
||||
A BPF task iterator with *pid* includes all tasks (threads) of a process. The
|
||||
BPF program receives these tasks one after another. You can specify a BPF task
|
||||
iterator with *tid* parameter to include only the tasks that match the given
|
||||
*tid*.
|
@ -1062,4 +1062,9 @@ format.::
|
||||
7. Testing
|
||||
==========
|
||||
|
||||
Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.
|
||||
The kernel BPF selftest `tools/testing/selftests/bpf/prog_tests/btf.c`_
|
||||
provides an extensive set of BTF-related tests.
|
||||
|
||||
.. Links
|
||||
.. _tools/testing/selftests/bpf/prog_tests/btf.c:
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/prog_tests/btf.c
|
||||
|
@ -24,11 +24,13 @@ that goes into great technical depth about the BPF Architecture.
|
||||
maps
|
||||
bpf_prog_run
|
||||
classic_vs_extended.rst
|
||||
bpf_iterators
|
||||
bpf_licensing
|
||||
test_debug
|
||||
clang-notes
|
||||
linux-notes
|
||||
other
|
||||
redirect
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
|
@ -122,11 +122,11 @@ BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
|
||||
|
||||
``BPF_XOR | BPF_K | BPF_ALU`` means::
|
||||
|
||||
src_reg = (u32) src_reg ^ (u32) imm32
|
||||
dst_reg = (u32) dst_reg ^ (u32) imm32
|
||||
|
||||
``BPF_XOR | BPF_K | BPF_ALU64`` means::
|
||||
|
||||
src_reg = src_reg ^ imm32
|
||||
dst_reg = dst_reg ^ imm32
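
The difference between the two forms can be modelled in plain C. This is a
sketch, not verifier or interpreter code: the function names are hypothetical,
and it relies on two properties that are not spelled out in the lines above,
namely that a 32-bit ALU operation zeroes the upper 32 bits of the destination
register and that a 64-bit ALU operation sign-extends ``imm32`` before use.

```c
#include <stdint.h>

/* BPF_XOR | BPF_K | BPF_ALU: 32-bit operation, upper half of dst is zeroed. */
static uint64_t alu32_xor_k(uint64_t dst, int32_t imm32)
{
	return (uint32_t)dst ^ (uint32_t)imm32;
}

/* BPF_XOR | BPF_K | BPF_ALU64: imm32 is sign-extended to 64 bits first. */
static uint64_t alu64_xor_k(uint64_t dst, int32_t imm32)
{
	return dst ^ (uint64_t)(int64_t)imm32;
}
```

With ``dst = 0`` and ``imm32 = -1``, the 32-bit form yields ``0xffffffff``
while the 64-bit form yields ``0xffffffffffffffff``.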
|
||||
|
||||
|
||||
Byte swap instructions
|
||||
|
@ -72,6 +72,30 @@ argument as its size. By default, without __sz annotation, the size of the type
|
||||
of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
|
||||
pointer.
|
||||
|
||||
2.2.2 __k Annotation
|
||||
--------------------
|
||||
|
||||
This annotation is only understood for scalar arguments, where it indicates that
|
||||
the verifier must check the scalar argument to be a known constant, which does
|
||||
not indicate a size parameter, and the value of the constant is relevant to the
|
||||
safety of the program.
|
||||
|
||||
An example is given below::
|
||||
|
||||
void *bpf_obj_new(u32 local_type_id__k, ...)
|
||||
{
|
||||
...
|
||||
}
|
||||
|
||||
Here, bpf_obj_new() uses the local_type_id argument to find out the size of
that type ID in the program's BTF and returns a sized pointer to it. Each type
ID may have a different size, hence it is crucial to treat each such call as
distinct when values don't match during verifier state pruning checks.
|
||||
|
||||
Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
|
||||
size parameter, and the value of the constant matters for program safety, __k
|
||||
suffix should be used.
|
||||
|
||||
.. _BPF_kfunc_nodef:
|
||||
|
||||
2.3 Using an existing kernel function
|
||||
@ -137,22 +161,20 @@ KF_ACQUIRE and KF_RET_NULL flags.
|
||||
--------------------------
|
||||
|
||||
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
|
||||
indicates that the all pointer arguments will always have a guaranteed lifetime,
|
||||
and pointers to kernel objects are always passed to helpers in their unmodified
|
||||
form (as obtained from acquire kfuncs).
|
||||
indicates that all pointer arguments are valid, and that all pointers to
|
||||
BTF objects have been passed in their unmodified form (that is, at a zero
|
||||
offset, and without having been obtained from walking another pointer).
|
||||
|
||||
It can be used to enforce that a pointer to a refcounted object acquired from a
|
||||
kfunc or BPF helper is passed as an argument to this kfunc without any
|
||||
modifications (e.g. pointer arithmetic) such that it is trusted and points to
|
||||
the original object.
|
||||
There are two types of pointers to kernel objects which are considered "valid":
|
||||
|
||||
Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
|
||||
but those can have a non-zero offset.
|
||||
1. Pointers which are passed as tracepoint or struct_ops callback arguments.
|
||||
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
|
||||
|
||||
This flag is often used for kfuncs that operate (change some property, perform
|
||||
some operation) on an object that was obtained using an acquire kfunc. Such
|
||||
kfuncs need an unchanged pointer to ensure the integrity of the operation being
|
||||
performed on the expected object.
|
||||
Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
|
||||
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
|
||||
|
||||
The definition of "valid" pointers is subject to change at any time, and has
|
||||
absolutely no ABI stability guarantees.
|
||||
|
||||
2.4.6 KF_SLEEPABLE flag
|
||||
-----------------------
|
||||
@ -169,6 +191,15 @@ rebooting or panicking. Due to this additional restrictions apply to these
|
||||
calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
|
||||
added later.
|
||||
|
||||
2.4.8 KF_RCU flag
|
||||
-----------------
|
||||
|
||||
The KF_RCU flag is used for kfuncs which take an RCU pointer as their argument.
When used together with KF_ACQUIRE, it indicates the kfunc should have a
single argument which must be a trusted argument or a MEM_RCU pointer.
The argument may have a reference count of 0, and the kfunc must take this
into consideration.
|
||||
|
||||
2.5 Registering the kfuncs
|
||||
--------------------------
|
||||
|
||||
@ -191,3 +222,201 @@ type. An example is shown below::
|
||||
return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
|
||||
}
|
||||
late_initcall(init_subsystem);
|
||||
|
||||
3. Core kfuncs
|
||||
==============
|
||||
|
||||
The BPF subsystem provides a number of "core" kfuncs that are potentially
|
||||
applicable to a wide variety of different possible use cases and programs.
|
||||
Those kfuncs are documented here.
|
||||
|
||||
3.1 struct task_struct * kfuncs
|
||||
-------------------------------
|
||||
|
||||
There are a number of kfuncs that allow ``struct task_struct *`` objects to be
|
||||
used as kptrs:
|
||||
|
||||
.. kernel-doc:: kernel/bpf/helpers.c
|
||||
:identifiers: bpf_task_acquire bpf_task_release
|
||||
|
||||
These kfuncs are useful when you want to acquire or release a reference to a
|
||||
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
|
||||
struct_ops callback arg. For example:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
    /**
     * A trivial example tracepoint program that shows how to
     * acquire and release a struct task_struct * pointer.
     */
    SEC("tp_btf/task_newtask")
    int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
    {
        struct task_struct *acquired;

        acquired = bpf_task_acquire(task);

        /*
         * In a typical program you'd do something like store
         * the task in a map, and the map will automatically
         * release it later. Here, we release it manually.
         */
        bpf_task_release(acquired);
        return 0;
    }
|
||||
|
||||
----
|
||||
|
||||
A BPF program can also look up a task from a pid. This can be useful if the
|
||||
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
|
||||
it can acquire a reference on with bpf_task_acquire().
|
||||
|
||||
.. kernel-doc:: kernel/bpf/helpers.c
|
||||
:identifiers: bpf_task_from_pid
|
||||
|
||||
Here is an example of it being used:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
    SEC("tp_btf/task_newtask")
    int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
    {
        struct task_struct *lookup;

        lookup = bpf_task_from_pid(task->pid);
        if (!lookup)
            /* A task should always be found, as %task is a tracepoint arg. */
            return -ENOENT;

        if (lookup->pid != task->pid) {
            /* bpf_task_from_pid() looks up the task via its
             * globally-unique pid from the init_pid_ns. Thus,
             * the pid of the lookup task should always be the
             * same as the input task.
             */
            bpf_task_release(lookup);
            return -EINVAL;
        }

        /* bpf_task_from_pid() returns an acquired reference,
         * so it must be dropped before returning from the
         * tracepoint handler.
         */
        bpf_task_release(lookup);
        return 0;
    }
|
||||
|
||||
3.2 struct cgroup * kfuncs
|
||||
--------------------------
|
||||
|
||||
``struct cgroup *`` objects also have acquire and release functions:
|
||||
|
||||
.. kernel-doc:: kernel/bpf/helpers.c
|
||||
:identifiers: bpf_cgroup_acquire bpf_cgroup_release
|
||||
|
||||
These kfuncs are used in exactly the same manner as bpf_task_acquire() and
|
||||
bpf_task_release() respectively, so we won't provide examples for them.
|
||||
|
||||
----
|
||||
|
||||
You may also acquire a reference to a ``struct cgroup`` kptr that's already
|
||||
stored in a map using bpf_cgroup_kptr_get():
|
||||
|
||||
.. kernel-doc:: kernel/bpf/helpers.c
|
||||
:identifiers: bpf_cgroup_kptr_get
|
||||
|
||||
Here's an example of how it can be used:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
    /* struct containing the struct cgroup kptr which is actually stored in the map. */
    struct __cgroups_kfunc_map_value {
        struct cgroup __kptr_ref * cgroup;
    };

    /* The map containing struct __cgroups_kfunc_map_value entries. */
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __type(key, int);
        __type(value, struct __cgroups_kfunc_map_value);
        __uint(max_entries, 1);
    } __cgroups_kfunc_map SEC(".maps");

    /* ... */

    /**
     * A simple example tracepoint program showing how a
     * struct cgroup kptr that is stored in a map can
     * be acquired using the bpf_cgroup_kptr_get() kfunc.
     */
    SEC("tp_btf/cgroup_mkdir")
    int BPF_PROG(cgroup_kptr_get_example, struct cgroup *cgrp, const char *path)
    {
        struct cgroup *kptr;
        struct __cgroups_kfunc_map_value *v;
        s32 id = cgrp->self.id;

        /* Assume a cgroup kptr was previously stored in the map. */
        v = bpf_map_lookup_elem(&__cgroups_kfunc_map, &id);
        if (!v)
            return -ENOENT;

        /* Acquire a reference to the cgroup kptr that's already stored in the map. */
        kptr = bpf_cgroup_kptr_get(&v->cgroup);
        if (!kptr)
            /* If no cgroup was present in the map, it's because
             * we're racing with another CPU that removed it with
             * bpf_kptr_xchg() between the bpf_map_lookup_elem()
             * above, and our call to bpf_cgroup_kptr_get().
             * bpf_cgroup_kptr_get() internally safely handles this
             * race, and will return NULL if the cgroup is no longer
             * present in the map by the time we invoke the kfunc.
             */
            return -EBUSY;

        /* Free the reference we just took above. Note that the
         * original struct cgroup kptr is still in the map. It will
         * be freed either at a later time if another context deletes
         * it from the map, or automatically by the BPF subsystem if
         * it's still present when the map is destroyed.
         */
        bpf_cgroup_release(kptr);

        return 0;
    }
|
||||
|
||||
----
|
||||
|
||||
Another kfunc available for interacting with ``struct cgroup *`` objects is
|
||||
bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
|
||||
and return it as a cgroup kptr.
|
||||
|
||||
.. kernel-doc:: kernel/bpf/helpers.c
|
||||
:identifiers: bpf_cgroup_ancestor
|
||||
|
||||
Eventually, BPF should be updated to allow this to happen with a normal memory
|
||||
load in the program itself. This is currently not possible without more work in
|
||||
the verifier. bpf_cgroup_ancestor() can be used as follows:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
    /**
     * Simple tracepoint example that illustrates how a cgroup's
     * ancestor can be accessed using bpf_cgroup_ancestor().
     */
    SEC("tp_btf/cgroup_mkdir")
    int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
    {
        struct cgroup *parent;

        /* The parent cgroup resides at the level before the current cgroup's level. */
        parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
        if (!parent)
            return -ENOENT;

        bpf_printk("Parent id is %d", parent->self.id);

        /* Release the reference to the parent cgroup that was acquired above. */
        bpf_cgroup_release(parent);
        return 0;
    }
|
||||
|
@ -1,5 +1,7 @@
|
||||
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
|
||||
.. _libbpf:
|
||||
|
||||
libbpf
|
||||
======
|
||||
|
||||
@ -7,6 +9,7 @@ libbpf
|
||||
:maxdepth: 1
|
||||
|
||||
API Documentation <https://libbpf.readthedocs.io/en/latest/api.html>
|
||||
program_types
|
||||
libbpf_naming_convention
|
||||
libbpf_build
|
||||
|
||||
|
203
Documentation/bpf/libbpf/program_types.rst
Normal file
@ -0,0 +1,203 @@
|
||||
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
|
||||
.. _program_types_and_elf:
|
||||
|
||||
Program Types and ELF Sections
|
||||
==============================
|
||||
|
||||
The table below lists the program types, their attach types where relevant and the ELF section
|
||||
names supported by libbpf for them. The ELF section names follow these rules:
|
||||
|
||||
- ``type`` is an exact match, e.g. ``SEC("socket")``
|
||||
- ``type+`` means it can be either exact ``SEC("type")`` or well-formed ``SEC("type/extras")``
|
||||
with a '``/``' separator between ``type`` and ``extras``.
|
||||
|
||||
When ``extras`` are specified, they provide details of how to auto-attach the BPF program. The
|
||||
format of ``extras`` depends on the program type, e.g. ``SEC("tracepoint/<category>/<name>")``
|
||||
for tracepoints or ``SEC("usdt/<path>:<provider>:<name>")`` for USDT probes. The extras are
|
||||
described in more detail in the footnotes.
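
The two naming rules above can be expressed as a small matching function. The
following is a sketch of the rules as stated in this document, not a copy of
libbpf's implementation; the function name ``sec_name_matches()`` is
hypothetical, and ``pattern`` is written the way the table writes it, with a
trailing ``+`` where extras are allowed.

```c
#include <string.h>

/*
 * Returns 1 if the ELF section name sec satisfies pattern. A pattern
 * ending in '+' accepts "type" or "type/extras"; any other pattern
 * must match the section name exactly.
 */
static int sec_name_matches(const char *sec, const char *pattern)
{
	size_t len = strlen(pattern);

	if (len == 0 || pattern[len - 1] != '+')
		return strcmp(sec, pattern) == 0;

	len--;				/* drop the trailing '+' */
	if (strncmp(sec, pattern, len) != 0)
		return 0;		/* not even a prefix */
	/* Exact match, or a '/' separator followed by the extras. */
	return sec[len] == '\0' || sec[len] == '/';
}
```

For example, ``kprobe/do_unlinkat`` satisfies ``kprobe+``, whereas ``kprobes``
does not, because the character after the type is neither the end of the
string nor a ``/`` separator.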
|
||||
|
||||
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| Program Type | Attach Type | ELF Section Name | Sleepable |
|
||||
+===========================================+========================================+==================================+===========+
|
||||
| ``BPF_PROG_TYPE_CGROUP_DEVICE`` | ``BPF_CGROUP_DEVICE`` | ``cgroup/dev`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_CGROUP_SKB`` | | ``cgroup/skb`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET_EGRESS`` | ``cgroup_skb/egress`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET_INGRESS`` | ``cgroup_skb/ingress`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` | ``BPF_CGROUP_GETSOCKOPT`` | ``cgroup/getsockopt`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_SETSOCKOPT`` | ``cgroup/setsockopt`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_CGROUP_SOCK_ADDR`` | ``BPF_CGROUP_INET4_BIND`` | ``cgroup/bind4`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET4_CONNECT`` | ``cgroup/connect4`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET4_GETPEERNAME`` | ``cgroup/getpeername4`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET4_GETSOCKNAME`` | ``cgroup/getsockname4`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET6_BIND`` | ``cgroup/bind6`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET6_CONNECT`` | ``cgroup/connect6`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET6_GETPEERNAME`` | ``cgroup/getpeername6`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET6_GETSOCKNAME`` | ``cgroup/getsockname6`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_UDP4_RECVMSG`` | ``cgroup/recvmsg4`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_UDP4_SENDMSG`` | ``cgroup/sendmsg4`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_UDP6_RECVMSG`` | ``cgroup/recvmsg6`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_UDP6_SENDMSG`` | ``cgroup/sendmsg6`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_CGROUP_SOCK`` | ``BPF_CGROUP_INET4_POST_BIND`` | ``cgroup/post_bind4`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET6_POST_BIND`` | ``cgroup/post_bind6`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET_SOCK_CREATE`` | ``cgroup/sock_create`` | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``cgroup/sock`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_CGROUP_INET_SOCK_RELEASE`` | ``cgroup/sock_release`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_CGROUP_SYSCTL`` | ``BPF_CGROUP_SYSCTL`` | ``cgroup/sysctl`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_EXT`` | | ``freplace+`` [#fentry]_ | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_FLOW_DISSECTOR`` | ``BPF_FLOW_DISSECTOR`` | ``flow_dissector`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_KPROBE`` | | ``kprobe+`` [#kprobe]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``kretprobe+`` [#kprobe]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``ksyscall+`` [#ksyscall]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``kretsyscall+`` [#ksyscall]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``uprobe+`` [#uprobe]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``uprobe.s+`` [#uprobe]_ | Yes |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``uretprobe+`` [#uprobe]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``uretprobe.s+`` [#uprobe]_ | Yes |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``usdt+`` [#usdt]_ | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_TRACE_KPROBE_MULTI`` | ``kprobe.multi+`` [#kpmulti]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``kretprobe.multi+`` [#kpmulti]_ | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_LIRC_MODE2`` | ``BPF_LIRC_MODE2`` | ``lirc_mode2`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_LSM`` | ``BPF_LSM_CGROUP`` | ``lsm_cgroup+`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_LSM_MAC`` | ``lsm+`` [#lsm]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``lsm.s+`` [#lsm]_ | Yes |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_LWT_IN`` | | ``lwt_in`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_LWT_OUT`` | | ``lwt_out`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` | | ``lwt_seg6local`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_LWT_XMIT`` | | ``lwt_xmit`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_PERF_EVENT`` | | ``perf_event`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE`` | | ``raw_tp.w+`` [#rawtp]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``raw_tracepoint.w+`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_RAW_TRACEPOINT`` | | ``raw_tp+`` [#rawtp]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``raw_tracepoint+`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SCHED_ACT`` | | ``action`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SCHED_CLS`` | | ``classifier`` | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``tc`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SK_LOOKUP`` | ``BPF_SK_LOOKUP`` | ``sk_lookup`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SK_MSG`` | ``BPF_SK_MSG_VERDICT`` | ``sk_msg`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SK_REUSEPORT`` | ``BPF_SK_REUSEPORT_SELECT_OR_MIGRATE`` | ``sk_reuseport/migrate`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_SK_REUSEPORT_SELECT`` | ``sk_reuseport`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SK_SKB`` | | ``sk_skb`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_SK_SKB_STREAM_PARSER`` | ``sk_skb/stream_parser`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_SK_SKB_STREAM_VERDICT`` | ``sk_skb/stream_verdict`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SOCKET_FILTER`` | | ``socket`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SOCK_OPS`` | ``BPF_CGROUP_SOCK_OPS`` | ``sockops`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_STRUCT_OPS`` | | ``struct_ops+`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_SYSCALL`` | | ``syscall`` | Yes |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_TRACEPOINT`` | | ``tp+`` [#tp]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``tracepoint+`` [#tp]_ | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_TRACING`` | ``BPF_MODIFY_RETURN`` | ``fmod_ret+`` [#fentry]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``fmod_ret.s+`` [#fentry]_ | Yes |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_TRACE_FENTRY`` | ``fentry+`` [#fentry]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``fentry.s+`` [#fentry]_ | Yes |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_TRACE_FEXIT`` | ``fexit+`` [#fentry]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``fexit.s+`` [#fentry]_ | Yes |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_TRACE_ITER`` | ``iter+`` [#iter]_ | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``iter.s+`` [#iter]_ | Yes |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_TRACE_RAW_TP`` | ``tp_btf+`` [#fentry]_ | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
| ``BPF_PROG_TYPE_XDP`` | ``BPF_XDP_CPUMAP`` | ``xdp.frags/cpumap`` | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``xdp/cpumap`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_XDP_DEVMAP`` | ``xdp.frags/devmap`` | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``xdp/devmap`` | |
|
||||
+ +----------------------------------------+----------------------------------+-----------+
|
||||
| | ``BPF_XDP`` | ``xdp.frags`` | |
|
||||
+ + +----------------------------------+-----------+
|
||||
| | | ``xdp`` | |
|
||||
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||
|
||||
|
||||
.. rubric:: Footnotes
|
||||
|
||||
.. [#fentry] The ``fentry`` attach format is ``fentry[.s]/<function>``.
|
||||
.. [#kprobe] The ``kprobe`` attach format is ``kprobe/<function>[+<offset>]``. Valid
|
||||
characters for ``function`` are ``a-zA-Z0-9_.`` and ``offset`` must be a valid
|
||||
non-negative integer.
|
||||
.. [#ksyscall] The ``ksyscall`` attach format is ``ksyscall/<syscall>``.
|
||||
.. [#uprobe] The ``uprobe`` attach format is ``uprobe[.s]/<path>:<function>[+<offset>]``.
|
||||
.. [#usdt] The ``usdt`` attach format is ``usdt/<path>:<provider>:<name>``.
|
||||
.. [#kpmulti] The ``kprobe.multi`` attach format is ``kprobe.multi/<pattern>`` where ``pattern``
|
||||
supports ``*`` and ``?`` wildcards. Valid characters for pattern are
|
||||
``a-zA-Z0-9_.*?``.
|
||||
.. [#lsm] The ``lsm`` attach format is ``lsm[.s]/<hook>``.
|
||||
.. [#rawtp] The ``raw_tp`` attach format is ``raw_tracepoint[.w]/<tracepoint>``.
|
||||
.. [#tp] The ``tracepoint`` attach format is ``tracepoint/<category>/<name>``.
|
||||
.. [#iter] The ``iter`` attach format is ``iter[.s]/<struct-name>``.
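To make the ``kprobe`` attach format above concrete, here is a minimal sketch of a parser for ``kprobe/<function>[+<offset>]``. The function ``parse_kprobe_sec()`` is a hypothetical name used only for illustration; it is not a libbpf API, and libbpf's own parsing may accept or reject inputs differently.

.. code-block:: c

    #include <ctype.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * Hypothetical helper, not part of libbpf.
     * Splits "kprobe/<function>[+<offset>]" into its parts.
     * Returns 0 on success, -1 if the name does not match the format.
     */
    static int parse_kprobe_sec(const char *sec, char *func, size_t func_sz,
                                long *offset)
    {
        static const char prefix[] = "kprobe/";
        size_t n = 0;
        char *end;

        if (strncmp(sec, prefix, sizeof(prefix) - 1) != 0)
            return -1;
        sec += sizeof(prefix) - 1;

        /* Valid function characters are a-zA-Z0-9_. */
        while (isalnum((unsigned char)sec[n]) || sec[n] == '_' || sec[n] == '.')
            n++;
        if (n == 0 || n >= func_sz)
            return -1;
        memcpy(func, sec, n);
        func[n] = '\0';

        /* The offset is optional and defaults to 0. */
        *offset = 0;
        if (sec[n] == '\0')
            return 0;

        /* Requiring a leading digit keeps the offset non-negative. */
        if (sec[n] != '+' || !isdigit((unsigned char)sec[n + 1]))
            return -1;
        *offset = strtol(sec + n + 1, &end, 0);
        return *end == '\0' ? 0 : -1;
    }

With this sketch, ``kprobe/tcp_v4_connect+8`` yields the function ``tcp_v4_connect`` and offset 8, while ``kprobe/bad-name`` is rejected because ``-`` is not a valid function character.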
|
262
Documentation/bpf/map_array.rst
Normal file
@ -0,0 +1,262 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
================================================
|
||||
BPF_MAP_TYPE_ARRAY and BPF_MAP_TYPE_PERCPU_ARRAY
|
||||
================================================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_ARRAY`` was introduced in kernel version 3.19
|
||||
- ``BPF_MAP_TYPE_PERCPU_ARRAY`` was introduced in version 4.6
|
||||
|
||||
``BPF_MAP_TYPE_ARRAY`` and ``BPF_MAP_TYPE_PERCPU_ARRAY`` provide generic array
|
||||
storage. The key type is an unsigned 32-bit integer (4 bytes) and the map is
|
||||
of constant size. The size of the array is defined in ``max_entries`` at
|
||||
creation time. All array elements are pre-allocated and zero initialized when
|
||||
created. ``BPF_MAP_TYPE_PERCPU_ARRAY`` uses a different memory region for each
|
||||
CPU whereas ``BPF_MAP_TYPE_ARRAY`` uses the same memory region. The value
|
||||
stored can be of any size; however, all array elements are aligned to 8
|
||||
bytes.
|
||||
|
||||
Since kernel 5.5, memory mapping may be enabled for ``BPF_MAP_TYPE_ARRAY`` by
|
||||
setting the flag ``BPF_F_MMAPABLE``. The map definition is page-aligned and
|
||||
starts on the first page. Sufficient page-sized and page-aligned blocks of
|
||||
memory are allocated to store all array values, starting on the second page,
|
||||
which in some cases will result in over-allocation of memory. The benefit of
|
||||
using this is increased performance and ease of use since userspace programs
|
||||
would not be required to use helper functions to access and mutate data.
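As a rough illustration of the over-allocation mentioned above, the following sketch estimates how many bytes the value area of a memory-mappable array occupies. It assumes each element is padded to 8 bytes and the total is rounded up to a whole number of pages. ``array_mmap_len()`` and its parameters are names invented for this example, not a kernel or libbpf API.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Round x up to the next multiple of align (align must be non-zero). */
    static size_t round_up_to(size_t x, size_t align)
    {
        return (x + align - 1) / align * align;
    }

    /*
     * Hypothetical helper: bytes needed to map all values of an array map,
     * assuming 8-byte element alignment and page-granular allocation.
     */
    static size_t array_mmap_len(uint32_t value_size, uint32_t max_entries,
                                 size_t page_size)
    {
        size_t data = round_up_to(value_size, 8) * (size_t)max_entries;

        return round_up_to(data, page_size);
    }

With 4096-byte pages, 1025 four-byte values need 8200 bytes after padding and therefore occupy three pages (12288 bytes), which is the kind of over-allocation the paragraph above refers to.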
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||
|
||||
Array elements can be retrieved using the ``bpf_map_lookup_elem()`` helper.
|
||||
This helper returns a pointer into the array element, so to avoid data races
|
||||
with userspace reading the value, the user must use primitives like
|
||||
``__sync_fetch_and_add()`` when updating the value in-place.
|
||||
|
||||
bpf_map_update_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
|
||||
|
||||
Array elements can be updated using the ``bpf_map_update_elem()`` helper.
|
||||
|
||||
``bpf_map_update_elem()`` returns 0 on success, or negative error in case of
|
||||
failure.
|
||||
|
||||
Since the array is of constant size, ``bpf_map_delete_elem()`` is not supported.
|
||||
To clear an array element, you may use ``bpf_map_update_elem()`` to insert a
|
||||
zero value to that index.
|
||||
|
||||
Per CPU Array
|
||||
-------------
|
||||
|
||||
Values stored in ``BPF_MAP_TYPE_ARRAY`` can be accessed by multiple programs
|
||||
across different CPUs. To restrict storage to a single CPU, you may use a
|
||||
``BPF_MAP_TYPE_PERCPU_ARRAY``.
|
||||
|
||||
When using a ``BPF_MAP_TYPE_PERCPU_ARRAY`` the ``bpf_map_update_elem()`` and
|
||||
``bpf_map_lookup_elem()`` helpers automatically access the slot for the current
|
||||
CPU.
|
||||
|
||||
bpf_map_lookup_percpu_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
|
||||
|
||||
The ``bpf_map_lookup_percpu_elem()`` helper can be used to look up the array
|
||||
value for a specific CPU. It returns a pointer to the value on success, or ``NULL`` if no entry was
|
||||
found or ``cpu`` is invalid.
|
||||
|
||||
Concurrency
|
||||
-----------
|
||||
|
||||
Since kernel version 5.1, the BPF infrastructure provides ``struct bpf_spin_lock``
|
||||
to synchronize access.
|
||||
|
||||
Userspace
|
||||
---------
|
||||
|
||||
Access from userspace uses libbpf APIs with the same names as above, with
|
||||
the map identified by its ``fd``.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
Please see the ``tools/testing/selftests/bpf`` directory for functional
|
||||
examples. The code samples below demonstrate API usage.
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
This snippet shows how to declare an array in a BPF program.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_ARRAY);
|
||||
__type(key, u32);
|
||||
__type(value, long);
|
||||
__uint(max_entries, 256);
|
||||
} my_map SEC(".maps");
|
||||
|
||||
|
||||
This example BPF program shows how to access an array element.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_prog(struct __sk_buff *skb)
|
||||
{
|
||||
struct iphdr ip;
|
||||
int index;
|
||||
long *value;
|
||||
|
||||
if (bpf_skb_load_bytes(skb, ETH_HLEN, &ip, sizeof(ip)) < 0)
|
||||
return 0;
|
||||
|
||||
index = ip.protocol;
|
||||
value = bpf_map_lookup_elem(&my_map, &index);
|
||||
if (value)
|
||||
__sync_fetch_and_add(value, skb->len);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
Userspace
|
||||
---------
|
||||
|
||||
BPF_MAP_TYPE_ARRAY
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This snippet shows how to create an array, using ``bpf_map_create_opts`` to
|
||||
set flags.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#include <bpf/libbpf.h>
|
||||
#include <bpf/bpf.h>
|
||||
|
||||
int create_array(void)
|
||||
{
|
||||
int fd;
|
||||
LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
|
||||
|
||||
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY,
|
||||
"example_array", /* name */
|
||||
sizeof(__u32), /* key size */
|
||||
sizeof(long), /* value size */
|
||||
256, /* max entries */
|
||||
&opts); /* create opts */
|
||||
return fd;
|
||||
}
|
||||
|
||||
This snippet shows how to initialize the elements of an array.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int initialize_array(int fd)
|
||||
{
|
||||
__u32 i;
|
||||
long value;
|
||||
int ret;
|
||||
|
||||
for (i = 0; i < 256; i++) {
|
||||
value = i;
|
||||
ret = bpf_map_update_elem(fd, &i, &value, BPF_ANY);
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
This snippet shows how to retrieve an element value from an array.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int lookup(int fd)
|
||||
{
|
||||
__u32 index = 42;
|
||||
long value;
|
||||
int ret;
|
||||
|
||||
ret = bpf_map_lookup_elem(fd, &index, &value);
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
|
||||
/* use value here */
|
||||
assert(value == 42);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
BPF_MAP_TYPE_PERCPU_ARRAY
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This snippet shows how to initialize the elements of a per CPU array.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int initialize_array(int fd)
|
||||
{
|
||||
int ncpus = libbpf_num_possible_cpus();
|
||||
long values[ncpus];
|
||||
__u32 i, j;
|
||||
int ret;
|
||||
|
||||
for (i = 0; i < 256 ; i++) {
|
||||
for (j = 0; j < ncpus; j++)
|
||||
values[j] = i;
|
||||
ret = bpf_map_update_elem(fd, &i, &values, BPF_ANY);
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
This snippet shows how to access the per CPU elements of an array value.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int lookup(int fd)
|
||||
{
|
||||
int ncpus = libbpf_num_possible_cpus();
|
||||
__u32 index = 42, j;
|
||||
long values[ncpus];
|
||||
int ret;
|
||||
|
||||
ret = bpf_map_lookup_elem(fd, &index, &values);
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
|
||||
for (j = 0; j < ncpus; j++) {
|
||||
/* Use per CPU value here */
|
||||
assert(values[j] == 42);
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
Semantics
|
||||
=========
|
||||
|
||||
As shown in the example above, when accessing a ``BPF_MAP_TYPE_PERCPU_ARRAY``
|
||||
in userspace, each value is an array with ``ncpus`` elements.
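When the value type is smaller than 8 bytes, the userspace buffer needs more room than ``ncpus * sizeof(value)``, because each CPU's slot is padded to 8 bytes. The sketch below computes the buffer size and the offset of one CPU's slot under that assumption. ``percpu_value_buf_size()`` and ``percpu_slot_offset()`` are hypothetical helper names, not libbpf functions.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Size of one per-CPU slot: the value size rounded up to 8 bytes. */
    static size_t percpu_slot_size(uint32_t value_size)
    {
        return ((size_t)value_size + 7) & ~(size_t)7;
    }

    /* Hypothetical helper: bytes userspace must provide for one lookup. */
    static size_t percpu_value_buf_size(uint32_t value_size, int ncpus)
    {
        return percpu_slot_size(value_size) * (size_t)ncpus;
    }

    /* Hypothetical helper: byte offset of a given CPU's slot in the buffer. */
    static size_t percpu_slot_offset(uint32_t value_size, int cpu)
    {
        return percpu_slot_size(value_size) * (size_t)cpu;
    }

For the ``long`` values used in the examples above, the padding makes no difference on 64-bit systems. For a 4-byte value on 4 CPUs, the buffer must be 32 bytes rather than 16.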
|
||||
|
||||
When calling ``bpf_map_update_elem()`` the flag ``BPF_NOEXIST`` cannot be used
|
||||
for these maps.
|
174
Documentation/bpf/map_bloom_filter.rst
Normal file
@ -0,0 +1,174 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
=========================
|
||||
BPF_MAP_TYPE_BLOOM_FILTER
|
||||
=========================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_BLOOM_FILTER`` was introduced in kernel version 5.16
|
||||
|
||||
``BPF_MAP_TYPE_BLOOM_FILTER`` provides a BPF bloom filter map. Bloom
|
||||
filters are a space-efficient probabilistic data structure used to
|
||||
quickly test whether an element exists in a set. In a bloom filter,
|
||||
false positives are possible whereas false negatives are not.
|
||||
|
||||
The bloom filter map does not have keys, only values. When the bloom
|
||||
filter map is created, it must be created with a ``key_size`` of 0. The
|
||||
bloom filter map supports two operations:
|
||||
|
||||
- push: adding an element to the map
|
||||
- peek: determining whether an element is present in the map
|
||||
|
||||
BPF programs must use ``bpf_map_push_elem`` to add an element to the
|
||||
bloom filter map and ``bpf_map_peek_elem`` to query the map. These
|
||||
operations are exposed to userspace applications using the existing
|
||||
``bpf`` syscall in the following way:
|
||||
|
||||
- ``BPF_MAP_UPDATE_ELEM`` -> push
|
||||
- ``BPF_MAP_LOOKUP_ELEM`` -> peek
|
||||
|
||||
The ``max_entries`` size that is specified at map creation time is used
|
||||
to approximate a reasonable bitmap size for the bloom filter, and is not
|
||||
otherwise strictly enforced. If the user wishes to insert more entries
|
||||
into the bloom filter than ``max_entries``, this may lead to a higher
|
||||
false positive rate.
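For intuition, the standard textbook approximation of a bloom filter's false positive rate may help. It is a general property of bloom filters, not a statement about how the kernel sizes its bitmap. Here :math:`m` is the number of bits in the bitmap, :math:`k` the number of hash functions and :math:`n` the number of elements inserted so far:

.. math::

    p \approx \left(1 - e^{-kn/m}\right)^{k}

Because :math:`m` and :math:`k` are fixed when the map is created, :math:`p` grows as :math:`n` grows, which is why inserting more than ``max_entries`` elements degrades accuracy.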
|
||||
|
||||
The number of hashes to use for the bloom filter is configurable using
|
||||
the lower 4 bits of ``map_extra`` in ``union bpf_attr`` at map creation
|
||||
time. If no number is specified, the default used will be 5 hash
|
||||
functions. In general, using more hashes decreases both the false
|
||||
positive rate and the speed of a lookup.
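The following sketch shows one way to read the hash count out of ``map_extra`` as described above. ``bloom_nr_hashes()`` is a hypothetical helper, and treating set bits above the lower four as an error is an assumption of this sketch rather than something stated in this document.

.. code-block:: c

    #include <stdint.h>

    /* Default used when no hash count is given (from the text above). */
    #define SKETCH_DEFAULT_NR_HASHES 5

    /*
     * Hypothetical helper: derive the number of hash functions from
     * map_extra. Returns -1 if bits above the lower 4 are set, which
     * this sketch assumes to be invalid.
     */
    static int bloom_nr_hashes(uint64_t map_extra)
    {
        if (map_extra & ~(uint64_t)0xF)
            return -1;
        if (map_extra == 0)
            return SKETCH_DEFAULT_NR_HASHES;
        return (int)map_extra;
    }

A ``map_extra`` of 3 therefore selects three hash functions, and leaving it at 0 selects the default of five.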
|
||||
|
||||
It is not possible to delete elements from a bloom filter map. A bloom
|
||||
filter map may be used as an inner map. The user is responsible for
|
||||
synchronising concurrent updates and lookups to ensure no false negative
|
||||
lookups occur.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
bpf_map_push_elem()
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
|
||||
|
||||
A ``value`` can be added to a bloom filter using the
|
||||
``bpf_map_push_elem()`` helper. The ``flags`` parameter must be set to
|
||||
``BPF_ANY`` when adding an entry to the bloom filter. This helper
|
||||
returns ``0`` on success, or negative error in case of failure.
|
||||
|
||||
bpf_map_peek_elem()
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_peek_elem(struct bpf_map *map, void *value)
|
||||
|
||||
The ``bpf_map_peek_elem()`` helper is used to determine whether
|
||||
``value`` is present in the bloom filter map. This helper returns ``0``
|
||||
if ``value`` is probably present in the map, or ``-ENOENT`` if ``value``
|
||||
is definitely not present in the map.
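To illustrate the push and peek semantics, here is a toy bloom filter in plain C that can be run in userspace. It is only a model: the bitmap size, the hash function and all ``toy_*`` names are invented for this example and do not reflect the kernel implementation.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>

    #define TOY_BITS   1024u  /* bitmap size in bits, a power of two */
    #define TOY_HASHES 3u     /* number of hash functions */

    struct toy_bloom {
        uint8_t bits[TOY_BITS / 8];
    };

    /* Simple seeded integer mixer; any reasonable hash would do here. */
    static uint32_t toy_hash(uint32_t value, uint32_t seed)
    {
        uint32_t h = value ^ (seed * 0x9e3779b9u);

        h ^= h >> 16;
        h *= 0x85ebca6bu;
        h ^= h >> 13;
        h *= 0xc2b2ae35u;
        h ^= h >> 16;
        return h & (TOY_BITS - 1);
    }

    /* "push": set one bit per hash function. */
    static int toy_push(struct toy_bloom *bf, uint32_t value)
    {
        for (uint32_t i = 0; i < TOY_HASHES; i++) {
            uint32_t bit = toy_hash(value, i + 1);

            bf->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }
        return 0;
    }

    /* "peek": 0 if all bits are set (probably present), else -ENOENT. */
    static int toy_peek(const struct toy_bloom *bf, uint32_t value)
    {
        for (uint32_t i = 0; i < TOY_HASHES; i++) {
            uint32_t bit = toy_hash(value, i + 1);

            if (!(bf->bits[bit / 8] & (1u << (bit % 8))))
                return -ENOENT;
        }
        return 0;
    }

Every pushed value sets all of its bits, so a later peek for it always returns ``0``; this is why false negatives cannot occur. A value that was never pushed may still find all of its bits set by other values, which is a false positive.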
|
||||
|
||||
Userspace
|
||||
---------
|
||||
|
||||
bpf_map_update_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
|
||||
|
||||
A userspace program can add a ``value`` to a bloom filter using libbpf's
|
||||
``bpf_map_update_elem`` function. The ``key`` parameter must be set to
|
||||
``NULL`` and ``flags`` must be set to ``BPF_ANY``. Returns ``0`` on
|
||||
success, or negative error in case of failure.
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_lookup_elem(int fd, const void *key, void *value)
|
||||
|
||||
A userspace program can determine the presence of ``value`` in a bloom
|
||||
filter using libbpf's ``bpf_map_lookup_elem`` function. The ``key``
|
||||
parameter must be set to ``NULL``. Returns ``0`` if ``value`` is
|
||||
probably present in the map, or ``-ENOENT`` if ``value`` is definitely
|
||||
not present in the map.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
This snippet shows how to declare a bloom filter in a BPF program:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
|
||||
__type(value, __u32);
|
||||
__uint(max_entries, 1000);
|
||||
__uint(map_extra, 3);
|
||||
} bloom_filter SEC(".maps");
|
||||
|
||||
This snippet shows how to determine presence of a value in a bloom
|
||||
filter in a BPF program:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *lookup(__u32 key)
|
||||
{
|
||||
if (bpf_map_peek_elem(&bloom_filter, &key) == 0) {
|
||||
/* Verify not a false positive and fetch an associated
|
||||
* value using a secondary lookup, e.g. in a hash table
|
||||
*/
|
||||
return bpf_map_lookup_elem(&hash_table, &key);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
Userspace
|
||||
---------
|
||||
|
||||
This snippet shows how to use libbpf to create a bloom filter map from
|
||||
userspace:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int create_bloom(void)
|
||||
{
|
||||
LIBBPF_OPTS(bpf_map_create_opts, opts,
|
||||
.map_extra = 3); /* number of hashes */
|
||||
|
||||
return bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER,
|
||||
"ipv6_bloom", /* name */
|
||||
0, /* key size, must be zero */
|
||||
sizeof(ipv6_addr), /* value size */
|
||||
10000, /* max entries */
|
||||
&opts); /* create options */
|
||||
}
|
||||
|
||||
This snippet shows how to add an element to a bloom filter from
|
||||
userspace:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int add_element(struct bpf_map *bloom_map, __u32 value)
|
||||
{
|
||||
int bloom_fd = bpf_map__fd(bloom_map);
|
||||
return bpf_map_update_elem(bloom_fd, NULL, &value, BPF_ANY);
|
||||
}
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
https://lwn.net/ml/bpf/20210831225005.2762202-1-joannekoong@fb.com/
|
109
Documentation/bpf/map_cgrp_storage.rst
Normal file
@ -0,0 +1,109 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Meta Platforms, Inc. and affiliates.
|
||||
|
||||
=========================
|
||||
BPF_MAP_TYPE_CGRP_STORAGE
|
||||
=========================
|
||||
|
||||
The ``BPF_MAP_TYPE_CGRP_STORAGE`` map type represents a local fixed-size
|
||||
storage for cgroups. It is only available with ``CONFIG_CGROUPS``.
|
||||
The programs are made available by the same Kconfig. The
|
||||
data for a particular cgroup can be retrieved by looking up the map
|
||||
with that cgroup.
|
||||
|
||||
This document describes the usage and semantics of the
|
||||
``BPF_MAP_TYPE_CGRP_STORAGE`` map type.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
The map key must be of size ``sizeof(int)`` and represents a cgroup fd.
|
||||
To access the storage in a program, use ``bpf_cgrp_storage_get``::
|
||||
|
||||
void *bpf_cgrp_storage_get(struct bpf_map *map, struct cgroup *cgroup, void *value, u64 flags)
|
||||
|
||||
``flags`` could be 0 or ``BPF_LOCAL_STORAGE_GET_F_CREATE`` which indicates that
|
||||
a new local storage will be created if one does not exist.
|
||||
|
||||
The local storage can be removed with ``bpf_cgrp_storage_delete``::
|
||||
|
||||
long bpf_cgrp_storage_delete(struct bpf_map *map, struct cgroup *cgroup)
|
||||
|
||||
The map is available to all program types.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
A BPF program example with BPF_MAP_TYPE_CGRP_STORAGE::
|
||||
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
|
||||
__uint(map_flags, BPF_F_NO_PREALLOC);
|
||||
__type(key, int);
|
||||
__type(value, long);
|
||||
} cgrp_storage SEC(".maps");
|
||||
|
||||
SEC("tp_btf/sys_enter")
|
||||
int BPF_PROG(on_enter, struct pt_regs *regs, long id)
|
||||
{
|
||||
struct task_struct *task = bpf_get_current_task_btf();
|
||||
long *ptr;
|
||||
|
||||
ptr = bpf_cgrp_storage_get(&cgrp_storage, task->cgroups->dfl_cgrp, 0,
|
||||
BPF_LOCAL_STORAGE_GET_F_CREATE);
|
||||
if (ptr)
|
||||
__sync_fetch_and_add(ptr, 1);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
Userspace accessing map declared above::
|
||||
|
||||
#include <linux/bpf.h>
|
||||
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
|
||||
|
||||
long map_lookup(struct bpf_map *map, int cgrp_fd)
{
    long value; /* matches the map's value type declared above */

    /* The userspace API copies the value out and returns 0 on success. */
    if (bpf_map_lookup_elem(bpf_map__fd(map), &cgrp_fd, &value) == 0)
        return value;
    return 0;
}
|
||||
|
||||
Difference Between BPF_MAP_TYPE_CGRP_STORAGE and BPF_MAP_TYPE_CGROUP_STORAGE
|
||||
============================================================================
|
||||
|
||||
The old cgroup storage map ``BPF_MAP_TYPE_CGROUP_STORAGE`` has been marked as
|
||||
deprecated (renamed to ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``). The new
|
||||
``BPF_MAP_TYPE_CGRP_STORAGE`` map should be used instead. The following
|
||||
illustrates the main differences between ``BPF_MAP_TYPE_CGRP_STORAGE`` and
|
||||
``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``.
|
||||
|
||||
(1). ``BPF_MAP_TYPE_CGRP_STORAGE`` can be used by all program types while
|
||||
``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` is available only to cgroup program types
|
||||
like ``BPF_CGROUP_INET_INGRESS`` or ``BPF_CGROUP_SOCK_OPS``, etc.
|
||||
|
||||
(2). ``BPF_MAP_TYPE_CGRP_STORAGE`` supports local storage for more than one
|
||||
cgroup while ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` only supports one cgroup
|
||||
which is attached by a BPF program.
|
||||
|
||||
(3). ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` allocates local storage at attach time so
|
||||
``bpf_get_local_storage()`` always returns non-NULL local storage.
|
||||
``BPF_MAP_TYPE_CGRP_STORAGE`` allocates local storage at runtime so
|
||||
``bpf_cgrp_storage_get()`` may return ``NULL``.
|
||||
To avoid a ``NULL`` return, user space can call
|
||||
``bpf_map_update_elem()`` to pre-allocate local storage before a BPF program
|
||||
is attached.
|
||||
|
||||
(4). ``BPF_MAP_TYPE_CGRP_STORAGE`` supports deleting local storage by a BPF program
|
||||
while ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` only deletes storage during
|
||||
prog detach time.
|
||||
|
||||
So overall, ``BPF_MAP_TYPE_CGRP_STORAGE`` supports all ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``
|
||||
functionality and beyond. It is recommended to use ``BPF_MAP_TYPE_CGRP_STORAGE``
|
||||
instead of ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``.
|
177
Documentation/bpf/map_cpumap.rst
Normal file
@ -0,0 +1,177 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
===================
|
||||
BPF_MAP_TYPE_CPUMAP
|
||||
===================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_CPUMAP`` was introduced in kernel version 4.15
|
||||
|
||||
.. kernel-doc:: kernel/bpf/cpumap.c
|
||||
:doc: cpu map
|
||||
|
||||
An example use-case for this map type is software based Receive Side Scaling (RSS).
|
||||
|
||||
The CPUMAP represents the CPUs in the system indexed as the map-key, and the
|
||||
map-value is the config setting (per CPUMAP entry). Each CPUMAP entry has a dedicated
|
||||
kernel thread bound to the given CPU to represent the remote CPU execution unit.
|
||||
|
||||
Starting from Linux kernel version 5.9 the CPUMAP can run a second XDP program
|
||||
on the remote CPU. This allows an XDP program to split its processing across
|
||||
multiple CPUs. For example, a scenario where the initial CPU (that sees/receives
|
||||
the packets) needs to do minimal packet processing and the remote CPU (to which
|
||||
the packet is directed) can afford to spend more cycles processing the frame. The
|
||||
initial CPU is where the XDP redirect program is executed. The remote CPU
|
||||
receives raw ``xdp_frame`` objects.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
bpf_redirect_map()
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
|
||||
|
||||
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
|
||||
For ``BPF_MAP_TYPE_CPUMAP`` this map contains references to CPUs.
|
||||
|
||||
The lower two bits of ``flags`` are used as the return code if the map lookup
|
||||
fails. This is so that the return value can be one of the XDP program return
|
||||
codes up to ``XDP_TX``, as chosen by the caller.
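The fallback behaviour can be modelled in a few lines of plain C. The ``SKETCH_XDP_*`` constants are local names whose values are intended to mirror ``enum xdp_action`` in the UAPI header, and ``redirect_map_fallback()`` is a hypothetical function that only models the return code on a failed lookup; it is not the helper itself.

.. code-block:: c

    #include <stdint.h>

    /* Local copies; values are meant to mirror enum xdp_action. */
    enum sketch_xdp_action {
        SKETCH_XDP_ABORTED = 0,
        SKETCH_XDP_DROP    = 1,
        SKETCH_XDP_PASS    = 2,
        SKETCH_XDP_TX      = 3,
    };

    /*
     * Hypothetical model: the code returned when the map lookup fails
     * is taken from the lower two bits of flags.
     */
    static int redirect_map_fallback(uint64_t flags)
    {
        return (int)(flags & 0x3);
    }

Passing ``0`` as ``flags`` therefore results in ``XDP_ABORTED`` on a failed lookup, whereas passing ``XDP_PASS`` would let the packet continue up the normal network stack instead.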
|
||||
|
||||
User space
|
||||
----------
|
||||
.. note::
|
||||
CPUMAP entries can only be updated/looked up/deleted from user space and not
|
||||
from an eBPF program. Trying to call these functions from a kernel eBPF
|
||||
program will result in the program failing to load and a verifier warning.
|
||||
|
||||
bpf_map_update_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);
|
||||
|
||||
CPU entries can be added or updated using the ``bpf_map_update_elem()``
|
||||
helper. This helper replaces existing elements atomically. The ``value`` parameter
|
||||
can be ``struct bpf_cpumap_val``.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct bpf_cpumap_val {
|
||||
__u32 qsize; /* queue size to remote target CPU */
|
||||
union {
|
||||
int fd; /* prog fd on map write */
|
||||
__u32 id; /* prog id on map read */
|
||||
} bpf_prog;
|
||||
};
|
||||
|
||||
The flags argument can be one of the following:
|
||||
- BPF_ANY: Create a new element or update an existing element.
|
||||
- BPF_NOEXIST: Create a new element only if it did not exist.
|
||||
- BPF_EXIST: Update an existing element.
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_lookup_elem(int fd, const void *key, void *value);
|
||||
|
||||
CPU entries can be retrieved using the ``bpf_map_lookup_elem()``
|
||||
helper.
|
||||
|
||||
bpf_map_delete_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_delete_elem(int fd, const void *key);
|
||||
|
||||
CPU entries can be deleted using the ``bpf_map_delete_elem()``
|
||||
helper. This helper will return 0 on success, or negative error in case of
|
||||
failure.
|
||||
|
||||
Examples
|
||||
========
|
||||
Kernel
|
||||
------
|
||||
|
||||
The following code snippet shows how to declare a ``BPF_MAP_TYPE_CPUMAP`` called
|
||||
``cpu_map`` and how to redirect packets to a remote CPU using a round robin scheme.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_CPUMAP);
|
||||
__type(key, __u32);
|
||||
__type(value, struct bpf_cpumap_val);
|
||||
__uint(max_entries, 12);
|
||||
} cpu_map SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_ARRAY);
|
||||
__type(key, __u32);
|
||||
__type(value, __u32);
|
||||
__uint(max_entries, 12);
|
||||
} cpus_available SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
|
||||
__type(key, __u32);
|
||||
__type(value, __u32);
|
||||
__uint(max_entries, 1);
|
||||
} cpus_iterator SEC(".maps");
|
||||
|
||||
SEC("xdp")
|
||||
int xdp_redir_cpu_round_robin(struct xdp_md *ctx)
|
||||
{
|
||||
__u32 key = 0;
|
||||
__u32 cpu_dest = 0;
|
||||
__u32 *cpu_selected, *cpu_iterator;
|
||||
__u32 cpu_idx;
|
||||
|
||||
cpu_iterator = bpf_map_lookup_elem(&cpus_iterator, &key);
|
||||
if (!cpu_iterator)
|
||||
return XDP_ABORTED;
|
||||
cpu_idx = *cpu_iterator;
|
||||
|
||||
*cpu_iterator += 1;
|
||||
if (*cpu_iterator == bpf_num_possible_cpus())
|
||||
*cpu_iterator = 0;
|
||||
|
||||
cpu_selected = bpf_map_lookup_elem(&cpus_available, &cpu_idx);
|
||||
if (!cpu_selected)
|
||||
return XDP_ABORTED;
|
||||
cpu_dest = *cpu_selected;
|
||||
|
||||
if (cpu_dest >= bpf_num_possible_cpus())
|
||||
return XDP_ABORTED;
|
||||
|
||||
return bpf_redirect_map(&cpu_map, cpu_dest, 0);
|
||||
}
|
||||
|
||||
User space
|
||||
----------
|
||||
|
||||
The following code snippet shows how to dynamically set the max_entries for a
|
||||
CPUMAP to the maximum number of CPUs available on the system.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int set_max_cpu_entries(struct bpf_map *cpu_map)
|
||||
{
|
||||
if (bpf_map__set_max_entries(cpu_map, libbpf_num_possible_cpus()) < 0) {
|
||||
fprintf(stderr, "Failed to set max entries for cpu_map map: %s\n",
|
||||
strerror(errno));
|
||||
return -1;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
References
|
||||
===========
|
||||
|
||||
- https://developers.redhat.com/blog/2021/05/13/receive-side-scaling-rss-with-ebpf-and-cpumap#redirecting_into_a_cpumap
|
238
Documentation/bpf/map_devmap.rst
Normal file
@ -0,0 +1,238 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
=================================================
|
||||
BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
|
||||
=================================================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_DEVMAP`` was introduced in kernel version 4.14
|
||||
- ``BPF_MAP_TYPE_DEVMAP_HASH`` was introduced in kernel version 5.4
|
||||
|
||||
``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` are BPF maps primarily
|
||||
used as backend maps for the XDP BPF helper call ``bpf_redirect_map()``.
|
||||
``BPF_MAP_TYPE_DEVMAP`` is backed by an array that uses the key as
|
||||
the index to look up a reference to a net device, while ``BPF_MAP_TYPE_DEVMAP_HASH``
|
||||
is backed by a hash table that uses a key to look up a reference to a net device.
|
||||
The user provides either <``key``/ ``ifindex``> or <``key``/ ``struct bpf_devmap_val``>
|
||||
pairs to update the maps with new net devices.
|
||||
|
||||
.. note::
|
||||
- The key to a hash map doesn't have to be an ``ifindex``.
|
||||
- While ``BPF_MAP_TYPE_DEVMAP_HASH`` allows for densely packing the net devices
|
||||
it comes at the cost of hashing the key when performing a lookup.
|
||||
|
||||
The setup and packet enqueue/send code is shared between the two types of
|
||||
devmap; only the lookup and insertion differ.
|
||||
|
||||
Usage
|
||||
=====
|
||||
Kernel BPF
|
||||
----------
|
||||
bpf_redirect_map()
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
|
||||
|
||||
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
|
||||
For ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` this map contains
|
||||
references to net devices (for forwarding packets through other ports).
|
||||
|
||||
The lower two bits of ``flags`` are used as the return code if the map lookup
|
||||
fails. This is so that the return value can be one of the XDP program return
|
||||
codes up to ``XDP_TX``, as chosen by the caller. The higher bits of ``flags``
|
||||
can be set to ``BPF_F_BROADCAST`` or ``BPF_F_EXCLUDE_INGRESS`` as defined
|
||||
below.
|
||||
|
||||
With ``BPF_F_BROADCAST`` the packet will be broadcast to all the interfaces
|
||||
in the map, with ``BPF_F_EXCLUDE_INGRESS`` the ingress interface will be excluded
|
||||
from the broadcast.
|
||||
|
||||
.. note::
|
||||
- The key is ignored if ``BPF_F_BROADCAST`` is set.
|
||||
- The broadcast feature can also be used to implement multicast forwarding:
|
||||
simply create multiple DEVMAPs, each one corresponding to a single multicast group.
|
||||
|
||||
This helper will return ``XDP_REDIRECT`` on success, or the value of the two
|
||||
lower bits of the ``flags`` argument if the map lookup fails.
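The return-value rule above can be modelled in plain user-space C. This is only a sketch of the described behaviour, not kernel code: the ``MODEL_``-prefixed names and the ``model_redirect_result()`` function are hypothetical, and the numeric values are assumed to mirror ``enum xdp_action`` and the ``bpf_redirect_map()`` flag bits from the UAPI header.

```c
#include <stdint.h>

/* Assumed to mirror enum xdp_action from the kernel UAPI headers. */
enum model_xdp_action {
	MODEL_XDP_ABORTED = 0,
	MODEL_XDP_DROP,
	MODEL_XDP_PASS,
	MODEL_XDP_TX,
	MODEL_XDP_REDIRECT,
};

/* Assumed to mirror the bpf_redirect_map() flag bits. */
#define MODEL_BPF_F_BROADCAST		(1ULL << 3)
#define MODEL_BPF_F_EXCLUDE_INGRESS	(1ULL << 4)

/* The two lowest bits of flags carry the fallback XDP action. */
#define MODEL_ACTION_MASK		0x3ULL

/*
 * Hypothetical helper: what the caller would see returned, given whether
 * the map lookup succeeded and which flags were passed.
 */
int model_redirect_result(int lookup_ok, uint64_t flags)
{
	if (lookup_ok)
		return MODEL_XDP_REDIRECT;

	/* Lookup failed: fall back to the action encoded in the low bits. */
	return (int)(flags & MODEL_ACTION_MASK);
}
```

Under these assumptions, passing ``XDP_PASS`` as ``flags`` lets packets without a matching entry continue up the normal network stack, while passing ``0`` makes a failed lookup return ``XDP_ABORTED``.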
|
||||
|
||||
More information about redirection can be found in :doc:`redirect`.
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||
|
||||
Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
|
||||
helper.
|
||||
|
||||
User space
|
||||
----------
|
||||
.. note::
|
||||
DEVMAP entries can only be updated/deleted from user space and not
|
||||
from an eBPF program. Trying to call these functions from a kernel eBPF
|
||||
program will result in the program failing to load and a verifier warning.
|
||||
|
||||
bpf_map_update_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);
|
||||
|
||||
Net device entries can be added or updated using the ``bpf_map_update_elem()``
|
||||
helper. This helper replaces existing elements atomically. The ``value`` parameter
|
||||
can be ``struct bpf_devmap_val`` or a simple ``int ifindex`` for backwards
|
||||
compatibility.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct bpf_devmap_val {
|
||||
__u32 ifindex; /* device index */
|
||||
union {
|
||||
int fd; /* prog fd on map write */
|
||||
__u32 id; /* prog id on map read */
|
||||
} bpf_prog;
|
||||
};
|
||||
|
||||
The ``flags`` argument can be one of the following:
|
||||
- ``BPF_ANY``: Create a new element or update an existing element.
|
||||
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
|
||||
- ``BPF_EXIST``: Update an existing element.
|
||||
|
||||
DEVMAPs can associate a program with a device entry by adding a ``bpf_prog.fd``
|
||||
to ``struct bpf_devmap_val``. Programs are run after ``XDP_REDIRECT`` and have
|
||||
access to both Rx device and Tx device. The program associated with the ``fd``
|
||||
must have type XDP with expected attach type ``xdp_devmap``.
|
||||
When a program is associated with a device index, the program is run on an
|
||||
``XDP_REDIRECT`` and before the buffer is added to the per-cpu queue. Examples
|
||||
of how to attach/use xdp_devmap progs can be found in the kernel selftests:
|
||||
|
||||
- ``tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c``
|
||||
- ``tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c``
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_lookup_elem(int fd, const void *key, void *value);
|
||||
|
||||
Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
|
||||
helper.
|
||||
|
||||
bpf_map_delete_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_delete_elem(int fd, const void *key);
|
||||
|
||||
Net device entries can be deleted using the ``bpf_map_delete_elem()``
|
||||
helper. This helper will return 0 on success, or negative error in case of
|
||||
failure.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP``
|
||||
called tx_port.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_DEVMAP);
|
||||
__type(key, __u32);
|
||||
__type(value, __u32);
|
||||
__uint(max_entries, 256);
|
||||
} tx_port SEC(".maps");
|
||||
|
||||
The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP_HASH``
|
||||
called forward_map.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
|
||||
__type(key, __u32);
|
||||
__type(value, struct bpf_devmap_val);
|
||||
__uint(max_entries, 32);
|
||||
} forward_map SEC(".maps");
|
||||
|
||||
.. note::
|
||||
|
||||
The value type in the DEVMAP above is a ``struct bpf_devmap_val``
|
||||
|
||||
The following code snippet shows a simple xdp_redirect_map program. This program
|
||||
would work with a user space program that populates the devmap ``forward_map`` based
|
||||
on ingress ifindexes. The BPF program (below) is redirecting packets using the
|
||||
ingress ``ifindex`` as the ``key``.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
SEC("xdp")
|
||||
int xdp_redirect_map_func(struct xdp_md *ctx)
|
||||
{
|
||||
int index = ctx->ingress_ifindex;
|
||||
|
||||
return bpf_redirect_map(&forward_map, index, 0);
|
||||
}
|
||||
|
||||
The following code snippet shows a BPF program that is broadcasting packets to
|
||||
all the interfaces in the ``tx_port`` devmap.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
SEC("xdp")
|
||||
int xdp_redirect_map_func(struct xdp_md *ctx)
|
||||
{
|
||||
return bpf_redirect_map(&tx_port, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
|
||||
}
|
||||
|
||||
User space
|
||||
----------
|
||||
|
||||
The following code snippet shows how to update a devmap called ``tx_port``.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int update_devmap(int ifindex, int redirect_ifindex)
|
||||
{
|
||||
int ret;
|
||||
|
||||
ret = bpf_map_update_elem(bpf_map__fd(tx_port), &ifindex, &redirect_ifindex, 0);
|
||||
if (ret < 0) {
|
||||
fprintf(stderr, "Failed to update devmap value: %s\n",
|
||||
strerror(errno));
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
The following code snippet shows how to update a hash_devmap called ``forward_map``.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int update_devmap(int ifindex, int redirect_ifindex)
|
||||
{
|
||||
struct bpf_devmap_val devmap_val = { .ifindex = redirect_ifindex };
|
||||
int ret;
|
||||
|
||||
ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0);
|
||||
if (ret < 0) {
|
||||
fprintf(stderr, "Failed to update devmap value: %s\n",
|
||||
strerror(errno));
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
||||
References
|
||||
===========
|
||||
|
||||
- https://lwn.net/Articles/728146/
|
||||
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6f9d451ab1a33728adb72d7ff66a7b374d665176
|
||||
- https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4106
|
@ -34,7 +34,14 @@ the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.
|
||||
Usage
|
||||
=====
|
||||
|
||||
.. c:function::
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
bpf_map_update_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
|
||||
|
||||
Hash entries can be added or updated using the ``bpf_map_update_elem()``
|
||||
@ -49,14 +56,22 @@ parameter can be used to control the update behaviour:
|
||||
``bpf_map_update_elem()`` returns 0 on success, or negative error in
|
||||
case of failure.
|
||||
|
||||
.. c:function::
|
||||
bpf_map_lookup_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||
|
||||
Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
|
||||
helper. This helper returns a pointer to the value associated with
|
||||
``key``, or ``NULL`` if no entry was found.
|
||||
|
||||
.. c:function::
|
||||
bpf_map_delete_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
|
||||
|
||||
Hash entries can be deleted using the ``bpf_map_delete_elem()``
|
||||
@ -70,7 +85,11 @@ For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
|
||||
the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
|
||||
automatically access the hash slot for the current CPU.
|
||||
|
||||
.. c:function::
|
||||
bpf_map_lookup_percpu_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
|
||||
|
||||
The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the
|
||||
@ -89,7 +108,11 @@ See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
|
||||
Userspace
|
||||
---------
|
||||
|
||||
.. c:function::
|
||||
bpf_map_get_next_key()
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
|
||||
|
||||
In userspace, it is possible to iterate through the keys of a hash using
|
||||
|
197
Documentation/bpf/map_lpm_trie.rst
Normal file
@ -0,0 +1,197 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
=====================
|
||||
BPF_MAP_TYPE_LPM_TRIE
|
||||
=====================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_LPM_TRIE`` was introduced in kernel version 4.11
|
||||
|
||||
``BPF_MAP_TYPE_LPM_TRIE`` provides a longest prefix match algorithm that
|
||||
can be used to match IP addresses to a stored set of prefixes.
|
||||
Internally, data is stored in an unbalanced trie of nodes that uses
|
||||
``prefixlen,data`` pairs as its keys. The ``data`` is interpreted in
|
||||
network byte order, i.e. big endian, so ``data[0]`` stores the most
|
||||
significant byte.
|
||||
|
||||
LPM tries may be created with a maximum prefix length that is a multiple
|
||||
of 8, in the range from 8 to 2048. The key used for lookup and update
|
||||
operations is a ``struct bpf_lpm_trie_key``, extended by
|
||||
``max_prefixlen/8`` bytes.
|
||||
|
||||
- For IPv4 addresses the data length is 4 bytes
|
||||
- For IPv6 addresses the data length is 16 bytes
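The key size rule above can be written out as a small calculation. This is a sketch under stated assumptions: ``lpm_key_size()`` and ``struct lpm_key_header`` are hypothetical names, and the header is assumed to consist only of the 32-bit ``prefixlen`` field that precedes the data bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed part of the key: a 32-bit prefix length. */
struct lpm_key_header {
	uint32_t prefixlen;
};

/*
 * Hypothetical helper: total key size in bytes for a trie created with
 * the given maximum prefix length, or 0 if that length is not a multiple
 * of 8 in the range 8..2048.
 */
size_t lpm_key_size(uint32_t max_prefixlen)
{
	if (max_prefixlen < 8 || max_prefixlen > 2048 || max_prefixlen % 8)
		return 0;

	/* Header plus one data byte per 8 bits of prefix. */
	return sizeof(struct lpm_key_header) + max_prefixlen / 8;
}
```

With these assumptions an IPv4 trie (``max_prefixlen`` of 32) uses 8-byte keys and an IPv6 trie (``max_prefixlen`` of 128) uses 20-byte keys.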
|
||||
|
||||
The value type stored in the LPM trie can be any user defined type.
|
||||
|
||||
.. note::
|
||||
When creating a map of type ``BPF_MAP_TYPE_LPM_TRIE`` you must set the
|
||||
``BPF_F_NO_PREALLOC`` flag.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||
|
||||
The longest prefix entry for a given data value can be found using the
|
||||
``bpf_map_lookup_elem()`` helper. This helper returns a pointer to the
|
||||
value associated with the longest matching ``key``, or ``NULL`` if no
|
||||
entry was found.
|
||||
|
||||
The ``key`` should have ``prefixlen`` set to ``max_prefixlen`` when
|
||||
performing longest prefix lookups. For example, when searching for the
|
||||
longest prefix match for an IPv4 address, ``prefixlen`` should be set to
|
||||
``32``.
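To make the lookup semantics concrete, the following user-space sketch finds the longest matching prefix with a linear scan over a small table. It is not how the kernel stores or searches entries (the kernel uses a trie); ``struct ipv4_prefix``, ``prefix_matches()`` and ``longest_match()`` are hypothetical names used only for illustration. Addresses are held as bytes in network byte order, so ``data[0]`` is the most significant byte.

```c
#include <stddef.h>
#include <stdint.h>

struct ipv4_prefix {
	uint32_t prefixlen;	/* number of significant leading bits, 0..32 */
	uint8_t data[4];	/* network byte order: data[0] is most significant */
	uint32_t value;		/* payload associated with the prefix */
};

/* Return 1 if the first p->prefixlen bits of addr equal those of p->data. */
int prefix_matches(const struct ipv4_prefix *p, const uint8_t addr[4])
{
	uint32_t bits = p->prefixlen;
	size_t i;

	for (i = 0; i < 4 && bits > 0; i++) {
		/* Full byte, or only the top 'bits' bits of the last byte. */
		uint8_t mask = bits >= 8 ? 0xff : (uint8_t)(0xff << (8 - bits));

		if ((p->data[i] & mask) != (addr[i] & mask))
			return 0;
		bits = bits >= 8 ? bits - 8 : 0;
	}
	return 1;
}

/* Return the matching entry with the largest prefixlen, or NULL if none. */
const struct ipv4_prefix *longest_match(const struct ipv4_prefix *tbl,
					size_t n, const uint8_t addr[4])
{
	const struct ipv4_prefix *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!prefix_matches(&tbl[i], addr))
			continue;
		if (!best || tbl[i].prefixlen > best->prefixlen)
			best = &tbl[i];
	}
	return best;
}
```

In this model, looking up ``10.1.2.3`` against the entries ``10.0.0.0/8`` and ``10.1.0.0/16`` selects the ``/16`` entry, which corresponds to querying the map with the full address and ``prefixlen`` set to ``32``.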
|
||||
|
||||
bpf_map_update_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
|
||||
|
||||
Prefix entries can be added or updated using the ``bpf_map_update_elem()``
|
||||
helper. This helper replaces existing elements atomically.
|
||||
|
||||
``bpf_map_update_elem()`` returns ``0`` on success, or negative error in
|
||||
case of failure.
|
||||
|
||||
.. note::
|
||||
The flags parameter must be one of BPF_ANY, BPF_NOEXIST or BPF_EXIST,
|
||||
but the value is ignored, giving BPF_ANY semantics.
|
||||
|
||||
bpf_map_delete_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
|
||||
|
||||
Prefix entries can be deleted using the ``bpf_map_delete_elem()``
|
||||
helper. This helper will return 0 on success, or negative error in case
|
||||
of failure.
|
||||
|
||||
Userspace
|
||||
---------
|
||||
|
||||
Access from userspace uses libbpf APIs with the same names as above, with
|
||||
the map identified by ``fd``.
|
||||
|
||||
bpf_map_get_next_key()
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
|
||||
|
||||
A userspace program can iterate through the entries in an LPM trie using
|
||||
libbpf's ``bpf_map_get_next_key()`` function. The first key can be
|
||||
fetched by calling ``bpf_map_get_next_key()`` with ``cur_key`` set to
|
||||
``NULL``. Subsequent calls will fetch the next key that follows the
|
||||
current key. ``bpf_map_get_next_key()`` returns ``0`` on success,
|
||||
``-ENOENT`` if ``cur_key`` is the last key in the trie, or negative
|
||||
error in case of failure.
|
||||
|
||||
``bpf_map_get_next_key()`` will iterate through the LPM trie elements
|
||||
from leftmost leaf first. This means that iteration will return more
|
||||
specific keys before less specific ones.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
Please see ``tools/testing/selftests/bpf/test_lpm_map.c`` for examples
|
||||
of LPM trie usage from userspace. The code snippets below demonstrate
|
||||
API usage.
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
The following BPF code snippet shows how to declare a new LPM trie for IPv4
|
||||
address prefixes:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#include <linux/bpf.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
|
||||
struct ipv4_lpm_key {
|
||||
__u32 prefixlen;
|
||||
__u32 data;
|
||||
};
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_LPM_TRIE);
|
||||
__type(key, struct ipv4_lpm_key);
|
||||
__type(value, __u32);
|
||||
__uint(map_flags, BPF_F_NO_PREALLOC);
|
||||
__uint(max_entries, 255);
|
||||
} ipv4_lpm_map SEC(".maps");
|
||||
|
||||
The following BPF code snippet shows how to look up by IPv4 address:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *lookup(__u32 ipaddr)
|
||||
{
|
||||
struct ipv4_lpm_key key = {
|
||||
.prefixlen = 32,
|
||||
.data = ipaddr
|
||||
};
|
||||
|
||||
return bpf_map_lookup_elem(&ipv4_lpm_map, &key);
|
||||
}
|
||||
|
||||
Userspace
|
||||
---------
|
||||
|
||||
The following snippet shows how to insert an IPv4 prefix entry into an
|
||||
LPM trie:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int add_prefix_entry(int lpm_fd, __u32 addr, __u32 prefixlen, struct value *value)
|
||||
{
|
||||
struct ipv4_lpm_key ipv4_key = {
|
||||
.prefixlen = prefixlen,
|
||||
.data = addr
|
||||
};
|
||||
return bpf_map_update_elem(lpm_fd, &ipv4_key, value, BPF_ANY);
|
||||
}
|
||||
|
||||
The following snippet shows a userspace program walking through the entries
|
||||
of an LPM trie:
|
||||
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#include <bpf/libbpf.h>
|
||||
#include <bpf/bpf.h>
|
||||
|
||||
void iterate_lpm_trie(int map_fd)
|
||||
{
|
||||
struct ipv4_lpm_key *cur_key = NULL;
|
||||
struct ipv4_lpm_key next_key;
|
||||
struct value value;
|
||||
int err;
|
||||
|
||||
for (;;) {
|
||||
err = bpf_map_get_next_key(map_fd, cur_key, &next_key);
|
||||
if (err)
|
||||
break;
|
||||
|
||||
bpf_map_lookup_elem(map_fd, &next_key, &value);
|
||||
|
||||
/* Use key and value here */
|
||||
|
||||
cur_key = &next_key;
|
||||
}
|
||||
}
|
130
Documentation/bpf/map_of_maps.rst
Normal file
@ -0,0 +1,130 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
========================================================
|
||||
BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS
|
||||
========================================================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_ARRAY_OF_MAPS`` and ``BPF_MAP_TYPE_HASH_OF_MAPS`` were
|
||||
introduced in kernel version 4.12
|
||||
|
||||
``BPF_MAP_TYPE_ARRAY_OF_MAPS`` and ``BPF_MAP_TYPE_HASH_OF_MAPS`` provide general
|
||||
purpose support for map in map storage. One level of nesting is supported, where
|
||||
an outer map contains instances of a single type of inner map, for example
|
||||
``array_of_maps->sock_map``.
|
||||
|
||||
When creating an outer map, an inner map instance is used to initialize the
|
||||
metadata that the outer map holds about its inner maps. This inner map has a
|
||||
separate lifetime from the outer map and can be deleted after the outer map has
|
||||
been created.
|
||||
|
||||
The outer map supports element lookup, update and delete from user space using
|
||||
the syscall API. A BPF program is only allowed to do element lookup in the outer
|
||||
map.
|
||||
|
||||
.. note::
|
||||
- Multi-level nesting is not supported.
|
||||
- Any BPF map type can be used as an inner map, except for
|
||||
``BPF_MAP_TYPE_PROG_ARRAY``.
|
||||
- A BPF program cannot update or delete outer map entries.
|
||||
|
||||
For ``BPF_MAP_TYPE_ARRAY_OF_MAPS`` the key is an unsigned 32-bit integer index
|
||||
into the array. The array is a fixed size with ``max_entries`` elements that are
|
||||
zero initialized when created.
|
||||
|
||||
For ``BPF_MAP_TYPE_HASH_OF_MAPS`` the key type can be chosen when defining the
|
||||
map. The kernel is responsible for allocating and freeing key/value pairs, up to
|
||||
the max_entries limit that you specify. Hash maps use pre-allocation of hash
|
||||
table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be used to disable
|
||||
pre-allocation when it is too memory expensive.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
Kernel BPF Helper
|
||||
-----------------
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||
|
||||
Inner maps can be retrieved using the ``bpf_map_lookup_elem()`` helper. This
|
||||
helper returns a pointer to the inner map, or ``NULL`` if no entry was found.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
Kernel BPF Example
|
||||
------------------
|
||||
|
||||
This snippet shows how to create and initialise an array of devmaps in a BPF
|
||||
program. Note that the outer array can only be modified from user space using
|
||||
the syscall API.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct inner_map {
|
||||
__uint(type, BPF_MAP_TYPE_DEVMAP);
|
||||
__uint(max_entries, 10);
|
||||
__type(key, __u32);
|
||||
__type(value, __u32);
|
||||
} inner_map1 SEC(".maps"), inner_map2 SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
|
||||
__uint(max_entries, 2);
|
||||
__type(key, __u32);
|
||||
__array(values, struct inner_map);
|
||||
} outer_map SEC(".maps") = {
|
||||
.values = { &inner_map1,
|
||||
&inner_map2 }
|
||||
};
|
||||
|
||||
See ``progs/test_btf_map_in_map.c`` in ``tools/testing/selftests/bpf`` for more
|
||||
examples of declarative initialisation of outer maps.
|
||||
|
||||
User Space
|
||||
----------
|
||||
|
||||
This snippet shows how to create an array based outer map:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int create_outer_array(int inner_fd) {
|
||||
LIBBPF_OPTS(bpf_map_create_opts, opts, .inner_map_fd = inner_fd);
|
||||
int fd;
|
||||
|
||||
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY_OF_MAPS,
|
||||
"example_array", /* name */
|
||||
sizeof(__u32), /* key size */
|
||||
sizeof(__u32), /* value size */
|
||||
256, /* max entries */
|
||||
&opts); /* create opts */
|
||||
return fd;
|
||||
}
|
||||
|
||||
|
||||
This snippet shows how to add an inner map to an outer map:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int add_devmap(int outer_fd, int index, const char *name) {
|
||||
int fd;
|
||||
|
||||
fd = bpf_map_create(BPF_MAP_TYPE_DEVMAP, name,
|
||||
sizeof(__u32), sizeof(__u32), 256, NULL);
|
||||
if (fd < 0)
|
||||
return fd;
|
||||
|
||||
return bpf_map_update_elem(outer_fd, &index, &fd, BPF_ANY);
|
||||
}
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
- https://lore.kernel.org/netdev/20170322170035.923581-3-kafai@fb.com/
|
||||
- https://lore.kernel.org/netdev/20170322170035.923581-4-kafai@fb.com/
|
146
Documentation/bpf/map_queue_stack.rst
Normal file
@ -0,0 +1,146 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
=========================================
|
||||
BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK
|
||||
=========================================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_QUEUE`` and ``BPF_MAP_TYPE_STACK`` were introduced
|
||||
in kernel version 4.20
|
||||
|
||||
``BPF_MAP_TYPE_QUEUE`` provides FIFO storage and ``BPF_MAP_TYPE_STACK``
|
||||
provides LIFO storage for BPF programs. These maps support peek, pop and
|
||||
push operations that are exposed to BPF programs through the respective
|
||||
helpers. These operations are exposed to userspace applications using
|
||||
the existing ``bpf`` syscall in the following way:
|
||||
|
||||
- ``BPF_MAP_LOOKUP_ELEM`` -> peek
|
||||
- ``BPF_MAP_LOOKUP_AND_DELETE_ELEM`` -> pop
|
||||
- ``BPF_MAP_UPDATE_ELEM`` -> push
|
||||
|
||||
``BPF_MAP_TYPE_QUEUE`` and ``BPF_MAP_TYPE_STACK`` do not support
|
||||
``BPF_F_NO_PREALLOC``.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
bpf_map_push_elem()
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
|
||||
|
||||
An element ``value`` can be added to a queue or stack using the
|
||||
``bpf_map_push_elem`` helper. The ``flags`` parameter must be set to
|
||||
``BPF_ANY`` or ``BPF_EXIST``. If ``flags`` is set to ``BPF_EXIST`` then,
|
||||
when the queue or stack is full, the oldest element will be removed to
|
||||
make room for ``value`` to be added. Returns ``0`` on success, or
|
||||
negative error in case of failure.
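The effect of ``BPF_EXIST`` on a full queue can be illustrated with a small user-space ring buffer. This is a model of the behaviour described above, not the kernel implementation: every ``model_``/``MODEL_`` name is hypothetical, the flag values are assumed to match ``BPF_ANY``, ``BPF_NOEXIST`` and ``BPF_EXIST``, and the specific error codes (``-E2BIG`` for a full queue, ``-ENOENT`` for an empty one, ``-EINVAL`` for unsupported flags) are assumptions of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to match the kernel's update flag values. */
#define MODEL_BPF_ANY		0
#define MODEL_BPF_NOEXIST	1
#define MODEL_BPF_EXIST		2

#define MODEL_CAPACITY		4

struct model_queue {
	uint32_t buf[MODEL_CAPACITY];
	unsigned int head;	/* index of the oldest element */
	unsigned int count;	/* number of stored elements */
};

/* Push a value; with MODEL_BPF_EXIST a full queue drops its oldest element. */
int model_queue_push(struct model_queue *q, uint32_t value, uint64_t flags)
{
	if (flags != MODEL_BPF_ANY && flags != MODEL_BPF_EXIST)
		return -EINVAL;

	if (q->count == MODEL_CAPACITY) {
		if (flags != MODEL_BPF_EXIST)
			return -E2BIG;
		/* Make room by discarding the oldest element. */
		q->head = (q->head + 1) % MODEL_CAPACITY;
		q->count--;
	}

	q->buf[(q->head + q->count) % MODEL_CAPACITY] = value;
	q->count++;
	return 0;
}

/* Pop the oldest value (FIFO order) into *value. */
int model_queue_pop(struct model_queue *q, uint32_t *value)
{
	if (q->count == 0)
		return -ENOENT;

	*value = q->buf[q->head];
	q->head = (q->head + 1) % MODEL_CAPACITY;
	q->count--;
	return 0;
}
```

In this model, pushing a fifth value into a four-entry queue fails with ``BPF_ANY`` but succeeds with ``BPF_EXIST``, after which the next pop returns the second value that was originally pushed.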
|
||||
|
||||
bpf_map_peek_elem()
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_peek_elem(struct bpf_map *map, void *value)
|
||||
|
||||
This helper fetches an element ``value`` from a queue or stack without
|
||||
removing it. Returns ``0`` on success, or negative error in case of
|
||||
failure.
|
||||
|
||||
bpf_map_pop_elem()
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_map_pop_elem(struct bpf_map *map, void *value)
|
||||
|
||||
This helper removes an element from a queue or stack and stores it in
|
||||
``value``. Returns ``0`` on success, or negative error in case of failure.
|
||||
|
||||
|
||||
Userspace
|
||||
---------
|
||||
|
||||
bpf_map_update_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
|
||||
|
||||
A userspace program can push ``value`` onto a queue or stack using libbpf's
|
||||
``bpf_map_update_elem`` function. The ``key`` parameter must be set to
|
||||
``NULL`` and ``flags`` must be set to ``BPF_ANY`` or ``BPF_EXIST``, with the
|
||||
same semantics as the ``bpf_map_push_elem`` kernel helper. Returns ``0`` on
|
||||
success, or negative error in case of failure.
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_lookup_elem(int fd, const void *key, void *value)
|
||||
|
||||
A userspace program can peek at the ``value`` at the head of a queue or stack
|
||||
using the libbpf ``bpf_map_lookup_elem`` function. The ``key`` parameter must be
|
||||
set to ``NULL``. Returns ``0`` on success, or negative error in case of
|
||||
failure.
|
||||
|
||||
bpf_map_lookup_and_delete_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value)
|
||||
|
||||
A userspace program can pop a ``value`` from the head of a queue or stack using
|
||||
the libbpf ``bpf_map_lookup_and_delete_elem`` function. The ``key`` parameter
|
||||
must be set to ``NULL``. Returns ``0`` on success, or negative error in case of
|
||||
failure.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
This snippet shows how to declare a queue in a BPF program:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_QUEUE);
|
||||
__type(value, __u32);
|
||||
__uint(max_entries, 10);
|
||||
} queue SEC(".maps");
|
||||
|
||||
|
||||
Userspace
|
||||
---------
|
||||
|
||||
This snippet shows how to use libbpf's low-level API to create a queue from
|
||||
userspace:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int create_queue()
|
||||
{
|
||||
return bpf_map_create(BPF_MAP_TYPE_QUEUE,
|
||||
"sample_queue", /* name */
|
||||
0, /* key size, must be zero */
|
||||
sizeof(__u32), /* value size */
|
||||
10, /* max entries */
|
||||
NULL); /* create options */
|
||||
}
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
https://lwn.net/ml/netdev/153986858555.9127.14517764371945179514.stgit@kernel/
|
155
Documentation/bpf/map_sk_storage.rst
Normal file
@ -0,0 +1,155 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
=======================
|
||||
BPF_MAP_TYPE_SK_STORAGE
|
||||
=======================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_SK_STORAGE`` was introduced in kernel version 5.2
|
||||
|
||||
``BPF_MAP_TYPE_SK_STORAGE`` is used to provide socket-local storage for BPF
|
||||
programs. A map of type ``BPF_MAP_TYPE_SK_STORAGE`` declares the type of storage
|
||||
to be provided and acts as the handle for accessing the socket-local
|
||||
storage. The values for maps of type ``BPF_MAP_TYPE_SK_STORAGE`` are stored
|
||||
locally with each socket instead of with the map. The kernel is responsible for
|
||||
allocating storage for a socket when requested and for freeing the storage when
|
||||
either the map or the socket is deleted.
|
||||
|
||||
.. note::
|
||||
- The key type must be ``int`` and ``max_entries`` must be set to ``0``.
|
||||
- The ``BPF_F_NO_PREALLOC`` flag must be used when creating a map for
|
||||
socket-local storage.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
bpf_sk_storage_get()
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
|
||||
|
||||
Socket-local storage can be retrieved using the ``bpf_sk_storage_get()``
|
||||
helper. The helper gets the storage from ``sk`` that is associated with ``map``.
|
||||
If the ``BPF_LOCAL_STORAGE_GET_F_CREATE`` flag is used then
|
||||
``bpf_sk_storage_get()`` will create the storage for ``sk`` if it does not
|
||||
already exist. ``value`` can be used together with
|
||||
``BPF_LOCAL_STORAGE_GET_F_CREATE`` to initialize the storage value, otherwise it
|
||||
will be zero initialized. Returns a pointer to the storage on success, or
|
||||
``NULL`` in case of failure.
|
||||
|
||||
.. note::
|
||||
- ``sk`` is a kernel ``struct sock`` pointer for LSM or tracing programs.
|
||||
- ``sk`` is a ``struct bpf_sock`` pointer for other program types.
|
||||
|
||||
bpf_sk_storage_delete()
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
|
||||
|
||||
Socket-local storage can be deleted using the ``bpf_sk_storage_delete()``
|
||||
helper. The helper deletes the storage from ``sk`` that is identified by
|
||||
``map``. Returns ``0`` on success, or negative error in case of failure.
|
||||
|
||||
User space
|
||||
----------
|
||||
|
||||
bpf_map_update_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_update_elem(int map_fd, const void *key, const void *value, __u64 flags)
|
||||
|
||||
Socket-local storage for the socket identified by ``key`` belonging to
|
||||
``map_fd`` can be added or updated using the ``bpf_map_update_elem()`` libbpf
|
||||
function. ``key`` must be a pointer to a valid ``fd`` in the user space
|
||||
program. The ``flags`` parameter can be used to control the update behaviour:
|
||||
|
||||
- ``BPF_ANY`` will create storage for ``fd`` or update existing storage.
|
||||
- ``BPF_NOEXIST`` will create storage for ``fd`` only if it did not already
|
||||
exist, otherwise the call will fail with ``-EEXIST``.
|
||||
- ``BPF_EXIST`` will update existing storage for ``fd`` if it already exists,
|
||||
otherwise the call will fail with ``-ENOENT``.
|
||||
|
||||
Returns ``0`` on success, or negative error in case of failure.
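The three update modes listed above reduce to a small decision table. The sketch below models only that decision in user-space C; ``model_update_check()`` and the ``MODEL_`` constants are hypothetical, with the flag values assumed to match ``BPF_ANY``, ``BPF_NOEXIST`` and ``BPF_EXIST``.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to match the kernel's update flag values. */
#define MODEL_BPF_ANY		0
#define MODEL_BPF_NOEXIST	1
#define MODEL_BPF_EXIST		2

/*
 * Hypothetical helper: outcome of an update request, given whether
 * storage already exists for the socket and which flag was passed.
 */
int model_update_check(int storage_exists, uint64_t flags)
{
	if (flags == MODEL_BPF_NOEXIST && storage_exists)
		return -EEXIST;	/* create-only, but storage is present */
	if (flags == MODEL_BPF_EXIST && !storage_exists)
		return -ENOENT;	/* update-only, but nothing to update */
	return 0;		/* create or update proceeds */
}
```

A caller that wants to initialise storage exactly once per socket would use ``BPF_NOEXIST`` and treat ``-EEXIST`` as "already initialised" rather than as a hard failure.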
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_lookup_elem(int map_fd, const void *key, void *value)
|
||||
|
||||
Socket-local storage for the socket identified by ``key`` belonging to
|
||||
``map_fd`` can be retrieved using the ``bpf_map_lookup_elem()`` libbpf
|
||||
function. ``key`` must be a pointer to a valid ``fd`` in the user space
|
||||
program. Returns ``0`` on success, or negative error in case of failure.
|
||||
|
||||
bpf_map_delete_elem()
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_delete_elem(int map_fd, const void *key)
|
||||
|
||||
Socket-local storage for the socket identified by ``key`` belonging to
|
||||
``map_fd`` can be deleted using the ``bpf_map_delete_elem()`` libbpf
|
||||
function. Returns ``0`` on success, or negative error in case of failure.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
|
||||
This snippet shows how to declare socket-local storage in a BPF program:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_SK_STORAGE);
|
||||
__uint(map_flags, BPF_F_NO_PREALLOC);
|
||||
__type(key, int);
|
||||
__type(value, struct my_storage);
|
||||
} socket_storage SEC(".maps");
|
||||
|
||||
This snippet shows how to retrieve socket-local storage in a BPF program:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
SEC("sockops")
|
||||
int _sockops(struct bpf_sock_ops *ctx)
|
||||
{
|
||||
struct my_storage *storage;
|
||||
struct bpf_sock *sk;
|
||||
|
||||
sk = ctx->sk;
|
||||
if (!sk)
|
||||
return 1;
|
||||
|
||||
storage = bpf_sk_storage_get(&socket_storage, sk, 0,
|
||||
BPF_LOCAL_STORAGE_GET_F_CREATE);
|
||||
if (!storage)
|
||||
return 1;
|
||||
|
||||
/* Use 'storage' here */
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
||||
Please see the ``tools/testing/selftests/bpf`` directory for functional
|
||||
examples.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
https://lwn.net/ml/netdev/20190426171103.61892-1-kafai@fb.com/
|
192
Documentation/bpf/map_xskmap.rst
Normal file
@ -0,0 +1,192 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
===================
|
||||
BPF_MAP_TYPE_XSKMAP
|
||||
===================
|
||||
|
||||
.. note::
|
||||
- ``BPF_MAP_TYPE_XSKMAP`` was introduced in kernel version 4.18
|
||||
|
||||
The ``BPF_MAP_TYPE_XSKMAP`` is used as a backend map for XDP BPF helper
|
||||
call ``bpf_redirect_map()`` and ``XDP_REDIRECT`` action, like 'devmap' and 'cpumap'.
|
||||
This map type redirects raw XDP frames to `AF_XDP`_ sockets (XSKs), a new type of
|
||||
address family in the kernel that allows redirection of frames from a driver to
|
||||
user space without having to traverse the full network stack. An AF_XDP socket
|
||||
binds to a single netdev queue. A mapping of XSKs to queues is shown below:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
+---------------------------------------------------+
|
||||
| xsk A | xsk B | xsk C |<---+ User space
|
||||
=========================================================|==========
|
||||
| Queue 0 | Queue 1 | Queue 2 | | Kernel
|
||||
+---------------------------------------------------+ |
|
||||
| Netdev eth0 | |
|
||||
+---------------------------------------------------+ |
|
||||
| +=============+ | |
|
||||
| | key | xsk | | |
|
||||
| +---------+ +=============+ | |
|
||||
| | | | 0 | xsk A | | |
|
||||
| | | +-------------+ | |
|
||||
| | | | 1 | xsk B | | |
|
||||
| | BPF |-- redirect -->+-------------+-------------+
|
||||
| | prog | | 2 | xsk C | |
|
||||
| | | +-------------+ |
|
||||
| | | |
|
||||
| | | |
|
||||
| +---------+ |
|
||||
| |
|
||||
+---------------------------------------------------+
|
||||
|
||||
.. note::
|
||||
An AF_XDP socket that is bound to a certain <netdev/queue_id> will *only*
|
||||
accept XDP frames from that <netdev/queue_id>. If an XDP program tries to redirect
|
||||
from a <netdev/queue_id> other than what the socket is bound to, the frame will
|
||||
not be received on the socket.
|
||||
|
||||
Typically an XSKMAP is created per netdev. This map contains an array of XSK File
|
||||
Descriptors (FDs). The number of array elements is typically set or adjusted using
|
||||
the ``max_entries`` map parameter. For AF_XDP ``max_entries`` is equal to the number
|
||||
of queues supported by the netdev.
|
||||
|
||||
.. note::
|
||||
Both the map key and map value size must be 4 bytes.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
Kernel BPF
|
||||
----------
|
||||
bpf_redirect_map()
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
|
||||
|
||||
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
|
||||
For ``BPF_MAP_TYPE_XSKMAP`` this map contains references to XSK FDs
|
||||
for sockets attached to a netdev's queues.
|
||||
|
||||
.. note::
|
||||
If the map is empty at an index, the packet is dropped. This means that it is
|
||||
necessary to have an XDP program loaded with at least one XSK in the
|
||||
XSKMAP to be able to get any traffic to user space through the socket.
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||
|
||||
XSK entry references of type ``struct xdp_sock *`` can be retrieved using the
|
||||
``bpf_map_lookup_elem()`` helper.
|
||||
|
||||
User space
|
||||
----------
|
||||
.. note::
|
||||
XSK entries can only be updated/deleted from user space and not from
|
||||
a BPF program. Trying to call these functions from a kernel BPF program will
|
||||
result in the program failing to load and a verifier warning.
|
||||
|
||||
bpf_map_update_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
|
||||
|
||||
XSK entries can be added or updated using the ``bpf_map_update_elem()``
|
||||
helper. The ``key`` parameter is equal to the queue_id of the queue the XSK
|
||||
is attaching to. The ``value`` parameter is the FD value of that socket.
|
||||
|
||||
Under the hood, the XSKMAP update function uses the XSK FD value to retrieve the
|
||||
associated ``struct xdp_sock`` instance.
|
||||
|
||||
The flags argument can be one of the following:
|
||||
|
||||
- BPF_ANY: Create a new element or update an existing element.
|
||||
- BPF_NOEXIST: Create a new element only if it did not exist.
|
||||
- BPF_EXIST: Update an existing element.
|
||||
|
||||
bpf_map_lookup_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_lookup_elem(int fd, const void *key, void *value)
|
||||
|
||||
Returns ``struct xdp_sock *`` or negative error in case of failure.
|
||||
|
||||
bpf_map_delete_elem()
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
.. code-block:: c
|
||||
|
||||
int bpf_map_delete_elem(int fd, const void *key)
|
||||
|
||||
XSK entries can be deleted using the ``bpf_map_delete_elem()``
|
||||
helper. This helper will return 0 on success, or negative error in case of
|
||||
failure.
|
||||
|
||||
.. note::
|
||||
When `libxdp`_ deletes an XSK it also removes the associated socket
|
||||
entry from the XSKMAP.
|
||||
|
||||
Examples
|
||||
========
|
||||
Kernel
|
||||
------
|
||||
|
||||
The following code snippet shows how to declare a ``BPF_MAP_TYPE_XSKMAP`` called
|
||||
``xsks_map`` and how to redirect packets to an XSK.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_XSKMAP);
|
||||
__type(key, __u32);
|
||||
__type(value, __u32);
|
||||
__uint(max_entries, 64);
|
||||
} xsks_map SEC(".maps");
|
||||
|
||||
|
||||
SEC("xdp")
|
||||
int xsk_redir_prog(struct xdp_md *ctx)
|
||||
{
|
||||
__u32 index = ctx->rx_queue_index;
|
||||
|
||||
if (bpf_map_lookup_elem(&xsks_map, &index))
|
||||
return bpf_redirect_map(&xsks_map, index, 0);
|
||||
return XDP_PASS;
|
||||
}
|
||||
|
||||
User space
|
||||
----------
|
||||
|
||||
The following code snippet shows how to update an XSKMAP with an XSK entry.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int update_xsks_map(struct bpf_map *xsks_map, int queue_id, int xsk_fd)
|
||||
{
|
||||
int ret;
|
||||
|
||||
ret = bpf_map_update_elem(bpf_map__fd(xsks_map), &queue_id, &xsk_fd, 0);
|
||||
if (ret < 0)
|
||||
fprintf(stderr, "Failed to update xsks_map: %s\n", strerror(errno));
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
For an example of how to create AF_XDP sockets, please see the AF_XDP-example and
|
||||
AF_XDP-forwarding programs in the `bpf-examples`_ directory in the `libxdp`_ repository.
|
||||
For a detailed explanation of the AF_XDP interface please see:
|
||||
|
||||
- `libxdp-readme`_.
|
||||
- `AF_XDP`_ kernel documentation.
|
||||
|
||||
.. note::
|
||||
The most comprehensive resource for using XSKMAPs and AF_XDP is `libxdp`_.
|
||||
|
||||
.. _libxdp: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
|
||||
.. _AF_XDP: https://www.kernel.org/doc/html/latest/networking/af_xdp.html
|
||||
.. _bpf-examples: https://github.com/xdp-project/bpf-examples
|
||||
.. _libxdp-readme: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp#using-af_xdp-sockets
|
@ -1,46 +1,19 @@
|
||||
|
||||
=========
|
||||
eBPF maps
|
||||
=========
|
||||
========
|
||||
BPF maps
|
||||
========
|
||||
|
||||
'maps' is a generic storage of different types for sharing data between kernel
|
||||
and userspace.
|
||||
BPF 'maps' provide generic storage of different types for sharing data between
|
||||
kernel and user space. There are several storage types available, including
|
||||
hash, array, bloom filter and radix-tree. Several of the map types exist to
|
||||
support specific BPF helpers that perform actions based on the map contents. The
|
||||
maps are accessed from BPF programs via BPF helpers which are documented in the
|
||||
`man-pages`_ for `bpf-helpers(7)`_.
|
||||
|
||||
The maps are accessed from user space via BPF syscall, which has commands:
|
||||
|
||||
- create a map with given type and attributes
|
||||
``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
|
||||
using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
|
||||
returns process-local file descriptor or negative error
|
||||
|
||||
- lookup key in a given map
|
||||
``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
|
||||
using attr->map_fd, attr->key, attr->value
|
||||
returns zero and stores found elem into value or negative error
|
||||
|
||||
- create or update key/value pair in a given map
|
||||
``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
|
||||
using attr->map_fd, attr->key, attr->value
|
||||
returns zero or negative error
|
||||
|
||||
- find and delete element by key in a given map
|
||||
``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
|
||||
using attr->map_fd, attr->key
|
||||
|
||||
- to delete map: close(fd)
|
||||
Exiting process will delete maps automatically
|
||||
|
||||
userspace programs use this syscall to create/access maps that eBPF programs
|
||||
are concurrently updating.
|
||||
|
||||
maps can have different types: hash, array, bloom filter, radix-tree, etc.
|
||||
|
||||
The map is defined by:
|
||||
|
||||
- type
|
||||
- max number of elements
|
||||
- key size in bytes
|
||||
- value size in bytes
|
||||
BPF maps are accessed from user space via the ``bpf`` syscall, which provides
|
||||
commands to create maps, lookup elements, update elements and delete
|
||||
elements. More details of the BPF syscall are available in
|
||||
:doc:`/userspace-api/ebpf/syscall` and in the `man-pages`_ for `bpf(2)`_.
|
||||
|
||||
Map Types
|
||||
=========
|
||||
@ -49,4 +22,60 @@ Map Types
|
||||
:maxdepth: 1
|
||||
:glob:
|
||||
|
||||
map_*
|
||||
map_*
|
||||
|
||||
Usage Notes
|
||||
===========
|
||||
|
||||
.. c:function::
|
||||
int bpf(int command, union bpf_attr *attr, u32 size)
|
||||
|
||||
Use the ``bpf()`` system call to perform the operation specified by
|
||||
``command``. The operation takes parameters provided in ``attr``. The ``size``
|
||||
argument is the size of the ``union bpf_attr`` in ``attr``.
|
||||
|
||||
**BPF_MAP_CREATE**
|
||||
|
||||
Create a map with the desired type and attributes in ``attr``:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int fd;
|
||||
union bpf_attr attr = {
        .map_type = BPF_MAP_TYPE_ARRAY,  /* mandatory */
        .key_size = sizeof(__u32),       /* mandatory */
        .value_size = sizeof(__u32),     /* mandatory */
        .max_entries = 256,              /* mandatory */
        .map_flags = BPF_F_MMAPABLE,
        .map_name = "example_array",
};
|
||||
|
||||
fd = bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
|
||||
|
||||
Returns a process-local file descriptor on success, or negative error in case of
|
||||
failure. The map can be deleted by calling ``close(fd)``. Maps held by open
|
||||
file descriptors will be deleted automatically when a process exits.
|
||||
|
||||
.. note:: Valid characters for ``map_name`` are ``A-Z``, ``a-z``, ``0-9``,
|
||||
``'_'`` and ``'.'``.
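The character rule in the note above can be checked before a map is created. The following is a minimal sketch, not the kernel's own check: ``map_name_is_valid()`` and ``TOY_OBJ_NAME_LEN`` are hypothetical names, and the 16-byte limit (including the terminating NUL) is an assumption taken from ``BPF_OBJ_NAME_LEN`` in the UAPI header rather than from this document.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumed limit: BPF_OBJ_NAME_LEN is 16 bytes, including the NUL. */
#define TOY_OBJ_NAME_LEN 16

/* Return true if 'name' fits and contains only A-Z, a-z, 0-9, '_' or '.'. */
bool map_name_is_valid(const char *name)
{
    size_t len;

    assert(name);
    len = strlen(name);
    if (len >= TOY_OBJ_NAME_LEN)
        return false;  /* no room left for the terminating NUL */

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        if (!isalnum(c) && c != '_' && c != '.')
            return false;
    }
    return true;
}
```

Under these assumptions ``"example_array"`` passes, while a name containing ``'-'`` or a space is rejected.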
|
||||
|
||||
**BPF_MAP_LOOKUP_ELEM**
|
||||
|
||||
Lookup key in a given map using ``attr->map_fd``, ``attr->key``,
|
||||
``attr->value``. Returns zero and stores found elem into ``attr->value`` on
|
||||
success, or negative error on failure.
|
||||
|
||||
**BPF_MAP_UPDATE_ELEM**
|
||||
|
||||
Create or update key/value pair in a given map using ``attr->map_fd``, ``attr->key``,
|
||||
``attr->value``. Returns zero on success or negative error on failure.
|
||||
|
||||
**BPF_MAP_DELETE_ELEM**
|
||||
|
||||
Find and delete element by key in a given map using ``attr->map_fd``,
|
||||
``attr->key``. Returns zero on success or negative error on failure.
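The element commands above all share the same ``attr`` layout. The sketch below shows one way to fill it from user space and issue the call through ``syscall(2)``, since the C library typically does not provide a ``bpf()`` wrapper. The names ``sys_bpf()``, ``fill_elem_attr()`` and ``map_lookup()`` are hypothetical helpers introduced for illustration. Note that the raw system call reports failure as ``-1`` with ``errno`` set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper that plays the role of bpf() in the text above. */
long sys_bpf(int command, union bpf_attr *attr, unsigned int size)
{
    return syscall(__NR_bpf, command, attr, size);
}

/*
 * Fill 'attr' for BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM or
 * BPF_MAP_DELETE_ELEM. Pointers are passed as 64-bit integers;
 * pass value == NULL for a delete.
 */
void fill_elem_attr(union bpf_attr *attr, int map_fd,
                    const void *key, void *value, uint64_t flags)
{
    assert(attr);
    memset(attr, 0, sizeof(*attr));  /* unused fields must be zero */
    attr->map_fd = (__u32)map_fd;
    attr->key = (uint64_t)(uintptr_t)key;
    attr->value = (uint64_t)(uintptr_t)value;
    attr->flags = flags;
}

/* Look up 'key'; returns 0 on success, -1 with errno set on failure. */
long map_lookup(int map_fd, const void *key, void *value)
{
    union bpf_attr attr;

    fill_elem_attr(&attr, map_fd, key, value, 0);
    return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
```

An update or delete would reuse ``fill_elem_attr()`` with ``BPF_MAP_UPDATE_ELEM`` or ``BPF_MAP_DELETE_ELEM`` as the command.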
|
||||
|
||||
.. Links:
|
||||
.. _man-pages: https://www.kernel.org/doc/man-pages/
|
||||
.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
|
||||
.. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
|
||||
|
@ -7,3 +7,6 @@ Program Types
|
||||
:glob:
|
||||
|
||||
prog_*
|
||||
|
||||
For a list of all program types, see :ref:`program_types_and_elf` in
|
||||
the :ref:`libbpf` documentation.
|
||||
|
81
Documentation/bpf/redirect.rst
Normal file
@ -0,0 +1,81 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
.. Copyright (C) 2022 Red Hat, Inc.
|
||||
|
||||
========
|
||||
Redirect
|
||||
========
|
||||
XDP_REDIRECT
|
||||
############
|
||||
Supported maps
|
||||
--------------
|
||||
|
||||
XDP_REDIRECT works with the following map types:
|
||||
|
||||
- ``BPF_MAP_TYPE_DEVMAP``
|
||||
- ``BPF_MAP_TYPE_DEVMAP_HASH``
|
||||
- ``BPF_MAP_TYPE_CPUMAP``
|
||||
- ``BPF_MAP_TYPE_XSKMAP``
|
||||
|
||||
For more information on these maps, please see the specific map documentation.
|
||||
|
||||
Process
|
||||
-------
|
||||
|
||||
.. kernel-doc:: net/core/filter.c
|
||||
:doc: xdp redirect
|
||||
|
||||
.. note::
|
||||
Not all drivers support transmitting frames after a redirect, and for
|
||||
those that do, not all of them support non-linear frames. Non-linear xdp
|
||||
bufs/frames are bufs/frames that contain more than one fragment.
|
||||
|
||||
Debugging packet drops
|
||||
----------------------
|
||||
Silent packet drops for XDP_REDIRECT can be debugged using:
|
||||
|
||||
- bpftrace
|
||||
- perf record
|
||||
|
||||
bpftrace
|
||||
^^^^^^^^^
|
||||
The following bpftrace command can be used to capture and count all XDP tracepoints:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
sudo bpftrace -e 'tracepoint:xdp:* { @cnt[probe] = count(); }'
|
||||
Attaching 12 probes...
|
||||
^C
|
||||
|
||||
@cnt[tracepoint:xdp:mem_connect]: 18
|
||||
@cnt[tracepoint:xdp:mem_disconnect]: 18
|
||||
@cnt[tracepoint:xdp:xdp_exception]: 19605
|
||||
@cnt[tracepoint:xdp:xdp_devmap_xmit]: 1393604
|
||||
@cnt[tracepoint:xdp:xdp_redirect]: 22292200
|
||||
|
||||
.. note::
|
||||
The various xdp tracepoints can be found in ``source/include/trace/events/xdp.h``
|
||||
|
||||
The following bpftrace command can be used to extract the ``ERRNO`` being returned as
|
||||
part of the err parameter:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
sudo bpftrace -e \
|
||||
'tracepoint:xdp:xdp_redirect*_err {@redir_errno[-args->err] = count();}
|
||||
tracepoint:xdp:xdp_devmap_xmit {@devmap_errno[-args->err] = count();}'
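Because the probes above negate ``args->err``, the keys of the resulting histograms are positive errno values. A small helper can turn such a key into a readable description. This is only an illustrative sketch: ``format_errno_key()`` is a hypothetical name, and the text returned by ``strerror()`` depends on the C library and locale.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "<errno>: <description>" into buf; returns snprintf()'s result. */
int format_errno_key(int err, char *buf, size_t len)
{
    assert(buf && len > 0);
    return snprintf(buf, len, "%d: %s", err, strerror(err));
}
```

Calling ``format_errno_key(6, buf, sizeof(buf))`` fills ``buf`` with the key followed by the description the C library associates with errno 6.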
|
||||
|
||||
perf record
|
||||
^^^^^^^^^^^
|
||||
The perf tool also supports recording tracepoints:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
perf record -a -e xdp:xdp_redirect_err \
|
||||
-e xdp:xdp_redirect_map_err \
|
||||
-e xdp:xdp_exception \
|
||||
-e xdp:xdp_devmap_xmit
|
||||
|
||||
References
|
||||
===========
|
||||
|
||||
- https://github.com/xdp-project/xdp-tutorial/tree/master/tracing02-xdp-monitor
|
@ -29,6 +29,38 @@ properties:
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
|
||||
memory-region:
|
||||
items:
|
||||
- description: firmware EMI region
|
||||
- description: firmware ILM region
|
||||
- description: firmware DLM region
|
||||
- description: firmware CPU DATA region
|
||||
- description: firmware BOOT region
|
||||
|
||||
memory-region-names:
|
||||
items:
|
||||
- const: wo-emi
|
||||
- const: wo-ilm
|
||||
- const: wo-dlm
|
||||
- const: wo-data
|
||||
- const: wo-boot
|
||||
|
||||
mediatek,wo-ccif:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
description: mediatek wed-wo controller interface.
|
||||
|
||||
allOf:
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
const: mediatek,mt7622-wed
|
||||
then:
|
||||
properties:
|
||||
memory-region-names: false
|
||||
memory-region: false
|
||||
mediatek,wo-ccif: false
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
@ -49,3 +81,23 @@ examples:
|
||||
interrupts = <GIC_SPI 214 IRQ_TYPE_LEVEL_LOW>;
|
||||
};
|
||||
};
|
||||
|
||||
- |
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
#include <dt-bindings/interrupt-controller/irq.h>
|
||||
soc {
|
||||
#address-cells = <2>;
|
||||
#size-cells = <2>;
|
||||
|
||||
wed@15010000 {
|
||||
compatible = "mediatek,mt7986-wed", "syscon";
|
||||
reg = <0 0x15010000 0 0x1000>;
|
||||
interrupts = <GIC_SPI 205 IRQ_TYPE_LEVEL_HIGH>;
|
||||
|
||||
memory-region = <&wo_emi>, <&wo_ilm>, <&wo_dlm>,
|
||||
<&wo_data>, <&wo_boot>;
|
||||
memory-region-names = "wo-emi", "wo-ilm", "wo-dlm",
|
||||
"wo-data", "wo-boot";
|
||||
mediatek,wo-ccif = <&wo_ccif0>;
|
||||
};
|
||||
};
|
||||
|
@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
title: Freescale INTMUX interrupt multiplexer
|
||||
|
||||
maintainers:
|
||||
- Joakim Zhang <qiangqing.zhang@nxp.com>
|
||||
- Shawn Guo <shawnguo@kernel.org>
|
||||
- NXP Linux Team <linux-imx@nxp.com>
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
|
@ -46,6 +46,10 @@ properties:
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
|
||||
reset-gpios:
|
||||
maxItems: 1
|
||||
description: GPIO connected to active low reset
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
|
@ -27,7 +27,9 @@ properties:
|
||||
- usbb95,772b # ASIX AX88772B
|
||||
- usbb95,7e2b # ASIX AX88772B
|
||||
|
||||
reg: true
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
local-mac-address: true
|
||||
mac-address: true
|
||||
|
||||
|
@ -1,5 +0,0 @@
|
||||
The following properties are common to the Bluetooth controllers:
|
||||
|
||||
- local-bd-address: array of 6 bytes, specifies the BD address that was
|
||||
uniquely assigned to the Bluetooth device, formatted with least significant
|
||||
byte first (little-endian).
|
@ -0,0 +1,29 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/bluetooth/bluetooth-controller.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Bluetooth Controller Generic Binding
|
||||
|
||||
maintainers:
|
||||
- Marcel Holtmann <marcel@holtmann.org>
|
||||
- Johan Hedberg <johan.hedberg@gmail.com>
|
||||
- Luiz Augusto von Dentz <luiz.dentz@gmail.com>
|
||||
|
||||
properties:
|
||||
$nodename:
|
||||
pattern: "^bluetooth(@.*)?$"
|
||||
|
||||
local-bd-address:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
maxItems: 6
|
||||
description:
|
||||
Specifies the BD address that was uniquely assigned to the Bluetooth
|
||||
device. Formatted with least significant byte first (little-endian), e.g.
|
||||
in order to assign the address 00:11:22:33:44:55 this property must have
|
||||
the value [55 44 33 22 11 00].
|
||||
|
||||
additionalProperties: true
|
||||
|
||||
...
|
@ -0,0 +1,81 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/bluetooth/brcm,bcm4377-bluetooth.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Broadcom BCM4377 family PCIe Bluetooth Chips
|
||||
|
||||
maintainers:
|
||||
- Sven Peter <sven@svenpeter.dev>
|
||||
|
||||
description:
|
||||
This binding describes Broadcom BCM4377 family PCIe-attached bluetooth chips
|
||||
usually found in Apple machines. The Wi-Fi part of the chip is described in
|
||||
bindings/net/wireless/brcm,bcm4329-fmac.yaml.
|
||||
|
||||
allOf:
|
||||
- $ref: bluetooth-controller.yaml#
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- pci14e4,5fa0 # BCM4377
|
||||
- pci14e4,5f69 # BCM4378
|
||||
- pci14e4,5f71 # BCM4387
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
brcm,board-type:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
description: Board type of the Bluetooth chip. This is used to decouple
|
||||
the overall system board from the Bluetooth module and used to construct
|
||||
firmware and calibration data filenames.
|
||||
On Apple platforms, this should be the Apple module-instance codename
|
||||
prefixed by "apple,", e.g. "apple,atlantisb".
|
||||
pattern: '^apple,.*'
|
||||
|
||||
brcm,taurus-cal-blob:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
description: A per-device calibration blob for the Bluetooth radio. This
|
||||
should be filled in by the bootloader from platform configuration
|
||||
data, if necessary, and will be uploaded to the device.
|
||||
This blob is used if the chip stepping of the Bluetooth module does not
|
||||
support beamforming.
|
||||
|
||||
brcm,taurus-bf-cal-blob:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
description: A per-device calibration blob for the Bluetooth radio. This
|
||||
should be filled in by the bootloader from platform configuration
|
||||
data, if necessary, and will be uploaded to the device.
|
||||
This blob is used if the chip stepping of the Bluetooth module supports
|
||||
beamforming.
|
||||
|
||||
local-bd-address: true
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- local-bd-address
|
||||
- brcm,board-type
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
pcie@a0000000 {
|
||||
#address-cells = <3>;
|
||||
#size-cells = <2>;
|
||||
reg = <0xa0000000 0x1000000>;
|
||||
device_type = "pci";
|
||||
ranges = <0x43000000 0x6 0xa0000000 0xa0000000 0x0 0x20000000>;
|
||||
|
||||
bluetooth@0,1 {
|
||||
compatible = "pci14e4,5f69";
|
||||
reg = <0x100 0x0 0x0 0x0 0x0>;
|
||||
brcm,board-type = "apple,honshu";
|
||||
/* To be filled by the bootloader */
|
||||
local-bd-address = [00 00 00 00 00 00];
|
||||
};
|
||||
};
|
@ -1,7 +1,7 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/qualcomm-bluetooth.yaml#
|
||||
$id: http://devicetree.org/schemas/net/bluetooth/qualcomm-bluetooth.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Qualcomm Bluetooth Chips
|
||||
@ -79,8 +79,7 @@ properties:
|
||||
firmware-name:
|
||||
description: specify the name of nvm firmware to load
|
||||
|
||||
local-bd-address:
|
||||
description: see Documentation/devicetree/bindings/net/bluetooth.txt
|
||||
local-bd-address: true
|
||||
|
||||
|
||||
required:
|
||||
@ -89,6 +88,7 @@ required:
|
||||
additionalProperties: false
|
||||
|
||||
allOf:
|
||||
- $ref: bluetooth-controller.yaml#
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
@ -19,11 +19,14 @@ properties:
|
||||
- brcm,bcm4329-bt
|
||||
- brcm,bcm4330-bt
|
||||
- brcm,bcm4334-bt
|
||||
- brcm,bcm43430a0-bt
|
||||
- brcm,bcm43430a1-bt
|
||||
- brcm,bcm43438-bt
|
||||
- brcm,bcm4345c5
|
||||
- brcm,bcm43540-bt
|
||||
- brcm,bcm4335a0
|
||||
- brcm,bcm4349-bt
|
||||
- cypress,cyw4373a0-bt
|
||||
- infineon,cyw55572-bt
|
||||
|
||||
shutdown-gpios:
|
||||
|
@ -17,6 +17,7 @@ properties:
|
||||
compatible:
|
||||
oneOf:
|
||||
- enum:
|
||||
- fsl,imx93-flexcan
|
||||
- fsl,imx8qm-flexcan
|
||||
- fsl,imx8mp-flexcan
|
||||
- fsl,imx6q-flexcan
|
||||
|
@ -9,9 +9,6 @@ title: Renesas R-Car CAN FD Controller
|
||||
maintainers:
|
||||
- Fabrizio Castro <fabrizio.castro.jz@renesas.com>
|
||||
|
||||
allOf:
|
||||
- $ref: can-controller.yaml#
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
oneOf:
|
||||
@ -33,7 +30,7 @@ properties:
|
||||
|
||||
- items:
|
||||
- enum:
|
||||
- renesas,r9a07g043-canfd # RZ/G2UL
|
||||
- renesas,r9a07g043-canfd # RZ/G2UL and RZ/Five
|
||||
- renesas,r9a07g044-canfd # RZ/G2{L,LC}
|
||||
- renesas,r9a07g054-canfd # RZ/V2L
|
||||
- const: renesas,rzg2l-canfd # RZ/G2L family
|
||||
@ -77,12 +74,13 @@ properties:
|
||||
description: Maximum frequency of the CANFD clock.
|
||||
|
||||
patternProperties:
|
||||
"^channel[01]$":
|
||||
"^channel[0-7]$":
|
||||
type: object
|
||||
description:
|
||||
The controller supports two channels and each is represented as a child
|
||||
node. Each child node supports the "status" property only, which
|
||||
is used to enable/disable the respective channel.
|
||||
The controller supports multiple channels and each is represented as a
|
||||
child node. Each channel can be enabled/disabled individually.
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
required:
|
||||
- compatible
|
||||
@ -98,60 +96,73 @@ required:
|
||||
- channel0
|
||||
- channel1
|
||||
|
||||
if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
enum:
|
||||
- renesas,rzg2l-canfd
|
||||
then:
|
||||
properties:
|
||||
interrupts:
|
||||
items:
|
||||
- description: CAN global error interrupt
|
||||
- description: CAN receive FIFO interrupt
|
||||
- description: CAN0 error interrupt
|
||||
- description: CAN0 transmit interrupt
|
||||
- description: CAN0 transmit/receive FIFO receive completion interrupt
|
||||
- description: CAN1 error interrupt
|
||||
- description: CAN1 transmit interrupt
|
||||
- description: CAN1 transmit/receive FIFO receive completion interrupt
|
||||
allOf:
|
||||
- $ref: can-controller.yaml#
|
||||
|
||||
interrupt-names:
|
||||
items:
|
||||
- const: g_err
|
||||
- const: g_recc
|
||||
- const: ch0_err
|
||||
- const: ch0_rec
|
||||
- const: ch0_trx
|
||||
- const: ch1_err
|
||||
- const: ch1_rec
|
||||
- const: ch1_trx
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
enum:
|
||||
- renesas,rzg2l-canfd
|
||||
then:
|
||||
properties:
|
||||
interrupts:
|
||||
items:
|
||||
- description: CAN global error interrupt
|
||||
- description: CAN receive FIFO interrupt
|
||||
- description: CAN0 error interrupt
|
||||
- description: CAN0 transmit interrupt
|
||||
- description: CAN0 transmit/receive FIFO receive completion interrupt
|
||||
- description: CAN1 error interrupt
|
||||
- description: CAN1 transmit interrupt
|
||||
- description: CAN1 transmit/receive FIFO receive completion interrupt
|
||||
|
||||
resets:
|
||||
maxItems: 2
|
||||
interrupt-names:
|
||||
items:
|
||||
- const: g_err
|
||||
- const: g_recc
|
||||
- const: ch0_err
|
||||
- const: ch0_rec
|
||||
- const: ch0_trx
|
||||
- const: ch1_err
|
||||
- const: ch1_rec
|
||||
- const: ch1_trx
|
||||
|
||||
reset-names:
|
||||
items:
|
||||
- const: rstp_n
|
||||
- const: rstc_n
|
||||
resets:
|
||||
maxItems: 2
|
||||
|
||||
required:
|
||||
- reset-names
|
||||
else:
|
||||
properties:
|
||||
interrupts:
|
||||
items:
|
||||
- description: Channel interrupt
|
||||
- description: Global interrupt
|
||||
reset-names:
|
||||
items:
|
||||
- const: rstp_n
|
||||
- const: rstc_n
|
||||
|
||||
interrupt-names:
|
||||
items:
|
||||
- const: ch_int
|
||||
- const: g_int
|
||||
required:
|
||||
- reset-names
|
||||
else:
|
||||
properties:
|
||||
interrupts:
|
||||
items:
|
||||
- description: Channel interrupt
|
||||
- description: Global interrupt
|
||||
|
||||
resets:
|
||||
maxItems: 1
|
||||
interrupt-names:
|
||||
items:
|
||||
- const: ch_int
|
||||
- const: g_int
|
||||
|
||||
resets:
|
||||
maxItems: 1
|
||||
|
||||
- if:
|
||||
not:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
const: renesas,r8a779a0-canfd
|
||||
then:
|
||||
patternProperties:
|
||||
"^channel[2-7]$": false
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
|
@ -19,7 +19,8 @@ allOf:
|
||||
|
||||
properties:
|
||||
reg:
|
||||
description: Port number
|
||||
items:
|
||||
- description: Port number
|
||||
|
||||
label:
|
||||
description:
|
||||
|
@ -12,7 +12,7 @@ allOf:
|
||||
maintainers:
|
||||
- Andrew Lunn <andrew@lunn.ch>
|
||||
- Florian Fainelli <f.fainelli@gmail.com>
|
||||
- Vivien Didelot <vivien.didelot@gmail.com>
|
||||
- Vladimir Oltean <olteanv@gmail.com>
|
||||
- Kurt Kanzenbach <kurt@linutronix.de>
|
||||
|
||||
description:
|
||||
|
@ -74,10 +74,10 @@ properties:
|
||||
|
||||
properties:
|
||||
pcs-handle:
|
||||
maxItems: 1
|
||||
description:
|
||||
phandle pointing to a PCS sub-node compatible with
|
||||
renesas,rzn1-miic.yaml#
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
|
@ -108,11 +108,17 @@ properties:
|
||||
$ref: "#/properties/phy-connection-type"
|
||||
|
||||
pcs-handle:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
$ref: /schemas/types.yaml#/definitions/phandle-array
|
||||
items:
|
||||
maxItems: 1
|
||||
description:
|
||||
Specifies a reference to a node representing a PCS PHY device on a MDIO
|
||||
bus to link with an external PHY (phy-handle) if exists.
|
||||
|
||||
pcs-handle-names:
|
||||
description:
|
||||
The name of each PCS in pcs-handle.
|
||||
|
||||
phy-handle:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
description:
|
||||
@ -216,6 +222,9 @@ properties:
|
||||
required:
|
||||
- speed
|
||||
|
||||
dependencies:
|
||||
pcs-handle-names: [pcs-handle]
|
||||
|
||||
allOf:
|
||||
- if:
|
||||
properties:
|
||||
|
@ -7,7 +7,9 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
title: Freescale Fast Ethernet Controller (FEC)
|
||||
|
||||
maintainers:
|
||||
- Joakim Zhang <qiangqing.zhang@nxp.com>
|
||||
- Shawn Guo <shawnguo@kernel.org>
|
||||
- Wei Fang <wei.fang@nxp.com>
|
||||
- NXP Linux Team <linux-imx@nxp.com>
|
||||
|
||||
allOf:
|
||||
- $ref: ethernet-controller.yaml#
|
||||
|
@ -85,9 +85,39 @@ properties:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
description: A reference to the IEEE1588 timer
|
||||
|
||||
phys:
|
||||
description: A reference to the SerDes lane(s)
|
||||
maxItems: 1
|
||||
|
||||
phy-names:
|
||||
items:
|
||||
- const: serdes
|
||||
|
||||
pcsphy-handle:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
description: A reference to the PCS (typically found on the SerDes)
|
||||
$ref: /schemas/types.yaml#/definitions/phandle-array
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
deprecated: true
|
||||
description: See pcs-handle.
|
||||
|
||||
pcs-handle:
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
description: |
|
||||
A reference to the various PCSs (typically found on the SerDes). If
|
||||
pcs-handle-names is absent, and phy-connection-type is "xgmii", then the first
|
||||
reference will be assumed to be for "xfi". Otherwise, if pcs-handle-names is
|
||||
absent, then the first reference will be assumed to be for "sgmii".
|
||||
|
||||
pcs-handle-names:
|
||||
minItems: 1
|
||||
maxItems: 3
|
||||
items:
|
||||
enum:
|
||||
- sgmii
|
||||
- qsgmii
|
||||
- xfi
|
||||
description: The type of each PCS in pcsphy-handle.
|
||||
|
||||
tbi-handle:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
@ -100,6 +130,10 @@ required:
|
||||
- fsl,fman-ports
|
||||
- ptp-timer
|
||||
|
||||
dependencies:
|
||||
pcs-handle-names:
|
||||
- pcs-handle
|
||||
|
||||
allOf:
|
||||
- $ref: ethernet-controller.yaml#
|
||||
- if:
|
||||
@ -110,14 +144,6 @@ allOf:
|
||||
then:
|
||||
required:
|
||||
- tbi-handle
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
const: fsl,fman-memac
|
||||
then:
|
||||
required:
|
||||
- pcsphy-handle
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
@ -138,8 +164,9 @@ examples:
|
||||
reg = <0xe8000 0x1000>;
|
||||
fsl,fman-ports = <&fman0_rx_0x0c &fman0_tx_0x2c>;
|
||||
ptp-timer = <&ptp_timer0>;
|
||||
pcsphy-handle = <&pcsphy4>;
|
||||
phy-handle = <&sgmii_phy1>;
|
||||
phy-connection-type = "sgmii";
|
||||
pcs-handle = <&pcsphy4>, <&qsgmiib_pcs1>;
|
||||
pcs-handle-names = "sgmii", "qsgmii";
|
||||
phys = <&serdes1 1>;
|
||||
phy-names = "serdes";
|
||||
};
|
||||
...
|
||||
|
@ -31,7 +31,7 @@ properties:
|
||||
phy-mode: true
|
||||
|
||||
pcs-handle:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
maxItems: 1
|
||||
description:
|
||||
A reference to a node representing a PCS PHY device found on
|
||||
the internal MDIO bus.
|
||||
|
@ -320,8 +320,9 @@ For internal PHY device on internal mdio bus, a PHY node should be created.
|
||||
See the definition of the PHY node in booting-without-of.txt for an
|
||||
example of how to define a PHY (Internal PHY has no interrupt line).
|
||||
- For "fsl,fman-mdio" compatible internal mdio bus, the PHY is TBI PHY.
|
||||
- For "fsl,fman-memac-mdio" compatible internal mdio bus, the PHY is PCS PHY,
|
||||
PCS PHY addr must be '0'.
|
||||
- For "fsl,fman-memac-mdio" compatible internal mdio bus, the PHY is PCS PHY.
|
||||
The PCS PHY address should correspond to the value of the appropriate
|
||||
MDEV_PORT.
|
||||
|
||||
EXAMPLE
|
||||
|
||||
|
@ -0,0 +1,62 @@
|
||||
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/marvell,dfx-server.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Marvell Prestera DFX server
|
||||
|
||||
maintainers:
|
||||
- Miquel Raynal <miquel.raynal@bootlin.com>
|
||||
|
||||
select:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
const: marvell,dfx-server
|
||||
required:
|
||||
- compatible
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
items:
|
||||
- const: marvell,dfx-server
|
||||
- const: simple-bus
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
ranges: true
|
||||
|
||||
'#address-cells':
|
||||
const: 1
|
||||
|
||||
'#size-cells':
|
||||
const: 1
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- ranges
|
||||
|
||||
# The DFX server may expose clocks described as subnodes
|
||||
additionalProperties:
|
||||
type: object
|
||||
|
||||
examples:
|
||||
- |
|
||||
|
||||
#define MBUS_ID(target,attributes) (((target) << 24) | ((attributes) << 16))
|
||||
bus@0 {
|
||||
reg = <0 0>;
|
||||
#address-cells = <2>;
|
||||
#size-cells = <1>;
|
||||
|
||||
dfx-bus@ac000000 {
|
||||
compatible = "marvell,dfx-server", "simple-bus";
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
ranges = <0 MBUS_ID(0x08, 0x00) 0 0x100000>;
|
||||
reg = <MBUS_ID(0x08, 0x00) 0 0x100000>;
|
||||
};
|
||||
};
|
305
Documentation/devicetree/bindings/net/marvell,pp2.yaml
Normal file
@ -0,0 +1,305 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/marvell,pp2.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Marvell CN913X / Marvell Armada 375, 7K, 8K Ethernet Controller
|
||||
|
||||
maintainers:
|
||||
- Marcin Wojtas <mw@semihalf.com>
|
||||
- Russell King <linux@armlinux.org>
|
||||
|
||||
description: |
|
||||
Marvell Armada 375 Ethernet Controller (PPv2.1)
|
||||
Marvell Armada 7K/8K Ethernet Controller (PPv2.2)
|
||||
Marvell CN913X Ethernet Controller (PPv2.3)
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- marvell,armada-375-pp2
|
||||
- marvell,armada-7k-pp22
|
||||
|
||||
reg:
|
||||
minItems: 3
|
||||
maxItems: 4
|
||||
|
||||
"#address-cells":
|
||||
const: 1
|
||||
|
||||
"#size-cells":
|
||||
const: 0
|
||||
|
||||
clocks:
|
||||
minItems: 2
|
||||
items:
|
||||
- description: main controller clock
|
||||
- description: GOP clock
|
||||
- description: MG clock
|
||||
- description: MG Core clock
|
||||
- description: AXI clock
|
||||
|
||||
clock-names:
|
||||
minItems: 2
|
||||
items:
|
||||
- const: pp_clk
|
||||
- const: gop_clk
|
||||
- const: mg_clk
|
||||
- const: mg_core_clk
|
||||
- const: axi_clk
|
||||
|
||||
dma-coherent: true
|
||||
|
||||
marvell,system-controller:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
description: a phandle to the system controller.
|
||||
|
||||
patternProperties:
|
||||
'^(ethernet-)?port@[0-2]$':
|
||||
type: object
|
||||
description: subnode for each ethernet port.
|
||||
$ref: ethernet-controller.yaml#
|
||||
unevaluatedProperties: false
|
||||
|
||||
properties:
|
||||
reg:
|
||||
description: ID of the port from the MAC point of view.
|
||||
maximum: 2
|
||||
|
||||
interrupts:
|
||||
minItems: 1
|
||||
maxItems: 10
|
||||
description: interrupt(s) for the port
|
||||
|
||||
interrupt-names:
|
||||
minItems: 1
|
||||
items:
|
||||
- const: hif0
|
||||
- const: hif1
|
||||
- const: hif2
|
||||
- const: hif3
|
||||
- const: hif4
|
||||
- const: hif5
|
||||
- const: hif6
|
||||
- const: hif7
|
||||
- const: hif8
|
||||
- const: link
|
||||
|
||||
description: >
|
||||
if more than a single interrupt for is given, must be the
|
||||
name associated to the interrupts listed. Valid names are:
|
||||
"hifX", with X in [0..8], and "link". The names "tx-cpu0",
|
||||
"tx-cpu1", "tx-cpu2", "tx-cpu3" and "rx-shared" are supported
|
||||
for backward compatibility but shouldn't be used for new
|
||||
additions.
|
||||
|
||||
phys:
|
||||
minItems: 1
|
||||
maxItems: 2
|
||||
description: >
|
||||
Generic PHY, providing SerDes connectivity. For most modes,
|
||||
one lane is sufficient, but some (e.g. RXAUI) may require two.
|
||||
|
||||
phy-mode:
|
||||
enum:
|
||||
- gmii
|
||||
- sgmii
|
||||
- rgmii-id
|
||||
- 1000base-x
|
||||
- 2500base-x
|
||||
- 5gbase-r
|
||||
- rxaui
|
||||
- 10gbase-r
|
||||
|
||||
port-id:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
deprecated: true
|
||||
description: >
|
||||
ID of the port from the MAC point of view.
|
||||
Legacy binding for backward compatibility.
|
||||
|
||||
marvell,loopback:
|
||||
$ref: /schemas/types.yaml#/definitions/flag
|
||||
description: port is loopback mode.
|
||||
|
||||
gop-port-id:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: >
|
||||
only for marvell,armada-7k-pp22, ID of the port from the
|
||||
GOP (Group Of Ports) point of view. This ID is used to index the
|
||||
per-port registers in the second register area.
|
||||
|
||||
required:
|
||||
- reg
|
||||
- interrupts
|
||||
- phy-mode
|
||||
- port-id
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- clocks
|
||||
- clock-names
|
||||
|
||||
allOf:
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
const: marvell,armada-7k-pp22
|
||||
then:
|
||||
properties:
|
||||
reg:
|
||||
items:
|
||||
- description: Packet Processor registers
|
||||
- description: Networking interfaces registers
|
||||
- description: CM3 address space used for TX Flow Control
|
||||
|
||||
clocks:
|
||||
minItems: 5
|
||||
|
||||
clock-names:
|
||||
minItems: 5
|
||||
|
||||
patternProperties:
|
||||
'^(ethernet-)?port@[0-2]$':
|
||||
required:
|
||||
- gop-port-id
|
||||
|
||||
required:
|
||||
- marvell,system-controller
|
||||
else:
|
||||
properties:
|
||||
reg:
|
||||
items:
|
||||
- description: Packet Processor registers
|
||||
- description: LMS registers
|
||||
- description: Register area per eth0
|
||||
- description: Register area per eth1
|
||||
|
||||
clocks:
|
||||
maxItems: 2
|
||||
|
||||
clock-names:
|
||||
maxItems: 2
|
||||
|
||||
patternProperties:
|
||||
'^(ethernet-)?port@[0-1]$':
|
||||
properties:
|
||||
reg:
|
||||
maximum: 1
|
||||
|
||||
gop-port-id: false
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
// For Armada 375 variant
|
||||
#include <dt-bindings/interrupt-controller/mvebu-icu.h>
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
|
||||
ethernet@f0000 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
compatible = "marvell,armada-375-pp2";
|
||||
reg = <0xf0000 0xa000>,
|
||||
<0xc0000 0x3060>,
|
||||
<0xc4000 0x100>,
|
||||
<0xc5000 0x100>;
|
||||
clocks = <&gateclk 3>, <&gateclk 19>;
|
||||
clock-names = "pp_clk", "gop_clk";
|
||||
|
||||
ethernet-port@0 {
|
||||
interrupts = <GIC_SPI 37 IRQ_TYPE_LEVEL_HIGH>;
|
||||
reg = <0>;
|
||||
port-id = <0>; /* For backward compatibility. */
|
||||
phy = <&phy0>;
|
||||
phy-mode = "rgmii-id";
|
||||
};
|
||||
|
||||
ethernet-port@1 {
|
||||
interrupts = <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>;
|
||||
reg = <1>;
|
||||
port-id = <1>; /* For backward compatibility. */
|
||||
phy = <&phy3>;
|
||||
phy-mode = "gmii";
|
||||
};
|
||||
};
|
||||
|
||||
- |
|
||||
// For Armada 7k/8k and Cn913x variants
|
||||
#include <dt-bindings/interrupt-controller/mvebu-icu.h>
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
|
||||
ethernet@0 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
compatible = "marvell,armada-7k-pp22";
|
||||
reg = <0x0 0x100000>, <0x129000 0xb000>, <0x220000 0x800>;
|
||||
clocks = <&cp0_clk 1 3>, <&cp0_clk 1 9>,
|
||||
<&cp0_clk 1 5>, <&cp0_clk 1 6>, <&cp0_clk 1 18>;
|
||||
clock-names = "pp_clk", "gop_clk", "mg_clk", "mg_core_clk", "axi_clk";
|
||||
marvell,system-controller = <&cp0_syscon0>;
|
||||
|
||||
ethernet-port@0 {
|
||||
interrupts = <ICU_GRP_NSR 39 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 43 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 47 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 51 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 55 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 59 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 63 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 67 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 71 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 129 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||
"hif5", "hif6", "hif7", "hif8", "link";
|
||||
phy-mode = "10gbase-r";
|
||||
phys = <&cp0_comphy4 0>;
|
||||
reg = <0>;
|
||||
port-id = <0>; /* For backward compatibility. */
|
||||
gop-port-id = <0>;
|
||||
};
|
||||
|
||||
ethernet-port@1 {
|
||||
interrupts = <ICU_GRP_NSR 40 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 44 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 48 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 52 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 56 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 60 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 64 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 68 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 72 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 128 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||
"hif5", "hif6", "hif7", "hif8", "link";
|
||||
phy-mode = "rgmii-id";
|
||||
reg = <1>;
|
||||
port-id = <1>; /* For backward compatibility. */
|
||||
gop-port-id = <2>;
|
||||
};
|
||||
|
||||
ethernet-port@2 {
|
||||
interrupts = <ICU_GRP_NSR 41 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 45 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 49 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 53 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 57 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 61 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 65 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 69 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 73 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 127 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||
"hif5", "hif6", "hif7", "hif8", "link";
|
||||
phy-mode = "2500base-x";
|
||||
managed = "in-band-status";
|
||||
phys = <&cp0_comphy5 2>;
|
||||
sfp = <&sfp_eth3>;
|
||||
reg = <2>;
|
||||
port-id = <2>; /* For backward compatibility. */
|
||||
gop-port-id = <3>;
|
||||
};
|
||||
};
|
@ -1,81 +0,0 @@
|
||||
Marvell Prestera Switch Chip bindings
|
||||
-------------------------------------
|
||||
|
||||
Required properties:
|
||||
- compatible: must be "marvell,prestera" and one of the following
|
||||
"marvell,prestera-98dx3236",
|
||||
"marvell,prestera-98dx3336",
|
||||
"marvell,prestera-98dx4251",
|
||||
- reg: address and length of the register set for the device.
|
||||
- interrupts: interrupt for the device
|
||||
|
||||
Optional properties:
|
||||
- dfx: phandle reference to the "DFX Server" node
|
||||
|
||||
Example:
|
||||
|
||||
switch {
|
||||
compatible = "simple-bus";
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
ranges = <0 MBUS_ID(0x03, 0x00) 0 0x100000>;
|
||||
|
||||
packet-processor@0 {
|
||||
compatible = "marvell,prestera-98dx3236", "marvell,prestera";
|
||||
reg = <0 0x4000000>;
|
||||
interrupts = <33>, <34>, <35>;
|
||||
dfx = <&dfx>;
|
||||
};
|
||||
};
|
||||
|
||||
DFX Server bindings
|
||||
-------------------
|
||||
|
||||
Required properties:
|
||||
- compatible: must be "marvell,dfx-server", "simple-bus"
|
||||
- ranges: describes the address mapping of a memory-mapped bus.
|
||||
- reg: address and length of the register set for the device.
|
||||
|
||||
Example:
|
||||
|
||||
dfx-server {
|
||||
compatible = "marvell,dfx-server", "simple-bus";
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
ranges = <0 MBUS_ID(0x08, 0x00) 0 0x100000>;
|
||||
reg = <MBUS_ID(0x08, 0x00) 0 0x100000>;
|
||||
};
|
||||
|
||||
Marvell Prestera SwitchDev bindings
|
||||
-----------------------------------
|
||||
Optional properties:
|
||||
- compatible: must be "marvell,prestera"
|
||||
- base-mac-provider: describes handle to node which provides base mac address,
|
||||
might be a static base mac address or nvme cell provider.
|
||||
|
||||
Example:
|
||||
|
||||
eeprom_mac_addr: eeprom-mac-addr {
|
||||
compatible = "eeprom,mac-addr-cell";
|
||||
status = "okay";
|
||||
|
||||
nvmem = <&eeprom_at24>;
|
||||
};
|
||||
|
||||
prestera {
|
||||
compatible = "marvell,prestera";
|
||||
status = "okay";
|
||||
|
||||
base-mac-provider = <&eeprom_mac_addr>;
|
||||
};
|
||||
|
||||
The current implementation of Prestera Switchdev PCI interface driver requires
|
||||
that BAR2 is assigned to 0xf6000000 as base address from the PCI IO range:
|
||||
|
||||
&cp0_pcie0 {
|
||||
ranges = <0x81000000 0x0 0xfb000000 0x0 0xfb000000 0x0 0xf0000
|
||||
0x82000000 0x0 0xf6000000 0x0 0xf6000000 0x0 0x2000000
|
||||
0x82000000 0x0 0xf9000000 0x0 0xf9000000 0x0 0x100000>;
|
||||
phys = <&cp0_comphy0 0>;
|
||||
status = "okay";
|
||||
};
|
91
Documentation/devicetree/bindings/net/marvell,prestera.yaml
Normal file
@ -0,0 +1,91 @@
|
||||
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/marvell,prestera.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Marvell Prestera switch family
|
||||
|
||||
maintainers:
|
||||
- Miquel Raynal <miquel.raynal@bootlin.com>
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
oneOf:
|
||||
- items:
|
||||
- enum:
|
||||
- marvell,prestera-98dx3236
|
||||
- marvell,prestera-98dx3336
|
||||
- marvell,prestera-98dx4251
|
||||
- const: marvell,prestera
|
||||
- enum:
|
||||
- pci11ab,c804
|
||||
- pci11ab,c80c
|
||||
- pci11ab,cc1e
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
interrupts:
|
||||
maxItems: 3
|
||||
|
||||
dfx:
|
||||
description: Reference to the DFX Server bus node.
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
|
||||
nvmem-cells: true
|
||||
|
||||
nvmem-cell-names: true
|
||||
|
||||
if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
const: marvell,prestera
|
||||
|
||||
# Memory mapped AlleyCat3 family
|
||||
then:
|
||||
properties:
|
||||
nvmem-cells: false
|
||||
nvmem-cell-names: false
|
||||
required:
|
||||
- interrupts
|
||||
|
||||
# PCI Aldrin family
|
||||
else:
|
||||
properties:
|
||||
interrupts: false
|
||||
dfx: false
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
|
||||
# Ports can also be described
|
||||
additionalProperties:
|
||||
type: object
|
||||
|
||||
examples:
|
||||
- |
|
||||
packet-processor@0 {
|
||||
compatible = "marvell,prestera-98dx3236", "marvell,prestera";
|
||||
reg = <0 0x4000000>;
|
||||
interrupts = <33>, <34>, <35>;
|
||||
dfx = <&dfx>;
|
||||
};
|
||||
|
||||
- |
|
||||
pcie@0 {
|
||||
#address-cells = <3>;
|
||||
#size-cells = <2>;
|
||||
ranges = <0x0 0x0 0x0 0x0 0x0 0x0>;
|
||||
reg = <0x0 0x0 0x0 0x0 0x0 0x0>;
|
||||
device_type = "pci";
|
||||
|
||||
switch@0,0 {
|
||||
reg = <0x0 0x0 0x0 0x0 0x0>;
|
||||
compatible = "pci11ab,c80c";
|
||||
nvmem-cells = <&mac_address 0>;
|
||||
nvmem-cell-names = "mac-address";
|
||||
};
|
||||
};
|
@ -1,141 +0,0 @@
|
||||
* Marvell Armada 375 Ethernet Controller (PPv2.1)
|
||||
Marvell Armada 7K/8K Ethernet Controller (PPv2.2)
|
||||
Marvell CN913X Ethernet Controller (PPv2.3)
|
||||
|
||||
Required properties:
|
||||
|
||||
- compatible: should be one of:
|
||||
"marvell,armada-375-pp2"
|
||||
"marvell,armada-7k-pp2"
|
||||
- reg: addresses and length of the register sets for the device.
|
||||
For "marvell,armada-375-pp2", must contain the following register
|
||||
sets:
|
||||
- common controller registers
|
||||
- LMS registers
|
||||
- one register area per Ethernet port
|
||||
For "marvell,armada-7k-pp2" used by 7K/8K and CN913X, must contain the following register
|
||||
sets:
|
||||
- packet processor registers
|
||||
- networking interfaces registers
|
||||
- CM3 address space used for TX Flow Control
|
||||
|
||||
- clocks: pointers to the reference clocks for this device, consequently:
|
||||
- main controller clock (for both armada-375-pp2 and armada-7k-pp2)
|
||||
- GOP clock (for both armada-375-pp2 and armada-7k-pp2)
|
||||
- MG clock (only for armada-7k-pp2)
|
||||
- MG Core clock (only for armada-7k-pp2)
|
||||
- AXI clock (only for armada-7k-pp2)
|
||||
- clock-names: names of used clocks, must be "pp_clk", "gop_clk", "mg_clk",
|
||||
"mg_core_clk" and "axi_clk" (the 3 latter only for armada-7k-pp2).
|
||||
|
||||
The ethernet ports are represented by subnodes. At least one port is
|
||||
required.
|
||||
|
||||
Required properties (port):
|
||||
|
||||
- interrupts: interrupt(s) for the port
|
||||
- port-id: ID of the port from the MAC point of view
|
||||
- gop-port-id: only for marvell,armada-7k-pp2, ID of the port from the
|
||||
GOP (Group Of Ports) point of view. This ID is used to index the
|
||||
per-port registers in the second register area.
|
||||
- phy-mode: See ethernet.txt file in the same directory
|
||||
|
||||
Optional properties (port):
|
||||
|
||||
- marvell,loopback: port is loopback mode
|
||||
- phy: a phandle to a phy node defining the PHY address (as the reg
|
||||
property, a single integer).
|
||||
- interrupt-names: if more than a single interrupt for is given, must be the
|
||||
name associated to the interrupts listed. Valid names are:
|
||||
"hifX", with X in [0..8], and "link". The names "tx-cpu0",
|
||||
"tx-cpu1", "tx-cpu2", "tx-cpu3" and "rx-shared" are supported
|
||||
for backward compatibility but shouldn't be used for new
|
||||
additions.
|
||||
- marvell,system-controller: a phandle to the system controller.
|
||||
|
||||
Example for marvell,armada-375-pp2:
|
||||
|
||||
ethernet@f0000 {
|
||||
compatible = "marvell,armada-375-pp2";
|
||||
reg = <0xf0000 0xa000>,
|
||||
<0xc0000 0x3060>,
|
||||
<0xc4000 0x100>,
|
||||
<0xc5000 0x100>;
|
||||
clocks = <&gateclk 3>, <&gateclk 19>;
|
||||
clock-names = "pp_clk", "gop_clk";
|
||||
|
||||
eth0: eth0@c4000 {
|
||||
interrupts = <GIC_SPI 37 IRQ_TYPE_LEVEL_HIGH>;
|
||||
port-id = <0>;
|
||||
phy = <&phy0>;
|
||||
phy-mode = "gmii";
|
||||
};
|
||||
|
||||
eth1: eth1@c5000 {
|
||||
interrupts = <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>;
|
||||
port-id = <1>;
|
||||
phy = <&phy3>;
|
||||
phy-mode = "gmii";
|
||||
};
|
||||
};
|
||||
|
||||
Example for marvell,armada-7k-pp2:
|
||||
|
||||
cpm_ethernet: ethernet@0 {
|
||||
compatible = "marvell,armada-7k-pp22";
|
||||
reg = <0x0 0x100000>, <0x129000 0xb000>, <0x220000 0x800>;
|
||||
clocks = <&cpm_syscon0 1 3>, <&cpm_syscon0 1 9>,
|
||||
<&cpm_syscon0 1 5>, <&cpm_syscon0 1 6>, <&cpm_syscon0 1 18>;
|
||||
clock-names = "pp_clk", "gop_clk", "mg_clk", "mg_core_clk", "axi_clk";
|
||||
|
||||
eth0: eth0 {
|
||||
interrupts = <ICU_GRP_NSR 39 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 43 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 47 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 51 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 55 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 59 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 63 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 67 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 71 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 129 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||
"hif5", "hif6", "hif7", "hif8", "link";
|
||||
port-id = <0>;
|
||||
gop-port-id = <0>;
|
||||
};
|
||||
|
||||
eth1: eth1 {
|
||||
interrupts = <ICU_GRP_NSR 40 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 44 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 48 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 52 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 56 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 60 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 64 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 68 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 72 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 128 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||
"hif5", "hif6", "hif7", "hif8", "link";
|
||||
port-id = <1>;
|
||||
gop-port-id = <2>;
|
||||
};
|
||||
|
||||
eth2: eth2 {
|
||||
interrupts = <ICU_GRP_NSR 41 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 45 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 49 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 53 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 57 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 61 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 65 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 69 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 73 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<ICU_GRP_NSR 127 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||
"hif5", "hif6", "hif7", "hif8", "link";
|
||||
port-id = <2>;
|
||||
gop-port-id = <3>;
|
||||
};
|
||||
};
|
@ -39,7 +39,9 @@ properties:
|
||||
- usb424,9e08 # SMSC LAN89530 USB Ethernet Device
|
||||
- usb424,ec00 # SMSC9512/9514 USB Hub & Ethernet Device
|
||||
|
||||
reg: true
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
local-mac-address: true
|
||||
mac-address: true
|
||||
|
||||
|
@ -14,7 +14,9 @@ properties:
|
||||
oneOf:
|
||||
- const: nxp,nxp-nci-i2c
|
||||
- items:
|
||||
- const: nxp,pn547
|
||||
- enum:
|
||||
- nxp,nq310
|
||||
- nxp,pn547
|
||||
- const: nxp,nxp-nci-i2c
|
||||
|
||||
enable-gpios:
|
||||
|
@ -7,7 +7,9 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
title: NXP i.MX8 DWMAC glue layer
|
||||
|
||||
maintainers:
|
||||
- Joakim Zhang <qiangqing.zhang@nxp.com>
|
||||
- Clark Wang <xiaoning.wang@nxp.com>
|
||||
- Shawn Guo <shawnguo@kernel.org>
|
||||
- NXP Linux Team <linux-imx@nxp.com>
|
||||
|
||||
# We need a select here so we don't match all nodes with 'snps,dwmac'
|
||||
select:
|
||||
|
40
Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml
Normal file
@ -0,0 +1,40 @@
|
||||
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/pcs/fsl,lynx-pcs.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: NXP Lynx PCS
|
||||
|
||||
maintainers:
|
||||
- Ioana Ciornei <ioana.ciornei@nxp.com>
|
||||
|
||||
description: |
|
||||
NXP Lynx 10G and 28G SerDes have Ethernet PCS devices which can be used as
|
||||
protocol controllers. They are accessible over the Ethernet interface's MDIO
|
||||
bus.
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
const: fsl,lynx-pcs
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
mdio-bus {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
||||
qsgmii_pcs1: ethernet-pcs@1 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <1>;
|
||||
};
|
||||
};
|
@ -123,7 +123,6 @@ examples:
|
||||
|
||||
switch_port0: port@0 {
|
||||
reg = <0x0>;
|
||||
label = "cpu";
|
||||
ethernet = <&eth1>;
|
||||
|
||||
phy-mode = "gmii";
|
||||
|
@ -49,6 +49,7 @@ properties:
|
||||
- qcom,sc7280-ipa
|
||||
- qcom,sdm845-ipa
|
||||
- qcom,sdx55-ipa
|
||||
- qcom,sm6350-ipa
|
||||
- qcom,sm8350-ipa
|
||||
|
||||
reg:
|
||||
@ -124,19 +125,31 @@ properties:
|
||||
- const: ipa-clock-enabled-valid
|
||||
- const: ipa-clock-enabled
|
||||
|
||||
qcom,gsi-loader:
|
||||
enum:
|
||||
- self
|
||||
- modem
|
||||
- skip
|
||||
description:
|
||||
Indicates how GSI firmware should be loaded. If the AP loads
|
||||
and validates GSI firmware, this property has value "self".
|
||||
If the modem does this, this property has value "modem".
|
||||
Otherwise, "skip" means GSI firmware loading is not required.
|
||||
|
||||
modem-init:
|
||||
deprecated: true
|
||||
type: boolean
|
||||
description:
|
||||
If present, it indicates that the modem is responsible for
|
||||
performing early IPA initialization, including loading and
|
||||
validating firwmare used by the GSI.
|
||||
This is the older (deprecated) way of indicating how GSI firmware
|
||||
should be loaded. If present, the modem loads GSI firmware; if
|
||||
absent, the AP loads GSI firmware.
|
||||
|
||||
memory-region:
|
||||
maxItems: 1
|
||||
description:
|
||||
If present, a phandle for a reserved memory area that holds
|
||||
the firmware passed to Trust Zone for authentication. Required
|
||||
when Trust Zone (not the modem) performs early initialization.
|
||||
when the AP (not the modem) performs early initialization.
|
||||
|
||||
firmware-name:
|
||||
$ref: /schemas/types.yaml#/definitions/string
|
||||
@ -155,22 +168,36 @@ required:
|
||||
- interconnects
|
||||
- qcom,smem-states
|
||||
|
||||
# Either modem-init is present, or memory-region must be present.
|
||||
oneOf:
|
||||
- required:
|
||||
- modem-init
|
||||
- required:
|
||||
- memory-region
|
||||
allOf:
|
||||
# If qcom,gsi-loader is present, modem-init must not be present
|
||||
- if:
|
||||
required:
|
||||
- qcom,gsi-loader
|
||||
then:
|
||||
properties:
|
||||
modem-init: false
|
||||
|
||||
# If memory-region is present, firmware-name may optionally be present.
|
||||
# But if modem-init is present, firmware-name must not be present.
|
||||
if:
|
||||
required:
|
||||
- modem-init
|
||||
then:
|
||||
not:
|
||||
required:
|
||||
- firmware-name
|
||||
# If qcom,gsi-loader is "self", the AP loads GSI firmware, and
|
||||
# memory-region must be specified
|
||||
if:
|
||||
properties:
|
||||
qcom,gsi-loader:
|
||||
contains:
|
||||
const: self
|
||||
then:
|
||||
required:
|
||||
- memory-region
|
||||
else:
|
||||
# If qcom,gsi-loader is not present, we use deprecated behavior.
|
||||
# If modem-init is not present, the AP loads GSI firmware, and
|
||||
# memory-region must be specified.
|
||||
if:
|
||||
not:
|
||||
required:
|
||||
- modem-init
|
||||
then:
|
||||
required:
|
||||
- memory-region
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
@ -201,14 +228,17 @@ examples:
|
||||
};
|
||||
|
||||
ipa@1e40000 {
|
||||
compatible = "qcom,sdm845-ipa";
|
||||
compatible = "qcom,sc7180-ipa";
|
||||
|
||||
modem-init;
|
||||
qcom,gsi-loader = "self";
|
||||
memory-region = <&ipa_fw_mem>;
|
||||
firmware-name = "qcom/sc7180-trogdor/modem/modem.mdt";
|
||||
|
||||
iommus = <&apps_smmu 0x720 0x3>;
|
||||
iommus = <&apps_smmu 0x440 0x0>,
|
||||
<&apps_smmu 0x442 0x0>;
|
||||
reg = <0x1e40000 0x7000>,
|
||||
<0x1e47000 0x2000>,
|
||||
<0x1e04000 0x2c000>;
|
||||
<0x1e47000 0x2000>,
|
||||
<0x1e04000 0x2c000>;
|
||||
reg-names = "ipa-reg",
|
||||
"ipa-shared",
|
||||
"gsi";
|
||||
@ -226,9 +256,9 @@ examples:
|
||||
clock-names = "core";
|
||||
|
||||
interconnects =
|
||||
<&rsc_hlos MASTER_IPA &rsc_hlos SLAVE_EBI1>,
|
||||
<&rsc_hlos MASTER_IPA &rsc_hlos SLAVE_IMEM>,
|
||||
<&rsc_hlos MASTER_APPSS_PROC &rsc_hlos SLAVE_IPA_CFG>;
|
||||
<&aggre2_noc MASTER_IPA 0 &mc_virt SLAVE_EBI1 0>,
|
||||
<&aggre2_noc MASTER_IPA 0 &system_noc SLAVE_IMEM 0>,
|
||||
<&gem_noc MASTER_APPSS_PROC 0 &config_noc SLAVE_IPA_CFG 0>;
|
||||
interconnect-names = "memory",
|
||||
"imem",
|
||||
"config";
|
||||
|
@ -9,14 +9,18 @@ title: Qualcomm IPQ40xx MDIO Controller
|
||||
maintainers:
|
||||
- Robert Marko <robert.marko@sartura.hr>
|
||||
|
||||
allOf:
|
||||
- $ref: "mdio.yaml#"
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
enum:
|
||||
- qcom,ipq4019-mdio
|
||||
- qcom,ipq5018-mdio
|
||||
oneOf:
|
||||
- enum:
|
||||
- qcom,ipq4019-mdio
|
||||
- qcom,ipq5018-mdio
|
||||
|
||||
- items:
|
||||
- enum:
|
||||
- qcom,ipq6018-mdio
|
||||
- qcom,ipq8074-mdio
|
||||
- const: qcom,ipq4019-mdio
|
||||
|
||||
"#address-cells":
|
||||
const: 1
|
||||
@ -33,10 +37,12 @@ properties:
|
||||
address range is only required by the platform IPQ50xx.
|
||||
|
||||
clocks:
|
||||
maxItems: 1
|
||||
description: |
|
||||
MDIO clock source frequency fixed to 100MHZ, this clock should be specified
|
||||
by the platform IPQ807x, IPQ60xx and IPQ50xx.
|
||||
items:
|
||||
- description: MDIO clock source frequency fixed to 100MHZ
|
||||
|
||||
clock-names:
|
||||
items:
|
||||
- const: gcc_mdio_ahb_clk
|
||||
|
||||
required:
|
||||
- compatible
|
||||
@ -44,6 +50,26 @@ required:
|
||||
- "#address-cells"
|
||||
- "#size-cells"
|
||||
|
||||
allOf:
|
||||
- $ref: "mdio.yaml#"
|
||||
|
||||
- if:
|
||||
properties:
|
||||
compatible:
|
||||
contains:
|
||||
enum:
|
||||
- qcom,ipq5018-mdio
|
||||
- qcom,ipq6018-mdio
|
||||
- qcom,ipq8074-mdio
|
||||
then:
|
||||
required:
|
||||
- clocks
|
||||
- clock-names
|
||||
else:
|
||||
properties:
|
||||
clocks: false
|
||||
clock-names: false
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
|
@ -20,6 +20,7 @@ properties:
|
||||
enum:
|
||||
- realtek,rtl8723bs-bt
|
||||
- realtek,rtl8723cs-bt
|
||||
- realtek,rtl8723ds-bt
|
||||
- realtek,rtl8822cs-bt
|
||||
|
||||
device-wake-gpios:
|
||||
|
@ -0,0 +1,262 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/renesas,r8a779f0-ether-switch.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Renesas Ethernet Switch
|
||||
|
||||
maintainers:
|
||||
- Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
const: renesas,r8a779f0-ether-switch
|
||||
|
||||
reg:
|
||||
maxItems: 2
|
||||
|
||||
reg-names:
|
||||
items:
|
||||
- const: base
|
||||
- const: secure_base
|
||||
|
||||
interrupts:
|
||||
maxItems: 47
|
||||
|
||||
interrupt-names:
|
||||
items:
|
||||
- const: mfwd_error
|
||||
- const: race_error
|
||||
- const: coma_error
|
||||
- const: gwca0_error
|
||||
- const: gwca1_error
|
||||
- const: etha0_error
|
||||
- const: etha1_error
|
||||
- const: etha2_error
|
||||
- const: gptp0_status
|
||||
- const: gptp1_status
|
||||
- const: mfwd_status
|
||||
- const: race_status
|
||||
- const: coma_status
|
||||
- const: gwca0_status
|
||||
- const: gwca1_status
|
||||
- const: etha0_status
|
||||
- const: etha1_status
|
||||
- const: etha2_status
|
||||
- const: rmac0_status
|
||||
- const: rmac1_status
|
||||
- const: rmac2_status
|
||||
- const: gwca0_rxtx0
|
||||
- const: gwca0_rxtx1
|
||||
- const: gwca0_rxtx2
|
||||
- const: gwca0_rxtx3
|
||||
- const: gwca0_rxtx4
|
||||
- const: gwca0_rxtx5
|
||||
- const: gwca0_rxtx6
|
||||
- const: gwca0_rxtx7
|
||||
- const: gwca1_rxtx0
|
||||
- const: gwca1_rxtx1
|
||||
- const: gwca1_rxtx2
|
||||
- const: gwca1_rxtx3
|
||||
- const: gwca1_rxtx4
|
||||
- const: gwca1_rxtx5
|
||||
- const: gwca1_rxtx6
|
||||
- const: gwca1_rxtx7
|
||||
- const: gwca0_rxts0
|
||||
- const: gwca0_rxts1
|
||||
- const: gwca1_rxts0
|
||||
- const: gwca1_rxts1
|
||||
- const: rmac0_mdio
|
||||
- const: rmac1_mdio
|
||||
- const: rmac2_mdio
|
||||
- const: rmac0_phy
|
||||
- const: rmac1_phy
|
||||
- const: rmac2_phy
|
||||
|
||||
clocks:
|
||||
maxItems: 1
|
||||
|
||||
resets:
|
||||
maxItems: 1
|
||||
|
||||
iommus:
|
||||
maxItems: 16
|
||||
|
||||
power-domains:
|
||||
maxItems: 1
|
||||
|
||||
ethernet-ports:
|
||||
type: object
|
||||
additionalProperties: false
|
||||
|
||||
properties:
|
||||
'#address-cells':
|
||||
description: Port number of ETHA (TSNA).
|
||||
const: 1
|
||||
|
||||
'#size-cells':
|
||||
const: 0
|
||||
|
||||
patternProperties:
|
||||
"^port@[0-9a-f]+$":
|
||||
type: object
|
||||
$ref: /schemas/net/ethernet-controller.yaml#
|
||||
unevaluatedProperties: false
|
||||
|
||||
properties:
|
||||
reg:
|
||||
maxItems: 1
|
||||
description:
|
||||
Port number of ETHA (TSNA).
|
||||
|
||||
phys:
|
||||
maxItems: 1
|
||||
description:
|
||||
Phandle of an Ethernet SERDES.
|
||||
|
||||
mdio:
|
||||
$ref: /schemas/net/mdio.yaml#
|
||||
unevaluatedProperties: false
|
||||
|
||||
required:
|
||||
- reg
|
||||
- phy-handle
|
||||
- phy-mode
|
||||
- phys
|
||||
- mdio
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- reg-names
|
||||
- interrupts
|
||||
- interrupt-names
|
||||
- clocks
|
||||
- resets
|
||||
- power-domains
|
||||
- ethernet-ports
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
#include <dt-bindings/clock/r8a779f0-cpg-mssr.h>
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
#include <dt-bindings/power/r8a779f0-sysc.h>
|
||||
|
||||
ethernet@e6880000 {
|
||||
compatible = "renesas,r8a779f0-ether-switch";
|
||||
reg = <0xe6880000 0x20000>, <0xe68c0000 0x20000>;
|
||||
reg-names = "base", "secure_base";
|
||||
interrupts = <GIC_SPI 256 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 257 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 258 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 259 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 260 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 261 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 262 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 263 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 265 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 266 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 267 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 268 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 269 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 270 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 271 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 272 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 273 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 274 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 276 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 277 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 278 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 280 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 281 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 282 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 283 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 284 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 285 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 286 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 287 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 288 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 289 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 290 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 291 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 292 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 293 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 294 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 295 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 296 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 297 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 298 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 299 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 300 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 301 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 302 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 304 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 305 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<GIC_SPI 306 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "mfwd_error", "race_error",
|
||||
"coma_error", "gwca0_error",
|
||||
"gwca1_error", "etha0_error",
|
||||
"etha1_error", "etha2_error",
|
||||
"gptp0_status", "gptp1_status",
|
||||
"mfwd_status", "race_status",
|
||||
"coma_status", "gwca0_status",
|
||||
"gwca1_status", "etha0_status",
|
||||
"etha1_status", "etha2_status",
|
||||
"rmac0_status", "rmac1_status",
|
||||
"rmac2_status",
|
||||
"gwca0_rxtx0", "gwca0_rxtx1",
|
||||
"gwca0_rxtx2", "gwca0_rxtx3",
|
||||
"gwca0_rxtx4", "gwca0_rxtx5",
|
||||
"gwca0_rxtx6", "gwca0_rxtx7",
|
||||
"gwca1_rxtx0", "gwca1_rxtx1",
|
||||
"gwca1_rxtx2", "gwca1_rxtx3",
|
||||
"gwca1_rxtx4", "gwca1_rxtx5",
|
||||
"gwca1_rxtx6", "gwca1_rxtx7",
|
||||
"gwca0_rxts0", "gwca0_rxts1",
|
||||
"gwca1_rxts0", "gwca1_rxts1",
|
||||
"rmac0_mdio", "rmac1_mdio",
|
||||
"rmac2_mdio",
|
||||
"rmac0_phy", "rmac1_phy",
|
||||
"rmac2_phy";
|
||||
clocks = <&cpg CPG_MOD 1505>;
|
||||
power-domains = <&sysc R8A779F0_PD_ALWAYS_ON>;
|
||||
resets = <&cpg 1505>;
|
||||
|
||||
ethernet-ports {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
port@0 {
|
||||
reg = <0>;
|
||||
phy-handle = <&eth_phy0>;
|
||||
phy-mode = "sgmii";
|
||||
phys = <&eth_serdes 0>;
|
||||
mdio {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
};
|
||||
};
|
||||
port@1 {
|
||||
reg = <1>;
|
||||
phy-handle = <&eth_phy1>;
|
||||
phy-mode = "sgmii";
|
||||
phys = <&eth_serdes 1>;
|
||||
mdio {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
};
|
||||
};
|
||||
port@2 {
|
||||
reg = <2>;
|
||||
phy-handle = <&eth_phy2>;
|
||||
phy-mode = "sgmii";
|
||||
phys = <&eth_serdes 2>;
|
||||
mdio {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
@ -22,7 +22,8 @@ properties:
|
||||
phandle of an I2C bus controller for the SFP two wire serial
|
||||
|
||||
maximum-power-milliwatt:
|
||||
maxItems: 1
|
||||
minimum: 1000
|
||||
default: 1000
|
||||
description:
|
||||
Maximum module power consumption Specifies the maximum power consumption
|
||||
allowable by a module in the slot, in milli-Watts. Presently, modules can
|
||||
|
@ -167,56 +167,238 @@ properties:
|
||||
snps,mtl-rx-config:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
description:
|
||||
Multiple RX Queues parameters. Phandle to a node that can
|
||||
contain the following properties
|
||||
* snps,rx-queues-to-use, number of RX queues to be used in the
|
||||
driver
|
||||
* Choose one of these RX scheduling algorithms
|
||||
* snps,rx-sched-sp, Strict priority
|
||||
* snps,rx-sched-wsp, Weighted Strict priority
|
||||
* For each RX queue
|
||||
* Choose one of these modes
|
||||
* snps,dcb-algorithm, Queue to be enabled as DCB
|
||||
* snps,avb-algorithm, Queue to be enabled as AVB
|
||||
* snps,map-to-dma-channel, Channel to map
|
||||
* Specifiy specific packet routing
|
||||
* snps,route-avcp, AV Untagged Control packets
|
||||
* snps,route-ptp, PTP Packets
|
||||
* snps,route-dcbcp, DCB Control Packets
|
||||
* snps,route-up, Untagged Packets
|
||||
* snps,route-multi-broad, Multicast & Broadcast Packets
|
||||
* snps,priority, bitmask of the tagged frames priorities assigned to
|
||||
the queue
|
||||
Multiple RX Queues parameters. Phandle to a node that
|
||||
implements the 'rx-queues-config' object described in
|
||||
this binding.
|
||||
|
||||
rx-queues-config:
|
||||
type: object
|
||||
properties:
|
||||
snps,rx-queues-to-use:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: number of RX queues to be used in the driver
|
||||
snps,rx-sched-sp:
|
||||
type: boolean
|
||||
description: Strict priority
|
||||
snps,rx-sched-wsp:
|
||||
type: boolean
|
||||
description: Weighted Strict priority
|
||||
allOf:
|
||||
- if:
|
||||
required:
|
||||
- snps,rx-sched-sp
|
||||
then:
|
||||
properties:
|
||||
snps,rx-sched-wsp: false
|
||||
- if:
|
||||
required:
|
||||
- snps,rx-sched-wsp
|
||||
then:
|
||||
properties:
|
||||
snps,rx-sched-sp: false
|
||||
patternProperties:
|
||||
"^queue[0-9]$":
|
||||
description: Each subnode represents a queue.
|
||||
type: object
|
||||
properties:
|
||||
snps,dcb-algorithm:
|
||||
type: boolean
|
||||
description: Queue to be enabled as DCB
|
||||
snps,avb-algorithm:
|
||||
type: boolean
|
||||
description: Queue to be enabled as AVB
|
||||
snps,map-to-dma-channel:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: DMA channel id to map
|
||||
snps,route-avcp:
|
||||
type: boolean
|
||||
description: AV Untagged Control packets
|
||||
snps,route-ptp:
|
||||
type: boolean
|
||||
description: PTP Packets
|
||||
snps,route-dcbcp:
|
||||
type: boolean
|
||||
description: DCB Control Packets
|
||||
snps,route-up:
|
||||
type: boolean
|
||||
description: Untagged Packets
|
||||
snps,route-multi-broad:
|
||||
type: boolean
|
||||
description: Multicast & Broadcast Packets
|
||||
snps,priority:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: Bitmask of the tagged frames priorities assigned to the queue
|
||||
allOf:
|
||||
- if:
|
||||
required:
|
||||
- snps,dcb-algorithm
|
||||
then:
|
||||
properties:
|
||||
snps,avb-algorithm: false
|
||||
- if:
|
||||
required:
|
||||
- snps,avb-algorithm
|
||||
then:
|
||||
properties:
|
||||
snps,dcb-algorithm: false
|
||||
- if:
|
||||
required:
|
||||
- snps,route-avcp
|
||||
then:
|
||||
properties:
|
||||
snps,route-ptp: false
|
||||
snps,route-dcbcp: false
|
||||
snps,route-up: false
|
||||
snps,route-multi-broad: false
|
||||
- if:
|
||||
required:
|
||||
- snps,route-ptp
|
||||
then:
|
||||
properties:
|
||||
snps,route-avcp: false
|
||||
snps,route-dcbcp: false
|
||||
snps,route-up: false
|
||||
snps,route-multi-broad: false
|
||||
- if:
|
||||
required:
|
||||
- snps,route-dcbcp
|
||||
then:
|
||||
properties:
|
||||
snps,route-avcp: false
|
||||
snps,route-ptp: false
|
||||
snps,route-up: false
|
||||
snps,route-multi-broad: false
|
||||
- if:
|
||||
required:
|
||||
- snps,route-up
|
||||
then:
|
||||
properties:
|
||||
snps,route-avcp: false
|
||||
snps,route-ptp: false
|
||||
snps,route-dcbcp: false
|
||||
snps,route-multi-broad: false
|
||||
- if:
|
||||
required:
|
||||
- snps,route-multi-broad
|
||||
then:
|
||||
properties:
|
||||
snps,route-avcp: false
|
||||
snps,route-ptp: false
|
||||
snps,route-dcbcp: false
|
||||
snps,route-up: false
|
||||
additionalProperties: false
|
||||
additionalProperties: false
|
||||
|
||||
snps,mtl-tx-config:
|
||||
$ref: /schemas/types.yaml#/definitions/phandle
|
||||
description:
|
||||
Multiple TX Queues parameters. Phandle to a node that can
|
||||
contain the following properties
|
||||
* snps,tx-queues-to-use, number of TX queues to be used in the
|
||||
driver
|
||||
* Choose one of these TX scheduling algorithms
|
||||
* snps,tx-sched-wrr, Weighted Round Robin
|
||||
* snps,tx-sched-wfq, Weighted Fair Queuing
|
||||
* snps,tx-sched-dwrr, Deficit Weighted Round Robin
|
||||
* snps,tx-sched-sp, Strict priority
|
||||
* For each TX queue
|
||||
* snps,weight, TX queue weight (if using a DCB weight
|
||||
algorithm)
|
||||
* Choose one of these modes
|
||||
* snps,dcb-algorithm, TX queue will be working in DCB
|
||||
* snps,avb-algorithm, TX queue will be working in AVB
|
||||
[Attention] Queue 0 is reserved for legacy traffic
|
||||
and so no AVB is available in this queue.
|
||||
* Configure Credit Base Shaper (if AVB Mode selected)
|
||||
* snps,send_slope, enable Low Power Interface
|
||||
* snps,idle_slope, unlock on WoL
|
||||
* snps,high_credit, max write outstanding req. limit
|
||||
* snps,low_credit, max read outstanding req. limit
|
||||
* snps,priority, bitmask of the priorities assigned to the queue.
|
||||
When a PFC frame is received with priorities matching the bitmask,
|
||||
the queue is blocked from transmitting for the pause time specified
|
||||
in the PFC frame.
|
||||
Multiple TX Queues parameters. Phandle to a node that
|
||||
implements the 'tx-queues-config' object described in
|
||||
this binding.
|
||||
|
||||
tx-queues-config:
|
||||
type: object
|
||||
properties:
|
||||
snps,tx-queues-to-use:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: number of TX queues to be used in the driver
|
||||
snps,tx-sched-wrr:
|
||||
type: boolean
|
||||
description: Weighted Round Robin
|
||||
snps,tx-sched-wfq:
|
||||
type: boolean
|
||||
description: Weighted Fair Queuing
|
||||
snps,tx-sched-dwrr:
|
||||
type: boolean
|
||||
description: Deficit Weighted Round Robin
|
||||
snps,tx-sched-sp:
|
||||
type: boolean
|
||||
description: Strict priority
|
||||
allOf:
|
||||
- if:
|
||||
required:
|
||||
- snps,tx-sched-wrr
|
||||
then:
|
||||
properties:
|
||||
snps,tx-sched-wfq: false
|
||||
snps,tx-sched-dwrr: false
|
||||
snps,tx-sched-sp: false
|
||||
- if:
|
||||
required:
|
||||
- snps,tx-sched-wfq
|
||||
then:
|
||||
properties:
|
||||
snps,tx-sched-wrr: false
|
||||
snps,tx-sched-dwrr: false
|
||||
snps,tx-sched-sp: false
|
||||
- if:
|
||||
required:
|
||||
- snps,tx-sched-dwrr
|
||||
then:
|
||||
properties:
|
||||
snps,tx-sched-wrr: false
|
||||
snps,tx-sched-wfq: false
|
||||
snps,tx-sched-sp: false
|
||||
- if:
|
||||
required:
|
||||
- snps,tx-sched-sp
|
||||
then:
|
||||
properties:
|
||||
snps,tx-sched-wrr: false
|
||||
snps,tx-sched-wfq: false
|
||||
snps,tx-sched-dwrr: false
|
||||
patternProperties:
|
||||
"^queue[0-9]$":
|
||||
description: Each subnode represents a queue.
|
||||
type: object
|
||||
properties:
|
||||
snps,weight:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: TX queue weight (if using a DCB weight algorithm)
|
||||
snps,dcb-algorithm:
|
||||
type: boolean
|
||||
description: TX queue will be working in DCB
|
||||
snps,avb-algorithm:
|
||||
type: boolean
|
||||
description:
|
||||
TX queue will be working in AVB.
|
||||
Queue 0 is reserved for legacy traffic and so no AVB is
|
||||
available in this queue.
|
||||
snps,send_slope:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: enable Low Power Interface
|
||||
snps,idle_slope:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: unlock on WoL
|
||||
snps,high_credit:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: max write outstanding req. limit
|
||||
snps,low_credit:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description: max read outstanding req. limit
|
||||
snps,priority:
|
||||
$ref: /schemas/types.yaml#/definitions/uint32
|
||||
description:
|
||||
Bitmask of the tagged frames priorities assigned to the queue.
|
||||
When a PFC frame is received with priorities matching the bitmask,
|
||||
the queue is blocked from transmitting for the pause time specified
|
||||
in the PFC frame.
|
||||
allOf:
|
||||
- if:
|
||||
required:
|
||||
- snps,dcb-algorithm
|
||||
then:
|
||||
properties:
|
||||
snps,avb-algorithm: false
|
||||
- if:
|
||||
required:
|
||||
- snps,avb-algorithm
|
||||
then:
|
||||
properties:
|
||||
snps,dcb-algorithm: false
|
||||
snps,weight: false
|
||||
additionalProperties: false
|
||||
additionalProperties: false
|
||||
|
||||
snps,reset-gpio:
|
||||
deprecated: true
|
||||
@ -463,41 +645,6 @@ additionalProperties: true
|
||||
|
||||
examples:
|
||||
- |
|
||||
stmmac_axi_setup: stmmac-axi-config {
|
||||
snps,wr_osr_lmt = <0xf>;
|
||||
snps,rd_osr_lmt = <0xf>;
|
||||
snps,blen = <256 128 64 32 0 0 0>;
|
||||
};
|
||||
|
||||
mtl_rx_setup: rx-queues-config {
|
||||
snps,rx-queues-to-use = <1>;
|
||||
snps,rx-sched-sp;
|
||||
queue0 {
|
||||
snps,dcb-algorithm;
|
||||
snps,map-to-dma-channel = <0x0>;
|
||||
snps,priority = <0x0>;
|
||||
};
|
||||
};
|
||||
|
||||
mtl_tx_setup: tx-queues-config {
|
||||
snps,tx-queues-to-use = <2>;
|
||||
snps,tx-sched-wrr;
|
||||
queue0 {
|
||||
snps,weight = <0x10>;
|
||||
snps,dcb-algorithm;
|
||||
snps,priority = <0x0>;
|
||||
};
|
||||
|
||||
queue1 {
|
||||
snps,avb-algorithm;
|
||||
snps,send_slope = <0x1000>;
|
||||
snps,idle_slope = <0x1000>;
|
||||
snps,high_credit = <0x3E800>;
|
||||
snps,low_credit = <0xFFC18000>;
|
||||
snps,priority = <0x1>;
|
||||
};
|
||||
};
|
||||
|
||||
gmac0: ethernet@e0800000 {
|
||||
compatible = "snps,dwxgmac-2.10", "snps,dwxgmac";
|
||||
reg = <0xe0800000 0x8000>;
|
||||
@ -516,6 +663,42 @@ examples:
|
||||
snps,axi-config = <&stmmac_axi_setup>;
|
||||
snps,mtl-rx-config = <&mtl_rx_setup>;
|
||||
snps,mtl-tx-config = <&mtl_tx_setup>;
|
||||
|
||||
stmmac_axi_setup: stmmac-axi-config {
|
||||
snps,wr_osr_lmt = <0xf>;
|
||||
snps,rd_osr_lmt = <0xf>;
|
||||
snps,blen = <256 128 64 32 0 0 0>;
|
||||
};
|
||||
|
||||
mtl_rx_setup: rx-queues-config {
|
||||
snps,rx-queues-to-use = <1>;
|
||||
snps,rx-sched-sp;
|
||||
queue0 {
|
||||
snps,dcb-algorithm;
|
||||
snps,map-to-dma-channel = <0x0>;
|
||||
snps,priority = <0x0>;
|
||||
};
|
||||
};
|
||||
|
||||
mtl_tx_setup: tx-queues-config {
|
||||
snps,tx-queues-to-use = <2>;
|
||||
snps,tx-sched-wrr;
|
||||
queue0 {
|
||||
snps,weight = <0x10>;
|
||||
snps,dcb-algorithm;
|
||||
snps,priority = <0x0>;
|
||||
};
|
||||
|
||||
queue1 {
|
||||
snps,avb-algorithm;
|
||||
snps,send_slope = <0x1000>;
|
||||
snps,idle_slope = <0x1000>;
|
||||
snps,high_credit = <0x3E800>;
|
||||
snps,low_credit = <0xFFC18000>;
|
||||
snps,priority = <0x1>;
|
||||
};
|
||||
};
|
||||
|
||||
mdio0 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
@ -0,0 +1,73 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/net/socionext,synquacer-netsec.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: Socionext NetSec Ethernet Controller IP
|
||||
|
||||
maintainers:
|
||||
- Jassi Brar <jaswinder.singh@linaro.org>
|
||||
- Ilias Apalodimas <ilias.apalodimas@linaro.org>
|
||||
|
||||
allOf:
|
||||
- $ref: ethernet-controller.yaml#
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
const: socionext,synquacer-netsec
|
||||
|
||||
reg:
|
||||
items:
|
||||
- description: control register area
|
||||
- description: EEPROM holding the MAC address and microengine firmware
|
||||
|
||||
clocks:
|
||||
maxItems: 1
|
||||
|
||||
clock-names:
|
||||
const: phy_ref_clk
|
||||
|
||||
dma-coherent: true
|
||||
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
|
||||
mdio:
|
||||
$ref: mdio.yaml#
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- clocks
|
||||
- clock-names
|
||||
- interrupts
|
||||
- mdio
|
||||
|
||||
unevaluatedProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
|
||||
ethernet@522d0000 {
|
||||
compatible = "socionext,synquacer-netsec";
|
||||
reg = <0x522d0000 0x10000>, <0x10000000 0x10000>;
|
||||
interrupts = <GIC_SPI 176 IRQ_TYPE_LEVEL_HIGH>;
|
||||
clocks = <&clk_netsec>;
|
||||
clock-names = "phy_ref_clk";
|
||||
phy-mode = "rgmii";
|
||||
max-speed = <1000>;
|
||||
max-frame-size = <9000>;
|
||||
phy-handle = <&phy1>;
|
||||
|
||||
mdio {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
phy1: ethernet-phy@1 {
|
||||
compatible = "ethernet-phy-ieee802.3-c22";
|
||||
reg = <1>;
|
||||
};
|
||||
};
|
||||
};
|
||||
...
|
@ -1,56 +0,0 @@
|
||||
* Socionext NetSec Ethernet Controller IP
|
||||
|
||||
Required properties:
|
||||
- compatible: Should be "socionext,synquacer-netsec"
|
||||
- reg: Address and length of the control register area, followed by the
|
||||
address and length of the EEPROM holding the MAC address and
|
||||
microengine firmware
|
||||
- interrupts: Should contain ethernet controller interrupt
|
||||
- clocks: phandle to the PHY reference clock
|
||||
- clock-names: Should be "phy_ref_clk"
|
||||
- phy-mode: See ethernet.txt file in the same directory
|
||||
- phy-handle: See ethernet.txt in the same directory.
|
||||
|
||||
- mdio device tree subnode: When the Netsec has a phy connected to its local
|
||||
mdio, there must be device tree subnode with the following
|
||||
required properties:
|
||||
|
||||
- #address-cells: Must be <1>.
|
||||
- #size-cells: Must be <0>.
|
||||
|
||||
For each phy on the mdio bus, there must be a node with the following
|
||||
fields:
|
||||
- compatible: Refer to phy.txt
|
||||
- reg: phy id used to communicate to phy.
|
||||
|
||||
Optional properties: (See ethernet.txt file in the same directory)
|
||||
- dma-coherent: Boolean property, must only be present if memory
|
||||
accesses performed by the device are cache coherent.
|
||||
- max-speed: See ethernet.txt in the same directory.
|
||||
- max-frame-size: See ethernet.txt in the same directory.
|
||||
|
||||
The MAC address will be determined using the optional properties
|
||||
defined in ethernet.txt. The 'phy-mode' property is required, but may
|
||||
be set to the empty string if the PHY configuration is programmed by
|
||||
the firmware or set by hardware straps, and needs to be preserved.
|
||||
|
||||
Example:
|
||||
eth0: ethernet@522d0000 {
|
||||
compatible = "socionext,synquacer-netsec";
|
||||
reg = <0 0x522d0000 0x0 0x10000>, <0 0x10000000 0x0 0x10000>;
|
||||
interrupts = <GIC_SPI 176 IRQ_TYPE_LEVEL_HIGH>;
|
||||
clocks = <&clk_netsec>;
|
||||
clock-names = "phy_ref_clk";
|
||||
phy-mode = "rgmii";
|
||||
max-speed = <1000>;
|
||||
max-frame-size = <9000>;
|
||||
phy-handle = <&phy1>;
|
||||
|
||||
mdio {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
phy1: ethernet-phy@1 {
|
||||
compatible = "ethernet-phy-ieee802.3-c22";
|
||||
reg = <1>;
|
||||
};
|
||||
};
|
@ -68,6 +68,8 @@ Optional properties:
|
||||
- mdio : Child node for MDIO bus. Must be defined if PHY access is
|
||||
required through the core's MDIO interface (i.e. always,
|
||||
unless the PHY is accessed through a different bus).
|
||||
Non-standard MDIO bus frequency is supported via
|
||||
"clock-frequency", see mdio.yaml.
|
||||
|
||||
- pcs-handle: Phandle to the internal PCS/PMA PHY in SGMII or 1000Base-X
|
||||
modes, where "pcs-handle" should be used to point
|
||||
|
@ -0,0 +1,51 @@
|
||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||
%YAML 1.2
|
||||
---
|
||||
$id: http://devicetree.org/schemas/soc/mediatek/mediatek,mt7986-wo-ccif.yaml#
|
||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||
|
||||
title: MediaTek Wireless Ethernet Dispatch (WED) WO controller interface for MT7986
|
||||
|
||||
maintainers:
|
||||
- Lorenzo Bianconi <lorenzo@kernel.org>
|
||||
- Felix Fietkau <nbd@nbd.name>
|
||||
|
||||
description:
|
||||
The MediaTek wo-ccif provides a configuration interface for the WED WO
controller used to perform offload rx packet processing (e.g. 802.11
aggregation packet reordering or rx header translation) on the MT7986 SoC.
|
||||
|
||||
properties:
|
||||
compatible:
|
||||
items:
|
||||
- enum:
|
||||
- mediatek,mt7986-wo-ccif
|
||||
- const: syscon
|
||||
|
||||
reg:
|
||||
maxItems: 1
|
||||
|
||||
interrupts:
|
||||
maxItems: 1
|
||||
|
||||
required:
|
||||
- compatible
|
||||
- reg
|
||||
- interrupts
|
||||
|
||||
additionalProperties: false
|
||||
|
||||
examples:
|
||||
- |
|
||||
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||
#include <dt-bindings/interrupt-controller/irq.h>
|
||||
soc {
|
||||
#address-cells = <2>;
|
||||
#size-cells = <2>;
|
||||
|
||||
syscon@151a5000 {
|
||||
compatible = "mediatek,mt7986-wo-ccif", "syscon";
|
||||
reg = <0 0x151a5000 0 0x1000>;
|
||||
interrupts = <GIC_SPI 205 IRQ_TYPE_LEVEL_HIGH>;
|
||||
};
|
||||
};
|
@ -42,15 +42,13 @@ properties:
|
||||
bluetooth:
|
||||
type: object
|
||||
additionalProperties: false
|
||||
allOf:
|
||||
- $ref: /schemas/net/bluetooth/bluetooth-controller.yaml#
|
||||
properties:
|
||||
compatible:
|
||||
const: qcom,wcnss-bt
|
||||
|
||||
local-bd-address:
|
||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||
maxItems: 6
|
||||
description:
|
||||
See Documentation/devicetree/bindings/net/bluetooth.txt
|
||||
local-bd-address: true
|
||||
|
||||
required:
|
||||
- compatible
|
||||
|
@ -566,7 +566,8 @@ miimon
|
||||
link monitoring. A value of 100 is a good starting point.
|
||||
The use_carrier option, below, affects how the link state is
|
||||
determined. See the High Availability section for additional
|
||||
information. The default value is 0.
|
||||
information. The default value is 100 if arp_interval is not
|
||||
set.
|
||||
|
||||
min_links
|
||||
|
||||
@ -956,6 +957,7 @@ xmit_hash_policy
|
||||
hash = hash XOR source IP XOR destination IP
|
||||
hash = hash XOR (hash RSHIFT 16)
|
||||
hash = hash XOR (hash RSHIFT 8)
|
||||
hash = hash RSHIFT 1
|
||||
And then hash is reduced modulo slave count.
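The steps listed above can be sketched in Python as follows. This is only an illustration of the arithmetic as written in this excerpt, not the driver's code: the starting value of ``hash`` is computed by lines that fall outside this hunk, so it is passed in as a parameter, and the function name ``reduce_hash``, the 32-bit masking, and the sample addresses are assumptions made for the example.

```python
def reduce_hash(initial_hash: int, src_ip: int, dst_ip: int, slave_count: int) -> int:
    """Apply the XOR/shift steps from the text, then reduce modulo slave count.

    initial_hash stands in for whatever value the earlier (not shown) steps
    produce; all values are treated as unsigned 32-bit integers (assumption).
    """
    mask = 0xFFFFFFFF
    h = (initial_hash ^ src_ip ^ dst_ip) & mask  # hash XOR source IP XOR destination IP
    h ^= h >> 16                                 # hash XOR (hash RSHIFT 16)
    h ^= h >> 8                                  # hash XOR (hash RSHIFT 8)
    h >>= 1                                      # hash RSHIFT 1
    return h % slave_count                       # reduced modulo slave count


# Hypothetical flow: 192.168.0.1 -> 192.168.0.2 on a bond with two slaves.
slave_index = reduce_hash(0, 0xC0A80001, 0xC0A80002, 2)
```

With these sample inputs the two addresses differ only in their low bits, so the XOR yields 3, the two folding steps leave it unchanged, the final shift gives 1, and the flow maps to slave index 1.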
|
||||
|
||||
If the protocol is IPv6 then the source and destination
|
||||
|
@ -1148,6 +1148,39 @@ tuning on deep embedded systems'. The author is running a MPC603e
|
||||
load without any problems ...
|
||||
|
||||
|
||||
Switchable Termination Resistors
--------------------------------

CAN bus requires a specific impedance across the differential pair,
typically provided by two 120 Ohm resistors on the farthest nodes of
the bus. Some CAN controllers support activating / deactivating
termination resistor(s) to provide the correct impedance.

Query the available resistances::

    $ ip -details link show can0
    ...
    termination 120 [ 0, 120 ]

Activate the terminating resistor::

    $ ip link set dev can0 type can termination 120

Deactivate the terminating resistor::

    $ ip link set dev can0 type can termination 0

To add termination resistor support to a CAN controller, either
implement the following members in the controller's struct can_priv::

    termination_const
    termination_const_cnt
    do_set_termination

or add GPIO control with the device tree entries from
Documentation/devicetree/bindings/net/can/can-controller.yaml
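
For scripting around the query step, the ``termination`` line shown in the ``ip -details link show`` output above can be parsed with a few lines of Python. This is a minimal sketch that assumes the line has exactly the shape shown in this section (current value followed by a bracketed list of supported values); the helper name ``parse_termination`` is hypothetical and not part of iproute2 or the kernel.

```python
import re
from typing import List, Optional, Tuple

# Matches e.g. "termination 120 [ 0, 120 ]" (format assumed from the text above).
_TERMINATION_RE = re.compile(r"termination\s+(\d+)\s+\[\s*([\d,\s]*?)\s*\]")


def parse_termination(line: str) -> Optional[Tuple[int, List[int]]]:
    """Return (current, supported) resistances in Ohm, or None if no match."""
    match = _TERMINATION_RE.search(line)
    if match is None:
        return None
    current = int(match.group(1))
    supported = [int(value) for value in match.group(2).split(",") if value.strip()]
    return current, supported


parsed = parse_termination("termination 120 [ 0, 120 ]")
```

A caller could check that a desired value appears in the supported list before running ``ip link set ... termination``.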
|
||||
|
||||
|
||||
The Virtual CAN Driver (vcan)
|
||||
-----------------------------
|
||||
|
||||
|
@ -181,10 +181,13 @@ when necessary using the below listed API::
|
||||
- int dpaa2_mac_connect(struct dpaa2_mac *mac);
|
||||
- void dpaa2_mac_disconnect(struct dpaa2_mac *mac);
|
||||
|
||||
A phylink integration is necessary only when the partner DPMAC is not of TYPE_FIXED.
|
||||
One can check for this condition using the below API::
|
||||
A phylink integration is necessary only when the partner DPMAC is not of
``TYPE_FIXED``. This means it is either of ``TYPE_PHY``, or of
``TYPE_BACKPLANE`` (the difference between the two being that in the
``TYPE_BACKPLANE`` mode, the MC firmware does not access the PCS registers).
One can check for this condition using the following helper::
|
||||
|
||||
- bool dpaa2_mac_is_type_fixed(struct fsl_mc_device *dpmac_dev,struct fsl_mc_io *mc_io);
|
||||
- static inline bool dpaa2_mac_is_type_phy(struct dpaa2_mac *mac);
|
||||
|
||||
Before connection to a MAC, the caller must allocate and populate the
|
||||
dpaa2_mac structure with the associated net_device, a pointer to the MC portal
|
||||
|
@ -23,6 +23,7 @@ Supported Devices
|
||||
=================
|
||||
Currently, this driver supports the following devices:
|
||||
* Network controller: Cavium, Inc. Device b200
|
||||
* Network controller: Cavium, Inc. Device b400
|
||||
|
||||
Interface Control
|
||||
=================
|
||||
|
@ -25,7 +25,7 @@ Enabling the driver and kconfig options
|
||||
| at build time via kernel Kconfig flags.
|
||||
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
|
||||
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
|
||||
| For the list of advanced features please see below.
|
||||
| For the list of advanced features, please see below.
|
||||
|
||||
**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
|
||||
|
||||
@ -89,11 +89,11 @@ Enabling the driver and kconfig options
|
||||
|
||||
**CONFIG_MLX5_EN_IPSEC=(y/n)**
|
||||
|
||||
| Enables `IPSec XFRM cryptography-offload accelaration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
|
||||
| Enables `IPSec XFRM cryptography-offload acceleration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
|
||||
|
||||
**CONFIG_MLX5_EN_TLS=(y/n)**
|
||||
|
||||
| TLS cryptography-offload accelaration.
|
||||
| TLS cryptography-offload acceleration.
|
||||
|
||||
|
||||
**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
|
||||
@ -139,14 +139,14 @@ flow_steering_mode: Device flow steering mode
|
||||
The flow steering mode parameter controls the flow steering mode of the driver.
|
||||
Two modes are supported:
|
||||
1. 'dmfs' - Device managed flow steering.
|
||||
2. 'smfs - Software/Driver managed flow steering.
|
||||
2. 'smfs' - Software/Driver managed flow steering.
|
||||
|
||||
In DMFS mode, the HW steering entities are created and managed through the
|
||||
Firmware.
|
||||
In SMFS mode, the HW steering entities are created and managed by
|
||||
the driver directly into Hardware without firmware intervention.
|
||||
the driver directly into hardware without firmware intervention.
|
||||
|
||||
SMFS mode is faster and provides better rule inserstion rate compared to default DMFS mode.
|
||||
SMFS mode is faster and provides better rule insertion rate compared to default DMFS mode.
|
||||
|
||||
User command examples:
|
||||
|
||||
@ -165,9 +165,9 @@ User command examples:
|
||||
enable_roce: RoCE enablement state
|
||||
----------------------------------
|
||||
RoCE enablement state controls driver support for RoCE traffic.
|
||||
When RoCE is disabled, there is no gid table, only raw ethernet QPs are supported and traffic on the well known UDP RoCE port is handled as raw ethernet traffic.
|
||||
When RoCE is disabled, there is no gid table, only raw ethernet QPs are supported and traffic on the well-known UDP RoCE port is handled as raw ethernet traffic.
|
||||
|
||||
To change RoCE enablement state a user must change the driverinit cmode value and run devlink reload.
|
||||
To change RoCE enablement state, a user must change the driverinit cmode value and run devlink reload.
|
||||
|
||||
User command examples:
|
||||
|
||||
@ -186,7 +186,7 @@ User command examples:
|
||||
|
||||
esw_port_metadata: Eswitch port metadata state
|
||||
----------------------------------------------
|
||||
When applicable, disabling Eswitch metadata can increase packet rate
|
||||
When applicable, disabling eswitch metadata can increase packet rate
|
||||
up to 20% depending on the use case and packet sizes.
|
||||
|
||||
Eswitch port metadata state controls whether to internally tag packets with
|
||||
@ -253,26 +253,26 @@ mlx5 subfunction
|
||||
================
|
||||
mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.
|
||||
|
||||
A Subfunction has its own function capabilities and its own resources. This
|
||||
A subfunction has its own function capabilities and its own resources. This
|
||||
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
|
||||
queues are neither shared nor stolen from the parent PCI function.
|
||||
|
||||
When a subfunction is RDMA capable, it has its own QP1, GID table and rdma
|
||||
When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
|
||||
resources neither shared nor stolen from the parent PCI function.
|
||||
|
||||
A subfunction has a dedicated window in PCI BAR space that is not shared
|
||||
with ther other subfunctions or the parent PCI function. This ensures that all
|
||||
devices (netdev, rdma, vdpa etc.) of the subfunction accesses only assigned
|
||||
with the other subfunctions or the parent PCI function. This ensures that all
|
||||
devices (netdev, rdma, vdpa, etc.) of the subfunction accesses only assigned
|
||||
PCI BAR space.
|
||||
|
||||
A Subfunction supports eswitch representation through which it supports tc
|
||||
A subfunction supports eswitch representation through which it supports tc
|
||||
offloads. The user configures eswitch to send/receive packets from/to
|
||||
the subfunction port.
|
||||
|
||||
Subfunctions share PCI level resources such as PCI MSI-X IRQs with
|
||||
other subfunctions and/or with their parent PCI function.
|
||||
|
||||
Example mlx5 software, system and device view::
|
||||
Example mlx5 software, system, and device view::
|
||||
|
||||
_______
|
||||
| admin |
|
||||
@ -310,7 +310,7 @@ Example mlx5 software, system and device view::
|
||||
| (device add/del)
|
||||
_____|____ ____|________
|
||||
| | | subfunction |
|
||||
| PCI NIC |---- activate/deactive events---->| host driver |
|
||||
| PCI NIC |--- activate/deactivate events--->| host driver |
|
||||
|__________| | (mlx5_core) |
|
||||
|_____________|
|
||||
|
||||
@ -320,7 +320,7 @@ Subfunction is created using devlink port interface.
|
||||
|
||||
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
|
||||
|
||||
- Add a devlink port of subfunction flaovur::
|
||||
- Add a devlink port of subfunction flavour::
|
||||
|
||||
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
|
||||
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
||||
@ -351,46 +351,30 @@ driver.
|
||||
|
||||
MAC address setup
|
||||
-----------------
|
||||
mlx5 driver provides mechanism to setup the MAC address of the PCI VF/SF.
|
||||
The mlx5 driver supports the devlink port function attr mechanism to set up the MAC
|
||||
address. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||
|
||||
The configured MAC address of the PCI VF/SF will be used by netdevice and rdma
|
||||
device created for the PCI VF/SF.
|
||||
RoCE capability setup
|
||||
---------------------
|
||||
Not all mlx5 PCI devices/SFs require RoCE capability.
|
||||
|
||||
- Get the MAC address of the VF identified by its unique devlink port index::
|
||||
When RoCE capability is disabled, it saves 1 Mbyte of system memory per
|
||||
PCI device/SF.
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/2
|
||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||
function:
|
||||
hw_addr 00:00:00:00:00:00
|
||||
The mlx5 driver supports the devlink port function attr mechanism to set up RoCE
|
||||
capability. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||
|
||||
- Set the MAC address of the VF identified by its unique devlink port index::
|
||||
migratable capability setup
|
||||
---------------------------
|
||||
Users who want mlx5 PCI VFs to be able to perform live migration need to
|
||||
explicitly enable the VF migratable capability.
|
||||
|
||||
$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/2
|
||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||
function:
|
||||
hw_addr 00:11:22:33:44:55
|
||||
|
||||
- Get the MAC address of the SF identified by its unique devlink port index::
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/32768
|
||||
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
|
||||
function:
|
||||
hw_addr 00:00:00:00:00:00
|
||||
|
||||
- Set the MAC address of the VF identified by its unique devlink port index::
|
||||
|
||||
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/32768
|
||||
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcivf pfnum 0 sfnum 88
|
||||
function:
|
||||
hw_addr 00:00:00:00:88:88
|
||||
The mlx5 driver supports the devlink port function attr mechanism to set up migratable
|
||||
capability. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||
|
||||
SF state setup
|
||||
--------------
|
||||
To use the SF, the user must active the SF using the SF function state
|
||||
To use the SF, the user must activate the SF using the SF function state
|
||||
attribute.
|
||||
|
||||
- Get the state of the SF identified by its unique devlink port index::
|
||||
@ -447,7 +431,7 @@ for it.
|
||||
|
||||
Additionally, the SF port also gets the event when the driver attaches to the
|
||||
auxiliary device of the subfunction. This results in changing the operational
|
||||
state of the function. This provides visiblity to the user to decide when is it
|
||||
state of the function. This provides visibility to the user to decide when it is
|
||||
safe to delete the SF port for graceful termination of the subfunction.
|
||||
|
||||
- Show the SF port operational state::
|
||||
@ -464,14 +448,14 @@ tx reporter
|
||||
-----------
|
||||
The tx reporter is responsible for reporting and recovering of the following two error scenarios:
|
||||
|
||||
- TX timeout
|
||||
- tx timeout
|
||||
Report on kernel tx timeout detection.
|
||||
Recover by searching lost interrupts.
|
||||
- TX error completion
|
||||
- tx error completion
|
||||
Report on error tx completion.
|
||||
Recover by flushing the TX queue and reset it.
|
||||
Recover by flushing the tx queue and resetting it.
|
||||
|
||||
TX reporter also support on demand diagnose callback, on which it provides
|
||||
The tx reporter also supports an on-demand diagnose callback, through which it provides
|
||||
real time information of its send queues status.
|
||||
|
||||
User commands examples:
|
||||
@ -491,32 +475,32 @@ rx reporter
|
||||
-----------
|
||||
The rx reporter is responsible for reporting and recovering of the following two error scenarios:
|
||||
|
||||
- RX queues initialization (population) timeout
|
||||
RX queues descriptors population on ring initialization is done in
|
||||
napi context via triggering an irq, in case of a failure to get
|
||||
the minimum amount of descriptors, a timeout would occur and it
|
||||
could be recoverable by polling the EQ (Event Queue).
|
||||
- RX completions with errors (reported by HW on interrupt context)
|
||||
- rx queues' initialization (population) timeout
|
||||
Population of rx queues' descriptors on ring initialization is done
|
||||
in napi context via triggering an irq. In case of a failure to get
|
||||
the minimum amount of descriptors, a timeout would occur, and
|
||||
descriptors could be recovered by polling the EQ (Event Queue).
|
||||
- rx completions with errors (reported by HW on interrupt context)
|
||||
Report on rx completion error.
|
||||
Recover (if needed) by flushing the related queue and resetting it.
|
||||
|
||||
RX reporter also supports on demand diagnose callback, on which it
|
||||
provides real time information of its receive queues status.
|
||||
rx reporter also supports on demand diagnose callback, on which it
|
||||
provides real time information of its receive queues' status.
|
||||
|
||||
- Diagnose rx queues status, and corresponding completion queue::
|
||||
- Diagnose rx queues' status and corresponding completion queue::
|
||||
|
||||
$ devlink health diagnose pci/0000:82:00.0 reporter rx
|
||||
|
||||
NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
|
||||
NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output.
|
||||
|
||||
- Show number of rx errors indicated, number of recover flows ended successfully,
|
||||
is autorecover enabled and graceful period from last recover::
|
||||
is autorecover enabled, and graceful period from last recover::
|
||||
|
||||
$ devlink health show pci/0000:82:00.0 reporter rx
|
||||
|
||||
fw reporter
|
||||
-----------
|
||||
The fw reporter implements diagnose and dump callbacks.
|
||||
The fw reporter implements `diagnose` and `dump` callbacks.
|
||||
It follows symptoms of fw error such as fw syndrome by triggering
|
||||
fw core dump and storing it into the dump buffer.
|
||||
The fw reporter diagnose command can be triggered any time by the user to check
|
||||
@ -537,7 +521,7 @@ running it on other PF or any VF will return "Operation not permitted".
|
||||
|
||||
fw fatal reporter
|
||||
-----------------
|
||||
The fw fatal reporter implements dump and recover callbacks.
|
||||
The fw fatal reporter implements `dump` and `recover` callbacks.
|
||||
It follows fatal errors indications by CR-space dump and recover flow.
|
||||
The CR-space dump uses vsc interface which is valid even if the FW command
|
||||
interface is not functional, which is the case in most FW fatal errors.
|
||||
@ -552,7 +536,7 @@ User commands examples:
|
||||
|
||||
$ devlink health recover pci/0000:82:00.0 reporter fw_fatal
|
||||
|
||||
- Read FW CR-space dump if already strored or trigger new one::
|
||||
- Read FW CR-space dump if already stored or trigger new one::
|
||||
|
||||
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
|
||||
|
||||
@ -561,10 +545,10 @@ NOTE: This command can run only on PF.
|
||||
mlx5 tracepoints
|
||||
================
|
||||
|
||||
mlx5 driver provides internal trace points for tracking and debugging using
|
||||
mlx5 driver provides internal tracepoints for tracking and debugging using
|
||||
kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst).
|
||||
|
||||
For the list of support mlx5 events check /sys/kernel/debug/tracing/events/mlx5/
|
||||
For the list of supported mlx5 events, check `/sys/kernel/debug/tracing/events/mlx5/`.
|
||||
|
||||
tc and eswitch offloads tracepoints:
|
||||
|
||||
|
@ -1,50 +1,57 @@
|
||||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=============================================
|
||||
Netronome Flow Processor (NFP) Kernel Drivers
|
||||
=============================================
|
||||
===========================================
|
||||
Network Flow Processor (NFP) Kernel Drivers
|
||||
===========================================
|
||||
|
||||
Copyright (c) 2019, Netronome Systems, Inc.
|
||||
:Copyright: |copy| 2019, Netronome Systems, Inc.
|
||||
:Copyright: |copy| 2022, Corigine, Inc.
|
||||
|
||||
Contents
|
||||
========
|
||||
|
||||
- `Overview`_
|
||||
- `Acquiring Firmware`_
|
||||
- `Devlink Info`_
|
||||
- `Configure Device`_
|
||||
- `Statistics`_
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
This driver supports Netronome's line of Flow Processor devices,
|
||||
including the NFP4000, NFP5000, and NFP6000 models, which are also
|
||||
incorporated in the company's family of Agilio SmartNICs. The SR-IOV
|
||||
physical and virtual functions for these devices are supported by
|
||||
the driver.
|
||||
This driver supports Netronome and Corigine's line of Network Flow Processor
|
||||
devices, including the NFP3800, NFP4000, NFP5000, and NFP6000 models, which
|
||||
are also incorporated in the companies' family of Agilio SmartNICs. The SR-IOV
|
||||
physical and virtual functions for these devices are supported by the driver.
|
||||
|
||||
Acquiring Firmware
|
||||
==================
|
||||
|
||||
The NFP4000 and NFP6000 devices require application specific firmware
|
||||
to function. Application firmware can be located either on the host file system
|
||||
The NFP3800, NFP4000 and NFP6000 devices require application specific firmware
|
||||
to function. Application firmware can be located either on the host file system
|
||||
or in the device flash (if supported by management firmware).
|
||||
|
||||
Firmware files on the host filesystem contain card type (`AMDA-*` string), media
|
||||
config etc. They should be placed in `/lib/firmware/netronome` directory to
|
||||
config etc. They should be placed in `/lib/firmware/netronome` directory to
|
||||
load firmware from the host file system.
|
||||
|
||||
Firmware for basic NIC operation is available in the upstream
|
||||
`linux-firmware.git` repository.
|
||||
|
||||
A more comprehensive list of firmware can be downloaded from the
|
||||
`Corigine Support site <https://www.corigine.com/DPUDownload.html>`_.
|
||||
|
||||
Firmware in NVRAM
|
||||
-----------------
|
||||
|
||||
Recent versions of management firmware support loading application
|
||||
firmware from flash when the host driver gets probed. The firmware loading
|
||||
firmware from flash when the host driver gets probed. The firmware loading
|
||||
policy configuration may be used to configure this feature appropriately.
|
||||
|
||||
Devlink or ethtool can be used to update the application firmware on the device
|
||||
flash by providing the appropriate `nic_AMDA*.nffw` file to the respective
|
||||
command. Users need to take care to write the correct firmware image for the
|
||||
command. Users need to take care to write the correct firmware image for the
|
||||
card and media configuration to flash.
|
||||
|
||||
Available storage space in flash depends on the card being used.
|
||||
@ -79,9 +86,9 @@ You may need to use hard instead of symbolic links on distributions
|
||||
which use old `mkinitrd` command instead of `dracut` (e.g. Ubuntu).
|
||||
|
||||
After changing firmware files you may need to regenerate the initramfs
|
||||
image. Initramfs contains drivers and firmware files your system may
|
||||
need to boot. Refer to the documentation of your distribution to find
|
||||
out how to update initramfs. Good indication of stale initramfs
|
||||
image. Initramfs contains drivers and firmware files your system may
|
||||
need to boot. Refer to the documentation of your distribution to find
|
||||
out how to update initramfs. Good indication of stale initramfs
|
||||
is system loading wrong driver or firmware on boot, but when driver is
|
||||
later reloaded manually everything works correctly.
|
||||
|
||||
@ -89,9 +96,9 @@ Selecting firmware per device
|
||||
-----------------------------
|
||||
|
||||
Most commonly all cards on the system use the same type of firmware.
|
||||
If you want to load specific firmware image for a specific card, you
|
||||
can use either the PCI bus address or serial number. Driver will print
|
||||
which files it's looking for when it recognizes a NFP device::
|
||||
If you want to load a specific firmware image for a specific card, you
|
||||
can use either the PCI bus address or serial number. The driver will
|
||||
print which files it's looking for when it recognizes a NFP device::
|
||||
|
||||
nfp: Looking for firmware file in order of priority:
|
||||
nfp: netronome/serial-00-12-34-aa-bb-cc-10-ff.nffw: not found
|
||||
@ -106,6 +113,15 @@ Note that `serial-*` and `pci-*` files are **not** automatically included
|
||||
in initramfs, you will have to refer to documentation of appropriate tools
|
||||
to find out how to include them.
|
||||
|
||||
Running firmware version
|
||||
------------------------
|
||||
|
||||
The version of the loaded firmware for a particular <netdev> interface,
|
||||
(e.g. enp4s0), or an interface's port <netdev port> (e.g. enp4s0np0) can
|
||||
be displayed with the ethtool command::
|
||||
|
||||
$ ethtool -i <netdev>
|
||||
|
||||
Firmware loading policy
|
||||
-----------------------
|
||||
|
||||
@ -132,6 +148,115 @@ abi_drv_load_ifc
|
||||
Defines a list of PF devices allowed to load FW on the device.
|
||||
This variable is not currently user configurable.
|
||||
|
||||
Devlink Info
|
||||
============
|
||||
|
||||
The devlink info command displays the running and stored firmware versions
|
||||
on the device, serial number and board information.
|
||||
|
||||
Devlink info command example (replace PCI address)::
|
||||
|
||||
$ devlink dev info pci/0000:03:00.0
|
||||
pci/0000:03:00.0:
|
||||
driver nfp
|
||||
serial_number CSAAMDA2001-1003000111
|
||||
versions:
|
||||
fixed:
|
||||
board.id AMDA2001-1003
|
||||
board.rev 01
|
||||
board.manufacture CSA
|
||||
board.model mozart
|
||||
running:
|
||||
fw.mgmt 22.10.0-rc3
|
||||
fw.cpld 0x1000003
|
||||
fw.app nic-22.09.0
|
||||
chip.init AMDA-2001-1003 1003000111
|
||||
stored:
|
||||
fw.bundle_id bspbundle_1003000111
|
||||
fw.mgmt 22.10.0-rc3
|
||||
fw.cpld 0x0
|
||||
chip.init AMDA-2001-1003 1003000111
|
||||
|
||||
Configure Device
|
||||
================
|
||||
|
||||
This section explains how to use Agilio SmartNICs running basic NIC firmware.
|
||||
|
||||
Configure interface link-speed
|
||||
------------------------------
|
||||
The following steps explain how to change between 10G mode and 25G mode on
|
||||
Agilio CX 2x25GbE cards. The changing of port speed must be done in order,
|
||||
port 0 (p0) must be set to 10G before port 1 (p1) may be set to 10G.
|
||||
|
||||
Down the respective interface(s)::
|
||||
|
||||
$ ip link set dev <netdev port 0> down
|
||||
$ ip link set dev <netdev port 1> down
|
||||
|
||||
Set interface link-speed to 10G::
|
||||
|
||||
$ ethtool -s <netdev port 0> speed 10000
|
||||
$ ethtool -s <netdev port 1> speed 10000
|
||||
|
||||
Set interface link-speed to 25G::
|
||||
|
||||
$ ethtool -s <netdev port 0> speed 25000
|
||||
$ ethtool -s <netdev port 1> speed 25000
|
||||
|
||||
Reload driver for changes to take effect::
|
||||
|
||||
$ rmmod nfp; modprobe nfp
|
||||
|
||||
Configure interface Maximum Transmission Unit (MTU)
|
||||
---------------------------------------------------
|
||||
|
||||
The MTU of interfaces can temporarily be set using the iproute2, ip link or
|
||||
ifconfig tools. Note that this change will not persist. Setting this via
|
||||
Network Manager, or another appropriate OS configuration tool, is
|
||||
recommended as changes to the MTU using Network Manager can be made to
|
||||
persist.
|
||||
|
||||
Set interface MTU to 9000 bytes::
|
||||
|
||||
$ ip link set dev <netdev port> mtu 9000
|
||||
|
||||
It is the responsibility of the user or the orchestration layer to set
|
||||
appropriate MTU values when handling jumbo frames or utilizing tunnels. For
|
||||
example, if packets sent from a VM are to be encapsulated on the card and
|
||||
egress a physical port, then the MTU of the VF should be set to lower than
|
||||
that of the physical port to account for the extra bytes added by the
|
||||
additional header. If a setup is expected to see fallback traffic between
|
||||
the SmartNIC and the kernel then the user should also ensure that the PF MTU
|
||||
is appropriately set to avoid unexpected drops on this path.
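The MTU arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not part of the driver or any tool: the helper name `vf_mtu_for_tunnel` and the overhead table are assumptions made for the example. The VXLAN figure counts the bytes that encapsulation adds inside the physical port's MTU (inner Ethernet header, VXLAN header, UDP header and outer IP header).

```python
# Hypothetical helper for illustration; not provided by the nfp driver.
# Bytes added inside the physical port's MTU by each encapsulation.
ENCAP_OVERHEAD = {
    "vxlan-ipv4": 14 + 8 + 8 + 20,  # inner Ethernet + VXLAN + UDP + outer IPv4
    "vxlan-ipv6": 14 + 8 + 8 + 40,  # same, with an outer IPv6 header
}


def vf_mtu_for_tunnel(port_mtu: int, encap: str) -> int:
    """Return the largest VF MTU whose encapsulated packets fit in port_mtu."""
    overhead = ENCAP_OVERHEAD[encap]
    if port_mtu <= overhead:
        raise ValueError("physical port MTU too small for this encapsulation")
    return port_mtu - overhead


# Physical port set to 9000 bytes, VM traffic encapsulated in VXLAN over IPv4.
print(vf_mtu_for_tunnel(9000, "vxlan-ipv4"))
```

The resulting value is what would be passed to `ip link set dev <netdev> mtu ...` on the VF side; other tunnel types need their own overhead entry.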
|
||||
|
||||
Configure Forward Error Correction (FEC) modes
|
||||
----------------------------------------------
|
||||
|
||||
Agilio SmartNICs support FEC mode configuration, e.g. Auto, Firecode Base-R,
|
||||
Reed-Solomon and Off modes. Each physical port's FEC mode can be set
|
||||
independently using ethtool. The supported FEC modes for an interface can
|
||||
be viewed using::
|
||||
|
||||
$ ethtool <netdev>
|
||||
|
||||
The currently configured FEC mode can be viewed using::
|
||||
|
||||
$ ethtool --show-fec <netdev>
|
||||
|
||||
To force the FEC mode for a particular port, auto-negotiation must be disabled
|
||||
(see the `Auto-negotiation`_ section). An example of how to set the FEC mode
|
||||
to Reed-Solomon is::
|
||||
|
||||
$ ethtool --set-fec <netdev> encoding rs
|
||||
|
||||
Auto-negotiation
|
||||
----------------
|
||||
|
||||
To change auto-negotiation settings, the link must first be put down. After the
|
||||
link is down, auto-negotiation can be enabled or disabled using::
|
||||
|
||||
$ ethtool -s <netdev> autoneg <on|off>
|
||||
|
||||
Statistics
|
||||
==========
|
||||
|
||||
|
@ -198,6 +198,11 @@ fw.bundle_id
|
||||
|
||||
Unique identifier of the entire firmware bundle.
|
||||
|
||||
fw.bootloader
|
||||
-------------
|
||||
|
||||
Version of the bootloader.
|
||||
|
||||
Future work
|
||||
===========
|
||||
|
||||
|
@ -110,7 +110,7 @@ devlink ports for both the controllers.
|
||||
Function configuration
|
||||
======================
|
||||
|
||||
A user can configure the function attribute before enumerating the PCI
|
||||
Users can configure one or more function attributes before enumerating the PCI
|
||||
function. Usually this means the user should configure function attributes
|
||||
before a bus specific device for the function is created. However, when
|
||||
SRIOV is enabled, virtual function devices are created on the PCI bus.
|
||||
@ -119,9 +119,127 @@ function device to the driver. For subfunctions, this means user should
|
||||
configure port function attribute before activating the port function.
|
||||
|
||||
A user may set the hardware address of the function using
|
||||
'devlink port function set hw_addr' command. For Ethernet port function
|
||||
`devlink port function set hw_addr` command. For Ethernet port function
|
||||
this means a MAC address.
|
||||
|
||||
Users may also set the RoCE capability of the function using
|
||||
`devlink port function set roce` command.
|
||||
|
||||
Users may also set the function as migratable using
|
||||
`devlink port function set migratable` command.
|
||||
|
||||
Function attributes
|
||||
===================
|
||||
|
||||
MAC address setup
|
||||
-----------------
|
||||
The configured MAC address of the PCI VF/SF will be used by netdevice and rdma
|
||||
device created for the PCI VF/SF.
|
||||
|
||||
- Get the MAC address of the VF identified by its unique devlink port index::
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/2
|
||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||
function:
|
||||
hw_addr 00:00:00:00:00:00
|
||||
|
||||
- Set the MAC address of the VF identified by its unique devlink port index::
|
||||
|
||||
$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/2
|
||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||
function:
|
||||
hw_addr 00:11:22:33:44:55
|
||||
|
||||
- Get the MAC address of the SF identified by its unique devlink port index::
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/32768
|
||||
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
|
||||
function:
|
||||
hw_addr 00:00:00:00:00:00
|
||||
|
||||
- Set the MAC address of the SF identified by its unique devlink port index::
|
||||
|
||||
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/32768
|
||||
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
|
||||
function:
|
||||
hw_addr 00:00:00:00:88:88
|
||||
|
||||
RoCE capability setup
|
||||
---------------------
|
||||
Not all PCI VFs/SFs require RoCE capability.
|
||||
|
||||
When RoCE capability is disabled, it saves system memory per PCI VF/SF.
|
||||
|
||||
When user disables RoCE capability for a VF/SF, user application cannot send or
|
||||
receive any RoCE packets through this VF/SF and RoCE GID table for this PCI
|
||||
will be empty.
|
||||
|
||||
When RoCE capability is disabled in the device using port function attribute,
|
||||
VF/SF driver cannot override it.
|
||||
|
||||
- Get RoCE capability of the VF device::
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/2
|
||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||
function:
|
||||
hw_addr 00:00:00:00:00:00 roce enable
|
||||
|
||||
- Set RoCE capability of the VF device::
|
||||
|
||||
$ devlink port function set pci/0000:06:00.0/2 roce disable
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/2
|
||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||
function:
|
||||
hw_addr 00:00:00:00:00:00 roce disable
|
||||
|
||||
migratable capability setup
|
||||
---------------------------
|
||||
Live migration is the process of transferring a live virtual machine
|
||||
from one physical host to another without disrupting its normal
|
||||
operation.
|
||||
|
||||
Users who want PCI VFs to be able to perform live migration need to
|
||||
explicitly enable the VF migratable capability.
|
||||
|
||||
When the user enables migratable capability for a VF, and the hypervisor (HV) binds the VF to a VFIO driver
|
||||
with migration support, the user can migrate the VM with this VF from one HV to a
|
||||
different one.
|
||||
|
||||
However, when migratable capability is enabled, the device will disable features which cannot
|
||||
be migrated. Thus, the migratable capability can impose limitations on a VF, so the choice is left to the user.
|
||||
|
||||
Example of live migration (LM) with migratable function configuration:
|
||||
- Get migratable capability of the VF device::
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/2
|
||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||
function:
|
||||
hw_addr 00:00:00:00:00:00 migratable disable
|
||||
|
||||
- Set migratable capability of the VF device::
|
||||
|
||||
$ devlink port function set pci/0000:06:00.0/2 migratable enable
|
||||
|
||||
$ devlink port show pci/0000:06:00.0/2
|
||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||
function:
|
||||
hw_addr 00:00:00:00:00:00 migratable enable
|
||||
|
||||
- Bind VF to VFIO driver with migration support::
|
||||
|
||||
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
|
||||
$ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
|
||||
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind
|
||||
|
||||
Attach VF to the VM.
|
||||
Start the VM.
|
||||
Perform live migration.
|
||||
|
||||
Subfunction
|
||||
============
|
||||
|
||||
@ -130,10 +248,11 @@ it is deployed. Subfunction is created and deployed in unit of 1. Unlike
|
||||
SRIOV VFs, a subfunction doesn't require its own PCI virtual function.
|
||||
A subfunction communicates with the hardware through the parent PCI function.
|
||||
|
||||
To use a subfunction, 3 steps setup sequence is followed.
|
||||
(1) create - create a subfunction;
|
||||
(2) configure - configure subfunction attributes;
|
||||
(3) deploy - deploy the subfunction;
|
||||
To use a subfunction, a 3-step setup sequence is followed:
|
||||
|
||||
1) create - create a subfunction;
|
||||
2) configure - configure subfunction attributes;
|
||||
3) deploy - deploy the subfunction;
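The three steps map onto devlink commands roughly as in the Python sketch below. It only builds the command lines and runs nothing. The helper name `sf_setup_commands` is hypothetical, the device address and the port index 32768 are borrowed from the examples in this documentation (the index is assigned when the port is added, so it has to be read back from the output of the first command), and setting `hw_addr` stands in for whatever attributes a given deployment configures.

```python
# Hypothetical helper for illustration; it builds command lines, it runs nothing.
def sf_setup_commands(dev, pfnum, sfnum, port_index, hw_addr):
    port = f"{dev}/{port_index}"
    return [
        # 1) create: add a devlink port of subfunction flavour
        ["devlink", "port", "add", dev, "flavour", "pcisf",
         "pfnum", str(pfnum), "sfnum", str(sfnum)],
        # 2) configure: set a function attribute, here the hardware address
        ["devlink", "port", "function", "set", port, "hw_addr", hw_addr],
        # 3) deploy: activate the port function
        ["devlink", "port", "function", "set", port, "state", "active"],
    ]


for cmd in sf_setup_commands("pci/0000:06:00.0", 0, 88, 32768,
                             "00:00:00:00:88:88"):
    print("$", " ".join(cmd))
```

Keeping the configure step before activation matches the rule stated earlier: port function attributes are set before the port function is activated.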
|
||||
|
||||
Subfunction management is done using devlink port user interface.
|
||||
User performs setup on the subfunction management device.
|
||||
@ -191,13 +310,48 @@ API allows to configure following rate object's parameters:
|
||||
``tx_max``
|
||||
Maximum TX rate value.
|
||||
|
||||
``tx_priority``
|
||||
Allows for usage of strict priority arbiter among siblings. This
|
||||
arbitration scheme attempts to schedule nodes based on their priority
|
||||
as long as the nodes remain within their bandwidth limit. The higher the
|
||||
priority the higher the probability that the node will get selected for
|
||||
scheduling.
|
||||
|
||||
``tx_weight``
|
||||
Allows for usage of Weighted Fair Queuing arbitration scheme among
|
||||
siblings. This arbitration scheme can be used simultaneously with the
|
||||
strict priority. As a node is configured with a higher rate it gets more
|
||||
BW relative to its siblings. Values are relative, like percentage
|
||||
points; they basically tell how much BW a node should take relative to
|
||||
its siblings.
|
||||
|
||||
``parent``
|
||||
Parent node name. Parent node rate limits are considered as additional limits
|
||||
to all node children limits. ``tx_max`` is an upper limit for children.
|
||||
``tx_share`` is a total bandwidth distributed among children.
|
||||
|
||||
``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
|
||||
nodes with the same priority form a WFQ subgroup in the sibling group
|
||||
and arbitration among them is based on assigned weights.
|
||||
|
||||
Arbitration flow from the high level:
|
||||
|
||||
#. Choose a node, or group of nodes with the highest priority that stays
|
||||
within the BW limit and are not blocked. Use ``tx_priority`` as a
|
||||
parameter for this arbitration.
|
||||
|
||||
#. If a group of nodes has the same priority, perform WFQ arbitration on
|
||||
that subgroup. Use ``tx_weight`` as a parameter for this arbitration.
|
||||
|
||||
#. Select the winner node, and continue the arbitration flow among its children,
|
||||
until a leaf node is reached and the winner is established.
|
||||
|
||||
#. If all the nodes from the highest priority sub-group are satisfied, or
|
||||
overused their assigned BW, move to the lower priority nodes.
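The arbitration flow above can be modelled with a short Python sketch. It is a toy model for reading the rules, not how any driver or device implements scheduling. The `Node` class, the `eligible` flag (standing in for "within the BW limit and not blocked"), the `served` counter and the convention that a larger `tx_priority` value means higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    tx_priority: int = 0   # assumption: larger value means higher priority
    tx_weight: int = 1     # must be greater than zero in this sketch
    eligible: bool = True  # stands in for "within BW limit and not blocked"
    served: int = 0        # units already scheduled, used for the WFQ share
    children: list = field(default_factory=list)


def pick_leaf(siblings):
    """Arbitrate among siblings, then descend until a leaf is reached."""
    candidates = [n for n in siblings if n.eligible]
    if not candidates:
        return None
    # Strict priority: keep only the highest-priority eligible nodes.
    top = max(n.tx_priority for n in candidates)
    group = [n for n in candidates if n.tx_priority == top]
    # WFQ inside the group: the node furthest below its weighted share wins.
    winner = min(group, key=lambda n: n.served / n.tx_weight)
    winner.served += 1
    # Continue among the winner's children; a node without children is a leaf.
    return pick_leaf(winner.children) if winner.children else winner


a = Node("a", tx_priority=1, tx_weight=3)
b = Node("b", tx_priority=1, tx_weight=1)
c = Node("c", tx_priority=0)
order = [pick_leaf([a, b, c]).name for _ in range(8)]
print(order)
```

In this model `a` and `b` share the link in proportion to their weights, and `c` is only selected once both higher-priority nodes stop being eligible, which corresponds to the last step of the flow.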
|
||||
|
||||
Driver implementations are allowed to support both or either rate object types
|
||||
and setting methods of their parameters.
|
||||
and setting methods of their parameters. Additionally, driver implementations
|
||||
may export nodes/leafs and their child-parent relationships.
|
||||
|
||||
Terms and Definitions
|
||||
=====================
|
||||
|
@ -31,6 +31,15 @@ in its ``devlink_region_ops`` structure. If snapshot id is not set in
|
||||
the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send
|
||||
the snapshot information to user space.
|
||||
|
||||
Regions may optionally allow directly reading from their contents without a
|
||||
snapshot. Direct read requests are not atomic. In particular a read request
|
||||
of size 256 bytes or larger will be split into multiple chunks. If atomic
|
||||
access is required, use a snapshot. A driver wishing to enable this for a
|
||||
region should implement the ``.read`` callback in the ``devlink_region_ops``
|
||||
structure. User space can request a direct read by using the
|
||||
``DEVLINK_ATTR_REGION_DIRECT`` attribute instead of specifying a snapshot
|
||||
id.
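Why a chunked direct read is not atomic can be seen in a small Python simulation. Nothing here talks to devlink. The region is a plain `bytearray`, the 256-byte chunk size is an illustrative assumption, and `device_update` is a hypothetical stand-in for the device changing its contents between two chunk reads.

```python
CHUNK = 256  # illustrative chunk size; an assumption made for this sketch


def direct_read(region, address, length, between_chunks=None):
    """Read region[address:address+length] chunk by chunk, like a direct read."""
    out = bytearray()
    offset, end = address, address + length
    while offset < end:
        size = min(CHUNK, end - offset)
        out += region[offset:offset + size]
        offset += size
        if between_chunks and offset < end:
            between_chunks(region)  # the device may change state here
    return bytes(out)


def snapshot_read(region, address, length):
    """Freeze a copy of the whole region first, then read from the copy."""
    frozen = bytes(region)
    return frozen[address:address + length]


def device_update(region):
    region[:] = b"B" * len(region)  # simulate the device rewriting the region


region = bytearray(b"A" * 512)
torn = direct_read(region, 0, 512, between_chunks=device_update)
print(torn[:4], torn[-4:])
```

The direct read returns a mix of old and new contents, while `snapshot_read` always returns data from a single point in time, which is why a snapshot is the right choice when atomic access is required.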
|
||||
|
||||
example usage
|
||||
-------------
|
||||
|
||||
@ -65,6 +74,10 @@ example usage
|
||||
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
|
||||
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
|
||||
|
||||
# Read from the region without a snapshot
|
||||
$ devlink region read pci/0000:00:05.0/fw-health address 16 length 16
|
||||
0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
|
||||
|
||||
As regions are likely very device or driver specific, no generic regions are
|
||||
defined. See the driver-specific documentation files for information on the
|
||||
specific regions a driver supports.
|
||||
|
@ -485,6 +485,16 @@ be added to the following table:
|
||||
- Traps incoming packets that the device decided to drop because
|
||||
the destination MAC is not configured in the MAC table and
|
||||
the interface is not in promiscuous mode
|
||||
* - ``eapol``
|
||||
- ``control``
|
||||
- Traps "Extensible Authentication Protocol over LAN" (EAPOL) packets
|
||||
specified in IEEE 802.1X
|
||||
* - ``locked_port``
|
||||
- ``drop``
|
||||
- Traps packets that the device decided to drop because they failed the
|
||||
locked bridge port check. That is, packets that were received via a
|
||||
locked port and whose {SMAC, VID} does not correspond to an FDB entry
|
||||
pointing to the port
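The locked bridge port check behind the ``locked_port`` trap can be paraphrased as the Python sketch below. It is a reading aid under stated assumptions, not bridge or driver code: the FDB is modelled as a plain dictionary keyed by {SMAC, VID}, and the function name and port names are hypothetical.

```python
# Hypothetical model: the FDB maps (source MAC, VLAN ID) to a port name.
def passes_locked_port_check(fdb, ingress_port, smac, vid, locked):
    """Return False when the packet would be dropped by the locked port check."""
    if not locked:
        return True  # the check only applies to locked ports
    # The packet passes only if an FDB entry for {SMAC, VID} points to the
    # port on which the packet was received.
    return fdb.get((smac, vid)) == ingress_port


fdb = {("00:11:22:33:44:55", 10): "swp1"}
print(passes_locked_port_check(fdb, "swp1", "00:11:22:33:44:55", 10, locked=True))
print(passes_locked_port_check(fdb, "swp2", "00:11:22:33:44:55", 10, locked=True))
```

Packets for which the function returns False are the ones the device decides to drop and that this trap makes visible.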
|
||||
|
||||
Driver-specific Packet Traps
|
||||
============================
|
||||
@ -589,6 +599,9 @@ narrow. The description of these groups must be added to the following table:
|
||||
* - ``parser_error_drops``
|
||||
- Contains packet traps for packets that were marked by the device during
|
||||
parsing as erroneous
|
||||
* - ``eapol``
|
||||
- Contains packet traps for "Extensible Authentication Protocol over LAN"
|
||||
(EAPOL) packets specified in IEEE 802.1X
|
||||
|
||||
Packet Trap Policers
|
||||
====================
|
||||
|
36
Documentation/networking/devlink/etas_es58x.rst
Normal file
@ -0,0 +1,36 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========================
|
||||
etas_es58x devlink support
|
||||
==========================
|
||||
|
||||
This document describes the devlink features implemented by the
|
||||
``etas_es58x`` device driver.
|
||||
|
||||
Info versions
|
||||
=============
|
||||
|
||||
The ``etas_es58x`` driver reports the following versions:
|
||||
|
||||
.. list-table:: devlink info versions implemented
|
||||
:widths: 5 5 90
|
||||
|
||||
* - Name
|
||||
- Type
|
||||
- Description
|
||||
* - ``fw``
|
||||
- running
|
||||
- Version of the firmware running on the device. Also available
|
||||
through ``ethtool -i`` as the first member of the
|
||||
``firmware-version``.
|
||||
* - ``fw.bootloader``
|
||||
- running
|
||||
- Version of the bootloader running on the device. Also available
|
||||
through ``ethtool -i`` as the second member of the
|
||||
``firmware-version``.
|
||||
* - ``board.rev``
|
||||
- fixed
|
||||
- The hardware revision of the device.
|
||||
* - ``serial_number``
|
||||
- fixed
|
||||
- The USB serial number. Also available through ``lsusb -v``.
|
@ -189,12 +189,21 @@ device data.
|
||||
* - ``nvm-flash``
|
||||
- The contents of the entire flash chip, sometimes referred to as
|
||||
the device's Non Volatile Memory.
|
||||
* - ``shadow-ram``
|
||||
- The contents of the Shadow RAM, which is loaded from the beginning
|
||||
of the flash. Although the contents are primarily from the flash,
|
||||
this area also contains data generated during device boot which is
|
||||
not stored in flash.
|
||||
* - ``device-caps``
|
||||
- The contents of the device firmware's capabilities buffer. Useful to
|
||||
determine the current state and configuration of the device.
|
||||
|
||||
Users can request an immediate capture of a snapshot via the
|
||||
``DEVLINK_CMD_REGION_NEW``
|
||||
Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
|
||||
snapshot. The ``device-caps`` region requires a snapshot as the contents are
|
||||
sent by firmware and can't be split into separate reads.
|
||||
|
||||
Users can request an immediate capture of a snapshot for all three regions
|
||||
via the ``DEVLINK_CMD_REGION_NEW`` command.
|
||||
|
||||
.. code:: shell
|
||||
|
||||
@ -254,3 +263,118 @@ Users can request an immediate capture of a snapshot via the
|
||||
0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
||||
|
||||
$ devlink region delete pci/0000:01:00.0/device-caps snapshot 1
|
||||
|
||||
Devlink Rate
|
||||
============
|
||||
|
||||
The ``ice`` driver implements the devlink-rate API. It allows for offload of
|
||||
the Hierarchical QoS to the hardware. It enables user to group Virtual
|
||||
Functions in a tree structure and assign supported parameters: tx_share,
|
||||
tx_max, tx_priority and tx_weight to each node in a tree. So effectively
|
||||
user gains an ability to control how much bandwidth is allocated for each
|
||||
VF group. This is later enforced by the HW.
|
||||
|
||||
It is assumed that this feature is mutually exclusive with DCB performed
|
||||
in FW and ADQ, or any driver feature that would trigger changes in QoS,
|
||||
for example creation of the new traffic class. The driver will prevent DCB
|
||||
or ADQ configuration if user started making any changes to the nodes using
|
||||
devlink-rate API. To configure those features a driver reload is necessary.
|
||||
Correspondingly, if ADQ or DCB gets configured, the driver won't export
|
||||
hierarchy at all, or will remove the untouched hierarchy if those
|
||||
features are enabled after the hierarchy is exported, but before any
|
||||
changes are made.
|
||||
|
||||
This feature is also dependent on switchdev being enabled in the system.
|
||||
It's required because devlink-rate requires devlink-port objects to be
|
||||
present, and those objects are only created in switchdev mode.
|
||||
|
||||
If the driver is set to the switchdev mode, it will export internal
|
||||
hierarchy the moment VFs are created. The root of the tree is always
|
||||
represented by the node_0. This node can't be deleted by the user. Leaf
|
||||
nodes and nodes with children also can't be deleted.
|
||||
|
||||
.. list-table:: Attributes supported
|
||||
:widths: 15 85
|
||||
|
||||
* - Name
|
||||
- Description
|
||||
* - ``tx_max``
|
||||
- maximum bandwidth to be consumed by the tree Node. Rate Limit is
|
||||
an absolute number specifying a maximum amount of bytes a Node may
|
||||
consume during the course of one second. Rate limit guarantees
|
||||
that a link will not oversaturate the receiver on the remote end
|
||||
and also enforces an SLA between the subscriber and network
|
||||
provider.
|
||||
* - ``tx_share``
|
||||
- minimum bandwidth allocated to a tree node when it is not blocked.
|
||||
It specifies an absolute BW. While tx_max defines the maximum
|
||||
bandwidth the node may consume, the tx_share marks committed BW
|
||||
for the Node.
|
||||
* - ``tx_priority``
|
||||
- allows for usage of strict priority arbiter among siblings. This
|
||||
arbitration scheme attempts to schedule nodes based on their
|
||||
priority as long as the nodes remain within their bandwidth limit.
|
||||
Range 0-7. Nodes with priority 7 have the highest priority and are
|
||||
selected first, while nodes with priority 0 have the lowest
|
||||
priority. Nodes that have the same priority are treated equally.
|
||||
* - ``tx_weight``
|
||||
- allows for usage of Weighted Fair Queuing arbitration scheme among
|
||||
siblings. This arbitration scheme can be used simultaneously with
|
||||
the strict priority. Range 1-200. Only relative values matter for
|
||||
arbitration.
|
||||
|
||||
``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
|
||||
nodes with the same priority form a WFQ subgroup in the sibling group
|
||||
and arbitration among them is based on assigned weights.
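The combined use of ``tx_priority`` and ``tx_weight`` described above can be illustrated with a small Python model. This is only a sketch of the arbitration idea, not driver or firmware code: the ``Node`` type, the ``eligible`` flag (standing in for "has traffic and is within its bandwidth limit") and the proportional split are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    tx_priority: int       # 0-7, priority 7 is selected first
    tx_weight: int         # 1-200, only the ratio between siblings matters
    eligible: bool = True  # hypothetical flag: node may be scheduled now


def wfq_shares(siblings):
    """Return {name: fraction} for the highest-priority eligible subgroup."""
    ready = [n for n in siblings if n.eligible]
    if not ready:
        return {}
    # Strict priority: only the highest priority present is served.
    top = max(n.tx_priority for n in ready)
    group = [n for n in ready if n.tx_priority == top]
    # WFQ inside the subgroup: split in proportion to the weights.
    total = sum(n.tx_weight for n in group)
    return {n.name: n.tx_weight / total for n in group}
```

In this model, two siblings with priority 7 and weights 5 and 15 split the bandwidth 1:3, and a priority 3 sibling is only served once neither of them is eligible.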
|
||||
|
||||
.. code:: shell
|
||||
|
||||
# enable switchdev
|
||||
$ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev
|
||||
|
||||
# at this point driver should export internal hierarchy
|
||||
$ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs
|
||||
|
||||
$ devlink port function rate show
|
||||
pci/0000:4b:00.0/node_25: type node parent node_24
|
||||
pci/0000:4b:00.0/node_24: type node parent node_0
|
||||
pci/0000:4b:00.0/node_32: type node parent node_31
|
||||
pci/0000:4b:00.0/node_31: type node parent node_30
|
||||
pci/0000:4b:00.0/node_30: type node parent node_16
|
||||
pci/0000:4b:00.0/node_19: type node parent node_18
|
||||
pci/0000:4b:00.0/node_18: type node parent node_17
|
||||
pci/0000:4b:00.0/node_17: type node parent node_16
|
||||
pci/0000:4b:00.0/node_14: type node parent node_5
|
||||
pci/0000:4b:00.0/node_5: type node parent node_3
|
||||
pci/0000:4b:00.0/node_13: type node parent node_4
|
||||
pci/0000:4b:00.0/node_12: type node parent node_4
|
||||
pci/0000:4b:00.0/node_11: type node parent node_4
|
||||
pci/0000:4b:00.0/node_10: type node parent node_4
|
||||
pci/0000:4b:00.0/node_9: type node parent node_4
|
||||
pci/0000:4b:00.0/node_8: type node parent node_4
|
||||
pci/0000:4b:00.0/node_7: type node parent node_4
|
||||
pci/0000:4b:00.0/node_6: type node parent node_4
|
||||
pci/0000:4b:00.0/node_4: type node parent node_3
|
||||
pci/0000:4b:00.0/node_3: type node parent node_16
|
||||
pci/0000:4b:00.0/node_16: type node parent node_15
|
||||
pci/0000:4b:00.0/node_15: type node parent node_0
|
||||
pci/0000:4b:00.0/node_2: type node parent node_1
|
||||
pci/0000:4b:00.0/node_1: type node parent node_0
|
||||
pci/0000:4b:00.0/node_0: type node
|
||||
pci/0000:4b:00.0/1: type leaf parent node_25
|
||||
pci/0000:4b:00.0/2: type leaf parent node_25
|
||||
|
||||
# let's create some custom node
|
||||
$ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0
|
||||
|
||||
# second custom node
|
||||
$ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom
|
||||
|
||||
# reassign second VF to newly created branch
|
||||
$ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1
|
||||
|
||||
# assign tx_weight to the VF
|
||||
$ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5
|
||||
|
||||
# assign tx_share to the VF
|
||||
$ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps
|
||||
|
@ -222,6 +222,7 @@ Userspace to kernel:
|
||||
``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters
|
||||
``ETHTOOL_MSG_PSE_SET`` set PSE parameters
|
||||
``ETHTOOL_MSG_PSE_GET`` get PSE parameters
|
||||
``ETHTOOL_MSG_RSS_GET`` get RSS settings
|
||||
===================================== =================================
|
||||
|
||||
Kernel to userspace:
|
||||
@ -263,6 +264,7 @@ Kernel to userspace:
|
||||
``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info
|
||||
``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters
|
||||
``ETHTOOL_MSG_PSE_GET_REPLY`` PSE parameters
|
||||
``ETHTOOL_MSG_RSS_GET_REPLY`` RSS settings
|
||||
======================================== =================================
|
||||
|
||||
``GET`` requests are sent by userspace applications to retrieve device
|
||||
@ -491,6 +493,7 @@ Kernel response contents:
|
||||
``ETHTOOL_A_LINKSTATE_SQI_MAX`` u32 Max support SQI value
|
||||
``ETHTOOL_A_LINKSTATE_EXT_STATE`` u8 link extended state
|
||||
``ETHTOOL_A_LINKSTATE_EXT_SUBSTATE`` u8 link extended substate
|
||||
``ETHTOOL_A_LINKSTATE_EXT_DOWN_CNT`` u32 count of link down events
|
||||
==================================== ====== ============================
|
||||
|
||||
For most NIC drivers, the value of ``ETHTOOL_A_LINKSTATE_LINK`` returns
|
||||
@ -1686,6 +1689,33 @@ to control PoDL PSE Admin functions. This option is implementing
|
||||
``IEEE 802.3-2018`` 30.15.1.2.1 acPoDLPSEAdminControl. See
|
||||
``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` for supported values.
|
||||
|
||||
RSS_GET
|
||||
=======
|
||||
|
||||
Get indirection table, hash key and hash function info associated with a
|
||||
RSS context of an interface similar to ``ETHTOOL_GRSSH`` ioctl request.
|
||||
|
||||
Request contents:
|
||||
|
||||
===================================== ====== ==========================
``ETHTOOL_A_RSS_HEADER``              nested request header
``ETHTOOL_A_RSS_CONTEXT``             u32    context number
===================================== ====== ==========================
|
||||
|
||||
Kernel response contents:
|
||||
|
||||
===================================== ====== ==========================
``ETHTOOL_A_RSS_HEADER``              nested reply header
``ETHTOOL_A_RSS_HFUNC``               u32    RSS hash func
``ETHTOOL_A_RSS_INDIR``               binary Indir table bytes
``ETHTOOL_A_RSS_HKEY``                binary Hash key bytes
===================================== ====== ==========================
|
||||
|
||||
ETHTOOL_A_RSS_HFUNC attribute is a bitmap indicating the hash function
|
||||
being used. Current supported options are toeplitz, xor or crc32.
|
||||
ETHTOOL_A_RSS_INDIR attribute returns RSS indirection table where each byte
|
||||
indicates queue number.
|
||||
|
||||
Request translation
|
||||
===================
|
||||
|
||||
@ -1767,7 +1797,7 @@ are netlink only.
|
||||
``ETHTOOL_GMODULEEEPROM`` ``ETHTOOL_MSG_MODULE_EEPROM_GET``
|
||||
``ETHTOOL_GEEE`` ``ETHTOOL_MSG_EEE_GET``
|
||||
``ETHTOOL_SEEE`` ``ETHTOOL_MSG_EEE_SET``
|
||||
``ETHTOOL_GRSSH`` n/a
|
||||
``ETHTOOL_GRSSH`` ``ETHTOOL_MSG_RSS_GET``
|
||||
``ETHTOOL_SRSSH`` n/a
|
||||
``ETHTOOL_GTUNABLE`` n/a
|
||||
``ETHTOOL_STUNABLE`` n/a
|
||||
|
@ -104,6 +104,7 @@ Contents:
|
||||
switchdev
|
||||
sysfs-tagging
|
||||
tc-actions-env-rules
|
||||
tc-queue-filters
|
||||
tcp-thin
|
||||
team
|
||||
timestamping
|
||||
|
@ -1069,6 +1069,81 @@ tcp_child_ehash_entries - INTEGER
|
||||
|
||||
Default: 0
|
||||
|
||||
tcp_plb_enabled - BOOLEAN
|
||||
If set and the underlying congestion control (e.g. DCTCP) supports
|
||||
and enables PLB feature, TCP PLB (Protective Load Balancing) is
|
||||
enabled. PLB is described in the following paper:
|
||||
https://doi.org/10.1145/3544216.3544226. Based on PLB parameters,
|
||||
upon sensing sustained congestion, TCP triggers a change in
|
||||
flow label field for outgoing IPv6 packets. A change in flow label
|
||||
field potentially changes the path of outgoing packets for switches
|
||||
that use ECMP/WCMP for routing.
|
||||
|
||||
PLB changes socket txhash which results in a change in IPv6 Flow Label
|
||||
field, and is currently a no-op for IPv4 headers. It is possible
|
||||
to apply PLB for IPv4 with other network header fields (e.g. TCP
|
||||
or IPv4 options) or using encapsulation where outer header is used
|
||||
by switches to determine next hop. In either case, further host
|
||||
and switch side changes will be needed.
|
||||
|
||||
When set, PLB assumes that congestion signal (e.g. ECN) is made
|
||||
available and used by congestion control module to estimate a
|
||||
congestion measure (e.g. ce_ratio). PLB needs a congestion measure to
|
||||
make repathing decisions.
|
||||
|
||||
Default: FALSE
|
||||
|
||||
tcp_plb_idle_rehash_rounds - INTEGER
|
||||
Number of consecutive congested rounds (RTT) seen after which
|
||||
a rehash can be performed, given there are no packets in flight.
|
||||
This is referred to as M in PLB paper:
|
||||
https://doi.org/10.1145/3544216.3544226.
|
||||
|
||||
Possible Values: 0 - 31
|
||||
|
||||
Default: 3
|
||||
|
||||
tcp_plb_rehash_rounds - INTEGER
|
||||
Number of consecutive congested rounds (RTT) seen after which
|
||||
a forced rehash can be performed. Be careful when setting this
|
||||
parameter, as a small value increases the risk of retransmissions.
|
||||
This is referred to as N in PLB paper:
|
||||
https://doi.org/10.1145/3544216.3544226.
|
||||
|
||||
Possible Values: 0 - 31
|
||||
|
||||
Default: 12
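The interplay of ``tcp_plb_idle_rehash_rounds`` and ``tcp_plb_rehash_rounds`` can be sketched as follows. This is a simplified Python model based only on the descriptions above, not the kernel implementation. The function name and parameters are hypothetical, and any special treatment of the value 0 is not modeled.

```python
def should_rehash(consec_cong_rounds, packets_in_flight,
                  idle_rehash_rounds=3, rehash_rounds=12):
    """Decide whether PLB may repath after this round (illustrative only)."""
    # Forced rehash: enough consecutive congested rounds, in flight or not.
    forced = consec_cong_rounds >= rehash_rounds
    # Idle rehash: a lower bar, but only when nothing is in flight, so the
    # path change cannot reorder outstanding packets.
    idle = packets_in_flight == 0 and consec_cong_rounds >= idle_rehash_rounds
    return forced or idle
```

With the defaults, a connection that has seen 3 congested rounds rehashes only once it goes idle, while 12 congested rounds trigger a rehash even with data in flight.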
|
||||
|
||||
tcp_plb_suspend_rto_sec - INTEGER
|
||||
Time, in seconds, to suspend PLB in event of an RTO. In order to avoid
|
||||
having PLB repath onto a connectivity "black hole", after an RTO a TCP
|
||||
connection suspends PLB repathing for a random duration between 1x and
|
||||
2x of this parameter. Randomness is added to avoid concurrent rehashing
|
||||
of multiple TCP connections. This should be set corresponding to the
|
||||
amount of time it takes to repair a failed link.
|
||||
|
||||
Possible Values: 0 - 255
|
||||
|
||||
Default: 60
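A minimal sketch of the randomized suspension described above, assuming a uniform draw between 1x and 2x of the configured value. The actual distribution used by the kernel is not stated in this text, and the function name is hypothetical.

```python
import random


def plb_suspend_duration(suspend_rto_sec, rng=random):
    """Seconds to suspend PLB repathing after an RTO (illustrative only)."""
    if not 0 <= suspend_rto_sec <= 255:
        raise ValueError("tcp_plb_suspend_rto_sec must be in 0..255")
    # A random duration between 1x and 2x of the parameter, so that many
    # connections hit by the same failure do not all rehash at once.
    return rng.uniform(suspend_rto_sec, 2 * suspend_rto_sec)
```

With the default of 60, each connection would suspend repathing for somewhere between 60 and 120 seconds.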
|
||||
|
||||
tcp_plb_cong_thresh - INTEGER
|
||||
Fraction of packets marked with congestion over a round (RTT) to
|
||||
tag that round as congested. This is referred to as K in the PLB paper:
|
||||
https://doi.org/10.1145/3544216.3544226.
|
||||
|
||||
The 0-1 fraction range is mapped to 0-256 range to avoid floating
|
||||
point operations. For example, 128 means that if at least 50% of
|
||||
the packets in a round were marked as congested then the round
|
||||
will be tagged as congested.
|
||||
|
||||
Setting threshold to 0 means that PLB repaths every RTT regardless
|
||||
of congestion. This is not intended behavior for PLB and should be
|
||||
used only for experimentation purposes.
|
||||
|
||||
Possible Values: 0 - 256
|
||||
|
||||
Default: 128
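The fixed-point threshold above can be illustrated with integer arithmetic only. This is a sketch under the assumption that the congestion measure is the share of marked packets in a round. The function and parameter names are hypothetical and do not come from the kernel sources.

```python
def round_is_congested(ce_marked, delivered, cong_thresh=128):
    """Tag a round as congested using the 0-256 scaled threshold."""
    if delivered <= 0:
        raise ValueError("a round needs at least one delivered packet")
    if not 0 <= cong_thresh <= 256:
        raise ValueError("tcp_plb_cong_thresh must be in 0..256")
    # Equivalent to ce_marked / delivered >= cong_thresh / 256, written
    # without floating point by cross-multiplying.
    return ce_marked * 256 >= cong_thresh * delivered
```

For example, with the default of 128 a round with 50 marked packets out of 100 is tagged as congested and a round with 49 out of 100 is not. With a threshold of 0 every round is tagged.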
|
||||
|
||||
UDP variables
|
||||
=============
|
||||
|
||||
@ -1102,6 +1177,33 @@ udp_rmem_min - INTEGER
|
||||
udp_wmem_min - INTEGER
|
||||
UDP does not have tx memory accounting and this tunable has no effect.
|
||||
|
||||
udp_hash_entries - INTEGER
|
||||
Show the number of hash buckets for UDP sockets in the current
|
||||
networking namespace.
|
||||
|
||||
A negative value means the networking namespace does not own its
|
||||
hash buckets and shares the initial networking namespace's one.
|
||||
|
||||
udp_child_ehash_entries - INTEGER
|
||||
Control the number of hash buckets for UDP sockets in the child
|
||||
networking namespace, which must be set before clone() or unshare().
|
||||
|
||||
If the value is not 0, the kernel uses a value rounded up to 2^n
|
||||
as the actual hash bucket size. 0 is a special value, meaning
|
||||
the child networking namespace will share the initial networking
|
||||
namespace's hash buckets.
|
||||
|
||||
Note that the child will use the global one in case the kernel
|
||||
fails to allocate enough memory. In addition, the global hash
|
||||
buckets are spread over available NUMA nodes, but the allocation
|
||||
of the child hash table depends on the current process's NUMA
|
||||
policy, which could result in performance differences.
|
||||
|
||||
Possible values: 0, 2^n (n: 7 (128) - 16 (64K))
|
||||
|
||||
Default: 0
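The rounding rule for ``udp_child_ehash_entries`` can be sketched as below. The round-up to 2^n follows the text above. Clamping to the documented 128 to 64K range is an assumption made for illustration, and the function name is hypothetical.

```python
def child_udp_hash_buckets(requested):
    """Return the bucket count for a child netns, or None if shared."""
    if requested < 0:
        raise ValueError("value must not be negative")
    if requested == 0:
        return None  # child shares the initial netns hash buckets
    # Round up to the next power of two.
    size = 1 << (requested - 1).bit_length()
    # Assumed clamp to the documented range 2^7 .. 2^16.
    return min(max(size, 1 << 7), 1 << 16)
```

Under these assumptions, a request of 1000 yields 1024 buckets and a request of 1024 stays at 1024.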
|
||||
|
||||
|
||||
RAW variables
|
||||
=============
|
||||
|
||||
@ -3025,6 +3127,15 @@ ecn_enable - BOOLEAN
|
||||
|
||||
Default: 1
|
||||
|
||||
l3mdev_accept - BOOLEAN
|
||||
Enabling this option allows a "global" bound socket to work
|
||||
across L3 master domains (e.g., VRFs) with packets capable of
|
||||
being received regardless of the L3 domain in which they
|
||||
originated. Only valid when the kernel was compiled with
|
||||
CONFIG_NET_L3_MASTER_DEV.
|
||||
|
||||
Default: 1 (enabled)
|
||||
|
||||
|
||||
``/proc/sys/net/core/*``
|
||||
========================
|
||||
|
@ -129,6 +129,26 @@ drop_packet - INTEGER
|
||||
threshold. When the mode 3 is set, the always mode drop rate
|
||||
is controlled by the /proc/sys/net/ipv4/vs/am_droprate.
|
||||
|
||||
est_cpulist - CPULIST
|
||||
Allowed CPUs for estimation kthreads
|
||||
|
||||
Syntax: standard cpulist format
|
||||
empty list - stop kthread tasks and estimation
|
||||
default - the system's housekeeping CPUs for kthreads
|
||||
|
||||
Example:
|
||||
"all": all possible CPUs
|
||||
"0-N": all possible CPUs, N denotes last CPU number
|
||||
"0,1-N:1/2": first and all CPUs with odd number
|
||||
"": empty list
|
||||
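To make the cpulist examples above concrete, here is a small Python parser for the subset of the format they use (single CPUs, ranges, the ``N`` placeholder, ``all`` and the ``:used/group`` pattern). It is a sketch for reading the examples, not the kernel's parser. The function name and the ``nr_cpus`` parameter are hypothetical.

```python
def parse_cpulist(text, nr_cpus):
    """Expand a cpulist string into a sorted list of CPU numbers."""
    text = text.strip()
    if not text:
        return []  # empty list: no CPUs allowed
    if text == "all":
        return list(range(nr_cpus))
    cpus = set()
    for chunk in text.split(","):
        # "N" stands for the last CPU number.
        chunk = chunk.replace("N", str(nr_cpus - 1))
        span, _, pattern = chunk.partition(":")
        first, _, last = span.partition("-")
        start = int(first)
        end = int(last) if last else start
        if start > end or end >= nr_cpus:
            raise ValueError("bad CPU range: " + chunk)
        if pattern:
            # "used/group": take the first `used` CPUs of every `group`.
            used, group = (int(x) for x in pattern.split("/"))
        else:
            used, group = 1, 1
        for cpu in range(start, end + 1):
            if (cpu - start) % group < used:
                cpus.add(cpu)
    return sorted(cpus)
```

On an 8-CPU system, ``parse_cpulist("0,1-N:1/2", 8)`` selects CPU 0 plus the odd-numbered CPUs 1, 3, 5 and 7.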
|
||||
est_nice - INTEGER
|
||||
default 0
|
||||
Valid range: -20 (more favorable) .. 19 (less favorable)
|
||||
|
||||
Niceness value to use for the estimation kthreads (scheduling
|
||||
priority)
|
||||
|
||||
expire_nodest_conn - BOOLEAN
|
||||
- 0 - disabled (default)
|
||||
- not 0 - enabled
|
||||
@ -304,8 +324,8 @@ run_estimation - BOOLEAN
|
||||
0 - disabled
|
||||
not 0 - enabled (default)
|
||||
|
||||
If disabled, the estimation will be stop, and you can't see
|
||||
any update on speed estimation data.
|
||||
If disabled, the estimation will be suspended and kthread tasks
|
||||
stopped.
|
||||
|
||||
You can always re-enable estimation by setting this value to 1.
|
||||
But be careful, the first estimation after re-enable is not
|
||||
|
37
Documentation/networking/tc-queue-filters.rst
Normal file
@ -0,0 +1,37 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0

=========================
TC queue based filtering
=========================

TC can be used for directing traffic to either a set of queues or
to a single queue on both the transmit and receive side.

On the transmit side:

1) TC filter directing traffic to a set of queues is achieved
   using the action skbedit priority for Tx priority selection,
   the priority maps to a traffic class (set of queues) when
   the queue-sets are configured using mqprio.

2) TC filter directs traffic to a transmit queue with the action
   skbedit queue_mapping $tx_qid. The action skbedit queue_mapping
   for transmit queue is executed in software only and cannot be
   offloaded.

Likewise, on the receive side, the two filters for selecting a set of
queues and/or a single queue are supported as below:

1) TC flower filter directs incoming traffic to a set of queues using
   the 'hw_tc' option.
   hw_tc $TCID - Specify a hardware traffic class to pass matching
   packets on to. TCID is in the range 0 through 15.

2) TC filter with action skbedit queue_mapping $rx_qid selects a
   receive queue. The action skbedit queue_mapping for receive queue
   is supported only in hardware. Multiple filters may compete in
   the hardware for queue selection. In such case, the hardware
   pipeline resolves conflicts based on priority. On Intel E810
   devices, a TC filter directing traffic to a queue has higher
   priority than a flow director filter assigning a queue. The hash
   filter has the lowest priority.
|
@ -179,7 +179,8 @@ SOF_TIMESTAMPING_OPT_ID:
|
||||
identifier and returns that along with the timestamp. The identifier
|
||||
is derived from a per-socket u32 counter (that wraps). For datagram
|
||||
sockets, the counter increments with each sent packet. For stream
|
||||
sockets, it increments with every byte.
|
||||
sockets, it increments with every byte. For stream sockets, also set
|
||||
SOF_TIMESTAMPING_OPT_ID_TCP, see the section below.
|
||||
|
||||
The counter starts at zero. It is initialized the first time that
|
||||
the socket option is enabled. It is reset each time the option is
|
||||
@ -192,6 +193,35 @@ SOF_TIMESTAMPING_OPT_ID:
|
||||
among all possibly concurrently outstanding timestamp requests for
|
||||
that socket.
|
||||
|
||||
SOF_TIMESTAMPING_OPT_ID_TCP:
|
||||
Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
|
||||
timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
|
||||
counter increments for stream sockets, but its starting point is
|
||||
not entirely trivial. This option fixes that.
|
||||
|
||||
For stream sockets, if SOF_TIMESTAMPING_OPT_ID is set, this should
|
||||
always be set too. On datagram sockets the option has no effect.
|
||||
|
||||
A reasonable expectation is that the counter is reset to zero with
|
||||
the system call, so that a subsequent write() of N bytes generates
|
||||
a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
|
||||
implements this behavior under all conditions.
|
||||
|
||||
SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
|
||||
especially when the socket option is set when no data is in
|
||||
transmission. If data is being transmitted, it may be off by the
|
||||
length of the output queue (SIOCOUTQ).
|
||||
|
||||
The difference is due to being based on snd_una versus write_seq.
|
||||
snd_una is the offset in the stream acknowledged by the peer. This
|
||||
depends on factors outside of process control, such as network RTT.
|
||||
write_seq is the last byte written by the process. This offset is
|
||||
not affected by external inputs.
|
||||
|
||||
The difference is subtle and unlikely to be noticed when configured
|
||||
at initial socket creation, when no data is queued or sent. But
|
||||
SOF_TIMESTAMPING_OPT_ID_TCP behavior is more robust regardless of
|
||||
when the socket option is set.
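The effect of the two counter bases can be shown with a tiny Python model. It is based only on the description above (``snd_una`` versus ``write_seq`` as the starting point) and ignores u32 wraparound. The function name and arguments are hypothetical, and the model does not call any socket API.

```python
def reported_tskey(write_seq, snd_una, nbytes, opt_id_tcp):
    """Counter reported for the last byte of a write() of nbytes bytes.

    write_seq and snd_una are the stream offsets at the moment the
    timestamping option is enabled; the write() follows immediately.
    """
    # The base of the counter is fixed when the option is enabled.
    base = write_seq if opt_id_tcp else snd_una
    last_byte = write_seq + nbytes - 1
    return last_byte - base
```

With an empty output queue (``write_seq == snd_una``) both variants report N-1 for a write of N bytes. With 500 bytes still unacknowledged, the model gives 99 for a 100-byte write when ``opt_id_tcp`` is set and 599 without it, which is the "off by the length of the output queue" case.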
|
||||
|
||||
SOF_TIMESTAMPING_OPT_CMSG:
|
||||
Support recv() cmsg for all timestamped packets. Control messages
|
||||
|
@ -5,6 +5,7 @@ XFRM device - offloading the IPsec computations
|
||||
===============================================
|
||||
|
||||
Shannon Nelson <shannon.nelson@oracle.com>
|
||||
Leon Romanovsky <leonro@nvidia.com>
|
||||
|
||||
|
||||
Overview
|
||||
@ -18,10 +19,21 @@ can radically increase throughput and decrease CPU utilization. The XFRM
|
||||
Device interface allows NIC drivers to offer to the stack access to the
|
||||
hardware offload.
|
||||
|
||||
Right now, there are two types of hardware offload that kernel supports.
|
||||
* IPsec crypto offload:
|
||||
* NIC performs encrypt/decrypt
|
||||
* Kernel does everything else
|
||||
* IPsec packet offload:
|
||||
* NIC performs encrypt/decrypt
|
||||
* NIC does encapsulation
|
||||
* Kernel and NIC have SA and policy in-sync
|
||||
* NIC handles the SA and policies states
|
||||
* The Kernel talks to the keymanager
|
||||
|
||||
Userland access to the offload is typically through a system such as
|
||||
libreswan or KAME/raccoon, but the iproute2 'ip xfrm' command set can
|
||||
be handy when experimenting. An example command might look something
|
||||
like this::
|
||||
like this for crypto offload:
|
||||
|
||||
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
|
||||
reqid 0x07 replay-window 32 \
|
||||
@ -29,6 +41,17 @@ like this::
|
||||
sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
|
||||
offload dev eth4 dir in
|
||||
|
||||
and for packet offload
|
||||
|
||||
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
|
||||
reqid 0x07 replay-window 32 \
|
||||
aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
|
||||
sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
|
||||
offload packet dev eth4 dir in
|
||||
|
||||
ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in
|
||||
tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport
|
||||
|
||||
Yes, that's ugly, but that's what shell scripts and/or libreswan are for.
|
||||
|
||||
|
||||
@ -40,17 +63,24 @@ Callbacks to implement
|
||||
|
||||
/* from include/linux/netdevice.h */
|
||||
struct xfrmdev_ops {
|
||||
/* Crypto and Packet offload callbacks */
|
||||
int (*xdo_dev_state_add) (struct xfrm_state *x);
|
||||
void (*xdo_dev_state_delete) (struct xfrm_state *x);
|
||||
void (*xdo_dev_state_free) (struct xfrm_state *x);
|
||||
bool (*xdo_dev_offload_ok) (struct sk_buff *skb,
|
||||
struct xfrm_state *x);
|
||||
void (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
|
||||
|
||||
/* Solely packet offload callbacks */
|
||||
void (*xdo_dev_state_update_curlft) (struct xfrm_state *x);
|
||||
int (*xdo_dev_policy_add) (struct xfrm_policy *x);
|
||||
void (*xdo_dev_policy_delete) (struct xfrm_policy *x);
|
||||
void (*xdo_dev_policy_free) (struct xfrm_policy *x);
|
||||
};
|
||||
|
||||
The NIC driver offering ipsec offload will need to implement these
|
||||
callbacks to make the offload available to the network stack's
|
||||
XFRM subsystem. Additionally, the feature bits NETIF_F_HW_ESP and
|
||||
The NIC driver offering ipsec offload will need to implement callbacks
|
||||
relevant to supported offload to make the offload available to the network
|
||||
stack's XFRM subsystem. Additionally, the feature bits NETIF_F_HW_ESP and
|
||||
NETIF_F_HW_ESP_TX_CSUM will signal the availability of the offload.
|
||||
|
||||
|
||||
@ -79,7 +109,8 @@ and an indication of whether it is for Rx or Tx. The driver should
|
||||
|
||||
=========== ===================================
|
||||
0 success
|
||||
-EOPNOTSUPP offload not supported, try SW IPsec
|
||||
-EOPNOTSUPP offload not supported, try SW IPsec,
|
||||
not applicable for packet offload mode
|
||||
other fail the request
|
||||
=========== ===================================
|
||||
|
||||
@ -96,6 +127,7 @@ will serviceable. This can check the packet information to be sure the
|
||||
offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc) and
|
||||
return true or false to signify its support.
|
||||
|
||||
Crypto offload mode:
|
||||
When ready to send, the driver needs to inspect the Tx packet for the
|
||||
offload information, including the opaque context, and set up the packet
|
||||
send accordingly::
|
||||
@ -139,13 +171,25 @@ the stack in xfrm_input().
|
||||
In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
|
||||
Driver will check packet seq number and update HW ESN state machine if needed.
|
||||
|
||||
Packet offload mode:
|
||||
HW adds and deletes XFRM headers. So in RX path, XFRM stack is bypassed if HW
|
||||
reported success. In TX path, the packet leaves the kernel without extra header
|
||||
and not encrypted; the HW is responsible for performing both.
|
||||
|
||||
When the SA is removed by the user, the driver's xdo_dev_state_delete()
|
||||
is asked to disable the offload. Later, xdo_dev_state_free() is called
|
||||
from a garbage collection routine after all reference counts to the state
|
||||
and xdo_dev_policy_delete() are asked to disable the offload. Later,
|
||||
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
|
||||
collection routine after all reference counts to the state and policy
|
||||
have been removed and any remaining resources can be cleared for the
|
||||
offload state. How these are used by the driver will depend on specific
|
||||
hardware needs.
|
||||
|
||||
As a netdev is set to DOWN the XFRM stack's netdev listener will call
|
||||
xdo_dev_state_delete() and xdo_dev_state_free() on any remaining offloaded
|
||||
states.
|
||||
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
|
||||
xdo_dev_policy_free() on any remaining offloaded states.
|
||||
|
||||
Because the HW handles the packets, the XFRM core can't count hard and soft limits.
|
||||
The HW/driver is responsible for tracking them and providing accurate data when
|
||||
xdo_dev_state_update_curlft() is called. If one of these limits
|
||||
is reached, the driver needs to call xfrm_state_check_expire() to make sure
|
||||
that XFRM performs the rekeying sequence.
|
||||
|
23
MAINTAINERS
@ -1932,6 +1932,7 @@ F: Documentation/devicetree/bindings/interrupt-controller/apple,*
|
||||
F: Documentation/devicetree/bindings/iommu/apple,dart.yaml
|
||||
F: Documentation/devicetree/bindings/iommu/apple,sart.yaml
|
||||
F: Documentation/devicetree/bindings/mailbox/apple,mailbox.yaml
|
||||
F: Documentation/devicetree/bindings/net/bluetooth/brcm,bcm4377-bluetooth.yaml
|
||||
F: Documentation/devicetree/bindings/nvme/apple,nvme-ans.yaml
|
||||
F: Documentation/devicetree/bindings/nvmem/apple,efuses.yaml
|
||||
F: Documentation/devicetree/bindings/pci/apple,pcie.yaml
|
||||
@ -1939,6 +1940,7 @@ F: Documentation/devicetree/bindings/pinctrl/apple,pinctrl.yaml
|
||||
F: Documentation/devicetree/bindings/power/apple*
|
||||
F: Documentation/devicetree/bindings/watchdog/apple,wdt.yaml
|
||||
F: arch/arm64/boot/dts/apple/
|
||||
F: drivers/bluetooth/hci_bcm4377.c
|
||||
F: drivers/clk/clk-apple-nco.c
|
||||
F: drivers/cpufreq/apple-soc-cpufreq.c
|
||||
F: drivers/dma/apple-admac.c
|
||||
@ -2470,6 +2472,7 @@ L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
|
||||
S: Supported
|
||||
T: git git://github.com/microchip-ung/linux-upstream.git
|
||||
F: arch/arm64/boot/dts/microchip/
|
||||
F: drivers/net/ethernet/microchip/vcap/
|
||||
F: drivers/pinctrl/pinctrl-microchip-sgpio.c
|
||||
N: sparx5
|
||||
|
||||
@ -6362,6 +6365,7 @@ F: drivers/net/ethernet/freescale/dpaa2/Kconfig
|
||||
F: drivers/net/ethernet/freescale/dpaa2/Makefile
|
||||
F: drivers/net/ethernet/freescale/dpaa2/dpaa2-eth*
|
||||
F: drivers/net/ethernet/freescale/dpaa2/dpaa2-mac*
|
||||
F: drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk*
|
||||
F: drivers/net/ethernet/freescale/dpaa2/dpkg.h
|
||||
F: drivers/net/ethernet/freescale/dpaa2/dpmac*
|
||||
F: drivers/net/ethernet/freescale/dpaa2/dpni*
|
||||
@ -7734,6 +7738,7 @@ ETAS ES58X CAN/USB DRIVER
|
||||
M: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
|
||||
L: linux-can@vger.kernel.org
|
||||
S: Maintained
|
||||
F: Documentation/networking/devlink/etas_es58x.rst
|
||||
F: drivers/net/can/usb/etas_es58x/
|
||||
|
||||
ETHERNET BRIDGE
|
||||
@ -8236,7 +8241,10 @@ S: Maintained
|
||||
F: drivers/i2c/busses/i2c-cpm.c
|
||||
|
||||
FREESCALE IMX / MXC FEC DRIVER
|
||||
M: Joakim Zhang <qiangqing.zhang@nxp.com>
|
||||
M: Wei Fang <wei.fang@nxp.com>
|
||||
R: Shenwei Wang <shenwei.wang@nxp.com>
|
||||
R: Clark Wang <xiaoning.wang@nxp.com>
|
||||
R: NXP Linux Team <linux-imx@nxp.com>
|
||||
L: netdev@vger.kernel.org
|
||||
S: Maintained
|
||||
F: Documentation/devicetree/bindings/net/fsl,fec.yaml
|
||||
@ -9493,8 +9501,9 @@ F: Documentation/devicetree/bindings/iio/humidity/st,hts221.yaml
|
||||
F: drivers/iio/humidity/hts221*
|
||||
|
||||
HUAWEI ETHERNET DRIVER
|
||||
M: Cai Huoqing <cai.huoqing@linux.dev>
|
||||
L: netdev@vger.kernel.org
|
||||
S: Orphan
|
||||
S: Maintained
|
||||
F: Documentation/networking/device_drivers/ethernet/huawei/hinic.rst
|
||||
F: drivers/net/ethernet/huawei/hinic/
|
||||
|
||||
@ -9597,6 +9606,7 @@ F: include/asm-generic/hyperv-tlfs.h
|
||||
F: include/asm-generic/mshyperv.h
|
||||
F: include/clocksource/hyperv_timer.h
|
||||
F: include/linux/hyperv.h
|
||||
F: include/net/mana
|
||||
F: include/uapi/linux/hyperv.h
|
||||
F: net/vmw_vsock/hyperv_transport.c
|
||||
F: tools/hv/
|
||||
@ -12410,7 +12420,7 @@ M: Marcin Wojtas <mw@semihalf.com>
|
||||
M: Russell King <linux@armlinux.org.uk>
|
||||
L: netdev@vger.kernel.org
|
||||
S: Maintained
|
||||
F: Documentation/devicetree/bindings/net/marvell-pp2.txt
|
||||
F: Documentation/devicetree/bindings/net/marvell,pp2.yaml
|
||||
F: drivers/net/ethernet/marvell/mvpp2/
|
||||
|
||||
MARVELL MWIFIEX WIRELESS DRIVER
|
||||
@@ -12458,7 +12468,7 @@ F: Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
|
||||
F: drivers/net/ethernet/marvell/octeontx2/af/
|
||||
|
||||
MARVELL PRESTERA ETHERNET SWITCH DRIVER
|
||||
M: Taras Chornyi <tchornyi@marvell.com>
|
||||
M: Taras Chornyi <taras.chornyi@plvision.eu>
|
||||
S: Supported
|
||||
W: https://github.com/Marvell-switching/switchdev-prestera
|
||||
F: drivers/net/ethernet/marvell/prestera/
|
||||
@@ -13017,6 +13027,7 @@ M: Felix Fietkau <nbd@nbd.name>
|
||||
M: John Crispin <john@phrozen.org>
|
||||
M: Sean Wang <sean.wang@mediatek.com>
|
||||
M: Mark Lee <Mark-MC.Lee@mediatek.com>
|
||||
M: Lorenzo Bianconi <lorenzo@kernel.org>
|
||||
L: netdev@vger.kernel.org
|
||||
S: Maintained
|
||||
F: drivers/net/ethernet/mediatek/
|
||||
@@ -14048,6 +14059,7 @@ F: include/uapi/linux/meye.h
|
||||
|
||||
MOTORCOMM PHY DRIVER
|
||||
M: Peter Geis <pgwipeout@gmail.com>
|
||||
M: Frank <Frank.Sae@motor-comm.com>
|
||||
L: netdev@vger.kernel.org
|
||||
S: Maintained
|
||||
F: drivers/net/phy/motorcomm.c
|
||||
@@ -19175,7 +19187,7 @@ M: Jassi Brar <jaswinder.singh@linaro.org>
|
||||
M: Ilias Apalodimas <ilias.apalodimas@linaro.org>
|
||||
L: netdev@vger.kernel.org
|
||||
S: Maintained
|
||||
F: Documentation/devicetree/bindings/net/socionext-netsec.txt
|
||||
F: Documentation/devicetree/bindings/net/socionext,synquacer-netsec.yaml
|
||||
F: drivers/net/ethernet/socionext/netsec.c
|
||||
|
||||
SOCIONEXT (SNI) Synquacer SPI DRIVER
|
||||
@@ -20828,7 +20840,6 @@ W: https://wireless.wiki.kernel.org/en/users/Drivers/wl12xx
|
||||
W: https://wireless.wiki.kernel.org/en/users/Drivers/wl1251
|
||||
T: git git://git.kernel.org/pub/scm/linux/kernel/git/luca/wl12xx.git
|
||||
F: drivers/net/wireless/ti/
|
||||
F: include/linux/wl12xx.h
|
||||
|
||||
TIMEKEEPING, CLOCKSOURCE CORE, NTP, ALARMTIMER
|
||||
M: John Stultz <jstultz@google.com>
|
||||
|
@@ -178,6 +178,8 @@
|
||||
|
||||
/* Network controller */
|
||||
ethernet: ethernet@f0000 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
compatible = "marvell,armada-375-pp2";
|
||||
reg = <0xf0000 0xa000>, /* Packet Processor regs */
|
||||
<0xc0000 0x3060>, /* LMS regs */
|
||||
@@ -187,15 +189,17 @@
|
||||
clock-names = "pp_clk", "gop_clk";
|
||||
status = "disabled";
|
||||
|
||||
eth0: eth0 {
|
||||
eth0: ethernet-port@0 {
|
||||
interrupts = <GIC_SPI 37 IRQ_TYPE_LEVEL_HIGH>;
|
||||
port-id = <0>;
|
||||
reg = <0>;
|
||||
port-id = <0>; /* For backward compatibility. */
|
||||
status = "disabled";
|
||||
};
|
||||
|
||||
eth1: eth1 {
|
||||
eth1: ethernet-port@1 {
|
||||
interrupts = <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>;
|
||||
port-id = <1>;
|
||||
reg = <1>;
|
||||
port-id = <1>; /* For backward compatibility. */
|
||||
status = "disabled";
|
||||
};
|
||||
};
|
||||
|
@@ -10,7 +10,6 @@
|
||||
#include <linux/init.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/of_platform.h>
|
||||
#include <linux/wl12xx.h>
|
||||
#include <linux/mmc/card.h>
|
||||
#include <linux/mmc/host.h>
|
||||
#include <linux/power/smartreflex.h>
|
||||
|
@@ -21,6 +21,10 @@
|
||||
};
|
||||
};
|
||||
|
||||
&bluetooth0 {
|
||||
brcm,board-type = "apple,atlantisb";
|
||||
};
|
||||
|
||||
&wifi0 {
|
||||
brcm,board-type = "apple,atlantisb";
|
||||
};
|
||||
|
@@ -17,6 +17,10 @@
|
||||
model = "Apple MacBook Pro (13-inch, M1, 2020)";
|
||||
};
|
||||
|
||||
&bluetooth0 {
|
||||
brcm,board-type = "apple,honshu";
|
||||
};
|
||||
|
||||
&wifi0 {
|
||||
brcm,board-type = "apple,honshu";
|
||||
};
|
||||
|
@@ -17,6 +17,10 @@
|
||||
model = "Apple MacBook Air (M1, 2020)";
|
||||
};
|
||||
|
||||
&bluetooth0 {
|
||||
brcm,board-type = "apple,shikoku";
|
||||
};
|
||||
|
||||
&wifi0 {
|
||||
brcm,board-type = "apple,shikoku";
|
||||
};
|
||||
|
@@ -21,6 +21,10 @@
|
||||
};
|
||||
};
|
||||
|
||||
&bluetooth0 {
|
||||
brcm,board-type = "apple,capri";
|
||||
};
|
||||
|
||||
&wifi0 {
|
||||
brcm,board-type = "apple,capri";
|
||||
};
|
||||
|
@@ -21,6 +21,10 @@
|
||||
};
|
||||
};
|
||||
|
||||
&bluetooth0 {
|
||||
brcm,board-type = "apple,santorini";
|
||||
};
|
||||
|
||||
&wifi0 {
|
||||
brcm,board-type = "apple,santorini";
|
||||
};
|
||||
|
@@ -11,6 +11,7 @@
|
||||
|
||||
/ {
|
||||
aliases {
|
||||
bluetooth0 = &bluetooth0;
|
||||
serial0 = &serial0;
|
||||
serial2 = &serial2;
|
||||
wifi0 = &wifi0;
|
||||
@@ -77,6 +78,13 @@
|
||||
local-mac-address = [00 00 00 00 00 00];
|
||||
apple,antenna-sku = "XX";
|
||||
};
|
||||
|
||||
bluetooth0: bluetooth@0,1 {
|
||||
compatible = "pci14e4,5f69";
|
||||
reg = <0x10100 0x0 0x0 0x0 0x0>;
|
||||
/* To be filled by the loader */
|
||||
local-bd-address = [00 00 00 00 00 00];
|
||||
};
|
||||
};
|
||||
|
||||
&nco_clkref {
|
||||
|
@@ -24,9 +24,12 @@
|
||||
|
||||
/* these aliases provide the FMan ports mapping */
|
||||
enet0: ethernet@e0000 {
|
||||
pcs-handle-names = "qsgmii";
|
||||
};
|
||||
|
||||
enet1: ethernet@e2000 {
|
||||
pcsphy-handle = <&pcsphy1>, <&qsgmiib_pcs1>;
|
||||
pcs-handle-names = "sgmii", "qsgmii";
|
||||
};
|
||||
|
||||
enet2: ethernet@e4000 {
|
||||
@@ -36,11 +39,32 @@
|
||||
};
|
||||
|
||||
enet4: ethernet@e8000 {
|
||||
pcsphy-handle = <&pcsphy4>, <&qsgmiib_pcs2>;
|
||||
pcs-handle-names = "sgmii", "qsgmii";
|
||||
};
|
||||
|
||||
enet5: ethernet@ea000 {
|
||||
pcsphy-handle = <&pcsphy5>, <&qsgmiib_pcs3>;
|
||||
pcs-handle-names = "sgmii", "qsgmii";
|
||||
};
|
||||
|
||||
enet6: ethernet@f0000 {
|
||||
};
|
||||
|
||||
mdio@e1000 {
|
||||
qsgmiib_pcs1: ethernet-pcs@1 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <0x1>;
|
||||
};
|
||||
|
||||
qsgmiib_pcs2: ethernet-pcs@2 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <0x2>;
|
||||
};
|
||||
|
||||
qsgmiib_pcs3: ethernet-pcs@3 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <0x3>;
|
||||
};
|
||||
};
|
||||
};
|
||||
|
@@ -23,6 +23,8 @@
|
||||
&fman0 {
|
||||
/* these aliases provide the FMan ports mapping */
|
||||
enet0: ethernet@e0000 {
|
||||
pcsphy-handle = <&qsgmiib_pcs3>;
|
||||
pcs-handle-names = "qsgmii";
|
||||
};
|
||||
|
||||
enet1: ethernet@e2000 {
|
||||
@@ -35,14 +37,37 @@
|
||||
};
|
||||
|
||||
enet4: ethernet@e8000 {
|
||||
pcsphy-handle = <&pcsphy4>, <&qsgmiib_pcs1>;
|
||||
pcs-handle-names = "sgmii", "qsgmii";
|
||||
};
|
||||
|
||||
enet5: ethernet@ea000 {
|
||||
pcsphy-handle = <&pcsphy5>, <&pcsphy5>;
|
||||
pcs-handle-names = "sgmii", "qsgmii";
|
||||
};
|
||||
|
||||
enet6: ethernet@f0000 {
|
||||
};
|
||||
|
||||
enet7: ethernet@f2000 {
|
||||
pcsphy-handle = <&pcsphy7>, <&qsgmiib_pcs2>, <&pcsphy7>;
|
||||
pcs-handle-names = "sgmii", "qsgmii", "xfi";
|
||||
};
|
||||
|
||||
mdio@eb000 {
|
||||
qsgmiib_pcs1: ethernet-pcs@1 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <0x1>;
|
||||
};
|
||||
|
||||
qsgmiib_pcs2: ethernet-pcs@2 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <0x2>;
|
||||
};
|
||||
|
||||
qsgmiib_pcs3: ethernet-pcs@3 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <0x3>;
|
||||
};
|
||||
};
|
||||
};
|
||||
|
@@ -58,6 +58,8 @@
|
||||
ranges = <0x0 0x0 ADDRESSIFY(CP11X_BASE) 0x2000000>;
|
||||
|
||||
CP11X_LABEL(ethernet): ethernet@0 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
compatible = "marvell,armada-7k-pp22";
|
||||
reg = <0x0 0x100000>, <0x129000 0xb000>, <0x220000 0x800>;
|
||||
clocks = <&CP11X_LABEL(clk) 1 3>, <&CP11X_LABEL(clk) 1 9>,
|
||||
@@ -69,7 +71,7 @@
|
||||
status = "disabled";
|
||||
dma-coherent;
|
||||
|
||||
CP11X_LABEL(eth0): eth0 {
|
||||
CP11X_LABEL(eth0): ethernet-port@0 {
|
||||
interrupts = <39 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<43 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<47 IRQ_TYPE_LEVEL_HIGH>,
|
||||
@@ -83,12 +85,13 @@
|
||||
interrupt-names = "hif0", "hif1", "hif2",
|
||||
"hif3", "hif4", "hif5", "hif6", "hif7",
|
||||
"hif8", "link";
|
||||
port-id = <0>;
|
||||
reg = <0>;
|
||||
port-id = <0>; /* For backward compatibility. */
|
||||
gop-port-id = <0>;
|
||||
status = "disabled";
|
||||
};
|
||||
|
||||
CP11X_LABEL(eth1): eth1 {
|
||||
CP11X_LABEL(eth1): ethernet-port@1 {
|
||||
interrupts = <40 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<44 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<48 IRQ_TYPE_LEVEL_HIGH>,
|
||||
@@ -102,12 +105,13 @@
|
||||
interrupt-names = "hif0", "hif1", "hif2",
|
||||
"hif3", "hif4", "hif5", "hif6", "hif7",
|
||||
"hif8", "link";
|
||||
port-id = <1>;
|
||||
reg = <1>;
|
||||
port-id = <1>; /* For backward compatibility. */
|
||||
gop-port-id = <2>;
|
||||
status = "disabled";
|
||||
};
|
||||
|
||||
CP11X_LABEL(eth2): eth2 {
|
||||
CP11X_LABEL(eth2): ethernet-port@2 {
|
||||
interrupts = <41 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<45 IRQ_TYPE_LEVEL_HIGH>,
|
||||
<49 IRQ_TYPE_LEVEL_HIGH>,
|
||||
@@ -121,7 +125,8 @@
|
||||
interrupt-names = "hif0", "hif1", "hif2",
|
||||
"hif3", "hif4", "hif5", "hif6", "hif7",
|
||||
"hif8", "link";
|
||||
port-id = <2>;
|
||||
reg = <2>;
|
||||
port-id = <2>; /* For backward compatibility. */
|
||||
gop-port-id = <3>;
|
||||
status = "disabled";
|
||||
};
|
||||
|
@@ -77,6 +77,47 @@
|
||||
no-map;
|
||||
reg = <0 0x4fc00000 0 0x00100000>;
|
||||
};
|
||||
|
||||
wo_emi0: wo-emi@4fd00000 {
|
||||
reg = <0 0x4fd00000 0 0x40000>;
|
||||
no-map;
|
||||
};
|
||||
|
||||
wo_emi1: wo-emi@4fd40000 {
|
||||
reg = <0 0x4fd40000 0 0x40000>;
|
||||
no-map;
|
||||
};
|
||||
|
||||
wo_ilm0: wo-ilm@151e0000 {
|
||||
reg = <0 0x151e0000 0 0x8000>;
|
||||
no-map;
|
||||
};
|
||||
|
||||
wo_ilm1: wo-ilm@151f0000 {
|
||||
reg = <0 0x151f0000 0 0x8000>;
|
||||
no-map;
|
||||
};
|
||||
|
||||
wo_data: wo-data@4fd80000 {
|
||||
reg = <0 0x4fd80000 0 0x240000>;
|
||||
no-map;
|
||||
};
|
||||
|
||||
wo_dlm0: wo-dlm@151e8000 {
|
||||
reg = <0 0x151e8000 0 0x2000>;
|
||||
no-map;
|
||||
};
|
||||
|
||||
wo_dlm1: wo-dlm@151f8000 {
|
||||
reg = <0 0x151f8000 0 0x2000>;
|
||||
no-map;
|
||||
};
|
||||
|
||||
wo_boot: wo-boot@15194000 {
|
||||
reg = <0 0x15194000 0 0x1000>;
|
||||
no-map;
|
||||
};
|
||||
|
||||
};
|
||||
|
||||
timer {
|
||||
@@ -298,6 +339,11 @@
|
||||
reg = <0 0x15010000 0 0x1000>;
|
||||
interrupt-parent = <&gic>;
|
||||
interrupts = <GIC_SPI 205 IRQ_TYPE_LEVEL_HIGH>;
|
||||
memory-region = <&wo_emi0>, <&wo_ilm0>, <&wo_dlm0>,
|
||||
<&wo_data>, <&wo_boot>;
|
||||
memory-region-names = "wo-emi", "wo-ilm", "wo-dlm",
|
||||
"wo-data", "wo-boot";
|
||||
mediatek,wo-ccif = <&wo_ccif0>;
|
||||
};
|
||||
|
||||
wed1: wed@15011000 {
|
||||
@@ -306,6 +352,25 @@
|
||||
reg = <0 0x15011000 0 0x1000>;
|
||||
interrupt-parent = <&gic>;
|
||||
interrupts = <GIC_SPI 206 IRQ_TYPE_LEVEL_HIGH>;
|
||||
memory-region = <&wo_emi1>, <&wo_ilm1>, <&wo_dlm1>,
|
||||
<&wo_data>, <&wo_boot>;
|
||||
memory-region-names = "wo-emi", "wo-ilm", "wo-dlm",
|
||||
"wo-data", "wo-boot";
|
||||
mediatek,wo-ccif = <&wo_ccif1>;
|
||||
};
|
||||
|
||||
wo_ccif0: syscon@151a5000 {
|
||||
compatible = "mediatek,mt7986-wo-ccif", "syscon";
|
||||
reg = <0 0x151a5000 0 0x1000>;
|
||||
interrupt-parent = <&gic>;
|
||||
interrupts = <GIC_SPI 211 IRQ_TYPE_LEVEL_HIGH>;
|
||||
};
|
||||
|
||||
wo_ccif1: syscon@151ad000 {
|
||||
compatible = "mediatek,mt7986-wo-ccif", "syscon";
|
||||
reg = <0 0x151ad000 0 0x1000>;
|
||||
interrupt-parent = <&gic>;
|
||||
interrupts = <GIC_SPI 212 IRQ_TYPE_LEVEL_HIGH>;
|
||||
};
|
||||
|
||||
eth: ethernet@15100000 {
|
||||
|
@@ -1649,13 +1649,8 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
|
||||
struct bpf_prog *p = l->link.prog;
|
||||
int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
|
||||
|
||||
if (p->aux->sleepable) {
|
||||
enter_prog = (u64)__bpf_prog_enter_sleepable;
|
||||
exit_prog = (u64)__bpf_prog_exit_sleepable;
|
||||
} else {
|
||||
enter_prog = (u64)__bpf_prog_enter;
|
||||
exit_prog = (u64)__bpf_prog_exit;
|
||||
}
|
||||
enter_prog = (u64)bpf_trampoline_enter(p);
|
||||
exit_prog = (u64)bpf_trampoline_exit(p);
|
||||
|
||||
if (l->cookie == 0) {
|
||||
/* if cookie is zero, one instruction is enough to store it */
|
||||
|
@@ -284,7 +284,6 @@ CONFIG_IXGB=m
|
||||
CONFIG_SKGE=m
|
||||
CONFIG_SKY2=m
|
||||
CONFIG_MYRI10GE=m
|
||||
CONFIG_FEALNX=m
|
||||
CONFIG_NATSEMI=m
|
||||
CONFIG_NS83820=m
|
||||
CONFIG_S2IO=m
|
||||
|
@@ -55,7 +55,8 @@ fman@400000 {
|
||||
reg = <0xe0000 0x1000>;
|
||||
fsl,fman-ports = <&fman0_rx_0x08 &fman0_tx_0x28>;
|
||||
ptp-timer = <&ptp_timer0>;
|
||||
pcsphy-handle = <&pcsphy0>;
|
||||
pcsphy-handle = <&pcsphy0>, <&pcsphy0>;
|
||||
pcs-handle-names = "sgmii", "qsgmii";
|
||||
};
|
||||
|
||||
mdio@e1000 {
|
||||
|
@@ -52,7 +52,15 @@ fman@400000 {
|
||||
compatible = "fsl,fman-memac";
|
||||
reg = <0xf0000 0x1000>;
|
||||
fsl,fman-ports = <&fman0_rx_0x10 &fman0_tx_0x30>;
|
||||
pcsphy-handle = <&pcsphy6>;
|
||||
pcsphy-handle = <&pcsphy6>, <&qsgmiib_pcs2>, <&pcsphy6>;
|
||||
pcs-handle-names = "sgmii", "qsgmii", "xfi";
|
||||
};
|
||||
|
||||
mdio@e9000 {
|
||||
qsgmiib_pcs2: ethernet-pcs@2 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <2>;
|
||||
};
|
||||
};
|
||||
|
||||
mdio@f1000 {
|
||||
|
@@ -55,7 +55,15 @@ fman@400000 {
|
||||
reg = <0xe2000 0x1000>;
|
||||
fsl,fman-ports = <&fman0_rx_0x09 &fman0_tx_0x29>;
|
||||
ptp-timer = <&ptp_timer0>;
|
||||
pcsphy-handle = <&pcsphy1>;
|
||||
pcsphy-handle = <&pcsphy1>, <&qsgmiia_pcs1>;
|
||||
pcs-handle-names = "sgmii", "qsgmii";
|
||||
};
|
||||
|
||||
mdio@e1000 {
|
||||
qsgmiia_pcs1: ethernet-pcs@1 {
|
||||
compatible = "fsl,lynx-pcs";
|
||||
reg = <1>;
|
||||
};
|
||||
};
|
||||
|
||||
mdio@e3000 {
|
||||
|
Some files were not shown because too many files have changed in this diff.