Commit Graph

14416 Commits

Author SHA1 Message Date
Howard Chung
80af16a3e4 Bluetooth: Add toggle to switch off interleave scan
This patch add a configurable parameter to switch off the interleave
scan feature.

Signed-off-by: Howard Chung <howardchung@google.com>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:01:00 +02:00
Howard Chung
c4f1f40816 Bluetooth: Interleave with allowlist scan
This patch implements the interleaving between allowlist scan and
no-filter scan. It'll be used to save power when at least one monitor is
registered and at least one pending connection or one device to be
scanned for.

The durations of the allowlist scan and the no-filter scan are
controlled by MGMT command: Set Default System Configuration. The
default values are set randomly for now.

Signed-off-by: Howard Chung <howardchung@google.com>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Reviewed-by: Manish Mandlik <mmandlik@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 17:00:43 +02:00
Reo Shiseki
353021588c Bluetooth: fix typo in struct name
Signed-off-by: Reo Shiseki <reoshiseki@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-12-07 16:51:22 +02:00
Bongsu Jeon
bcd684aace net/nfc/nci: Support NCI 2.x initial sequence
implement the NCI 2.x initial sequence to support NCI 2.x NFCC.
Since NCI 2.0, CORE_RESET and CORE_INIT sequence have been changed.
If NFCEE supports NCI 2.x, then NCI 2.x initial sequence will work.

In NCI 1.0, Initial sequence and payloads are as below:
(DH)                     (NFCC)
 |  -- CORE_RESET_CMD --> |
 |  <-- CORE_RESET_RSP -- |
 |  -- CORE_INIT_CMD -->  |
 |  <-- CORE_INIT_RSP --  |
 CORE_RESET_RSP payloads are Status, NCI version, Configuration Status.
 CORE_INIT_CMD payloads are empty.
 CORE_INIT_RSP payloads are Status, NFCC Features,
    Number of Supported RF Interfaces, Supported RF Interface,
    Max Logical Connections, Max Routing table Size,
    Max Control Packet Payload Size, Max Size for Large Parameters,
    Manufacturer ID, Manufacturer Specific Information.

In NCI 2.0, Initial Sequence and Parameters are as below:
(DH)                     (NFCC)
 |  -- CORE_RESET_CMD --> |
 |  <-- CORE_RESET_RSP -- |
 |  <-- CORE_RESET_NTF -- |
 |  -- CORE_INIT_CMD -->  |
 |  <-- CORE_INIT_RSP --  |
 CORE_RESET_RSP payloads are Status.
 CORE_RESET_NTF payloads are Reset Trigger,
    Configuration Status, NCI Version, Manufacturer ID,
    Manufacturer Specific Information Length,
    Manufacturer Specific Information.
 CORE_INIT_CMD payloads are Feature1, Feature2.
 CORE_INIT_RSP payloads are Status, NFCC Features,
    Max Logical Connections, Max Routing Table Size,
    Max Control Packet Payload Size,
    Max Data Packet Payload Size of the Static HCI Connection,
    Number of Credits of the Static HCI Connection,
    Max NFC-V RF Frame Size, Number of Supported RF Interfaces,
    Supported RF Interfaces.

Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Link: https://lore.kernel.org/r/20201202223147.3472-1-bongsu.jeon@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 17:47:35 -08:00
Jakub Kicinski
a1dd1d8697 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-12-03

The main changes are:

1) Support BTF in kernel modules, from Andrii.

2) Introduce preferred busy-polling, from Björn.

3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh.

4) Memcg-based memory accounting for bpf objects, from Roman.

5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits)
  selftests/bpf: Fix invalid use of strncat in test_sockmap
  libbpf: Use memcpy instead of strncpy to please GCC
  selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module
  selftests/bpf: Add tp_btf CO-RE reloc test for modules
  libbpf: Support attachment of BPF tracing programs to kernel modules
  libbpf: Factor out low-level BPF program loading helper
  bpf: Allow to specify kernel module BTFs when attaching BPF programs
  bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier
  selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF
  selftests/bpf: Add support for marking sub-tests as skipped
  selftests/bpf: Add bpf_testmod kernel module for testing
  libbpf: Add kernel module BTF support for CO-RE relocations
  libbpf: Refactor CO-RE relocs to not assume a single BTF object
  libbpf: Add internal helper to load BTF data by FD
  bpf: Keep module's btf_data_size intact after load
  bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
  selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP
  bpf: Adds support for setting window clamp
  samples/bpf: Fix spelling mistake "recieving" -> "receiving"
  bpf: Fix cold build of test_progs-no_alu32
  ...
====================

Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 07:48:12 -08:00
Prankur gupta
cb81110997 bpf: Adds support for setting window clamp
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_WINDOW_CLAMP,
which sets the maximum receiver window size. It will be useful for
limiting receiver window based on RTT.

Signed-off-by: Prankur gupta <prankgup@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201202213152.435886-2-prankgup@fb.com
2020-12-03 17:23:24 -08:00
Jakub Kicinski
55fd59b003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:
	drivers/net/ethernet/ibm/ibmvnic.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03 15:44:09 -08:00
Florian Westphal
7ea851d19b tcp: merge 'init_req' and 'route_req' functions
The Multipath-TCP standard (RFC 8684) says that an MPTCP host should send
a TCP reset if the token in a MP_JOIN request is unknown.

At this time we don't do this, the 3whs completes and the 'new subflow'
is reset afterwards.  There are two ways to allow MPTCP to send the
reset.

1. override 'send_synack' callback and emit the rst from there.
   The drawback is that the request socket gets inserted into the
   listeners queue just to get removed again right away.

2. Send the reset from the 'route_req' function instead.
   This avoids the 'add&remove request socket', but route_req lacks the
   skb that is required to send the TCP reset.

Instead of just adding the skb to that function for MPTCP sake alone,
Paolo suggested to merge init_req and route_req functions.

This saves one indirection from syn processing path and provides the skb
to the merged function at the same time.

'send reset on unknown mptcp join token' is added in next patch.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03 12:56:03 -08:00
Toke Høiland-Jørgensen
2867e1eac6 inet_ecn: Fix endianness of checksum update when setting ECT(1)
When adding support for propagating ECT(1) marking in IP headers it seems I
suffered from endianness-confusion in the checksum update calculation: In
fact the ECN field is in the *lower* bits of the first 16-bit word of the
IP header when calculating in network byte order. This means that the
addition performed to update the checksum field was wrong; let's fix that.

Fixes: b723748750 ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
Reported-by: Jonathan Morton <chromatix99@gmail.com>
Tested-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20201130183705.17540-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-01 17:16:54 -08:00
Danielle Ratson
22ec19f3ae bridge: switchdev: Notify about VLAN protocol changes
Drivers that support bridge offload need to be notified about changes to
the bridge's VLAN protocol so that they could react accordingly and
potentially veto the change.

Add a new switchdev attribute to communicate the change to drivers.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-01 15:21:13 -08:00
Vlad Buslov
0fca55ed98 net: sched: remove redundant 'rtnl_held' argument
Functions tfilter_notify_chain() and tcf_get_next_proto() are always called
with rtnl lock held in current implementation. Moreover, attempting to call
them without rtnl lock would cause a warning down the call chain in
function __tcf_get_next_proto() that requires the lock to be held by
callers. Remove the 'rtnl_held' argument in order to simplify the code and
make rtnl lock requirement explicit.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20201127151205.23492-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-01 11:24:56 -08:00
Daniel Borkmann
ba0581749f net, xdp, xsk: fix __sk_mark_napi_id_once napi_id error
Stephen reported the following build error for !CONFIG_NET_RX_BUSY_POLL
built kernels:

  In file included from fs/select.c:32:
  include/net/busy_poll.h: In function 'sk_mark_napi_id_once':
  include/net/busy_poll.h:150:36: error: 'const struct sk_buff' has no member named 'napi_id'
    150 |  __sk_mark_napi_id_once_xdp(sk, skb->napi_id);
        |                                    ^~

Fix it by wrapping a CONFIG_NET_RX_BUSY_POLL around the helpers.

Fixes: b02e5a0ebb ("xsk: Propagate napi_id to XDP socket Rx path")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/linux-next/20201201190746.7d3357fb@canb.auug.org.au
2020-12-01 15:51:19 +01:00
Jan Engelhardt
04295878be netfilter: use actual socket sk for REJECT action
True to the message of commit v5.10-rc1-105-g46d6c5ae953c, _do_
actually make use of state->sk when possible, such as in the REJECT
modules.

Reported-by: Minqiang Chen <ptpt52@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-01 14:33:55 +01:00
Wang Shanker
f7583f02a5 netfilter: nfnl_acct: remove data from struct net
This patch removes nfnl_acct_list from struct net to reduce the default
memory footprint for the netns structure.

Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-01 09:45:29 +01:00
Paolo Abeni
ad80b0fc6e mptcp: open code mptcp variant for lock_sock
This allows invoking an additional callback under the
socket spin lock.

Will be used by the next patches to avoid additional
spin lock contention.

Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-30 17:55:23 -08:00
Björn Töpel
b02e5a0ebb xsk: Propagate napi_id to XDP socket Rx path
Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket pick up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
2020-12-01 00:09:25 +01:00
Björn Töpel
7c951cafc0 net: Add SO_BUSY_POLL_BUDGET socket option
This option lets a user set a per socket NAPI budget for
busy-polling. If the options is not set, it will use the default of 8.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20201130185205.196029-3-bjorn.topel@gmail.com
2020-12-01 00:09:25 +01:00
Björn Töpel
7fd3253a7d net: Introduce preferred busy-polling
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
option or system-wide using the /proc/sys/net/core/busy_read knob, is
an opportunistic. That means that if the NAPI context is not
scheduled, it will poll it. If, after busy-polling, the budget is
exceeded the busy-polling logic will schedule the NAPI onto the
regular softirq handling.

One implication of the behavior above is that a busy/heavy loaded NAPI
context will never enter/allow for busy-polling. Some applications
prefer that most NAPI processing would be done by busy-polling.

This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
in concert with the napi_defer_hard_irqs and gro_flush_timeout
knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
introduced in commit 6f8b12d661 ("net: napi: add hard irqs deferral
feature"), and allows for a user to defer interrupts to be enabled and
instead schedule the NAPI context from a watchdog timer. When a user
enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled,
and the NAPI context is being processed by a softirq, the softirq NAPI
processing will exit early to allow the busy-polling to be performed.

If the application stops performing busy-polling via a system call,
the watchdog timer defined by gro_flush_timeout will timeout, and
regular softirq handling will resume.

In summary; Heavy traffic applications that prefer busy-polling over
softirq processing should use this option.

Example usage:

  $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
  $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout

Note that the timeout should be larger than the userspace processing
window, otherwise the watchdog will timeout and fall back to regular
softirq processing.

Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com
2020-12-01 00:09:25 +01:00
Jakub Kicinski
bd2d5c54dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix insufficient validation of IPSET_ATTR_IPADDR_IPV6 reported
   by syzbot.

2) Remove spurious reports on nf_tables when lockdep gets disabled,
   from Florian Westphal.

3) Fix memleak in the error path of error path of
   ip_vs_control_net_init(), from Wang Hai.

4) Fix missing control data in flow dissector, otherwise IP address
   matching in hardware offload infra does not work.

5) Fix hardware offload match on prefix IP address when userspace
   does not send a bitwise expression to represent the prefix.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  netfilter: nftables_offload: build mask based from the matching bytes
  netfilter: nftables_offload: set address type in control dissector
  ipvs: fix possible memory leak in ip_vs_control_net_init
  netfilter: nf_tables: avoid false-postive lockdep splat
  netfilter: ipset: prevent uninit-value in hash_ip6_add
====================

Link: https://lore.kernel.org/r/20201127190313.24947-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:20:23 -08:00
Jakub Kicinski
3771b82242 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-11-28

1) Do not reference the skb for xsk's generic TX side since when looped
   back into RX it might crash in generic XDP, from Björn Töpel.

2) Fix umem cleanup on a partially set up xsk socket when being destroyed,
   from Magnus Karlsson.

3) Fix an incorrect netdev reference count when failing xsk_bind() operation,
   from Marek Majtyka.

4) Fix bpftool to set an error code on failed calloc() in build_btf_type_table(),
   from Zhen Lei.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add MAINTAINERS entry for BPF LSM
  bpftool: Fix error return value in build_btf_type_table
  net, xsk: Avoid taking multiple skbuff references
  xsk: Fix incorrect netdev reference count
  xsk: Fix umem cleanup bug at socket destruct
  MAINTAINERS: Update XDP and AF_XDP entries
====================

Link: https://lore.kernel.org/r/20201128005104.1205-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 12:04:57 -08:00
Jakub Kicinski
5c39f26e67 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflict in CAN, keep the net-next + the byteswap wrapper.

Conflicts:
	drivers/net/can/usb/gs_usb.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 18:25:27 -08:00
wenxu
c129412f74 net/sched: sch_frag: add generic packet fragment support.
Currently kernel tc subsystem can do conntrack in cat_ct. But when several
fragment packets go through the act_ct, function tcf_ct_handle_fragments
will defrag the packets to a big one. But the last action will redirect
mirred to a device which maybe lead the reassembly big packet over the mtu
of target device.

This patch add support for a xmit hook to mirred, that gets executed before
xmiting the packet. Then, when act_ct gets loaded, it configs that hook.
The frag xmit hook maybe reused by other modules.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 14:36:02 -08:00
wenxu
fa6d639930 net/sched: act_mirred: refactor the handle of xmit
This one is prepare for the next patch.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 14:36:02 -08:00
Vadim Fedorenko
a6acbe6235 net/tls: add CHACHA20-POLY1305 specific behavior
RFC 7905 defines special behavior for ChaCha-Poly TLS sessions.
The differences are in the calculation of nonce and the absence
of explicit IV. This behavior is like TLSv1.3 partly.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 14:32:37 -08:00
Vadim Fedorenko
923c40c465 net/tls: add CHACHA20-POLY1305 specific defines and structures
To provide support for ChaCha-Poly cipher we need to define
specific constants and structures.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 14:32:37 -08:00
Vadim Fedorenko
6942a284fb net/tls: make inline helpers protocol-aware
Inline functions defined in tls.h have a lot of AES-specific
constants. Remove these constants and change argument to struct
tls_prot_info to have an access to cipher type in later patches

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 14:32:37 -08:00
Pablo Neira Ayuso
a5d45bc0dc netfilter: nftables_offload: build mask based from the matching bytes
Userspace might match on prefix bytes of header fields if they are on
the byte boundary, this requires that the mask is adjusted accordingly.
Use NFT_OFFLOAD_MATCH_EXACT() for meta since prefix byte matching is not
allowed for this type of selector.

The bitwise expression might be optimized out by userspace, hence the
kernel needs to infer the prefix from the number of payload bytes to
match on. This patch adds nft_payload_offload_mask() to calculate the
bitmask to match on the prefix.

Fixes: c9626a2cbd ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-27 12:10:47 +01:00
Pablo Neira Ayuso
3c78e9e0d3 netfilter: nftables_offload: set address type in control dissector
This patch adds nft_flow_rule_set_addr_type() to set the address type
from the nft_payload expression accordingly.

If the address type is not set in the control dissector then a rule that
matches either on source or destination IP address does not work.

After this patch, nft hardware offload generates the flow dissector
configuration as tc-flower does to match on an IP address.

This patch has been also tested functionally to make sure packets are
filtered out by the NIC.

This is also getting the code aligned with the existing netfilter flow
offload infrastructure which is also setting the control dissector.

Fixes: c9626a2cbd ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-27 12:10:46 +01:00
Maxim Mikityanskiy
025cc2fb6a net/tls: Protect from calling tls_dev_del for TLS RX twice
tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after
calling tls_dev_del if TLX TX offload is also enabled. Clearing
tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a
time frame when tls_device_down may get called and call tls_dev_del for
RX one extra time, confusing the driver, which may lead to a crash.

This patch corrects this racy behavior by adding a flag to prevent
tls_device_down from calling tls_dev_del the second time.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-25 17:31:06 -08:00
Ido Schimmel
f0a5013e29 devlink: Add blackhole_nexthop trap
Add a packet trap to report packets that were dropped due to a
blackhole nexthop.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-24 12:14:56 -08:00
Jakub Kicinski
cc69837fca net: don't include ethtool.h from netdevice.h
linux/netdevice.h is included in very many places, touching any
of its dependecies causes large incremental builds.

Drop the linux/ethtool.h include, linux/netdevice.h just needs
a forward declaration of struct ethtool_ops.

Fix all the places which made use of this implicit include.

Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20201120225052.1427503-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-23 17:27:04 -08:00
Ricardo Dias
01770a1661 tcp: fix race condition when creating child sockets from syncookies
When the TCP stack is in SYN flood mode, the server child socket is
created from the SYN cookie received in a TCP packet with the ACK flag
set.

The child socket is created when the server receives the first TCP
packet with a valid SYN cookie from the client. Usually, this packet
corresponds to the final step of the TCP 3-way handshake, the ACK
packet. But is also possible to receive a valid SYN cookie from the
first TCP data packet sent by the client, and thus create a child socket
from that SYN cookie.

Since a client socket is ready to send data as soon as it receives the
SYN+ACK packet from the server, the client can send the ACK packet (sent
by the TCP stack code), and the first data packet (sent by the userspace
program) almost at the same time, and thus the server will equally
receive the two TCP packets with valid SYN cookies almost at the same
instant.

When such event happens, the TCP stack code has a race condition that
occurs between the momement a lookup is done to the established
connections hashtable to check for the existence of a connection for the
same client, and the moment that the child socket is added to the
established connections hashtable. As a consequence, this race condition
can lead to a situation where we add two child sockets to the
established connections hashtable and deliver two sockets to the
userspace program to the same client.

This patch fixes the race condition by checking if an existing child
socket exists for the same client when we are adding the second child
socket to the established connections socket. If an existing child
socket exists, we drop the packet and discard the second child socket
to the same client.

Signed-off-by: Ricardo Dias <rdias@singlestore.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-23 16:32:33 -08:00
Paul Moore
3df98d7921 lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
As pointed out by Herbert in a recent related patch, the LSM hooks do
not have the necessary address family information to use the flowi
struct safely.  As none of the LSMs currently use any of the protocol
specific flowi information, replace the flowi pointers with pointers
to the address family independent flowi_common struct.

Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-11-23 18:36:21 -05:00
Jakub Kicinski
fc0d3b24bd compat: always include linux/compat.h from net/compat.h
We're about to do reshuffling in networking headers and
eliminate some implicit includes. This results in:

In file included from ../net/ipv4/netfilter/arp_tables.c:26:
include/net/compat.h:60:40: error: unknown type name ‘compat_uptr_t’; did you mean ‘compat_ptr_ioctl’?
    struct sockaddr __user **save_addr, compat_uptr_t *ptr,
                                        ^~~~~~~~~~~~~
                                        compat_ptr_ioctl
include/net/compat.h:61:4: error: unknown type name ‘compat_size_t’; did you mean ‘compat_sigset_t’?
    compat_size_t *len);
    ^~~~~~~~~~~~~
    compat_sigset_t

Currently net/compat.h depends on linux/compat.h being included
first. After the upcoming changes this would break the 32bit build.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20201121214844.1488283-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-23 13:31:54 -08:00
Jamie Iles
b9ad3e9f5a bonding: wait for sysfs kobject destruction before freeing struct slave
syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a
struct slave device could result in the following splat:

  kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000)
  bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
  ------------[ cut here ]------------
  ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline]
  ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600
  WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S                5.9.0-rc8+ #96
  Hardware name: linux,dummy-virt (DT)
  Workqueue: netns cleanup_net
  Call trace:
   dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239
   show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x174/0x1f8 lib/dump_stack.c:118
   panic+0x360/0x7a0 kernel/panic.c:231
   __warn+0x244/0x2ec kernel/panic.c:600
   report_bug+0x240/0x398 lib/bug.c:198
   bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974
   call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322
   brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329
   do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864
   el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65
   el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93
   el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594
   debug_print_object+0x180/0x240 lib/debugobjects.c:485
   __debug_check_no_obj_freed lib/debugobjects.c:967 [inline]
   debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998
   slab_free_hook mm/slub.c:1536 [inline]
   slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577
   slab_free mm/slub.c:3138 [inline]
   kfree+0x13c/0x460 mm/slub.c:4119
   bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492
   __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190
   bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline]
   bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420
   notifier_call_chain+0xf0/0x200 kernel/notifier.c:83
   __raw_notifier_call_chain kernel/notifier.c:361 [inline]
   raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368
   call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033
   call_netdevice_notifiers_extack net/core/dev.c:2045 [inline]
   call_netdevice_notifiers net/core/dev.c:2059 [inline]
   rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347
   unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509
   unregister_netdevice_many net/core/dev.c:10508 [inline]
   default_device_exit_batch+0x294/0x338 net/core/dev.c:10992
   ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189
   cleanup_net+0x44c/0x888 net/core/net_namespace.c:603
   process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269
   worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415
   kthread+0x390/0x498 kernel/kthread.c:292
   ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925

This is a potential use-after-free if the sysfs nodes are being accessed
whilst removing the struct slave, so wait for the object destruction to
complete before freeing the struct slave itself.

Fixes: 07699f9a7c ("bonding: add sysfs /slave dir for bond slave devices.")
Fixes: a068aab422 ("bonding: Fix reference count leak in bond_sysfs_slave_add.")
Cc: Qiushi Wu <wu000273@umn.edu>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20201120142827.879226-1-jamie@nuviainc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21 13:07:00 -08:00
Florian Westphal
fa3fe2b150 mptcp: track window announced to peer
OoO handling attempts to detect when packet is out-of-window by testing
current ack sequence and remaining space vs. sequence number.

This doesn't work reliably. Store the highest allowed sequence number
that we've announced and use it to detect oow packets.

Do this when mptcp options get written to the packet (wire format).
For this to work we need to move the write_options call until after
stack selected a new tcp window.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:33:25 -08:00
Randy Dunlap
fc9840fbef net: stream: fix TCP references when INET is not enabled
Fix build of net/core/stream.o when CONFIG_INET is not enabled.
Fixes these build errors (sample):

ld: net/core/stream.o: in function `sk_stream_write_space':
(.text+0x27e): undefined reference to `tcp_stream_memory_free'
ld: (.text+0x29c): undefined reference to `tcp_stream_memory_free'
ld: (.text+0x2ab): undefined reference to `tcp_stream_memory_free'
ld: net/core/stream.o: in function `sk_stream_wait_memory':
(.text+0x5a1): undefined reference to `tcp_stream_memory_free'
ld: (.text+0x5bf): undefined reference to `tcp_stream_memory_free'

Fixes: 1c5f2ced13 ("tcp: avoid indirect call to tcp_stream_memory_free()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201118194438.674-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:13:55 -08:00
Paolo Abeni
12f4bd8622 net: add annotation for sock_{lock,unlock}_fast
The static checker is fooled by the non-static locking scheme
implemented by the mentioned helpers.
Let's make its life easier adding some unconditional annotation
so that the helpers are now interpreted as a plain spinlock from
sparse.

v1 -> v2:
 - add __releases() annotation to unlock_sock_fast()

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/6ed7ae627d8271fb7f20e0a9c6750fbba1ac2635.1605634911.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 10:00:54 -08:00
Numan Siddique
e2ef5203c8 net: openvswitch: Be liberal in tcp conntrack.
There is no easy way to distinguish if a conntracked tcp packet is
marked invalid because of tcp_in_window() check error or because
it doesn't belong to an existing connection. With this patch,
openvswitch sets liberal tcp flag for the established sessions so
that out of window packets are not marked invalid.

A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
sets this flag for both the directions of the nf_conn.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201116130126.3065077-1-nusiddiq@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 09:53:48 -08:00
Magnus Karlsson
537cf4e3cc xsk: Fix umem cleanup bug at socket destruct
Fix a bug that is triggered when a partially setup socket is
destroyed. For a fully setup socket, a socket that has been bound to a
device, the cleanup of the umem is performed at the end of the buffer
pool's cleanup work queue item. This has to be performed in a work
queue, and not in RCU cleanup, as it is doing a vunmap that cannot
execute in interrupt context. However, when a socket has only been
partially set up so that a umem has been created but the buffer pool
has not, the code erroneously directly calls the umem cleanup function
instead of using a work queue, and this leads to a BUG_ON() in
vunmap().

As there in this case is no buffer pool, we cannot use its work queue,
so we need to introduce a work queue for the umem and schedule this for
the cleanup. So in the case there is no pool, we are going to use the
umem's own work queue to schedule the cleanup. But if there is a
pool, the cleanup of the umem is still being performed by the pool's
work queue, as it is important that the umem is cleaned up after the
pool.

Fixes: e5e1a4bc91 ("xsk: Fix possible memory leak at socket close")
Reported-by: Marek Majtyka <marekx.majtyka@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Marek Majtyka <marekx.majtyka@intel.com>
Link: https://lore.kernel.org/bpf/1605873219-21629-1-git-send-email-magnus.karlsson@gmail.com
2020-11-20 15:52:39 +01:00
Jacob Keller
52cc5f3a16 devlink: move flash end and begin to core devlink
When performing a flash update via devlink, device drivers may inform
user space of status updates via
devlink_flash_update_(begin|end|timeout|status)_notify functions.

It is expected that drivers do not send any status notifications unless
they send a begin and end message. If a driver sends a status
notification without sending the appropriate end notification upon
finishing (regardless of success or failure), the current implementation
of the devlink userspace program can get stuck endlessly waiting for the
end notification that will never come.

The current ice driver implementation may send such a status message
without the appropriate end notification in rare cases.

Fixing the ice driver is relatively simple: we just need to send the
begin_notify at the start of the function and always send an end_notify
no matter how the function exits.

Rather than assuming driver authors will always get this right in the
future, lets just fix the API so that it is not possible to get wrong.
Make devlink_flash_update_begin_notify and
devlink_flash_update_end_notify static, and call them in devlink.c core
code. Always send the begin_notify just before calling the driver's
flash_update routine. Always send the end_notify just after the routine
returns regardless of success or failure.

Doing this makes the status notification easier to use from the driver,
as it no longer needs to worry about catching failures and cleaning up
by calling devlink_flash_update_end_notify. It is now no longer possible
to do the wrong thing in this regard. We also save a couple of lines of
code in each driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 21:41:02 -08:00
Jacob Keller
b44cfd4f5b devlink: move request_firmware out of driver
All drivers which implement the devlink flash update support, with the
exception of netdevsim, use either request_firmware or
request_firmware_direct to locate the firmware file. Rather than having
each driver do this separately as part of its .flash_update
implementation, perform the request_firmware within net/core/devlink.c

Replace the file_name parameter in the struct devlink_flash_update_params
with a pointer to the fw object.

Use request_firmware rather than request_firmware_direct. Although most
Linux distributions today do not have the fallback mechanism
implemented, only about half the drivers used the _direct request, as
compared to the generic request_firmware. In the event that
a distribution does support the fallback mechanism, the devlink flash
update ought to be able to use it to provide the firmware contents. For
distributions which do not support the fallback userspace mechanism,
there should be essentially no difference between request_firmware and
request_firmware_direct.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 21:40:57 -08:00
Jakub Kicinski
56495a2442 Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 19:08:46 -08:00
Georg Kohmann
2d8f6481c1 ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module
IPV6=m
NF_DEFRAG_IPV6=y

ld: net/ipv6/netfilter/nf_conntrack_reasm.o: in function
`nf_ct_frag6_gather':
net/ipv6/netfilter/nf_conntrack_reasm.c:462: undefined reference to
`ipv6_frag_thdr_truncated'

Netfilter is depending on ipv6 symbol ipv6_frag_thdr_truncated. This
dependency is forcing IPV6=y.

Remove this dependency by moving ipv6_frag_thdr_truncated out of ipv6. This
is the same solution as used with a similar issues: Referring to
commit 70b095c843 ("ipv6: remove dependency of nf_defrag_ipv6 on ipv6
module")

Fixes: 9d9e937b1c ("ipv6/netfilter: Discard first fragment not including all headers")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Georg Kohmann <geokohma@cisco.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20201119095833.8409-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 10:49:50 -08:00
Dominique Martinet
ff5e72ebef 9p: apply review requests for fid refcounting
Fix style issues in parent commit ("apply review requests for fid
refcounting"), no functional change.

Link: http://lkml.kernel.org/r/1605802012-31133-2-git-send-email-asmadeus@codewreck.org
Fixes: 6636b6dcc3 ("9p: add refcount to p9_fid struct")
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2020-11-19 17:21:34 +01:00
Jianyong Wu
6636b6dcc3 9p: add refcount to p9_fid struct
Fix race issue in fid contention.

Eric's and Greg's patch offer a mechanism to fix open-unlink-f*syscall
bug in 9p. But there is race issue in fid parallel accesses.
As Greg's patch stores all of fids from opened files into according inode,
so all the lookup fid ops can retrieve fid from inode preferentially. But
there is no mechanism to handle the fid contention issue. For example,
there are two threads get the same fid in the same time and one of them
clunk the fid before the other thread ready to discard the fid. In this
scenario, it will lead to some fatal problems, even kernel core dump.

I introduce a mechanism to fix this race issue. A counter field introduced
into p9_fid struct to store the reference counter to the fid. When a fid
is allocated from the inode or dentry, the counter will increase, and
will decrease at the end of its occupation. It is guaranteed that the
fid won't be clunked before the reference counter go down to 0, then
we can avoid the clunked fid to be used.

tests:
race issue test from the old test case:
for file in {01..50}; do touch f.${file}; done
seq 1 1000 | xargs -n 1 -P 50 -I{} cat f.* > /dev/null

open-unlink-f*syscall test:
I have tested for f*syscall include: ftruncate fstat fchown fchmod faccessat.

Link: http://lkml.kernel.org/r/20200923141146.90046-5-jianyong.wu@arm.com
Fixes: 478ba09edc ("fs/9p: search open fids first")
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2020-11-19 17:20:39 +01:00
Tariq Toukan
138559b9f9 net/tls: Fix wrong record sn in async mode of device resync
In async_resync mode, we log the TCP seq of records until the async request
is completed.  Later, in case one of the logged seqs matches the resync
request, we return it, together with its record serial number.  Before this
fix, we mistakenly returned the serial number of the current record
instead.

Fixes: ed9b7646b0 ("net/tls: Add asynchronous resync")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 14:41:20 -08:00
Magnus Karlsson
9349eb3a9d xsk: Introduce batched Tx descriptor interfaces
Introduce batched descriptor interfaces in the xsk core code for the
Tx path to be used in the driver to write a code path with higher
performance. This interface will be used by the i40e driver in the
next patch. Though other drivers would likely benefit from this new
interface too.

Note that batching is only implemented for the common case when
there is only one socket bound to the same device and queue id. When
this is not the case, we fall back to the old non-batched version of
the function.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-5-git-send-email-magnus.karlsson@gmail.com
2020-11-17 22:07:40 +01:00
Paolo Abeni
77c3c95637 tcp: factor out __tcp_close() helper
unlocked version of protocol level close, will be used by
MPTCP to allow decouple orphaning and subflow level close.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:06 -08:00
Paolo Abeni
b796d04bd0 tcp: factor out tcp_build_frag()
Will be needed by the next patch, as MPTCP needs to handle
directly the error/memory-allocation-needed path.

No functional changes intended.

Additionally let MPTCP code access the tcp_remove_empty_skb()
helper.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:46:06 -08:00
Georg Kohmann
9d9e937b1c ipv6/netfilter: Discard first fragment not including all headers
Packets are processed even though the first fragment don't include all
headers through the upper layer header. This breaks TAHI IPv6 Core
Conformance Test v6LC.1.3.6.

Referring to RFC8200 SECTION 4.5: "If the first fragment does not include
all headers through an Upper-Layer header, then that fragment should be
discarded and an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero."

The fragment needs to be validated the same way it is done in
commit 2efdaaaf88 ("IPv6: reply ICMP error if the first fragment don't
include all headers") for ipv6. Wrap the validation into a common function,
ipv6_frag_thdr_truncated() to check for truncation in the upper layer
header. This validation does not fullfill all aspects of RFC 8200,
section 4.5, but is at the moment sufficient to pass mentioned TAHI test.

In netfilter, utilize the fragment offset returned by find_prev_fhdr() to
let ipv6_frag_thdr_truncated() start it's traverse from the fragment
header.

Return 0 to drop the fragment in the netfilter. This is the same behaviour
as used on other protocol errors in this function, e.g. when
nf_ct_frag6_queue() returns -EPROTO. The Fragment will later be picked up
by ipv6_frag_rcv() in reassembly.c. ipv6_frag_rcv() will then send an
appropriate ICMP Parameter Problem message back to the source.

References commit 2efdaaaf88 ("IPv6: reply ICMP error if the first
fragment don't include all headers")

Signed-off-by: Georg Kohmann <geokohma@cisco.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20201111115025.28879-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 10:15:11 -08:00
Francis Laniel
872f690341 treewide: rename nla_strlcpy to nla_strscpy.
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 08:08:54 -08:00
Francis Laniel
9ca718743a Modify return value of nla_strlcpy to match that of strscpy.
nla_strlcpy now returns -E2BIG if src was truncated when written to dst.
It also returns this error value if dstsize is 0 or higher than INT_MAX.

For example, if src is "foo\0" and dst is 3 bytes long, the result will be:
1. "foG" after memcpy (G means garbage).
2. "fo\0" after memset.
3. -E2BIG is returned because src was not completely written into dst.

The callers of nla_strlcpy were modified to take into account this modification.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 08:08:54 -08:00
Eric Dumazet
1c5f2ced13 tcp: avoid indirect call to tcp_stream_memory_free()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 16:09:56 -08:00
Eric Dumazet
d3cd4924e3 tcp: uninline tcp_stream_memory_free()
Both IPv4 and IPv6 needs it via a function pointer.
Following patch will avoid the indirect call.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 16:09:56 -08:00
Jakub Kicinski
07cbce2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-11-14

1) Add BTF generation for kernel modules and extend BTF infra in kernel
   e.g. support for split BTF loading and validation, from Andrii Nakryiko.

2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
   on inlined branch conditions, from Alexei Starovoitov.

3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.

4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
   infra, from Martin KaFai Lau.

5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
   XDP_REDIRECT path, from Lorenzo Bianconi.

6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
   context, from Song Liu.

7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
   for different target archs on same source tree, from Jean-Philippe Brucker.

8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.

9) Move functionality from test_tcpbpf_user into the test_progs framework so it
   can run in BPF CI, from Alexander Duyck.

10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.

Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.

  [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/
      https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
  net: mlx5: Add xdp tx return bulking support
  net: mvpp2: Add xdp tx return bulking support
  net: mvneta: Add xdp tx return bulking support
  net: page_pool: Add bulk support for ptr_ring
  net: xdp: Introduce bulking for xdp tx return path
  bpf: Expose bpf_d_path helper to sleepable LSM hooks
  bpf: Augment the set of sleepable LSM hooks
  bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Rename some functions in bpf_sk_storage
  bpf: Folding omem_charge() into sk_storage_charge()
  selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
  selftests/bpf: Add skb_pkt_end test
  bpf: Support for pointers beyond pkt_end.
  tools/bpf: Always run the *-clean recipes
  tools/bpf: Add bootstrap/ to .gitignore
  bpf: Fix NULL dereference in bpf_task_storage
  tools/bpftool: Fix build slowdown
  tools/runqslower: Build bpftool using HOSTCC
  tools/runqslower: Enable out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 09:13:41 -08:00
Lorenzo Bianconi
7886244736 net: page_pool: Add bulk support for ptr_ring
Introduce the capability to batch page_pool ptr_ring refill since it is
usually run inside the driver NAPI tx completion loop.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi
8965398713 net: xdp: Introduce bulking for xdp tx return path
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually run
inside the driver NAPI tx completion loop.
The bulk queue size is set to 16 to be aligned to how
XDP_REDIRECT bulking works. The bulk is flushed when
it is full or when mem.id changes.
xdp_frame_bulk is usually stored/allocated on the function
call-stack to avoid locking penalties.
Current implementation considers only page_pool memory model.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
2020-11-14 02:28:59 +01:00
Yi-Hung Wei
9c2e14b481 ip_tunnels: Set tunnel option flag when tunnel metadata is present
Currently, we may set the tunnel option flag when the size of metadata
is zero.  For example, we set TUNNEL_GENEVE_OPT in the receive function
no matter the geneve option is present or not.  As this may result in
issues on the tunnel flags consumers, this patch fixes the issue.

Related discussion:
* https://lore.kernel.org/netdev/1604448694-19351-1-git-send-email-yihung.wei@gmail.com/T/#u

Fixes: 256c87c17c ("net: check tunnel option type in tunnel flags")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Link: https://lore.kernel.org/r/1605053800-74072-1-git-send-email-yihung.wei@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 16:58:10 -08:00
Jeff Dike
8cf8821e15 net: Exempt multicast addresses from five-second neighbor lifetime
Commit 58956317c8 ("neighbor: Improve garbage collection")
guarantees neighbour table entries a five-second lifetime.  Processes
which make heavy use of multicast can fill the neighour table with
multicast addresses in five seconds.  At that point, neighbour entries
can't be GC-ed because they aren't five seconds old yet, the kernel
log starts to fill up with "neighbor table overflow!" messages, and
sends start to fail.

This patch allows multicast addresses to be thrown out before they've
lived out their five seconds.  This makes room for non-multicast
addresses and makes messages to all addresses more reliable in these
circumstances.

Fixes: 58956317c8 ("neighbor: Improve garbage collection")
Signed-off-by: Jeff Dike <jdike@akamai.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201113015815.31397-1-jdike@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 14:24:39 -08:00
Jakub Kicinski
f8fd36b95e Some updates:
* injection/radiotap updates for new test capabilities
  * remove WDS support - even years ago when we turned
    it off by default it was already basically unusable
  * support for HE (802.11ax) rates for beacons
  * support for some vendor-specific HE rates
  * many other small features/cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl+uSqIACgkQB8qZga/f
 l8QBOw/6AwlcQWMjqdb6H/QRORA81E4tX2+alHbeBai7KSI+9E1Jtakmn5qKQ4iH
 IjpNWPsclj4zKhgbKaariIn/bZEk8OhzmDpssnHTMpuo3iuCmuzFaDdZd9Uun2Ad
 tr3bqfHaom1MhWRF/FuBSHcnk599qRnsk+RY7/6dhjiPlWOWJvsfpuo1KblVoFWU
 wYDX+W2oYDAx44O/6AGJ0Zctwf6m7Kyzb2aMIqv2fwacBoDvyVdTIT/4NroV9INI
 QvIY4Gi8hoCDQX39zwaxSWOq7uFLYHwUozzZxktS5c4N3eSVFs80jmdiQiMKmKRQ
 A+R+ZcuFBcC+6+Wt4x+20T2mF6pUvSaIDA4jegCbDL4jQlp+023XTMlV42cnpP0z
 hFZgBWJszLnLtj4KW/v3sXefZ1Pxl0WD4BHNqz8SMzMUaWalrXP4Gt2bnjB7Bx1N
 2M/DjW570eNZeZ9ZFcvkwHysCWMzHKmh5sPXnOitrs4s2hweIrO7wnMlYVLAGF1J
 m8jUoqpI9Cc7dFEg0inaSIddcjobcx9i2eG14zaZnXj0t8WqAbQqI0Lw/mipWXFY
 7DfdjFULI+Yru46TAFbiisFo/2dlijxrIr3d3QK21Cwklb3BPhpiDf83q6HYhNpB
 xPs38OCZaNdSL7TwNRcuZ2jmBCf+48SYgse85HQOgdD2QzJv6dU=
 =TGgF
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Some updates:
 * injection/radiotap updates for new test capabilities
 * remove WDS support - even years ago when we turned
   it off by default it was already basically unusable
 * support for HE (802.11ax) rates for beacons
 * support for some vendor-specific HE rates
 * many other small features/cleanups

* tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (21 commits)
  nl80211: fix kernel-doc warning in the new SAE attribute
  cfg80211: remove WDS code
  mac80211: remove WDS-related code
  rt2x00: remove WDS code
  b43legacy: remove WDS code
  b43: remove WDS code
  carl9170: remove WDS code
  ath9k: remove WDS code
  wireless: remove CONFIG_WIRELESS_WDS
  mac80211: assure that certain drivers adhere to DONT_REORDER flag
  mac80211: don't overwrite QoS TID of injected frames
  mac80211: adhere to Tx control flag that prevents frame reordering
  mac80211: add radiotap flag to assure frames are not reordered
  mac80211: save HE oper info in BSS config for mesh
  cfg80211: add support to configure HE MCS for beacon rate
  nl80211: fix beacon tx rate mask validation
  nl80211/cfg80211: fix potential infinite loop
  cfg80211: Add support to calculate and report 4096-QAM HE rates
  cfg80211: Add support to configure SAE PWE value to drivers
  ieee80211: Add definition for WFA DPP
  ...
====================

Link: https://lore.kernel.org/r/20201113101148.25268-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 12:03:22 -08:00
Martin KaFai Lau
8e4597c627 bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
This patch enables the FENTRY/FEXIT/RAW_TP tracing program to use
the bpf_sk_storage_(get|delete) helper, so those tracing programs
can access the sk's bpf_local_storage and the later selftest
will show some examples.

The bpf_sk_storage is currently used in bpf-tcp-cc, tc,
cg sockops...etc which is running either in softirq or
task context.

This patch adds bpf_sk_storage_get_tracing_proto and
bpf_sk_storage_delete_tracing_proto.  They will check
in runtime that the helpers can only be called when serving
softirq or running in a task context.  That should enable
most common tracing use cases on sk.

During the load time, the new tracing_allowed() function
will ensure the tracing prog using the bpf_sk_storage_(get|delete)
helper is not tracing any bpf_sk_storage*() function itself.
The sk is passed as "void *" when calling into bpf_local_storage.

This patch only allows tracing a kernel function.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201112211313.2587383-1-kafai@fb.com
2020-11-12 18:39:28 -08:00
Jakub Kicinski
e1d9d7b913 Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12 16:54:48 -08:00
Johannes Berg
da1e9dd3a1 nl80211: fix kernel-doc warning in the new SAE attribute
Format the items as a definition list, to avoid the warning
from the rst parsing.

Fixes: 9f0ffa4184 ("cfg80211: Add support to configure SAE PWE value to drivers")
Link: https://lore.kernel.org/r/20201110094911.bb020e863aa0.I960caf90e2a8cc23f6bf9245d77524df6a4d8f37@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-11-11 08:39:13 +01:00
Johannes Berg
e7e0517c10 cfg80211: remove WDS code
Remove all the code that was there to configure WDS interfaces,
now that there's no way to reach it anymore.

Link: https://lore.kernel.org/r/20201109105103.8f5b98e4068d.I5f5129041649ef2862b69683574bb3344743727b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-11-11 08:39:13 +01:00
Eric Dumazet
7b58e63e74 inet: udp{4|6}_lib_lookup_skb() skb argument is const
The skb is needed only to fetch the keys for the lookup.

Both functions are used from GRO stack, we do not want
accidental modification of the skb.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-10 17:57:14 -08:00
Eric Dumazet
d6bb2d1e86 inet: constify inet_sdif() argument
inet_sdif() does not modify the skb.

This will permit propagating the const qualifier in
udp{4|6}_lib_lookup_skb() functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-10 17:56:54 -08:00
Heiner Kallweit
682036b2b9 net: remove ip_tunnel_get_stats64
After having migrated all users remove ip_tunnel_get_stats64().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-09 17:50:28 -08:00
Allen Pais
6e1978a9a9 net: sched: convert tasklets to use new tasklet_setup() API
In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-07 10:41:15 -08:00
Jakub Kicinski
86bbf01977 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-11-06

1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David.

2) Tighten bpf_lsm function check, from KP.

3) Fix bpftool attaching to flow dissector, from Lorenz.

4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard.

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Update verification logic for LSM programs
  bpf: Zero-fill re-used per-cpu map element
  bpf: BPF_PRELOAD depends on BPF_SYSCALL
  tools/bpftool: Fix attaching flow dissector
  libbpf: Fix possible use after free in xsk_socket__delete
  libbpf: Fix null dereference in xsk_socket__delete
  libbpf, hashmap: Fix undefined behavior in hash_bits
  bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
  tools, bpftool: Remove two unused variables.
  tools, bpftool: Avoid array index warnings.
  xsk: Fix possible memory leak at socket close
  bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs
  samples/bpf: Set rlimit for memlock to infinity in all samples
  bpf: Fix -Wshadow warnings
  selftest/bpf: Fix profiler test using CO-RE relocation for enums
====================

Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 17:49:34 -08:00
Jakub Kicinski
ae0d0bb29b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 17:33:38 -08:00
Ido Schimmel
ce7e9c8a08 nexthop: Pass extack to register_nexthop_notifier()
This will be used by the next patch which extends the function to replay
all the existing nexthops to the notifier block being registered.

Device drivers will be able to pass extack to the function since it is
passed to them upon reload from devlink.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 11:28:49 -08:00
Ido Schimmel
732d167bf5 nexthop: Emit a notification when a nexthop is added
Emit a notification in the nexthop notification chain when a new nexthop
is added (not replaced). The nexthop can either be a new group or a
single nexthop.

The notification is sent after the nexthop is inserted into the
red-black tree, as listeners might need to callback into the nexthop
code with the nexthop ID in order to mark the nexthop as offloaded.

A 'REPLACE' notification is emitted instead of 'ADD' as the distinction
between the two is not important for in-kernel listeners. In case the
listener is not familiar with the encoded nexthop ID, it can simply
treat it as a new one. This is also consistent with the route offload
API.

Changes since RFC:
* Reword commit message

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 11:28:49 -08:00
Ido Schimmel
e95f2592f6 nexthop: Allow setting "offload" and "trap" indications on nexthops
Add a function that can be called by device drivers to set "offload" or
"trap" indication on nexthops following nexthop notifications.

Changes since RFC:
* s/nexthop_hw_flags_set/nexthop_set_hw_flags/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 11:28:49 -08:00
Ido Schimmel
1c9cac65ce nexthop: Add nexthop notification data structures
Add data structures that will be used for nexthop replace and delete
notifications in the previously introduced nexthop notification chain.

New data structures are added instead of passing the existing nexthop
code structures directly for several reasons.

First, the existing structures encode a lot of bookkeeping information
which is irrelevant for listeners of the notification chain.

Second, the existing structures can be changed without worrying about
introducing regressions in listeners since they are not accessed
directly by them.

Third, listeners of the notification chain do not need to each parse the
relatively complex nexthop code structures. They are passing the
required information in a simplified way.

Note that a single 'has_encap' bit is added instead of the actual
encapsulation information since current listeners do not support such
nexthops.

Changes since RFC:
* s/is_encap/has_encap/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 11:28:48 -08:00
Mathy Vanhoef
30df81301c mac80211: add radiotap flag to assure frames are not reordered
Add a new radiotap flag to indicate injected frames must not be
reordered relative to other frames that also have this flag set,
independent of priority field values in the transmitted frame.
Parse this radiotap flag and define and set a corresponding Tx
control flag. Note that this flag has recently been standardized
as part of an update to radiotap.

Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20201104061823.197407-2-Mathy.Vanhoef@kuleuven.be
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-11-06 11:01:01 +01:00
Pradeep Kumar Chitrapu
70debba3ab mac80211: save HE oper info in BSS config for mesh
Currently he_support is set only for AP mode. Storing this
information for mesh BSS as well helps driver to determine
HE support. Also save HE operation element params in BSS
conf so that drivers can access this for any configurations
instead of having to parse the beacon to fetch that info.

Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
Link: https://lore.kernel.org/r/20201020183111.25458-2-pradeepc@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-11-06 10:03:21 +01:00
Rohan Dutta
9f0ffa4184 cfg80211: Add support to configure SAE PWE value to drivers
Add support to configure SAE PWE preference from userspace to drivers in
both AP and STA modes. This is needed for cases where the driver takes
care of Authentication frame processing (SME in the driver) so that
correct enforcement of the acceptable PWE derivation mechanism can be
performed.

The userspace applications can pass the sae_pwe value using the
NL80211_ATTR_SAE_PWE attribute in the NL80211_CMD_CONNECT and
NL80211_CMD_START_AP commands to the driver. This allows selection
between the hunting-and-pecking loop and hash-to-element options for PWE
derivation. For backwards compatibility, this new attribute is optional
and if not included, the driver is notified of the value being
unspecified.

Signed-off-by: Rohan Dutta <drohan@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20201027100910.22283-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-11-06 10:00:04 +01:00
Xin Long
0356010d82 sctp: bring inet(6)_skb_parm back to sctp_input_cb
inet(6)_skb_parm was removed from sctp_input_cb by Commit a1dd2cf2f1
("sctp: allow changing transport encap_port by peer packets"), as it
thought sctp_input_cb->header is not used any more in SCTP.

syzbot reported a crash:

  [ ] BUG: KASAN: use-after-free in decode_session6+0xe7c/0x1580
  [ ]
  [ ] Call Trace:
  [ ]  <IRQ>
  [ ]  dump_stack+0x107/0x163
  [ ]  kasan_report.cold+0x1f/0x37
  [ ]  decode_session6+0xe7c/0x1580
  [ ]  __xfrm_policy_check+0x2fa/0x2850
  [ ]  sctp_rcv+0x12b0/0x2e30
  [ ]  sctp6_rcv+0x22/0x40
  [ ]  ip6_protocol_deliver_rcu+0x2e8/0x1680
  [ ]  ip6_input_finish+0x7f/0x160
  [ ]  ip6_input+0x9c/0xd0
  [ ]  ipv6_rcv+0x28e/0x3c0

It was caused by sctp_input_cb->header/IP6CB(skb) still used in sctp rx
path decode_session6() but some members overwritten by sctp6_rcv().

This patch is to fix it by bring inet(6)_skb_parm back to sctp_input_cb
and not overwriting it in sctp4/6_rcv() and sctp_udp_rcv().

Reported-by: syzbot+5be8aebb1b7dfa90ef31@syzkaller.appspotmail.com
Fixes: a1dd2cf2f1 ("sctp: allow changing transport encap_port by peer packets")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/136c1a7a419341487c504be6d1996928d9d16e02.1604472932.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-05 14:27:30 -08:00
Vladimir Oltean
e358bef7c3 net: dsa: Give drivers the chance to veto certain upper devices
Some switches rely on unique pvids to ensure port separation in
standalone mode, because they don't have a port forwarding matrix
configurable in hardware. So, setups like a group of 2 uppers with the
same VLAN, swp0.100 and swp1.100, will cause traffic tagged with VLAN
100 to be autonomously forwarded between these switch ports, in spite
of there being no bridge between swp0 and swp1.

These drivers need to prevent this from happening. They need to have
VLAN filtering enabled in standalone mode (so they'll drop frames tagged
with unknown VLANs) and they can only accept an 8021q upper on a port as
long as it isn't installed on any other port too. So give them the
chance to veto bad user requests.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
[Kurt: Pass info instead of ptr]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-05 14:04:49 -08:00
Kurt Kanzenbach
01ef09caad net: dsa: Add tag handling for Hirschmann Hellcreek switches
The Hirschmann Hellcreek TSN switches have a special tagging protocol for frames
exchanged between the CPU port and the master interface. The format is a one
byte trailer indicating the destination or origin port.

It's quite similar to the Micrel KSZ tagging. That's why the implementation is
based on that code.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-05 14:04:49 -08:00
Jakub Kicinski
b65ca4c388 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Move existing bridge packet reject infra to nf_reject_{ipv4,ipv6}.c
   from Jose M. Guisado.

2) Consolidate nft_reject_inet initialization and dump, also from Jose.

3) Add the netdev reject action, from Jose.

4) Allow to combine the exist flag and the destroy command in ipset,
   from Joszef Kadlecsik.

5) Expose bucket size parameter for hashtables, also from Jozsef.

6) Expose the init value for reproducible ipset listings, from Jozsef.

7) Use __printf attribute in nft_request_module, from Andrew Lunn.

8) Allow to use reject from the inet ingress chain.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: nft_reject_inet: allow to use reject from inet ingress
  netfilter: nftables: Add __printf() attribute
  netfilter: ipset: Expose the initval hash parameter to userspace
  netfilter: ipset: Add bucketsize parameter to all hash types
  netfilter: ipset: Support the -exist flag with the destroy command
  netfilter: nft_reject: add reject verdict support for netdev
  netfilter: nft_reject: unify reject init and dump into nft_reject
  netfilter: nf_reject: add reject skbuff creation helpers
====================

Link: https://lore.kernel.org/r/20201104141149.30082-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-04 18:05:56 -08:00
Paolo Abeni
5a369ca643 tcp: propagate MPTCP skb extensions on xmit splits
When the TCP stack splits a packet on the write queue, the tail
half currently lose the associated skb extensions, and will not
carry the DSM on the wire.

The above does not cause functional problems and is allowed by
the RFC, but interact badly with GRO and RX coalescing, as possible
candidates for aggregation will carry different TCP options.

This change tries to improve the MPTCP behavior, propagating the
skb extensions on split.

Additionally, we must prevent the MPTCP stack from updating the
mapping after the split occur: that will both violate the RFC and
fool the reader.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-04 17:45:53 -08:00
Guillaume Nault
0992d67bc2 mpls: drop skb's dst in mpls_forward()
Commit 394de110a7 ("net: Added pointer check for
dst->ops->neigh_lookup in dst_neigh_lookup_skb") added a test in
dst_neigh_lookup_skb() to avoid a NULL pointer dereference. The root
cause was the MPLS forwarding code, which doesn't call skb_dst_drop()
on incoming packets. That is, if the packet is received from a
collect_md device, it has a metadata_dst attached to it that doesn't
implement any dst_ops function.

To align the MPLS behaviour with IPv4 and IPv6, let's drop the dst in
mpls_forward(). This way, dst_neigh_lookup_skb() doesn't need to test
->neigh_lookup any more. Let's keep a WARN condition though, to
document the precondition and to ease detection of such problems in the
future.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/f8c2784c13faa54469a2aac339470b1049ca6b63.1604102750.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-03 12:55:53 -08:00
Greg Kurz
987a648509 fs/9p: track open fids
This patch adds accounting of open fids in a list hanging off the i_private
field of the corresponding inode. This allows faster lookups compared to
searching the full 9p client list.

The lookup code is modified accordingly.

Link: http://lkml.kernel.org/r/20200923141146.90046-3-jianyong.wu@arm.com
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2020-11-03 09:29:46 +01:00
Yuchung Cheng
7e901ee7b6 tcp: avoid slow start during fast recovery on new losses
During TCP fast recovery, the congestion control in charge is by
default the Proportional Rate Reduction (PRR) unless the congestion
control module specified otherwise (e.g. BBR).

Previously when tcp_packets_in_flight() is below snd_ssthresh PRR
would slow start upon receiving an ACK that
   1) cumulatively acknowledges retransmitted data
   and
   2) does not detect further lost retransmission

Such conditions indicate the repair is in good steady progress
after the first round trip of recovery. Otherwise PRR adopts the
packet conservation principle to send only the amount that was
newly delivered (indicated by this ACK).

This patch generalizes the previous design principle to include
also the newly sent data beside retransmission: as long as
the delivery is making good progress, both retransmission and
new data should be accounted to make PRR more cautious in slow
starting.

Suggested-by: Matt Mathis <mattmathis@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201031013412.1973112-1-ycheng@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:17:40 -08:00
Jakub Kicinski
a72e9d5472 net: sched: Remove broken definitions and un-hide for !LOCKDEP
Currently, variables used only within lockdep expressions are flagged as
unused, requiring that these variables' declarations be decorated with
either #ifdef or __maybe_unused.  This results in ugly code.  This commit
therefore causes the full definitions of the lockdep_tcf_chain_is_locked()
and lockdep_tcf_proto_is_locked() functions to be visible even when
lockdep is not enabled, thus removing the need for the previous empty
functions that were provided in non-lockdep kernels.  This approach
further relies on dead-code elimination to remove any references to
functions or variables that are not available in non-lockdep kernels.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
--
CC: jhs@mojatatu.com
CC: xiyou.wangcong@gmail.com
CC: jiri@resnulli.us
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-02 17:10:00 -08:00
Jakub Kicinski
d97f3bdf7a net: Un-hide lockdep_sock_is_held() for !LOCKDEP
Currently, variables used only within lockdep expressions are flagged
as unused, requiring that these variables' declarations be decorated
with either #ifdef or __maybe_unused.  This results in ugly code.
This commit therefore causes the lockdep_sock_is_held() function to be
visible even when lockdep is not enabled, thus removing the need for
these decorations.  This approach further relies on dead-code elimination
to remove any references to functions or variables that are not available
in non-lockdep kernels.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-02 17:09:59 -08:00
Jose M. Guisado Gomez
fa538f7cf0 netfilter: nf_reject: add reject skbuff creation helpers
Adds reject skbuff creation helper functions to ipv4/6 nf_reject
infrastructure. Use these functions for reject verdict in bridge
family.

Can be reused by all different families that support reject and
will not inject the reject packet through ip local out.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31 10:40:22 +01:00
Xin Long
e38d86b354 sctp: add the error cause for new encapsulation port restart
This patch is to add the function to make the abort chunk with
the error cause for new encapsulation port restart, defined
on Section 4.4 in draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.

v1->v2:
  - no change.
v2->v3:
  - no need to call htons() when setting nep.cur_port/new_port.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:24:40 -07:00
Xin Long
f1bfe8b541 sctp: add udphdr to overhead when udp_port is set
sctp_mtu_payload() is for calculating the frag size before making
chunks from a msg. So we should only add udphdr size to overhead
when udp socks are listening, as only then sctp can handle the
incoming sctp over udp packets and outgoing sctp over udp packets
will be possible.

Note that we can't do this according to transport->encap_port, as
different transports may be set to different values, while the
chunks were made before choosing the transport, we could not be
able to meet all rfc6951#section-5.6 recommends.

v1->v2:
  - Add udp_port for sctp_sock to avoid a potential race issue, it
    will be used in xmit path in the next patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:24:21 -07:00
Xin Long
a1dd2cf2f1 sctp: allow changing transport encap_port by peer packets
As rfc6951#section-5.4 says:

  "After finding the SCTP association (which
   includes checking the verification tag), the UDP source port MUST be
   stored as the encapsulation port for the destination address the SCTP
   packet is received from (see Section 5.1).

   When a non-encapsulated SCTP packet is received by the SCTP stack,
   the encapsulation of outgoing packets belonging to the same
   association and the corresponding destination address MUST be
   disabled."

transport encap_port should be updated by a validated incoming packet's
udp src port.

We save the udp src port in sctp_input_cb->encap_port, and then update
the transport in two places:

  1. right after vtag is verified, which is required by RFC, and this
     allows the existent transports to be updated by the chunks that
     can only be processed on an asoc.

  2. right before processing the 'init' where the transports are added,
     and this allows building a sctp over udp connection by client with
     the server not knowing the remote encap port.

  3. when processing ootb_pkt and creating the temporary transport for
     the reply pkt.

Note that sctp_input_cb->header is removed, as it's not used any more
in sctp.

v1->v2:
  - Change encap_port as __be16 for sctp_input_cb.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:24:16 -07:00
Xin Long
e8a3001c21 sctp: add encap_port for netns sock asoc and transport
encap_port is added as per netns/sock/assoc/transport, and the
latter one's encap_port inherits the former one's by default.
The transport's encap_port value would mostly decide if one
packet should go out with udp encapsulated or not.

This patch also allows users to set netns' encap_port by sysctl.

v1->v2:
  - Change to define encap_port as __be16 for sctp_sock, asoc and
    transport.
v2->v3:
  - No change.
v3->v4:
  - Add 'encap_port' entry in ip-sysctl.rst.
v4->v5:
  - Improve the description of encap_port in ip-sysctl.rst.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:24:06 -07:00
Xin Long
9d6ba260a0 sctp: create udp6 sock and set its encap_rcv
This patch is to add the udp6 sock part in sctp_udp_sock_start/stop().
udp_conf.use_udp6_rx_checksums is set to true, as:

   "The SCTP checksum MUST be computed for IPv4 and IPv6, and the UDP
    checksum SHOULD be computed for IPv4 and IPv6"

says in rfc6951#section-5.3.

v1->v2:
  - Add pr_err() when fails to create udp v6 sock.
  - Add #if IS_ENABLED(CONFIG_IPV6) not to create v6 sock when ipv6 is
    disabled.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:23:56 -07:00
Xin Long
965ae44412 sctp: create udp4 sock and add its encap_rcv
This patch is to add the functions to create/release udp4 sock,
and set the sock's encap_rcv to process the incoming udp encap
sctp packets. In sctp_udp_rcv(), as we can see, all we need to
do is fix the transport header for sctp_rcv(), then it would
implement the part of rfc6951#section-5.4:

  "When an encapsulated packet is received, the UDP header is removed.
   Then, the generic lookup is performed, as done by an SCTP stack
   whenever a packet is received, to find the association for the
   received SCTP packet"

Note that these functions will be called in the last patch of
this patchset when enabling this feature.

v1->v2:
  - Add pr_err() when fails to create udp v4 sock.
v2->v3:
  - Add 'select NET_UDP_TUNNEL' in sctp Kconfig.
v3->v4:
  - No change.
v4->v5:
  - Change to set udp_port to 0 by default.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:23:52 -07:00
Mauro Carvalho Chehab
b1e8eb11fb mac80211: fix kernel-doc markups
Some identifiers have different names between their prototypes
and the kernel-doc markup.

Others need to be fixed, as kernel-doc markups should use this format:
        identifier - description

In the specific case of __sta_info_flush(), add a documentation
for sta_info_flush(), as this one is the one used outside
sta_info.c.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/978d35eef2dc76e21c81931804e4eaefbd6d635e.1603469755.git.mchehab+huawei@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-10-30 10:06:09 +01:00
Arnd Bergmann
f54ec58fee wimax: move out to staging
There are no known users of this driver as of October 2020, and it will
be removed unless someone turns out to still need it in future releases.

According to https://en.wikipedia.org/wiki/List_of_WiMAX_networks, there
have been many public wimax networks, but it appears that many of these
have migrated to LTE or discontinued their service altogether.
As most PCs and phones lack WiMAX hardware support, the remaining
networks tend to use standalone routers. These almost certainly
run Linux, but not a modern kernel or the mainline wimax driver stack.

NetworkManager appears to have dropped userspace support in 2015
https://bugzilla.gnome.org/show_bug.cgi?id=747846, the
www.linuxwimax.org
site had already shut down earlier.

WiMax is apparently still being deployed on airport campus networks
("AeroMACS"), but in a frequency band that was not supported by the old
Intel 2400m (used in Sandy Bridge laptops and earlier), which is the
only driver using the kernel's wimax stack.

Move all files into drivers/staging/wimax, including the uapi header
files and documentation, to make it easier to remove it when it gets
to that. Only minimal changes are made to the source files, in order
to make it possible to port patches across the move.

Also remove the MAINTAINERS entry that refers to a broken mailing
list and website.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-By: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Suggested-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-10-29 19:27:45 +01:00
Magnus Karlsson
e5e1a4bc91 xsk: Fix possible memory leak at socket close
Fix a possible memory leak at xsk socket close that is caused by the
refcounting of the umem object being wrong. The reference count of the
umem was decremented only after the pool had been freed. Note that if
the buffer pool is destroyed, it is important that the umem is
destroyed after the pool, otherwise the umem would disappear while the
driver is still running. And as the buffer pool needs to be destroyed
in a work queue, the umem is also (if its refcount reaches zero)
destroyed after the buffer pool in that same work queue.

What was missing is that the refcount also needs to be decremented
when the pool is not freed and when the pool has not even been
created. The first case happens when the refcount of the pool is
higher than 1, i.e. it is still being used by some other socket using
the same device and queue id. In this case, it is safe to decrement
the refcount of the umem outside of the work queue as the umem will
never be freed because the refcount of the umem is always greater than
or equal to the refcount of the buffer pool. The second case is if the
buffer pool has not been created yet, i.e. the socket was closed
before it was bound but after the umem was created. In this case, it
is safe to destroy the umem outside of the work queue, since there is
no pool that can use it by definition.

Fixes: 1c1efc2af1 ("xsk: Create and free buffer pool independently from umem")
Reported-by: syzbot+eb71df123dc2be2c1456@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1603801921-2712-1-git-send-email-magnus.karlsson@gmail.com
2020-10-29 15:19:56 +01:00
Linus Torvalds
3cb12d27ff Fixes for 5.10-rc1 from the networking tree:
Cross-tree/merge window issues:
 
  - rtl8150: don't incorrectly assign random MAC addresses; fix late
    in the 5.9 cycle started depending on a return code from
    a function which changed with the 5.10 PR from the usb subsystem
 
 Current release - regressions:
 
  - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
    crashes at probe when control vq was not negotiated/available
 
 Previous releases - regressions:
 
  - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
    bus, only first device would be probed correctly
 
  - nexthop: Fix performance regression in nexthop deletion by
    effectively switching from recently added synchronize_rcu()
    to synchronize_rcu_expedited()
 
  - netsec: ignore 'phy-mode' device property on ACPI systems;
    the property is not populated correctly by the firmware,
    but firmware configures the PHY so just keep boot settings
 
 Previous releases - always broken:
 
  - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
    bulk transfers getting "stuck"
 
  - icmp: randomize the global rate limiter to prevent attackers from
    getting useful signal
 
  - r8169: fix operation under forced interrupt threading, make the
    driver always use hard irqs, even on RT, given the handler is
    light and only wants to schedule napi (and do so through
    a _irqoff() variant, preferably)
 
  - bpf: Enforce pointer id generation for all may-be-null register
    type to avoid pointers erroneously getting marked as null-checked
 
  - tipc: re-configure queue limit for broadcast link
 
  - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
    tunnels
 
  - fix various issues in chelsio inline tls driver
 
 Misc:
 
  - bpf: improve just-added bpf_redirect_neigh() helper api to support
    supplying nexthop by the caller - in case BPF program has already
    done a lookup we can avoid doing another one
 
  - remove unnecessary break statements
 
  - make MCTCP not select IPV6, but rather depend on it
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+R+5UACgkQMUZtbf5S
 Irt9KxAAiYme2aSvMOni0NQsOgQ5mVsy7tk0/4dyRqkAx0ggrfGcFuhgZYNm8ZKY
 KoQsQyn30Wb/2wAp1vX2I4Fod67rFyBfQg/8iWiEAu47X7Bj1lpPPJexSPKhF9/X
 e0TuGxZtoaDuV9C3Su/FOjRmnShGSFQu1SCyJThshwaGsFL3YQ0Ut07VRgRF8x05
 A5fy2SVVIw0JOQgV1oH0GP5oEK3c50oGnaXt8emm56PxVIfAYY0oq69hQUzrfMFP
 zV9R0XbnbCIibT8R3lEghjtXavtQTzK5rYDKazTeOyDU87M+yuykNYj7MhgDwl9Q
 UdJkH2OpMlJylEH3asUjz/+ObMhXfOuj/ZS3INtO5omBJx7x76egDZPMQe4wlpcC
 NT5EZMS7kBdQL8xXDob7hXsvFpuEErSUGruYTHp4H52A9ke1dRTH2kQszcKk87V3
 s+aVVPtJ5bHzF3oGEvfwP0DFLTF6WvjD0Ts0LmTY2DhpE//tFWV37j60Ni5XU21X
 fCPooihQbLOsq9D8zc0ydEvCg2LLWMXM5ovCkqfIAJzbGVYhnxJSryZwpOlKDS0y
 LiUmLcTZDoNR/szx0aJhVHdUUVgXDX/GsllHoc1w7ZvDRMJn40K+xnaF3dSMwtIl
 imhfc5pPi6fdBgjB0cFYRPfhwiwlPMQ4YFsOq9JvynJzmt6P5FQ=
 =ceke
 -----END PGP SIGNATURE-----

Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Cross-tree/merge window issues:

   - rtl8150: don't incorrectly assign random MAC addresses; fix late in
     the 5.9 cycle started depending on a return code from a function
     which changed with the 5.10 PR from the usb subsystem

  Current release regressions:

   - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
     crashes at probe when control vq was not negotiated/available

  Previous release regressions:

   - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
     bus, only first device would be probed correctly

   - nexthop: Fix performance regression in nexthop deletion by
     effectively switching from recently added synchronize_rcu() to
     synchronize_rcu_expedited()

   - netsec: ignore 'phy-mode' device property on ACPI systems; the
     property is not populated correctly by the firmware, but firmware
     configures the PHY so just keep boot settings

  Previous releases - always broken:

   - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
     bulk transfers getting "stuck"

   - icmp: randomize the global rate limiter to prevent attackers from
     getting useful signal

   - r8169: fix operation under forced interrupt threading, make the
     driver always use hard irqs, even on RT, given the handler is light
     and only wants to schedule napi (and do so through a _irqoff()
     variant, preferably)

   - bpf: Enforce pointer id generation for all may-be-null register
     type to avoid pointers erroneously getting marked as null-checked

   - tipc: re-configure queue limit for broadcast link

   - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
     tunnels

   - fix various issues in chelsio inline tls driver

  Misc:

   - bpf: improve just-added bpf_redirect_neigh() helper api to support
     supplying nexthop by the caller - in case BPF program has already
     done a lookup we can avoid doing another one

   - remove unnecessary break statements

   - make MCTCP not select IPV6, but rather depend on it"

* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  tcp: fix to update snd_wl1 in bulk receiver fast path
  net: Properly typecast int values to set sk_max_pacing_rate
  netfilter: nf_fwd_netdev: clear timestamp in forwarding path
  ibmvnic: save changed mac address to adapter->mac_addr
  selftests: mptcp: depends on built-in IPv6
  Revert "virtio-net: ethtool configurable RXCSUM"
  rtnetlink: fix data overflow in rtnl_calcit()
  net: ethernet: mtk-star-emac: select REGMAP_MMIO
  net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
  net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
  bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
  bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
  bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
  mptcp: depends on IPV6 but not as a module
  sfc: move initialisation of efx->filter_sem to efx_init_struct()
  mpls: load mpls_gso after mpls_iptunnel
  net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
  net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
  net: dsa: bcm_sf2: make const array static, makes object smaller
  mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
  ...
2020-10-23 12:05:49 -07:00
Saeed Mirzamohammadi
31cc578ae2 netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create
This patch fixes the issue due to:

BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2
net/netfilter/nf_tables_offload.c:40
Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds.

This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue.

Add nft_expr_more() and use it to fix this problem.

Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 13:54:54 +02:00