Eric Dumazet suggested to allow users to modify max GRO packet size.
We have seen GRO being disabled by users of appliances (such as
wifi access points) because of claimed bufferbloat issues,
or some work arounds in sch_cake, to split GRO/GSO packets.
Instead of disabling GRO completely, one can chose to limit
the maximum packet size of GRO packets, depending on their
latency constraints.
This patch adds a per device gro_max_size attribute
that can be changed with ip link command.
ip link set dev eth0 gro_max_size 16000
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2022-01-06
1) Fix some clang_analyzer warnings about never read variables.
From luo penghao.
2) Check for pols[0] only once in xfrm_expand_policies().
From Jean Sacren.
3) The SA curlft.use_time was updated only on SA cration time.
Update whenever the SA is used. From Antony Antony
4) Add support for SM3 secure hash.
From Xu Jia.
5) Add support for SM4 symmetric cipher algorithm.
From Xu Jia.
6) Add a rate limit for SA mapping change messages.
From Antony Antony.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the CAN netlink interface provides no easy ways to check
the capabilities of a given controller. The only method from the
command line is to try each CAN_CTRLMODE_* individually to check
whether the netlink interface returns an -EOPNOTSUPP error or not
(alternatively, one may find it easier to directly check the source
code of the driver instead...)
This patch introduces a method for the user to check both the
supported and the static capabilities. The proposed method introduces
a new IFLA nest: IFLA_CAN_CTRLMODE_EXT which extends the current
IFLA_CAN_CTRLMODE. This is done to guaranty a full forward and
backward compatibility between the kernel and the user land
applications.
The IFLA_CAN_CTRLMODE_EXT nest contains one single entry:
IFLA_CAN_CTRLMODE_SUPPORTED. Because this entry is only used in one
direction: kernel to userland, no new struct nla_policy are
introduced.
Below table explains how IFLA_CAN_CTRLMODE_SUPPORTED (hereafter:
"supported") and can_ctrlmode::flags (hereafter: "flags") allow us to
identify both the supported and the static capabilities, when masked
with any of the CAN_CTRLMODE_* bit flags:
supported & flags & Controller capabilities
CAN_CTRLMODE_* CAN_CTRLMODE_*
-----------------------------------------------------------------------
false false Feature not supported (always disabled)
false true Static feature (always enabled)
true false Feature supported but disabled
true true Feature supported and enabled
Link: https://lore.kernel.org/all/20211213160226.56219-5-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When a mesh link is in blocked state, it is very useful to still allow
auth requests from the peer to re-establish it.
When a remote node is power cycled, the peer state can easily end up
in blocked state if multiple auth attempts are performed. Since this
can lead to several minutes of downtime, we should accept auth attempts
of the peer after it has come back.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20211220105147.88625-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This adds net namespace ID to diag of linkgroup, helps us to distinguish
different namespaces, and net_cookie is unique in the whole system.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-12-30
The following pull-request contains BPF updates for your *net-next* tree.
We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).
The main changes are:
1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.
2) Beautify and de-verbose verifier logs, from Christy.
3) Composable verifier types, from Hao.
4) bpf_strncmp helper, from Hou.
5) bpf.h header dependency cleanup, from Jakub.
6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.
7) Sleepable local storage, from KP.
8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for misc5 match parameter as per HW spec, this will allow
matching on tunnel_header fields.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
commit 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
commit 31108d142f ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
commit 4390c6edc0 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/
net/smc/smc_wr.c
commit 49dc9013e3 ("net/smc: Use the bitmap API when applicable")
commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
bitmap_zero()/memset() is removed by the fix
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from.. Santa?
No regressions on our radar at this point. The igc problem fixed here
was the last one I was tracking but it was broken in previous
releases, anyway. Mostly driver fixes and a couple of largish SMC
fixes.
Current release - regressions:
- xsk: initialise xskb free_list_node, fixup for a -rc7 fix
Current release - new code bugs:
- mlx5: handful of minor fixes:
- use first online CPU instead of hard coded CPU
- fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'
- fix skb memory leak when TC classifier action offloads are disabled
- fix memory leak with rules with internal OvS port
Previous releases - regressions:
- igc: do not enable crosstimestamping for i225-V models
Previous releases - always broken:
- udp: use datalen to cap ipv6 udp max gso segments
- fix use-after-free in tw_timer_handler due to early free of stats
- smc: fix kernel panic caused by race of smc_sock
- smc: don't send CDC/LLC message if link not ready, avoid timeouts
- sctp: use call_rcu to free endpoint, avoid UAF in sock diag
- bridge: mcast: add and enforce query interval minimum
- usb: pegasus: do not drop long Ethernet frames
- mlx5e: fix ICOSQ recovery flow for XSK
- nfc: uapi: use kernel size_t to fix user-space builds"
* tag 'net-5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
fsl/fman: Fix missing put_device() call in fman_port_probe
selftests: net: using ping6 for IPv6 in udpgro_fwd.sh
Documentation: fix outdated interpretation of ip_no_pmtu_disc
net/ncsi: check for error return from call to nla_put_u32
net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
net: fix use-after-free in tw_timer_handler
selftests: net: Fix a typo in udpgro_fwd.sh
selftests/net: udpgso_bench_tx: fix dst ip argument
net: bridge: mcast: add and enforce startup query interval minimum
net: bridge: mcast: add and enforce query interval minimum
ipv6: raw: check passed optlen before reading
xsk: Initialise xskb free_list_node
net/mlx5e: Fix wrong features assignment in case of error
net/mlx5e: TC, Fix memory leak with rules with internal port
ionic: Initialize the 'lif->dbid_inuse' bitmap
igc: Fix TX timestamp support for non-MSI-X platforms
igc: Do not enable crosstimestamping for i225-V models
net/smc: fix kernel panic caused by race of smc_sock
net/smc: don't send CDC/LLC message if link not ready
NFC: st21nfca: Fix memory leak in device probe and remove
...
As we defined the new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX
as enum, it's not easy for userspace program to check if the flag is
supported when build.
Let's define the new flag so user space could build it on old kernel with
ifdef check.
Fixes: 9c9211a3fc ("net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix user-space builds if it includes /usr/include/linux/nfc.h before
some of other headers:
/usr/include/linux/nfc.h:281:9: error: unknown type name ‘size_t’
281 | size_t service_name_len;
| ^~~~~~
Fixes: d646960f79 ("NFC: Initial LLCP support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace sa_family_t with __kernel_sa_family_t to fix the following
linux/nfc.h userspace compilation errors:
/usr/include/linux/nfc.h:266:2: error: unknown type name 'sa_family_t'
sa_family_t sa_family;
/usr/include/linux/nfc.h:274:2: error: unknown type name 'sa_family_t'
sa_family_t sa_family;
Fixes: 23b7869c0f ("NFC: add the NFC socket raw protocol")
Fixes: d646960f79 ("NFC: Initial LLCP support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel generates mapping change message, XFRM_MSG_MAPPING,
when a source port chage is detected on a input state with UDP
encapsulation set. Kernel generates a message for each IPsec packet
with new source port. For a high speed flow per packet mapping change
message can be excessive, and can overload the user space listener.
Introduce rate limiting for XFRM_MSG_MAPPING message to the user space.
The rate limiting is configurable via netlink, when adding a new SA or
updating it. Use the new attribute XFRMA_MTIMER_THRESH in seconds.
v1->v2 change:
update xfrm_sa_len()
v2->v3 changes:
use u32 insted unsigned long to reduce size of struct xfrm_state
fix xfrm_ompat size Reported-by: kernel test robot <lkp@intel.com>
accept XFRM_MSG_MAPPING only when XFRMA_ENCAP is present
Co-developed-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This macro is defined by glibc itself, which makes the issue go unnoticed on
those systems. On non-glibc systems it causes build failures on several
utilities and libraries, like bpftool and objtool.
Fixes: 1d509f2a6e ("x86/insn: Support big endian cross-compiles")
Fixes: 2d7ce0e8a7 ("tools/virtio: more stubs")
Fixes: 3fb321fde2 ("selftests/net: ipv6 flowlabel")
Fixes: 50b3ed57de ("selftests/bpf: test bpf flow dissection")
Fixes: 9cacf81f81 ("bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE")
Fixes: a4b2061242 ("tools include uapi: Grab a copy of linux/in.h")
Fixes: b12d6ec097 ("bpf: btf: add btf print functionality")
Fixes: c0dd967818 ("tools, include: Grab a copy of linux/erspan.h")
Fixes: c4b6014e8b ("tools: Add copy of perf_event.h to tools/include/linux/")
Signed-off-by: Ismael Luceno <ismael@iodev.co.uk>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211115134647.1921-1-ismael@iodev.co.uk
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Johannes Berg says:
====================
This time we have:
* ndo_fill_forward_path support in mac80211, to let drivers use it
* association comeback notification for userspace, to be able
to react more sensibly to long delays
* support for background radar detection hardware in some chipsets
* SA Query Procedures offload on the AP side
* more logging if we find problems with HT/VHT/HE
* various cleanups and minor fixes
Conflicts:
net/wireless/reg.c:
e08ebd6d7b ("cfg80211: Acquire wiphy mutex on regulatory work")
701fdfe348 ("cfg80211: Enable regulatory enforcement checks for drivers supporting mesh iface")
https://lore.kernel.org/r/20211221111950.57ecc6a7@canb.auug.org.au
drivers/net/wireless/ath/ath10k/wmi.c:
7f599aeccb ("cfg80211: Use the HE operation IE to determine a 6GHz BSS channel")
3bf2537ec2 ("ath10k: drop beacon and probe response which leak from other channel")
https://lore.kernel.org/r/20211221115004.1cd6b262@canb.auug.org.au
* tag 'mac80211-next-for-net-next-2021-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (32 commits)
cfg80211: Enable regulatory enforcement checks for drivers supporting mesh iface
rfkill: allow to get the software rfkill state
cfg80211: refactor cfg80211_get_ies_channel_number()
nl82011: clarify interface combinations wrt. channels
nl80211: Add support to offload SA Query procedures for AP SME device
nl80211: Add support to set AP settings flags with single attribute
mac80211: add more HT/VHT/HE state logging
cfg80211: Use the HE operation IE to determine a 6GHz BSS channel
cfg80211: rename offchannel_chain structs to background_chain to avoid confusion with ETSI standard
mac80211: Notify cfg80211 about association comeback
cfg80211: Add support for notifying association comeback
mac80211: introduce channel switch disconnect function
cfg80211: Fix order of enum nl80211_band_iftype_attr documentation
cfg80211: simplify cfg80211_chandef_valid()
mac80211: Remove a couple of obsolete TODO
mac80211: fix FEC flag in radio tap header
mac80211: use coarse boottime for airtime fairness code
ieee80211: change HE nominal packet padding value defines
cfg80211: use ieee80211_bss_get_elem() instead of _get_ie()
mac80211: Use memset_after() to clear tx status
...
====================
Link: https://lore.kernel.org/r/20211221112532.28708-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a flag attribute to use in ap settings to indicate userspace
supports offloading of SA Query procedures to driver. Also add AP SME
device feature flag to advertise that the SA Query procedures offloaded
to driver when userspace indicates support for offloading of SA Query
procedures.
Driver handles SA Query procedures in driver's SME it self and skip
sending SA Query request or response frames to userspace when userspace
indicates support for SA Query procedures offload. But if userspace
doesn't advertise support for SA Query procedures offload driver shall
not offload SA Query procedures handling.
Also userspace with SA Query procedures offload capability shall skip SA
Query specific validations when driver indicates support for handling SA
Query procedures.
Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Link: https://lore.kernel.org/r/1637911519-21306-2-git-send-email-vjakkam@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In previous method each AP settings flag is represented by a top-level
flag attribute and conversion to enum cfg80211_ap_settings_flags had to
be done before sending them to driver. This commit is to make it easier
to define new AP settings flags and sending them to driver.
This commit also deprecate sending of
%NL80211_ATTR_EXTERNAL_AUTH_SUPPORT in %NL80211_CMD_START_AP. But to
maintain backwards compatibility checks for
%NL80211_ATTR_EXTERNAL_AUTH_SUPPORT in %NL80211_CMD_START_AP when
%NL80211_ATTR_AP_SETTINGS_FLAGS not present in %NL80211_CMD_START_AP.
Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Link: https://lore.kernel.org/r/1637911519-21306-1-git-send-email-vjakkam@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ETSI standard defines "Offchannel CAC" as:
"Off-Channel CAC is performed by a number of non-continuous checks
spread over a period in time. This period, which is required to
determine the presence of radar signals, is defined as the Off-Channel
CAC Time..
Minimum Off-Channel CAC Time 6 minutes and Maximum Off-Channel CAC Time
4 hours..".
mac80211 implementation refers to a dedicated hw chain used for continuous
radar monitoring. Rename offchannel_* references to background_* in
order to avoid confusion with ETSI standard.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/4204cc1d648d76b44557981713231e030a3bd991.1638190762.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We add skip_hw and skip_sw for user to control if offload the action
to hardware.
We also add in_hw_count for user to indicate if the action is offloaded
to any hardware.
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from mac80211, wifi, bpf.
Relatively large batches of fixes from BPF and the WiFi stack, calm in
general networking.
Current release - regressions:
- dpaa2-eth: fix buffer overrun when reporting ethtool statistics
Current release - new code bugs:
- bpf: fix incorrect state pruning for <8B spill/fill
- iavf:
- add missing unlocks in iavf_watchdog_task()
- do not override the adapter state in the watchdog task (again)
- mlxsw: spectrum_router: consolidate MAC profiles when possible
Previous releases - regressions:
- mac80211 fixes:
- rate control, avoid driver crash for retransmitted frames
- regression in SSN handling of addba tx
- a memory leak where sta_info is not freed
- marking TX-during-stop for TX in in_reconfig, prevent stall
- cfg80211: acquire wiphy mutex on regulatory work
- wifi drivers: fix build regressions and LED config dependency
- virtio_net: fix rx_drops stat for small pkts
- dsa: mv88e6xxx: unforce speed & duplex in mac_link_down()
Previous releases - always broken:
- bpf fixes:
- kernel address leakage in atomic fetch
- kernel address leakage in atomic cmpxchg's r0 aux reg
- signed bounds propagation after mov32
- extable fixup offset
- extable address check
- mac80211:
- fix the size used for building probe request
- send ADDBA requests using the tid/queue of the aggregation
session
- agg-tx: don't schedule_and_wake_txq() under sta->lock, avoid
deadlocks
- validate extended element ID is present
- mptcp:
- never allow the PM to close a listener subflow (null-defer)
- clear 'kern' flag from fallback sockets, prevent crash
- fix deadlock in __mptcp_push_pending()
- inet_diag: fix kernel-infoleak for UDP sockets
- xsk: do not sleep in poll() when need_wakeup set
- smc: avoid very long waits in smc_release()
- sch_ets: don't remove idle classes from the round-robin list
- netdevsim:
- zero-initialize memory for bpf map's value, prevent info leak
- don't let user space overwrite read only (max) ethtool parms
- ixgbe: set X550 MDIO speed before talking to PHY
- stmmac:
- fix null-deref in flower deletion w/ VLAN prio Rx steering
- dwmac-rk: fix oob read in rk_gmac_setup
- ice: time stamping fixes
- systemport: add global locking for descriptor life cycle"
* tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (89 commits)
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
selftest/bpf: Add a test that reads various addresses.
bpf: Fix extable address check.
bpf: Fix extable fixup offset.
bpf, selftests: Add test case trying to taint map value pointer
bpf: Make 32->64 bounds propagation slightly more robust
bpf: Fix signed bounds propagation after mov32
sit: do not call ipip6_dev_free() from sit_init_net()
net: systemport: Add global locking for descriptor lifecycle
net/smc: Prevent smc_release() from long blocking
net: Fix double 0x prefix print in SKB dump
virtio_net: fix rx_drops stat for small pkts
dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED
sfc_ef100: potential dereference of null pointer
net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup
net: usb: lan78xx: add Allied Telesis AT29M2-AF
net/packet: rx_owner_map depends on pg_vec
netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
dpaa2-eth: fix ethtool statistics
ixgbe: set X550 MDIO speed before talking to PHY
...
'loc_id' and 'rem_id' are set in all events linked to subflows but those
were missing in the events description in the comments.
Fixes: b911c97c7d ("mptcp: add netlink event support")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 94dd016ae5 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP
ioctl to active device") the user could get bond active interface's
PHC index directly. But when there is a failover, the bond active
interface will change, thus the PHC index is also changed. This may
break the user's program if they did not update the PHC timely.
This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
When the user wants to get the bond active interface's PHC, they need to
add this flag and be aware the PHC index may be changed.
With the new flag. All flag checks in current drivers are removed. Only
the checking in net_hwtstamp_validate() is kept.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding following helpers for tracing programs:
Get n-th argument of the traced function:
long bpf_get_func_arg(void *ctx, u32 n, u64 *value)
Get return value of the traced function:
long bpf_get_func_ret(void *ctx, u64 *value)
Get arguments count of the traced function:
long bpf_get_func_arg_cnt(void *ctx)
The trampoline now stores number of arguments on ctx-8
address, so it's easy to verify argument index and find
return value argument's position.
Moving function ip address on the trampoline stack behind
the number of functions arguments, so it's now stored on
ctx-16 address if it's needed.
All helpers above are inlined by verifier.
Also bit unrelated small change - using newly added function
bpf_prog_has_trampoline in check_get_func_ip.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211208193245.172141-5-jolsa@kernel.org
Merge misc fixes from Andrew Morton:
"21 patches.
Subsystems affected by this patch series: MAINTAINERS, mailmap, and mm
(mlock, pagecache, damon, slub, memcg, hugetlb, and pagecache)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
mm: bdi: initialize bdi_min_ratio when bdi is unregistered
hugetlbfs: fix issue of preallocation of gigantic pages can't work
mm/memcg: relocate mod_objcg_mlstate(), get_obj_stock() and put_obj_stock()
mm/slub: fix endianness bug for alloc/free_traces attributes
selftests/damon: split test cases
selftests/damon: test debugfs file reads/writes with huge count
selftests/damon: test wrong DAMOS condition ranges input
selftests/damon: test DAMON enabling with empty target_ids case
selftests/damon: skip test if DAMON is running
mm/damon/vaddr-test: remove unnecessary variables
mm/damon/vaddr-test: split a test function having >1024 bytes frame size
mm/damon/vaddr: remove an unnecessary warning message
mm/damon/core: remove unnecessary error messages
mm/damon/dbgfs: remove an unnecessary error message
mm/damon/core: use better timer mechanisms selection threshold
mm/damon/core: fix fake load reports due to uninterruptible sleeps
timers: implement usleep_idle_range()
filemap: remove PageHWPoison check from next_uptodate_page()
mailmap: update email address for Guo Ren
MAINTAINERS: update kdump maintainers
...
This limit has not been updated since 2008, when it was increased to 64
KiB at the request of GnuPG. Until recently, the main use-cases for this
feature were (1) preventing sensitive memory from being swapped, as in
GnuPG's use-case; and (2) real-time use-cases. In the first case, little
memory is called for, and in the second case, the user is generally in a
position to increase it if they need more.
The introduction of IOURING_REGISTER_BUFFERS adds a third use-case:
preparing fixed buffers for high-performance I/O. This use-case will take
as much of this memory as it can get, but is still limited to 64 KiB by
default, which is very little. This increases the limit to 8 MB, which
was chosen fairly arbitrarily as a more generous, but still conservative,
default value.
It is also possible to raise this limit in userspace. This is easily
done, for example, in the use-case of a network daemon: systemd, for
instance, provides for this via LimitMEMLOCK in the service file; OpenRC
via the rc_ulimit variables. However, there is no established userspace
facility for configuring this outside of daemons: end-user applications do
not presently have access to a convenient means of raising their limits.
The buck, as it were, stops with the kernel. It's much easier to address
it here than it is to bring it to hundreds of distributions, and it can
only realistically be relied upon to be high-enough by end-user software
if it is more-or-less ubiquitous. Most distros don't change this
particular rlimit from the kernel-supplied default value, so a change here
will easily provide that ubiquity.
Link: https://lkml.kernel.org/r/20211028080813.15966-1-sir@cmpwn.com
Signed-off-by: Drew DeVault <sir@cmpwn.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Dona-Couch <andrew@donacou.ch>
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrii Nakryiko says:
====================
bpf-next 2021-12-10 v2
We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).
The main changes are:
1) Various samples fixes, from Alexander Lobakin.
2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.
3) A batch of new unified APIs for libbpf, logging improvements, version
querying, etc. Also a batch of old deprecations for old APIs and various
bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.
4) BPF documentation reorganization and improvements, from Christoph Hellwig
and Dave Tucker.
5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
libbpf, from Hengqi Chen.
6) Verifier log fixes, from Hou Tao.
7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.
8) Extend branch record capturing to all platforms that support it,
from Kajol Jain.
9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.
10) bpftool doc-generating script improvements, from Quentin Monnet.
11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.
12) Deprecation warning fix for perf/bpf_counter, from Song Liu.
13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
from Tiezhu Yang.
14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.
15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
and others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
libbpf: Add "bool skipped" to struct bpf_map
libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
selftests/bpf: Add test for libbpf's custom log_buf behavior
selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
libbpf: Deprecate bpf_object__load_xattr()
libbpf: Add per-program log buffer setter and getter
libbpf: Preserve kernel error code and remove kprobe prog type guessing
libbpf: Improve logging around BPF program loading
libbpf: Allow passing user log setting through bpf_object_open_opts
libbpf: Allow passing preallocated log_buf when loading BTF into kernel
libbpf: Add OPTS-based bpf_btf_load() API
libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
samples/bpf: Remove unneeded variable
bpf: Remove redundant assignment to pointer t
selftests/bpf: Fix a compilation warning
perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
samples: bpf: Fix 'unknown warning group' build warning on Clang
samples: bpf: Fix xdp_sample_user.o linking with Clang
...
====================
Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
signalfd_poll() and binder_poll() are special in that they use a
waitqueue whose lifetime is the current task, rather than the struct
file as is normally the case. This is okay for blocking polls, since a
blocking poll occurs within one task; however, non-blocking polls
require another solution. This solution is for the queue to be cleared
before it is freed, by sending a POLLFREE notification to all waiters.
Unfortunately, only eventpoll handles POLLFREE. A second type of
non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
handle POLLFREE. This allows a use-after-free to occur if a signalfd or
binder fd is polled with aio poll, and the waitqueue gets freed.
Fix this by making aio poll handle POLLFREE.
A patch by Ramji Jiyani <ramjiyani@google.com>
(https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com)
tried to do this by making aio_poll_wake() always complete the request
inline if POLLFREE is seen. However, that solution had two bugs.
First, it introduced a deadlock, as it unconditionally locked the aio
context while holding the waitqueue lock, which inverts the normal
locking order. Second, it didn't consider that POLLFREE notifications
are missed while the request has been temporarily de-queued.
The second problem was solved by my previous patch. This patch then
properly fixes the use-after-free by handling POLLFREE in a
deadlock-free way. It does this by taking advantage of the fact that
freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.
Fixes: 2c14fa838c ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Pull drm fixes from Dave Airlie:
"Bit of an uptick in patch count this week, though it's all relatively
small overall.
I suspect msm has been queuing up a few fixes to skew it here.
Otherwise amdgpu has a scattered bunch of small fixes, and then some
vc4, i915.
virtio-gpu changes an rc1 introduced uAPI mistake, and makes it
operate more like other drivers. This should be fine as no userspace
relies on the behaviour yet.
Summary:
dma-buf:
- memory leak fix
msm:
- kasan found memory overwrite
- mmap flags
- fencing error bug
- ioctl NULL ptr
- uninit var
- devfreqless devices fix
- dsi lanes fix
- dp: avoid unpowered aux xfers
amdgpu:
- IP discovery based enumeration fixes
- vkms fixes
- DSC fixes for DP MST
- Audio fix for hotplug with tiled displays
- Misc display fixes
- DP tunneling fix
- DP fix
- Aldebaran fix
amdkfd:
- Locking fix
- Static checker fix
- Fix double free
i915:
- backlight regression
- Intel HDR backlight detection fix
- revert TGL workaround that caused hangs
virtio-gpu:
- switch back to drm_poll
vc4:
- memory leak
- error check fix
- HVS modesetting fixes"
* tag 'drm-fixes-2021-12-03-1' of git://anongit.freedesktop.org/drm/drm: (41 commits)
Revert "drm/i915: Implement Wa_1508744258"
drm/amdkfd: process_info lock not needed for svm
drm/amdgpu: adjust the kfd reset sequence in reset sriov function
drm/amd/display: add connector type check for CRC source set
drm/amdkfd: fix double free mem structure
drm/amdkfd: set "r = 0" explicitly before goto
drm/amd/display: Add work around for tunneled MST.
drm/amd/display: Fix for the no Audio bug with Tiled Displays
drm/amd/display: Clear DPCD lane settings after repeater training
drm/amd/display: Allow DSC on supported MST branch devices
drm/amdgpu: Don't halt RLC on GFX suspend
drm/amdgpu: fix the missed handling for SDMA2 and SDMA3
drm/amdgpu: check atomic flag to differeniate with legacy path
drm/amdgpu: cancel the correct hrtimer on exit
drm/amdgpu/sriov/vcn: add new vcn ip revision check case for SIENNA_CICHLID
drm/i915/dp: Perform 30ms delay after source OUI write
dma-buf: system_heap: Use 'for_each_sgtable_sg' in pages free flow
drm/i915: Add support for panels with VESA backlights with PWM enable/disable
drm/vc4: kms: Fix previous HVS commit wait
drm/vc4: kms: Don't duplicate pending commit
...
struct bpf_core_relo is generated by llvm and processed by libbpf.
It's a de-facto uapi.
With CO-RE in the kernel the struct bpf_core_relo becomes uapi de-jure.
Add an ability to pass a set of 'struct bpf_core_relo' to prog_load command
and let the kernel perform CO-RE relocations.
Note the struct bpf_line_info and struct bpf_func_info have the same
layout when passed from LLVM to libbpf and from libbpf to the kernel
except "insn_off" fields means "byte offset" when LLVM generates it.
Then libbpf converts it to "insn index" to pass to the kernel.
The struct bpf_core_relo's "insn_off" field is always "byte offset".
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-6-alexei.starovoitov@gmail.com
enum bpf_core_relo_kind is generated by llvm and processed by libbpf.
It's a de-facto uapi.
With CO-RE in the kernel the bpf_core_relo_kind values become uapi de-jure.
Also rename them with BPF_CORE_ prefix to distinguish from conflicting names in
bpf_core_read.h. The enums bpf_field_info_kind, bpf_type_id_kind,
bpf_type_info_kind, bpf_enum_value_kind are passing different values from bpf
program into llvm.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-5-alexei.starovoitov@gmail.com
The description of ETH_P_802_3_MIN is misleading.
The value of EthernetType in Ethernet II frame is more than 0x0600,
the value of Length in 802.3 frame is less than 0x0600.
Signed-off-by: Xiayu Zhang <Xiayu.Zhang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit aeeecb8891.
The new SNMP variable (TCPSmallQueueFailure) can be incremented
for good reasons, even on a 100Gbit single TCP_STREAM flow.
If we really wanted to ease driver debugging [1], this would
require something more sophisticated.
[1] Usually, if a driver is delaying TX completions too much,
this can lead to stalls in TCP output. Various work arounds
have been used in the past, like skb_orphan() in ndo_start_xmit().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Link: https://lore.kernel.org/r/20211201033246.2826224-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds the kernel-side and API changes for a new helper
function, bpf_loop:
long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx,
u64 flags);
where long (*callback_fn)(u32 index, void *ctx);
bpf_loop invokes the "callback_fn" **nr_loops** times or until the
callback_fn returns 1. The callback_fn can only return 0 or 1, and
this is enforced by the verifier. The callback_fn index is zero-indexed.
A few things to please note:
~ The "u64 flags" parameter is currently unused but is included in
case a future use case for it arises.
~ In the kernel-side implementation of bpf_loop (kernel/bpf/bpf_iter.c),
bpf_callback_t is used as the callback function cast.
~ A program can have nested bpf_loop calls but the program must
still adhere to the verifier constraint of its stack depth (the stack depth
cannot exceed MAX_BPF_STACK))
~ Recursive callback_fns do not pass the verifier, due to the call stack
for these being too deep.
~ The next patch will include the tests and benchmark
Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211130030622.4131246-2-joannekoong@fb.com
Currently, we use hard code number to verify if we are in the
arp_interval timeslice. But some user may want to reduce/extend
the verify timeslice. With the similar team option 'missed_max'
the uers could change that number based on their own environment.
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once tcp small queue check failed in tcp_small_queue_check(), the
throughput of tcp will be limited, and it's hard to distinguish
whether it is out of tcp congestion control.
Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current virtgpu implementation of poll(..) drops events
when VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is enabled (otherwise
it's like a normal DRM driver).
This is because paravirtualized userspaces receives responses in a
buffer of type BLOB_MEM_GUEST, not by read(..).
To be in line with other DRM drivers and avoid specialized behavior,
it is possible to define a dummy event for virtgpu. Paravirtualized
userspace will now have to call read(..) on the DRM fd to receive the
dummy event.
Fixes: b10790434c ("drm/virtgpu api: create context init feature")
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20211122232210.602-2-gurchetansingh@google.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This change adds a MCTP Serial transport binding, as defined by DMTF
specificiation DSP0253 - "MCTP Serial Transport Binding". This is
implemented as a new serial line discipline, and can be attached to
arbitrary tty devices.
From the Kconfig description:
This driver provides an MCTP-over-serial interface, through a
serial line-discipline, as defined by DMTF specification "DSP0253 -
MCTP Serial Transport Binding". By attaching the ldisc to a serial
device, we get a new net device to transport MCTP packets.
This allows communication with external MCTP endpoints which use
serial as their transport. It can also be used as an easy way to
provide MCTP connectivity between virtual machines, by forwarding
data between simple virtual serial devices.
Say y here if you need to connect to MCTP endpoints over serial. To
compile as a module, use m; the module will be called mctp-serial.
Once the N_MCTP line discipline is set [using ioctl(TCIOSETD)], we get a
new netdev suitable for MCTP communication.
The 'mctp' utility[1] provides a simple wrapper for this ioctl, using
'link serial <device>':
# mctp link serial /dev/ttyS0 &
# mctp link
dev lo index 1 address 0x00:00:00:00:00:00 net 1 mtu 65536 up
dev mctpserial0 index 5 address 0x(no-addr) net 1 mtu 68 down
[1]: https://github.com/CodeConstruct/mctp
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>