Since ("netfilter: nf_tables: drop map element references from
preparation phase"), integration with commit protocol is better,
therefore drop the workaround that b91d903688 ("netfilter: nf_tables:
fix leaking object reference count") provides.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The .walk callback iterates over the current active set, but it might be
useful to iterate over the next generation set. Use the generation mask
to determine what set view (either current or next generation) is use
for the walk iteration.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
set .destroy callback releases the references to other objects in maps.
This is very late and it results in spurious EBUSY errors. Drop refcount
from the preparation phase instead, update set backend not to drop
reference counter from set .destroy path.
Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the
reference counter because the transaction abort path releases the map
references for each element since the set is unbound. The abort path
also deals with releasing reference counter for new elements added to
unbound sets.
Fixes: 591054469b ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new state to deal with rule expressions deactivation from the
newrule error path, otherwise the anonymous set remains in the list in
inactive state for the next generation. Mark the set/chain transaction
as unbound so the abort path releases this object, set it as inactive in
the next generation so it is not reachable anymore from this transaction
and reference counter is dropped.
Fixes: 1240eb93f0 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add bound flag to rule and chain transactions as in 6a0a8d10a3
("netfilter: nf_tables: use-after-free in failing rule with bound set")
to skip them in case that the chain is already bound from the abort
path.
This patch fixes an imbalance in the chain use refcnt that triggers a
WARN_ON on the table and chain destroy path.
This patch also disallows nested chain bindings, which is not
supported from userspace.
The logic to deal with chain binding in nft_data_hold() and
nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a
special handling in case a chain is bound but next expressions in the
same rule fail to initialize as described by 1240eb93f0 ("netfilter:
nf_tables: incorrect error path handling with NFT_MSG_NEWRULE").
The chain is left bound if rule construction fails, so the objects
stored in this chain (and the chain itself) are released by the
transaction records from the abort path, follow up patch ("netfilter:
nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
completes this error handling.
When deleting an existing rule, chain bound flag is set off so the
rule expression .destroy path releases the objects.
Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When using encapsulation the original packet's headers are copied to the
inner headers. This preserves the space for an inner mac header, which
is not used by the inner payloads for the encapsulation types supported
by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
segmented, flow can be passed to __skb_udp_tunnel_segment() which
calculates a negative tunnel header length. A negative tunnel header
length causes pskb_may_pull() to fail, dropping the packet.
This can be observed by attaching probes to ip_vs_in_hook(),
__dev_queue_xmit(), and __skb_udp_tunnel_segment():
perf probe --add '__dev_queue_xmit skb->inner_mac_header \
skb->inner_network_header skb->mac_header skb->network_header'
perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \
skb->inner_network_header skb->mac_header skb->network_header'
These probes the headers and tunnel header length for packets which
traverse the IPVS encapsulation path. A TCP packet can be forced into
the segmentation path by being smaller than a calculated clamped MSS,
but larger than the advertised MSS.
probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
When using veth-based encapsulation, the interfaces are set to be
mac-less, which does not preserve space for an inner mac header. This
prevents this issue from occurring.
In our real-world testing of sending a 32KB file we observed operation
time increasing from ~75ms for veth-based encapsulation to over 1.5s
using IPVS encapsulation due to retries from dropped packets.
This changeset modifies the packet on the encapsulation path in
ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
header offset. This fixes UDP segmentation for both encapsulation types,
and corrects the inner headers for any IPIP flows that may use it.
Fixes: 84c0d5e96f ("ipvs: allow tunneling with gue encapsulation")
Signed-off-by: Terin Stock <terin@cloudflare.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmSMvxkACgkQSD+KveBX
+j42zQf+NxeAf0VYlNTUuAh/I5Zr4oi97SI/bexjOLvrQmN8+7Z1blKiaXAgCudY
ot3r2ySg3IL2nM/UhCIMd56qkYPJVxNwLOtfe+DYDU0pGXmrh6Fsn/QXNR2DLw/e
tQ/4KYQjC0t6kwfDKmEUOnNLg60E3CNCvodG7u7t75N7gmkR323ZFhTXIw5RRXu0
03UjgJm3y8Mkd7YK8e64UOJksTIJ2tOVZeMAtuNyrDUg2IC9LDlDmumDfsjm9ZBe
RFZQi03szjQLi5gXX6NaDh9eI5+HGclvsoygTDJF2Oe4jVx+vmVjiVhmvDOjFjSC
kT10AJuqwiB4tHqkxoWM0JDwp66zUA==
=/hiQ
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2023-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-fixes-2023-06-16
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the QCA7000 is not available via SPI (e.g. in reset),
the driver will cause a high load. The reason for this is
that the synchronization is never finished and schedule()
is never called. Since the synchronization is not timing
critical, it's safe to drop this from the scheduling condition.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: 291ab06ecf ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: David S. Miller <davem@davemloft.net>
If the core is left to remove the LEDs via devm_, it is performed too
late, after the PHY driver is removed from the PHY. This results in
dereferencing a NULL pointer when the LED core tries to turn the LED
off before destroying the LED.
Manually unregister the LEDs at a safe point in phy_remove.
Cc: stable@vger.kernel.org
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: 01e5b728e9 ("net: phy: Add a binding for PHY LEDs")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The module loads firmware so add MODULE_FIRMWARE macros to provide that
information via modinfo.
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The module loads firmware so add a MODULE_FIRMWARE macro to provide that
information via modinfo.
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running workloads heavy unbalanced towards TX (high TX, low RX
traffic), sfc driver can retain the CPU during too long times. Although
in many cases this is not enough to be visible, it can affect
performance and system responsiveness.
A way to reproduce it is to use a debug kernel and run some parallel
netperf TX tests. In some systems, this will lead to this message being
logged:
kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s!
The reason is that sfc driver doesn't account any NAPI budget for the TX
completion events work. With high-TX/low-RX traffic, this makes that the
CPU is held for long time for NAPI poll.
Documentations says "drivers can process completions for any number of Tx
packets but should only process up to budget number of Rx packets".
However, many drivers do limit the amount of TX completions that they
process in a single NAPI poll.
In the same way, this patch adds a limit for the TX work in sfc. With
the patch applied, the watchdog warning never appears.
Tested with netperf in different combinations: single process / parallel
processes, TCP / UDP and different sizes of UDP messages. Repeated the
tests before and after the patch, without any noticeable difference in
network or CPU performance.
Test hardware:
Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz (4 cores, 2 threads/core)
Solarflare Communications XtremeScale X2522-25G Network Adapter
Fixes: 5227ecccea ("sfc: remove tx and MCDI handling from NAPI budget consideration")
Fixes: d19a537218 ("sfc_ef100: TX path for EF100 NICs")
Reported-by: Fei Liu <feliu@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230615084929.10506-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ASO query can be scheduled in atomic context as such it can't use usleep.
Use udelay as recommended in Documentation/timers/timers-howto.rst.
Fixes: 76e463f650 ("net/mlx5e: Overcome slow response for first IPsec ASO WQE")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
XFRM state which is changed to be XFRM_STATE_EXPIRED doesn't really
need to hold lock while modifying flow steering rules to drop traffic.
That state can be deleted only and as such mlx5e_ipsec_handle_tx_limit()
work will be canceled anyway and won't run in parallel.
Fixes: b2f7b01d36 ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local
variable was passed as its HW action data, resulting in attempt to
free invalid address:
BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core]
Fixes: 4781df92f4 ("net/mlx5: DR, Move STEv0 modify header logic")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In some cases, steering might need to use SW-created action in
FW table, which results in wrong packet reformat being used:
mlx5_core 0000:81:00.1: mlx5_cmd_check:756:(pid 1154):
SET_FLOW_TABLE_ENTRY(0×936) op_mod(0×0) failed,
status bad resource(0×5), syndrome (0xf2ff71)
This patch adds support for usage of SW-created packet reformat (encap)
actions in FW tables, and adds clear error flow for attempt to use
SW-created modify header on FW tables.
Fixes: 6a48faeeca ("net/mlx5: Add direct rule fs_cmd implementation")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit removes special handling of CT action. But it
removes too much. Pre ct/ct_nat tables and some other resources
are not destroyed due to the cited commit.
Fix it by adding it back.
Fixes: 08fe94ec5f ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commits add hardware miss support to tc action. But if
the rules can't be offloaded, the pointers are null and system
will panic when accessing them.
Fix it by checking null pointer.
Fixes: 08fe94ec5f ("net/mlx5e: TC, Remove special handling of CT action")
Fixes: 6702782845 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When a PCI device has just one msix vector available, we want to share
this vector between async and completion events. Current code fails to
do that assuming it will always have at least one dedicated vector for
completion events. Fix this by detecting when the pool contains just a
single vector.
Fixes: 3354822cde ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit missed setting napi_id on XSK RQs, it only affected
regular RQs. Add the missing part to support socket busy polling on XSK
RQs.
Fixes: a2740f529d ("net/mlx5e: xsk: Set napi_id to support busy polling")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commits missed passing frag_size to __xdp_rxq_info_reg, which
is required by bpf_xdp_adjust_tail to support growing the tail pointer
in fragmented packets. Pass the missing parameter when the current RQ
mode allows XDP multi buffer.
Fixes: ea5d49bdae ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Fixes: 9cb9482ef1 ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There are some MD5 tests which fail when the kernel is in FIPS mode,
since MD5 is not FIPS compliant. Add a check and only run those tests
if FIPS mode is not enabled.
Fixes: f0bee1ebb5 ("fcnal-test: Add TCP MD5 tests")
Fixes: 5cad8bce26 ("fcnal-test: Add TCP MD5 tests for VRF")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The vrf-xfrm-tests tests use the hmac(md5) and cbc(des3_ede)
algorithms for performing authentication and encryption, respectively.
This causes the tests to fail when fips=1 is set, since these algorithms
are not allowed in FIPS mode. Therefore, switch from hmac(md5) and
cbc(des3_ede) to hmac(sha1) and cbc(aes), which are FIPS compliant.
Fixes: 3f251d7411 ("selftests: Add tests for vrf and xfrms")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TLS selftests use the ChaCha20-Poly1305 and SM4 algorithms, which are not
FIPS compliant. When fips=1, this set of tests fails. Add a check and only
run these tests if not in FIPS mode.
Fixes: 4f336e88a8 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
Fixes: e506342a03 ("selftests/tls: add SM4 GCM/CCM to tls selftests")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before executing each test from a fixture, FIXTURE_SETUP is run once.
When SKIP is used in FIXTURE_SETUP, the setup function returns early
but the test still proceeds to run, unless another SKIP macro is used
within the test definition, leading to some code repetition. Therefore,
allow tests to be skipped directly from the setup function.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests excluded - we have 58 patches and diff of +442/-199,
which isn't really small but perhaps with the exception of
the WiFi locking change it's old(ish) bugs.
We have no known problems with v6.4.
The selftest changes are rather large as MPTCP folks try to apply
Greg's guidance that selftest from torvalds/linux should be able
to run against stable kernels.
Last thing I should call out is the DCCP/UDP-lite deprecation notices,
we are fairly sure those are dead, but if we're wrong reverting them
back in won't be fun.
Current release - regressions:
- wifi:
- cfg80211: fix double lock bug in reg_wdev_chan_valid()
- iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
Current release - new code bugs:
- handshake: remove fput() that causes use-after-free
Previous releases - regressions:
- sched: cls_u32: fix reference counter leak leading to overflow
- sched: cls_api: fix lockup on flushing explicitly created chain
Previous releases - always broken:
- nf_tables: integrate pipapo into commit protocol
- nf_tables: incorrect error path handling with NFT_MSG_NEWRULE,
fix dangling pointer on failure
- ping6: fix send to link-local addresses with VRF
- sched: act_pedit: parse L3 header for L4 offset, the skb may
not have the offset saved
- sched: act_ct: fix promotion of offloaded unreplied tuple
- sched: refuse to destroy an ingress and clsact Qdiscs if there
are lockless change operations in flight
- wifi: mac80211: fix handful of bugs in multi-link operation
- ipvlan: fix bound dev checking for IPv6 l3s mode
- eth: enetc: correct the indexes of highest and 2nd highest TCs
- eth: ice: fix XDP memory leak when NIC is brought up and down
Misc:
- add deprecation notices for UDP-lite and DCCP
- selftests: mptcp: skip tests not supported by old kernels
- sctp: handle invalid error codes without calling BUG()
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSLllkACgkQMUZtbf5S
IrubBg/+OeLG7Q3h80t8q1UV2uRXXp3zYcV1Hm2DEtP96RuBYXR4q06/n9Pqbt8P
gkPWS8dHgt+hCKgsPP2XcWOKxWXK4knTDcV58MVVo4DiVfjCNa6KKb6Glo+G/fvY
8RlLpQAaTLWBqm8BSQMLL5paWTe9q9LK0w1g280fwVnbPchtqM594zmpP2dm6z3o
sSFMtYHN62h0isLnrlo1cnY/Qq6H/OWMZDdcJpMoRXIF0JHKMfbangotX/MjgCGj
4EYrIwQj8+Ctyg+QgmgK5Pr53i2as/ErfrXQKfvjq/4FyLECPUd+KXu6uJW8TpIi
2/wzO9ssx0iArAn5V+OPqAalbWpJoQ4ba1Ztdd2GKSaOtR8zNYL0QepYK3s+n3YT
88ZJC0rDOKq9E3MdMuBVgV83NFtwkDe4JdKJwYW2F8+UsDs0jxXjcCEuH719GKSz
Ag5RK7MQGm3N1Uom9RDGlMin+cvTjWH/owN39ibvJ5G90JTUpGU7IyVHi0Z8X1DG
lb0C/fc/QF9xl0S7B+LgyRh53lBY0L+zLO8JYK51n+VzU1L9ur5sylqoS3P2XtwB
4gHX1E+OAX1j4X/lvwF6nclISQs9nF9G41EYfnh38+YtcAKd70+Yo0/cnY5HUCvr
KKELhdXfqx/Dx18aq8o9IhRuECM81Q7dHHoe6PhHxZaJFgn0nSE=
=oNA0
-----END PGP SIGNATURE-----
Merge tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and netfilter.
Selftests excluded - we have 58 patches and diff of +442/-199, which
isn't really small but perhaps with the exception of the WiFi locking
change it's old(ish) bugs.
We have no known problems with v6.4.
The selftest changes are rather large as MPTCP folks try to apply
Greg's guidance that selftest from torvalds/linux should be able to
run against stable kernels.
Last thing I should call out is the DCCP/UDP-lite deprecation notices.
We are fairly sure those are dead, but if we're wrong reverting them
back in won't be fun.
Current release - regressions:
- wifi:
- cfg80211: fix double lock bug in reg_wdev_chan_valid()
- iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
Current release - new code bugs:
- handshake: remove fput() that causes use-after-free
Previous releases - regressions:
- sched: cls_u32: fix reference counter leak leading to overflow
- sched: cls_api: fix lockup on flushing explicitly created chain
Previous releases - always broken:
- nf_tables: integrate pipapo into commit protocol
- nf_tables: incorrect error path handling with NFT_MSG_NEWRULE, fix
dangling pointer on failure
- ping6: fix send to link-local addresses with VRF
- sched: act_pedit: parse L3 header for L4 offset, the skb may not
have the offset saved
- sched: act_ct: fix promotion of offloaded unreplied tuple
- sched: refuse to destroy an ingress and clsact Qdiscs if there are
lockless change operations in flight
- wifi: mac80211: fix handful of bugs in multi-link operation
- ipvlan: fix bound dev checking for IPv6 l3s mode
- eth: enetc: correct the indexes of highest and 2nd highest TCs
- eth: ice: fix XDP memory leak when NIC is brought up and down
Misc:
- add deprecation notices for UDP-lite and DCCP
- selftests: mptcp: skip tests not supported by old kernels
- sctp: handle invalid error codes without calling BUG()"
* tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
dccp: Print deprecation notice.
udplite: Print deprecation notice.
octeon_ep: Add missing check for ioremap
selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
net: tipc: resize nlattr array to correct size
sfc: fix XDP queues mode with legacy IRQ
net: macsec: fix double free of percpu stats
net: lapbether: only support ethernet devices
MAINTAINERS: add reviewers for SMC Sockets
s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit()
net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames
net/sched: cls_api: Fix lockup on flushing explicitly created chain
ice: Fix ice module unload
net/handshake: remove fput() that causes use-after-free
selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step
net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
net/sched: act_ct: Fix promotion of offloaded unreplied tuple
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
...
merge; where DM core was splitting large discards every 128K
(max_sectors_kb) rather than every 64M (discard_max_bytes).
- Extend DM core LOCKFS fix, made during 6.4 merge, to also fix race
between do_mount and dm's do_suspend (in addition to the earlier
fix's do_mount race with dm's do_resume).
- Fix DM thin metadata operations to first check if the thin-pool is
in "fail_io" mode; otherwise UAF can occur.
- Fix DM thinp's call to __blkdev_issue_discard to use GFP_NOIO rather
than GFP_NOWAIT (__blkdev_issue_discard cannot handle NULL return
from bio_alloc).
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmSLTh0ACgkQxSPxCi2d
A1rmqggAildPKBjT8nqZmU86lpsy60E03OwvBnGPkMF5pjkOUmTjkb5EWVSAmeuO
ojj0pWlC+1ZvVkiDfkWxt0NL/4ETD4q+5oy1ARBcOawPX6bj0eXLoBr6m10b+KOb
mKAoXgYrESEzQ2qPBe4a4Lj3zIBXzXpMpW9TtF23z4HnDpnwpED5xNPWBgiWc3O/
/6MF1ASLp0DWldoL+gmIp9hEzyQzbzgM4uBOGC4UAYk3U1I55qwX6bWDZ9cQNGMh
AqCSrphuKHvbsb31yb1X3hB3g1XbAeSvvcizgFY0g9ZpncddKm5gx0BWVDO7qGBG
UxLIec19kQ2CIEx/QJZIhEjneLlJ/g==
=mME4
-----END PGP SIGNATURE-----
Merge tag 'for-6.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM thinp discard performance regression introduced during this
merge window where DM core was splitting large discards every 128K
(max_sectors_kb) rather than every 64M (discard_max_bytes).
- Extend DM core LOCKFS fix, made during 6.4 merge, to also fix race
between do_mount and dm's do_suspend (in addition to the earlier
fix's do_mount race with dm's do_resume).
- Fix DM thin metadata operations to first check if the thin-pool is in
"fail_io" mode; otherwise UAF can occur.
- Fix DM thinp's call to __blkdev_issue_discard to use GFP_NOIO rather
than GFP_NOWAIT (__blkdev_issue_discard cannot handle NULL return
from bio_alloc).
* tag 'for-6.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: use op specific max_sectors when splitting abnormal io
dm thin: fix issue_discard to pass GFP_NOIO to __blkdev_issue_discard
dm thin metadata: check fail_io before using data_sm
dm: don't lock fs when the map is NULL during suspend or resume
Many rc bug fixes:
- Two rtrs bug fixes for error unwind bugs
- Several rxe bug fixes:
* Incorrect Rx packet validation
* Using memory without a refcount
* Syzkaller found use before initialization
* Regression fix for missing locking with the tasklet conversion from
this merge window
- Have bnxt report the correct link properties to userspace, this was
a regression in v6.3
- Several mlx5 bug fixes:
* Kernel crash triggerable by userspace for the RAW ethernet profile
* Defend against steering refcounting issues created by userspace
* Incorrect change of QP port affinity parameters in some LAG configurations
- Fix mlx5 Q counters:
* Do not over allocate Q counters to allow userspace to use the full
port capacity
* Kernel crash triggered by eswitch due to mis-use of Q counters
* Incorrect mlx5_device for Q counters in some LAG configurations
- Properly implement the IBA spec restricting privileged qkeys to root
- Always an error when reading from a disassociated device's event queue
- isert bug fixes:
* Avoid a deadlock with the CM handler and CM ID destruction
* Correct list corruption due to incorrect locking
* Fix a use after free around connection tear down
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZIoZ7gAKCRCFwuHvBreF
YUSyAPsH+VdqzYd1z/bk0zMds9JR2oqMOR8l02e4gq8K9hjTJAD/ePEuT5PTpFZu
FrT4SDhOfA30XNz+RofRNisJC92OOAo=
=KyF7
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"This is an unusually large bunch of bug fixes for the later rc cycle,
rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
itself, and mlx5 is fixing a bunch of "queue counters" related bugs.
There is one highly notable bug fix regarding the qkey. This small
security check was missed in the original 2005 implementation and it
allows some significant issues.
Summary:
- Two rtrs bug fixes for error unwind bugs
- Several rxe bug fixes:
* Incorrect Rx packet validation
* Using memory without a refcount
* Syzkaller found use before initialization
* Regression fix for missing locking with the tasklet conversion
from this merge window
- Have bnxt report the correct link properties to userspace, this was
a regression in v6.3
- Several mlx5 bug fixes:
* Kernel crash triggerable by userspace for the RAW ethernet
profile
* Defend against steering refcounting issues created by userspace
* Incorrect change of QP port affinity parameters in some LAG
configurations
- Fix mlx5 Q counters:
* Do not over allocate Q counters to allow userspace to use the
full port capacity
* Kernel crash triggered by eswitch due to mis-use of Q counters
* Incorrect mlx5_device for Q counters in some LAG configurations
- Properly implement the IBA spec restricting privileged qkeys to
root
- Always an error when reading from a disassociated device's event
queue
- isert bug fixes:
* Avoid a deadlock with the CM handler and CM ID destruction
* Correct list corruption due to incorrect locking
* Fix a use after free around connection tear down"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Fix rxe_cq_post
IB/isert: Fix incorrect release of isert connection
IB/isert: Fix possible list corruption in CMA handler
IB/isert: Fix dead lock in ib_isert
RDMA/mlx5: Fix affinity assignment
IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
RDMA/uverbs: Restrict usage of privileged QKEYs
RDMA/cma: Always set static rate to 0 for RoCE
RDMA/mlx5: Fix Q-counters query in LAG mode
RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
RDMA/mlx5: Fix Q-counters per vport allocation
RDMA/mlx5: Create an indirect flow table for steering anchor
RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
RDMA/rxe: Fix the use-before-initialization error of resp_pkts
RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
RDMA/rxe: Fix ref count error in check_rkey()
RDMA/rxe: Fix packet length checks
RDMA/rtrs: Fix rxe_dealloc_pd warning
RDMA/rtrs: Fix the last iu->buf leak in err path
A few more driver specific fixes, the DesignWare fix is for an issue
introduced by conversion to the chip select accessor functions and is
pretty important but the other two are less severe.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSK+EcACgkQJNaLcl1U
h9AGrQf+OypgdT5kMex1gYUB7GaNPW3HcK49b0v2TudjmkRNVQCPZVVzbWQI7VfZ
EuB3+miDKbwxgpmFQHae+iQZL7uavqQWBQU2D8Gk5H15FDmdRw9aHCFZAsZrSn1x
QBRXo3b+tD2q1Feh1oCX0epbkDZB8MSlbUBTTtT/Q/LoEKEk/JDogsjsa7LVTKtz
GsUEWsQouNZ1ZrKfXVQ9RM5z9y6nhk3JNzs9bH+kvN/fRwj41mbHin96bb1+xL9U
bqta9JyDPAF1NWXagerky/BTuT/84ESbxiOcXujqr+wtaUgjgkKaCSj2uE5hws+o
9sld5NL3X3uqId+QQ+pivUYQ8Et2sQ==
=/wJ/
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few more driver specific fixes.
The DesignWare fix is for an issue introduced by conversion to the
chip select accessor functions and is pretty important but the other
two are less severe"
* tag 'spi-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dw: Replace incorrect spi_get_chipselect with set
spi: fsl-dspi: avoid SCK glitches with continuous transfers
spi: cadence-quadspi: Add missing check for dma_set_mask
One fix for v6.4, the set of regulators described for the Qualcomm
PM8550 just seems to have been completely wrong and would likely not
have worked at all if anything tried to actually configure anything
except for enabling and disabling at runtime.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSK+MgACgkQJNaLcl1U
h9AUuQgAgXhB/Y+Lv2Bsn/oiEnzvTcrj2/oYFhIryDn0Jm9iwPIeNKPg9xNa6rhd
9nWO4S1G8WUchu2X8AzULzQZNfGWezweh0WI5t5AgkGfj/9yIuRc9A/pMWTJCDlA
RdNGV/Xcvq9Is2kBjWMp747/0gpZma379tEB08FC8GqOx3+pzTVrHxG9y7xdchvD
Kjcplmt80O/Ab4AxWxrMSdcYDQvPb31aUj7P+ODIPuM3wzJihY64ED4aDK6t0VHm
p3U9onG3trF0gZePx4nfrSLHdZkcITnwgtFv4h9RfNN34ufi3Y9hBpOI7X4WQp4D
5Hnc7NT5h377LQpVEEmUJpDNlBZzcA==
=SOua
-----END PGP SIGNATURE-----
Merge tag 'regulator-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"The set of regulators described for the Qualcomm PM8550 just seems to
have been completely wrong and would likely not have worked at all if
anything tried to actually configure anything except for enabling and
disabling at runtime"
* tag 'regulator-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix regulators for PM8550
Another fix for the maple tree cache, Takashi noticed that unlike other
caches the maple tree cache didn't check for read only registers before
trying to sync which would result in spurious syncs for read only
registers where we don't have a default. This was due to the check
being open coded in the caches, we now check in the shared "does this
register need sync" function so that is fixed for this and future
caches.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmSK99kACgkQJNaLcl1U
h9B9YAf/cfOLwuU+OoHp6pvCZHQjojwRlXenp7pEzMlVvlzl/cm4Y0XDGpuP/vGc
MipxznnCzxLGWN2eVhoQcp2Ay8+bHJ9EohQ68HJ/WUv1GVjR+my7sqVVflqorIwm
gwDy2NBEBFe7m1MEITveLVWcV5Hviy7X0SMC7XPeAOs8Edg1AOqrsJ4Ta1+6z6zg
7/jlCwozDggV+qO/+jDGrKGCRidqH1E5KgTm+oWHV4HMJEPqt0GYYK+/5ELAQIsb
uWhye5LsIKbwLbOhcbkxqLf1r0Zeg4bwgubhfvgs8aky+uWCJrw9eVQD2S04mdVI
Rata8+XNkhYAv0BWRN87kg9wKyXyBQ==
=z6Cs
-----END PGP SIGNATURE-----
Merge tag 'regmap-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"Another fix for the maple tree cache, Takashi noticed that unlike
other caches the maple tree cache didn't check for read only registers
before trying to sync which would result in spurious syncs for read
only registers where we don't have a default.
This was due to the check being open coded in the caches, we now check
in the shared 'does this register need sync' function so that is fixed
for this and future caches"
* tag 'regmap-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: regcache: Don't sync read-only registers
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmSK8uAACgkQCF8+vY7k
4RWCbQ/9E9Va+mCFiujL6P+ttDy8nIPwxrwohh3hhJatoMmWFGW1JYBcNhByIHKM
Cz9yaSnEuIeoHvvlj+G62cl4XZt6pw9h68FNSnt+i4Jti2I7qydgKWuEoep6OdgO
FkHlcvJhsWnARD4uBtZb+kiimr9PSZljUYFyZYlnN1ehG9UbVRS/gVR5LBjIcepe
1b/KTiD01dKHII+WLmxf15gZnkxmfQsXIRuqih7o1nlSGCGlf7YVv1nh4Tg+VDkN
IlyRPHwRP14L719pmUUfFIrtAyzgCnbCRdkwRDqK9USFiFthvmbKO7bUjlrRjtJR
2ZtM3xPT4knnEjiHOhbhct+um6yEATL9TkEJCO6WqYlyXJ5SN0Hkon7fb6/+Avcy
U5xPm367uzjcyjWUYjjhflT9qdoiBMy/heqfm9+fCr4rkVubTo/B3XX8w+WAIOC6
zNLqa9IvBFAYVfDsVXBL+rrzYqubPRBVmsQdMts0C8LpCXW2SQDnyeGl26VPw8Bl
nwVQI0Tf9n8Vpcm7wIhgX038tCCDiRoC72ArPUoGOs2+fYAvmNqZep0yzOTEyFwk
m3QlnoqSl4d7Lt9tYUUxr1e6u7h5JuQvHF0U6BCub9CQt4trj8EFpQb8FI5v8PfX
EHL3acMRIN0fwHwE2OAoIfjaKr2mc+4B8hRPFSX3PCmJxYy8WH8=
=+TC4
-----END PGP SIGNATURE-----
Merge tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A fix for dvb-core to avoid a race condition during DVB board
registration"
* tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "media: dvb-core: Fix use-after-free on race condition at dvb_frontend"
multiple users (and tracked by regzbot[2]).
[1] https://syzkaller.appspot.com/bug?extid=4acc7d910e617b360859
[2] https://linux-regtracking.leemhuis.info/regzbot/regression/ZIauBR7YiV3rVAHL@glitch/
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmSLPr8ACgkQ8vlZVpUN
gaO08wgAlk1HqZub9iJm7ZQRx4qVb4zGYGL8So4IE/971FT8Vq3suJsQlTWpPVBB
8O6H5bsq5F54wfjmY44aUyBjc1SEJL2zfrEeMxcsxH+1tEoEe8i+FT3BXVr6+Y7k
qdZYrqmtbYpM2dyNW7LY4UDRQBE6ug+r6Cq1z2Mo7cfwfiBcYYe7lncbl3xJe7D8
8DZLgjQg48wxvU5EOK2yHMHw0YfJySnmaCo7J7sU5TLZj91IEnlNGwAjYIy1GUBV
3I+/+U70QR1VtZig7v4q6TI3LDHfegRc8zIIOtM3u1qs2bzLFgyoNURtut2vhF2+
/XyXxFabmlpG7yp3UZVetCECKVa2Kg==
=WJ4O
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix two regressions in ext4, one report by syzkaller[1], and reported
by multiple users (and tracked by regzbot[2])"
[1] https://syzkaller.appspot.com/bug?extid=4acc7d910e617b360859
[2] https://linux-regtracking.leemhuis.info/regzbot/regression/ZIauBR7YiV3rVAHL@glitch/
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: drop the call to ext4_error() from ext4_get_group_info()
Revert "ext4: remove unnecessary check in ext4_bg_num_gdb_nometa"
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmSKVUsACgkQiiy9cAdy
T1EwhAv/bazgxHAjxmoZ8hxw21uH/Ufvf2Cvt49yFTjxiaOCoSPLovA5AyB1FlTz
xsw/Cf6haO+RIYpLQ1KAIg7Qt3zwUu50Io2s/yyaHmQu7d+eNUPISajrJIjYOnqB
tAVHav0XcvTaPAzyr9luqmBZnLaAeatK5CbrK+gT4xstW/cuTu/F4JUF8fOL++ww
/IDTvo6UKqEKuPD8dpszXZnJjuj8aIsRIh8CBGI73QrDVExTzhrdmRULWqwSj0Lr
nHsh8hgUZ25dSe/9jIagPqgvzaizvcyzopJuO8mpQAMDww9o1kBlud9GLdgNGnfM
bDHHbn3Y9Fu5VaxMJ/n9lozMXT+OB+3ZrrBFP8J0unSidzsEzWKfYjE0gPXGKru8
YhvjWKdJZjIOEu9QRuD4KK+MOF20cuq3OmvKpoZiEr2niB7oX25B+JddxW2CJD5a
oAiP1+1/wiVe79KUlLLRe+OTtujUXtMiXPePzHh15dYetg+P4qJCBqWjDPdzRmdr
MfMhsZpq
=peSC
-----END PGP SIGNATURE-----
Merge tag '6.4-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Eight, mostly small, smb3 client fixes:
- important fix for deferred close oops (race with unmount) found
with xfstest generic/098 to some servers
- important reconnect fix
- fix problem with max_credits mount option
- two multichannel (interface related) fixes
- one trivial removal of confusing comment
- two small debugging improvements (to better spot crediting
problems)"
* tag '6.4-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: add a warning when the in-flight count goes negative
cifs: fix lease break oops in xfstest generic/098
cifs: fix max_credits implementation
cifs: fix sockaddr comparison in iface_cmp
smb/client: print "Unknown" instead of bogus link speed value
cifs: print all credit counters in DebugData
cifs: fix status checks in cifs_tree_connect
smb: remove obsolete comment
Kuniyuki Iwashima says:
====================
udplite/dccp: Print deprecation notice.
UDP-Lite is assumed to have no users for 7 years, and DCCP is
orphaned for 7 years too.
Let's add deprecation notice and see if anyone responds to it.
====================
Link: https://lore.kernel.org/r/20230614194705.90673-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DCCP was marked as Orphan in the MAINTAINERS entry 2 years ago in commit
054c4610bd ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"). It says
we haven't heard from the maintainer for five years, so DCCP is not well
maintained for 7 years now.
Recently DCCP only receives updates for bugs, and major distros disable it
by default.
Removing DCCP would allow for better organisation of TCP fields to reduce
the number of cache lines hit in the fast path.
Let's add a deprecation notice when DCCP socket is created and schedule its
removal to 2025.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recently syzkaller reported a 7-year-old null-ptr-deref [0] that occurs
when a UDP-Lite socket tries to allocate a buffer under memory pressure.
Someone should have stumbled on the bug much earlier if UDP-Lite had been
used in a real app. Also, we do not always need a large UDP-Lite workload
to hit the bug since UDP and UDP-Lite share the same memory accounting
limit.
Removing UDP-Lite would simplify UDP code removing a bunch of conditionals
in fast path.
Let's add a deprecation notice when UDP-Lite socket is created and schedule
its removal to 2025.
Link: https://lore.kernel.org/netdev/20230523163305.66466-1-kuniyu@amazon.com/ [0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add check for ioremap() and return the error if it fails in order to
guarantee the success of ioremap().
Fixes: 862cd659a6 ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20230615033400.2971-1-jiasheng@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously, timestamps were printed using "%lld.%u" which is incorrect
for nanosecond values lower than 100,000,000 as they're fractional
digits, therefore leading zeros are meaningful.
This patch changes the format strings to "%lld.%09u" in order to add
leading zeros to the nanosecond value.
Fixes: 568ebc5985 ("ptp: add the PTP_SYS_OFFSET ioctl to the testptp program")
Fixes: 4ec54f9573 ("ptp: Fix compiler warnings in the testptp utility")
Fixes: 6ab0e475f1 ("Documentation: fix misc. warnings")
Signed-off-by: Alex Maftei <alex.maftei@amd.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20230615083404.57112-1-alex.maftei@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a possible memory leak in __stmmac_open when stmmac_init_phy fails.
It's also needed to free everything allocated by stmmac_setup_dma_desc
and not just the dma_conf struct.
Drop free_dma_desc_resources from __stmmac_open and correctly call
free_dma_desc_resources on each user of __stmmac_open on error.
Reported-by: Jose Abreu <Jose.Abreu@synopsys.com>
Fixes: ba39b344e9 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jose Abreu <Jose.Abreu@synopsys.com>
Link: https://lore.kernel.org/r/20230614091714.15912-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to nla_parse_nested_deprecated(), the tb[] is supposed to the
destination array with maxtype+1 elements. In current
tipc_nl_media_get() and __tipc_nl_media_set(), a larger array is used
which is unnecessary. This patch resize them to a proper size.
Fixes: 1e55417d8f ("tipc: add media set to new netlink api")
Fixes: 46f15c6794 ("tipc: add media get/dump to new netlink api")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230614120604.1196377-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Split abnormal IO in terms of the corresponding operation specific
max_sectors (max_discard_sectors, max_secure_erase_sectors or
max_write_zeroes_sectors).
This fixes a significant dm-thinp discard performance regression that
was introduced with commit e2dd8aca2d ("dm bio prison v1: improve
concurrent IO performance"). Relative to discard: max_discard_sectors
is used instead of max_sectors; which fixes excessive discard splitting
(e.g. max_sectors=128K vs max_discard_sectors=64M).
Tested by discarding an 1 Petabyte dm-thin device:
lvcreate -V 1125899906842624B -T test/pool -n thin
time blkdiscard /dev/test/thin
Before this fix (splitting discards every 128K): ~116m
After this fix (splitting discards every 64M) : 0m33.460s
Reported-by: Zorro Lang <zlang@redhat.com>
Fixes: 06961c487a ("dm: split discards further if target sets max_discard_granularity")
Requires: 13f6facf3f ("dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE")
Fixes: e2dd8aca2d ("dm bio prison v1: improve concurrent IO performance")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
issue_discard() passes GFP_NOWAIT to __blkdev_issue_discard() despite
its code assuming bio_alloc() always succeeds.
Commit 3dba53a958 ("dm thin: use __blkdev_issue_discard for async
discard support") clearly shows where things went bad:
Before commit 3dba53a958, dm-thin.c's open-coded
__blkdev_issue_discard_async() properly handled using GFP_NOWAIT.
Unfortunately __blkdev_issue_discard() doesn't and it was missed
during review.
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Must check pmd->fail_io before using pmd->data_sm since
pmd->data_sm may be destroyed by other processes.
P1(kworker) P2(message)
do_worker
process_prepared
process_prepared_discard_passdown_pt2
dm_pool_dec_data_range
pool_message
commit
dm_pool_commit_metadata
↓
// commit failed
metadata_operation_failed
abort_transaction
dm_pool_abort_metadata
__open_or_format_metadata
↓
dm_sm_disk_open
↓
// open failed
// pmd->data_sm is NULL
dm_sm_dec_blocks
↓
// try to access pmd->data_sm --> UAF
As shown above, if dm_pool_commit_metadata() and
dm_pool_abort_metadata() fail in pool_message process, kworker may
trigger UAF.
Fixes: be500ed721 ("dm space maps: improve performance with inc/dec on ranges of blocks")
Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
As described in commit 38d11da522 ("dm: don't lock fs when the map is
NULL in process of resume"), a deadlock may be triggered between
do_resume() and do_mount().
This commit preserves the fix from commit 38d11da522 but moves it to
where it also serves to fix a similar deadlock between do_suspend()
and do_mount(). It does so, if the active map is NULL, by clearing
DM_SUSPEND_LOCKFS_FLAG in dm_suspend() which is called by both
do_suspend() and do_resume().
Fixes: 38d11da522 ("dm: don't lock fs when the map is NULL in process of resume")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>