Commit Graph

1029825 Commits

Author SHA1 Message Date
Jason Wang
530a5678bc vdpa: support packed virtqueue for set/get_vq_state()
This patch extends the vdpa_vq_state to support packed virtqueue
state which is basically the device/driver ring wrap counters and the
avail and used index. This will be used for the virito-vdpa support
for the packed virtqueue and the future vhost/vhost-vdpa support for
the packed virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210602021536.39525-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
2021-07-08 07:49:01 -04:00
Jason Wang
72b5e89587 virtio-ring: store DMA metadata in desc_extra for split virtqueue
For split virtqueue, we used to depend on the address, length and
flags stored in the descriptor ring for DMA unmapping. This is unsafe
for the case since the device can manipulate the behavior of virtio
driver, IOMMU drivers and swiotlb.

For safety, maintain the DMA address, DMA length, descriptor flags and
next filed of the non indirect descriptors in vring_desc_state_extra
when DMA API is used for virtio as we did for packed virtqueue and use
those metadata for performing DMA operations. Indirect descriptors
should be safe since they are using streaming mappings.

With this the descriptor ring is write only form the view of the
driver.

This slight increase the footprint of the drive but it's not noticed
through pktgen (64B) test and netperf test in the case of virtio-net.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-8-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
5bc72234f7 virtio: use err label in __vring_new_virtqueue()
Using error label for unwind in __vring_new_virtqueue. This is useful
for future refacotring.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-7-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
fe4c3862df virtio_ring: introduce virtqueue_desc_add_split()
This patch introduces a helper for storing descriptor in the
descriptor table for split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-6-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
44593865b7 virtio_ring: secure handling of mapping errors
We should not depend on the DMA address, length and flag of descriptor
table since they could be wrote with arbitrary value by the device. So
this patch switches to use the stored one in desc_extra.

Note that the indirect descriptors are fine since they are read-only
streaming mappings.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-5-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
5a22242160 virtio-ring: factor out desc_extra allocation
A helper is introduced for the logic of allocating the descriptor
extra data. This will be reused by split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
1f28750f2e virtio_ring: rename vring_desc_extra_packed
Rename vring_desc_extra_packed to vring_desc_extra since the structure
are pretty generic which could be reused by split virtqueue as well.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Jason Wang
aeef9b4733 virtio-ring: maintain next in extra state for packed virtqueue
This patch moves next from vring_desc_state_packed to
vring_desc_desc_extra_packed. This makes it simpler to let extra state
to be reused by split virtqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210604055350.58753-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Eli Cohen
e3aadf2e16 vdpa/mlx5: Clear vq ready indication upon device reset
After device reset, the virtqueues are not ready so clear the ready
field.

Failing to do so can result in virtio_vdpa failing to load if the device
was previously used by vhost_vdpa and the old values are ready.
virtio_vdpa expects to find VQs in "not ready" state.

Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210606053128.170399-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-08 07:49:01 -04:00
Eli Cohen
b57c46cb3c vdpa/mlx5: Add support for doorbell bypassing
Implement mlx5_get_vq_notification() to return the doorbell address.
Since the notification area is mapped to userspace, make sure that the
BAR size is at least PAGE_SIZE large.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210603081153.5750-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-08 07:49:01 -04:00
Michael S. Tsirkin
a7766ef18b virtio_net: disable cb aggressively
There are currently two cases where we poll TX vq not in response to a
callback: start xmit and rx napi.  We currently do this with callbacks
enabled which can cause extra interrupts from the card.  Used not to be
a big issue as we run with interrupts disabled but that is no longer the
case, and in some cases the rate of spurious interrupts is so high
linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08 07:49:01 -04:00
Takashi Iwai
24d1e49415 ALSA: intel8x0: Fix breakage at ac97 clock measurement
The recent workaround for the wild interrupts in commit c1f0616124
("ALSA: intel8x0: Don't update period unless prepared") leaded to a
regression, causing the interrupt storm during ac97 clock measurement
at the driver probe.  We need to handle the interrupt while the clock
measurement as well as the proper PCM streams.

Fixes: c1f0616124 ("ALSA: intel8x0: Don't update period unless prepared")
Reported-and-tested-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/CAMo8BfKKMQkcsbOQaeEjq_FsJhdK=fn598dvh7YOcZshUSOH=g@mail.gmail.com
Link: https://lore.kernel.org/r/20210708090738.1569-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-07-08 12:26:43 +02:00
Florian Fainelli
9615fe36b3 skbuff: Fix build with SKB extensions disabled
We will fail to build with CONFIG_SKB_EXTENSIONS disabled after
8550ff8d8c ("skbuff: Release nfct refcount on napi stolen or re-used
skbs") since there is an unconditionally use of skb_ext_find() without
an appropriate stub. Simply build the code conditionally and properly
guard against both COFNIG_SKB_EXTENSIONS as well as
CONFIG_NET_TC_SKB_EXT being disabled.

Fixes: Fixes: 8550ff8d8c ("skbuff: Release nfct refcount on napi stolen or re-used skbs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08 00:07:14 -07:00
Roy, UjjaL
92c4bed59b ipmr: Fix indentation issue
Fixed indentation by removing extra spaces.

Signed-off-by: Roy, UjjaL <royujjal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07 20:52:25 -07:00
Dan Carpenter
271dbc3184 sock: unlock on error in sock_setsockopt()
If copy_from_sockptr() then we need to unlock before returning.

Fixes: d463126e23 ("net: sock: extend SO_TIMESTAMPING for PHC binding")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07 20:49:12 -07:00
Dave Airlie
21c355b097 Short summary of fixes pull:
* amdgpu: TTM fixes
  * dma-buf: Doc fixes
  * gma500: Fix potential BO leaks in error handling
  * radeon: Fix NULL-ptr deref
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmDdhbwACgkQaA3BHVML
 eiM/fggAu5y7COqhMIMwdSOXIkLszILovZXAdERzIa6uPeh+zS1TLiW6hqkFam5g
 HtiUm982SP1OZ8p+w5R/BboeRuUymyVXdrAg+0+T6hH7aC1NtN11tfXr8h2aVJhn
 gduh5IixnhDScMrvI+8uVwzLH46fRJDwvfSGGXG3LQT1atWY4npTLozRwLE3uvMM
 w2kSoWsemo3d+40bdilhNKAwolSGNSgBW7OBhE8A8nN+3J8smNkfD9fP6cVG90Je
 HCQ0kYpIckyx43VTWrYJYYS8T7l+x8QdwjgSNmhjZ391A/EyB8oKAUL7ov8VdxJC
 vI8AabZf4JMV0f+2aVhxVMTJPvHhpQ==
 =gX7D
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-fixes-2021-07-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

Short summary of fixes pull:

 * amdgpu: TTM fixes
 * dma-buf: Doc fixes
 * gma500: Fix potential BO leaks in error handling
 * radeon: Fix NULL-ptr deref

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YN2GK2SH64yqXqh9@linux-uq9g
2021-07-08 11:17:32 +10:00
Dave Airlie
5cebdea6f8 One fix targeting stable for display DP VSC, plus DG1 display fix and
a bug fix of IRQs usages and cleanup references to the DRM IRQ midlayer.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmDlw5EACgkQ+mJfZA7r
 E8opywf/TlQC3hwWhOEfit2Sr3YbJVtqVR/1AmTPW+/5PoothXor2tuWpIijAj9L
 UGgVcFjCCJ22uC9GQQewW8E9BGM8P6+0QLrNp9EwrjlzuEjLXz3UYjdoTuXROYQu
 Czquux2HXohrL/wSA6lyWqKyPS/vgajengbR9A7WSreq1c5nkZC6EIklQawM2DsK
 fa45WrCu9kPo9FPjcEI5KpL68DzStazfA4nDrNve7R6hEV05ouUGo7t5qDq2yPFf
 wZm/+KjTfLfPQ8ab+RTsccEc2fFP+pcpy2iq2n7FVHLjCzZUhVnrBeDPKX1H35yX
 vx7xM7y2S7MaT0bLmkiU+b6ISVyExA==
 =Ew6e
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-next-fixes-2021-07-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

One fix targeting stable for display DP VSC, plus DG1 display fix and
a bug fix of IRQs usages and cleanup references to the DRM IRQ midlayer.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YOXDp/+CFDgJ2/7f@intel.com
2021-07-08 10:51:31 +10:00
Dave Airlie
0d3a1b37ab Merge tag 'amd-drm-next-5.14-2021-07-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.14-2021-07-01:

amdgpu:
- Misc Navi fixes
- Powergating fix
- Yellow Carp updates
- Beige Goby updates
- S0ix fix
- Revert overlay validation fix
- GPU reset fix for DC
- PPC64 fix
- Add new dimgrey cavefish DID
- RAS fix

amdkfd:
- SVM fixes

radeon:
- Fix missing drm_gem_object_put in error path

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210701042241.25449-1-alexander.deucher@amd.com
2021-07-08 08:45:20 +10:00
Steve French
d4dc277c48 CIFS: Clarify SMB1 code for POSIX Lock
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 PosixLock. This changeset
doesn't change the address but makes it slightly clearer.

Addresses-Coverity: 711520 ("Out of bounds write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-07 16:43:17 -05:00
Steve French
f371793d6e CIFS: Clarify SMB1 code for rename open file
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 RenameOpenFile. This changeset
doesn't change the address but makes it slightly clearer.

Addresses-Coverity: 711521 ("Out of bounds write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-07 16:42:25 -05:00
Wei Li
1d719254c1 tools: bpf: Fix error in 'make -C tools/ bpf_install'
make[2]: *** No rule to make target 'install'.  Stop.
make[1]: *** [Makefile:122: runqslower_install] Error 2
make: *** [Makefile:116: bpf_install] Error 2

There is no rule for target 'install' in tools/bpf/runqslower/Makefile,
and there is no need to install it, so just remove 'runqslower_install'.

Fixes: 9c01546d26 ("tools/bpf: Add runqslower tool to tools/bpf")
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210628030409.3459095-1-liwei391@huawei.com
2021-07-07 14:06:38 -07:00
David S. Miller
d7fba8ff3e Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Do not refresh timeout in SYN_SENT for syn retransmissions.
   Add selftest for unreplied TCP connection, from Florian Westphal.

2) Fix null dereference from error path with hardware offload
   in nftables.

3) Remove useless nf_ct_gre_keymap_flush() from netns exit path,
   from Vasily Averin.

4) Missing rcu read-lock side in ctnetlink helper info dump,
   also from Vasily.

5) Do not mark RST in the reply direction coming after SYN packet
   for an out-of-sync entry, from Ali Abdallah and Florian Westphal.

6) Add tcp_ignore_invalid_rst sysctl to allow to disable out of
   segment RSTs, from Ali.

7) KCSAN fix for nf_conntrack_all_lock(), from Manfred Spraul.

8) Honor NFTA_LAST_SET in nft_last.

9) Fix incorrect arithmetics when restore last_jiffies in nft_last.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07 14:00:14 -07:00
Hangbin Liu
0e02bf5de4 selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
After redirecting, it's already a new path. So the old PMTU info should
be cleared. The IPv6 test "mtu exception plus redirect" should only
has redirect info without old PMTU.

The IPv4 test can not be changed because of legacy.

Fixes: ec81053528 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07 13:45:30 -07:00
Hangbin Liu
24b671aad4 selftests: icmp_redirect: remove from checking for IPv6 route get
If the kernel doesn't enable option CONFIG_IPV6_SUBTREES, the RTA_SRC
info will not be exported to userspace in rt6_fill_node(). And ip cmd will
not print "from ::" to the route output. So remove this check.

Fixes: ec81053528 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07 13:45:30 -07:00
YueHaibing
eca81f0914 stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
The "plat->phy_interface" variable is an enum and in this context GCC
will treat it as an unsigned int so the error handling is never
triggered.

Fixes: b9f0b2f634 ("net: stmmac: platform: fix probe for ACPI devices")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07 13:43:50 -07:00
YueHaibing
0d472c69c6 stmmac: dwmac-loongson: Fix unsigned comparison to zero
plat->phy_interface is unsigned integer, so the condition
can't be less than zero and the warning will never printed.

Fixes: 30bba69d7d ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07 13:43:16 -07:00
Linus Torvalds
e9f1cbc0c4 More ACPI updates for 5.14-rc1
- Fix up the recently added Platform Runtime Mechanism (PRM)
    support by correnting a couple of implementation mistakes in it
    and adding a Kconfig help text to describe it (Aubrey Li,
    Rafael Wysocki).
 
  - Add backlight quirk for Dell Vostro 3350 (Hans de Goede).
 
  - Avoid spurious wakeups from suspend-to-idle on non-Intel platforms
    by restricting special EC GPE handling to the Intel ones (Mario
    Limonciello).
 
  - Modify the AMBA bus support in ACPI to avoid adding using
    resource names in /proc/iomem (Liguang Zhang).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmDl+UASHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxjGQQAKm7uZDGb0S+FJoP321habXnkRlU+KKI
 GAQzxZBMVduxV2+/qC4EVb3X5jTr0x5MQzJEz9kLJhB22hJx35Zs9y/HS2I/HBxb
 oHGmsEiNq5jKXycwEodvpIn0l5s9/HFrQ1lJpSpYVAPMmjRkRvW8kFREV+2PtD5r
 MBA3KizXgLfbEEQLd4KQZhL6YIwg/DqWtgNZ6I5FAO5gRvVn0/6RhRGvDCPOlXU3
 El75Mc0UyaUniAPEd7DX6XSS8E2yexOBYt5PQ0BSb0xnwHkY1kLfKd2aYoS8qcJQ
 H56dMWOiSdbGwjEPvEQhhv9+PmjbGF/loDiz4/StKZ+ZV+kw76bZEqWdMQyYycfO
 LgzJFGqXt6bek/PhG0zMCFq+sHAwqjow17kO7elpVLqqI0/wxQ+3nmvFeN61zHlR
 AwMVeWIWCRxqiqcgRC9FsK/XMKpTJ0yGgYSR7VitO88AWKEfjIiD9nAdrkZ3eXf2
 kIGpbojKaeP/EMCH55cVrEGFZ1S9/Vf7l/Ab66UFPDMczQpM6hbrWQUHaf4BYLmH
 5sqPLhhpU3QYGxOyxvzd7W3ek7XLAiz34q0lNR5EOHvqcsBdH0/uvDkwmsy9VHkx
 qIB8j9MwKKhbB44yNNwTfDt5YJISET+pfmZWT3W1P3VkiHHe3C8lghTk/6qpF103
 dPWUU2ouPcCo
 =PXKo
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
 "These include fixes of the recently introduced support for the
  Platform Runtime Mechanism (PRM) feature, a new backlight quirk, a
  suspend-to-idle wakeup fix for non-Intel platforms and a fix for the
  AMBA bus resource list in /proc/iomem.

  Specifics:

   - Fix up the recently added Platform Runtime Mechanism (PRM) support
     by correnting a couple of implementation mistakes in it and adding
     a Kconfig help text to describe it (Aubrey Li, Rafael Wysocki).

   - Add backlight quirk for Dell Vostro 3350 (Hans de Goede).

   - Avoid spurious wakeups from suspend-to-idle on non-Intel platforms
     by restricting special EC GPE handling to the Intel ones (Mario
     Limonciello).

   - Modify the AMBA bus support in ACPI to avoid adding using resource
     names in /proc/iomem (Liguang Zhang)"

* tag 'acpi-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: Do not singal PRM support if not enabled
  ACPI: Correct \_SB._OSC bit definition for PRM
  ACPI: Kconfig: Provide help text for the ACPI_PRMT option
  ACPI: PM: Only mark EC GPE for wakeup on Intel systems
  ACPI: video: Add quirk for the Dell Vostro 3350
  ACPI: AMBA: Fix resource name in /proc/iomem
2021-07-07 13:30:01 -07:00
Linus Torvalds
aef4226f91 More power management updates for 5.14-rc1
- Drop the ->stop_cpu() (not really useful) and ->resolve_freq()
    (unused) cpufreq driver callbacks and modify the users of the
    former accordingly (Viresh Kumar, Rafael Wysocki).
 
  - Add frequency invariance support to the ACPI CPPC cpufreq driver
    again along with the related fixes and cleanups (Viresh Kumar).
 
  - Update the Meditak, qcom and SCMI ARM cpufreq drivers (Fabien
    Parent, Seiya Wang, Sibi Sankar, Christophe JAILLET).
 
  - Rename black/white-lists in the DT cpufreq driver (Viresh Kumar).
 
  - Add generic performance domains support to the dvfs DT bindings
    (Sudeep Holla).
 
  - Refine locking in the generic power domains (genpd) support code
    to avoid lock dependency issues (Stephen Boyd).
 
  - Update the MSM and qcom ARM cpuidle drivers (Bartosz Dudziak).
 
  - Simplify the PM core debug code by using ktime_us_delta() to
    compute time interval lengths (Mark-PK Tsai).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmDl+NcSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxyggQALTT/4fFZYlAG/2jpu0wl/3OeR5EPS/O
 /KrfU7NfXOlNDiuZzVVrMsu1YhkAMedvhHl+9bvveYJBgNuMz7Y1WSGY4PAbW9j+
 htnvPd8UdpNODAhT3MDV9sdpXglkWAjivCAM7nr57nl3vwrRgAhBEBkA0/OKiij7
 dlTSRy+Doq0AI3ceOHlMFHGnL+/PBx3YJcOuKBwK9gHrbCQxoBQJ5QJnm2Sa8NEy
 QGoQWWvSI6mDntsAYL4g7mbUdOqGSN04KdXnf0nA9Ywe/Lb/Rp/Il7zCfc13iBSV
 Nv1ala2NFtp/W+nQan8hxXOgbET+ybbduG65bB33McTuaZ3VIV5sSxxAfEWV+DiM
 PGEGwDJ/vZqoJiiTKmYij07/sIUXe1Q2YsxTvDh0tqzFmO78reDfXlLdfrrOjRsd
 Ybw3dkI/XQH7/sTawCxn/TdFRj2hx7wg7yDqkLJXnA037BN+EBFqSckyABoBPpmA
 TwcacjyotwnfwOpoxbxmjCX1VwIIZ2Mk7Q3h+v9Ej5aNIuyDfR0DzybqCLqIan/4
 vETtz+OCO5H1BT6g/ctIi13e7MvWzZXudNWTRTS8ZVXzuzo1hO3okHWuXiQbKGOA
 Dh2sjkJBBgr9WPFjkF5mgZZp8SM0D4S5SSAEglTKSwxcaMcqXVlgEN8beyMa5FuD
 y/UeopwhL66a
 =Un7e
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These include cpufreq core simplifications and fixes, cpufreq driver
  updates, cpuidle driver update, a generic power domains (genpd)
  locking fix and a debug-related simplification of the PM core.

  Specifics:

   - Drop the ->stop_cpu() (not really useful) and ->resolve_freq()
     (unused) cpufreq driver callbacks and modify the users of the
     former accordingly (Viresh Kumar, Rafael Wysocki).

   - Add frequency invariance support to the ACPI CPPC cpufreq driver
     again along with the related fixes and cleanups (Viresh Kumar).

   - Update the Meditak, qcom and SCMI ARM cpufreq drivers (Fabien
     Parent, Seiya Wang, Sibi Sankar, Christophe JAILLET).

   - Rename black/white-lists in the DT cpufreq driver (Viresh Kumar).

   - Add generic performance domains support to the dvfs DT bindings
     (Sudeep Holla).

   - Refine locking in the generic power domains (genpd) support code to
     avoid lock dependency issues (Stephen Boyd).

   - Update the MSM and qcom ARM cpuidle drivers (Bartosz Dudziak).

   - Simplify the PM core debug code by using ktime_us_delta() to
     compute time interval lengths (Mark-PK Tsai)"

* tag 'pm-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (21 commits)
  PM: domains: Shrink locking area of the gpd_list_lock
  PM: sleep: Use ktime_us_delta() in initcall_debug_report()
  cpufreq: CPPC: Add support for frequency invariance
  arch_topology: Avoid use-after-free for scale_freq_data
  cpufreq: CPPC: Pass structure instance by reference
  cpufreq: CPPC: Fix potential memleak in cppc_cpufreq_cpu_init
  cpufreq: Remove ->resolve_freq()
  cpufreq: Reuse cpufreq_driver_resolve_freq() in __cpufreq_driver_target()
  cpufreq: Remove the ->stop_cpu() driver callback
  cpufreq: powernv: Migrate to ->exit() callback instead of ->stop_cpu()
  cpufreq: CPPC: Migrate to ->exit() callback instead of ->stop_cpu()
  cpufreq: intel_pstate: Combine ->stop_cpu() and ->offline()
  cpuidle: qcom: Add SPM register data for MSM8226
  dt-bindings: arm: msm: Add SAW2 for MSM8226
  dt-bindings: cpufreq: update cpu type and clock name for MT8173 SoC
  clk: mediatek: remove deprecated CLK_INFRA_CA57SEL for MT8173 SoC
  cpufreq: dt: Rename black/white-lists
  cpufreq: scmi: Fix an error message
  cpufreq: mediatek: add support for mt8365
  dt-bindings: dvfs: Add support for generic performance domains
  ...
2021-07-07 13:22:59 -07:00
Linus Torvalds
c6e8c51f69 power supply and reset changes for the v5.14 series
battery/charger driver changes:
  * convert charger-manager binding to YAML
  * drop bd70528-charger driver
  * drop pm2301-charger driver
  * introduce rt5033-battery driver
  * misc. improvements and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE72YNB0Y/i3JqeVQT2O7X88g7+poFAmDdZH0ACgkQ2O7X88g7
 +pqLpxAAixSqw2UrKCipLUaWWH1vanwPeU78z8XEzpHdNayrTR9PujBMhWH+ps+R
 IgHNjsQuJsnvbedWfmzkgjrZe5amAuN6Va8OkVDSVzEY+RZvLmXfj9POC5d4LmNK
 wpIBM/Okjie097j3ZWz7CJp47rsQnkS9EvRY4FevNjz1zt1VSpQNyDHAjemsn+j9
 1F9BnBMr2gzgTMxLDIloa71VMEaA8cZlAWulIKwxaN5FaSwoacK0NfestjM1R/Bc
 Z50pfqzAXBrofm14WKQRIgSEkf9zM6S1AjHG/y4b/C69XbzmBJxn+FK1F+yukWfN
 Kiq1kGvE9zWqOrDYPd7LxYDetTJ9JgNDxMreLIP7N+syurFYizl7v0JU9TL6BKsh
 EXCklOx/xaCs0Y3a3kBH8dJ72MBppW1loE0XbAqhkMDkzQybTxg9xUI1gnhVTIlb
 5b6KbU5kUo4AghJ279zSNxYNHABKxyu/WCZKNafsLT9pC41muGIPT8ZNgAs2Z3fG
 V60RSiAFpRrHU7efuizbXYchLEv+vWJG7kfdWmHWawMIqE11cUHgMIjFvkv/41t/
 zD3ZnlvvQe/ZZFO16aasvpj99BsZkI5tmNjmVnfPc2cOsLw/HFHIObSe+7x2PpVe
 7peflSz6DHL48NAsoq71WxeoPVPm3YYMMA9kr4tt9ZnVBbLJaCM=
 =EFOr
 -----END PGP SIGNATURE-----

Merge tag 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply and reset updates from Sebastian Reichel:
 "Battery/charger driver changes:

   - convert charger-manager binding to YAML

   - drop bd70528-charger driver

   - drop pm2301-charger driver

   - introduce rt5033-battery driver

   - misc improvements and fixes"

* tag 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (42 commits)
  power: supply: ab8500: Fix an old bug
  power: supply: axp288_fuel_gauge: remove redundant continue statement
  power: supply: axp288_fuel_gauge: Make "T3 MRD" no_battery_list DMI entry more generic
  power: supply: axp288_fuel_gauge: Rename fuel_gauge_blacklist to no_battery_list
  power: supply: bq24190_charger: drop of_match_ptr() from device ID table
  drivers: power: add missing MODULE_DEVICE_TABLE in keystone-reset.c
  power: supply: ab8500: add missing MODULE_DEVICE_TABLE
  power: supply: charger-manager: add missing MODULE_DEVICE_TABLE
  power: reset: regulator-poweroff: add missing MODULE_DEVICE_TABLE
  power: supply: cpcap-charger: get the battery inserted infomation from cpcap-battery
  power: supply: cpcap-battery: invalidate config when incompatible measurements are read
  power: supply: axp20x_battery: allow disabling battery charging
  power: supply: max17040: drop unused platform data support
  power: supply: max17040: simplify POWER_SUPPLY_PROP_ONLINE
  power: supply: max17040: remove non-working POWER_SUPPLY_PROP_STATUS
  power: reset: at91-sama5d2_shdwc: Remove redundant error printing in at91_shdwc_probe()
  power: reset: gpio-poweroff: add missing MODULE_DEVICE_TABLE
  power: supply: rt5033_battery: Fix device tree enumeration
  dt-bindings: power: supply: Add DT schema for richtek,rt5033-battery
  power: supply: Drop BD70528 support
  ...
2021-07-07 13:17:48 -07:00
Linus Torvalds
9d69294be2 linux-watchdog 5.14-rc1 tag
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iEYEABECAAYFAmDltyYACgkQ+iyteGJfRsqsYgCffWETSHEY5Ydo1ocA74lKLYa6
 zuQAnjvBNRiuDXazyF1c3FTydg1A6BUy
 =AF+Z
 -----END PGP SIGNATURE-----

Merge tag 'linux-watchdog-5.14-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

 - Add Mstar MSC313e WDT driver

 - Add support for sama7g5-wdt

 - Add compatible for SC7280 SoC

 - Add compatible for Mediatek MT8195

 - sbsa: Support architecture version 1

 - Removal of the MV64x60 watchdog driver

 - Extra PCI IDs for hpwdt

 - Add hrtimer-based pretimeout feature

 - Add {min,max}_timeout sysfs nodes

 - keembay timeout and pre-timeout handling

 - Several fixes, cleanups and improvements

* tag 'linux-watchdog-5.14-rc1' of git://www.linux-watchdog.org/linux-watchdog: (56 commits)
  watchdog: iTCO_wdt: use dev_err() instead of pr_err()
  watchdog: Add Mstar MSC313e WDT driver
  dt-bindings: watchdog: Add Mstar MSC313e WDT devicetree bindings documentation
  watchdog: iTCO_wdt: Account for rebooting on second timeout
  dt-bindings: watchdog: Convert arm,sbsa-gwdt to DT schema
  dt-bindings: watchdog: sama5d4-wdt: add compatible for sama7g5-wdt
  watchdog: sama5d4_wdt: add support for sama7g5-wdt
  dt-bindings: watchdog: sama5d4-wdt: convert to yaml
  watchdog: ziirave_wdt: Remove VERSION_FMT defines and add sysfs newlines
  dt-bindings: watchdog: Add compatible for Mediatek MT8195
  dt-bindings: watchdog: dw-wdt: add description for rk3568
  watchdog: imx_sc_wdt: fix pretimeout
  watchdog: diag288_wdt: Remove redundant assignment
  watchdog: Add hrtimer-based pretimeout feature
  dt-bindings: watchdog: Add compatible for SC7280 SoC
  watchdog: qcom: Move suspend/resume to suspend_late/resume_early
  watchdog: Fix a typo in the file orion_wdt.c
  watchdog: jz4740: Fix return value check in jz4740_wdt_probe()
  watchdog: Remove MV64x60 watchdog driver
  doc: mtk-wdt: support pre-timeout when the bark irq is available
  ...
2021-07-07 12:57:46 -07:00
Linus Torvalds
0cc2ea8ceb Some highlights:
- add tracepoints for callbacks and for client creation and
 	  destruction
 	- cache the mounts used for server-to-server copies
 	- expose callback information in /proc/fs/nfsd/clients/*/info
 	- don't hold locks unnecessarily while waiting for commits
 	- update NLM to use xdr_stream, as we have for NFSv2/v3/v4
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmDlvjIVHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+0MoP/RJ8Q7zwIz6WFHn3bCRaEXpnnkAH
 mmMfELhmgvH0V5nXWbb2rAfhllY+/zeWtf8QHSEKUPCnVLmB7WeXKdjXSy7EnYJ8
 R8DuuuII85McIrg93nJ8hxm4wXTaTZKXpS4Vxkuxc6YKxoeJoXOaTjbgRLIw8mfX
 w4wPfjAsnROboVxvDHUmBS9zNKaAi2dZ0jH2x2eS7eZSWzoJC30yd+pFSxyYoOac
 3fZUntDskQDGIpXHuTf53WcaK7h1bUHrwS7Joez8Z0ctg4vcbJsfdhKZUZwAxOZh
 3xWAgm3PFcze5xqHuX8BYBThHfB3uTeygZQRb3zI9sG2UQtQfundrtlxZRSjMMkC
 cwlSi2SQNL66EBIgOcS3U/9OeorLALnnRax1KWMWjpFzaBJJQTJDumwLRx4zogI1
 Ouiu0fI+hApck+L+qCzJMidA2wxOBsDzH471YiGiqQSmgNZc6wBc+aC/JKN8QAWb
 jG53vvpa3gCZa8Rs3KyOoUvtcCCdiQc+nljbzqtVfIvvGa9MSixufa+U5fojLEO7
 i8aangK+mteMxrrejEKvRu1efDIfpFq0HW7ev1mzW2Jl/AguDXM5XUeGK2mMMPtc
 WqT3arbtGVcXJN+Oh5TzTVuED/DecyO0Fig77G+WJTiWONgoHfs+E5nC4aHSpohn
 bMpmQMIOmTa5zgQP
 =BQyR
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.14' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:

 - add tracepoints for callbacks and for client creation and destruction

 - cache the mounts used for server-to-server copies

 - expose callback information in /proc/fs/nfsd/clients/*/info

 - don't hold locks unnecessarily while waiting for commits

 - update NLM to use xdr_stream, as we have for NFSv2/v3/v4

* tag 'nfsd-5.14' of git://linux-nfs.org/~bfields/linux: (69 commits)
  nfsd: fix NULL dereference in nfs3svc_encode_getaclres
  NFSD: Prevent a possible oops in the nfs_dirent() tracepoint
  nfsd: remove redundant assignment to pointer 'this'
  nfsd: Reduce contention for the nfsd_file nf_rwsem
  lockd: Update the NLMv4 SHARE results encoder to use struct xdr_stream
  lockd: Update the NLMv4 nlm_res results encoder to use struct xdr_stream
  lockd: Update the NLMv4 TEST results encoder to use struct xdr_stream
  lockd: Update the NLMv4 void results encoder to use struct xdr_stream
  lockd: Update the NLMv4 FREE_ALL arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 SHARE arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 SM_NOTIFY arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 nlm_res arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 UNLOCK arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 CANCEL arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 LOCK arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 TEST arguments decoder to use struct xdr_stream
  lockd: Update the NLMv4 void arguments decoder to use struct xdr_stream
  lockd: Update the NLMv1 SHARE results encoder to use struct xdr_stream
  lockd: Update the NLMv1 nlm_res results encoder to use struct xdr_stream
  lockd: Update the NLMv1 TEST results encoder to use struct xdr_stream
  ...
2021-07-07 12:50:08 -07:00
Colin Ian King
bebedf2bb4 pwm: Remove redundant assignment to pointer pwm
The pointer pwm is being initialized with a value that is never read and
it is being updated later with a new value. The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2021-07-07 21:43:32 +02:00
Pavel Begunkov
c32aace0cf io_uring: fix drain alloc fail return code
After a recent change io_drain_req() started to fail requests with
result=0 in case of allocation failure, where it should be and have
been -ENOMEM.

Fixes: 76cc33d791 ("io_uring: refactor io_req_defer()")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e068110ac4293e0c56cfc4d280d0f22b9303ec08.1625682153.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-07 12:49:32 -06:00
Linus Torvalds
a931dd33d3 Modules updates for v5.14
Summary of modules changes for the 5.14 merge window:
 
 - Fix incorrect logic in module_kallsyms_on_each_symbol()
 
 - Fix for a Coccinelle warning
 
 Signed-off-by: Jessica Yu <jeyu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEVrp26glSWYuDNrCUwEV+OM47wXIFAmDltvoQHGpleXVAa2Vy
 bmVsLm9yZwAKCRDARX44zjvBcu74D/9+HDmNlEd92ongCVhod9+s4p/g7pArBPdW
 YrvjYM+icH0qpE58etIQOibvFL04WxelDqyrCewEhABwB5hbad+owShMl+R43C8P
 IRCYYN2sG7vHVPMiUgeXlS5OiBZeFOb+Mf9Ccnu5MuI/BHpsTX9uhG+rbHQg+lXb
 eOWu0QmmCXgtZy0NVeCNtdOJV/PwgXC3ZHE2zgWukf1QWtwm1+0VciZwui162eyN
 TL/XLSH5pNZNr2MSxi5b2rdwrUVdxb0enE4L4cwCdALipooVEsrxuVvugcYPBHrC
 38b7cJBfdYpVuS2iK9tSqSMdRkqX2LKZrfiGFCAxn2TA55Jc6WJhbzL0nEPGyGX1
 fRR9tGAuWe1ihQHRIajLMtmdfA/HHJMmuPx8c0JDIAeTj3SdDch/gEhcKcXj6kP2
 eXRoQS3RD7RSz6jKKRTFxJsn8lYJhSYN8IGwfjQzPwruLRynDQaCpnN+7wfGsdCf
 rKb7NxdsbNdQKjkaewu2zb/Qk7U7TOVjJhAv2gN0u/RbGSJ3i+MJTLfk3d/LLJcD
 ncLaVh2zRHIdKNQj/dtmIeXiSuJ23NjNQszOiaI8xZ7YKexhJCimAWPP/8f4dCuB
 2tL/Xkg1GVy9+WZQXkApyOPzvtXXE9dDfQR+HiNTQaTlZpw5SxYg0ZPBBL22yv1J
 xDD9Hfl1jw==
 =/zzN
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:

 - Fix incorrect logic in module_kallsyms_on_each_symbol()

 - Fix for a Coccinelle warning

* tag 'modules-for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: correctly exit module_kallsyms_on_each_symbol when fn() != 0
  kernel/module: Use BUG_ON instead of if condition followed by BUG
2021-07-07 11:41:32 -07:00
Rafael J. Wysocki
166fdb4dd0 Merge branches 'acpi-misc', 'acpi-video' and 'acpi-prm'
* acpi-misc:
  ACPI: AMBA: Fix resource name in /proc/iomem

* acpi-video:
  ACPI: video: Add quirk for the Dell Vostro 3350

* acpi-prm:
  ACPI: Do not singal PRM support if not enabled
  ACPI: Correct \_SB._OSC bit definition for PRM
  ACPI: Kconfig: Provide help text for the ACPI_PRMT option
2021-07-07 20:18:11 +02:00
Rafael J. Wysocki
843372db2e Merge branches 'pm-cpuidle', 'pm-sleep' and 'pm-domains'
* pm-cpuidle:
  cpuidle: qcom: Add SPM register data for MSM8226
  dt-bindings: arm: msm: Add SAW2 for MSM8226

* pm-sleep:
  PM: sleep: Use ktime_us_delta() in initcall_debug_report()

* pm-domains:
  PM: domains: Shrink locking area of the gpd_list_lock
2021-07-07 20:17:43 +02:00
Linus Torvalds
1423e2660c Fixes and improvements for FPU handling on x86:
- Prevent sigaltstack out of bounds writes. The kernel unconditionally
     writes the FPU state to the alternate stack without checking whether
     the stack is large enough to accomodate it.
 
     Check the alternate stack size before doing so and in case it's too
     small force a SIGSEGV instead of silently corrupting user space data.
 
   - MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been
     updated despite the fact that the FPU state which is stored on the
     signal stack has grown over time which causes trouble in the field
     when AVX512 is available on a CPU. The kernel does not expose the
     minimum requirements for the alternate stack size depending on the
     available and enabled CPU features.
 
     ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
     Add it to x86 as well
 
   - A major cleanup of the x86 FPU code. The recent discoveries of XSTATE
     related issues unearthed quite some inconsistencies, duplicated code
     and other issues.
 
     The fine granular overhaul addresses this, makes the code more robust
     and maintainable, which allows to integrate upcoming XSTATE related
     features in sane ways.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDlcpETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoeP5D/4i+AgYYeiMLgGb+NS7iaKPfoWo6LIz
 y3qdTSA0DQaIYbYivWwRO/g0GYdDMXDWeZalFi7eGnVI8O3eOog+22Zrf/y0UINB
 KJHdYd4ApWHhs401022y5hexrWQvnV8w1yQCuj/zLm6eC+AVhdwt2AY+IBoRrdUj
 wqY97B/4rJNsBvvqTDn9EeDrJA2y0y0Suc7AhIp2BGMI+dpIdxys8RJDamXNWyDL
 gJf0YRgUoiIn3AHKb+fgv60AoxfC175NSg/5/y/scFNXqVlW0Up4YCb7pqG9o2Ga
 f3XvtWfbw1N5PmUYjFkALwEkzGUbM3v0RA3xLY2j2WlWm9fBPPy59dt+i/h/VKyA
 GrA7i7lcIqX8dfVH6XkrReZBkRDSB6t9SZTvV54jAz5fcIZO2Rg++UFUvI/R6GKK
 XCcxukYaArwo+IG62iqDszS3gfLGhcor/cviOeULRC5zMUIO4Jah+IhDnifmShtC
 M5s9QzrwIRD/XMewGRQmvkiN4kBfE7jFoBQr1J9leCXJKrM+2JQmMzVInuubTQIq
 SdlKOaAIn7xtekz+6XdFG9Gmhck0PCLMJMOLNvQkKWI3KqGLRZ+dAWKK0vsCizAx
 0BA7ZeB9w9lFT+D8mQCX77JvW9+VNwyfwIOLIrJRHk3VqVpS5qvoiFTLGJJBdZx4
 /TbbRZu7nXDN2w==
 =Mq1m
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Thomas Gleixner:
 "Fixes and improvements for FPU handling on x86:

   - Prevent sigaltstack out of bounds writes.

     The kernel unconditionally writes the FPU state to the alternate
     stack without checking whether the stack is large enough to
     accomodate it.

     Check the alternate stack size before doing so and in case it's too
     small force a SIGSEGV instead of silently corrupting user space
     data.

   - MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never
     been updated despite the fact that the FPU state which is stored on
     the signal stack has grown over time which causes trouble in the
     field when AVX512 is available on a CPU. The kernel does not expose
     the minimum requirements for the alternate stack size depending on
     the available and enabled CPU features.

     ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
     Add it to x86 as well.

   - A major cleanup of the x86 FPU code. The recent discoveries of
     XSTATE related issues unearthed quite some inconsistencies,
     duplicated code and other issues.

     The fine granular overhaul addresses this, makes the code more
     robust and maintainable, which allows to integrate upcoming XSTATE
     related features in sane ways"

* tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again
  x86/fpu/signal: Let xrstor handle the features to init
  x86/fpu/signal: Handle #PF in the direct restore path
  x86/fpu: Return proper error codes from user access functions
  x86/fpu/signal: Split out the direct restore code
  x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing()
  x86/fpu/signal: Sanitize the xstate check on sigframe
  x86/fpu/signal: Remove the legacy alignment check
  x86/fpu/signal: Move initial checks into fpu__restore_sig()
  x86/fpu: Mark init_fpstate __ro_after_init
  x86/pkru: Remove xstate fiddling from write_pkru()
  x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate()
  x86/fpu: Remove PKRU handling from switch_fpu_finish()
  x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
  x86/fpu: Hook up PKRU into ptrace()
  x86/fpu: Add PKRU storage outside of task XSAVE buffer
  x86/fpu: Dont restore PKRU in fpregs_restore_userspace()
  x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi()
  x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs()
  x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs()
  ...
2021-07-07 11:12:01 -07:00
Linus Torvalds
4ea9031795 xen: branch for v5.14-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYOVQVQAKCRCAXGG7T9hj
 vnsfAQCLwcqSbno1iiUFDFGDzwnF0RZjIxpvEvdI/Zc0KMAO5gEA9tixIpfBYUO3
 0yAd9g/VRumPS0wRsfwUrnLYs8TZ4gY=
 =pPbL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "Only two minor patches this time: one cleanup patch and one patch
  refreshing a Xen header"

* tag 'for-linus-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: sync include/xen/interface/io/ring.h with Xen's newest version
  xen: Use DEVICE_ATTR_*() macro
2021-07-07 11:07:13 -07:00
Linus Torvalds
383df634f1 More fallthrough fixes for Clang for 5.14-rc1
Hi Linus,
 
 Please, pull the following patches that fix many fall-through warnings
 when building with Clang 12.0.0 and this[1] change reverted. Notice
 that in order to enable -Wimplicit-fallthrough for Clang, such change[1]
 is meant to be reverted at some point. So, these patches help to move
 in that direction.
 
 Thanks!
 
 [1] commit e2079e93f5 ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmDlNrUACgkQRwW0y0cG
 2zH/WRAAmQ1o2HoyWp1KCoad6mR6EeCXNAEBSF2F8PL+Yi9oGPTJSRVEn5EPO4wZ
 9X15lnMo63nwYx09Ka96uqbXKf90bnf/EvwB7+GY+3+qAUw7wbgxW9wW0208gddl
 u8JTQKHEfUZVMEyye5f7FSNwTVVOtsX581GDsYIFxmiuO00BkoQdoMm43NNXb2b2
 W3PlCFNjRkbN7kqvCwTKRQJZw586/7hv6SiNA1UgBVeFuJEZvhC2Uaj7oOXHv5AP
 ICQDZI419Zzb7Vh2JcPFCsc00Ak33YEEpdU+iTX0xVyPkFIW6/wsgaFlGYugRR3n
 PxdpJtrvVj7uW/ncJ1UaOAaZZ3qfs2qTbrfoa+ybN+qfmocVz9Zlh9+bWn00Ymxw
 sk+MUtcfJF6Lt0IgbFgtYtruBo7Qp8IPalJuyoBpXdhLHJUifT+n3qdhPNrH2rpo
 uHuuseg5OJHsP3QxBA2Q4KtwhvI84+RDFCfwrHdSM5BitCFAyYGOLSzbtZHvcpPX
 EKW2P4toqZlmxCX+Bfxdc6DjqgogUm/38YcMjoOX5h4leugyWRjuBbCgX+xh121W
 wz5g02qUaoTNTSNylN/mL3POKVLv7PgZNj/5OXu5Cz3hk9qt/WqfwCQI6caxzrjR
 sRIlzP4qc4THWa9I99XbR2/8jXLjK0mZV9xwgO8XMcCDMAxrugA=
 =2p48
 -----END PGP SIGNATURE-----

Merge tag 'Wimplicit-fallthrough-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull more fallthrough fixes from Gustavo Silva:
 "Fix maore fall-through warnings when building the kernel with clang
  and '-Wimplicit-fallthrough'"

* tag 'Wimplicit-fallthrough-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  Input: Fix fall-through warning for Clang
  scsi: aic94xx: Fix fall-through warning for Clang
  i3c: master: cdns: Fix fall-through warning for Clang
  net/mlx4: Fix fall-through warning for Clang
2021-07-07 11:03:04 -07:00
Linus Torvalds
b5e6d1261e hwspinlock updates for v5.14
This adds a driver for the hardware spinlock in Allwinner sun6i.
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmDkxIobHGJqb3JuLmFu
 ZGVyc3NvbkBsaW5hcm8ub3JnAAoJEAsfOT8Nma3FxBoP/Rdp/4LiewU/+rTHFj4E
 HKjvIUp6uNLbhx62Gn8qbhhJ9jI/tWfs7YIR64Vnfu6Og6WtmS9I7s57u1Cd6xon
 /RTkq1m3Z4Ji8DKBbQIoIf7zoJvCOVORKb3lrfU9ly2gk6lZF+OexNjwJG0Hlmsk
 J3r81CsvmQ6NewWL7pPVD15c6pwruManTOdNui1wdG+bmpx7bpPuidDBbuTJb4k8
 6rFUxzskIoJpLtElvj/9Jg1/+9ins/rSaVX2zlW4lBj7gxAkb4S4Op88zFZ6srtC
 1dpJg8s2REPsPeTN//yZzNf2Pk2QlDhV7qcd7MpfG//DX6yNyuD09BR04eghO4I5
 NubATPXtdZyOH5qyG6auRtHTUb9OTxWcqFus4JotHhPaiojB4YjUpY3KKTxPmFNy
 5aSXKFLdmfeSBg/JPJIkyh3BXIUunhe9jusATaySAAxoYvAEPyNRtbhpgZkXqQ/r
 rthCQ1sdoDaIlijT+zQmA7hD3H1nK5t1q5T5DMbn/4Bau8eAFFabaCsaKB0LYsbr
 DCsufUN/9CgWWaRjitOHYHkCt1j4bdM9vj2E60BZjgP0oZqLFLkoVFTnKlgsp1oM
 zfMetsSNlUsvSZ9t5fWrMwXMNAbqu0jsuH2KruBkp9VXDw3fyXDt91p5nvT110BE
 ASBut9Jt9kHXid/owJaboF9V
 =dJJr
 -----END PGP SIGNATURE-----

Merge tag 'hwlock-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc

Pull hwspinlock updates from Bjorn Andersson:
 "This adds a driver for the hardware spinlock in Allwinner sun6i"

* tag 'hwlock-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
  dt-bindings: hwlock: sun6i: Fix various warnings in binding
  hwspinlock: add sun6i hardware spinlock support
  dt-bindings: hwlock: add sun6i_hwspinlock
2021-07-07 10:57:51 -07:00
Linus Torvalds
d0fe3f47ef remoteproc updates for v5.14
This adds support for controlling the PRU and R5F clusters on the TI
 AM64x, the remote processor in i.MX7ULP, i.MX8MN/P and i.MX8ULP NXP and
 the audio, compute and modem remoteprocs in the Qualcomm SC8180x
 platform.
 
 It fixes improper ordering of cdev and device creation of the remoteproc
 control interface and it fixes resource leaks in the error handling path
 of rproc_add() and the Qualcomm modem and wifi remoteproc drivers.
 
 Lastly it fixes a few build warnings and replace the dummy parameter
 passed in the mailbox api of the stm32 driver to something not living on
 the stack.
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmDkw/4bHGJqb3JuLmFu
 ZGVyc3NvbkBsaW5hcm8ub3JnAAoJEAsfOT8Nma3F8T0QALb6hoaoJSIBoFWllMED
 Pw0XCaU4s5aQDN74lUYiV3cztkkjgwKawpylB74yVU5GBP/js4p+05LgYsNjZU1i
 YcGHyqV9DmmBMm+yMJotdF/Nk+YsP5GxVaFF5yH4hXy5n9J2ootmjjrn0bWB/iom
 /Ud781UwqmrsZHnsFAZH322xk/iht38lYXPOUMSGddMW8ekfIa6ptcY2zVTsprKb
 CuRPShUX/rs7iAXDZueRqpiap94YlDtu9PddDJObRjtuQ2wM7WCWDVWWmXE+kwfP
 c1G6Ci1i5ul7w1TwyhqW0dxuIvbbM5dPEZnIVDUK3WkbmiFDLi0HmszwXezBnS0z
 dzX7Fouh7fs57hC7q+6jS5sqqLx13zKFT7f4RhEvM63yKXmFj42ood7PTU9dm/nU
 rrStcsZMQyPsAl7IB22Sr3Tog7I/0au7NxJw+AAJ2IRO3n3WzpCWXCOksdyFoZ8X
 dCeATsX2w3g/jAjXeWeP081GKhHM/VUviS9lI9XhNXRoJESd0C7+USIq5R/vWcRt
 dPgpjonJz3PEw7kZYIT64Aa+oeEtycJPoNcsgEw1jeU3sWPDe8AHOR6qS9MejBjW
 TdpjvJx93E3Xx3AUAj7L87TK8/fyAbOoJVrx3SPNBtiZXXuw6aLsvSoOlH6UgXqu
 SyW1zK1/kZDSIv28gpNziyLI
 =mf8k
 -----END PGP SIGNATURE-----

Merge tag 'rproc-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc

Pull remoteproc updates from Bjorn Andersson:
 "This adds support for controlling the PRU and R5F clusters on the TI
  AM64x, the remote processor in i.MX7ULP, i.MX8MN/P and i.MX8ULP NXP
  and the audio, compute and modem remoteprocs in the Qualcomm SC8180x
  platform.

  It fixes improper ordering of cdev and device creation of the
  remoteproc control interface and it fixes resource leaks in the error
  handling path of rproc_add() and the Qualcomm modem and wifi
  remoteproc drivers.

  Lastly it fixes a few build warnings and replace the dummy parameter
  passed in the mailbox api of the stm32 driver to something not living
  on the stack"

* tag 'rproc-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: (32 commits)
  remoteproc: qcom: pas: Add SC8180X adsp, cdsp and mpss
  dt-bindings: remoteproc: qcom: pas: Add SC8180X adsp, cdsp and mpss
  remoteproc: imx_rproc: support i.MX8ULP
  dt-bindings: remoteproc: imx_rproc: support i.MX8ULP
  remoteproc: stm32: fix mbox_send_message call
  remoteproc: core: Cleanup device in case of failure
  remoteproc: core: Fix cdev remove and rproc del
  remoteproc: core: Move validate before device add
  remoteproc: core: Move cdev add before device add
  remoteproc: pru: Add support for various PRU cores on K3 AM64x SoCs
  dt-bindings: remoteproc: pru: Update bindings for K3 AM64x SoCs
  remoteproc: qcom_wcnss: Use devm_qcom_smem_state_get()
  remoteproc: qcom_q6v5: Use devm_qcom_smem_state_get() to fix missing put()
  soc: qcom: smem_state: Add devm_qcom_smem_state_get()
  dt-bindings: remoteproc: qcom: pas: Fix indentation warnings
  remoteproc: imx-rproc: Fix IMX_REMOTEPROC configuration
  remoteproc: imx_rproc: support i.MX8MN/P
  remoteproc: imx_rproc: support i.MX7ULP
  remoteproc: imx_rproc: make clk optional
  remoteproc: imx_rproc: initial support for mutilple start/stop method
  ...
2021-07-07 10:50:03 -07:00
Steven Rostedt (VMware)
26c5637310 tracing/histograms: Fix parsing of "sym-offset" modifier
With the addition of simple mathematical operations (plus and minus), the
parsing of the "sym-offset" modifier broke, as it took the '-' part of the
"sym-offset" as a minus, and tried to break it up into a mathematical
operation of "field.sym - offset", in which case it failed to parse
(unless the event had a field called "offset").

Both .sym and .sym-offset modifiers should not be entered into
mathematical calculations anyway. If ".sym-offset" is found in the
modifier, then simply make it not an operation that can be calculated on.

Link: https://lkml.kernel.org/r/20210707110821.188ae255@oasis.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 100719dcef ("tracing: Add simple expression support to hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-07 13:14:21 -04:00
Steve French
2a780e8b64 CIFS: Clarify SMB1 code for delete
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 SetFileDisposition (which is used to
unlink a file by setting the delete on close flag).  This
changeset doesn't change the address but makes it slightly
clearer.

Addresses-Coverity: 711524 ("Out of bounds write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-07 11:53:17 -05:00
Steve French
e3973ea3a7 CIFS: Clarify SMB1 code for SetFileSize
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the header
structure rather than from the beginning of the struct plus
4 bytes) for setting the file size using SMB1. This changeset
doesn't change the address but makes it slightly clearer.

Addresses-Coverity: 711525 ("Out of bounds write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-07 11:52:53 -05:00
SanjayKumar Jeyakumar
5616e895ec tools/runqslower: Use __state instead of state
Commit 2f064a59a1 ("sched: Change task_struct::state") renamed task->state
to task->__state in task_struct. Fix runqslower to use the new name of the
field.

Fixes: 2f064a59a1 ("sched: Change task_struct::state")
Signed-off-by: SanjayKumar Jeyakumar <vjsanjay@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210707052914.21473-1-vjsanjay@gmail.com
2021-07-07 09:42:26 -07:00
Filipe Manana
ea32af47f0 btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree
When syncing the log, if we fail to allocate the root node for the log
root tree:

1) We are unlocking fs_info->tree_log_mutex, but at this point we have
   not yet locked this mutex;

2) We have locked fs_info->tree_root->log_mutex, but we end up not
   unlocking it;

So fix this by unlocking fs_info->tree_root->log_mutex instead of
fs_info->tree_log_mutex.

Fixes: e75f9fd194 ("btrfs: zoned: move log tree node allocation out of log_root_tree->log_mutex")
CC: stable@vger.kernel.org # 5.13+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 18:27:44 +02:00
Johannes Thumshirn
9cc0b837e1 btrfs: don't block if we can't acquire the reclaim lock
If we can't acquire the reclaim_bgs_lock on block group reclaim, we
block until it is free. This can potentially stall for a long time.

While reclaim of block groups is necessary for a good user experience on
a zoned file system, there still is no need to block as it is best
effort only, just like when we're deleting unused block groups.

CC: stable@vger.kernel.org # 5.13
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:57 +02:00
Naohiro Aota
abb99cfdaf btrfs: properly split extent_map for REQ_OP_ZONE_APPEND
Damien reported a test failure with btrfs/209. The test itself ran fine,
but the fsck ran afterwards reported a corrupted filesystem.

The filesystem corruption happens because we're splitting an extent and
then writing the extent twice. We have to split the extent though, because
we're creating too large extents for a REQ_OP_ZONE_APPEND operation.

When dumping the extent tree, we can see two EXTENT_ITEMs at the same
start address but different lengths.

$ btrfs inspect dump-tree /dev/nullb1 -t extent
...
   item 19 key (269484032 EXTENT_ITEM 126976) itemoff 15470 itemsize 53
           refs 1 gen 7 flags DATA
           extent data backref root FS_TREE objectid 257 offset 786432 count 1
   item 20 key (269484032 EXTENT_ITEM 262144) itemoff 15417 itemsize 53
           refs 1 gen 7 flags DATA
           extent data backref root FS_TREE objectid 257 offset 786432 count 1

The duplicated EXTENT_ITEMs originally come from wrongly split extent_map in
extract_ordered_extent(). Since extract_ordered_extent() uses
create_io_em() to split an existing extent_map, we will have
split->orig_start != split->start. Then, it will be logged with non-zero
"extent data offset". Finally, the logged entries are replayed into
a duplicated EXTENT_ITEM.

Introduce and use proper splitting function for extent_map. The function is
intended to be simple and specific usage for extract_ordered_extent() e.g.
not supporting compression case (we do not allow splitting compressed
extent_map anyway).

There was a question raised by Qu, in summary why we want to split the
extent map (and not the bio):

The problem is not the limit on the zone end, which as you mention is
the same as the block group end. The problem is that data write use zone
append (ZA) operations. ZA BIOs cannot be split so a large extent may
need to be processed with multiple ZA BIOs, While that is also true for
regular writes, the major difference is that ZA are "nameless" write
operation giving back the written sectors on completion. And ZA
operations may be reordered by the block layer (not intentionally
though). Combine both of these characteristics and you can see that the
data for a large extent may end up being shuffled when written resulting
in data corruption and the impossibility to map the extent to some start
sector.

To avoid this problem, zoned btrfs uses the principle "one data extent
== one ZA BIO". So large extents need to be split. This is unfortunate,
but we can revisit this later and optimize, e.g. merge back together the
fragments of an extent once written if they actually were written
sequentially in the zone.

Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
Fixes: d22002fd37 ("btrfs: zoned: split ordered extent when bio is sent")
CC: stable@vger.kernel.org # 5.12+
CC: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:45 +02:00
Filipe Manana
79bd37120b btrfs: rework chunk allocation to avoid exhaustion of the system chunk array
Commit eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array
due to concurrent allocations") fixed a problem that resulted in
exhausting the system chunk array in the superblock when there are many
tasks allocating chunks in parallel. Basically too many tasks enter the
first phase of chunk allocation without previous tasks having finished
their second phase of allocation, resulting in too many system chunks
being allocated. That was originally observed when running the fallocate
tests of stress-ng on a PowerPC machine, using a node size of 64K.

However that commit also introduced a deadlock where a task in phase 1 of
the chunk allocation waited for another task that had allocated a system
chunk to finish its phase 2, but that other task was waiting on an extent
buffer lock held by the first task, therefore resulting in both tasks not
making any progress. That change was later reverted by a patch with the
subject "btrfs: fix deadlock with concurrent chunk allocations involving
system chunks", since there is no simple and short solution to address it
and the deadlock is relatively easy to trigger on zoned filesystems, while
the system chunk array exhaustion is not so common.

This change reworks the chunk allocation to avoid the system chunk array
exhaustion. It accomplishes that by making the first phase of chunk
allocation do the updates of the device items in the chunk btree and the
insertion of the new chunk item in the chunk btree. This is done while
under the protection of the chunk mutex (fs_info->chunk_mutex), in the
same critical section that checks for available system space, allocates
a new system chunk if needed and reserves system chunk space. This way
we do not have chunk space reserved until the second phase completes.

The same logic is applied to chunk removal as well, since it keeps
reserved system space long after it is done updating the chunk btree.

For direct allocation of system chunks, the previous behaviour remains,
because otherwise we would deadlock on extent buffers of the chunk btree.
Changes to the chunk btree are by large done by chunk allocation and chunk
removal, which first reserve chunk system space and then later do changes
to the chunk btree. The other remaining cases are uncommon and correspond
to adding a device, removing a device and resizing a device. All these
other cases do not pre-reserve system space, they modify the chunk btree
right away, so they don't hold reserved space for a long period like chunk
allocation and chunk removal do.

The diff of this change is huge, but more than half of it is just addition
of comments describing both how things work regarding chunk allocation and
removal, including both the new behavior and the parts of the old behavior
that did not change.

CC: stable@vger.kernel.org # 5.12+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:41 +02:00
Filipe Manana
1cb3db1cf3 btrfs: fix deadlock with concurrent chunk allocations involving system chunks
When a task attempting to allocate a new chunk verifies that there is not
currently enough free space in the system space_info and there is another
task that allocated a new system chunk but it did not finish yet the
creation of the respective block group, it waits for that other task to
finish creating the block group. This is to avoid exhaustion of the system
chunk array in the superblock, which is limited, when we have a thundering
herd of tasks allocating new chunks. This problem was described and fixed
by commit eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array
due to concurrent allocations").

However there are two very similar scenarios where this can lead to a
deadlock:

1) Task B allocated a new system chunk and task A is waiting on task B
   to finish creation of the respective system block group. However before
   task B ends its transaction handle and finishes the creation of the
   system block group, it attempts to allocate another chunk (like a data
   chunk for an fallocate operation for a very large range). Task B will
   be unable to progress and allocate the new chunk, because task A set
   space_info->chunk_alloc to 1 and therefore it loops at
   btrfs_chunk_alloc() waiting for task A to finish its chunk allocation
   and set space_info->chunk_alloc to 0, but task A is waiting on task B
   to finish creation of the new system block group, therefore resulting
   in a deadlock;

2) Task B allocated a new system chunk and task A is waiting on task B to
   finish creation of the respective system block group. By the time that
   task B enter the final phase of block group allocation, which happens
   at btrfs_create_pending_block_groups(), when it modifies the extent
   tree, the device tree or the chunk tree to insert the items for some
   new block group, it needs to allocate a new chunk, so it ends up at
   btrfs_chunk_alloc() and keeps looping there because task A has set
   space_info->chunk_alloc to 1, but task A is waiting for task B to
   finish creation of the new system block group and release the reserved
   system space, therefore resulting in a deadlock.

In short, the problem is if a task B needs to allocate a new chunk after
it previously allocated a new system chunk and if another task A is
currently waiting for task B to complete the allocation of the new system
chunk.

Unfortunately this deadlock scenario introduced by the previous fix for
the system chunk array exhaustion problem does not have a simple and short
fix, and requires a big change to rework the chunk allocation code so that
chunk btree updates are all made in the first phase of chunk allocation.
And since this deadlock regression is being frequently hit on zoned
filesystems and the system chunk array exhaustion problem is triggered
in more extreme cases (originally observed on PowerPC with a node size
of 64K when running the fallocate tests from stress-ng), revert the
changes from that commit. The next patch in the series, with a subject
of "btrfs: rework chunk allocation to avoid exhaustion of the system
chunk array" does the necessary changes to fix the system chunk array
exhaustion problem.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Link: https://lore.kernel.org/linux-btrfs/20210621015922.ewgbffxuawia7liz@naota-xeon/
Fixes: eafa4fd0ad ("btrfs: fix exhaustion of the system chunk array due to concurrent allocations")
CC: stable@vger.kernel.org # 5.12+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-07 17:42:40 +02:00