Commit Graph

1170929 Commits

Author SHA1 Message Date
Linus Torvalds
a1e6fff395 four ksmbd server fixes, including three for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmQvWyAACgkQiiy9cAdy
 T1FSnAv/dqpNpgkqeC0qhytnJfcDCzuSDWcMlB7Nuyd3HcLRJAzdIs9T+Zphfrkj
 sDGbznjAdNt8+vw7uKIl25NEgm2UbX4SiYeaQX5wUqcrnksbzXyBiZG0Y2Hr0SKi
 MwIq75Lj7vA2wtFuCWM5Q3vTAqSE0RDp+m+LYjb8peLjRw2OJ8siaLdiK1l0e2PD
 YaWJg04yrzl14Uk8A+Qz/+EsNz0jHt7hQ9HETsNzbRVaIHBYfIpqgGQHILxEdU6a
 EqzbYAfYdmitGv2guj1iEMsgeJN3P8DQOQ2Nb80IdIcoq/Erq2q++T0O6xdy/84g
 PdFZD14fsAe7ao5lGc2K5DQPGLSIF7ZNkVaCJEa3chLpXxmnwqt3TLUzbj0jP7Yv
 huM7dEDpmVUosIahycKcR5vVjwB9lnCy18AIFwHZzdabzZRgNPx8eCAkjc7+HwD0
 RZ3zcZpqxY3aEUNbktnp7XNJo9ifo+gIcNQC/yAoINY7IfK3wQWamC3FZpo1fFqU
 jkUsq3lq
 =xypJ
 -----END PGP SIGNATURE-----

Merge tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:
 "Four fixes, three for stable:

   - slab out of bounds fix

   - lock cancellation fix

   - minor cleanup to address clang warning

   - fix for xfstest 551 (wrong parms passed to kvmalloc)"

* tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr
  ksmbd: delete asynchronous work from list
  ksmbd: remove unused is_char_allowed function
  ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
2023-04-07 13:10:23 -07:00
Dan Carpenter
4f5d5b33fc cifs: double lock in cifs_reconnect_tcon()
This lock was supposed to be an unlock.

Fixes: 6cc041e90c ("cifs: avoid races in parallel reconnects in smb1")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-06 22:45:41 -05:00
Yu Kuai
3723091ea1 block: don't set GD_NEED_PART_SCAN if scan partition failed
Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still
set, and partition scan will be proceed again when blkdev_get_by_dev()
is called. However, this will cause a problem that re-assemble partitioned
raid device will creat partition for underlying disk.

Test procedure:

mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0
sgdisk -n 0:0:+100MiB /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
mdadm -S /dev/md0
mdadm -A /dev/md0 /dev/sda /dev/sdb

Test result: underlying disk partition and raid partition can be
observed at the same time

Note that this can still happen in come corner cases that
GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid
device.

Fixes: e5cfefa97b ("block: fix scan partition for exclusively open device again")
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-06 20:41:53 -06:00
Rob Herring
30ba2d09ed PCI: Fix use-after-free in pci_bus_release_domain_nr()
Commit c14f7ccc9f ("PCI: Assign PCI domain IDs by ida_alloc()")
introduced a use-after-free bug in the bus removal cleanup. The issue was
found with kfence:

  [   19.293351] BUG: KFENCE: use-after-free read in pci_bus_release_domain_nr+0x10/0x70

  [   19.302817] Use-after-free read at 0x000000007f3b80eb (in kfence-#115):
  [   19.309677]  pci_bus_release_domain_nr+0x10/0x70
  [   19.309691]  dw_pcie_host_deinit+0x28/0x78
  [   19.309702]  tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194]
  [   19.309734]  tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194]
  [   19.309752]  platform_probe+0x90/0xd8
  ...

  [   19.311457] kfence-#115: 0x00000000063a155a-0x00000000ba698da8, size=1072, cache=kmalloc-2k

  [   19.311469] allocated by task 96 on cpu 10 at 19.279323s:
  [   19.311562]  __kmem_cache_alloc_node+0x260/0x278
  [   19.311571]  kmalloc_trace+0x24/0x30
  [   19.311580]  pci_alloc_bus+0x24/0xa0
  [   19.311590]  pci_register_host_bridge+0x48/0x4b8
  [   19.311601]  pci_scan_root_bus_bridge+0xc0/0xe8
  [   19.311613]  pci_host_probe+0x18/0xc0
  [   19.311623]  dw_pcie_host_init+0x2c0/0x568
  [   19.311630]  tegra_pcie_dw_probe+0x610/0xb28 [pcie_tegra194]
  [   19.311647]  platform_probe+0x90/0xd8
  ...

  [   19.311782] freed by task 96 on cpu 10 at 19.285833s:
  [   19.311799]  release_pcibus_dev+0x30/0x40
  [   19.311808]  device_release+0x30/0x90
  [   19.311814]  kobject_put+0xa8/0x120
  [   19.311832]  device_unregister+0x20/0x30
  [   19.311839]  pci_remove_bus+0x78/0x88
  [   19.311850]  pci_remove_root_bus+0x5c/0x98
  [   19.311860]  dw_pcie_host_deinit+0x28/0x78
  [   19.311866]  tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194]
  [   19.311883]  tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194]
  [   19.311900]  platform_probe+0x90/0xd8
  ...

  [   19.313579] CPU: 10 PID: 96 Comm: kworker/u24:2 Not tainted 6.2.0 #4
  [   19.320171] Hardware name:  /, BIOS 1.0-d7fb19b 08/10/2022
  [   19.325852] Workqueue: events_unbound deferred_probe_work_func

The stack trace is a bit misleading as dw_pcie_host_deinit() doesn't
directly call pci_bus_release_domain_nr(). The issue turns out to be in
pci_remove_root_bus() which first calls pci_remove_bus() which frees the
struct pci_bus when its struct device is released. Then
pci_bus_release_domain_nr() is called and accesses the freed struct
pci_bus. Reordering these fixes the issue.

Fixes: c14f7ccc9f ("PCI: Assign PCI domain IDs by ida_alloc()")
Link: https://lore.kernel.org/r/20230329123835.2724518-1-robh@kernel.org
Link: https://lore.kernel.org/r/b529cb69-0602-9eed-fc02-2f068707a006@nvidia.com
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org	# v6.2+
Cc: Pali Rohár <pali@kernel.org>
2023-04-06 18:20:59 -05:00
Asahi Lina
1e1d3574e6 drm/scheduler: Fix UAF race in drm_sched_entity_push_job()
After a job is pushed into the queue, it is owned by the scheduler core
and may be freed at any time, so we can't write nor read the submit
timestamp after that point.

Fixes oopses observed with the drm/asahi driver, found with kASAN.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Link: https://lore.kernel.org/r/20230406-scheduler-uaf-2-v1-1-972531cf0a81@asahilina.net
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
2023-04-06 17:30:16 -04:00
Basavaraj Natikar
f195fc1e97 x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
The AMD [1022:15b8] USB controller loses some internal functional MSI-X
context when transitioning from D0 to D3hot. BIOS normally traps D0->D3hot
and D3hot->D0 transitions so it can save and restore that internal context,
but some firmware in the field can't do this because it fails to clear the
AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit.

Clear AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit before USB controller
initialization during boot.

Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u
Link: https://lore.kernel.org/r/20230329172859.699743-1-Basavaraj.Natikar@amd.com
Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Tested-by: Thomas Glanzmann <thomas@glanzmann.de>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org
2023-04-06 16:23:59 -05:00
Reinette Chatre
195d8e5da3 PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()
pci_msix_can_alloc_dyn() is not declared when CONFIG_PCI_MSI is disabled.

There is no existing user of pci_msix_can_alloc_dyn() but work is in
progress to change this. This work encounters the following error when
CONFIG_PCI_MSI is disabled:

  drivers/vfio/pci/vfio_pci_intrs.c:427:21: error: implicit declaration of function 'pci_msix_can_alloc_dyn' [-Werror=implicit-function-declaration]

Provide definition for pci_msix_can_alloc_dyn() in preparation for users
that need to compile when CONFIG_PCI_MSI is disabled.

[bhelgaas: Also reported by Arnd Bergmann <arnd@kernel.org> in
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c; added his Fixes: line]

Fixes: fb0a6a268d ("net/mlx5: Provide external API for allocating vectors")
Fixes: 34026364df ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X")
Link: https://lore.kernel.org/oe-kbuild-all/202303291000.PWFqGCxH-lkp@intel.com/
Link: https://lore.kernel.org/r/310ecc4815dae4174031062f525245f0755c70e2.1680119924.git.reinette.chatre@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org	# v6.2+
2023-04-06 16:23:37 -05:00
Steven Rostedt (Google)
31c6839671 tracing/synthetic: Make lastcmd_mutex static
The lastcmd_mutex is only used in trace_events_synth.c and should be
static.

Link: https://lore.kernel.org/linux-trace-kernel/202304062033.cRStgOuP-lkp@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230406111033.6e26de93@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Fixes: 4ccf11c4e8 ("tracing/synthetic: Fix races on freeing last_cmd")
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-06 15:08:18 -04:00
Linus Torvalds
f2afccfefe Including fixes from wireless and can.
Current release - regressions:
 
  - wifi: mac80211:
    - fix potential null pointer dereference
    - fix receiving mesh packets in forwarding=0 networks
    - fix mesh forwarding
 
 Current release - new code bugs:
 
    - virtio/vsock: fix leaks due to missing skb owner
 
 Previous releases - regressions:
 
   - raw: fix NULL deref in raw_get_next().
 
   - sctp: check send stream number after wait_for_sndbuf
 
   - qrtr:
     - fix a refcount bug in qrtr_recvmsg()
     - do not do DEL_SERVER broadcast after DEL_CLIENT
 
   - wifi: brcmfmac: fix SDIO suspend/resume regression
 
   - wifi: mt76: fix use-after-free in fw features query.
 
   - can: fix race between isotp_sendsmg() and isotp_release()
 
   - eth: mtk_eth_soc: fix remaining throughput regression
 
    -eth: ice: reset FDIR counter in FDIR init stage
 
 Previous releases - always broken:
 
   - core: don't let netpoll invoke NAPI if in xmit context
 
   - icmp: guard against too small mtu
 
   - ipv6: fix an uninit variable access bug in __ip6_make_skb()
 
   - wifi: mac80211: fix the size calculation of ieee80211_ie_len_eht_cap()
 
   - can: fix poll() to not report false EPOLLOUT events
 
   - eth: gve: secure enough bytes in the first TX desc for all TCP pkts
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmQu4qgACgkQMUZtbf5S
 Irv7fA//elLM+YvGQDPgGs3KDZVnb5vnGTEPosc6mCWsYqR6EBxk6sf89yqk31xg
 IYbzOGXqkmi5ozhdjnNaFRGCtb+mBluV3oSPm8pM8d0NcuZta7MPPhduguEfnMS9
 FcI98bxmzSXPIRzG/sCrc/tzedhepcAMlN80PtTzkxSUFlxA7z+vniatVymOZQtt
 MSWPa9gXl1Keon7DBzGvHlZtOK1ptDjti5cp81zw/bA20wArCEm3Zg99Xz2r9rYp
 eAF+KqKoclKieGUbJ7lXQIxWrHrFRznPoMbvW/ofU6JXQFi8KOh0zqJFIi9VnU0D
 EdtZxOgLXuLcjvKj8ijKFdIA5OFqMA65pWs2t2foBR9C0DVle8LztGpyZODf0huT
 agK9ZgM3av6jLzMe8CtJpz31nsWL1s4f3njM1PRucF/jTso72RWUdAx1fBurcnXm
 45MK+uS0aAGch6cFT7mHqUAniGUakR+NPChA7ecn5iMetasinEWRLFxw0eQXEBcM
 kSPFVGXlT4u0a56xN2FoTPnXHb+k08035+cd+bRbTlUXKeMCVYg/k7DiJUr21IWL
 hHWVOzEnzRpDa5gsQ7apct3bcRZnHO/jlWGjkl/g+AGjwaMXae0zDFjajEazsmJ0
 ZKOVsZgIcSCVAdnRLzP2IyKACuiFls6Qc46eARStKRwDjQsEoUU=
 =1AWK
 -----END PGP SIGNATURE-----

Merge tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and can.

  Current release - regressions:

   - wifi: mac80211:
      - fix potential null pointer dereference
      - fix receiving mesh packets in forwarding=0 networks
      - fix mesh forwarding

  Current release - new code bugs:

   - virtio/vsock: fix leaks due to missing skb owner

  Previous releases - regressions:

   - raw: fix NULL deref in raw_get_next().

   - sctp: check send stream number after wait_for_sndbuf

   - qrtr:
      - fix a refcount bug in qrtr_recvmsg()
      - do not do DEL_SERVER broadcast after DEL_CLIENT

   - wifi: brcmfmac: fix SDIO suspend/resume regression

   - wifi: mt76: fix use-after-free in fw features query.

   - can: fix race between isotp_sendsmg() and isotp_release()

   - eth: mtk_eth_soc: fix remaining throughput regression

   - eth: ice: reset FDIR counter in FDIR init stage

  Previous releases - always broken:

   - core: don't let netpoll invoke NAPI if in xmit context

   - icmp: guard against too small mtu

   - ipv6: fix an uninit variable access bug in __ip6_make_skb()

   - wifi: mac80211: fix the size calculation of
     ieee80211_ie_len_eht_cap()

   - can: fix poll() to not report false EPOLLOUT events

   - eth: gve: secure enough bytes in the first TX desc for all TCP
     pkts"

* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net: stmmac: check fwnode for phy device before scanning for phy
  net: stmmac: Add queue reset into stmmac_xdp_open() function
  selftests: net: rps_default_mask.sh: delete veth link specifically
  net: fec: make use of MDIO C45 quirk
  can: isotp: fix race between isotp_sendsmg() and isotp_release()
  can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
  can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
  can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
  gve: Secure enough bytes in the first TX desc for all TCP pkts
  netlink: annotate lockless accesses to nlk->max_recvmsg_len
  ethtool: reset #lanes when lanes is omitted
  ping: Fix potentail NULL deref for /proc/net/icmp.
  raw: Fix NULL deref in raw_get_next().
  ice: Reset FDIR counter in FDIR init stage
  ice: fix wrong fallback logic for FDIR
  net: stmmac: fix up RX flow hash indirection table when setting channels
  net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
  wifi: mt76: ignore key disable commands
  wifi: ath11k: reduce the MHI timeout to 20s
  ipv6: Fix an uninit variable access bug in __ip6_make_skb()
  ...
2023-04-06 11:39:07 -07:00
Linus Torvalds
8f2e1a855b linux-kselftest-fixes-6.3-rc6
This Kselftest fixes update for Linux 6.3-rc6 consists of one single
 fix to mount_setattr_test build failure.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmQu3wYACgkQCwJExA0N
 QxwKhQ//Xnnec3vQgdy4lD0nfhhVOJyRGLtwzcKOY73YJ6BuXjWwasghUVkaw0N8
 qakbpY4U2gVfzf+DgZkWsAOkhY013AHJojppOLvJF0tPkeD7oAfBgotjDZisnzqv
 5XABe7lSMhueCRjcFZDTLKA5MQJ/ifwA1cVpfWAo6aoKfYDHv36AAF3cWc2C831J
 JUGwVnFYJLk6DctpjJOStLzZSfWv07CsFIcX52I5so0zMSfUFkgfPrhDx56rAwUJ
 xs2GH/fYZJ5l2SYAoyhFviRsYd0AZWI2tGN4MY4a0XNkdlIgVh0dNPSMxoO3+sKG
 hXslQ4sTifKQLazTcdx056n+zE4SXwQ4EadzhXlBv2Gs2AcZiEiG1C3JGG0pkLEQ
 cr4BHQIkTRj15X6fUnd3S8AmVCpPzFXaT3Yl5ot1bXinzJzEnT3a/HHrsx9T2d9K
 8KEOANAkxYhedgPC/AoR2edSHCAwFXVRak+TXu/DsfT6J86u+NqZpboWDWMWajyr
 iUIX6i1mkv89JDg8+w1X8MFH2QdL050LitrDVo2tdcBuqcG9roEk/Au1CdO3j2Sl
 OB5JCA8owbofY4ANklXc3c80rJKAExxRMilYSx0jj+4/F8SjVNM40BfVWOoicINU
 DnHIjzoMByHN/lY1n+JbeGdyXZVQLNrp+nhtUxZuLApNsLJoPBk=
 =oGdA
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:
 "One single fix to mount_setattr_test build failure"

* tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests mount: Fix mount_setattr_test builds failed
2023-04-06 11:34:18 -07:00
Linus Torvalds
105b64c838 iommufd for 6.3 rc
Three bugs found by syzkaller:
 
  - An invalid VA range can be be put in a pages and eventually trigger
    WARN_ON, reject it early
 
  - Use of the wrong start index value when doing the complex batch carry
    scheme
 
  - Wrong store ordering resulting in corrupting data used in a later
    calculation that corrupted the batch structure during carry
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZC7G7AAKCRCFwuHvBreF
 YQsjAQDiA56UTfVHwuMWEdZJ7clHbOeZk7xWMLTewVNBxktxhwD/fUVRqeC9uZKT
 TAWvcHUN4f6dzzfBecKLZaSHrft5lws=
 =u8cW
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:

 - An invalid VA range can be be put in a pages and eventually trigger
   WARN_ON, reject it early

 - Use of the wrong start index value when doing the complex batch carry
   scheme

 - Wrong store ordering resulting in corrupting data used in a later
   calculation that corrupted the batch structure during carry

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Do not corrupt the pfn list when doing batch carry
  iommufd: Fix unpinning of pages when an access is present
  iommufd: Check for uptr overflow
2023-04-06 11:27:21 -07:00
Linus Torvalds
ae52f79790 pwm: Fixes for v6.3-rc6
These are some fixes to make sure the PWM state structure is always
 initialized to a known state. Prior to this it could happen in some
 situations that random data from the stack would leak into the data
 structure and cause subtle bugs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEEiOrDCAFJzPfAjcif3SOs138+s6EFAmQuwbMZHHRoaWVycnku
 cmVkaW5nQGdtYWlsLmNvbQAKCRDdI6zXfz6zobXUEACTnBZ6rnHVXvUBkyqeAoFf
 OAf9h4t39zYjN+cAU66nXJcZiBbWBO/r6CY6rsR+WErX4X41Fjb/mSHzkYIqb7ZF
 4kY7HpYRxAew8y7RQQvbLQRG0u8++SJ4YtuJScGrRcKxL1OILqK8S91YzKSIvIh1
 MYoIwHrGG9rSC+uKoJCDkOKfVenhZSiOXNXDvyIrlr6UjFC4QzGcTsAO1x8lFFNA
 w4szK9AXPgJdlSInsfUUiznk6q52Scl62+HSl5yHHijzMqxUm7UN1UNZ+demiG0U
 ujo8hw1nRuwXKpM7mmCRiELBdhVUyi3JLMbNCd+Q0GAYbQaIcEAkckL/zYEqtuDb
 hR2/32wMvXOLWY585L0LKqSJ5pFAkQ0g4PIAgqSRO44cG6r6yQ4NjkQGC4NKCIGS
 z9yyYZcBpWaOA1jHM3NZSWhN13ib3ES66a8ByKeA9PWwYuqqAbIHpIyL9MsFRWzL
 n2uCWLkByWY3NwM9nLRnwdMswtLffXuIbLhlt1x2M9cAVOQPZwlxeIQVbm/A0Low
 hFIRHYvPNfSiKaw7RBZBshQEwX6LCLmfeOUoX8KSF9/YdeBMJYOEe+OVM8h+P9hJ
 eESqDeM8p8hlsfk+gf/QYWu6ub52Cwv0uS1L/gHAsezJtjY1Qf1hnU/Bmv54sGoM
 lR/EjNJFtQiwi6uJDuCqVQ==
 =w7D4
 -----END PGP SIGNATURE-----

Merge tag 'pwm/for-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm fixes from Thierry Reding:
 "These are some fixes to make sure the PWM state structure is always
  initialized to a known state.

  Prior to this it could happen in some situations that random data from
  the stack would leak into the data structure and cause subtle bugs"

* tag 'pwm/for-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: Zero-initialize the pwm_state passed to driver's .get_state()
  pwm: meson: Explicitly set .polarity in .get_state()
  pwm: sprd: Explicitly set .polarity in .get_state()
  pwm: iqs620a: Explicitly set .polarity in .get_state()
  pwm: cros-ec: Explicitly set .polarity in .get_state()
  pwm: hibvt: Explicitly set .polarity in .get_state()
2023-04-06 11:08:03 -07:00
Paolo Bonzini
0bf9601f8e KVM/arm64 fixes for 6.3, part #3
- Ensure the guest PMU context is restored before the first KVM_RUN,
    fixing an issue where EL0 event counting is broken after vCPU
    save/restore
 
  - Actually initialize ID_AA64PFR0_EL1.{CSV2,CSV3} based on the
    sanitized, system-wide values for protected VMs
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZC2ZbwAKCRCivnWIJHzd
 FvovAP9aooqUBBs8w2myh8SXCv7dJOg88r6fKS5vCqMdkY7OhwD9GXA7hWU2dXdy
 X1L8qq6C7R+GtIY/kDm9E2HkNKg7rQc=
 =nfXd
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.3, part #3

 - Ensure the guest PMU context is restored before the first KVM_RUN,
   fixing an issue where EL0 event counting is broken after vCPU
   save/restore

 - Actually initialize ID_AA64PFR0_EL1.{CSV2,CSV3} based on the
   sanitized, system-wide values for protected VMs
2023-04-06 13:34:19 -04:00
Linus Torvalds
ac6c043391 drm-fixes for 6.3-rc6
Mostly i915 fixes: dp mst for compression/dsc, perf ioctl uaf, ctx rpm
 accounting, gt reset vs huc loading.
 
 And a few individual driver fixes: ivpu dma fence&suspend, panfrost
 mmap, nouveau color depth
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmQu+F4ACgkQTA9ye/CY
 qnFKQQ//dnQbe15pnOtoIdgen5+OC8kRe+N3g+CbMY7Yo7YTq3PIRgwlnInOjysl
 H1pFtGn8VOBUzqerPLUWRv0AKi/iLmjUZJ8QSAbW1sMwJZUQJ4KLoAafc8yO/weH
 jRcxU+0soctBqJ4G2sT2EAeXMgtk9dICUScC0CpxSwmGkHD8bF/U6Xln4O2xb4HS
 hI47AjeWcev/A2Fa2PsU3DtbGLqy37IrPPs0gMPlqjLQ84MN9+Op0OEK5fAfozq7
 mX3zya7eLGE8jCYQDgbZ2ePuSh+HeF4e09l3Ax4bFVDp8ZtFM0kpgezSGQ2lTNdV
 yFNoEbgpYG92Lf40NiTJ/7RgqH731AXAdTkFLjvl+62OCmC3OEaBjIiwaZed9Gf2
 Ec/diNhz0IEf7Ud9d2Q5z9/7Zk14zp/RwDlUlFx+RZBM3GLJ1zK3hj6InTK72YAF
 umHfXtXidiak1KB/Rx3GIb9Ii/EqmRgjPoW4oWY1+mtsGvNufLC7HoYIzHdYuzjH
 mip1j/RAJ6hwnI4EzkTFq9xFVVdqGOi7zIB6Vu4TzCjZt8JLOgIRLMbHi7YYwq0v
 Vc0D+FVE0I6EfOiP48kT2pnpzOdcVMdPHaNkAqPYjponeQm1bSISfbOVBA7+9YpW
 OtQA96K4yL22WebAyQkuOns1tD8DUEC6ofDxsg4pUPlOpQyAgSw=
 =kCTh
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
 "Mostly i915 fixes: dp mst for compression/dsc, perf ioctl uaf, ctx rpm
  accounting, gt reset vs huc loading.

  And a few individual driver fixes: ivpu dma fence&suspend, panfrost
  mmap, nouveau color depth"

* tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm:
  accel/ivpu: Fix S3 system suspend when not idle
  accel/ivpu: Add dma fence to command buffers only
  drm/i915: Fix context runtime accounting
  drm/i915: fix race condition UAF in i915_perf_add_config_ioctl
  drm/i915: Use compressed bpp when calculating m/n value for DP MST DSC
  drm/i915/huc: Cancel HuC delayed load timer on reset.
  drm/i915/ttm: fix sparse warning
  drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
  drm/nouveau/disp: Support more modes by checking with lower bpc
2023-04-06 10:25:27 -07:00
Linus Torvalds
2a28a8b365 sound fixes for 6.3-rc6
The majority of changes here are various fixes for Intel drivers,
 while there is a change in ASoC PCM core for the format constraints.
 In addition, a workaround for HD-audio HDMI regressions and usual
 HD-audio quirks are found.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmQuvFgOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE8fCQ//T1IeIlaF1V9HGC31HZZIOUq2yCvNZMlRCRjT
 6R+p+QxHcubyLj01B5RtGt+DmuIm0Wq0g7RuuWLvlskYk941aOnfyPVIR5bCAJNc
 VOm1V0lcUFZItTqQXwk5J4pep9s3GrRPlWqRn2LX8GeIvhOdfj8fiGV91k/YOTKd
 toKAB34IgScX4s1KgW8S7ZG5swOe6/wqus0TIQUp+5CT+y6ciT4ky/bowmwCVpTR
 k0IJlYvVxgXLEX0maVXgOu22vfuXR3RCl7NmSti/6NyFNaI+HjTZS9SsUQSlCZVp
 JdV8KdXZBoo4GV/YHzmonO89F095DzRgsq5PzQEt9DkwaLtMlcwCB3qUwRPjW/A3
 3NA0S69cylhLwxHD9JMfD1NodhBHG1dHMn/qqF519CWGm6rGDOt9w4rGzF3q04VX
 2SsSgniAmraW+7yd0AjCUhfhk9yP2Bn/Wq2vjh+k0ciw++m3yG8ZJsjaomK27Yiz
 5zYPcW4Ms+5X8WVqxnqCgRbGh6AkunUk4e5cadkNQQGpQLN7DZVBTXQ965mIdFNt
 U47mvHHdTxtIeLUIgYajzV09y37KFWCow2OIYccnxRaNviDwr3onB3CbPhmaSwng
 uOQDAZVegwq59dsr3Ay3C+f3CoxZrBOH//SN2KlNc5WzaqEqLxUVMIjgC7dXoeCJ
 75bTss4=
 =ryKW
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "The majority of changes here are various fixes for Intel drivers,
  and there is a change in ASoC PCM core for the format constraints.

  In addition, a workaround for HD-audio HDMI regressions and usual
  HD-audio quirks are found"

* tag 'sound-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/hdmi: Preserve the previous PCM device upon re-enablement
  ALSA: hda/realtek: Add quirk for Clevo X370SNW
  ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
  ASoC: SOF: avoid a NULL dereference with unsupported widgets
  ASoC: da7213.c: add missing pm_runtime_disable()
  ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
  ASoC: codecs: lpass: fix the order or clks turn off during suspend
  ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750
  ASoC: SOF: ipc4: Ensure DSP is in D0I0 during sof_ipc4_set_get_data()
  ASoC: amd: yc: Add DMI entries to support Victus by HP Laptop 16-e1xxx (8A22)
  ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
  ASoC: Intel: soc-acpi: add table for Intel 'Rooks County' NUC M15
  ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15
2023-04-06 10:19:30 -07:00
Linus Torvalds
8dfab5237d platform-drivers-x86 for v6.3-5
Highlights:
  -  more think-lmi fixes
  -  1 DMI quirk addition
 
 The following is an automated git shortlog grouped by driver:
 
 think-lmi:
  -  Clean up display of current_value on Thinkstation
  -  Fix memory leaks when parsing ThinkStation WMI strings
  -  Fix memory leak when showing current settings
 
 thinkpad_acpi:
  -  Add missing T14s Gen1 type to s2idle quirk list
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmQulJoUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yDLAgAh4ES4m+fdH41qnzmceISk2b+wZhh
 z8SACLiTuitAPnUNAYs8yoR/tMYsuRZ7+6zJEaRZjKKjivFPYAamiTcjjprPMx51
 b47hzKUXa45NI1WfN3qIcmjHXSb3HzkEstdGBERYExgQrxoMSkJ3RHm2No32iJjP
 XO136iqheD/suPzFIdcdi3WR+ktCdNuqfHFYcO4SizPPfYr+3fa4TeJIF4E1sMeh
 f/Kx57apsijp5dIlpntNedcAaSWppuaW4WM71hEdZDaEae+m1bcSXq2XZnJ14LZY
 /t1CcWxJFqBIyod/hEJERBxi+51Lx9quaSp6JTdmZd0TkHn1ksvnZ9phkQ==
 =PE0w
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 -  more think-lmi fixes

 -  one DMI quirk addition

* tag 'platform-drivers-x86-v6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: thinkpad_acpi: Add missing T14s Gen1 type to s2idle quirk list
  platform/x86: think-lmi: Clean up display of current_value on Thinkstation
  platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings
  platform/x86: think-lmi: Fix memory leak when showing current settings
2023-04-06 10:13:23 -07:00
Ziwei Dai
5da7cb193d rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with an kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode.  The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.

On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories.  Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed.  And
the kfree_rcu_monitor() function does in fact check for this.

Except that the kfree_rcu_monitor() function checks these pointers one
at a time.  This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately.  This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.

To see this, consider the following sequence of events, in which:

o	Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
	then is preempted.

o	CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
	after a later grace period.  Except that "from_cset" is freed
	right after the previous grace period ended, so that "from_cset"
	is immediately freed.  Task A resumes and references "from_cset"'s
	member, after which nothing good happens.

In full detail:

CPU 0					CPU 1
----------------------			----------------------
count_memcg_event_mm()
|rcu_read_lock()  <---
|mem_cgroup_from_task()
 |// css_set_ptr is the "from_cset" mentioned on CPU 1
 |css_set_ptr = rcu_dereference((task)->cgroups)
 |// Hard irq comes, current task is scheduled out.

					cgroup_attach_task()
					|cgroup_migrate()
					|cgroup_migrate_execute()
					|css_set_move_task(task, from_cset, to_cset, true)
					|cgroup_move_task(task, to_cset)
					|rcu_assign_pointer(.., to_cset)
					|...
					|cgroup_migrate_finish()
					|put_css_set_locked(from_cset)
					|from_cset->refcount return 0
					|kfree_rcu(cset, rcu_head) // free from_cset after new gp
					|add_ptr_to_bulk_krc_lock()
					|schedule_delayed_work(&krcp->monitor_work, ..)

					kfree_rcu_monitor()
					|krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
					|queue_rcu_work(system_wq, &krwp->rcu_work)
					|if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
					|call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp

					// There is a perious call_rcu(.., rcu_work_rcufn)
					// gp end, rcu_work_rcufn() is called.
					rcu_work_rcufn()
					|__queue_work(.., rwork->wq, &rwork->work);

					|kfree_rcu_work()
					|krwp->bulk_head_free[0] bulk is freed before new gp end!!!
					|The "from_cset" is freed before new gp end.

// the task resumes some time later.
 |css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.

This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.

v2: Use helper function instead of inserted code block at kfree_rcu_monitor().

Fixes: 34c8817455 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d6204 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-06 10:04:23 -07:00
Linus Torvalds
fcff5f99ea asm-generic fixes for 6.3
These are minor fixes to address false-positive build warnings:
 
 Some of the less common I/O accessors are missing __force casts and
 cause sparse warnings for their implied byteswap, and a recent change
 to __generic_cmpxchg_local() causes a warning about constant integer
 truncation.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmQufaMACgkQmmx57+YA
 GNmuyA//WBjgOgXPNA7kV3/UcScoW1MEq4Ri8NKJANKyOHWYa1TxIwrHehJkE2Zm
 B0Pr+DmRv3tYav/eytmXm8KMAGdqjVllHdBM4fe2HDjJspvqNKOEX/Z2UMzvNbLN
 uWQneHxFxHK8eZHT+wO4U3062heuBYQ0QQOK0Mk4OaWwsvWz0JVn6dC6uo8z0C4l
 20HAwkyQriB4GaFuEE9iVFYbUfjWGdTdRv9hbL8QpQKMGn+gsG9CgXDNgK+LJ/70
 Q7oJ8qvocjkKAxnbxtXzpb4iKLcnf1VDvwKmCFtvT6GEE/n4Rd00RIF+LKm6J+mC
 vLqAfaDu88mXP/JVRDz/Rpv/lNjGWMd+mR/Y9Rr8jmkA1imJXUKr9cRttJgsDcsT
 8KxJdejakLvHzZKIjdjoE4aOwr5HPcPNi3Kge6DVnmW4r88Ma+lz+aOQueheBsA3
 4mSSNi+c95AWSp0TznUR944RVKlqJ9FwNsXE6BskthhOBTG/4kOsU5nR1z6P6JlP
 De8i5Dd76oYGOXUxf8CAPcqTDligXkx8BBEA+AuLbUyimUBgvFWPqgiBvmLHftrK
 jR2mKjUznkC6A/WzHwUq/uwVz76qjor4aHo+WbvlhAJSSqYPlvZcdbOPU/O+hHhK
 obEhGHH+iKeGaRAd/Rv8fFHIjzq5hzriB8ls2uRiHe50FL/v10s=
 =UzVO
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic fixes from Arnd Bergmann:
 "These are minor fixes to address false-positive build warnings:

  Some of the less common I/O accessors are missing __force casts and
  cause sparse warnings for their implied byteswap, and a recent change
  to __generic_cmpxchg_local() causes a warning about constant integer
  truncation"

* tag 'asm-generic-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: avoid __generic_cmpxchg_local warnings
  asm-generic/io.h: suppress endianness warnings for relaxed accessors
  asm-generic/io.h: suppress endianness warnings for readq() and writeq()
2023-04-06 09:51:04 -07:00
Michael Sit Wei Hong
8fbc10b995 net: stmmac: check fwnode for phy device before scanning for phy
Some DT devices already have phy device configured in the DT/ACPI.
Current implementation scans for a phy unconditionally even though
there is a phy listed in the DT/ACPI and already attached.

We should check the fwnode if there is any phy device listed in
fwnode and decide whether to scan for a phy to attach to.

Fixes: fe2cfbc968 ("net: stmmac: check if MAC needs to attach to a PHY")
Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/lkml/20230403212434.296975-1-martin.blumenstingl@googlemail.com/
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Shahab Vahedi <shahab@synopsys.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Link: https://lore.kernel.org/r/20230406024541.3556305-1-michael.wei.hong.sit@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06 08:10:43 -07:00
Zheng Yejian
2a2d8c51de ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
Syzkaller report a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().

Root cause is 'direct->addr' was changed from 'old_addr' to 'new_addr' but
not restored if error happened on calling ftrace_modify_direct_caller().
Then it can no longer find 'direct' by that 'old_addr'.

To fix it, restore 'direct->addr' to 'old_addr' explicitly in error path.

Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <ast@kernel.org>
Cc: <daniel@iogearbox.net>
Fixes: 8a141dd7f7 ("ftrace: Fix modify_ftrace_direct.")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-06 11:01:30 -04:00
Petr Tesarik
bbb73a103f swiotlb: fix a braino in the alignment check fix
The alignment mask in swiotlb_do_find_slots() masks off the high
bits which are not relevant for the alignment, so multiple
requirements are combined with a bitwise OR rather than AND.
In plain English, the stricter the alignment, the more bits must
be set in iotlb_align_mask.

Confusion may arise from the fact that the same variable is also
used to mask off the offset within a swiotlb slot, which is
achieved with a bitwise AND.

Fixes: 0eee5ae102 ("swiotlb: fix slot alignment checks")
Reported-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/all/CAA42JLa1y9jJ7BgQvXeUYQh-K2mDNHd2BYZ4iZUz33r5zY7oAQ@mail.gmail.com/
Reported-by: Kelsey Steele <kelseysteele@linux.microsoft.com>
Link: https://lore.kernel.org/all/20230405003549.GA21326@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-06 16:45:12 +02:00
Christoph Hellwig
68d99ab0e9 btrfs: fix fast csum implementation detection
The BTRFS_FS_CSUM_IMPL_FAST flag is currently set whenever a non-generic
crc32c is detected, which is the incorrect check if the file system uses
a different checksumming algorithm.  Refactor the code to only check
this if crc32c is actually used.  Note that in an ideal world the
information if an algorithm is hardware accelerated or not should be
provided by the crypto API instead, but that's left for another day.

CC: stable@vger.kernel.org # 5.4.x: c8a5f8ca9a: btrfs: print checksum type and implementation at mount time
CC: stable@vger.kernel.org # 5.4.x
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-06 16:34:13 +02:00
Christoph Hellwig
40fac6472f btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues
Commit d7b9416fe5 ("btrfs: remove btrfs_end_io_wq") converted the read
and I/O handling from btrfs_workqueues to Linux workqueues, and as part
of that lost the code to apply the thread_pool= based max_active limit
on remount.  Restore it.

Fixes: d7b9416fe5 ("btrfs: remove btrfs_end_io_wq")
CC: stable@vger.kernel.org # 6.0+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-06 16:33:08 +02:00
Jens Axboe
5b3b9197c2 nvme fixes for Linux 6.3
- fix discard support without oncs (Keith Busch)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmQuzzQLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPRDw//alkZxpAfdZCwyGEGU1oCarqQaf90DrkGc5frusZu
 6Xt30CFVAl8D/XupXbPiEnliVoTtO6amhT/ZtUS5cmGFy2A4Cgkc7wImCyq9DaCr
 cgErFWHH0VORD0tmhzjuJqbVRgfk4qY8mIvSXtt10U4Ppu2Dg9GO4+hlVCPRytAP
 OjF6YLkzm4ddsPPZTmRJPr7lgUraWQmu2WsEt5fbI8To78StUphcFpS5VHh50Gxy
 QOZQLA69w7pvS9dj9JfELBG7uzcXgWuBPtOW5VeGCbYCIhh78H7ZWuGhNIpj0nZY
 bsxBzI7mjeKwBvtIkZ8BSHHBnbGu5oJn+8XvqYk5w77MyQSjsKUGA0LIfdKtcM1R
 lirnF8QeXbCtM0gXwCj7FamwLw/wLZoTbMQbFk5mgh5WzI26QGj8pra53VXB2sAz
 vVNHY0wIouXCFrX0M9EddUlXPNwxkjkDwlyXjz6y7hXHcLwNh0EH0mPLvOb4Czb0
 yxqxqImUs0qQYqNmWkbnRmIDOgak4MJEOyN7WDLP33z9REl49MGq7byyw6/hELYl
 e4f5RfjtpUfNHb0nYRCHs7NVOFN20WTtYmvRtIn8480c4zOeNdM5lHgSlu+cqfYW
 xqeUHZUduPUimqVUZv9kx+hOcDl2lPHmqSZLLlVA2156kUlEJzj/BsF4t9EIMpjx
 sTY=
 =Pk/K
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.3-2023-04-06' of git://git.infradead.org/nvme into block-6.3

Pull NVMe fix from Christoph:

"nvme fixes for Linux 6.3

 - fix discard support without oncs (Keith Busch)"

* tag 'nvme-6.3-2023-04-06' of git://git.infradead.org/nvme:
  nvme: fix discard support without oncs
2023-04-06 08:12:19 -06:00
Ming Lei
1d1665279a block: ublk: make sure that block size is set correctly
block size is one very key setting for block layer, and bad block size
could panic kernel easily.

Make sure that block size is set correctly.

Meantime if ublk_validate_params() fails, clear ub->params so that disk
is prevented from being added.

Fixes: 71f28f3136 ("ublk_drv: add io_uring based userspace block driver")
Reported-and-tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-06 08:12:08 -06:00
Jens Axboe
8c68ae3b22 ublk: read any SQE values upfront
Since SQE memory is shared with userspace, we should only be reading it
once. We cannot read it multiple times, particularly when it's read once
for validation and then read again for the actual use.

ublk_ch_uring_cmd() is safe when called as a retry operation, as the
memory backing is stable at that point. But for normal issue, we want
to ensure that we only read ublksrv_io_cmd once. Wrap the function in
a helper that reads the value into an on-stack copy of the struct.

Cc: stable@vger.kernel.org # 6.0+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-05 20:59:17 -06:00
Song Yoong Siang
24e3fce00c net: stmmac: Add queue reset into stmmac_xdp_open() function
Queue reset was moved out from __init_dma_rx_desc_rings() and
__init_dma_tx_desc_rings() functions. Thus, the driver fails to transmit
and receive packet after XDP prog setup.

This commit adds the missing queue reset into stmmac_xdp_open() function.

Fixes: f9ec5723c3 ("net: ethernet: stmicro: stmmac: move queue reset to dedicated functions")
Cc: <stable@vger.kernel.org> # 6.0+
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230404044823.3226144-1-yoong.siang.song@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-05 19:02:56 -07:00
Hangbin Liu
38e058cc7d selftests: net: rps_default_mask.sh: delete veth link specifically
When deleting the netns and recreating a new one while re-adding the
veth interface, there is a small window of time during which the old
veth interface has not yet been removed. This can cause the new addition
to fail. To resolve this issue, we can either wait for a short while to
ensure that the old veth interface is deleted, or we can specifically
remove the veth interface.

Before this patch:
  # ./rps_default_mask.sh
  empty rps_default_mask                                      [ ok ]
  changing rps_default_mask dont affect existing devices      [ ok ]
  changing rps_default_mask dont affect existing netns        [ ok ]
  changing rps_default_mask affect newly created devices      [ ok ]
  changing rps_default_mask don't affect newly child netns[II][ ok ]
  rps_default_mask is 0 by default in child netns             [ ok ]
  RTNETLINK answers: File exists
  changing rps_default_mask in child ns don't affect the main one[ ok ]
  cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory
  changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected
  [fail] expected 1 found
  changing rps_default_mask in child ns don't affect existing devices[ ok ]

After this patch:
  # ./rps_default_mask.sh
  empty rps_default_mask                                      [ ok ]
  changing rps_default_mask dont affect existing devices      [ ok ]
  changing rps_default_mask dont affect existing netns        [ ok ]
  changing rps_default_mask affect newly created devices      [ ok ]
  changing rps_default_mask don't affect newly child netns[II][ ok ]
  rps_default_mask is 0 by default in child netns             [ ok ]
  changing rps_default_mask in child ns don't affect the main one[ ok ]
  changing rps_default_mask in child ns affects new childns devices[ ok ]
  changing rps_default_mask in child ns don't affect existing devices[ ok ]

Fixes: 3a7d84eae0 ("self-tests: more rps self tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-05 18:59:32 -07:00
Greg Ungerer
abc33494dd net: fec: make use of MDIO C45 quirk
Not all fec MDIO bus drivers support C45 mode transactions. The older fec
hardware block in many ColdFire SoCs does not appear to support them, at
least according to most of the different ColdFire SoC reference manuals.
The bits used to generate C45 access on the iMX parts, in the OP field
of the MMFR register, are documented as generating non-compliant MII
frames (it is not documented as to exactly how they are non-compliant).

Commit 8d03ad1ab0 ("net: fec: Separate C22 and C45 transactions")
means the fec driver will always register c45 MDIO read and write
methods. During probe these will always be accessed now generating
non-compliant MII accesses on ColdFire based devices.

Add a quirk define, FEC_QUIRK_HAS_MDIO_C45, that can be used to
distinguish silicon that supports MDIO C45 framing or not. Add this to
all the existing iMX quirks, so they will be behave as they do now (*).

(*) it seems that some iMX parts may not support C45 transactions either.
    The iMX25 and iMX50 Reference Manuals contain similar wording to
    the ColdFire Reference Manuals on this.

Fixes: 8d03ad1ab0 ("net: fec: Separate C22 and C45 transactions")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230404052207.3064861-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-05 18:58:30 -07:00
Peng Zhang
c45ea315a6 maple_tree: fix a potential concurrency bug in RCU mode
There is a concurrency bug that may cause the wrong value to be loaded
when a CPU is modifying the maple tree.

CPU1:
mtree_insert_range()
  mas_insert()
    mas_store_root()
      ...
      mas_root_expand()
        ...
        rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
        ma_set_meta(node, maple_leaf_64, 0, slot);    <---IP

CPU2:
mtree_load()
  mtree_lookup_walk()
    ma_data_end();

When CPU1 is about to execute the instruction pointed to by IP, the
ma_data_end() executed by CPU2 may return the wrong end position, which
will cause the value loaded by mtree_load() to be wrong.

An example of triggering the bug:

Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().

static DEFINE_MTREE(tree);
int work(void *p) {
	unsigned long val;
	for (int i = 0 ; i< 30; ++i) {
		val = (unsigned long)mtree_load(&tree, 8);
		mdelay(5);
		pr_info("%lu",val);
	}
	return 0;
}

mt_init_flags(&tree, MT_FLAGS_USE_RCU);
mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
run_thread(work)
mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);

In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL.  Fix
it by put ma_set_meta() before rcu_assign_pointer().

Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:25 -07:00
Peng Zhang
ec07967d75 maple_tree: fix get wrong data_end in mtree_lookup_walk()
if (likely(offset > end))
	max = pivots[offset];

The above code should be changed to if (likely(offset < end)), which is
correct.  This affects the correctness of ma_data_end().  Now it seems
that the final result will not be wrong, but it is best to change it. 
This patch does not change the code as above, because it simplifies the
code by the way.

Link: https://lkml.kernel.org/r/20230314124203.91572-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20230314124203.91572-2-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:25 -07:00
Rongwei Wang
6fe7d6b992 mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
The si->lock must be held when deleting the si from the available list. 
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption.  The only place we have found where this
happens is in the swapoff path.  This case can be described as below:

core 0                       core 1
swapoff

del_from_avail_list(si)      waiting

try lock si->lock            acquire swap_avail_lock
                             and re-add si into
                             swap_avail_head

acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.

It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g.  stress-ng-swap).

However, in the worst case, panic can be caused by the above scene.  In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap.  This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path. 
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)

------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 plist_check_prev_next_node+0x50/0x70
 plist_check_head+0x80/0xf0
 plist_add+0x28/0x140
 add_to_avail_list+0x9c/0xf0
 _enable_swap_info+0x78/0xb4
 __do_sys_swapon+0x918/0xa10
 __arm64_sys_swapon+0x20/0x30
 el0_svc_common+0x8c/0x220
 do_el0_svc+0x2c/0x90
 el0_svc+0x1c/0x30
 el0_sync_handler+0xa8/0xb0
 el0_sync+0x148/0x180
irq event stamp: 2082270

Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.

This problem exists in versions after stable 5.10.y.

Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bf ("swap: choose swap device according to numa node") 
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:24 -07:00
Ryusuke Konishi
42560f9c92 nilfs2: fix sysfs interface lifetime
The current nilfs2 sysfs support has issues with the timing of creation
and deletion of sysfs entries, potentially leading to null pointer
dereferences, use-after-free, and lockdep warnings.

Some of the sysfs attributes for nilfs2 per-filesystem instance refer to
metadata file "cpfile", "sufile", or "dat", but
nilfs_sysfs_create_device_group that creates those attributes is executed
before the inodes for these metadata files are loaded, and
nilfs_sysfs_delete_device_group which deletes these sysfs entries is
called after releasing their metadata file inodes.

Therefore, access to some of these sysfs attributes may occur outside of
the lifetime of these metadata files, resulting in inode NULL pointer
dereferences or use-after-free.

In addition, the call to nilfs_sysfs_create_device_group() is made during
the locking period of the semaphore "ns_sem" of nilfs object, so the
shrinker call caused by the memory allocation for the sysfs entries, may
derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in
nilfs_evict_inode()".

Since nilfs2 may acquire "ns_sem" deep in the call stack holding other
locks via its error handler __nilfs_error(), this causes lockdep to report
circular locking.  This is a false positive and no circular locking
actually occurs as no inodes exist yet when
nilfs_sysfs_create_device_group() is called.  Fortunately, the lockdep
warnings can be resolved by simply moving the call to
nilfs_sysfs_create_device_group() out of "ns_sem".

This fixes these sysfs issues by revising where the device's sysfs
interface is created/deleted and keeping its lifetime within the lifetime
of the metadata files above.

Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com
Fixes: dd70edbde2 ("nilfs2: integrate sysfs support into driver")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com
  Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com
Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com
  Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:24 -07:00
Alistair Popple
7c7b962938 mm: take a page reference when removing device exclusive entries
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device.  Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access.  When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.

The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present.  However the fault
handling code does not hold the PTL when taking the page lock.  This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.

This can lead to threads locking or waiting on a folio with a zero
refcount.  Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration.  This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.

Fix this by trying to take a reference on the folio before locking it. 
The code already checks the PTE under the PTL and aborts if the entry is
no longer there.  It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio. 
This case is also detected by the PTE check and the folio is unlocked
without further changes.

Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7 ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:24 -07:00
Yafang Shao
f349b15e18 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
There're some suspicious warn_alloc on my test serer, for example,

[13366.518837] warn_alloc: 81 callbacks suppressed
[13366.518841] test_verifier: vmalloc error: size 4096, page order 0, failed to allocate pages, mode:0x500dc2(GFP_HIGHUSER|__GFP_ZERO|__GFP_ACCOUNT), nodemask=(null),cpuset=/,mems_allowed=0-1
[13366.522240] CPU: 30 PID: 722463 Comm: test_verifier Kdump: loaded Tainted: G        W  O       6.2.0+ #638
[13366.524216] Call Trace:
[13366.524702]  <TASK>
[13366.525148]  dump_stack_lvl+0x6c/0x80
[13366.525712]  dump_stack+0x10/0x20
[13366.526239]  warn_alloc+0x119/0x190
[13366.526783]  ? alloc_pages_bulk_array_mempolicy+0x9e/0x2a0
[13366.527470]  __vmalloc_area_node+0x546/0x5b0
[13366.528066]  __vmalloc_node_range+0xc2/0x210
[13366.528660]  __vmalloc_node+0x42/0x50
[13366.529186]  ? bpf_prog_realloc+0x53/0xc0
[13366.529743]  __vmalloc+0x1e/0x30
[13366.530235]  bpf_prog_realloc+0x53/0xc0
[13366.530771]  bpf_patch_insn_single+0x80/0x1b0
[13366.531351]  bpf_jit_blind_constants+0xe9/0x1c0
[13366.531932]  ? __free_pages+0xee/0x100
[13366.532457]  ? free_large_kmalloc+0x58/0xb0
[13366.533002]  bpf_int_jit_compile+0x8c/0x5e0
[13366.533546]  bpf_prog_select_runtime+0xb4/0x100
[13366.534108]  bpf_prog_load+0x6b1/0xa50
[13366.534610]  ? perf_event_task_tick+0x96/0xb0
[13366.535151]  ? security_capable+0x3a/0x60
[13366.535663]  __sys_bpf+0xb38/0x2190
[13366.536120]  ? kvm_clock_get_cycles+0x9/0x10
[13366.536643]  __x64_sys_bpf+0x1c/0x30
[13366.537094]  do_syscall_64+0x38/0x90
[13366.537554]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[13366.538107] RIP: 0033:0x7f78310f8e29
[13366.538561] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48
[13366.540286] RSP: 002b:00007ffe2a61fff8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[13366.541031] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f78310f8e29
[13366.541749] RDX: 0000000000000080 RSI: 00007ffe2a6200b0 RDI: 0000000000000005
[13366.542470] RBP: 00007ffe2a620010 R08: 00007ffe2a6202a0 R09: 00007ffe2a6200b0
[13366.543183] R10: 00000000000f423e R11: 0000000000000206 R12: 0000000000407800
[13366.543900] R13: 00007ffe2a620540 R14: 0000000000000000 R15: 0000000000000000
[13366.544623]  </TASK>
[13366.545260] Mem-Info:
[13366.546121] active_anon:81319 inactive_anon:20733 isolated_anon:0
 active_file:69450 inactive_file:5624 isolated_file:0
 unevictable:0 dirty:10 writeback:0
 slab_reclaimable:69649 slab_unreclaimable:48930
 mapped:27400 shmem:12868 pagetables:4929
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:15870308 free_pcp:142935 free_cma:0
[13366.551886] Node 0 active_anon:224836kB inactive_anon:33528kB active_file:175692kB inactive_file:13752kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:59248kB dirty:32kB writeback:0kB shmem:18252kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4616kB pagetables:10664kB sec_pagetables:0kB all_unreclaimable? no
[13366.555184] Node 1 active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:50352kB dirty:8kB writeback:0kB shmem:33220kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:3896kB pagetables:9052kB sec_pagetables:0kB all_unreclaimable? no
[13366.558262] Node 0 DMA free:15360kB boost:0kB min:304kB low:380kB high:456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[13366.560821] lowmem_reserve[]: 0 2735 31873 31873 31873
[13366.561981] Node 0 DMA32 free:2790904kB boost:0kB min:56028kB low:70032kB high:84036kB reserved_highatomic:0KB active_anon:1936kB inactive_anon:20kB active_file:396kB inactive_file:344kB unevictable:0kB writepending:0kB present:3129200kB managed:2801520kB mlocked:0kB bounce:0kB free_pcp:5188kB local_pcp:0kB free_cma:0kB
[13366.565148] lowmem_reserve[]: 0 0 29137 29137 29137
[13366.566168] Node 0 Normal free:28533824kB boost:0kB min:596740kB low:745924kB high:895108kB reserved_highatomic:28672KB active_anon:222900kB inactive_anon:33508kB active_file:175296kB inactive_file:13408kB unevictable:0kB writepending:32kB present:30408704kB managed:29837172kB mlocked:0kB bounce:0kB free_pcp:295724kB local_pcp:0kB free_cma:0kB
[13366.569485] lowmem_reserve[]: 0 0 0 0 0
[13366.570416] Node 1 Normal free:32141144kB boost:0kB min:660504kB low:825628kB high:990752kB reserved_highatomic:69632KB active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB writepending:8kB present:33554432kB managed:33025372kB mlocked:0kB bounce:0kB free_pcp:270880kB local_pcp:46860kB free_cma:0kB
[13366.573403] lowmem_reserve[]: 0 0 0 0 0
[13366.574015] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
[13366.575474] Node 0 DMA32: 782*4kB (UME) 756*8kB (UME) 736*16kB (UME) 745*32kB (UME) 694*64kB (UME) 653*128kB (UME) 595*256kB (UME) 552*512kB (UME) 454*1024kB (UME) 347*2048kB (UME) 246*4096kB (UME) = 2790904kB
[13366.577442] Node 0 Normal: 33856*4kB (UMEH) 51815*8kB (UMEH) 42418*16kB (UMEH) 36272*32kB (UMEH) 22195*64kB (UMEH) 10296*128kB (UMEH) 7238*256kB (UMEH) 5638*512kB (UEH) 5337*1024kB (UMEH) 3506*2048kB (UMEH) 1470*4096kB (UME) = 28533784kB
[13366.580460] Node 1 Normal: 15776*4kB (UMEH) 37485*8kB (UMEH) 29509*16kB (UMEH) 21420*32kB (UMEH) 14818*64kB (UMEH) 13051*128kB (UMEH) 9918*256kB (UMEH) 7374*512kB (UMEH) 5397*1024kB (UMEH) 3887*2048kB (UMEH) 2002*4096kB (UME) = 32141240kB
[13366.583027] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[13366.584380] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[13366.585702] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[13366.587042] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[13366.588372] 87386 total pagecache pages
[13366.589266] 0 pages in swap cache
[13366.590327] Free swap  = 0kB
[13366.591227] Total swap = 0kB
[13366.592142] 16777082 pages RAM
[13366.593057] 0 pages HighMem/MovableOnly
[13366.594037] 357226 pages reserved
[13366.594979] 0 pages hwpoisoned

This failure really confuse me as there're still lots of available pages. 
Finally I figured out it was caused by a fatal signal.  When a process is
allocating memory via vm_area_alloc_pages(), it will break directly even
if it hasn't allocated the requested pages when it receives a fatal
signal.  In that case, we shouldn't show this warn_alloc, as it is
useless.  We only need to show this warning when there're really no enough
pages.

Link: https://lkml.kernel.org/r/20230330162625.13604-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:24 -07:00
Tetsuo Handa
7397031622 nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs_btree_assign_p() and nilfs_direct_assign_p() are not initializing
"struct nilfs_binfo_dat"->bi_pad field, causing uninit-value reports when
being passed to CRC function.

Link: https://lkml.kernel.org/r/20230326152146.15872-1-konishi.ryusuke@gmail.com
Reported-by: syzbot <syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com>
  Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b
Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
  Link: https://lkml.kernel.org/r/CANX2M5bVbzRi6zH3PTcNE_31TzerstOXUa9Bay4E6y6dX23_pg@mail.gmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:23 -07:00
Ryusuke Konishi
6be49d100c nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
The finalization of nilfs_segctor_thread() can race with
nilfs_segctor_kill_thread() which terminates that thread, potentially
causing a use-after-free BUG as KASAN detected.

At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member
of "struct nilfs_sc_info" to indicate the thread has finished, and then
notifies nilfs_segctor_kill_thread() of this using waitqueue
"sc_wait_task" on the struct nilfs_sc_info.

However, here, immediately after the NULL assignment to "sc_task", it is
possible that nilfs_segctor_kill_thread() will detect it and return to
continue the deallocation, freeing the nilfs_sc_info structure before the
thread does the notification.

This fixes the issue by protecting the NULL assignment to "sc_task" and
its notification, with spinlock "sc_state_lock" of the struct
nilfs_sc_info.  Since nilfs_segctor_kill_thread() does a final check to
see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate
the race.

Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:23 -07:00
Sergey Senozhatsky
618a8a917d zsmalloc: document freeable stats
When freeable class stat was added to classes file (back in 2016) we
forgot to update zsmalloc documentation.  Fix that.

Link: https://lkml.kernel.org/r/20230325024631.2817153-3-senozhatsky@chromium.org
Fixes: 1120ed5483 ("mm/zsmalloc: add `freeable' column to pool stat")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:23 -07:00
Sergey Senozhatsky
119b57eaf0 zsmalloc: document new fullness grouping
Patch series "zsmalloc: minor documentation updates".

Two minor patches that bring zsmalloc documentation up to date.


This patch (of 2):

Update documentation and reflect new zspages fullness grouping (we don't
use almost_empty and almost_full anymore).

Link: https://lkml.kernel.org/r/20230325024631.2817153-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20230325024631.2817153-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Fixes: 67e157eb3639 ("zsmalloc: show per fullness group class stats")
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:23 -07:00
Shiyang Ruan
f76b3a3287 fsdax: force clear dirty mark if CoW
XFS allows CoW on non-shared extents to combat fragmentation[1].  The old
non-shared extent could be mwrited before, its dax entry is marked dirty. 

This results in a WARNing:

[   28.512349] ------------[ cut here ]------------
[   28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390
[   28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables
[   28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117
[   28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014
[   28.516307] RIP: 0010:dax_insert_entry+0x342/0x390
[   28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48
[   28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086
[   28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b
[   28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff
[   28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8
[   28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155
[   28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8
[   28.519703] FS:  00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000
[   28.520148] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0
[   28.520863] PKRU: 55555554
[   28.521043] Call Trace:
[   28.521219]  <TASK>
[   28.521368]  dax_fault_iter+0x196/0x390
[   28.521595]  dax_iomap_pte_fault+0x19b/0x3d0
[   28.521852]  __xfs_filemap_fault+0x234/0x2b0
[   28.522116]  __do_fault+0x30/0x130
[   28.522334]  do_fault+0x193/0x340
[   28.522586]  __handle_mm_fault+0x2d3/0x690
[   28.522975]  handle_mm_fault+0xe6/0x2c0
[   28.523259]  do_user_addr_fault+0x1bc/0x6f0
[   28.523521]  exc_page_fault+0x60/0x140
[   28.523763]  asm_exc_page_fault+0x22/0x30
[   28.524001] RIP: 0033:0x7f14a0b589ca
[   28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
[   28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202
[   28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046
[   28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000
[   28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000
[   28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c
[   28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0
[   28.527449]  </TASK>
[   28.527600] ---[ end trace 0000000000000000 ]---


To be able to delete this entry, clear its dirty mark before
invalidate_inode_pages2_range().

[1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/

Link: https://lkml.kernel.org/r/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: f80e166888 ("fsdax: invalidate pages when CoW")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:23 -07:00
Peter Xu
60d5b473d6 mm/hugetlb: fix uffd wr-protection for CoW optimization path
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
writable even with uffd-wp bit set.  It only happens with hugetlb private
mappings, when someone firstly wr-protects a missing pte (which will
install a pte marker), then a write to the same page without any prior
access to the page.

Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
reaching hugetlb_wp() to avoid taking more locks that userfault won't
need.  However there's one CoW optimization path that can trigger
hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.

This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
is detected.  The new path will only trigger in the CoW optimization path
because generic hugetlb_fault() (e.g.  when a present pte was
wr-protected) will resolve the uffd-wp bit already.  Also make sure
anonymous UNSHARE won't be affected and can still be resolved, IOW only
skip CoW not CoR.

This patch will be needed for v5.19+ hence copy stable.

[peterx@redhat.com: v2]
  Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n
[peterx@redhat.com: v3]
  Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com
Fixes: 166f3ecc0d ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:22 -07:00
Liam R. Howlett
3dd4432549 mm: enable maple tree RCU mode by default
Use the maple tree in RCU mode for VMA tracking.

The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock.  This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.

It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs.  syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space.  Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.

Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.

[Liam.Howlett@Oracle.com: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Fixes: d4af56c5c7 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+8d95422d3537159ca390@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:22 -07:00
Liam R. Howlett
790e1fa86b maple_tree: add RCU lock checking to rcu callback functions
Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain.  Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.

Also stop creating a new lock to use for dereferencing during destruction
of the tree or subtree.  Instead, pass through a pointer to the tree that
has the lock that is held for RCU dereferencing checking.  It also does
not make sense to use the maple state in the freeing scenario as the tree
walk is a special case where the tree no longer has the normal encodings
and parent pointers.

Link: https://lkml.kernel.org/r/20230227173632.3292573-8-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:22 -07:00
Liam R. Howlett
0a2b18d948 maple_tree: add smp_rmb() to dead node detection
Add an smp_rmb() before reading the parent pointer to ensure that anything
read from the node prior to the parent pointer hasn't been reordered ahead
of this check.

The is necessary for RCU mode.

Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:22 -07:00
Liam R. Howlett
c13af03de4 maple_tree: fix write memory barrier of nodes once dead for RCU mode
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes.  To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.

There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead.  Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.

Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.

This is necessary for the RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:21 -07:00
Liam Howlett
8372f4d83f maple_tree: remove extra smp_wmb() from mas_dead_leaves()
The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed.  This is an optimization for the RCU mode
of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:21 -07:00
Liam Howlett
2e5b4921f8 maple_tree: fix freeing of nodes in rcu mode
The walk to destroy the nodes was not always setting the node type and
would result in a destroy method potentially using the values as nodes. 
Avoid this by setting the correct node types.  This is necessary for the
RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-4-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:21 -07:00
Liam Howlett
a7b92d59c8 maple_tree: detect dead nodes in mas_start()
When initially starting a search, the root node may already be in the
process of being replaced in RCU mode.  Detect and restart the walk if
this is the case.  This is necessary for RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:21 -07:00
Liam Howlett
39d0bd86c4 maple_tree: be more cautious about dead nodes
Patch series "Fix VMA tree modification under mmap read lock".

Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an
inconsistency between threads walking the VMA maple tree.  The
inconsistency is caused by the page fault handler modifying the maple tree
while holding the mmap_lock for read.

This only happens for stack VMAs.  We had thought this was safe as it only
modifies a single pivot in the tree.  Unfortunately, syzbot constructed a
test case where the stack had no guard page and grew the stack to abut the
next VMA.  This causes us to delete the NULL entry between the two VMAs
and rewrite the node.

We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node.  We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node. 
Since the problem exists in 6.1, we preferred a solution which is similar
to the one we intended to merge next merge window.

These patches have been in -next since next-20230301, and have received
intensive testing in Android as part of the RCU page fault patchset.  They
were also sent as part of the "Per-VMA locks" v4 patch series.  Patches 1
to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode
for the tree.

Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through
mmtests showed there was a 15-20% performance decrease in
will-it-scale/brk1-processes.  This tests creating and inserting a single
VMA repeatedly through the brk interface and isn't representative of any
real world applications.


This patch (of 8):

ma_pivots() and ma_data_end() may be called with a dead node.  Ensure to
that the node isn't dead before using the returned values.

This is necessary for RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: freak07 <michalechner92@googlemail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:20 -07:00
Jakub Kicinski
cbeb1c1b68 wireless fixes for v6.3
mt76 has a fix for leaking cleartext frames on a certain scenario and
 two firmware file handling related fixes. For brcmfmac we have a fix
 for an older SDIO suspend regression and for ath11k avoiding a kernel
 crash during hibernation with SUSE kernels.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmQtUzQRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZuz/wgAlpzStpfnJkVUJRRjCb0yWDPKz+NYYeLh
 wQ3Ad+T/KUiav870DlFStRcaZTUi6JyIk61cEnIk6ZB36ivCML6KJ/7GAZFIcI2l
 lq/dY8zG2GX0kkSufw24P+6tD3LQguejMv0YHYVUEEXTo4b16dyE5nja2acaOjQI
 LDQY6Gq5A/TdMyq9VPLWpPjBJkwEWjexyzJu57A4hExgGZviMAyVWNlBdNOkRn2E
 sMt/4ibgchVTxnmHfvS8d+gmEuxiMLi7PHn17gY80kHh2seCc9ODg6EbVxvIi18b
 gUj6QXfD9dSJ+ghhV3M7oH+6GLSXDf/c+f1Xy8gupupoQiDxJ+T8UQ==
 =mdQP
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.3

mt76 has a fix for leaking cleartext frames on a certain scenario and
two firmware file handling related fixes. For brcmfmac we have a fix
for an older SDIO suspend regression and for ath11k avoiding a kernel
crash during hibernation with SUSE kernels.

* tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mt76: ignore key disable commands
  wifi: ath11k: reduce the MHI timeout to 20s
  wifi: mt76: mt7921: fix fw used for offload check for mt7922
  wifi: mt76: mt7921: Fix use-after-free in fw features query.
  wifi: brcmfmac: Fix SDIO suspend/resume regression
====================

Link: https://lore.kernel.org/r/20230405105536.4E946C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-05 17:24:26 -07:00