Commit Graph

967925 Commits

Author SHA1 Message Date
Lang Cheng
f93c39bc95 RDMA/hns: Add support for QP stash
Stash is a mechanism that uses the core information carried by the ARM AXI
bus to access the L3 cache. It can be used to improve the performance by
increasing the hit ratio of L3 cache. QPs need to enable stash by default.

Link: https://lore.kernel.org/r/1606374251-21512-3-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27 12:53:59 -04:00
Lang Cheng
bfefae9f10 RDMA/hns: Add support for CQ stash
Stash is a mechanism that uses the core information carried by the ARM AXI
bus to access the L3 cache. It can be used to improve the performance by
increasing the hit ratio of L3 cache. CQs need to enable stash by default.

Link: https://lore.kernel.org/r/1606374251-21512-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27 12:53:59 -04:00
Yangyang Li
71586dd200 RDMA/hns: Create QP with selected QPN for bank load balance
In order to improve performance by balancing the load between different
banks of cache, the QPC cache is desigend to choose one of 8 banks
according to lower 3 bits of QPN. The hns driver needs to count the number
of QP on each bank and then assigns the QP being created to the bank with
the minimum load first.

Link: https://lore.kernel.org/r/1606220649-1465-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27 12:44:26 -04:00
Leon Romanovsky
66f57b871e RDMA/restrack: Support all QP types
The latest changes in restrack name handling allowed to simplify the QP
creation code to support all types of QPs.

For example XRC QP are presented with rdmatool.

$ ibv_xsrq_pingpong &
$ rdma res show qp
link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 7 type UD state RTS sq-psn 0 comm [mlx5_ib]
link ibp0s9/1 lqpn 42 type XRC_TGT state INIT sq-psn 0 path-mig-state MIGRATED comm [ib_uverbs]
link ibp0s9/1 lqpn 43 type XRC_INI state INIT sq-psn 0 path-mig-state MIGRATED pdn 197 pid 419 comm ibv_xsrq_pingpong

Link: https://lore.kernel.org/r/20201117070148.1974114-4-leon@kernel.org
Reviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27 11:38:46 -04:00
Leon Romanovsky
2b1f747071 RDMA/core: Allow drivers to disable restrack DB
Driver QP types are special case with no IBTA restrictions. For example,
EFA implemented creation of this QP type as regular one, while mlx5
separated create to two step: create and modify. That separation causes to
the situation where DC QP (mlx5) is always added to the same xarray index
zero.

This change allows to drivers like mlx5 simply disable restrack DB
tracking, but it doesn't disable kref on the memory.

Fixes: 52e0a118a2 ("RDMA/restrack: Track driver QP types in resource tracker")
Link: https://lore.kernel.org/r/20201117070148.1974114-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27 11:38:46 -04:00
Leon Romanovsky
b47a98efa9 RDMA/core: Track device memory MRs
Device memory (DM) are registered as MR during initialization flow, these
MRs were not tracked by resource tracker and had res->valid set as a
false. Update the code to manage them too.

Before this change:
$ ibv_rc_pingpong -j &
$ rdma res show mr <-- shows nothing

After this change:
$ ibv_rc_pingpong -j &
$ rdma res show mr
dev ibp0s9 mrn 0 mrlen 4096 pdn 3 pid 734 comm ibv_rc_pingpong

Fixes: be934cca9e ("IB/uverbs: Add device memory registration ioctl support")
Link: https://lore.kernel.org/r/20201117070148.1974114-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27 11:38:46 -04:00
Parav Pandit
7ec3df174f RDMA/mlx5: Use PCI device for dma mappings
DMA operation of the IB device is done using ib_device->dma_device.

Instead of accessing parent of the IB device, use the PCI dma device which
is setup to ib_device->dma_device during IB device registration.

Link: https://lore.kernel.org/r/20201125064628.8431-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:05 -04:00
Leon Romanovsky
d4b2d19dc5 RDMA/mlx5: Silence the overflow warning while building offset mask
Coverity reports "Potentially overflowing expression ..." warning, which
is correct thing to complain from the compiler point of view, but this is
not possible in the current code. Still, this is a small error as there
are some future situations that might need to use a 32 bit offset. Use ULL
so the calculation works up to 63.

Fixes: b045db62f6 ("RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ")
Link: https://lore.kernel.org/r/20201125061704.6580-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:05 -04:00
Jason Gunthorpe
d0b7721c5e RDMA/mlx5: Check for ERR_PTR from uverbs_zalloc()
The return code from uverbs_zalloc() was wrongly checked, it is ERR_PTR
not NULL like other allocators:

drivers/infiniband/hw/mlx5/devx.c:2110 devx_umem_reg_cmd_alloc() warn: passing zero to 'PTR_ERR'

Fixes: 878f7b31c3 ("RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx")
Link: https://lore.kernel.org/r/0-v1-4d05ccc1c223+173-devx_err_ptr_jgg@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:04 -04:00
Weihang Li
66d86e529d RDMA/hns: Add UD support for HIP09
HIP09 supports service type of Unreliable Datagram, add necessary process
to enable this feature.

Link: https://lore.kernel.org/r/1605526408-6936-7-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:48 -04:00
Weihang Li
534c9bdb02 RDMA/hns: Simplify process of filling UD SQ WQE
There are some codes can be simplified or encapsulated in set_ud_wqe() to
make them easier to be understand.

Link: https://lore.kernel.org/r/1605526408-6936-6-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:48 -04:00
Weihang Li
148f904c6f RDMA/hns: Remove the portn field in UD SQ WQE
This field in UD WQE in not used by hardware.

Fixes: 7bdee4158b ("RDMA/hns: Fill sq wqe context of ud type in hip08")
Link: https://lore.kernel.org/r/1605526408-6936-5-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:48 -04:00
Weihang Li
3631dadfb1 RDMA/hns: Avoid setting loopback indicator when smac is same as dmac
The loopback flag will be set to 1 by the hardware when the source mac
address is same as the destination mac address. So the driver don't need
to compare them.

Fixes: d6a3627e31 ("RDMA/hns: Optimize wqe buffer set flow for post send")
Link: https://lore.kernel.org/r/1605526408-6936-4-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:48 -04:00
Weihang Li
fba429fcf9 RDMA/hns: Fix missing fields in address vector
Traffic class and hop limit in address vector is not assigned from GRH,
but it will be filled into UD SQ WQE. So the hardware will get a wrong
value.

Fixes: 82e620d9c3 ("RDMA/hns: Modify the data structure of hns_roce_av")
Link: https://lore.kernel.org/r/1605526408-6936-3-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:47 -04:00
Weihang Li
7406c0036f RDMA/hns: Only record vlan info for HIP08
Information about vlan is stored in GMV(GID/MAC/VLAN) table for HIP09, so
there is no need to copy it to address vector.

Link: https://lore.kernel.org/r/1605526408-6936-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:24:47 -04:00
Avihai Horon
8138a4c21b RDMA/mlx4: Enable querying AH for XRC QP types
Address handle is set for connected QP types such as RC and UC, and thus
can also be queried.

Since XRC QP types INI and TGT are connected, it should be possible to
query their address handle as well.

Until now it was not the case, and although the firmware supported it, the
driver allowed querying the address handle only for RC and UC.

Hence, we enable it now for INI and TGT QPs as well.

Link: https://lore.kernel.org/r/20201115121425.139833-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 11:58:19 -04:00
Avihai Horon
f957d4d09a RDMA/mlx5: Enable querying AH for XRC QP types
Address handle is set for connected QP types such as RC and UC, and thus
can also be queried.

Since XRC QP types INI and TGT are connected, it should be possible to
query their address handle as well.

Until now it was not the case, and although the firmware supported it, the
driver allowed querying the address handle only for RC and UC.

Hence, we enable it now for INI and TGT QPs as well.

Link: https://lore.kernel.org/r/20201115121425.139833-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 11:58:19 -04:00
Jason Gunthorpe
dd37d2f59e RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all() error unwind
rdma_detroy_id() cannot be called under &lock - we must instead keep the
error'd ID around until &lock can be released, then destroy it.

This is complicated by the usual way listen IDs are destroyed through
cma_process_remove() which can run at any time and will asynchronously
destroy the same ID.

Remove the ID from visiblity of cma_process_remove() before going down the
destroy path outside the locking.

Fixes: c80a0c52d8 ("RDMA/cma: Add missing error handling of listen_id")
Link: https://lore.kernel.org/r/20201118133756.GK244516@ziepe.ca
Reported-by: syzbot+1bc48bf7f78253f664a9@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-25 11:07:01 -04:00
Xi Wang
6f6e2dcbb8 RDMA/hns: Refactor the hns_roce_buf allocation flow
Add a group of flags to control the 'struct hns_roce_buf' allocation
flow, this is used to support the caller running in atomic context.

Link: https://lore.kernel.org/r/1605347916-15964-1-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 19:36:54 -04:00
Maor Gottlieb
93035242d9 tools/testing/scatterlist: Test dynamic __sg_alloc_table_from_pages
Add few cases to test the dynamic allocation flow of
__sg_alloc_table_from_pages.

Link: https://lore.kernel.org/r/20201115120650.139277-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:51:30 -04:00
Jason Gunthorpe
ed92f6a52b Linux 5.10-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl+69egeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGTSYH/ifRBlaxy5UiHFc0
 2zdR7pkjWrYfDTTT3sazIAhdlzzcfnkUqgFxOP45F4ZIqeTzunH3sUY+5UlT9IX7
 liUgnLxQ/1R9Gx8kPGQfu+tLCey78xVFydGsqJoW9sPRw2R+apMdGGa/lOrk+OXz
 DXIN+dDnGFqwCCNJpK+rxQQhFf++IPpSI8z6Y23moOFhsDZrEziHuVFy2FGyRM6z
 prZ/us/tcobE8ptCk1RmOxLoJ1DR6UxpA2vLimTE+JD8siOsSWPbjE0KudnWCnd5
 BLqIjrsPJbSxyuzzK3v9dnO5wMv7tMDuMIuYM/MQTXDttNwtsqt/aP6gdnUCym7N
 5eHEj5g=
 =MuO1
 -----END PGP SIGNATURE-----

Merge tag 'v5.10-rc5' into rdma.git for-next

For dependencies in following patches

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:50:59 -04:00
Christophe JAILLET
df0e4de29c IB/qib: Use dma_set_mask_and_coherent to simplify code
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by
an equivalent 'dma_set_mask_and_coherent()' which is much less verbose.

Link: https://lore.kernel.org/r/20201121095127.1335228-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:37:20 -04:00
Rikard Falkeborn
8210163022 RDMA/i40iw: Constify ops structs
The ops structs are never modified. Make them const to allow the compiler
to put them in read-only memory.

Link: https://lore.kernel.org/r/20201121002529.89148-1-rikard.falkeborn@gmail.com
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:30:17 -04:00
Kamal Heib
6d8285e604 RDMA/cxgb4: Validate the number of CQEs
Before create CQ, make sure that the requested number of CQEs is in the
supported range.

Fixes: cfdda9d764 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Link: https://lore.kernel.org/r/20201108132007.67537-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:19:46 -04:00
Jason Gunthorpe
a9d2e9ae95 RDMA/siw,rxe: Make emulated devices virtual in the device tree
This moves siw and rxe to be virtual devices in the device tree:

lrwxrwxrwx 1 root root 0 Nov  6 13:55 /sys/class/infiniband/rxe0 -> ../../devices/virtual/infiniband/rxe0/

Previously they were trying to parent themselves to the physical device of
their attached netdev, which doesn't make alot of sense.

My hope is this will solve some weird syzkaller hits related to sysfs as
it could be possible that the parent of a netdev is another netdev, eg
under bonding or some other syzkaller found netdev configuration.

Nesting a ib_device under anything but a physical device is going to cause
inconsistencies in sysfs during destructions.

Link: https://lore.kernel.org/r/0-v1-dcbfc68c4b4a+d6-virtual_dev_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 16:14:31 -04:00
Gustavo A. R. Silva
808b2c925d IB/mlx5: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding the new pseudo-keyword fallthrough; instead of
letting the code fall through to the next case.

Link: https://lore.kernel.org/r/2b0c87362bc86f6adfe56a5a6685837b71022bbf.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:11 -04:00
Gustavo A. R. Silva
c6191f83be IB/qedr: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.

Link: https://lore.kernel.org/r/8d7cf00ec3a4b27a895534e02077c2c9ed8a5f8e.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:11 -04:00
Gustavo A. R. Silva
667d457fa8 IB/mlx4: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.

Link: https://lore.kernel.org/r/0153716933e01608d46155941c447d011c59c1e4.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:10 -04:00
Gustavo A. R. Silva
4846bf44e1 IB/hfi1: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.

Link: https://lore.kernel.org/r/13cc2fe2cf8a71a778dbb3d996b07f5e5d04fd40.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:10 -04:00
Linus Torvalds
418baf2c28 Linux 5.10-rc5 2020-11-22 15:36:08 -08:00
Linus Torvalds
d5530d82ef Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:

 - Various functionality / regression fixes for Logitech devices from
   Hans de Goede

 - Fix for (recently added) GPIO support in mcp2221 driver from Lars
   Povlsen

 - Power management handling fix/quirk in i2c-hid driver for certain
   BIOSes that have strange aproach to power-cycle from Hans de Goede

 - a few device ID additions and device-specific quirks

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver
  HID: logitech-dj: Fix an error in mse_bluetooth_descriptor
  HID: Add Logitech Dinovo Edge battery quirk
  HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge
  HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad
  HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices
  HID: mcp2221: Fix GPIO output handling
  HID: hid-sensor-hub: Fix issue with devices with no report ID
  HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown
  HID: add support for Sega Saturn
  HID: cypress: Support Varmilo Keyboards' media hotkeys
  HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses
  HID: logitech-hidpp: Add PID for MX Anywhere 2
  HID: uclogic: Add ID for Trust Flex Design Tablet
2020-11-22 14:36:06 -08:00
Linus Torvalds
f4b936f5d6 A couple of scheduler fixes:
- Make the conditional update of the overutilized state work correctly by
    caching the relevant flags state before overwriting them and checking
    them afterwards.
 
  - Fix a data race in the wakeup path which caused loadavg on ARM64
    platforms to become a random number generator.
 
  - Fix the ordering of the iowaiter accounting operations so it can't be
    decremented before it is incremented.
 
  - Fix a bug in the deadline scheduler vs. priority inheritance when a
    non-deadline task A has inherited the parameters of a deadline task B
    and then blocks on a non-deadline task C.
 
    The second inheritance step used the static deadline parameters of task
    A, which are usually 0, instead of further propagating task B's
    parameters. The zero initialized parameters trigger a bug in the
    deadline scheduler.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+6edsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaJCEAC7VGr9IlWRzCI/173tKAXkLRrGXHVb
 yOYc/YjLMCTcERNxqpf8uIURd/ATSHU/RMwfFcB558NedKZ/QKZDoKmLqeCXnVeM
 e20tXv/fmpqRS7lgtmbBfhQ8mSDhst960oD1mHifdEwEBCCm7mLEaipTuTWjnZ0x
 rOz70Hir1mSjsP0E7ZorsxCr1yExbrt+jZfKCe9D2kUSvlWHf1ipzAYNlqb/DsfG
 n81G7q9LYV8NUhX3lt8oSZDq0K44aO6G6fEaP4EkfwsIAOh37yPHwuEuqDZCBmXw
 rQ17XUU3jQ2MtubPvVEKG/6Z+hAUyOsAKynpq/RhzueXQm/9Ns6+qHX/xY8yh39y
 S5qPd5DLRlac8f7cFwz2zPxP5E+xTJLONgRkuN1XlitMJZBxru9AzDNa0/6on8TM
 OtvbvVR+bPUfHiHULk4fTz7fLcbgYgxbCgfGoFsVlfskOxnzgEG8WfuI2Up2rRJ0
 nr1MCER+5fprciqPPs+18rVEFiC4mQSrV01cnwrNbpW8pqibZSomMilQ0oQvcTGL
 VDEHkaDTa5YbR92Szq4rYbr7Sf0ihFU0EZUNVQnu7SujdVFxTdHb1yr8UYcYp09b
 LqGFhr1FHBNYKbw3rEPx2R/FGuCii21oQkhz94ujDo1Np8EGVZYwFGh+iwbsa2Xn
 K1u0HzqLTfTkMw==
 =HiGq
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
 "A couple of scheduler fixes:

   - Make the conditional update of the overutilized state work
     correctly by caching the relevant flags state before overwriting
     them and checking them afterwards.

   - Fix a data race in the wakeup path which caused loadavg on ARM64
     platforms to become a random number generator.

   - Fix the ordering of the iowaiter accounting operations so it can't
     be decremented before it is incremented.

   - Fix a bug in the deadline scheduler vs. priority inheritance when a
     non-deadline task A has inherited the parameters of a deadline task
     B and then blocks on a non-deadline task C.

     The second inheritance step used the static deadline parameters of
     task A, which are usually 0, instead of further propagating task
     B's parameters. The zero initialized parameters trigger a bug in
     the deadline scheduler"

* tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix priority inheritance with multiple scheduling classes
  sched: Fix rq->nr_iowait ordering
  sched: Fix data-race in wakeup
  sched/fair: Fix overutilized update in enqueue_task_fair()
2020-11-22 13:26:07 -08:00
Linus Torvalds
48da330589 A single fix for the x86 perf sysfs interfaces which used kobject
attributes instead of device attributes and therefore making Clangs control
 flow integrity checker upset.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+6diATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaVKD/9pbxJEopP0ojZx8MbGVxPn1dzgii6k
 YgWOemMPGV/hi87pCSah1RcNVY8aMwMUBnzMJEhx9bCYFMxCummM7wcoN7bJneWU
 f548CesgeKP662vFYjAXrsASX0DZAUUQm/MTnkLqVlD11yaqudxKEjEYNi2jVehq
 7tTip8Eb88w5dyQJLN7sjZuV9AdZCiXeiyGgYb3jVjfdfTuq4KRp6XdL9SghEfwV
 3qDOrcBuLKgQbz+eE2C1tVX+/kSNfG1stNAWkENHBWfrnmbeCt7YdT6Gd6bays0b
 1Ul0LgZwXbDdkQrtmoEGtI8j/Q+yiozeJY7+eeh/Z9nw0eviZ8ir36y96wRukOqw
 od+EjENiF1hIJk21r6mIuQoMyezHraIMgg0WF4/YT1XRGqwR1wqgHpmwqMjmem0Y
 Kuab7aYC4g8vGk9ah2/FmTnTHooEyk8GyxnVXO+aSdLbFI04L7HBDILvgvpHPyCS
 BnUDZE785W8yLydhKslfy92kVQGKuGaSqQUQcHD+c/L7m2AexoKrp1RJgA0gXtDj
 ilmE7KNCCbeNXt3cDep6BQzymw9YStPNSPBSv4dK73DM/Y2qKBk3RD0BfMqdCkOy
 sBqUCWRY1/q6Z+64zwDlER5Fn+KflE0ihobTxhA6m0yx7AOn9BoMDWfo3xGKw6Mn
 nhVDpafF8YuVIw==
 =xsma
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
 "A single fix for the x86 perf sysfs interfaces which used kobject
  attributes instead of device attributes and therefore making clang's
  control flow integrity checker upset"

* tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: fix sysfs type mismatches
2020-11-22 13:23:43 -08:00
Linus Torvalds
855cf1ee47 A single fix for lockdep which makes the recursion protection cover graph
lock/unlock.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+6dQ4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXLiD/91TaQTrgVht6NxF5CmvEpoYPzGg/TO
 MeU7GmgeXv7ciIIDaPONsCLOGjGx5dSxk1QJ5OH+GhJ/lIXbtvbVUgbs7Ezjkl4d
 SyjRHNv3wYV7VJySEgqxo+Rd6fCQN7rOf/J3ZbzPHtMl40qxSpz1FeXTRft7cn62
 pkunRKM+f/tE/Ue96tiHCQVbwRxG3LEPIuXtLJQqtPvgMzb9zhC9mZpUcr5ZJyVD
 BfX69iIs1W1XKql6yH/KuBUY9d3qRzzmhko6a2mMSL3+VX0j4o46LMlP4HRndKTT
 5fEsUC3UOItqn7X4f3HHeQhckEAKXNGEMR30th84CligiEanQuIIHNNKazsnp87A
 GT/XA8RO9+OAd5TRP4CEaFDUR0a0OOZjiw3vW6E77BObi31Se9/EO3Z7XpJC2BSl
 vTsukplOmeilyspQJT7inVgZrtUlF0lpVV5YbMLjtbn5kzLu/75GZN3JagwCmflN
 GKbix/8tnH0KrbizCE8ujUZl+aNjk2TM64mAzOXMlmg+yZ47af31hFs846tCMhv5
 JHXkiMaXpWx3PapgGHM+4PJ1WDGTxm6mUvgAcyCVWgT4gxVXsppnkn0RLdRTGQ50
 C9movzOn1K2fEe7Pp40arG0tm18l9Ukp7mqN1CL+givRKNb61Owybe6qh2qk0IWT
 DAcgdC9h2rdTqQ==
 =dsm4
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Thomas Gleixner:
 "A single fix for lockdep which makes the recursion protection cover
  graph lock/unlock"

* tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Put graph lock/unlock under lock_recursion protection
2020-11-22 13:19:53 -08:00
Linus Torvalds
68d3fa235f Couple of EFI fixes for v5.10:
- fix memory leak in efivarfs driver
 - fix HYP mode issue in 32-bit ARM version of the EFI stub when built in
   Thumb2 mode
 - avoid leaking EFI pgd pages on allocation failure
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl+tRcoACgkQwjcgfpV0
 +n1LZQf/af1p9A0zT1nC3IrcaABceJDnD3dJuV9SD6QhFuD2Dw/Mshr+MVzDsO+3
 btvuu0r4CzQ5ajfpfcGcvBFFWbbPTwKvWQe++9Unwoz5acw7hpV5yxNwMivdaJEh
 3o4pkgpCmwtliTwiroDficO7Vlqefqf4LZd7/iQYQTnuPK3waYQBjwp9t2D9tlx7
 kXiEQDP2BDRCUrKEjlR7AHTZ156mw+UsiquAuxMCGTKBqwiELEEV6aPseqa5MmNV
 RDV1IXWdhOQyQfzg0s6vTzwGeN0JubSxHng6O9UbE+tctz4EqaaHIEsRuMBq8oLD
 Y8JeGp1ovypTJxeLE5t6eEzsfvTRsg==
 =GnmM
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Borislav Petkov:
 "Forwarded EFI fixes from Ard Biesheuvel:

   - fix memory leak in efivarfs driver

   - fix HYP mode issue in 32-bit ARM version of the EFI stub when built
     in Thumb2 mode

   - avoid leaking EFI pgd pages on allocation failure"

* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/x86: Free efi_pgd with free_pages()
  efivarfs: fix memory leak in efivarfs_create()
  efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
2020-11-22 13:05:48 -08:00
Linus Torvalds
7d53be55c9 * An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
same because the proper one is going through the IOMMU tree. (Thomas Gleixner)
 
 * An Intel microcode loader fix to save the correct microcode patch to
   apply during resume. (Chen Yu)
 
 * A fix to not access user memory of other processes when dumping opcode
   bytes. (Thomas Gleixner)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+6QnYACgkQEsHwGGHe
 VUpMgg//cE4GvK7dFihq5iNYS16yKKgyVBLwHiHoCxQkOu4CPSf0EejTMLKPSKpA
 hS7pJm6dMYLJ7PhIKRwlQD1NYet3SHCOi1h0Y2/3rVay82Vz/As2nZRd7paEkP9k
 kiNd/Ib2uAQ9ZwCKhmLanOWDoXXFo5Fwmpw+O/zhn31Y3kBx05mc2ojXnEwCERK/
 j6GwOBkfOq+q6D7HzsfEehlX0iy2rlPSq6OUN1H/Z4w+W9zfpubLX9bhhr9RvDmS
 6Hsmwos8fM/cQXZC2oxoKUP6zeYk5lDF/Qf7hOFt3WgdCpTZP2z6uLwUL6+NXu/C
 Ms+77Cx+Loe8UaDkyxXXruiv7mZIZCnl0phZulE4ibOjzL7W7w5cEf0LHdDS7uDL
 JzZwTJlbK9RjrSp4pXR+rn+U/PdluZJYpjh/Gj8+V23QszvbDsVd55SV5+CLSTM/
 qwrNp5ffLEnIFHmW98/OQKMvNTTzBFdlltnJm7mMW2JucgfCZMx4zwY0+dZ+ocDq
 /Kt/HNCzez1jHWiS5jdZCM15KuIhPTNuTpq0ipU+UAXxZOaNtTo1st1lhKoPvTz/
 k4y7S/EibX6PK4lxBQMM3fBG/T1KK7ZMgNEN0jbhWegqEmdWtgwc57p1gVE70XUU
 f6elyUupD65TMwgio2I8sQuiNBCKfF2Wj144dxXGJnGnOoZKey8=
 =z/Gq
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
   same because the proper one is going through the IOMMU tree (Thomas
   Gleixner)

 - An Intel microcode loader fix to save the correct microcode patch to
   apply during resume (Chen Yu)

 - A fix to not access user memory of other processes when dumping
   opcode bytes (Thomas Gleixner)

* tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
  x86/dumpstack: Do not try to access user space code of other tasks
  x86/microcode/intel: Check patch signature before saving microcode for early loading
  iommu/vt-d: Take CONFIG_PCI_ATS into account
2020-11-22 12:55:50 -08:00
Linus Torvalds
4a51c60a11 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "8 patches.

  Subsystems affected by this patch series: mm (madvise, pagemap,
  readahead, memcg, userfaultfd), kbuild, and vfs"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: fix madvise WILLNEED performance problem
  libfs: fix error cast of negative value in simple_attr_write()
  mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
  mm: memcg/slab: fix root memcg vmstats
  mm: fix readahead_page_batch for retry entries
  mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
  compiler-clang: remove version check for BPF Tracing
  mm/madvise: fix memory leak from process_madvise
2020-11-22 12:14:46 -08:00
Linus Torvalds
d27637ece8 Staging/IIO fixes for 5.10-rc5
Here are some small Staging and IIO driver fixes for 5.10-rc5.  They
 include:
 	- IIO fixes for reported regressions and problems
 	- new device ids for IIO drivers
 	- new device id for rtl8723bs driver
 	- staging ralink driver Kconfig dependency fix
 	- staging mt7621-pci bus resource fix
 
 All of these have been in linux-next all week with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX7pN0A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymnAACgrCPVrnWr2mblmLhBWYpq3LjLri0AoIsYd7FG
 138ClPlHDOjNTzKQaF5Q
 =YLIp
 -----END PGP SIGNATURE-----

Merge tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO fixes from Greg KH:
 "Here are some small Staging and IIO driver fixes for 5.10-rc5. They
  include:

   - IIO fixes for reported regressions and problems

   - new device ids for IIO drivers

   - new device id for rtl8723bs driver

   - staging ralink driver Kconfig dependency fix

   - staging mt7621-pci bus resource fix

  All of these have been in linux-next all week with no reported issues"

* tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
  iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
  docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI
  iio: light: fix kconfig dependency bug for VCNL4035
  iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used
  iio/adc: ingenic: Fix battery VREF for JZ4770 SoC
  staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
  staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK
  staging: mt7621-pci: avoid to request pci bus resources
  iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout
  counter/ti-eqep: Fix regmap max_register
  iio: adc: stm32-adc: fix a regression when using dma and irq
  iio: adc: mediatek: fix unset field
  iio: cros_ec: Use default frequencies when EC returns invalid information
2020-11-22 11:58:49 -08:00
Linus Torvalds
de75803570 TTY fixes for 5.10-rc5
Here are some small tty/serial fixes for 5.10-rc5 that resolve some
 reported issues:
 	- speakup crash when telling the kernel to use a device that
 	  isn't really there
 	- imx serial driver fixes for reported problems
 	- ar933x_uart driver fix for probe error handling path
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX7pOWQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynoYACeL51JkQiL72sGLe8T+nuMzKptckEAniCgLHb0
 0IBcBzMM/PgpqmOIQRRE
 =CDRI
 -----END PGP SIGNATURE-----

Merge tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty fixes from Greg KH:
 "Here are some small tty/serial fixes for 5.10-rc5 that resolve some
  reported issues:

   - speakup crash when telling the kernel to use a device that isn't
     really there

   - imx serial driver fixes for reported problems

   - ar933x_uart driver fix for probe error handling path

  All have been in linux-next for a while with no reported issues"

* tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: ar933x_uart: disable clk on error handling path in probe
  tty: serial: imx: keep console clocks always on
  speakup: Do not let the line discipline be used several times
  tty: serial: imx: fix potential deadlock
2020-11-22 11:52:10 -08:00
Linus Torvalds
a7f07fc14f A final set of miscellaneous bug fixes for ext4
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl+6tkoACgkQ8vlZVpUN
 gaO1xgf/aJ5chEWFEQVrdSdd+cvhuILz1Hp2iW8xdZgPeN2ovWIC3LPCTDr9FWB0
 MhpGS9avYIf8mHZgsw7HVzqUv6gPcT0khragPp348QJzxnbz/saZ5ujK/WR2zJxr
 SoB9f2vdqW0gBbKMO6avXm0gTnuNemcK5oH6tzI5ECBpV3Ltk1dJWtgQkVp9rAyP
 EFEb9hUYpdZ3J1cm8SCUIO99Tu2KMd+yNRv42z0BKNTfBNe2P5aG56p1sMYMcIr7
 BiVUrhPkbAf3gMsMDzZQE5mHZHyzJoHNHssLHWcEU/o9Wd2wZHjAEzzn9Tz49rUg
 yhYTrhLcQJcfmL0XvgrIhJaXTfMc6w==
 =s/1A
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "A final set of miscellaneous bug fixes for ext4"

* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix bogus warning in ext4_update_dx_flag()
  jbd2: fix kernel-doc markups
  ext4: drop fast_commit from /proc/mounts
2020-11-22 11:39:32 -08:00
David Howells
a9e5c87ca7 afs: Fix speculative status fetch going out of order wrt to modifications
When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.

To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).

When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.

A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes.  What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.

This results in something like the following being seen in dmesg:

	kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus

showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification.  This was causing the cache to be
invalidated for that vnode when it shouldn't have been.  If it happens
on a data file, this might lead to local changes being lost.

Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.

Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway.  It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.

Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 11:27:03 -08:00
Matthew Wilcox (Oracle)
66383800df mm: fix madvise WILLNEED performance problem
The calculation of the end page index was incorrect, leading to a
regression of 70% when running stress-ng.

With this fix, we instead see a performance improvement of 3%.

Fixes: e6e88712e4 ("mm: optimise madvise WILLNEED")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Yicong Yang
488dac0c92 libfs: fix error cast of negative value in simple_attr_write()
The attr->set() receive a value of u64, but simple_strtoll() is used for
doing the conversion.  It will lead to the error cast if user inputs a
negative value.

Use kstrtoull() instead of simple_strtoll() to convert a string got from
the user to an unsigned value.  The former will return '-EINVAL' if it
gets a negetive value, but the latter can't handle the situation
correctly.  Make 'val' unsigned long long as what kstrtoull() takes,
this will eliminate the compile warning on no 64-bit architectures.

Fixes: f7b88631a8 ("fs/libfs.c: fix simple_attr_write() on 32bit machines")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/1605341356-11872-1-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Gerald Schaefer
bfe8cc1db0 mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.

In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
freed in some cases.  In the case of userfaultfd_missing(), this will
happen after calling handle_userfault(), which might have released the
mmap_lock.  Therefore, the following pte_free(vma->vm_mm, pgtable) will
access an unstable vma->vm_mm, which could have been freed or re-used
already.

For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the
passed-in mm.  The implementation for SPARC32 would also access
mm->page_table_lock for pte_free(), but there is no THP support in
SPARC32, so the buggy code path will not be used there.

For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used
mm could result in various more or less subtle bugs due to list /
pagetable corruption.

Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.

Commit 6b251fc96c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case
it put the pte_free() before calling handle_userfault().

  BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
  Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334

  CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
  Hardware name: IBM 3906 M04 701 (KVM/Linux)
  Call Trace:
    do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
    create_huge_pmd mm/memory.c:4256 [inline]
    __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
    handle_mm_fault+0x288/0x748 mm/memory.c:4607
    do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
    do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
    pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
    copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
    raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
    _copy_from_user+0x48/0xa8 lib/usercopy.c:16
    copy_from_user include/linux/uaccess.h:192 [inline]
    __do_sys_sigaltstack kernel/signal.c:4064 [inline]
    __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
    system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

  Allocated by task 9334:
    slab_alloc_node mm/slub.c:2891 [inline]
    slab_alloc mm/slub.c:2899 [inline]
    kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
    vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
    __split_vma+0xba/0x560 mm/mmap.c:2742
    split_vma+0xca/0x108 mm/mmap.c:2800
    mlock_fixup+0x4ae/0x600 mm/mlock.c:550
    apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
    do_mlock+0x1aa/0x718 mm/mlock.c:711
    __do_sys_mlock2 mm/mlock.c:738 [inline]
    __s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
    system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

  Freed by task 9333:
    slab_free mm/slub.c:3142 [inline]
    kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
    __vma_adjust+0x7b2/0x2508 mm/mmap.c:960
    vma_merge+0x87e/0xce0 mm/mmap.c:1209
    userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
    __fput+0x22c/0x7a8 fs/file_table.c:281
    task_work_run+0x200/0x320 kernel/task_work.c:151
    tracehook_notify_resume include/linux/tracehook.h:188 [inline]
    do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
    system_call+0xe6/0x28c arch/s390/kernel/entry.S:416

  The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200
  The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10)
  The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab)
  raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
  raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
  page dumped because: kasan: bad access detected
  page->mem_cgroup:0000000096959501

  Memory state around the buggy address:
   00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
  >00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                        ^
   00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
   00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ==================================================================

Fixes: 6b251fc96c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org>	[4.3+]
Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Muchun Song
8faeb1ffd7 mm: memcg/slab: fix root memcg vmstats
If we reparent the slab objects to the root memcg, when we free the slab
object, we need to update the per-memcg vmstats to keep it correct for
the root memcg.  Now this at least affects the vmstat of
NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack size is
smaller than the PAGE_SIZE.

David said:
 "I assume that without this fix that the root memcg's vmstat would
  always be inflated if we reparented"

Fixes: ec9f02384f ("mm: workingset: fix vmstat counters for shadow nodes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: <stable@vger.kernel.org>	[5.3+]
Link: https://lkml.kernel.org/r/20201110031015.15715-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Matthew Wilcox (Oracle)
4349a83a31 mm: fix readahead_page_batch for retry entries
Both btrfs and fuse have reported faults caused by seeing a retry entry
instead of the page they were looking for.  This was caused by a missing
check in the iterator.

As can be seen in the below panic log, the accessing 0x402 causes a
panic.  In the xarray.h, 0x402 means RETRY_ENTRY.

  BUG: kernel NULL pointer dereference, address: 0000000000000402
  CPU: 14 PID: 306003 Comm: as Not tainted 5.9.0-1-amd64 #1 Debian 5.9.1-1
  Hardware name: Lenovo ThinkSystem SR665/7D2VCTO1WW, BIOS D8E106Q-1.01 05/30/2020
  RIP: 0010:fuse_readahead+0x152/0x470 [fuse]
  Code: 41 8b 57 18 4c 8d 54 10 ff 4c 89 d6 48 8d 7c 24 10 e8 d2 e3 28 f9 48 85 c0 0f 84 fe 00 00 00 44 89 f2 49 89 04 d4 44 8d 72 01 <48> 8b 10 41 8b 4f 1c 48 c1 ea 10 83 e2 01 80 fa 01 19 d2 81 e2 01
  RSP: 0018:ffffad99ceaebc50 EFLAGS: 00010246
  RAX: 0000000000000402 RBX: 0000000000000001 RCX: 0000000000000002
  RDX: 0000000000000000 RSI: ffff94c5af90bd98 RDI: ffffad99ceaebc60
  RBP: ffff94ddc1749a00 R08: 0000000000000402 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000100 R12: ffff94de6c429ce0
  R13: ffff94de6c4d3700 R14: 0000000000000001 R15: ffffad99ceaebd68
  FS:  00007f228c5c7040(0000) GS:ffff94de8ed80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000402 CR3: 0000001dbd9b4000 CR4: 0000000000350ee0
  Call Trace:
    read_pages+0x83/0x270
    page_cache_readahead_unbounded+0x197/0x230
    generic_file_buffered_read+0x57a/0xa20
    new_sync_read+0x112/0x1a0
    vfs_read+0xf8/0x180
    ksys_read+0x5f/0xe0
    do_syscall_64+0x33/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 042124cc64 ("mm: add new readahead_control API")
Reported-by: David Sterba <dsterba@suse.com>
Reported-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201103142852.8543-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201103124349.16722-1-vvghjk1234@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Dan Williams
a927bd6ba9 mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid().  That
symbol is exported for modules.  However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:

	CONFIG_NUMA_KEEP_MEMINFO=y
	CONFIG_MEMORY_HOTPLUG=y

...and:

	CONFIG_NUMA_KEEP_MEMINFO=n
	CONFIG_MEMORY_HOTPLUG=y

...it failed to export the symbol in the case of:

	CONFIG_NUMA_KEEP_MEMINFO=y
	CONFIG_MEMORY_HOTPLUG=n

Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.

Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols.  Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.

The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h.  In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h.  An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.

The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.

[dan.j.williams@intel.com: v4]
  Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
  Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com

Fixes: a035b6bf86 ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Nick Desaulniers
bc2dc4406c compiler-clang: remove version check for BPF Tracing
bpftrace parses the kernel headers and uses Clang under the hood.

Remove the version check when __BPF_TRACING__ is defined (as bpftrace
does) so that this tool can continue to parse kernel headers, even with
older clang sources.

Fixes: commit 1f7a44f63e ("compiler-clang: add build check for clang 10.0.1")
Reported-by: Chen Yu <yu.chen.surf@gmail.com>
Reported-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lkml.kernel.org/r/20201104191052.390657-1-ndesaulniers@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Eric Dumazet
450677dcb0 mm/madvise: fix memory leak from process_madvise
The early return in process_madvise() will produce a memory leak.

Fix it.

Fixes: ecb8ac8b1f ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20201116155132.GA3805951@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22 10:48:22 -08:00
Linus Torvalds
a349e4c659 Fixes for 5.10-rc5:
- Fix various deficiencies in online fsck's metadata checking code.
 - Fix an integer casting bug in the xattr code on 32-bit systems.
 - Fix a hang in an inode walk when the inode index is corrupt.
 - Fix error codes being dropped when initializing per-AG structures
 - Fix nowait directio writes that partially succeed but return EAGAIN.
 - Revert last week's rmap comparison patch because it was wrong.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl+38UwACgkQ+H93GTRK
 tOtMFQ/9EV2/I673TJj8GY5gJE9mUXGbiUyMedl8XammhNPL0JZMAGsnMXBWCtjO
 0pmO+TS4epwlYpZ4XVLlxNUxUwDkxgq3nHKbEz2jezepKPTCasL7XZOECGQ1gKdA
 11BRNP7Y91ndYtVZGxHu+92oeZAzJgTh6OVYtJytniTgF9r96hgr/+3dA8GQxkqm
 bkkfWfKxxCwMYLRLRNcnVbkj0xDMgmKOILyFR63ZhW8RtrfmdIUYDUty7RGvj4bJ
 csZmrkcu/wIj+9NeXw8KS5KpNOWu2q3baORXe6EodoVgFMa4I11kiuGucZehsIbH
 yNgTLDaFNUm1aBCkSrYtz7m4iwLq8No7XB/OIXrALSd5yJqaXhDyMnEV/tBeAL7D
 MXn032Sc6hPSyGBtCmurTSo61oKP3HjgMXA4vvNw5CxJ7Q4EoZyBXCdHtZcRnB7+
 MSa+ylBTbmP/AJ2AQrPiArGlAKUTnJM6WknIBCWiIueRtadTh1cquBFVbDxoEIX5
 eKcjdQrX2xNrFNE2rRuYI4ml+wwtdgk7JO41gjAw+NA2V1LJW6Q5A5RKX2PiOidC
 oGdNPTLG7Rfh7sMaPo66X3xTQPoOwcV0O+ArXlFNDBZXDUw0d1tWzVfYo+/2Zym6
 3sFcTKMdTKtG8NasNjvbanmZTV1VLbZAJRdevH1NFAWUICiTBkY=
 =HoPI
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "The critical fixes are for a crash that someone reported in the xattr
  code on 32-bit arm last week; and a revert of the rmap key comparison
  change from last week as it was totally wrong. I need a vacation. :(

  Summary:

   - Fix various deficiencies in online fsck's metadata checking code

   - Fix an integer casting bug in the xattr code on 32-bit systems

   - Fix a hang in an inode walk when the inode index is corrupt

   - Fix error codes being dropped when initializing per-AG structures

   - Fix nowait directio writes that partially succeed but return EAGAIN

   - Revert last week's rmap comparison patch because it was wrong"

* tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: revert "xfs: fix rmap key and record comparison functions"
  xfs: don't allow NOWAIT DIO across extent boundaries
  xfs: return corresponding errcode if xfs_initialize_perag() fail
  xfs: ensure inobt record walks always make forward progress
  xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
  xfs: directory scrub should check the null bestfree entries too
  xfs: strengthen rmap record flags checking
  xfs: fix the minrecs logic when dealing with inode root child blocks
2020-11-21 10:36:25 -08:00