Commit Graph

1074338 Commits

Author SHA1 Message Date
Jakub Kicinski
ea97ab9889 Here are some batman-adv bugfixes:
- Remove redundant iflink requests, by Sven Eckelmann (2 patches)
 
  - Don't expect inter-netns unique iflink indices, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmIfmmUWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoSy0D/9XfRgiSQH9DuyzXjlRMV1qMgNO
 9O0HOVqZagRhtydGNsBhu1mSFcBIFUhyRD20BvyKrulsgb9Cpkojk9zv31kV0W9S
 PIK81oNVplAiDnz3aZmGWqC1GEZHziJOezvr8eBIvdRay+F/M5oqWqqtQmDyZFXc
 JEnQcJyNZTGAmfYKM8Nq5iAnqoIkPIxChvipR7P12g/cjd0owbzl/QShdlLw0nDe
 zK8Pj/zXSA+hXdIsWrh9n7yU7m7sTWWMOo0ElkOhwb6DyDxAhEvIe59jcv0SE+HL
 7zHFf1hJluO7mswK9pJSIm80+P5WtyEBskUjCL0rv7fdAzvuLsv5YLoZvEmYf39h
 /5Wo3rM5iXwNuT9hB9+teUZaPBF1dXAYOusN1Y/vuvNdG+YCq8BJn5UDmiXyYEWF
 23N1R1LQrYs+JoitIS01cpTi69rjZFN5xqtFs+gN84jgfb4vkP2JxjHyz2eBB47Q
 UTYaUh60uYtMLYXZdVFwUP7DbJnN5fGlPg9zIPAU+yHHvyqmrGjY80vn+voJ0Y5w
 wNOzTpyg7+KFpM1CpCccqivWxuJeVbgVS5jL+xegpkf7PgOy5oepkMyThZ24yy9o
 K3uTHCl0HH+YrvVFWZVrqEVQoMZ3ErQmihHXwZZy/p71MeQ/m8gnT2iEMaOlxX3k
 Efrh9mM6JI+wLCtZLA==
 =kOlX
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - Remove redundant iflink requests, by Sven Eckelmann (2 patches)

 - Don't expect inter-netns unique iflink indices, by Sven Eckelmann

* tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge:
  batman-adv: Don't expect inter-netns unique iflink indices
  batman-adv: Request iflink once in batadv_get_real_netdevice
  batman-adv: Request iflink once in batadv-on-batadv check
====================

Link: https://lore.kernel.org/r/20220302163049.101957-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 21:53:35 -08:00
Jakub Kicinski
95749c1033 Three more fixes:
- fix build issue in iwlwifi, now that I understood
    what's going on there
  - propagate error in iwlwifi/mvm to userspace so it
    can figure out what's happening
  - fix channel switch related updates in P2P-client
    in cfg80211
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmIf5NEACgkQB8qZga/f
 l8T8fA/7BakwQhXoWHd/czGHP+/YSLoXTlXIOTLxiTdle+HXwjpvNUsfCHkp0ZDd
 vgoePLE/QgIEXPovF1yvKTKNhVbXMEbV6wAmK7rcmn9YdD9EfXawobRXm5+fe13B
 mCjkTBp9xXJ5iRCWN7O9FS7w11mGnyFVQE2gY3pUP/+B5zHgtuKB2WB/x30l2RZm
 j1Yqg7LcEWyU83GIGegrfcLbvJ5bWI7Wg7YOVzzOKL/8H48ktVjcwd4FlM4r+IKC
 HHzTG+iCgPwnT8wi63vEnyeMMkHz6EqZJvNDdsJVBHgWhDkd0FHDshHEPKwtf9wA
 Yi7cRyyAT87OMvW974nJ0e7WcgJ/ISUpKA+mMNmiYkCaZ9eECsID4r03vR6+IxDU
 rtgwSyqy7wizXHPiIcDUbt62Csu69WhaD5XV3tXvwwI2HndNyEH0bAE7F0n3WOnV
 4giVXvy9RkN2XaTAcydtLkGhlKKUNOwauT8tJvF4Qs5Mr/GlEMOEBkY+3qaliXxL
 at0jtcwSi25YXKhspOrBocqflmd7Ne7TWaN1X/FOElv98A8DikcwQlI3ZpnK8wvv
 J3VjAnFIyzdY1p1DWOI4+ufWcETzHFJpHqerm/xDiBvWf+rxDJE1Exk8OR4mzDsl
 c/cap1XknEzoA7BzLKNZy6zxxM/7Jje2A9FKCkUzWrp3psz8cvY=
 =g/JU
 -----END PGP SIGNATURE-----

Merge tag 'wireless-for-net-2022-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Three more fixes:
 - fix build issue in iwlwifi, now that I understood
   what's going on there
 - propagate error in iwlwifi/mvm to userspace so it
   can figure out what's happening
 - fix channel switch related updates in P2P-client
   in cfg80211

* tag 'wireless-for-net-2022-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  iwlwifi: mvm: return value for request_ownership
  nl80211: Update bss channel on channel switch for P2P_CLIENT
  iwlwifi: fix build error for IWLMEI
====================

Link: https://lore.kernel.org/r/20220302214444.100180-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 21:49:57 -08:00
Linus Torvalds
5859a2b199 Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fix from Eric Biederman:
 "Etienne Dechamps recently found a regression caused by enforcing
  RLIMIT_NPROC for root where the rlimit was not previously enforced.

  Michal Koutný had previously pointed out the inconsistency in
  enforcing the RLIMIT_NPROC that had been on the root owned process
  after the root user creates a user namespace.

  Which makes the fix for the regression simply removing the
  inconsistency"

* 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Fix systemd LimitNPROC with private users regression
2022-03-02 16:20:04 -08:00
Linus Torvalds
7e3d76139b ARM further fixes for 5.17-rc:
- Fix kgdb breakpoint for Thumb2
 - Fix dependency for BITREVERSE kconfig
 - Fix nommu early_params and __setup returns
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAmIc4SIACgkQ9OeQG+St
 rGTlCQ//XQD4xLnvM2LScGFVOOvoQwOmv77H6jOVrfO1xs8dD0W5mBG3LdgaAKkW
 CnYRb9qF2i+lq1p3ZH9u+5bSX6ttzlRmvwUQB89YM7gkU5AY535gz1nFKScdT932
 FNftd1h4FJXvdOsVQM3MnwTNFtp3YodkkkNzKS8PkMxSuvQffMXBMo8cTpkkIF+M
 Eq/QRGIavreFqsI7UtN24j1FkDlBGYrVT8aGHwfyYRCIiFw6InaCpZ1eElJl0xdH
 v80h1ihYqIfLgkHH3Bkk8edsNoosJII5B67n1t1ZdkNBKEiPR5tLa5IMmEv2Dy07
 ufUvU1dullN5KXLQzD/8H4BZMGR1m0tDKWqCt1x1wcug/a1R0xPuO5QEOKXU0HpW
 wegV8ueYmGqAN5HN1iRpNctCJSos+qPZYuDDevJMnXjQsDRamUqUy/0V/rgc7qKE
 yzBfzgKM+Vhn5bKhtXu09Z6xAwVa0wknsJ+NF++EbukAW/WK5m3ck1Z0Ab6e3C1i
 phlnCIH083yejpxuoQMxDaGDhWwEE+a9R63BUPUxmdBIxrVc2yZLo+BUDJxaDh8n
 GcsiFnrsziwJIRlL0FsEWh1PbwWd8xhfHCBV7qbRDZ98RfDyjrajJrl9eK9u9pT+
 nUZKTC6Y+v6N3qfGJzvTHgjhOAA+crfgHcgDGZthoz3UJ0A1Tk0=
 =cSgl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

 - Fix kgdb breakpoint for Thumb2

 - Fix dependency for BITREVERSE kconfig

 - Fix nommu early_params and __setup returns

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
  ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE
  ARM: Fix kgdb breakpoint for Thumb2
2022-03-02 16:11:56 -08:00
Qiang Yu
f1ef17011c drm/amdgpu: fix suspend/resume hang regression
Regression has been reported that suspend/resume may hang with
the previous vm ready check commit.

So bring back the evicted list check as a temp fix.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922
Fixes: c1a66c3bc4 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Qiang Yu <qiang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:36:43 -05:00
Andy Shevchenko
9ed331f8a0 auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
While it might work, the current approach is fragile in a few ways:
- whenever members in the structure are shuffled, the pointer will be wrong
- the resource freeing may include more than covered by kfree()

Fix this by using charlcd_free() call instead of kfree().

Fixes: 8c9108d014 ("auxdisplay: add a driver for lcd2s character display")
Cc: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-03-03 00:30:31 +01:00
Andy Shevchenko
898c0a1542 auxdisplay: lcd2s: Fix memory leak in ->remove()
Once allocated the struct lcd2s_data is never freed.
Fix the memory leak by switching to devm_kzalloc().

Fixes: 8c9108d014 ("auxdisplay: add a driver for lcd2s character display")
Cc: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-03-03 00:30:14 +01:00
Andy Shevchenko
4424c35ead auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
It seems that the lcd2s_redefine_char() has never been properly
tested. The buffer is filled by DEF_CUSTOM_CHAR command followed
by the character number (from 0 to 7), but immediately after that
these bytes are rewritten by the decoded hex stream.

Fix the index to fill the buffer after the command and number.

Fixes: 8c9108d014 ("auxdisplay: add a driver for lcd2s character display")
Cc: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
[fixed typo in commit message]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-03-03 00:29:26 +01:00
Emmanuel Grumbach
e6e91ec966 iwlwifi: mvm: return value for request_ownership
Propagate the value to the user space so it can understand
if the operation failed or not.

Fixes: bfcfdb59b6 ("iwlwifi: mvm: add vendor commands needed for iwlmei")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://lore.kernel.org/r/20220302072715.4885-1-emmanuel.grumbach@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-02 22:37:25 +01:00
Sreeramya Soratkal
e50b88c4f0 nl80211: Update bss channel on channel switch for P2P_CLIENT
The wdev channel information is updated post channel switch only for
the station mode and not for the other modes. Due to this, the P2P client
still points to the old value though it moved to the new channel
when the channel change is induced from the P2P GO.

Update the bss channel after CSA channel switch completion for P2P client
interface as well.

Signed-off-by: Sreeramya Soratkal <quic_ssramya@quicinc.com>
Link: https://lore.kernel.org/r/1646114600-31479-1-git-send-email-quic_ssramya@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-02 22:37:05 +01:00
Randy Dunlap
875ad06015 iwlwifi: fix build error for IWLMEI
When CONFIG_IWLWIFI=m and CONFIG_IWLMEI=y, the kernel build system
must be told to build the iwlwifi/ subdirectory for both IWLWIFI and
IWLMEI so that builds for both =y and =m are done.

This resolves an undefined reference build error:

ERROR: modpost: "iwl_mei_is_connected" [drivers/net/wireless/intel/iwlwifi/iwlwifi.ko] undefined!

Fixes: 977df8bd58 ("wlwifi: work around reverse dependency on MEI")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: linux-wireless@vger.kernel.org
Link: https://lore.kernel.org/r/20220227200051.7176-1-rdunlap@infradead.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-02 22:36:49 +01:00
Linus Torvalds
92ebf5f91b Change since last update:
- Fix ztailpacking z_idataoff getting trimmed on > 4GiB filesystems.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYh98dxEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBJ5NAQDb7o+5L9lnmxyVnx4UvPnj4k0JKnNEqO7Q
 iCuVn31XQAD/Six/UARs13Tbvinvoigb1CWxSrLhh3QDumWwDl8l7Ak=
 =906m
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.17-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fix from Gao Xiang:
 "A one-line patch to fix the new ztailpacking feature on > 4GiB
  filesystems because z_idataoff can get trimmed improperly.

  ztailpacking is still a brand new EXPERIMENTAL feature, but it'd be
  better to fix the issue as soon as possible to avoid unnecessary
  backporting.

  Summary:

   - Fix ztailpacking z_idataoff getting trimmed on > 4GiB filesystems"

* tag 'erofs-for-5.17-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix ztailpacking on > 4GiB filesystems
2022-03-02 12:08:36 -08:00
Linus Torvalds
ae5f531d17 Bug fixes for sparse warning, intel port config offset, and a new
mailing list
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEoE9b9c3U2JxX98mqbmZLrHqL0iMFAmH7CEMACgkQbmZLrHqL
 0iM+jg/+Isu/FM2aS2g0jp4k8uLHjkXLbcnd2HktGQKU1ot0yfrySPM0s7eQb9PV
 RUnjuW9WDjr60j8bxmiMAHlDJNRiSGyV7UzntgvHg/PrwqDp8eZOKKfKA7NehQdQ
 riS7EOUfPK8eLxZfEcax5PS9TI/cS+U4HjbsAhZ08vT88PZCxPq2s/3yPYjtMDCA
 QTx14m4roO3Kp+IqHwwapVZO3m+mmBrAEzTuNudfgL0j8WUb9odb9212FEo4a0E1
 BKk3OcYwcn36Cftr8KU4kg96u9k+AuH908pDRGQcIQNSgbqpTxLbaM9j37wWdEG8
 v4M240pmJyG1/49Aj0htFJFmAgGlyeOpyAZXNrqfbiqyvR/e8xUR2QILtqc9gMg/
 TXzWkFy//SQN4dWe5EZT6WJVHR5mtSdGzNq4y35yXMZC6MRT5JKIisrnMJnIS1rA
 up0YiIoahVoV8KF/sXq2ko+kMoEySNXtyAma061NUploKyo0ymXu1XD+nPbGW6Mn
 m7OL/1eX8PQ4Hx2TmtLjE/sTEjPrcvkr0fC3AAhPCh87dvd/pbrz+LuyxQ6zNx6i
 qa1l+MHPhRWjfwdqBkyl5Yt43lit9wtNOAAxMima2VLRdewI7PHOF4+TGGcXPU70
 4uHf4IhgH7rbd3xfHP/sF+OGeE1jYPXyRXqxxaapkVC3pF8kGpc=
 =Isk3
 -----END PGP SIGNATURE-----

Merge tag 'ntb-5.17-bugfixes' of git://github.com/jonmason/ntb

Pull NTB fixes from Jon Mason:
 "Bug fixes for sparse warning, intel port config offset, and a new
  mailing list"

* tag 'ntb-5.17-bugfixes' of git://github.com/jonmason/ntb:
  MAINTAINERS: update mailing list address for NTB subsystem
  ntb: intel: fix port config status offset for SPR
  NTB/msi: Use struct_size() helper in devm_kzalloc()
2022-03-02 11:58:27 -08:00
Jonathan Lemon
90f8f4c0e3 ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
In ("ptp: ocp: Have FPGA fold in ns adjustment for adjtime."), the
ns adjustment was written to the FPGA register, so the clock could
accurately perform adjustments.

However, the adjtime() call passes in a s64, while the clock adjustment
registers use a s32.  When trying to perform adjustments with a large
value (37 sec), things fail.

Examine the incoming delta, and if larger than 1 sec, use the original
(coarse) adjustment method.  If smaller than 1 sec, then allow the
FPGA to fold in the changes over a 1 second window.

Fixes: 6d59d4fa17 ("ptp: ocp: Have FPGA fold in ns adjustment for adjtime.")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20220228203957.367371-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 09:51:21 -08:00
Paolo Bonzini
8d25b7beca KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run
kvm_arch_vcpu_ioctl_run is already doing srcu_read_lock/unlock in two
places, namely vcpu_run and post_kvm_run_save, and a third is actually
needed around the call to vcpu->arch.complete_userspace_io to avoid
the following splat:

  WARNING: suspicious RCU usage
  arch/x86/kvm/pmu.c:190 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by CPU 28/KVM/370841:
  #0: ff11004089f280b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x87/0x730 [kvm]
  Call Trace:
   <TASK>
   dump_stack_lvl+0x59/0x73
   reprogram_fixed_counter+0x15d/0x1a0 [kvm]
   kvm_pmu_trigger_event+0x1a3/0x260 [kvm]
   ? free_moved_vector+0x1b4/0x1e0
   complete_fast_pio_in+0x8a/0xd0 [kvm]

This splat is not at all unexpected, since complete_userspace_io callbacks
can execute similar code to vmexits.  For example, SVM with nrips=false
will call into the emulator from svm_skip_emulated_instruction().

While it's tempting to never acquire kvm->srcu for an uninitialized vCPU,
practically speaking there's no penalty to acquiring kvm->srcu "early"
as the KVM_MP_STATE_UNINITIALIZED path is a one-time thing per vCPU.  On
the other hand, seemingly innocuous helpers like kvm_apic_accept_events()
and sync_regs() can theoretically reach code that might access
SRCU-protected data structures, e.g. sync_regs() can trigger forced
existing of nested mode via kvm_vcpu_ioctl_x86_set_vcpu_events().

Reported-by: Like Xu <likexu@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-02 10:55:58 -05:00
Like Xu
c6c937d673 KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
Just like on the optional mmu_alloc_direct_roots() path, once shadow
path reaches "r = -EIO" somewhere, the caller needs to know the actual
state in order to enter error handling and avoid something worse.

Fixes: 4a38162ee9 ("KVM: MMU: load PDPTRs outside mmu_lock")
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220301124941.48412-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-02 10:55:58 -05:00
Filipe Manana
4751dc9962 btrfs: add missing run of delayed items after unlink during log replay
During log replay, whenever we need to check if a name (dentry) exists in
a directory we do searches on the subvolume tree for inode references or
or directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY
keys as well, before kernel 5.17). However when during log replay we
unlink a name, through btrfs_unlink_inode(), we may not delete inode
references and dir index keys from a subvolume tree and instead just add
the deletions to the delayed inode's delayed items, which will only be
run when we commit the transaction used for log replay. This means that
after an unlink operation during log replay, if we attempt to search for
the same name during log replay, we will not see that the name was already
deleted, since the deletion is recorded only on the delayed items.

We run delayed items after every unlink operation during log replay,
except at unlink_old_inode_refs() and at add_inode_ref(). This was due
to an overlook, as delayed items should be run after evert unlink, for
the reasons stated above.

So fix those two cases.

Fixes: 0d836392ca ("Btrfs: fix mount failure after fsync due to hard link recreation")
Fixes: 1f250e929a ("Btrfs: fix log replay failure after unlink and link combination")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02 16:53:11 +01:00
Sidong Yang
d4aef1e122 btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
The commit e804861bd4 ("btrfs: fix deadlock between quota disable and
qgroup rescan worker") by Kawasaki resolves deadlock between quota
disable and qgroup rescan worker. But also there is a deadlock case like
it. It's about enabling or disabling quota and creating or removing
qgroup. It can be reproduced in simple script below.

for i in {1..100}
do
    btrfs quota enable /mnt &
    btrfs qgroup create 1/0 /mnt &
    btrfs qgroup destroy 1/0 /mnt &
    btrfs quota disable /mnt &
done

Here's why the deadlock happens:

1) The quota rescan task is running.

2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock
   mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for
   the quota rescan task to complete.

3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock
   the qgroup_ioctl_lock mutex, because it's being held by task A. At that
   point task B is holding a transaction handle for the current transaction.

4) The quota rescan task calls btrfs_commit_transaction(). This results
   in it waiting for all other tasks to release their handles on the
   transaction, but task B is blocked on the qgroup_ioctl_lock mutex
   while holding a handle on the transaction, and that mutex is being held
   by task A, which is waiting for the quota rescan task to complete,
   resulting in a deadlock between these 3 tasks.

To resolve this issue, the thread disabling quota should unlock
qgroup_ioctl_lock before waiting rescan completion. Move
btrfs_qgroup_wait_for_completion() after unlock of qgroup_ioctl_lock.

Fixes: e804861bd4 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02 16:53:04 +01:00
Omar Sandoval
5fd76bf31c btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
We are seeing crashes similar to the following trace:

[38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
[38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54
[38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
[38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206
[38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14
[38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360
[38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000
[38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800
[38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360
[38.987146] FS:  00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000
[38.988662] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0
[38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[38.992528] Call Trace:
[38.992854]  <TASK>
[38.993148]  btrfs_relocate_chunk+0x27/0xe0 [btrfs]
[38.993941]  btrfs_balance+0x78e/0xea0 [btrfs]
[38.994801]  ? vsnprintf+0x33c/0x520
[38.995368]  ? __kmalloc_track_caller+0x351/0x440
[38.996198]  btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs]
[38.997084]  btrfs_ioctl+0x11b0/0x2da0 [btrfs]
[38.997867]  ? mod_objcg_state+0xee/0x340
[38.998552]  ? seq_release+0x24/0x30
[38.999184]  ? proc_nr_files+0x30/0x30
[38.999654]  ? call_rcu+0xc8/0x2f0
[39.000228]  ? __x64_sys_ioctl+0x84/0xc0
[39.000872]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[39.001973]  __x64_sys_ioctl+0x84/0xc0
[39.002566]  do_syscall_64+0x3a/0x80
[39.003011]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[39.003735] RIP: 0033:0x7f11c166959b
[39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b
[39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003
[39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0
[39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3
[39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001
[39.015040]  </TASK>
[39.015418] ---[ end trace 0000000000000000 ]---
[43.131559] ------------[ cut here ]------------
[43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717!
[43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G        W         5.17.0-rc4 #54
[43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
[43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
[43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
[43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
[43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
[43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
[43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
[43.145686] FS:  00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000
[43.146808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0
[43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[43.150559] Call Trace:
[43.150904]  <TASK>
[43.151253]  btrfs_finish_extent_commit+0x88/0x290 [btrfs]
[43.152127]  btrfs_commit_transaction+0x74f/0xaa0 [btrfs]
[43.152932]  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
[43.153786]  btrfs_ioctl+0x1edc/0x2da0 [btrfs]
[43.154475]  ? __check_object_size+0x150/0x170
[43.155170]  ? preempt_count_add+0x49/0xa0
[43.155753]  ? __x64_sys_ioctl+0x84/0xc0
[43.156437]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[43.157456]  __x64_sys_ioctl+0x84/0xc0
[43.157980]  do_syscall_64+0x3a/0x80
[43.158543]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[43.159231] RIP: 0033:0x7f7657f1e59b
[43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b
[43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003
[43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800
[43.166863]  </TASK>
[43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common
[43.169552] ---[ end trace 0000000000000000 ]---
[43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
[43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
[43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
[43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
[43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
[43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
[43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
[43.180071] FS:  00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000
[43.181073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0
[43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

We first hit the WARN_ON(rc->block_group->pinned > 0) in
btrfs_relocate_block_group() and then the BUG_ON(!cache) in
unpin_extent_range(). This tells us that we are exiting relocation and
removing the block group with bytes still pinned for that block group.
This is supposed to be impossible: the last thing relocate_block_group()
does is commit the transaction to get rid of pinned extents.

Commit d0c2f4fa55 ("btrfs: make concurrent fsyncs wait less when
waiting for a transaction commit") introduced an optimization so that
commits from fsync don't have to wait for the previous commit to unpin
extents. This was only intended to affect fsync, but it inadvertently
made it possible for any commit to skip waiting for the previous commit
to unpin. This is because if a call to btrfs_commit_transaction() finds
that another thread is already committing the transaction, it waits for
the other thread to complete the commit and then returns. If that other
thread was in fsync, then it completes the commit without completing the
previous commit. This makes the following sequence of events possible:

Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________
btrfs_commit_transaction(N) |                                     |
  btrfs_run_delayed_refs    |                                     |
    pin extents             |                                     |
  ...                       |                                     |
  state = UNBLOCKED         |btrfs_sync_file                      |
                            |  btrfs_start_transaction(N + 1)     |relocate_block_group
                            |                                     |  btrfs_join_transaction(N + 1)
                            |  btrfs_commit_transaction(N + 1)    |
  ...                       |  trans->state = COMMIT_START        |
                            |                                     |  btrfs_commit_transaction(N + 1)
                            |                                     |    wait_for_commit(N + 1, COMPLETED)
                            |  wait_for_commit(N, SUPER_COMMITTED)|
  state = SUPER_COMMITTED   |  ...                                |
  btrfs_finish_extent_commit|                                     |
    unpin_extent_range()    |  trans->state = COMPLETED           |
                            |                                     |    return
                            |                                     |
    ...                     |                                     |Thread 1 isn't done, so pinned > 0
                            |                                     |and we WARN
                            |                                     |
                            |                                     |btrfs_remove_block_group
    unpin_extent_range()    |                                     |
      Thread 3 removed the  |                                     |
      block group, so we BUG|                                     |

There are other sequences involving SUPER_COMMITTED transactions that
can cause a similar outcome.

We could fix this by making relocation explicitly wait for unpinning,
but there may be other cases that need it. Josef mentioned ENOSPC
flushing and the free space cache inode as other potential victims.
Rather than playing whack-a-mole, this fix is conservative and makes all
commits not in fsync wait for all previous transactions, which is what
the optimization intended.

Fixes: d0c2f4fa55 ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02 16:52:46 +01:00
Josef Bacik
b4be6aefa7 btrfs: do not start relocation until in progress drops are done
We hit a bug with a recovering relocation on mount for one of our file
systems in production.  I reproduced this locally by injecting errors
into snapshot delete with balance running at the same time.  This
presented as an error while looking up an extent item

  WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
  CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
  RIP: 0010:lookup_inline_extent_backref+0x647/0x680
  RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
  RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
  R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
  R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
  FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
  Call Trace:
   <TASK>
   insert_inline_extent_backref+0x46/0xd0
   __btrfs_inc_extent_ref.isra.0+0x5f/0x200
   ? btrfs_merge_delayed_refs+0x164/0x190
   __btrfs_run_delayed_refs+0x561/0xfa0
   ? btrfs_search_slot+0x7b4/0xb30
   ? btrfs_update_root+0x1a9/0x2c0
   btrfs_run_delayed_refs+0x73/0x1f0
   ? btrfs_update_root+0x1a9/0x2c0
   btrfs_commit_transaction+0x50/0xa50
   ? btrfs_update_reloc_root+0x122/0x220
   prepare_to_merge+0x29f/0x320
   relocate_block_group+0x2b8/0x550
   btrfs_relocate_block_group+0x1a6/0x350
   btrfs_relocate_chunk+0x27/0xe0
   btrfs_balance+0x777/0xe60
   balance_kthread+0x35/0x50
   ? btrfs_balance+0xe60/0xe60
   kthread+0x16b/0x190
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x22/0x30
   </TASK>

Normally snapshot deletion and relocation are excluded from running at
the same time by the fs_info->cleaner_mutex.  However if we had a
pending balance waiting to get the ->cleaner_mutex, and a snapshot
deletion was running, and then the box crashed, we would come up in a
state where we have a half deleted snapshot.

Again, in the normal case the snapshot deletion needs to complete before
relocation can start, but in this case relocation could very well start
before the snapshot deletion completes, as we simply add the root to the
dead roots list and wait for the next time the cleaner runs to clean up
the snapshot.

Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that
had a pending drop_progress key.  If they do then we know we were in the
middle of the drop operation and set a flag on the fs_info.  Then
balance can wait until this flag is cleared to start up again.

If there are DEAD_ROOT's that don't have a drop_progress set then we're
safe to start balance right away as we'll be properly protected by the
cleaner_mutex.

CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02 16:52:39 +01:00
Su Yue
a6ab66eb85 btrfs: tree-checker: use u64 for item data end to avoid overflow
User reported there is an array-index-out-of-bounds access while
mounting the crafted image:

  [350.411942 ] loop0: detected capacity change from 0 to 262144
  [350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044)
  [350.428564 ] BTRFS info (device loop0): disk space caching is enabled
  [350.428568 ] BTRFS info (device loop0): has skinny extents
  [350.429589 ]
  [350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
  [350.429636 ] index 1048096 is out of range for type 'page *[16]'
  [350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4
  [350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
  [350.429772 ] Call Trace:
  [350.429774 ]  <TASK>
  [350.429776 ]  dump_stack_lvl+0x47/0x5c
  [350.429780 ]  ubsan_epilogue+0x5/0x50
  [350.429786 ]  __ubsan_handle_out_of_bounds+0x66/0x70
  [350.429791 ]  btrfs_get_16+0xfd/0x120 [btrfs]
  [350.429832 ]  check_leaf+0x754/0x1a40 [btrfs]
  [350.429874 ]  ? filemap_read+0x34a/0x390
  [350.429878 ]  ? load_balance+0x175/0xfc0
  [350.429881 ]  validate_extent_buffer+0x244/0x310 [btrfs]
  [350.429911 ]  btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
  [350.429935 ]  end_bio_extent_readpage+0x3af/0x850 [btrfs]
  [350.429969 ]  ? newidle_balance+0x259/0x480
  [350.429972 ]  end_workqueue_fn+0x29/0x40 [btrfs]
  [350.429995 ]  btrfs_work_helper+0x71/0x330 [btrfs]
  [350.430030 ]  ? __schedule+0x2fb/0xa40
  [350.430033 ]  process_one_work+0x1f6/0x400
  [350.430035 ]  ? process_one_work+0x400/0x400
  [350.430036 ]  worker_thread+0x2d/0x3d0
  [350.430037 ]  ? process_one_work+0x400/0x400
  [350.430038 ]  kthread+0x165/0x190
  [350.430041 ]  ? set_kthread_struct+0x40/0x40
  [350.430043 ]  ret_from_fork+0x1f/0x30
  [350.430047 ]  </TASK>
  [350.430047 ]
  [350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2

btrfs check reports:
  corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected
  item end, have 4294971193 expect 3897

The first slot item offset is 4293005033 and the size is 1966160.
In check_leaf, we use btrfs_item_end() to check item boundary versus
extent_buffer data size. However, return type of btrfs_item_end() is u32.
(u32)(4293005033 + 1966160) == 3897, overflow happens and the result 3897
equals to leaf data size reasonably.

Fix it by use u64 variable to store item data end in check_leaf() to
avoid u32 overflow.

This commit does solve the invalid memory access showed by the stack
trace.  However, its metadata profile is DUP and another copy of the
leaf is fine.  So the image can be mounted successfully. But when umount
is called, the ASSERT btrfs_mark_buffer_dirty() will be triggered
because the only node in extent tree has 0 item and invalid owner. It's
solved by another commit
"btrfs: check extent buffer owner against the owner rootid".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02 16:52:32 +01:00
Josef Bacik
a50e1fcbc9 btrfs: do not WARN_ON() if we have PageError set
Whenever we do any extent buffer operations we call
assert_eb_page_uptodate() to complain loudly if we're operating on an
non-uptodate page.  Our overnight tests caught this warning earlier this
week

  WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50
  CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G        W         5.17.0-rc3+ #564
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Workqueue: btrfs-cache btrfs_work_helper
  RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
  RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246
  RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000
  RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0
  RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000
  R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1
  R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000
  FS:  0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0
  Call Trace:

   extent_buffer_test_bit+0x3f/0x70
   free_space_test_bit+0xa6/0xc0
   load_free_space_tree+0x1f6/0x470
   caching_thread+0x454/0x630
   ? rcu_read_lock_sched_held+0x12/0x60
   ? rcu_read_lock_sched_held+0x12/0x60
   ? rcu_read_lock_sched_held+0x12/0x60
   ? lock_release+0x1f0/0x2d0
   btrfs_work_helper+0xf2/0x3e0
   ? lock_release+0x1f0/0x2d0
   ? finish_task_switch.isra.0+0xf9/0x3a0
   process_one_work+0x26d/0x580
   ? process_one_work+0x580/0x580
   worker_thread+0x55/0x3b0
   ? process_one_work+0x580/0x580
   kthread+0xf0/0x120
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x1f/0x30

This was partially fixed by c2e3930529 ("btrfs: clear extent buffer
uptodate when we fail to write it"), however all that fix did was keep
us from finding extent buffers after a failed writeout.  It didn't keep
us from continuing to use a buffer that we already had found.

In this case we're searching the commit root to cache the block group,
so we can start committing the transaction and switch the commit root
and then start writing.  After the switch we can look up an extent
buffer that hasn't been written yet and start processing that block
group.  Then we fail to write that block out and clear Uptodate on the
page, and then we start spewing these errors.

Normally we're protected by the tree lock to a certain degree here.  If
we read a block we have that block read locked, and we block the writer
from locking the block before we submit it for the write.  However this
isn't necessarily fool proof because the read could happen before we do
the submit_bio and after we locked and unlocked the extent buffer.

Also in this particular case we have path->skip_locking set, so that
won't save us here.  We'll simply get a block that was valid when we
read it, but became invalid while we were using it.

What we really want is to catch the case where we've "read" a block but
it's not marked Uptodate.  On read we ClearPageError(), so if we're
!Uptodate and !Error we know we didn't do the right thing for reading
the page.

Fix this by checking !Uptodate && !Error, this way we will not complain
if our buffer gets invalidated while we're using it, and we'll maintain
the spirit of the check which is to make sure we have a fully in-cache
block while we're messing with it.

CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02 16:52:24 +01:00
Filipe Manana
d994788743 btrfs: fix lost prealloc extents beyond eof after full fsync
When doing a full fsync, if we have prealloc extents beyond (or at) eof,
and the leaves that contain them were not modified in the current
transaction, we end up not logging them. This results in losing those
extents when we replay the log after a power failure, since the inode is
truncated to the current value of the logged i_size.

Just like for the fast fsync path, we need to always log all prealloc
extents starting at or beyond i_size. The fast fsync case was fixed in
commit 471d557afe ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay") but it missed the full fsync path. The problem
exists since the very early days, when the log tree was added by
commit e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize
synchronous operations").

Example reproducer:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  # Create our test file with many file extent items, so that they span
  # several leaves of metadata, even if the node/page size is 64K. Use
  # direct IO and not fsync/O_SYNC because it's both faster and it avoids
  # clearing the full sync flag from the inode - we want the fsync below
  # to trigger the slow full sync code path.
  $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo

  # Now add two preallocated extents to our file without extending the
  # file's size. One right at i_size, and another further beyond, leaving
  # a gap between the two prealloc extents.
  $ xfs_io -c "falloc -k 16M 1M" /mnt/foo
  $ xfs_io -c "falloc -k 20M 1M" /mnt/foo

  # Make sure everything is durably persisted and the transaction is
  # committed. This makes all created extents to have a generation lower
  # than the generation of the transaction used by the next write and
  # fsync.
  sync

  # Now overwrite only the first extent, which will result in modifying
  # only the first leaf of metadata for our inode. Then fsync it. This
  # fsync will use the slow code path (inode full sync bit is set) because
  # it's the first fsync since the inode was created/loaded.
  $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo

  # Extent list before power failure.
  $ xfs_io -c "fiemap -v" /mnt/foo
  /mnt/foo:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..7]:          2178048..2178055     8   0x0
     1: [8..16383]:      26632..43007     16376   0x0
     2: [16384..32767]:  2156544..2172927 16384   0x0
     3: [32768..34815]:  2172928..2174975  2048 0x800
     4: [34816..40959]:  hole              6144
     5: [40960..43007]:  2174976..2177023  2048 0x801

  <power fail>

  # Mount fs again, trigger log replay.
  $ mount /dev/sdc /mnt

  # Extent list after power failure and log replay.
  $ xfs_io -c "fiemap -v" /mnt/foo
  /mnt/foo:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..7]:          2178048..2178055     8   0x0
     1: [8..16383]:      26632..43007     16376   0x0
     2: [16384..32767]:  2156544..2172927 16384   0x1

  # The prealloc extents at file offsets 16M and 20M are missing.

So fix this by calling btrfs_log_prealloc_extents() when we are doing a
full fsync, so that we always log all prealloc extents beyond eof.

A test case for fstests will follow soon.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02 16:51:55 +01:00
Qu Wenruo
c992fa1fd5 btrfs: subpage: fix a wrong check on subpage->writers
[BUG]
When looping btrfs/074 with 64K page size and 4K sectorsize, there is a
low chance (1/50~1/100) to crash with the following ASSERT() triggered
in btrfs_subpage_start_writer():

	ret = atomic_add_return(nbits, &subpage->writers);
	ASSERT(ret == nbits); <<< This one <<<

[CAUSE]
With more debugging output on the parameters of
btrfs_subpage_start_writer(), it shows a very concerning error:

  ret=29 nbits=13 start=393216 len=53248

For @nbits it's correct, but @ret which is the returned value from
atomic_add_return(), it's not only larger than nbits, but also larger
than max sectors per page value (for 64K page size and 4K sector size,
it's 16).

This indicates that some call sites are not properly decreasing the value.

And that's exactly the case, in btrfs_page_unlock_writer(), due to the
fact that we can have page locked either by lock_page() or
process_one_page(), we have to check if the subpage has any writer.

If no writers, it's locked by lock_page() and we only need to unlock it.

But unfortunately the check for the writers are completely opposite:

	if (atomic_read(&subpage->writers))
		/* No writers, locked by plain lock_page() */
		return unlock_page(page);

We directly unlock the page if it has writers, which is the completely
opposite what we want.

Thankfully the affected call site is only limited to
extent_write_locked_range(), so it's mostly affecting compressed write.

[FIX]
Just fix the wrong check condition to fix the bug.

Fixes: e55a0de185 ("btrfs: rework page locking in __extent_writepage()")
CC: stable@vger.kernel.org # 5.16
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02 16:51:39 +01:00
Gao Xiang
22ba5e99b9 erofs: fix ztailpacking on > 4GiB filesystems
z_idataoff here is an absolute physical offset, so it should use
erofs_off_t (64 bits at least). Otherwise, it'll get trimmed and
cause the decompresion failure.

Link: https://lore.kernel.org/r/20220222033118.20540-1-hsiangkao@linux.alibaba.com
Fixes: ab92184ff8 ("erofs: add on-disk compressed tail-packing inline support")
Reviewed-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-03-02 21:58:45 +08:00
Zhen Ni
0aa6b294b3 ALSA: intel_hdmi: Fix reference to PCM buffer address
PCM buffers might be allocated dynamically when the buffer
preallocation failed or a larger buffer is requested, and it's not
guaranteed that substream->dma_buffer points to the actually used
buffer.  The driver needs to refer to substream->runtime->dma_addr
instead for the buffer address.

Signed-off-by: Zhen Ni <nizhen@uniontech.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220302074241.30469-1-nizhen@uniontech.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-03-02 09:25:37 +01:00
Sven Eckelmann
6c1f41afc1 batman-adv: Don't expect inter-netns unique iflink indices
The ifindex doesn't have to be unique for multiple network namespaces on
the same machine.

  $ ip netns add test1
  $ ip -net test1 link add dummy1 type dummy
  $ ip netns add test2
  $ ip -net test2 link add dummy2 type dummy

  $ ip -net test1 link show dev dummy1
  6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
  $ ip -net test2 link show dev dummy2
  6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff

But the batman-adv code to walk through the various layers of virtual
interfaces uses this assumption because dev_get_iflink handles it
internally and doesn't return the actual netns of the iflink. And
dev_get_iflink only documents the situation where ifindex == iflink for
physical devices.

But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
because ipoib_get_iflink implements it even when it sometimes returns an
iflink != ifindex and sometimes iflink == ifindex. The caller must
therefore make sure itself to check both netns and iflink + ifindex for
equality. Only when they are equal, a "physical" interface was detected
which should stop the traversal. On the other hand, vxcan_get_iflink can
also return 0 in case there was currently no valid peer. In this case, it
is still necessary to stop.

Fixes: b7eddd0b39 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
Fixes: 5ed4a460a1 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:24:55 +01:00
Sven Eckelmann
6116ba0942 batman-adv: Request iflink once in batadv_get_real_netdevice
There is no need to call dev_get_iflink multiple times for the same
net_device in batadv_get_real_netdevice. And since some of the
ndo_get_iflink callbacks are dynamic (for example via RCUs like in
vxcan_get_iflink), it could easily happen that the returned values are not
stable. The pre-checks before __dev_get_by_index are then of course bogus.

Fixes: 5ed4a460a1 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:01:28 +01:00
Sven Eckelmann
690bb6fb64 batman-adv: Request iflink once in batadv-on-batadv check
There is no need to call dev_get_iflink multiple times for the same
net_device in batadv_is_on_batman_iface. And since some of the
.ndo_get_iflink callbacks are dynamic (for example via RCUs like in
vxcan_get_iflink), it could easily happen that the returned values are not
stable. The pre-checks before __dev_get_by_index are then of course bogus.

Fixes: b7eddd0b39 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02 09:01:25 +01:00
Hans de Goede
04b7762e37 Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Before these changes elan_suspend() would only disable the regulator
when device_may_wakeup() returns false; whereas elan_resume() would
unconditionally enable it, leading to an enable count imbalance when
device_may_wakeup() returns true.

This triggers the "WARN_ON(regulator->enable_count)" in regulator_put()
when the elan_i2c driver gets unbound, this happens e.g. with the
hot-plugable dock with Elan I2C touchpad for the Asus TF103C 2-in-1.

Fix this by making the regulator_enable() call also be conditional
on device_may_wakeup() returning false.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220131135436.29638-2-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-03-01 20:41:22 -08:00
Hans de Goede
81a36d8ce5 Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
elan_disable_power() is called conditionally on suspend, where as
elan_enable_power() is always called on resume. This leads to
an imbalance in the regulator's enable count.

Move the regulator_[en|dis]able() calls out of elan_[en|dis]able_power()
in preparation of fixing this.

No functional changes intended.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220131135436.29638-1-hdegoede@redhat.com
[dtor: consolidate elan_[en|dis]able() into elan_set_power()]
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-03-01 20:41:20 -08:00
Steven Rostedt (Google)
1d1898f656 tracing/histogram: Fix sorting on old "cpu" value
When trying to add a histogram against an event with the "cpu" field, it
was impossible due to "cpu" being a keyword to key off of the running CPU.
So to fix this, it was changed to "common_cpu" to match the other generic
fields (like "common_pid"). But since some scripts used "cpu" for keying
off of the CPU (for events that did not have "cpu" as a field, which is
most of them), a backward compatibility trick was added such that if "cpu"
was used as a key, and the event did not have "cpu" as a field name, then
it would fallback and switch over to "common_cpu".

This fix has a couple of subtle bugs. One was that when switching over to
"common_cpu", it did not change the field name, it just set a flag. But
the code still found a "cpu" field. The "cpu" field is used for filtering
and is returned when the event does not have a "cpu" field.

This was found by:

  # cd /sys/kernel/tracing
  # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
  # cat events/sched/sched_wakeup/hist

Which showed the histogram unsorted:

{ cpu:         19, pid:       1175 } hitcount:          1
{ cpu:          6, pid:        239 } hitcount:          2
{ cpu:         23, pid:       1186 } hitcount:         14
{ cpu:         12, pid:        249 } hitcount:          2
{ cpu:          3, pid:        994 } hitcount:          5

Instead of hard coding the "cpu" checks, take advantage of the fact that
trace_event_field_field() returns a special field for "cpu" and "CPU" if
the event does not have "cpu" as a field. This special field has the
"filter_type" of "FILTER_CPU". Check that to test if the returned field is
of the CPU type instead of doing the string compare.

Also, fix the sorting bug by testing for the hist_field flag of
HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
the special CPU field to know what compare routine to use, and since that
special field does not have a size, it returns tracing_map_cmp_none.

Cc: stable@vger.kernel.org
Fixes: 1e3bac71c5 ("tracing/histogram: Rename "cpu" to "common_cpu"")
Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-01 22:48:30 -05:00
Vladimir Oltean
0b0e2ff103 net: dsa: restore error path of dsa_tree_change_tag_proto
When the DSA_NOTIFIER_TAG_PROTO returns an error, the user space process
which initiated the protocol change exits the kernel processing while
still holding the rtnl_mutex. So any other process attempting to lock
the rtnl_mutex would deadlock after such event.

The error handling of DSA_NOTIFIER_TAG_PROTO was inadvertently changed
by the blamed commit, introducing this regression. We must still call
rtnl_unlock(), and we must still call DSA_NOTIFIER_TAG_PROTO for the old
protocol. The latter is due to the limiting design of notifier chains
for cross-chip operations, which don't have a built-in error recovery
mechanism - we should look into using notifier_call_chain_robust for that.

Fixes: dc452a471d ("net: dsa: introduce tagger-owned storage for private and shared data")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220228141715.146485-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01 18:26:21 -08:00
Jakub Kicinski
2e77551c61 bluetooth pull request for net:
- Fix regression with scanning not working in some systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmIevJQZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKTWBD/0ZN7d1sClyVmH/ymc5l+77
 8HXYPCbhDez1Tix4m2cS5demyAJj+T1YQE5tAmtiTZT2NyWphWNgmLOKC4VLTjmu
 YVSjyvJfn7lWgBFPuv2G8XatqELnsYNaq86urPmxHpwohbNFsT4IVPMWgUicPEjj
 iFLU6MY2Pskkinh72FaRHmOGpDN8v7KIL4RnFbR1DZq2fNLVYTVexQcmAsqOIoP5
 2GLuTkmXnYlTplbqbvJMxFbUBOQO8MBaQlR2+n7nEctZ9NhhmyrayrCoCQ8PxLEQ
 9zhgzJFtgpJ4Nr45TOCBPZ+hBk3hNaqYM42stJ+3nydYKupSNu4ccCiJ181FxNuy
 R2YweVz+/1MkTbhFAuhxOLh0QQ3JWlKCRPqduT6q/VxyTR19hOrGP2eIiPxHr8k6
 b1u0i4ExgyjdJJo29Mw4UqFs2mGXDYo8FvVz0FV5xabfyto2QuFi4dV1V7NxKsCI
 XSrpgPBntOR/WRV36bO+64NknEGIwyfg/BBTAu7kQvVfGH9mDKX26C3UxfBr2/s+
 pK7bRALyYpVceI1rZA9kHFVCrX4iLSkUyTNF9voj0khifnRaRR7xfWumdG8PEJpf
 2kE9Bn4Vi4czz3kwLK4AXsoLABHI3aiyt30CMw7xCqoDD1FSzebbfzaJDKdelrgJ
 BUnmOMQgkj6OhMlMz/THvg==
 =qugn
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix regression with scanning not working in some systems.

* tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: Fix not checking MGMT cmd pending queue
====================

Link: https://lore.kernel.org/r/20220302004330.125536-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01 17:16:46 -08:00
Brian Gix
275f3f6487 Bluetooth: Fix not checking MGMT cmd pending queue
A number of places in the MGMT handlers we examine the command queue for
other commands (in progress but not yet complete) that will interact
with the process being performed. However, not all commands go into the
queue if one of:

1. There is no negative side effect of consecutive or redundent commands
2. The command is entirely perform "inline".

This change examines each "pending command" check, and if it is not
needed, deletes the check. Of the remaining pending command checks, we
make sure that the command is in the pending queue by using the
mgmt_pending_add/mgmt_pending_remove pair rather than the
mgmt_pending_new/mgmt_pending_free pair.

Link: https://lore.kernel.org/linux-bluetooth/f648f2e11bb3c2974c32e605a85ac3a9fac944f1.camel@redhat.com/T/
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-03-01 16:10:58 -08:00
Jakub Kicinski
4761df52f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Use kfree_rcu(ptr, rcu) variant, using kfree_rcu(ptr) was not
   intentional. From Eric Dumazet.

2) Use-after-free in netfilter hook core, from Eric Dumazet.

3) Missing rcu read lock side for netfilter egress hook,
   from Florian Westphal.

4) nf_queue assume state->sk is full socket while it might not be.
   Invoke sock_gen_put(), from Florian Westphal.

5) Add selftest to exercise the reported KASAN splat in 4)

6) Fix possible use-after-free in nf_queue in case sk_refcnt is 0.
   Also from Florian.

7) Use input interface index only for hardware offload, not for
   the software plane. This breaks tc ct action. Patch from Paul Blakey.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
  netfilter: nf_queue: handle socket prefetch
  netfilter: nf_queue: fix possible use-after-free
  selftests: netfilter: add nfqueue TCP_NEW_SYN_RECV socket race test
  netfilter: nf_queue: don't assume sk is full socket
  netfilter: egress: silence egress hook lockdep splats
  netfilter: fix use-after-free in __nf_register_net_hook()
  netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant
====================

Link: https://lore.kernel.org/r/20220301215337.378405-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01 15:13:47 -08:00
Paul Blakey
db6140e5e3 net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
After cited commit optimizted hw insertion, flow table entries are
populated with ifindex information which was intended to only be used
for HW offload. This tuple ifindex is hashed in the flow table key, so
it must be filled for lookup to be successful. But tuple ifindex is only
relevant for the netfilter flowtables (nft), so it's not filled in
act_ct flow table lookup, resulting in lookup failure, and no SW
offload and no offload teardown for TCP connection FIN/RST packets.

To fix this, add new tc ifindex field to tuple, which will
only be used for offloading, not for lookup, as it will not be
part of the tuple hash.

Fixes: 9795ded7f9 ("net/sched: act_ct: Fill offloading tuple iifidx")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-01 22:08:31 +01:00
Linus Torvalds
fb184c4af9 The bigger part of the change is a revert for x86 hosts. Here the
second patch was supposed to fix the first, but in reality it was
 just as broken, so both have to go.
 
 x86 host:
 
 * Revert incorrect assumption that cr3 changes come with preempt notifier
   callbacks (they don't when static branches are changed, for example)
 
 ARM host:
 
 * Correctly synchronise PMR and co on PSCI CPU_SUSPEND
 
 * Skip tests that depend on GICv3 when the HW isn't available
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIeGnUUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMCYgf9GPVUOQUbHVxVqvKB1ABnY3ZIZuS3
 +/XLgVifSmXb2sPQmcKIPk7eQkxlzpnVdbznJO5qFMtVKRv/ppj+/ly2wwF8l+rR
 m1XvyYo2sukK5vTpBrQiRm3aWY7vpx0ds4DStLnrnBuPF/U7x6WlHSL/BqXaNcSJ
 e+SXd/UFhkg7dEQaU3eqXyf2/mMfR2ZLdUb4v+/UiV7kfzzvRqNERd8HUoVk2FcM
 VYBr07ChaV4XB/dZsCDVSz2Z7f7rH3sMMW82ZHKjuFUEW4Dij9NiX2ycaeRvkSLG
 tnliTuROCY2bOQeIVCTHf5XqCAAm7sA1AoClFaUy30+UW9s9j45NuhUQbA==
 =nuHK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "The bigger part of the change is a revert for x86 hosts. Here the
  second patch was supposed to fix the first, but in reality it was just
  as broken, so both have to go.

  x86 host:

   - Revert incorrect assumption that cr3 changes come with preempt
     notifier callbacks (they don't when static branches are changed,
     for example)

  ARM host:

   - Correctly synchronise PMR and co on PSCI CPU_SUSPEND

   - Skip tests that depend on GICv3 when the HW isn't available"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: aarch64: Skip tests if we can't create a vgic-v3
  Revert "KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()"
  Revert "KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()"
  KVM: arm64: Don't miss pending interrupts for suspended vCPU
2022-03-01 12:01:18 -08:00
Heiko Carstens
c194dad210 s390/extable: fix exception table sorting
s390 has a swap_ex_entry_fixup function, however it is not being used
since common code expects a swap_ex_entry_fixup define. If it is not
defined the default implementation will be used. So fix this by adding
a proper define.
However also the implementation of the function must be fixed, since a
NULL value for handler has a special meaning and must not be adjusted.

Luckily all of this doesn't fix a real bug currently: the main extable
is correctly sorted during build time, and for runtime sorting there
is currently no case where the handler field is not NULL.

Fixes: 05a68e892e ("s390/kernel: expand exception table logic to allow new handling options")
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 20:41:28 +01:00
Heiko Carstens
1389f17937 s390/ftrace: fix arch_ftrace_get_regs implementation
arch_ftrace_get_regs is supposed to return a struct pt_regs pointer
only if the pt_regs structure contains all register contents, which
means it must have been populated when created via ftrace_regs_caller.

If it was populated via ftrace_caller the contents are not complete
(the psw mask part is missing), and therefore a NULL pointer needs be
returned.

The current code incorrectly always returns a struct pt_regs pointer.

Fix this by adding another pt_regs flag which indicates if the
contents are complete, and fix arch_ftrace_get_regs accordingly.

Fixes: 894979689d ("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 20:41:28 +01:00
Heiko Carstens
9fa881f7e3 s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
ftrace_caller was used for both ftrace_caller and ftrace_regs_caller,
which means that the target address of the hotpatch trampoline was
never updated.

With commit 894979689d ("s390/ftrace: provide separate
ftrace_caller/ftrace_regs_caller implementations") a separate
ftrace_regs_caller entry point was implemeted, however it was
forgotten to implement the necessary changes for ftrace_modify_call
and ftrace_make_call, where the branch target has to be modified
accordingly.

Therefore add the missing code now.

Fixes: 894979689d ("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 20:41:28 +01:00
Alexander Egorenkov
6b4b54c7ca s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
We need to preserve the values at OLDMEM_BASE and OLDMEM_SIZE which are
used by zgetdump in case when kdump crashes. In that case zgetdump will
attempt to read OLDMEM_BASE and OLDMEM_SIZE in order to find out where
the memory range [0 - OLDMEM_SIZE] belonging to the production kernel is.

Fixes: f1a5469474 ("s390/setup: don't reserve memory that occupied decompressor's head")
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-01 20:41:28 +01:00
Manasi Navare
62929726ef drm/vrr: Set VRR capable prop only if it is attached to connector
VRR capable property is not attached by default to the connector
It is attached only if VRR is supported.
So if the driver tries to call drm core set prop function without
it being attached that causes NULL dereference.

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225013055.9282-1-manasi.d.navare@intel.com
2022-03-01 11:37:21 -08:00
Linus Torvalds
5751153606 binfmt_elf fix for v5.17-rc7
- Fix ia64 ET_EXEC loading
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmIeZoEWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhinD/4mOyZ8VaCYblv9Jh/RaYOK/cjh
 LfiIkXPr+FiieTB1OJztd7gp9wbxfL3YhhfYGMf+309cio3qvosWpINVYpy3HzdF
 PYvuW0TBASCNPhFebqRXS955wkbWeiyp4ED/1wEi/i5oA+zX/HGe6/YRn7uveFp7
 nZqBKNBwJG14QjqX/drNEUgTgAgIbcIlt48heMxfe/imD98QCLR46X9gL48kM2Fz
 plKHcIqqg3zOWXvs0RqitaChYqoQ7wgWoWTlk8drjIGKVaMieLY/ZSEvSNmxPhW3
 /hjfvpT/DiLXssomVo8qWlcl/IyqixvhhKayCsrMAwpzb+9FwtmRGX3jqYrScWPx
 OJzX8IdHxkij0lUMYavZK7Ed5i0rXMFEttRbTbwxJxWc8jXIBdyWbIAw2vbM21So
 6PQw5OlCwTZhcCS1ZPNYB2jLmhHIs6mFGPcweAVMLcEIEG1nPkkgkO3zQLEcjmH7
 eujWR9tYGKokE1qUbj+k/alyNxV9c1DmjrE9OVph+PWA7oj8PvlL2qtV6Uj7iLF7
 ps255EsyW8pNTA2315TQpIukba0LMQ2bgWi+2mpMU9FMnQvYfm1JIwr0YV7EvFAk
 Kkd5U9tu345O/R2uIUhNhwnxb+clCDLKoA5rZ+RnMBHc5O2iDA8nTUewPbmpZQWk
 StVSktb4TpqUPDOtpQ==
 =3ZcP
 -----END PGP SIGNATURE-----

Merge tag 'binfmt_elf-v5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull binfmt_elf fix from Kees Cook:
 "This addresses a regression[1] under ia64 where some ET_EXEC binaries
  were not loading"

Link: https://linux-regtracking.leemhuis.info/regzbot/regression/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info/ [1]

- Fix ia64 ET_EXEC loading

* tag 'binfmt_elf-v5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_elf: Avoid total_mapping_size for ET_EXEC
2022-03-01 11:31:37 -08:00
Kees Cook
439a846824 binfmt_elf: Avoid total_mapping_size for ET_EXEC
Partially revert commit 5f501d5556 ("binfmt_elf: reintroduce using
MAP_FIXED_NOREPLACE"), which applied the ET_DYN "total_mapping_size"
logic also to ET_EXEC.

At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address
contiguous (but _are_ file-offset contiguous). This would result in a
giant mapping attempting to cover the entire span, including the virtual
address range hole, and well beyond the size of the ELF file itself,
causing the kernel to refuse to load it. For example:

$ readelf -lW /usr/bin/gcc
...
Program Headers:
  Type Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   ...
...
  LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 ...
  LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 ...
...
       ^^^^^^^^ ^^^^^^^^^^^^^^^^^^                    ^^^^^^^^ ^^^^^^^^

File offset range     : 0x000000-0x00bb4c
			0x00bb4c bytes

Virtual address range : 0x4000000000000000-0x600000000000bcb0
			0x200000000000bcb0 bytes

Remove the total_mapping_size logic for ET_EXEC, which reduces the
ET_EXEC MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD (better
than nothing), and retains it for ET_DYN.

Ironically, this is the reverse of the problem that originally caused
problems with MAP_FIXED_NOREPLACE: overlapping PT_LOAD segments. Future
work could restore full coverage if load_elf_binary() were to perform
mappings in a separate phase from the loading (where it could resolve
both overlaps and holes).

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Reported-by: matoro <matoro_bugzilla_kernel@matoro.tk>
Fixes: 5f501d5556 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE")
Link: https://lore.kernel.org/r/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info
Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Link: https://lore.kernel.org/lkml/ce8af9c13bcea9230c7689f3c1e0e2cd@matoro.tk
Tested-By: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/lkml/49182d0d-708b-4029-da5f-bc18603440a6@physik.fu-berlin.de
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2022-03-01 10:29:20 -08:00
Nicolas Cavallari
5838a14832 thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
Do not call get_trip_hyst() from thermal_genl_cmd_tz_get_trip() if
the thermal zone does not define one.

Fixes: 1ce50e7d40 ("thermal: core: genetlink support for events/cmd/sampling")
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-01 16:11:38 +01:00
David S. Miller
b8d06ce712 Some last-minute fixes:
* rfkill
    - add missing rfill_soft_blocked() when disabled
 
  * cfg80211
    - handle a nla_memdup() failure correctly
    - fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo in
      Makefile
 
  * mac80211
    - fix EAPOL handling in 802.3 RX path
    - reject setting up aggregation sessions before
      connection is authorized to avoid timeouts or
      similar
    - handle some SAE authentication steps correctly
    - fix AC selection in mesh forwarding
 
  * iwlwifi
    - remove TWT support as it causes firmware crashes
      when the AP isn't behaving correctly
    - check debugfs pointer before dereferncing it
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmIeHPcACgkQB8qZga/f
 l8QwBQ//QAXCzemdYF6PpeIvrjOdNU+lJ+ajX/bYyk+pzpW6BRJRUM/MocN+vhUH
 scDCE4Ve8I7Xqx+H6zFOm0Wr2M3qqnzJwMni/4qeQw7mV8msFw4SY2XqaE9nMXkV
 dhVYgrbrmluevBCXCm/rCu9JpWe08A5nH1IycVGJXHbxdMgvifPPHm0/gHBiEvJh
 16itDwJZcqUWZj3DswMe011HMrJubfL6wSfbGdmMgeOdRAkWJHu/bKBLrOM/sveL
 QfPx5RL6MIHcWRLwtLDdDYRTuI1DmhcGWKXOK+BlYtL1vj/zsp8EXCPzTN3uxQw0
 ld58G5pMU16o3iLpwuRlJAUWfQKE6qV1c4obiYZPLzkWpQCWJRrtjd+U4eR0Oewz
 IoQr1NYd6kFB8MFqa8xKY5JMiuEYsABWWho9udkODoaLS4Ege1J4bI7sub33ifER
 qnBE7TB+XO01a+Ys5GOWwEgO6d3t1lEW/mVVLsxdjq3qV1PpWE3ExYnXJEKd6guj
 oU4nDdtaV0AII6ByoB/uxPobqpyAEky8TDd4c2i9Z7qCs8z0O+J9kvTD5jtrGv//
 g4F/6KZ2aQAKYba9CuAoP91VLiiAC4bhagitDFx5mtVaCSj1wdcaz+PVfzYtPxAb
 Ll7HDBqCjC8jfoJx6FoVbaa8xk1rCM9sjr/EGun7iNW9Y4N9Ocg=
 =Q7qF
 -----END PGP SIGNATURE-----

Merge tag 'wireless-for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

johannes Berg says:

====================

Some last-minute fixes:
 * rfkill
   - add missing rfill_soft_blocked() when disabled

 * cfg80211
   - handle a nla_memdup() failure correctly
   - fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo in
     Makefile

 * mac80211
   - fix EAPOL handling in 802.3 RX path
   - reject setting up aggregation sessions before
     connection is authorized to avoid timeouts or
     similar
   - handle some SAE authentication steps correctly
   - fix AC selection in mesh forwarding

 * iwlwifi
   - remove TWT support as it causes firmware crashes
     when the AP isn't behaving correctly
   - check debugfs pointer before dereferncing it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-01 14:45:55 +00:00
Johannes Berg
a12f76345e cfg80211: fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo
The kbuild change here accidentally removed not only the
unquoting, but also the last character of the variable
name. Fix that.

Fixes: 129ab0d2d9 ("kbuild: do not quote string values in include/config/auto.conf")
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20220221155512.1d25895f7c5f.I50fa3d4189fcab90a2896fe8cae215035dae9508@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-01 14:10:14 +01:00
Florian Westphal
3b836da408 netfilter: nf_queue: handle socket prefetch
In case someone combines bpf socket assign and nf_queue, then we will
queue an skb who references a struct sock that did not have its
reference count incremented.

As we leave rcu protection, there is no guarantee that skb->sk is still
valid.

For refcount-less skb->sk case, try to increment the reference count
and then override the destructor.

In case of failure we have two choices: orphan the skb and 'delete'
preselect or let nf_queue() drop the packet.

Do the latter, it should not happen during normal operation.

Fixes: cf7fbe660f ("bpf: Add socket assign support")
Acked-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-01 11:51:15 +01:00
Florian Westphal
c387307024 netfilter: nf_queue: fix possible use-after-free
Eric Dumazet says:
  The sock_hold() side seems suspect, because there is no guarantee
  that sk_refcnt is not already 0.

On failure, we cannot queue the packet and need to indicate an
error.  The packet will be dropped by the caller.

v2: split skb prefetch hunk into separate change

Fixes: 271b72c7fa ("udp: RCU handling for Unicast packets.")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-01 11:50:35 +01:00