linux

Author	SHA1	Message	Date
Damien Le Moal	de3510e52b	null_blk: fix command timeout completion handling Memory backed or zoned null block devices may generate actual request timeout errors due to the submission path being blocked on memory allocation or zone locking. Unlike fake timeouts or injected timeouts, the request submission path will call blk_mq_complete_request() or blk_mq_end_request() for these real timeout errors, causing a double completion and use after free situation as the block layer timeout handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in blk_mq_check_expired(). This problem often triggers a NULL pointer dereference such as: BUG: kernel NULL pointer dereference, address: 0000000000000050 RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20 ... Call Trace: dd_finish_request+0x56/0x80 blk_mq_free_request+0x37/0x130 null_handle_cmd+0xbf/0x250 [null_blk] ? null_queue_rq+0x67/0xd0 [null_blk] blk_mq_dispatch_rq_list+0x122/0x850 __blk_mq_do_dispatch_sched+0xbb/0x2c0 __blk_mq_sched_dispatch_requests+0x13d/0x190 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x49/0x90 process_one_work+0x26c/0x580 worker_thread+0x55/0x3c0 ? process_one_work+0x580/0x580 kthread+0x134/0x150 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x1f/0x30 This problem very often triggers when running the full btrfs xfstests on a memory-backed zoned null block device in a VM with a limited amount of memory. Avoid this by executing blk_mq_complete_request() in null_timeout_rq() only for commands that are marked for a fake timeout completion using the fake_timeout boolean in struct null_cmd. For timeout errors injected through debugfs, the timeout handler will execute blk_mq_complete_request() as before. This is safe as the submission path does not complete requests in this case. In null_timeout_rq(), also make sure to set the command error field to BLK_STS_TIMEOUT and to propagate this error through to the request completion. Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210331225244.126426-1-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-01 07:03:46 -06:00
Matthew Wilcox (Oracle)	2c7e57a027	idr test suite: Improve reporting from idr_find_test_1 Instead of just reporting an assertion failure, report enough information that we can start diagnosing exactly what went wrong. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2021-04-01 07:50:42 -04:00
Matthew Wilcox (Oracle)	094ffbd1d8	idr test suite: Create anchor before launching throbber The throbber could race with creation of the anchor entry and cause the IDR to have zero entries in it, which would cause the test to fail. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2021-04-01 07:50:19 -04:00
Matthew Wilcox (Oracle)	703586410d	idr test suite: Take RCU read lock in idr_find_test_1 When run on a single CPU, this test would frequently access already-freed memory. Due to timing, this bug never showed up on multi-CPU tests. Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2021-04-01 07:44:48 -04:00
Matthew Wilcox (Oracle)	1bb4bd266c	radix tree test suite: Register the main thread with the RCU library Several test runners register individual worker threads with the RCU library, but neglect to register the main thread, which can lead to objects being freed while the main thread is in what appears to be an RCU critical section. Reported-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2021-04-01 07:41:30 -04:00
Vitaly Kuznetsov	8cdddd182b	ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead() Commit `496121c021` ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g. I'm observing the following on AWS Nitro (e.g. r5b.xlarge but other instance types are affected as well): # echo 0 > /sys/devices/system/cpu/cpu0/online # echo 1 > /sys/devices/system/cpu/cpu0/online <10 seconds delay> -bash: echo: write error: Input/output error In fact, the above mentioned commit only revealed the problem and did not introduce it. On x86, an NMI is used to wake up a CPU, and the hlt_play_dead()/mwait_play_dead() loops are prepared to handle it: /* * If NMI wants to wake up CPU0, start CPU0. */ if (wakeup_cpu0()) start_cpu0(); cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on systems where it wasn't called before the above mentioned commit) serves the same purpose but it doesn't have a path for CPU0. What happens now on wakeup is: - NMI is sent to CPU0 - wakeup_cpu0_nmi() works as expected - we get back to while (1) loop in acpi_idle_play_dead() - safe_halt() puts CPU0 to sleep again. The straightforward/minimal fix is to add the special handling for CPU0 on x86 and that's what the patch is doing. Fixes: `496121c021` ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: 5.10+ <stable@vger.kernel.org> # 5.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-04-01 13:37:55 +02:00
Srinivas Kandagatla	adfc3ed7dc	ASoC: codecs: lpass-rx-macro: set npl clock rate correctly NPL clock rate is twice the MCLK rate, so set this correctly to avoid soundwire timeouts. Fixes: `af3d54b997` ("ASoC: codecs: lpass-rx-macro: add support for lpass rx macro") Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210331171235.24824-2-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>	2021-04-01 12:18:09 +01:00
Srinivas Kandagatla	b861106f3c	ASoC: codecs: lpass-tx-macro: set npl clock rate correctly NPL clock rate is twice the MCLK rate, so set this correctly to avoid soundwire timeouts. Fixes: `c39667ddcf` ("ASoC: codecs: lpass-tx-macro: add support for lpass tx macro") Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210331171235.24824-1-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>	2021-04-01 12:18:07 +01:00
Arnd Bergmann	89e21e1ad9	i.MX fixes for 5.12, round 2: - Fix a system failure on imx6qdl-phytec-pfla02 board when booting from SD, by adding missing vmmc supply for SD interfaces. - Fix address typo in i.MX8MM/Q IOMUXC_SD1_DATA0_GPIO2_IO2 definition. Merge tag 'imx-fixes-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.12, round 2: - Fix a system failure on imx6qdl-phytec-pfla02 board when booting from SD, by adding missing vmmc supply for SD interfaces. - Fix address typo in i.MX8MM/Q IOMUXC_SD1_DATA0_GPIO2_IO2 definition. * tag 'imx-fixes-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0 Link: https://lore.kernel.org/r/20210330090236.GQ22955@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2021-04-01 11:34:06 +02:00
Arnd Bergmann	111a5a421f	More fixes for omaps for v5.12-rc cycle Two fixes for hangs, mmc slot order fix, and a voltage typo fix: - Remove unused duplicate sha2md5_fck clock node that can race with the OMAP4_SHA2MD5_CLKCTRL clock node for disable for unused clocks - Add aliases for omap4/5 mmc to put the slots back into the right order again - Fix typo for bionic voltage controllers that accidentally use mpu for all instances instead of mpu, core and iva - Fix random hangs for droid4 caused by missing fix from TI Android kernel tree to do a dummy smc call on cpuidle wakeup path Merge tag 'omap-for-v5.12/fixes-rc4-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes More fixes for omaps for v5.12-rc cycle Two fixes for hangs, mmc slot order fix, and a voltage typo fix: - Remove unused duplicate sha2md5_fck clock node that can race with the OMAP4_SHA2MD5_CLKCTRL clock node for disable for unused clocks - Add aliases for omap4/5 mmc to put the slots back into the right order again - Fix typo for bionic voltage controllers that accidentally use mpu for all instances instead of mpu, core and iva - Fix random hangs for droid4 caused by missing fix from TI Android kernel tree to do a dummy smc call on cpuidle wakeup path * tag 'omap-for-v5.12/fixes-rc4-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP4: PM: update ROM return address for OSWR and OFF ARM: OMAP4: Fix PMIC voltage domains for bionic ARM: dts: Fix moving mmc devices with aliases for omap4 & 5 ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race Link: https://lore.kernel.org/r/pull-1616584662-702939@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2021-04-01 11:33:35 +02:00
Arnd Bergmann	70a6062cc2	This pull request contains Broadcom ARM-based SoC changes for 5.12, please pull the following: - Florian reverts the adding of the second level interrupt controller for HDMI BSC interrupts since they collide with the main I2C controller (i2c-bcm2835). Merge tag 'arm-soc/for-5.12/devicetree-part2' of https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM-based SoC changes for 5.12, please pull the following: - Florian reverts the adding of the second level interrupt controller for HDMI BSC interrupts since they collide with the main I2C controller (i2c-bcm2835). * tag 'arm-soc/for-5.12/devicetree-part2' of https://github.com/Broadcom/stblinux: Revert "ARM: dts: bcm2711: Add the BSC interrupt controller" Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2021-04-01 11:33:08 +02:00
Vitaly Kuznetsov	55626ca9c6	selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) Add a test for the issue when KVM_SET_CLOCK(0) call could cause TSC page value to go very big because of a signedness issue around hv_clock->system_time. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210326155551.17446-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 05:14:19 -04:00
Vitaly Kuznetsov	77fcbe823f	KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update() When guest time is reset with KVM_SET_CLOCK(0), it is possible for 'hv_clock->system_time' to become a small negative number. This happens because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, kvm_guest_time_update() does (masterclock in use case): hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; And 'master_kernel_ns' represents the last time when masterclock got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a problem, the difference is very small, e.g. I'm observing hv_clock.system_time = -70 ns. The issue comes from the fact that 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in compute_tsc_page_parameters() becomes a very big number. Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from going negative. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210331124130.337992-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 05:14:19 -04:00
Paolo Bonzini	a83829f56c	KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken pvclock_gtod_sync_lock can be taken with interrupts disabled if the preempt notifier calls get_kvmclock_ns to update the Xen runstate information: spin_lock include/linux/spinlock.h:354 [inline] get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587 kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69 kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100 kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline] kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062 So change the users of the spinlock to spin_lock_irqsave and spin_unlock_irqrestore. Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com Fixes: `30b5c851af` ("KVM: x86/xen: Add support for vCPU runstate information") Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 05:14:19 -04:00
Paolo Bonzini	c2c647f91a	KVM: x86: reduce pvclock_gtod_sync_lock critical sections There is no need to include changes to vcpu->requests into the pvclock_gtod_sync_lock critical section. The changes to the shared data structures (in pvclock_update_vm_gtod_copy) already occur under the lock. Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 05:14:19 -04:00
Paolo Bonzini	6ebae23c07	Merge branch 'kvm-fix-svm-races' into kvm-master	2021-04-01 05:14:05 -04:00
Paolo Bonzini	3c346c0c60	KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit Fixing nested_vmcb_check_save to avoid all TOC/TOU races is a bit harder in released kernels, so do the bare minimum by avoiding that EFER.SVME is cleared. This is problematic because svm_set_efer frees the data structures for nested virtualization if EFER.SVME is cleared. Also check that EFER.SVME remains set after a nested vmexit; clearing it could happen if the bit is zero in the save area that is passed to KVM_SET_NESTED_STATE (the save area of the nested state corresponds to the nested hypervisor's state and is restored on the next nested vmexit). Cc: stable@vger.kernel.org Fixes: `2fcf4876ad` ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 05:11:35 -04:00
Paolo Bonzini	a58d9166a7	KVM: SVM: load control fields from VMCB12 before checking them Avoid races between check and use of the nested VMCB controls. This for example ensures that the VMRUN intercept is always reflected to the nested hypervisor, instead of being processed by the host. Without this patch, it is possible to end up with svm->nested.hsave pointing to the MSR permission bitmap for nested guests. This bug is CVE-2021-29657. Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@vger.kernel.org Fixes: `2fcf4876ad` ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 05:09:31 -04:00
Dave Airlie	dcdb7aa452	Merge tag 'amd-drm-fixes-5.12-2021-03-31' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.12-2021-03-31: amdgpu: - Polaris idle power fix - VM fix - Vangogh S3 fix - Fixes for non-4K page sizes amdkfd: - dqm fence memory corruption fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210401020057.17831-1-alexander.deucher@amd.com	2021-04-01 15:04:58 +10:00
Dave Airlie	7344c82777	Just one cleanup which drops of_gpio.h inclusion. - This header file isn't used anymore so drop it. Merge tag 'exynos-drm-fixes-for-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes Just one cleanup which drops of_gpio.h inclusion. - This header file isn't used anymore so drop it. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/1617016858-14081-1-git-send-email-inki.dae@samsung.com	2021-04-01 13:31:11 +10:00
Xℹ Ruoyao	e3512fb670	drm/amdgpu: check alignment on CPU page for bo map The page table of AMDGPU requires an alignment to CPU page so we should check ioctl parameters for it. Return -EINVAL if some parameter is unaligned to CPU page, instead of silently corrupting the page table. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-03-31 21:53:38 -04:00
Huacai Chen	566c6e25f9	drm/amdgpu: Set a suitable dev_info.gart_page_size In Mesa, dev_info.gart_page_size is used for alignment and it was set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU driver requires an alignment on CPU pages. So, for non-4KB page system, gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE). Signed-off-by: Rui Wang <wangr@lemote.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1 [Xi: rebased for drm-next, use max_t for checkpatch, and reworded commit message.] Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Tested-by: Dan Horák <dan@danny.cz> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-03-31 21:53:38 -04:00
Alex Deucher	6951c3e4a2	drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspend Do the same thing we do for Renoir. We can check, but since the sbios has started DPM, it will always return true which causes the driver to skip some of the SMU init when it shouldn't. Reviewed-by: Zhan Liu <zhan.liu@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-03-31 21:53:38 -04:00
Qu Huang	e92049ae45	drm/amdkfd: dqm fence memory corruption The amdgpu driver uses a 4-byte data type as DQM fence memory, and transmits the GPU address of the fence memory to microcode through the query status PM4 message. However, the query status PM4 message definition and the microcode processing both assume 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode writes 8 bytes of memory, so there is a memory corruption. Changes since v1: * Change dqm->fence_addr to a u64 pointer to fix this issue; also fix up query_status and the amdkfd_fence_wait_timeout function to use a 64-bit fence value to make them consistent. Signed-off-by: Qu Huang <jinsdb@126.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-03-31 21:53:25 -04:00
Yufen Yu	3edf5346e4	block: only update parent bi_status when bio fail For multiple split bios, if one of the bios fails, the whole I/O should return an error to the application. But we found a race between bio_integrity_verify_fn and bio completion, which returns I/O success to the application after one of the bios has failed. The race is as follows: split bio(READ) kworker nvme_complete_rq blk_update_request //split error=0 bio_endio bio_integrity_endio queue_work(kintegrityd_wq, &bip->bip_work); bio_integrity_verify_fn bio_endio //split bio __bio_chain_endio if (!parent->bi_status) <interrupt entry> nvme_irq blk_update_request //parent error=7 req_bio_endio bio->bi_status = 7 //parent bio <interrupt exit> parent->bi_status = 0 parent->bi_end_io() // return bi_status=0 The bio has been split in two: split and parent. When the split bio completes, it depends on the kworker to do endio, while bio_integrity_verify_fn has been interrupted by the parent bio completion irq handler. Then, the parent bio->bi_status that was set in the irq handler is overwritten by the kworker. In fact, even without the above race, we also need to consider the concurrency between multiple split bios completing and updating the same parent bi_status. Normally, multiple split bios will be issued to the same hctx and complete from the same irq vector. But if the queue map has been updated between multiple split bios, these bios may complete on different hw queues and different irq vectors. Then the concurrent update of the parent bi_status may cause the final status to be wrong. Suggested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-03-31 19:18:04 -06:00
Linus Torvalds	d19cc4bfbf	Add check of order < 0 before calling free_pages() The function addresses that are traced by ftrace are stored in pages, and the size is held in a variable. If there's some error in creating them, the allocate ones will be freed. In this case, it is possible that the order of pages to be freed may end up being negative due to a size of zero passed to get_count_order(), and then that negative number will cause free_pages() to free a very large section. Make sure that does not happen. Merge tag 'trace-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Add check of order < 0 before calling free_pages() The function addresses that are traced by ftrace are stored in pages, and the size is held in a variable. If there's some error in creating them, the allocate ones will be freed. In this case, it is possible that the order of pages to be freed may end up being negative due to a size of zero passed to get_count_order(), and then that negative number will cause free_pages() to free a very large section. Make sure that does not happen" * tag 'trace-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Check if pages were allocated before calling free_pages()	2021-03-31 10:14:55 -07:00
Linus Torvalds	39192106d4	Pin control fixes for the v5.12 kernel cycle: - Fix up some Intel GPIO base calculations. - Fix a register offset in the Microchip driver. - Fix suspend/resume bug in the Rockchip driver. - Default pull up strength in the Qualcomm LPASS driver. - Fix two pingroup offsets in the Qualcomm SC7280 driver. - Fix SDC1 register offset in the Qualcomm SC7280 driver. - Fix a nasty string concatenation in the Qualcomm SDX55 driver. - Check the REVID register to see if the device is real or virtualized during virtualization in the Intel driver. Merge tag 'pinctrl-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some overly ripe fixes for the v5.12 kernel. I should have sent earlier but had my head stuck in GDB. All are driver fixes: - Fix up some Intel GPIO base calculations. - Fix a register offset in the Microchip driver. - Fix suspend/resume bug in the Rockchip driver. - Default pull up strength in the Qualcomm LPASS driver. - Fix two pingroup offsets in the Qualcomm SC7280 driver. - Fix SDC1 register offset in the Qualcomm SC7280 driver. - Fix a nasty string concatenation in the Qualcomm SDX55 driver. - Check the REVID register to see if the device is real or virtualized during virtualization in the Intel driver" * tag 'pinctrl-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: intel: check REVID register value for device presence pinctrl: qcom: fix unintentional string concatenation pinctrl: qcom: sc7280: Fix SDC1_RCLK configurations pinctrl: qcom: sc7280: Fix SDC_QDSD_PINGROUP and UFS_RESET offsets pinctrl: qcom: lpass lpi: use default pullup/strength values pinctrl: rockchip: fix restore error in resume pinctrl: microchip-sgpio: Fix wrong register offset for IRQ trigger pinctrl: intel: Show the GPIO base calculation explicitly	2021-03-31 10:09:44 -07:00
Bastian Germann	7c0d6e4820	ASoC: sunxi: sun4i-codec: fill ASoC card owner card->owner is a required property and since commit `81033c6b58` ("ALSA: core: Warn on empty module") a warning is issued if it is empty. Add it. This fixes the following warning observed on Lamobo R1: WARNING: CPU: 1 PID: 190 at sound/core/init.c:207 snd_card_new+0x430/0x480 [snd] Modules linked in: sun4i_codec(E+) sun4i_backend(E+) snd_soc_core(E) ... CPU: 1 PID: 190 Comm: systemd-udevd Tainted: G C E 5.10.0-1-armmp #1 Debian 5.10.4-1 Hardware name: Allwinner sun7i (A20) Family Call trace: (snd_card_new [snd]) (snd_soc_bind_card [snd_soc_core]) (snd_soc_register_card [snd_soc_core]) (sun4i_codec_probe [sun4i_codec]) Fixes: `45fb6b6f2a` ("ASoC: sunxi: add support for the on-chip codec on early Allwinner SoCs") Related: commit `3c27ea23ff` ("ASoC: qcom: Set card->owner to avoid warnings") Related: commit `ec653df2a0` ("drm/vc4/vc4_hdmi: fill ASoC card owner") Cc: linux-arm-kernel@lists.infradead.org Cc: alsa-devel@alsa-project.org Signed-off-by: Bastian Germann <bage@linutronix.de> Link: https://lore.kernel.org/r/20210331151843.30583-1-bage@linutronix.de Signed-off-by: Mark Brown <broonie@kernel.org>	2021-03-31 17:59:43 +01:00
Paolo Bonzini	825e34d3c9	Merge commit 'kvm-tdp-fix-flushes' into kvm-master	2021-03-31 07:45:41 -04:00
Tetsuo Handa	5e46d1b78a	reiserfs: update reiserfs_xattrs_initialized() condition syzbot is reporting NULL pointer dereference at reiserfs_security_init() [1], for commit `ab17c4f021` ("reiserfs: fixup xattr_root caching") is assuming that REISERFS_SB(s)->xattr_root != NULL in reiserfs_xattr_jcreate_nblocks() despite that commit made REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL case possible. I guess that commit `6cb4aff0a7` ("reiserfs: fix oops while creating privroot with selinux enabled") wanted to check xattr_root != NULL before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking about the xattr root. The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops. Therefore, update reiserfs_xattrs_initialized() to check both the privroot and the xattr root. Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1] Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: `6cb4aff0a7` ("reiserfs: fix oops while creating privroot with selinux enabled") Acked-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jan Kara <jack@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-03-30 14:27:32 -07:00
Jens Axboe	82734c5b1b	io_uring: drop sqd lock before handling signals for SQPOLL Don't call into get_signal() with the sqd mutex held, it'll fail if we're freezing the task and we'll get complaints on locks still being held: ==================================== WARNING: iou-sqp-8386/8387 still has locks held! 5.12.0-rc4-syzkaller #0 Not tainted ------------------------------------ 1 lock held by iou-sqp-8386/8387: #0: ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731 stack backtrace: CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 try_to_freeze include/linux/freezer.h:66 [inline] get_signal+0x171a/0x2150 kernel/signal.c:2576 io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748 Fold the get_signal() case in with the parking checks, as we need to drop the lock in both cases, and since we need to be checking for parking when juggling the lock anyway. Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com Fixes: `dbe1bdbb39` ("io_uring: handle signals for IO threads like a normal thread") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-03-30 14:36:46 -06:00
Hans de Goede	3e759425cc	ACPI: scan: Fix _STA getting called on devices with unmet dependencies Commit `71da201f38` ("ACPI: scan: Defer enumeration of devices with _DEP lists") dropped the following 2 lines from acpi_init_device_object(): /* Assume there are unmet deps until acpi_device_dep_initialize() runs */ device->dep_unmet = 1; This leaves the initial value of dep_unmet at the 0 from the kzalloc(). This causes the acpi_bus_get_status() call in acpi_add_single_object() to actually call _STA, even though there may be unmet deps, leading to errors like these: [ 0.123579] ACPI Error: No handler for Region [ECRM] (00000000ba9edc4c) [GenericSerialBus] (20170831/evregion-166) [ 0.123601] ACPI Error: Region GenericSerialBus (ID=9) has no handler (20170831/exfldio-299) [ 0.123618] ACPI Error: Method parse/execution failed \_SB.I2C1.BAT1._STA, AE_NOT_EXIST (20170831/psparse-550) Fix this by re-adding the dep_unmet = 1 initialization to acpi_init_device_object() and modifying acpi_bus_check_add() to make sure that dep_unmet always gets set up there, overriding the initial 1 value. This re-fixes the issue initially fixed by commit `63347db0af` ("ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs"), which introduced the removed "device->dep_unmet = 1;" statement. This issue was noticed, and the fix tested, on a Dell Venue 10 Pro 5055. Fixes: `71da201f38` ("ACPI: scan: Defer enumeration of devices with _DEP lists") Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: 5.11+ <stable@vger.kernel.org> # 5.11+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-03-30 21:36:20 +02:00
Linus Torvalds	6ac86aae89	s390 updates for 5.12-rc6 - fix incorrect initialization and update of vdso data pages, which results in incorrect tod clock steering, and that clock_gettime(CLOCK_MONOTONIC_RAW, ...) returns incorrect values. - update MAINTAINERS for s390 vfio drivers Merge tag 's390-5.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - fix incorrect initialization and update of vdso data pages, which results in incorrect tod clock steering, and that clock_gettime(CLOCK_MONOTONIC_RAW, ...) returns incorrect values. - update MAINTAINERS for s390 vfio drivers * tag 's390-5.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: MAINTAINERS: add backups for s390 vfio drivers s390/vdso: fix initializing and updating of vdso_data s390/vdso: fix tod_steering_delta type s390/vdso: copy tod_steering_delta value to vdso_data page	2021-03-30 10:54:22 -07:00
Thierry Reding	ac097aecfe	drm/tegra: sor: Grab runtime PM reference across reset The SOR resets are exclusively shared with the SOR power domain. This means that exclusive access can only be granted temporarily and in order for that to work, a rigorous sequence must be observed. To ensure that a single consumer gets exclusive access to a reset, each consumer must implement a rigorous protocol using the reset_control_acquire() and reset_control_release() functions. However, these functions alone don't provide any guarantees at the system level. Drivers need to ensure that only a single consumer has access to the reset at the same time. In order for the SOR to be able to exclusively access its reset, it must therefore ensure that the SOR power domain is not powered off by holding on to a runtime PM reference to that power domain across the reset assert/deassert operation. This used to work fine by accident, but was revealed when recently more devices started to rely on the SOR power domain. Fixes: `11c632e1cf` ("drm/tegra: sor: Implement acquire/release for reset") Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2021-03-30 19:51:39 +02:00
Matthew Wilcox (Oracle)	7487de534d	radix tree test suite: Fix compilation Commit `4bba4c4bb0` added tools/include/linux/compiler_types.h which includes linux/compiler-gcc.h. Unfortunately, we had our own (empty) compiler_types.h which overrode the one added by that commit, and so we lost the definition of __must_be_array(). Removing our empty compiler_types.h fixes the problem and reduces our divergence from the rest of the tools. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2021-03-30 13:44:35 -04:00
Matthew Wilcox (Oracle)	df59d0a461	XArray: Add xa_limit_16b A 16-bit limit is a more common limit than I had realised. Make it generally available. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2021-03-30 13:42:33 -04:00
Matthew Wilcox (Oracle)	3012110d71	XArray: Fix splitting to non-zero orders Splitting an order-4 entry into order-2 entries would leave the array containing pointers to 000040008000c000 instead of 000044448888cccc. This is a one-character fix, but enhance the test suite to check this case. Reported-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2021-03-30 13:42:33 -04:00
Matthew Wilcox (Oracle)	12efebab09	XArray: Fix split documentation I wrote the documentation backwards; the new order of the entry is stored in the xas and the caller passes the old entry. Reported-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2021-03-30 13:42:33 -04:00
Thierry Reding	a31500fe70	drm/tegra: dc: Restore coupling of display controllers Coupling of display controllers used to rely on runtime PM to take the companion controller out of reset. Commit `fd67e9c6ed` ("drm/tegra: Do not implement runtime PM") accidentally broke this when runtime PM was removed. Restore this functionality by reusing the hierarchical host1x client suspend/resume infrastructure that's similar to runtime PM and which perfectly fits this use-case. Fixes: `fd67e9c6ed` ("drm/tegra: Do not implement runtime PM") Reported-by: Dmitry Osipenko <digetx@gmail.com> Reported-by: Paul Fertser <fercerpav@gmail.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2021-03-30 19:40:43 +02:00
Mikko Perttunen	a24f98176d	gpu: host1x: Use different lock classes for each client To avoid false lockdep warnings, give each client lock a different lock class, passed from the initialization site by macro. Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2021-03-30 19:37:20 +02:00
Dmitry Osipenko	f8fb97c915	drm/tegra: dc: Don't set PLL clock to 0Hz RGB output doesn't allow changing the parent clock rate of the display, and the PCLK rate is set to 0Hz in this case. The tegra_dc_commit_state() shall not set the display clock to 0Hz since this change propagates to the parent clock. The DISP clock is defined as a NODIV clock by the tegra-clk driver and all NODIV clocks use the CLK_SET_RATE_PARENT flag. This bug stayed unnoticed because by default PLLP is used as the parent clock for the display controller and PLLP silently skips the erroneous 0Hz rate changes because it always has active child clocks that don't permit rate changes. The PLLP isn't acceptable for some devices that we want to upstream (like Samsung Galaxy Tab and ASUS TF700T) due to display panel clock rate requirements that can't be fulfilled by using PLLP, and the bug then pops up since the parent clock is set to 0Hz, killing the display output. Don't touch DC clock if pclk=0 in order to fix the problem. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2021-03-30 19:37:20 +02:00
Sean Christopherson	33a3164161	KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages Prevent the TDP MMU from yielding when zapping a gfn range during NX page recovery. If a flush is pending from a previous invocation of the zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU has not accumulated a flush for the current invocation, then yielding will release mmu_lock with stale TLB entries. That being said, this isn't technically a bug fix in the current code, as the TDP MMU will never yield in this case. tdp_mmu_iter_cond_resched() will yield if and only if it has made forward progress, as defined by the current gfn vs. the last yielded (or starting) gfn. Because zapping a single shadow page is guaranteed to (a) find that page and (b) step sideways at the level of the shadow page, the TDP iter will break its loop before getting a chance to yield. But that is all very, very subtle, and will break at the slightest sneeze, e.g. zapping while holding mmu_lock for read would break as the TDP MMU wouldn't be guaranteed to see the present shadow page, and thus could step sideways at a lower level. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-4-seanjc@google.com> [Add lockdep assertion. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:19:56 -04:00
Sean Christopherson	048f49809c	KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which does the flush itself if and only if it yields (which it will never do in this particular scenario), and otherwise expects the caller to do the flush. If pages are zapped from the TDP MMU but not the legacy MMU, then no flush will occur. Fixes: `29cf0f5007` ("kvm: x86/mmu: NX largepage recovery for TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-3-seanjc@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:19:55 -04:00
Sean Christopherson	a835429cda	KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap When flushing a range of GFNs across multiple roots, ensure any pending flush from a previous root is honored before yielding while walking the tables of the current root. Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local "flush" with the result to avoid redundant flushes. zap_gfn_range() preserves and returns the incoming "flush", unless of course the flush was performed prior to yielding and no new flush was triggered. Fixes: `1af4a96025` ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") Cc: stable@vger.kernel.org Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:19:55 -04:00
Siddharth Chandrasekaran	6fb3084ab5	KVM: make: Fix out-of-source module builds Building kvm module out-of-source with, make -C $SRC O=$BIN M=arch/x86/kvm fails to find "irq.h" as the include dir passed to cflags-y does not prefix the source dir. Fix this by prefixing $(srctree) to the include dir path. Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <20210324124347.18336-1-sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:07:10 -04:00
Vitaly Kuznetsov	f982fb62a3	selftests: kvm: make hardware_disable_test less verbose hardware_disable_test produces 512 snippets like ... main: [511] waiting semaphore run_test: [511] start vcpus run_test: [511] all threads launched main: [511] waiting 368us main: [511] killing child and this doesn't have much value, let's print this info with pr_debug(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210323104331.1354800-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:07:10 -04:00
Vitaly Kuznetsov	1973cadd4c	KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs are only available when X86_FEATURE_PERFCTR_CORE CPUID bit was exposed to the guest. KVM, however, allows these MSRs unconditionally because kvm_pmu_is_valid_msr() -> amd_msr_idx_to_pmc() check always passes and because kvm_pmu_set_msr() -> amd_pmu_set_msr() doesn't fail. In case of a counter (CTRn), no big harm is done as we only increase internal PMC's value but in case of an eventsel (CTLn), we go deep into perf internals with a non-existing counter. Note, kvm_get_msr_common() just returns '0' when these MSRs don't exist and this also seems to contradict architectural behavior which is #GP (I did check one old Opteron host) but changing this status quo is a bit scarier. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210323084515.1346540-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:07:10 -04:00
Dongli Zhang	ecaf088f53	KVM: x86: remove unused declaration of kvm_write_tsc() kvm_write_tsc() was renamed and made static since commit `0c899c25d7` ("KVM: x86: do not attempt TSC synchronization on guest writes"). Remove its unused declaration. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210326070334.12310-1-dongli.zhang@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:07:09 -04:00
Haiwei Li	d632826f26	KVM: clean up the unused argument kvm_msr_ignored_check function never uses vcpu argument. Clean up the function and invokers. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <20210313051032.4171-1-lihaiwei.kernel@gmail.com> Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:07:09 -04:00
Stefan Raspl	75f94ecbd0	tools/kvm_stat: Add restart delay If this service is enabled and the system rebooted, Systemd's initial attempt to start this unit file may fail in case the kvm module is not loaded. Since we did not specify a delay for the retries, Systemd restarts with a minimum delay a number of times before giving up and disabling the service. Which means a subsequent kvm module load will have kvm running without monitoring. Adding a delay to fix this. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Message-Id: <20210325122949.1433271-1-raspl@linux.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:07:09 -04:00
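
The slot layout quoted in commit 3012110d71 (XArray: Fix splitting to non-zero orders) can be reproduced with a toy model. This is only a sketch and not the kernel's xas_split(): the helper name split_layout(), the SLOTS constant and the flat 16-slot view are assumptions made for illustration, and reading each hex digit of the quoted strings as "the first slot of the entry covering this slot" is an interpretation assumed here, not something the commit message spells out.

```c
#define SLOTS 16	/* an order-4 entry spans 2^4 slots */

/*
 * Toy model (hypothetical helper, not kernel code): after splitting an
 * order-4 entry into entries of new_order, write one hex digit per slot
 * naming the first slot of the entry that covers it.
 */
static void split_layout(unsigned int new_order, char out[SLOTS + 1])
{
	unsigned int span = 1u << new_order;	/* slots per new entry */
	unsigned int i;

	for (i = 0; i < SLOTS; i++) {
		/* round the slot index down to the start of its entry */
		unsigned int head = i & ~(span - 1);

		out[i] = "0123456789abcdef"[head];
	}
	out[SLOTS] = '\0';
}
```

With new_order set to 2, the buffer holds 000044448888cccc, the layout the commit message gives as correct: every slot refers back to the head of its own order-2 entry rather than to slot 0.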


997813 Commits
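
The caller-side rule described in commit 048f49809c (a zap helper reports "flush needed" and the caller must act on it) can be sketched with a small model. Everything below is hypothetical and not taken from KVM: struct toy_mmu, toy_zap_tdp(), toy_zap_legacy() and toy_recover_nx_pages() are names invented for this sketch, and the "flush" is just a counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MMU state; not a KVM structure. */
struct toy_mmu {
	int tdp_pages;		/* pages still mapped by the toy "TDP MMU" */
	int legacy_pages;	/* pages still mapped by the toy "legacy MMU" */
	int flushes;		/* number of TLB flushes performed */
};

/* Zap all toy TDP pages; return true if the caller must flush. */
static bool toy_zap_tdp(struct toy_mmu *m)
{
	bool zapped = m->tdp_pages > 0;

	m->tdp_pages = 0;
	return zapped;
}

/* Zap all toy legacy pages; return true if the caller must flush. */
static bool toy_zap_legacy(struct toy_mmu *m)
{
	bool zapped = m->legacy_pages > 0;

	m->legacy_pages = 0;
	return zapped;
}

/* Accumulate the "flush needed" results and flush once if any is set. */
static void toy_recover_nx_pages(struct toy_mmu *m)
{
	bool flush = false;

	flush |= toy_zap_tdp(m);	/* ignoring this result is the bug class */
	flush |= toy_zap_legacy(m);

	if (flush)
		m->flushes++;
}
```

In this model, a state with pages only in the toy TDP MMU still ends with exactly one flush, which is the case the commit message says was previously missed.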