linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-03 17:41:22 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	e8d8c5d80f	bnxt: make sure xmit_more + errors does not miss doorbells skbs are freed on error and not put on the ring. We may, however, be in a situation where we're freeing the last skb of a batch, and there is a doorbell ring pending because of xmit_more() being true earlier. Make sure we ring the door bell in such situations. Since errors are rare don't pay attention to xmit_more() and just always flush the pending frames. The busy case should be safe to be left alone because it can only happen if start_xmit races with completions and they both enable the queue. In that case the kick can't be pending. Noticed while reading the code. Fixes: `4d172f21ce` ("bnxt_en: Implement xmit_more.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:26:17 -07:00
Jakub Kicinski	01cca6b933	bnxt: disable napi before canceling DIM napi schedules DIM, napi has to be disabled first, then DIM canceled. Noticed while reading the code. Fixes: `0bc0b97fca` ("bnxt_en: cleanup DIM work on device shutdown") Fixes: `6a8788f256` ("bnxt_en: add support for software dynamic interrupt moderation") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:26:17 -07:00
Jakub Kicinski	3c603136c9	bnxt: don't lock the tx queue from napi poll We can't take the tx lock from the napi poll routine, because netpoll can poll napi at any moment, including with the tx lock already held. The tx lock is protecting against two paths - the disable path, and (as Michael points out) the NETDEV_TX_BUSY case which may occur if NAPI completions race with start_xmit and both decide to re-enable the queue. For the disable/ifdown path use synchronize_net() to make sure closing the device does not race we restarting the queues. Annotate accesses to dev_state against data races. For the NAPI cleanup vs start_xmit path - appropriate barriers are already in place in the main spot where Tx queue is stopped but we need to do the same careful dance in the TX_BUSY case. Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:26:17 -07:00
Xie Yongji	cddce01160	nbd: Aovid double completion of a request There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: `96d97e1782` ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 09:46:48 -06:00
Ilya Leoshkevich	3776f3517e	selftests, bpf: Test that dead ldx_w insns are accepted Prevent regressions related to zero-extension metadata handling during dead code sanitization. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210812151811.184086-3-iii@linux.ibm.com	2021-08-13 17:46:26 +02:00
Ilya Leoshkevich	45c709f8c7	bpf: Clear zext_dst of dead insns "access skb fields ok" verifier test fails on s390 with the "verifier bug. zext_dst is set, but no reg is defined" message. The first insns of the test prog are ... 0: 61 01 00 00 00 00 00 00 ldxw %r0,[%r1+0] 8: 35 00 00 01 00 00 00 00 jge %r0,0,1 10: 61 01 00 08 00 00 00 00 ldxw %r0,[%r1+8] ... and the 3rd one is dead (this does not look intentional to me, but this is a separate topic). sanitize_dead_code() converts dead insns into "ja -1", but keeps zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such an insn, it sees this discrepancy and bails. This problem can be seen only with JITs whose bpf_jit_needs_zext() returns true. Fix by clearning dead insns' zext_dst. The commits that contributed to this problem are: 1. `5aa5bd14c5` ("bpf: add initial suite for selftests"), which introduced the test with the dead code. 2. `5327ed3d44` ("bpf: verifier: mark verified-insn with sub-register zext flag"), which introduced the zext_dst flag. 3. `83a2881903` ("bpf: Account for BPF_FETCH in insn_has_def32()"), which introduced the sanity check. 4. `9183671af6` ("bpf: Fix leakage under speculation on mispredicted branches"), which bisect points to. It's best to fix this on stable branches that contain the second one, since that's the point where the inconsistency was introduced. Fixes: `5327ed3d44` ("bpf: verifier: mark verified-insn with sub-register zext flag") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com	2021-08-13 17:43:43 +02:00
Konstantin Komarov	96b18047a7	fs/ntfs3: Add MAINTAINERS This adds MAINTAINERS Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:59:20 -07:00
Konstantin Komarov	6e5be40d32	fs/ntfs3: Add NTFS3 in fs/Kconfig and fs/Makefile This adds NTFS3 in fs/Kconfig and fs/Makefile Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:59:09 -07:00
Jens Axboe	8f40d03707	tools/io_uring/io_uring-cp: sync with liburing example This example is missing a few fixes that are in the liburing version, synchronize with the upstream version. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 08:58:11 -06:00
Konstantin Komarov	12dad495ea	fs/ntfs3: Add Kconfig, Makefile and doc This adds Kconfig, Makefile and doc Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:56:37 -07:00
Konstantin Komarov	b46acd6a6a	fs/ntfs3: Add NTFS journal This adds NTFS journal Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:56:05 -07:00
Konstantin Komarov	522e010b58	fs/ntfs3: Add compression This patch adds different types of NTFS-applicable compressions: - lznt - lzx - xpress Latter two (lzx, xpress) implement Windows Compact OS feature and were taken from ntfs-3g system comression plugin authored by Eric Biggers (https://github.com/ebiggers/ntfs-3g-system-compression) which were ported to ntfs3 and adapted to Linux Kernel environment. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:56:00 -07:00
Konstantin Komarov	be71b5cba2	fs/ntfs3: Add attrib operations This adds attrib operations Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:55:55 -07:00
Konstantin Komarov	4342306f0f	fs/ntfs3: Add file operations and implementation This adds file operations and implementation Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:55:49 -07:00
Konstantin Komarov	3f3b442b5a	fs/ntfs3: Add bitmap This adds bitmap Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:55:41 -07:00
Konstantin Komarov	82cae269cf	fs/ntfs3: Add initialization of super block This adds initialization of super block Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:55:35 -07:00
Konstantin Komarov	4534a70b70	fs/ntfs3: Add headers and misc files This adds headers and misc files Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>	2021-08-13 07:52:52 -07:00
Yu Kuai	454bb67752	blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED We run a test that delete and recover devcies frequently(two devices on the same host), and we found that 'active_queues' is super big after a period of time. If device a and device b share a tag set, and a is deleted, then blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there is only one queue that are using the tag set. However, if b is still active, the active_queues of b might never be cleared even if b is deleted. Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 08:01:34 -06:00
Thomas Gleixner	7a3dc4f35b	driver core: Add missing kernel doc for device::msi_lock Fixes: `77e89afc25` ("PCI/MSI: Protect msi_desc::masked for multi-MSI") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2021-08-13 12:38:48 +02:00
Dongliang Mu	50f05bd114	ipack: tpci200: fix memory leak in the tpci200_register The error handling code in tpci200_register does not free interface_regs allocated by ioremap and the current version of error handling code is problematic. Fix this by refactoring the error handling code and free interface_regs when necessary. Fixes: `43986798fd` ("ipack: add error handling for ioremap_nocache") Cc: stable@vger.kernel.org Reported-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210810100323.3938492-2-mudongliangabcd@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-13 10:24:37 +02:00
Dongliang Mu	57a1681095	ipack: tpci200: fix many double free issues in tpci200_pci_probe The function tpci200_register called by tpci200_install and tpci200_unregister called by tpci200_uninstall are in pair. However, tpci200_unregister has some cleanup operations not in the tpci200_register. So the error handling code of tpci200_pci_probe has many different double free issues. Fix this problem by moving those cleanup operations out of tpci200_unregister, into tpci200_pci_remove and reverting the previous commit `9272e5d002` ("ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe"). Fixes: `9272e5d002` ("ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe") Cc: stable@vger.kernel.org Reported-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210810100323.3938492-1-mudongliangabcd@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-13 10:24:37 +02:00
Srinivas Kandagatla	d77772538f	slimbus: ngd: reset dma setup during runtime pm During suspend/resume NGD remote instance is power cycled along with remotely controlled bam dma engine. So Reset the dma configuration during this suspend resume path so that we are not dealing with any stale dma setup. Without this transactions timeout after first suspend resume path. Fixes: `917809e228` ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210809082428.11236-5-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-13 10:22:30 +02:00
Srinivas Kandagatla	c0e38eaa8d	slimbus: ngd: set correct device for pm For some reason we ended up using wrong device in some places for pm_runtime calls. Fix this so that NGG driver can do runtime pm correctly. Fixes: `917809e228` ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210809082428.11236-4-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-13 10:22:30 +02:00
Srinivas Kandagatla	a263c1ff6a	slimbus: messaging: check for valid transaction id In some usecases transaction ids are dynamically allocated inside the controller driver after sending the messages which have generic acknowledge responses. So check for this before refcounting pm_runtime. Without this we would end up imbalancing runtime pm count by doing pm_runtime_put() in both slim_do_transfer() and slim_msg_response() for a single pm_runtime_get() in slim_do_transfer() Fixes: `d3062a2109` ("slimbus: messaging: add slim_alloc/free_txn_tid()") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210809082428.11236-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-13 10:22:30 +02:00
Srinivas Kandagatla	9659281ce7	slimbus: messaging: start transaction ids from 1 instead of zero As tid is unsigned its hard to figure out if the tid is valid or invalid. So Start the transaction ids from 1 instead of zero so that we could differentiate between a valid tid and invalid tids This is useful in cases where controller would add a tid for controller specific transfers. Fixes: `d3062a2109` ("slimbus: messaging: add slim_alloc/free_txn_tid()") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210809082428.11236-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-13 10:22:30 +02:00
Paolo Bonzini	6e949ddb0a	Merge branch 'kvm-tdpmmu-fixes' into kvm-master Merge topic branch with fixes for both 5.14-rc6 and 5.15.	2021-08-13 03:33:13 -04:00
Sean Christopherson	ce25681d59	KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock Add yet another spinlock for the TDP MMU and take it when marking indirect shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with nested TDP, KVM may encounter shadow pages for the TDP entries managed by L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and misbehaves when a shadow page is marked unsync via a TDP MMU page fault, which runs with mmu_lock held for read, not write. Lack of a critical section manifests most visibly as an underflow of unsync_children in clear_unsync_child_bit() due to unsync_children being corrupted when multiple CPUs write it without a critical section and without atomic operations. But underflow is the best case scenario. The worst case scenario is that unsync_children prematurely hits '0' and leads to guest memory corruption due to KVM neglecting to properly sync shadow pages. Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock would functionally be ok. Usurping the lock could degrade performance when building upper level page tables on different vCPUs, especially since the unsync flow could hold the lock for a comparatively long time depending on the number of indirect shadow pages and the depth of the paging tree. For simplicity, take the lock for all MMUs, even though KVM could fairly easily know that mmu_lock is held for write. If mmu_lock is held for write, there cannot be contention for the inner spinlock, and marking shadow pages unsync across multiple vCPUs will be slow enough that bouncing the kvm_arch cacheline should be in the noise. Note, even though L2 could theoretically be given access to its own EPT entries, a nested MMU must hold mmu_lock for write and thus cannot race against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to be taken by the TDP MMU, as opposed to being taken by any MMU for a VM that is running with the TDP MMU enabled. Holding mmu_lock for read also prevents the indirect shadow page from being freed. But as above, keep it simple and always take the lock. Alternative #1, the TDP MMU could simply pass "false" for can_unsync and effectively disable unsync behavior for nested TDP. Write protecting leaf shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such VMMs typically don't modify TDP entries, but the same may not hold true for non-standard use cases and/or VMMs that are migrating physical pages (from L1's perspective). Alternative #2, the unsync logic could be made thread safe. In theory, simply converting all relevant kvm_mmu_page fields to atomics and using atomic bitops for the bitmap would suffice. However, (a) an in-depth audit would be required, (b) the code churn would be substantial, and (c) legacy shadow paging would incur additional atomic operations in performance sensitive paths for no benefit (to legacy shadow paging). Fixes: `a2855afc7e` ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181815.3378104-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:32:14 -04:00
Sean Christopherson	0103098fb4	KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs Set the min_level for the TDP iterator at the root level when zapping all SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf SPTE will recursively zap all its children, thus there is no need for the iterator to attempt to step down. This avoids rereading the top-level SPTEs after they are zapped by causing try_step_down() to short-circuit. In most cases, optimizing try_step_down() will be in the noise as the cost of zapping SPTEs completely dominates the overall time. The optimization is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM is zapping in response to multiple memslot updates when userspace is adding and removing read-only memslots for option ROMs. In that case, the task doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock for read and thus can be a noisy neighbor of sorts. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:31:56 -04:00
Sean Christopherson	524a1e4e38	KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level paging, the "zap all" flows will skip top-level SPTEs whose range extends beyond shadow_phys_bits and leak their SPs when the VM is destroyed. Use the current upper bound (based on host.MAXPHYADDR) to detect that the caller wants to zap all SPTEs, e.g. instead of using the max theoretical gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator to terminate its walk earlier when running on hosts with MAXPHYADDR < 52. Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to help future debuggers should KVM decide to leak SPTEs again. The bug is most easily reproduced by running (and unloading!) KVM in a VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped. ============================================================================= BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab\|zone=1) CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x45/0x59 slab_err+0x95/0xc9 __kmem_cache_shutdown.cold+0x3c/0x158 kmem_cache_destroy+0x3d/0xf0 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x5d/0x90 [kvm] kvm_exit+0x78/0x90 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x13f/0x220 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `faaf05b00a` ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:31:46 -04:00
Paolo Bonzini	c5e2bf0b4a	KVM/arm64 fixes for 5.14, take #2 - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmEWEsoPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpD1IIQAIbZdNAIy68j2/H8sgaYT4GuYICLOvz3WhTI Li/yRP2b0th4wT4LaKlATKJKQgliPxXZ0KCJMZxFr7aiKEyY1LZe+ddJBzetzgy2 S12v5V3cp/0DHQ6CEflUy0x8gM/BeudeYyZcHxSbLZcVB4bzFx9pBJeJ1WkLG+GC Bx4zxdARNas+9zOUuHLCQbWfihMSrbj3CI6WIafpNeFOs3lLldT8WcRofgQfAsAx V3FKETIOb5NUU6LKUHkYgyM3n1MZwAukaCsepDhayeeT5iEyIGXb1HkjcYOx6bfn BhDvA7PH9oXBOFFL2sxlJKamXWZP3Bz7xyZ40MXDqC1lSMAUEh8TXJFptncEDxPb OgXewTgCulKVSjT8YXnoTe1UNQ2dLqjw1TsqV5jXhVXIjeBcR8S4gM0hcqwvgWlO BHaDt8BPd39rBzfC0gUkE5BHE04QuboK/Vz/+Qc6Slc3EUIdnuCtjefdRLvSxxgB bEBW+s3zcZ7RhoSLvXgvTe3an11Os8BH921VCxgMyEnIvSDEbw3KypmPYuNCkSLc t9GLAbPU139w7Gk7vp0oqhI8xIV7QoFk+b94JIHMvtS13yVaqBrZF33RrFzmAwVN lXDiOdoR8mqbX2EPQVIn+BhSlebfvnJANm46tzgY1/u2mUgH//fu/cH3kpjgohco kY+Ztnb9 =hL2s -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.14, take #2 - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM	2021-08-13 03:21:13 -04:00
Sean Christopherson	18712c1370	KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults. Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit `a0c134347b` ("KVM: VMX: introduce vmx_need_pf_intercept"). Fixes: `1dbf5d68af` ("KVM: VMX: Add guest physical address check in EPT violation and misconfig") Cc: stable@vger.kernel.org Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812045615.3167686-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:20:58 -04:00
Junaid Shahid	85aa8889b8	kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault When a nested EPT violation/misconfig is injected into the guest, the shadow EPT PTEs associated with that address need to be synced. This is done by kvm_inject_emulated_page_fault() before it calls nested_ept_inject_page_fault(). However, that will only sync the shadow EPT PTE associated with the current L1 EPTP. Since the ASID is based on EP4TA rather than the full EPTP, so syncing the current EPTP is not enough. The SPTEs associated with any other L1 EPTPs in the prev_roots cache with the same EP4TA also need to be synced. Signed-off-by: Junaid Shahid <junaids@google.com> Message-Id: <20210806222229.1645356-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:20:58 -04:00
Paolo Bonzini	375d1adebc	Merge branch 'kvm-vmx-secctl' into kvm-master Merge common topic branch for 5.14-rc6 and 5.15 merge window.	2021-08-13 03:20:18 -04:00
Paolo Bonzini	ffbe17cada	KVM: x86: remove dead initialization hv_vcpu is initialized again a dozen lines below, and at this point vcpu->arch.hyperv is not valid. Remove the initializer. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:20:18 -04:00
Sean Christopherson	1383279c64	KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels Remove an ancient restriction that disallowed exposing EFER.NX to the guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU. The motivation of the check, added by commit `2cc51560ae` ("KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run the guest with the host's EFER.NX and thus avoid context switching EFER if the only divergence was the NX bit. Fast forward to today, and KVM has long since stopped running the guest with the host's EFER.NX. Not only does KVM context switch EFER if host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 && guest.EFER.NX=1 when using shadow paging (to emulate SMEP). Furthermore, the entire motivation for the restriction was made obsolete over a decade ago when Intel added dedicated host and guest EFER fields in the VMCS (Nehalem timeframe), which reduced the overhead of context switching EFER from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles. In practice, the removed restriction only affects non-PAE 32-bit kernels, as EFER.NX is set during boot if NX is supported and the kernel will use PAE paging (32-bit or 64-bit), regardless of whether or not the kernel will actually use NX itself (mark PTEs non-executable). Alternatively and/or complementarily, startup_32_smp() in head_32.S could be modified to set EFER.NX=1 regardless of paging mode, thus eliminating the scenario where NX is supported but not enabled. However, that runs the risk of breaking non-KVM non-PAE kernels (though the risk is very, very low as there are no known EFER.NX errata), and also eliminates an easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER across nested virtualization transitions. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210805183804.1221554-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-13 03:20:17 -04:00
Linus Torvalds	f8e6dfc64f	Networking fixes for 5.14-rc6, including fixes from netfilter, bpf, can and ieee802154. Current release - regressions: - r8169: fix ASPM-related link-up regressions - bridge: fix flags interpretation for extern learn fdb entries - phy: micrel: fix link detection on ksz87xx switch - Revert "tipc: Return the correct errno code" - ptp: fix possible memory leak caused by invalid cast Current release - new code bugs: - bpf: add missing bpf_read_[un]lock_trace() for syscall program - bpf: fix potentially incorrect results with bpf_get_local_storage() - page_pool: mask the page->signature before the checking, avoid dma mapping leaks - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps - bnxt_en: fix firmware interface issues with PTP - mlx5: Bridge, fix ageing time Previous releases - regressions: - linkwatch: fix failure to restore device state across suspend/resume - bareudp: fix invalid read beyond skb's linear data Previous releases - always broken: - bpf: fix integer overflow involving bucket_size - ppp: fix issues when desired interface name is specified via netlink - wwan: mhi_wwan_ctrl: fix possible deadlock - dsa: microchip: ksz8795: fix number of VLAN related bugs - dsa: drivers: fix broken backpressure in .port_fdb_dump - dsa: qca: ar9331: make proper initial port defaults Misc: - bpf: add lockdown check for probe_write_user helper - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out - netfilter: conntrack: collect all entries in one cycle, heuristically slow down garbage collection scans on idle systems to prevent frequent wake ups Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEVb/AACgkQMUZtbf5S Irvlzw//XGDHNNPPOueHVhYK50+WiqPMxezQ5nbnG6uR6JtPyirMNTgzST8rQRsu HmQy8/Oi6bK5rbPC9iDtKK28ba6Ldvu1ic8lTkuWyNNthG/pZGJJQ+Pg7dmkd7te soJGZKnTbNWwbgGOFbfw9rLRuzWsjQjQ43vxTMjjNnpOwNxANuNR1GN0S/t8e9di 9BBT8jtgcHhtW5jRMHMNWHk+k8aeyIZPxjl9fjzzsMt7meX50DFrCJgf8bKkZ5dA W2b/fzUyMqVQJpgmIY4ktFmR4mV382pWOOs6rl+ppSu+mU/gpTuYCofF7FqAUU5S 71mzukW6KdOrqymVuwiTXBlGnZB370aT7aUU5PHL/ZkDJ9shSyVRcg/iQa40myzn 5wxunZX936z5f84bxZPW1J5bBZklba8deKPXHUkl5RoIXsN2qWFPJpZ1M0eHyfPm ZdqvRZ1IkSSFZFr6FF374bEqa88NK1wbVKUbGQ+yn8abE+HQfXQR9ZWZa1DR1wkb rF8XWOHjQLp/zlTRnj3gj3T4pEwc5L1QOt7RUrYfI36Mh7iUz5EdzowaiEaDQT6/ neThilci1F6Mz4Uf65pK4TaDTDvj1tqqAdg3g8uneHBTFARS+htGXqkaKxP6kSi+ T/W4woOqCRT6c0+BhZ2jPRhKsMZ5kR1vKLUVBHShChq32mDpn6g= =hzDl -----END PGP SIGNATURE----- Merge tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, bpf, can and ieee802154. The size of this is pretty normal, but we got more fixes for 5.14 changes this week than last week. Nothing major but the trend is the opposite of what we like. We'll see how the next week goes.. Current release - regressions: - r8169: fix ASPM-related link-up regressions - bridge: fix flags interpretation for extern learn fdb entries - phy: micrel: fix link detection on ksz87xx switch - Revert "tipc: Return the correct errno code" - ptp: fix possible memory leak caused by invalid cast Current release - new code bugs: - bpf: add missing bpf_read_[un]lock_trace() for syscall program - bpf: fix potentially incorrect results with bpf_get_local_storage() - page_pool: mask the page->signature before the checking, avoid dma mapping leaks - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps - bnxt_en: fix firmware interface issues with PTP - mlx5: Bridge, fix ageing time Previous releases - regressions: - linkwatch: fix failure to restore device state across suspend/resume - bareudp: fix invalid read beyond skb's linear data Previous releases - always broken: - bpf: fix integer overflow involving bucket_size - ppp: fix issues when desired interface name is specified via netlink - wwan: mhi_wwan_ctrl: fix possible deadlock - dsa: microchip: ksz8795: fix number of VLAN related bugs - dsa: drivers: fix broken backpressure in .port_fdb_dump - dsa: qca: ar9331: make proper initial port defaults Misc: - bpf: add lockdown check for probe_write_user helper - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out - netfilter: conntrack: collect all entries in one cycle, heuristically slow down garbage collection scans on idle systems to prevent frequent wake ups" * tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) vsock/virtio: avoid potential deadlock when vsock device remove wwan: core: Avoid returning NULL from wwan_create_dev() net: dsa: sja1105: unregister the MDIO buses during teardown Revert "tipc: Return the correct errno code" net: mscc: Fix non-GPL export of regmap APIs net: igmp: increase size of mr_ifc_count MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets net: pcs: xpcs: fix error handling on failed to allocate memory net: linkwatch: fix failure to restore device state across suspend/resume net: bridge: fix memleak in br_add_if() net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge net: bridge: fix flags interpretation for extern learn fdb entries net: dsa: sja1105: fix broken backpressure in .port_fdb_dump net: dsa: lantiq: fix broken backpressure in .port_fdb_dump net: dsa: lan9303: fix broken backpressure in .port_fdb_dump net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump bpf, core: Fix kernel-doc notation net: igmp: fix data-race in igmp_ifc_timer_expire() net: Fix memory leak in ieee802154_raw_deliver ...	2021-08-12 16:24:03 -10:00
Linus Torvalds	3a03c67de2	A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis and a reference handling fix from Jeff that should address some memory corruption reports in the snaprealm area. Both marked for stable. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmEVaqsTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi/DBCACd7+mnAXIwajwoDdXFIJT7/tfimdvU cMrh6ciZNtEKxm23flQ1AFJXlXR/nlZRspfOmlmsl9bB4TAlXnhJ/s4JaiuOMMTh OQ4oz0vAbGELkPsXB/FXGSSk1wTFEjCocFsJwoYiUkYjD7Qt12BZKNkFYgj/MVc2 wyJ5K1buqBLVFDU+CymqDzc07YpG1zn888o7UGWFTyevldRAHl2euxqbnr0S4qb9 OS5UKO3aFCEt5PT9RKRHygCGjuHym/fgXgPm9aNY4rYBE9qOXloVUOD5bhMHBJ2E g506xhOurqbGv4O9oj+gvBwtQwY/TF8BvCA79koQSHNIYQsC/bcXenST =m8x8 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis and a reference handling fix from Jeff that should address some memory corruption reports in the snaprealm area. Both marked for stable" * tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client: ceph: take snap_empty_lock atomically with snaprealm refcount change ceph: reduce contention in ceph_check_delayed_caps()	2021-08-12 16:16:01 -10:00
Linus Torvalds	82cce5f429	drm fixes for 5.14-rc6 amdgpu: - Yellow carp update - RAS EEPROM fixes - BACO/BOCO fixes - Fix a memory leak in an error path - Freesync fix - VCN harvesting fix - Display fixes i915: - GVT fix for Windows VM hang. - Display fix of 12 BPC bits for display 12 and newer. - Don't try to access some media register for fused off domains. - Fix kerneldoc build warnings. mediatek: - Fix dpi bridge bug. - Fix cursor plane no update. meson: - Fix colors when booting with HDR -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmEVjJYACgkQDHTzWXnE hr5RNQ/9EpOCgrgKCDXyFdpW6NqGIPbkSeNYaa4fCVNLzKkkCsuwapRDamsBTvvO O/p6aIVbzlPzn/BkMj3COy5W44Za4n02495jYVYU9CRlkOTiSa6CwzKtcsu4X6d/ uZsOdyOc5inHr8j2abUdqq1rmL7JW9uDnn0PNFBG739DITAmJvmKOUA3TcWe7Fdb KUI1k3Ukjw8HD38OqTQNUuYVEZ6Dq4yVkIKsUOtW/kcvvgas4d1cDhe8NurJxtqr kw1HqzDFflPmJ1V4AJWl+wpaHsE11cR1kldI7iVYlP0MKhAWCXNbdgcFjcyFKkhh NWn1qSoWVMKZ5vmbpMn/EDnNZjoGZ+dsW7c7+vS0g/JWjN8VkUgqO+cAYwG1vm+P S+smD138g7j1DP0uP1XYTETyLOr9BoApkxYyaYAFZZKLcWNBiEuxU/fu8i+yDVNX 44L8ICi8KZezAoI5qHyTYIt2NyN3bjdhUXEnAUiqIL1mKF/XacCciVLQnK+NH3FR qVg6XWBV6LDvhrffatft5wYGAzUXC6SDbUas5hES4zd1L/M0A7dYiJ9BFRzIL/r1 bVPAcgmy3NbHITxdAM+WB9bGhEiV5PxlzYs1L9itFw8DSETexOKEFVq8Ni5kqeul U5LFWTQFTrZ2lvK1Rn6Ys1KNYpH3wJV9tuTvGBCrSQsZ8PgLAWE= =w0XU -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2021-08-13' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Another week, another set of pretty regular fixes, nothing really stands out too much. amdgpu: - Yellow carp update - RAS EEPROM fixes - BACO/BOCO fixes - Fix a memory leak in an error path - Freesync fix - VCN harvesting fix - Display fixes i915: - GVT fix for Windows VM hang. - Display fix of 12 BPC bits for display 12 and newer. - Don't try to access some media register for fused off domains. - Fix kerneldoc build warnings. mediatek: - Fix dpi bridge bug. - Fix cursor plane no update. meson: - Fix colors when booting with HDR" * tag 'drm-fixes-2021-08-13' of git://anongit.freedesktop.org/drm/drm: drm/doc/rfc: drop lmem uapi section drm/i915: Only access SFC_DONE when media domain is not fused off drm/i915/display: Fix the 12 BPC bits for PIPE_MISC reg drm/amd/display: use GFP_ATOMIC in amdgpu_dm_irq_schedule_work drm/amd/display: Remove invalid assert for ODM + MPC case drm/amd/pm: bug fix for the runtime pm BACO drm/amdgpu: handle VCN instances when harvesting (v2) drm/meson: fix colour distortion from HDR set during vendor u-boot drm/i915/gvt: Fix cached atomics setting for Windows VM drm/amdgpu: Add preferred mode in modeset when freesync video mode's enabled. drm/amd/pm: Fix a memory leak in an error handling path in 'vangogh_tables_init()' drm/amdgpu: don't enable baco on boco platforms in runpm drm/amdgpu: set RAS EEPROM address from VBIOS drm/amd/pm: update smu v13.0.1 firmware header drm/mediatek: Fix cursor plane no update drm/mediatek: mtk-dpi: Set out_fmt from config if not the last bridge drm/mediatek: dpi: Fix NULL dereference in mtk_dpi_bridge_atomic_check	2021-08-12 16:09:25 -10:00
Arnd Bergmann	cbfece7518	ARM: ixp4xx: fix building both pci drivers When both the old and the new PCI drivers are enabled in the same kernel, there are a couple of namespace conflicts that cause a build failure: drivers/pci/controller/pci-ixp4xx.c:38: error: "IXP4XX_PCI_CSR" redefined [-Werror] 38 \| #define IXP4XX_PCI_CSR 0x1c \| In file included from arch/arm/mach-ixp4xx/include/mach/hardware.h:23, from arch/arm/mach-ixp4xx/include/mach/io.h:15, from arch/arm/include/asm/io.h:198, from include/linux/io.h:13, from drivers/pci/controller/pci-ixp4xx.c:20: arch/arm/mach-ixp4xx/include/mach/ixp4xx-regs.h:221: note: this is the location of the previous definition 221 \| #define IXP4XX_PCI_CSR(x) ((volatile u32 )(IXP4XX_PCI_CFG_BASE_VIRT+(x))) \| drivers/pci/controller/pci-ixp4xx.c:148:12: error: 'ixp4xx_pci_read' redeclared as different kind of symbol 148 \| static int ixp4xx_pci_read(struct ixp4xx_pci p, u32 addr, u32 cmd, u32 *data) \| ^~~~~~~~~~~~~~~ Rename both the ixp4xx_pci_read/ixp4xx_pci_write functions and the IXP4XX_PCI_CSR macro. In each case, I went with the version that has fewer callers to keep the change small. Fixes: `f7821b4934` ("PCI: ixp4xx: Add a new driver for IXP4xx") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: soc@kernel.org Link: https://lore.kernel.org/r/20210721151546.2325937-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2021-08-12 23:10:09 +02:00
Linus Walleij	813bacf410	ARM: configs: Update the nhk8815_defconfig The platform lost the framebuffer due to a commit solving a circular dependency in v5.14-rc1, so add it back in by explicitly selecting the framebuffer. Also fix up some Kconfig options that got dropped or moved around while we're at it. Fixes: `f611b1e762` ("drm: Avoid circular dependencies for CONFIG_FB") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210807225518.3607126-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2021-08-12 23:09:47 +02:00
Dave Airlie	a1fa726831	Short summary of fixes pull: * meson: Fix colors when booting with HDR -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmEU27UACgkQaA3BHVML eiMABwf+Kuvyh6mXHewMwqRNnxYFwyyhy5AVJm17Z74pZcbxSWp2+WBxDTaugCRX Dl5Yt27DsCCW0JFpZ0w4xITizUxhD/VKnVUONBB1HK1s92h7WxG+2SQEPQ7uHQj5 iBbaysCn5uIJhpDnTXcqos0wnLMxkquZ5ZK6uQwPD+Q8VsAwMVw0dZ1C/Kkmo9zF 55VUATGIGBtOI4FEDJfDZGukVfbDt28ipMNeyCiqfzREnDeIyojJ/KEbSD3XHhi7 HMDkeAM5SHToNaBEIaEikPWKSSa/b5P3y9U/yT6d4MZ1PH/lr1he4SQnThcTRCg8 +vnMC1iJiwgdX7gQ2StHBoB1GMqv1Q== =NuGL -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2021-08-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * meson: Fix colors when booting with HDR Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YRTb+qUuBYWjJDVg@linux-uq9g.fritz.box	2021-08-13 06:37:40 +10:00
Dave Airlie	3e234e9f7f	- GVT fix for Windows VM hang. - Display fix of 12 BPC bits for display 12 and newer. - Don't try to access some media register for fused off domains. - Fix kerneldoc build warnings. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmEVP3QACgkQ+mJfZA7r E8qGKQgApmIuFnhFspZ2BMsbhxOckZW7mxg+4RzNbIZ/pb+2aO8CnkfemrYZgFX/ OaZtvJdqtbbkN25glZTeP7wuV//uXSkPdr7zTqaL+005U0PL+EUzlfYZR4AzdhI2 D2AMEDfVwwbMrB/hI8sk01XKmEXFfBkX3Acy9svLqi2TOvFCqjnBcV/E5K+eklDs lY5KoklziRKw2FpckOdKYPiyjWbTC02s7Co4QXdkkSoZm69HDGuBlchH2JNyDrer Dyarp5btGzkVMZy5q9avIsBgFE1RRVCEnGLaSzMi1qtNaorCTYs5FJtp48yFfmzi O1X6XOGYDS4nGl782sfxjkOaUNOdbQ== =vN7o -----END PGP SIGNATURE----- Merge tag 'drm-intel-fixes-2021-08-12' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - GVT fix for Windows VM hang. - Display fix of 12 BPC bits for display 12 and newer. - Don't try to access some media register for fused off domains. - Fix kerneldoc build warnings. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YRU/hnQ1sNr+j37x@intel.com	2021-08-13 06:31:26 +10:00
Jakub Kicinski	a9a507013a	Merge tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== ieee802154 for net 2021-08-12 Mostly fixes coming from bot reports. Dongliang Mu tackled some syzkaller reports in hwsim again and Takeshi Misawa a memory leak in ieee802154 raw. * tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: net: Fix memory leak in ieee802154_raw_deliver ieee802154: hwsim: fix GPF in hwsim_new_edge_nl ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi ==================== Link: https://lore.kernel.org/r/20210812183912.1663996-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-12 11:50:17 -07:00
Babu Moger	064855a690	x86/resctrl: Fix default monitoring groups reporting Creating a new sub monitoring group in the root /sys/fs/resctrl leads to getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on the entire filesystem. Steps to reproduce: 1. mount -t resctrl resctrl /sys/fs/resctrl/ 2. cd /sys/fs/resctrl/ 3. cat mon_data/mon_L3_00/mbm_total_bytes 23189832 4. Create sub monitor group: mkdir mon_groups/test1 5. cat mon_data/mon_L3_00/mbm_total_bytes Unavailable When a new monitoring group is created, a new RMID is assigned to the new group. But the RMID is not active yet. When the events are read on the new RMID, it is expected to report the status as "Unavailable". When the user reads the events on the default monitoring group with multiple subgroups, the events on all subgroups are consolidated together. Currently, if any of the RMID reads report as "Unavailable", then everything will be reported as "Unavailable". Fix the issue by discarding the "Unavailable" reads and reporting all the successful RMID reads. This is not a problem on Intel systems as Intel reports 0 on Inactive RMIDs. Fixes: `d89b737901` ("x86/intel_rdt/cqm: Add mon_data") Reported-by: Paweł Szulik <pawel.szulik@intel.com> Signed-off-by: Babu Moger <Babu.Moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311 Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu	2021-08-12 20:12:20 +02:00
Longpeng(Mike)	49b0b6ffe2	vsock/virtio: avoid potential deadlock when vsock device remove There's a potential deadlock case when remove the vsock device or process the RESET event: vsock_for_each_connected_socket: spin_lock_bh(&vsock_table_lock) ----------- (1) ... virtio_vsock_reset_sock: lock_sock(sk) --------------------- (2) ... spin_unlock_bh(&vsock_table_lock) lock_sock() may do initiative schedule when the 'sk' is owned by other thread at the same time, we would receivce a warning message that "scheduling while atomic". Even worse, if the next task (selected by the scheduler) try to release a 'sk', it need to request vsock_table_lock and the deadlock occur, cause the system into softlockup state. Call trace: queued_spin_lock_slowpath vsock_remove_bound vsock_remove_sock virtio_transport_release __vsock_release vsock_release __sock_release sock_close __fput ____fput So we should not require sk_lock in this case, just like the behavior in vhost_vsock or vmci. Fixes: `0ea9e1d3a9` ("VSOCK: Introduce virtio_transport.ko") Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-12 10:57:27 -07:00
Steven Rostedt (VMware)	5acce0bff2	tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name The following commands: # echo 'read_max u64 size;' > synthetic_events # echo 'hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)' > events/syscalls/sys_enter_read/trigger Causes: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 4 PID: 1763 Comm: bash Not tainted 5.14.0-rc2-test+ #155 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:strcmp+0xc/0x20 Code: 75 f7 31 c0 0f b6 0c 06 88 0c 02 48 83 c0 01 84 c9 75 f1 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89 RSP: 0018:ffffb5fdc0963ca8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffffffb3a4e040 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9714c0d0b640 RDI: 0000000000000000 RBP: 0000000000000000 R08: 00000022986b7cde R09: ffffffffb3a4dff8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9714c50603c8 R13: 0000000000000000 R14: ffff97143fdf9e48 R15: ffff9714c01a2210 FS: 00007f1fa6785740(0000) GS:ffff9714da400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000002d863004 CR4: 00000000001706e0 Call Trace: __find_event_file+0x4e/0x80 action_create+0x6b7/0xeb0 ? kstrdup+0x44/0x60 event_hist_trigger_func+0x1a07/0x2130 trigger_process_regex+0xbd/0x110 event_trigger_write+0x71/0xd0 vfs_write+0xe9/0x310 ksys_write+0x68/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f1fa6879e87 The problem was the "trace(read_max,count)" where the "count" should be "$count" as "onmax()" only handles variables (although it really should be able to figure out that "count" is a field of sys_enter_read). But there's a path that does not find the variable and ends up passing a NULL for the event, which ends up getting passed to "strcmp()". Add a check for NULL to return and error on the command with: # cat error_log hist:syscalls:sys_enter_read: error: Couldn't create or find variable Command: hist:keys=common_pid:count=count:onmax($count).trace(read_max,count) ^ Link: https://lkml.kernel.org/r/20210808003011.4037f8d0@oasis.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: `50450603ec` tracing: Add 'onmax' hist trigger action support Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-08-12 13:35:57 -04:00
Masami Hiramatsu	d0ac5fbaf7	init: Suppress wrong warning for bootconfig cmdline parameter Since the 'bootconfig' command line parameter is handled before parsing the command line, it doesn't use early_param(). But in this case, kernel shows a wrong warning message about it. [ 0.013714] Kernel command line: ro console=ttyS0 bootconfig console=tty0 [ 0.013741] Unknown command line parameters: bootconfig To suppress this message, add a dummy handler for 'bootconfig'. Link: https://lkml.kernel.org/r/162812945097.77369.1849780946468010448.stgit@devnote2 Fixes: `86d1919a4f` ("init: print out unknown kernel parameters") Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-08-12 13:35:57 -04:00
Lukas Bulwahn	12f9951d3f	tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS Commit `2860cd8a23` ("livepatch: Use the default ftrace_ops instead of REGS when ARGS is available") intends to enable config LIVEPATCH when ftrace with ARGS is available. However, the chain of configs to enable LIVEPATCH is incomplete, as HAVE_DYNAMIC_FTRACE_WITH_ARGS is available, but the definition of DYNAMIC_FTRACE_WITH_ARGS, combining DYNAMIC_FTRACE and HAVE_DYNAMIC_FTRACE_WITH_ARGS, needed to enable LIVEPATCH, is missing in the commit. Fortunately, ./scripts/checkkconfigsymbols.py detects this and warns: DYNAMIC_FTRACE_WITH_ARGS Referencing files: kernel/livepatch/Kconfig So, define the config DYNAMIC_FTRACE_WITH_ARGS analogously to the already existing similar configs, DYNAMIC_FTRACE_WITH_REGS and DYNAMIC_FTRACE_WITH_DIRECT_CALLS, in ./kernel/trace/Kconfig to connect the chain of configs. Link: https://lore.kernel.org/kernel-janitors/CAKXUXMwT2zS9fgyQHKUUiqo8ynZBdx2UEUu1WnV_q0OCmknqhw@mail.gmail.com/ Link: https://lkml.kernel.org/r/20210806195027.16808-1-lukas.bulwahn@gmail.com Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: stable@vger.kernel.org Fixes: `2860cd8a23` ("livepatch: Use the default ftrace_ops instead of REGS when ARGS is available") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-08-12 13:35:57 -04:00
Daniel Bristot de Oliveira	0e05ba498d	trace/osnoise: Print a stop tracing message When using osnoise/timerlat with stop tracing, sometimes it is not clear in which CPU the stop condition was hit, mainly when using some extra events. Print a message informing in which CPU the trace stopped, like in the example below: <idle>-0 [006] d.h. 2932.676616: #1672599 context irq timer_latency 34689 ns <idle>-0 [006] dNh. 2932.676618: irq_noise: local_timer:236 start 2932.676615639 duration 2391 ns <idle>-0 [006] dNh. 2932.676620: irq_noise: virtio0-output.0:47 start 2932.676620180 duration 86 ns <idle>-0 [003] d.h. 2932.676621: #1673374 context irq timer_latency 1200 ns <idle>-0 [006] d... 2932.676623: thread_noise: swapper/6:0 start 2932.676615964 duration 4339 ns <idle>-0 [003] dNh. 2932.676623: irq_noise: local_timer:236 start 2932.676620597 duration 1881 ns <idle>-0 [006] d... 2932.676623: sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=timerlat/6 next_pid=852 next_prio=4 timerlat/6-852 [006] .... 2932.676623: #1672599 context thread timer_latency 41931 ns <idle>-0 [003] d... 2932.676623: thread_noise: swapper/3:0 start 2932.676620854 duration 880 ns <idle>-0 [003] d... 2932.676624: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=timerlat/3 next_pid=849 next_prio=4 timerlat/6-852 [006] .... 2932.676624: timerlat_main: stop tracing hit on cpu 6 timerlat/3-849 [003] .... 2932.676624: #1673374 context thread timer_latency 4310 ns Link: https://lkml.kernel.org/r/b30a0d7542adba019185f44ee648e60e14923b11.1626598844.git.bristot@kernel.org Cc: Tom Zanussi <zanussi@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-08-12 13:35:56 -04:00
Daniel Bristot de Oliveira	e1c4ad4a7f	trace/timerlat: Add a header with PREEMPT_RT additional fields Some extra flags are printed to the trace header when using the PREEMPT_RT config. The extra flags are: need-resched-lazy, preempt-lazy-depth, and migrate-disable. Without printing these fields, the timerlat specific fields are shifted by three positions, for example: # tracer: timerlat # # _-----=> irqs-off # / _----=> need-resched # \| / _---=> hardirq/softirq # \|\| / _--=> preempt-depth # \|\| / # \|\|\|\| ACTIVATION # TASK-PID CPU# \|\|\|\| TIMESTAMP ID CONTEXT LATENCY # \| \| \| \|\|\|\| \| \| \| \| <idle>-0 [000] d..h... 3279.798871: #1 context irq timer_latency 830 ns <...>-807 [000] ....... 3279.798881: #1 context thread timer_latency 11301 ns Add a new header for timerlat with the missing fields, to be used when the PREEMPT_RT is enabled. Link: https://lkml.kernel.org/r/babb83529a3211bd0805be0b8c21608230202c55.1626598844.git.bristot@kernel.org Cc: Tom Zanussi <zanussi@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-08-12 13:35:56 -04:00

... 3 4 5 6 7 ...

1031328 Commits