Commit Graph

47401 Commits

Author SHA1 Message Date
Linus Torvalds
d129377639 ARM64:
* Fix the guest view of the ID registers, making the relevant fields
   writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)
 
 * Correcly expose S1PIE to guests, fixing a regression introduced
   in 6.12-rc1 with the S1POE support
 
 * Fix the recycling of stage-2 shadow MMUs by tracking the context
   (are we allowed to block or not) as well as the recycling state
 
 * Address a couple of issues with the vgic when userspace misconfigures
   the emulation, resulting in various splats. Headaches courtesy
   of our Syzkaller friends
 
 * Stop wasting space in the HYP idmap, as we are dangerously close
   to the 4kB limit, and this has already exploded in -next
 
 * Fix another race in vgic_init()
 
 * Fix a UBSAN error when faking the cache topology with MTE
   enabled
 
 RISCV:
 
 * RISCV: KVM: use raw_spinlock for critical section in imsic
 
 x86:
 
 * A bandaid for lack of XCR0 setup in selftests, which causes trouble
   if the compiler is configured to have x86-64-v3 (with AVX) as the
   default ISA.  Proper XCR0 setup will come in the next merge window.
 
 * Fix an issue where KVM would not ignore low bits of the nested CR3
   and potentially leak up to 31 bytes out of the guest memory's bounds
 
 * Fix case in which an out-of-date cached value for the segments could
   by returned by KVM_GET_SREGS.
 
 * More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL
 
 * Override MTRR state for KVM confidential guests, making it WB by
   default as is already the case for Hyper-V guests.
 
 Generic:
 
 * Remove a couple of unused functions
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmcVK54UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOfrgf7BRyihd28OGaqVuv2BqGYrxqfOkd6
 ZqpJDOy+X7UE3iG5NhTxw4mghCJFhOwIL7gDSZwPLe6D2k01oqPSP2pLMqXb5oOv
 /EkltRvzG0YIH3sjZY5PROrMMxnvSKkJKxETFxFQQzMKRym2v/T5LAzrium58YIT
 vWZXxo2HTPXOw/U5upAqqMYJMeeJEL3kurVHtOsPytUFjrIOl0BfeKvgjOwonDIh
 Awm4JZwk0+1d8sYfkuzsSrTQmtshDCx1jkFN1juirt90s1EwgmOvVKiHo3gMsVP9
 veDRoLTx2fM/r7TrhoHo46DTA2vbfmCltWcT0cn5x8P24BFGXXe/IDJIHA==
 =IVlI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix the guest view of the ID registers, making the relevant fields
     writable from userspace (affecting ID_AA64DFR0_EL1 and
     ID_AA64PFR1_EL1)

   - Correcly expose S1PIE to guests, fixing a regression introduced in
     6.12-rc1 with the S1POE support

   - Fix the recycling of stage-2 shadow MMUs by tracking the context
     (are we allowed to block or not) as well as the recycling state

   - Address a couple of issues with the vgic when userspace
     misconfigures the emulation, resulting in various splats. Headaches
     courtesy of our Syzkaller friends

   - Stop wasting space in the HYP idmap, as we are dangerously close to
     the 4kB limit, and this has already exploded in -next

   - Fix another race in vgic_init()

   - Fix a UBSAN error when faking the cache topology with MTE enabled

  RISCV:

   - RISCV: KVM: use raw_spinlock for critical section in imsic

  x86:

   - A bandaid for lack of XCR0 setup in selftests, which causes trouble
     if the compiler is configured to have x86-64-v3 (with AVX) as the
     default ISA. Proper XCR0 setup will come in the next merge window.

   - Fix an issue where KVM would not ignore low bits of the nested CR3
     and potentially leak up to 31 bytes out of the guest memory's
     bounds

   - Fix case in which an out-of-date cached value for the segments
     could by returned by KVM_GET_SREGS.

   - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL

   - Override MTRR state for KVM confidential guests, making it WB by
     default as is already the case for Hyper-V guests.

  Generic:

   - Remove a couple of unused functions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
  RISCV: KVM: use raw_spinlock for critical section in imsic
  KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
  KVM: selftests: x86: Avoid using SSE/AVX instructions
  KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
  KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
  KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
  KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
  KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
  x86/kvm: Override default caching mode for SEV-SNP and TDX
  KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
  KVM: Remove unused kvm_vcpu_gfn_to_pfn
  KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
  KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
  KVM: arm64: Fix shift-out-of-bounds bug
  KVM: arm64: Shave a few bytes from the EL2 idmap code
  KVM: arm64: Don't eagerly teardown the vgic on init error
  KVM: arm64: Expose S1PIE to guests
  KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
  KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
  KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
  ...
2024-10-21 11:22:04 -07:00
Linus Torvalds
db87114dcf - Explicitly disable the TSC deadline timer when going idle to address
some CPU errata in that area
 
 - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
   late microcode loading path
 
 - Clear CPU buffers later in the NMI exit path on 32-bit to avoid
   register clearing while they still contain sensitive data, for the
   RDFS mitigation
 
 - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
   path on 32-bit
 
 - Fix parsing issues of memory bandwidth specification in sysfs for
   resctrl's memory bandwidth allocation feature
 
 - Other small cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmcU6aMACgkQEsHwGGHe
 VUqXPxAAjG0m9J11jBNlNsorPKe0dlhkgV6RpEOtCWov0mvxSAPQazT9PE0FTCvx
 Hm/IdEmj5vkkJOC/R7pga8Yz5fRwGtYwIHyS5618Wh+KAfdsXDgTFvCKaBQt0ltB
 9U5+mwmyzzL6rS6jcv/y28qwi0STd4dHKg6K9sWAtga1bQSPCyJMZjeh9op5CxNh
 QOppCJR23jrp9I9c1zFd1LJPM4GY+KTYXTa7076sfcoD2taHbxAwsC/wiMooh5A2
 k0EItyzy2UWWSUxAW8QhZJyuAWav631tHjcz9iETgNZmjgpR0sTGFGkRaYB74qkf
 vS2yyGpTSoKhxXVcBe7Z6cMf5DhUUjMa7itXZnY7kWCenvwfa3/nuSUKtIeqTPyg
 a6BXypPFyYaqRWHtCiN6KjwXaS+fbc385Fh6m8Q/NDrHnXG84oLQ3DK0WKj4Z37V
 YRflsWJ4ZRIwLALGsKJX+qbe9Oh3VDE3Q8MH9pCiJi227YB2OzyImJmCUBRY9bIC
 7Amw4aUBUxX/VUpUOC4CJnx8SOG7cIeM06E6jM7J6LgWHpee++ccbFpZNqFh3VW/
 j67AifRJFljG+JcyPLZxZ4M/bzpsGkpZ7iiW8wI8k0CPoG7lcvbkZ3pQ4eizAHIJ
 0a+WQ9jHj1/64g4bT7Ml8lZRbzfBG/ksLkRwq8Gakt+h7GQbsd4=
 =n0wZ
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Explicitly disable the TSC deadline timer when going idle to address
   some CPU errata in that area

 - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
   late microcode loading path

 - Clear CPU buffers later in the NMI exit path on 32-bit to avoid
   register clearing while they still contain sensitive data, for the
   RDFS mitigation

 - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
   path on 32-bit

 - Fix parsing issues of memory bandwidth specification in sysfs for
   resctrl's memory bandwidth allocation feature

 - Other small cleanups and improvements

* tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Always explicitly disarm TSC-deadline timer
  x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
  x86/bugs: Use code segment selector for VERW operand
  x86/entry_32: Clear CPU buffers after register restore in NMI return
  x86/entry_32: Do not clobber user EFLAGS.ZF
  x86/resctrl: Annotate get_mem_config() functions as __init
  x86/resctrl: Avoid overflow in MB settings in bw_validate()
  x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
2024-10-20 12:04:32 -07:00
Sean Christopherson
f559b2e9c5 KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits
4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't
enforce 32-byte alignment of nCR3.

In the absolute worst case scenario, failure to ignore bits 4:0 can result
in an out-of-bounds read, e.g. if the target page is at the end of a
memslot, and the VMM isn't using guard pages.

Per the APM:

  The CR3 register points to the base address of the page-directory-pointer
  table. The page-directory-pointer table is aligned on a 32-byte boundary,
  with the low 5 address bits 4:0 assumed to be 0.

And the SDM's much more explicit:

  4:0    Ignored

Note, KVM gets this right when loading PDPTRs, it's only the nSVM flow
that is broken.

Fixes: e4e517b4be ("KVM: MMU: Do not unconditionally read PDPTE from guest memory")
Reported-by: Kirk Swidowski <swidowski@google.com>
Cc: Andy Nguyen <theflow@google.com>
Cc: 3pvd <3pvd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241009140838.1036226-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:31:06 -04:00
Maxim Levitsky
731285fbb6 KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
Reset the segment cache after segment initialization in vmx_vcpu_reset()
to harden KVM against caching stale/uninitialized data.  Without the
recent fix to bypass the cache in kvm_arch_vcpu_put(), the following
scenario is possible:

 - vCPU is just created, and the vCPU thread is preempted before
   SS.AR_BYTES is written in vmx_vcpu_reset().

 - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
   vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.

 - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
   invoke vmx_segment_cache_clear() to invalidate the cache.

As a result, KVM retains a stale value in the cache, which can be read,
e.g. via KVM_GET_SREGS.  Usually this is not a problem because the VMX
segment cache is reset on each VM-Exit, but if the userspace VMM (e.g KVM
selftests) reads and writes system registers just after the vCPU was
created, _without_ modifying SS.AR_BYTES, userspace will write back the
stale '0' value and ultimately will trigger a VM-Entry failure due to
incorrect SS segment type.

Invalidating the cache after writing the VMCS doesn't address the general
issue of cache accesses from IRQ context being unsafe, but it does prevent
KVM from clobbering the VMCS, i.e. mitigates the harm done _if_ KVM has a
bug that results in an unsafe cache access.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: 2fb92db1ec ("KVM: VMX: Cache vmcs segment fields")
[sean: rework changelog to account for previous patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241009175002.1118178-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:31:06 -04:00
Sean Christopherson
28cf497881 KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
Add a lockdep assertion in kvm_unmap_gfn_range() to ensure that either
mmu_invalidate_in_progress is elevated, or that the range is being zapped
due to memslot removal (loosely detected by slots_lock being held).
Zapping SPTEs without mmu_invalidate_{in_progress,seq} protection is unsafe
as KVM's page fault path snapshots state before acquiring mmu_lock, and
thus can create SPTEs with stale information if vCPUs aren't forced to
retry faults (due to seeing an in-progress or past MMU invalidation).

Memslot removal is a special case, as the memslot is retrieved outside of
mmu_invalidate_seq, i.e. doesn't use the "standard" protections, and
instead relies on SRCU synchronization to ensure any in-flight page faults
are fully resolved before zapping SPTEs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241009192345.1148353-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:31:05 -04:00
Sean Christopherson
58a20a9435 KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
When performing a targeted zap on memslot removal, zap only MMU pages that
shadow guest PTEs, as zapping all SPs that "match" the gfn is inexact and
unnecessary.  Furthermore, for_each_gfn_valid_sp() arguably shouldn't
exist, because it doesn't do what most people would it expect it to do.
The "round gfn for level" adjustment that is done for direct SPs (no gPTE)
means that the exact gfn comparison will not get a match, even when a SP
does "cover" a gfn, or was even created specifically for a gfn.

For memslot deletion specifically, KVM's behavior will vary significantly
based on the size and alignment of a memslot, and in weird ways.  E.g. for
a 4KiB memslot, KVM will zap more SPs if the slot is 1GiB aligned than if
it's only 4KiB aligned.  And as described below, zapping SPs in the
aligned case overzaps for direct MMUs, as odds are good the upper-level
SPs are serving other memslots.

To iterate over all potentially-relevant gfns, KVM would need to make a
pass over the hash table for each level, with the gfn used for lookup
rounded for said level.  And then check that the SP is of the correct
level, too, e.g. to avoid over-zapping.

But even then, KVM would massively overzap, as processing every level is
all but guaranteed to zap SPs that serve other memslots, especially if the
memslot being removed is relatively small.  KVM could mitigate that issue
by processing only levels that can be possible guest huge pages, i.e. are
less likely to be re-used for other memslot, but while somewhat logical,
that's quite arbitrary and would be a bit of a mess to implement.

So, zap only SPs with gPTEs, as the resulting behavior is easy to describe,
is predictable, and is explicitly minimal, i.e. KVM only zaps SPs that
absolutely must be zapped.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241009192345.1148353-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:08:17 -04:00
Kirill A. Shutemov
8e690b817e x86/kvm: Override default caching mode for SEV-SNP and TDX
AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
CR0.CD clear).

This results in guests using uncached mappings where it shouldn't and
pmd/pud_set_huge() failures due to non-uniform memory type reported by
mtrr_type_lookup().

Override MTRR state, making it WB by default as the kernel does for
Hyper-V guests.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Binbin Wu <binbin.wu@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241015095818.357915-1-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:07:02 -04:00
Linus Torvalds
ade8ff3b6a - AMD Zen CPUs before gen 4 do not flush the RAS (Return Address Stack)
as part of IBPB. Make sure that happens by doing the flushing in
   software on those generations
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmcI+6UACgkQEsHwGGHe
 VUrYVw/+JmJHckfzI1jqc+FIGkG124X5l5ml3nUwExyL5anZk3KY6QEhvEjk8xgt
 5pLYaHd76W21DWsP5AOXwsLAOBptsN8E7zwmG4Wg4H9EOkcQYDujlm0a8ne+zmqk
 NQh/y7NzACruYJoDzo0S89Gcz2IUZ3C5HTKp9GUor4cUOw1wZsEm5RHAkl+SlK9j
 amGq4ABO6xE6UjnrZMDW1uo253nCTZjH9DZvwzzLXULaAQjTvn6lowSPCJWZezNh
 ue2Tdl/GYo6qbHyd7OYK4N4IxWNJujHLlcIXJ/mU3EPVKBh98f3SZakvoXMuWkBL
 KS5xxHf86Un+8UM59ZYIK8263O8CmlgmOosk+wPV2DZfnomG/dxoYvaZ7x41X2I+
 xdGMiHBP3SaQmqIxdvVCbtBIoLLd5MQ/JtAcDuLM4pbXBgLTxSfF7fDb4OtpuCwe
 QybeQ33QNCAn63DT+3bbWKxQpzC9vpu2+t48XV9a/rgQpsnMBodFP6RSxXxBsH4I
 zRDMoeyfn1mTiGRbbuhwNq52M01L1G8bkJ5sX0m/PB5XGhkg46998i93W8yAKKHY
 5sF1sP53idK94CNcA0fs/Z8ZoKmkszoh8GDAn3Pb+eP+m9f7EDL9AVanBzerJhfJ
 i69EfM9r0ESIkmmmjaycn2EVzwQ5Vtv1r/LgDI4up2bQzbPS6So=
 =ovJa
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 IBPB fixes from Borislav Petkov:
 "This fixes the IBPB implementation of older AMDs (< gen4) that do not
  flush the RSB (Return Address Stack) so you can still do some leaking
  when using a "=ibpb" mitigation for Retbleed or SRSO. Fix it by doing
  the flushing in software on those generations.

  IBPB is not the default setting so this is not likely to affect
  anybody in practice"

* tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
  x86/bugs: Skip RSB fill at VMEXIT
  x86/entry: Have entry_ibpb() invalidate return predictions
  x86/cpufeatures: Add a IBPB_NO_RET BUG flag
  x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET
2024-10-17 19:12:38 -07:00
Zhang Rui
ffd95846c6 x86/apic: Always explicitly disarm TSC-deadline timer
New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag and it is not in this low power mode, it is likely
to get quite toasty while it quickly sucks the battery dry.

The problem boils down to some CPUs' inability to power down until the
CPU recognizes that the local APIC timer is shut down. The current
kernel code works in one-shot and periodic modes but does not work for
deadline mode. Deadline mode has been the supported and preferred mode
on Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.

Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.

Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.

1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revision-guides/41322_10h_Rev_Gd.pdf

Fixes: 279f146143 ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241015061522.25288-1-rui.zhang%40intel.com
2024-10-15 05:45:18 -07:00
Linus Torvalds
d947d6848a xen: branch for v6.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZwkZXAAKCRCAXGG7T9hj
 vntCAP9pcmYoVLIUtnOhe3HN1nj8Y+QTBmCP0s63sCgifkZMjAD+KmkuE7pkGQ70
 j/DPQzmvRoTQfEoByAWI612PUKifBw4=
 =bSZK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
 "A fix for topology information of Xen PV guests"

* tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE
2024-10-11 14:34:18 -07:00
John Allen
ee4d4e8d2c x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
Commit

  f69759be25 ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")

causes a bit in the DE_CFG MSR to get set erroneously after a microcode late
load.

The microcode late load path calls into amd_check_microcode() and subsequently
zen2_zenbleed_check(). Since the above commit removes the cpu_has_amd_erratum()
call from zen2_zenbleed_check(), this will cause all non-Zen2 CPUs to go
through the function and set the bit in the DE_CFG MSR.

Call into the Zenbleed fix path on Zen2 CPUs only.

  [ bp: Massage commit message, use cpu_feature_enabled(). ]

Fixes: f69759be25 ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240923164404.27227-1-john.allen@amd.com
2024-10-11 21:26:45 +02:00
Johannes Wikner
c62fa117c3 x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
Since X86_FEATURE_ENTRY_IBPB will invalidate all harmful predictions
with IBPB, no software-based untraining of returns is needed anymore.
Currently, this change affects retbleed and SRSO mitigations so if
either of the mitigations is doing IBPB and the other one does the
software sequence, the latter is not needed anymore.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Cc: <stable@kernel.org>
2024-10-10 10:38:21 +02:00
Johannes Wikner
0fad287864 x86/bugs: Skip RSB fill at VMEXIT
entry_ibpb() is designed to follow Intel's IBPB specification regardless
of CPU. This includes invalidating RSB entries.

Hence, if IBPB on VMEXIT has been selected, entry_ibpb() as part of the
RET untraining in the VMEXIT path will take care of all BTB and RSB
clearing so there's no need to explicitly fill the RSB anymore.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Cc: <stable@kernel.org>
2024-10-10 10:35:53 +02:00
Johannes Wikner
50e4b3b940 x86/entry: Have entry_ibpb() invalidate return predictions
entry_ibpb() should invalidate all indirect predictions, including return
target predictions. Not all IBPB implementations do this, in which case the
fallback is RSB filling.

Prevent SRSO-style hijacks of return predictions following IBPB, as the return
target predictor can be corrupted before the IBPB completes.

  [ bp: Massage. ]

Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
2024-10-10 10:35:27 +02:00
Johannes Wikner
3ea87dfa31 x86/cpufeatures: Add a IBPB_NO_RET BUG flag
Set this flag if the CPU has an IBPB implementation that does not
invalidate return target predictions. Zen generations < 4 do not flush
the RSB when executing an IBPB and this bug flag denotes that.

  [ bp: Massage. ]

Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
2024-10-10 10:34:29 +02:00
Jim Mattson
ff898623af x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET
AMD's initial implementation of IBPB did not clear the return address
predictor. Beginning with Zen4, AMD's IBPB *does* clear the return address
predictor. This behavior is enumerated by CPUID.80000008H:EBX.IBPB_RET[30].

Define X86_FEATURE_AMD_IBPB_RET for use in KVM_GET_SUPPORTED_CPUID,
when determining cross-vendor capabilities.

Suggested-by: Venkatesh Srinivas <venkateshs@chromium.org>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
2024-10-10 10:34:14 +02:00
Pawan Gupta
e4d2102018 x86/bugs: Use code segment selector for VERW operand
Robert Gill reported below #GP in 32-bit mode when dosemu software was
executing vm86() system call:

  general protection fault: 0000 [#1] PREEMPT SMP
  CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
  Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
  EIP: restore_all_switch_stack+0xbe/0xcf
  EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
  ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
  DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
  CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
  Call Trace:
   show_regs+0x70/0x78
   die_addr+0x29/0x70
   exc_general_protection+0x13c/0x348
   exc_bounds+0x98/0x98
   handle_exception+0x14d/0x14d
   exc_bounds+0x98/0x98
   restore_all_switch_stack+0xbe/0xcf
   exc_bounds+0x98/0x98
   restore_all_switch_stack+0xbe/0xcf

This only happens in 32-bit mode when VERW based mitigations like MDS/RFDS
are enabled. This is because segment registers with an arbitrary user value
can result in #GP when executing VERW. Intel SDM vol. 2C documents the
following behavior for VERW instruction:

  #GP(0) - If a memory operand effective address is outside the CS, DS, ES,
	   FS, or GS segment limit.

CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
space. Use %cs selector to reference VERW operand. This ensures VERW will
not #GP for an arbitrary user %ds.

[ mingo: Fixed the SOB chain. ]

Fixes: a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
Reported-by: Robert Gill <rtgill82@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com
Cc: stable@vger.kernel.org # 5.10+
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Suggested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-10-09 09:42:30 +02:00
Pawan Gupta
48a2440d0f x86/entry_32: Clear CPU buffers after register restore in NMI return
CPU buffers are currently cleared after call to exc_nmi, but before
register state is restored. This may be okay for MDS mitigation but not for
RDFS. Because RDFS mitigation requires CPU buffers to be cleared when
registers don't have any sensitive data.

Move CLEAR_CPU_BUFFERS after RESTORE_ALL_NMI.

Fixes: a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240925-fix-dosemu-vm86-v7-2-1de0daca2d42%40linux.intel.com
2024-10-08 15:16:28 -07:00
Pawan Gupta
2e2e5143d4 x86/entry_32: Do not clobber user EFLAGS.ZF
Opportunistic SYSEXIT executes VERW to clear CPU buffers after user EFLAGS
are restored. This can clobber user EFLAGS.ZF.

Move CLEAR_CPU_BUFFERS before the user EFLAGS are restored. This ensures
that the user EFLAGS.ZF is not clobbered.

Closes: https://lore.kernel.org/lkml/yVXwe8gvgmPADpRB6lXlicS2fcHoV5OHHxyuFbB_MEleRPD7-KhGe5VtORejtPe-KCkT8Uhcg5d7-IBw4Ojb4H7z5LQxoZylSmJ8KNL3A8o=@protonmail.com/
Fixes: a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
Reported-by: Jari Ruusu <jariruusu@protonmail.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240925-fix-dosemu-vm86-v7-1-1de0daca2d42%40linux.intel.com
2024-10-08 15:16:28 -07:00
Nathan Chancellor
d5fd042bf4 x86/resctrl: Annotate get_mem_config() functions as __init
After a recent LLVM change [1] that deduces __cold on functions that only call
cold code (such as __init functions), there is a section mismatch warning from
__get_mem_config_intel(), which got moved to .text.unlikely. as a result of
that optimization:

  WARNING: modpost: vmlinux: section mismatch in reference: \
  __get_mem_config_intel+0x77 (section: .text.unlikely.) -> thread_throttle_mode_init (section: .init.text)

Mark __get_mem_config_intel() as __init as well since it is only called
from __init code, which clears up the warning.

While __rdt_get_mem_config_amd() does not exhibit a warning because it
does not call any __init code, it is a similar function that is only
called from __init code like __get_mem_config_intel(), so mark it __init
as well to keep the code symmetrical.

CONFIG_SECTION_MISMATCH_WARN_ONLY=n would turn this into a fatal error.

Fixes: 05b93417ce ("x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)")
Fixes: 4d05bf71f1 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: <stable@kernel.org>
Link: 6b11573b8c [1]
Link: https://lore.kernel.org/r/20240917-x86-restctrl-get_mem_config_intel-init-v3-1-10d521256284@kernel.org
2024-10-08 21:05:10 +02:00
Juergen Gross
bf56c41016 x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE
Recent topology checks of the x86 boot code uncovered the need for
PV guests to have the boot cpu marked in the APICBASE MSR.

Fixes: 9d22c96316 ("x86/topology: Handle bogus ACPI tables correctly")
Reported-by: Niels Dettenbach <nd@syndicat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-10-08 16:18:57 +02:00
Martin Kletzander
2b5648416e x86/resctrl: Avoid overflow in MB settings in bw_validate()
The resctrl schemata file supports specifying memory bandwidth associated with
the Memory Bandwidth Allocation (MBA) feature via a percentage (this is the
default) or bandwidth in MiBps (when resctrl is mounted with the "mba_MBps"
option).

The allowed range for the bandwidth percentage is from
/sys/fs/resctrl/info/MB/min_bandwidth to 100, using a granularity of
/sys/fs/resctrl/info/MB/bandwidth_gran. The supported range for the MiBps
bandwidth is 0 to U32_MAX.

There are two issues with parsing of MiBps memory bandwidth:

* The user provided MiBps is mistakenly rounded up to the granularity
  that is unique to percentage input.

* The user provided MiBps is parsed using unsigned long (thus accepting
  values up to ULONG_MAX), and then assigned to u32 that could result in
  overflow.

Do not round up the MiBps value and parse user provided bandwidth as the u32
it is intended to be. Use the appropriate kstrtou32() that can detect out of
range values.

Fixes: 8205a078ba ("x86/intel_rdt/mba_sc: Add schemata support")
Fixes: 6ce1560d35 ("x86/resctrl: Switch over to the resctrl mbps_val list")
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Martin Kletzander <nert.pinx@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
2024-10-08 16:17:38 +02:00
Richard Gong
f8bc84b609 x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
Add new PCI ID for Device 18h and Function 4.

Signed-off-by: Richard Gong <richard.gong@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240913162903.649519-1-richard.gong@amd.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-10-07 21:04:28 +02:00
Linus Torvalds
4563243ede ARM64:
* Fix pKVM error path on init, making sure we do not change critical
   system registers as we're about to fail
 
 * Make sure that the host's vector length is at capped by a value
   common to all CPUs
 
 * Fix kvm_has_feat*() handling of "negative" features, as the current
   code is pretty broken
 
 * Promote Joey to the status of official reviewer, while James steps
   down -- hopefully only temporarly
 
 x86:
 
 * Fix compilation with KVM_INTEL=KVM_AMD=n
 
 * Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use
 
 Selftests:
 
 * Fix compilation on non-x86 architectures
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmcCRMgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMNIgf/T80+VxFy7eP1yTkZy9nd3UjSsAeT
 fWvYMyN2isOTWTVbl3ckjMZc4i7L/nOngxfkLzI3OfFUO8TI8cw11hNFn85m+WKM
 95DVgEaqz1kuJg25VjSj9AySvPFDNec8bV37C2vk2jF4YsGo6qBugSSjktZUgGiW
 ozsdV39lcVcLf+x8/52Vc2eb736nrrYg8QaFP0tEQs9MHuYob/XBw3Zx42dJoZYl
 tCjGP5oW7EvUdRD48GkgXP9DWA12QmDxNOHEmUdxWamsK88YQXFyWwb7uwV5x+hd
 mO3bJaYInkJsh3D2e5QARswQb+D5HMVYFwvEkxQF/wvmcMosRVz4vv65Sw==
 =P4uw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix pKVM error path on init, making sure we do not change critical
     system registers as we're about to fail

   - Make sure that the host's vector length is at capped by a value
     common to all CPUs

   - Fix kvm_has_feat*() handling of "negative" features, as the current
     code is pretty broken

   - Promote Joey to the status of official reviewer, while James steps
     down -- hopefully only temporarly

  x86:

   - Fix compilation with KVM_INTEL=KVM_AMD=n

   - Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use

  Selftests:

   - Fix compilation on non-x86 architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/reboot: emergency callbacks are now registered by common KVM code
  KVM: x86: leave kvm.ko out of the build if no vendor module is requested
  KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
  KVM: arm64: Fix kvm_has_feat*() handling of negative features
  KVM: selftests: Fix build on architectures other than x86_64
  KVM: arm64: Another reviewer reshuffle
  KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
  KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path
2024-10-06 10:53:28 -07:00
Paolo Bonzini
c8d430db8e KVM/arm64 fixes for 6.12, take #1
- Fix pKVM error path on init, making sure we do not change critical
   system registers as we're about to fail
 
 - Make sure that the host's vector length is at capped by a value
   common to all CPUs
 
 - Fix kvm_has_feat*() handling of "negative" features, as the current
   code is pretty broken
 
 - Promote Joey to the status of official reviewer, while James steps
   down -- hopefully only temporarly
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmb++hkACgkQI9DQutE9
 ekNDyQ/9GwamcXC4KfYFtfQrcNRl/6RtlF/PFC0R6iiD1OoqNFHv2D/zscxtOj5a
 nw3gbof1Y59eND/6dubDzk82/A1Ff6bXpygybSQ6LG6Jba7H+01XxvvB0SMTLJ1S
 7hREe6m1EBHG/4VJk2Mx8iHJ7OjgZiTivojjZ1tY2Ez3nSUecL8prjqBFft3lAhg
 rFb20iJiijoZDgEjFZq/gWDxPq5m3N51tushqPRIMJ6wt8TeLYx3uUd2DTO0MzG/
 1K2vGbc1O6010jiR+PO3szi7uJFZfb58IsKCx7/w2e9AbzpYx4BXHKCax00DlGAP
 0PiuEMqG82UXR5a58UQrLC2aonh5VNj7J1Lk3qLb0NCimu6PdYWyIGNsKzAF/f4s
 tRVTRqcPr0RN/IIoX9vFjK3CKF9FcwAtctoO7IbxLKp+OGbPXk7Fk/gmhXKRubPR
 +4L4DCcARTcBflnWDzdLaz02fr13UfhM80mekJXlS1YHlSArCfbrsvjNrh4iL+G0
 UDamq8+8ereN0kT+ZM2jw3iw+DaF2kg24OEEfEQcBHZTS9HqBNVPplqqNSWRkjTl
 WSB79q1G6iOYzMUQdULP4vFRv1OePgJzg/voqMRZ6fUSuNgkpyXT0fLf5X12weq9
 NBnJ09Eh5bWfRIpdMzI1E1Qjfsm7E6hEa79DOnHmiLgSdVk3M9o=
 =Rtrz
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not change critical
  system registers as we're about to fail

- Make sure that the host's vector length is at capped by a value
  common to all CPUs

- Fix kvm_has_feat*() handling of "negative" features, as the current
  code is pretty broken

- Promote Joey to the status of official reviewer, while James steps
  down -- hopefully only temporarly
2024-10-06 03:59:22 -04:00
Paolo Bonzini
2a5fe5a016 x86/reboot: emergency callbacks are now registered by common KVM code
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
In practice this has no functional change, because CONFIG_KVM_X86_COMMON
is set if and only if at least one vendor-specific module is being built.
However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
are used in kvm.ko.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 590b09b1d8 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Fixes: 6d55a94222 ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-06 03:55:37 -04:00
Paolo Bonzini
ea4290d77b KVM: x86: leave kvm.ko out of the build if no vendor module is requested
kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
It provides no functionality on its own and it is unnecessary unless one
of the vendor-specific module is compiled.  In particular, /dev/kvm is
not created until one of kvm-intel.ko or kvm-amd.ko is loaded.

Use CONFIG_KVM to decide if it is built-in or a module, but use the
vendor-specific modules for the actual decision on whether to build it.

This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
are both disabled.  The cpu_emergency_register_virt_callback() function
is called from kvm.ko, but it is only defined if at least one of
CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.

Fixes: 590b09b1d8 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-06 03:53:41 -04:00
Linus Torvalds
622a3ed1ac Various fixes for tracing:
- Fix tp_printk command line option crashing the kernel
 
   With the code that can handle a buffer from a previous boot, the
   trace_check_vprintf() needed access to the delta of the address
   space used by the old buffer and the current buffer. To do so,
   the trace_array (tr) parameter was used. But when tp_printk is
   enabled on the kernel command line, no trace buffer is used and
   the trace event is sent directly to printk(). That meant the tr
   field of the iterator descriptor was NULL, and since tp_printk still
   uses trace_check_vprintf() it caused a NULL dereference.
 
 - Add ptrace.h include to x86 ftrace file for completeness
 
 - Fix rtla installation when done with out-of-tree build
 
 - Fix the help messages in rtla that were incorrect
 
 - Several fixes to fix races with the timerlat and hwlat code
 
   Several locking issues were discovered with the coordination
   between timerlat kthread creation and hotplug. As timerlat has
   callbacks from hotplug code to start kthreads when CPUs come online.
   There are also locking issues with grabbing the cpu_read_lock()
   and the locks within timerlat.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZwAUghQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qmMCAP9Z+0sN/Tx+F+5LmrXV1R9UxRPykmpm
 4NeZYEp+hAQSegD+MdHsEJLIDfmsZnGOBivmzfepuv35GMLrqQMIhhWQOA0=
 =/OQq
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix tp_printk command line option crashing the kernel

   With the code that can handle a buffer from a previous boot, the
   trace_check_vprintf() needed access to the delta of the address space
   used by the old buffer and the current buffer. To do so, the
   trace_array (tr) parameter was used. But when tp_printk is enabled on
   the kernel command line, no trace buffer is used and the trace event
   is sent directly to printk(). That meant the tr field of the iterator
   descriptor was NULL, and since tp_printk still uses
   trace_check_vprintf() it caused a NULL dereference.

 - Add ptrace.h include to x86 ftrace file for completeness

 - Fix rtla installation when done with out-of-tree build

 - Fix the help messages in rtla that were incorrect

 - Several fixes to fix races with the timerlat and hwlat code

   Several locking issues were discovered with the coordination between
   timerlat kthread creation and hotplug. As timerlat has callbacks from
   hotplug code to start kthreads when CPUs come online. There are also
   locking issues with grabbing the cpu_read_lock() and the locks within
   timerlat.

* tag 'trace-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/hwlat: Fix a race during cpuhp processing
  tracing/timerlat: Fix a race during cpuhp processing
  tracing/timerlat: Drop interface_lock in stop_kthread()
  tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline
  x86/ftrace: Include <asm/ptrace.h>
  rtla: Fix the help text in osnoise and timerlat top tools
  tools/rtla: Fix installation from out-of-tree build
  tracing: Fix trace_check_vprintf() when tp_printk is used
2024-10-04 12:11:06 -07:00
Paolo Bonzini
fcd1ec9cb5 KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
As was tried in commit 4e103134b8 ("KVM: x86/mmu: Zap only the relevant
pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs,
need to be zapped.  All of the accounting for a shadow page is tied to the
memslot, i.e. the shadow page holds a reference to the memslot, for all
intents and purposes.  Deleting the memslot without removing all relevant
shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled,
results in NULL pointer derefs when tearing down the VM.

Reintroduce from that commit the code that walks the whole memslot when
there are active shadow MMU pages.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-03 18:51:13 -04:00
Sami Tolvanen
ad686707ea x86/ftrace: Include <asm/ptrace.h>
<asm/ftrace.h> uses struct pt_regs in several places. Include
<asm/ptrace.h> to ensure it's visible. This is needed to make sure
object files that only include <asm/asm-prototypes.h> compile.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/20240916221557.846853-2-samitolvanen@google.com
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03 16:43:22 -04:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Linus Torvalds
3f749befb0 x86: kvm: fix build error
The cpu_emergency_register_virt_callback() function is used
unconditionally by the x86 kvm code, but it is declared (and defined)
conditionally:

  #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
  void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
  ...

leading to a build error when neither KVM_INTEL nor KVM_AMD support is
enabled:

  arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’:
  arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration]
  12517 |         cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’:
  arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration]
  12522 |         cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix the build by defining empty helper functions the same way the old
cpu_emergency_disable_virtualization() function was dealt with for the
same situation.

Maybe we could instead have made the call sites conditional, since the
callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak
fallback.  I'll leave that to the kvm people to argue about, this at
least gets the build going for that particular config.

Fixes: 590b09b1d8 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Chao Gao <chao.gao@intel.com>
Cc: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-29 14:47:33 -07:00
Linus Torvalds
d37421e655 Fix TDX MMIO #VE fault handling, and add two new Intel model numbers
for "Pantherlake" and "Diamond Rapids".
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmb4/iURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gawg//dj6dX4ft7pV2OICGg9oqIsqoFRZfppAW
 i9SvqsBWRXcj8QS3pd4U6vcQgOexolinJbNEGxaQUuOOUS3FJ/un1frnlpK8bGGp
 JP8jY3QK2QlVg8Gb5lGPzO2PSmSaUBDpU0aFI36DTA+p07Fv9qiaiByOxfoSn8WL
 YwKVvacPp2j2SAVi92hcgQAiXc4jsZtg3Jbi2yN2MrMDUhEvF+CP/g5QHf1VStdY
 jR1TCkDMDB/o0zWn5CpMkcBQIdPe3izYPTr7peX6LkRYdxNSM7wynApcOdFLo8/z
 HjMOIyL6F+lEtznlH01cscNyKd7VLKRRG1NAOj9Rx3l0F3jFYsAvTPdb2SPfxstN
 pLn8ierFN/+y9kNZrigdB/6r7zJAV5RJ4oyy/O41dT0NozbirYyah5eqCj3UqglE
 k9Mwj+gNpGH04OBv6Qh+J6yLVlojrP5AXfQsC2RbiTrUjH4D39xnfbcuuR5ONXfQ
 61yeBSe0FoK+E4B+gbH4KBi1zmwG+07lNchLC1F0+sy8x104OBYl6YSUcORyBnny
 adyFRDXMQ2qh1Ab929DhkPwULcP6wulryKuKmXOep00iGv8VJy3O3vWhTLsAcTmn
 dhcRToeZ95sUfjShdJJwkNNvB+PN3k5rR1S5MYwCHnSdKAgdCou7OsxpdLETBk4m
 Mwim6c3sQW4=
 =XM4E
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Fix TDX MMIO #VE fault handling, and add two new Intel model numbers
  for 'Pantherlake' and 'Diamond Rapids'"

* tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add two Intel CPU model numbers
  x86/tdx: Fix "in-kernel MMIO" check
2024-09-29 09:10:00 -07:00
Linus Torvalds
ec03de73b1 Locking changes for v6.12:
- lockdep:
     - Fix potential deadlock between lockdep and RCU (Zhiguo Niu)
     - Use str_plural() to address Coccinelle warning (Thorsten Blum)
     - Add debuggability enhancement (Luis Claudio R. Goncalves)
 
  - static keys & calls:
     - Fix static_key_slow_dec() yet again (Peter Zijlstra)
     - Handle module init failure correctly in static_call_del_module() (Thomas Gleixner)
     - Replace pointless WARN_ON() in static_call_module_notify() (Thomas Gleixner)
 
  - <linux/cleanup.h>:
     - Add usage and style documentation (Dan Williams)
 
  - rwsems:
     - Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS (Waiman Long)
 
  - atomic ops, x86:
     - Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak)
     - Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros Bizjak)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmb4/IIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ix5g/9HbnP8nR6y2JQ+K4FPj911jcNdPg66rb3
 MTERO+9kLiODFi1N6L/s7w+l4umXlmDSwxd7GaMLIQuaxgQD/lGpw0P5MoZhKfwj
 2AyJWjc9KiW/NwdOLWrJXESrsPSQSHan9EpEV8FV6X8QaflBvYCjcWmUXmu0lW+I
 r+pqHuQFrISL5eBZDd38PGHWNB4UB9YoY5GojUmoDHgJQiyn2oJVopva11RsLneR
 64m4slWRnOG/IjY6AlUlcFK4s7b8g5v1p0NHuJQNTFnzxsKp/QmFnP49dUC2fiZd
 FuMbGv+nPA7rRI1eZ/pCTk0h2CTT1RotQt78WJmL/R6jrQRIxkeFSTiKC2sZ5smp
 +CWiGUiKxy426qBO9Wzien2BXq5RTL8dLuX31ioflhXPEvTfWFHX3yw73sbhQZGW
 QbXztV9xz/B70TneGVPCHHsFDGwrT+EnC8tQbWw+Mv4OxfUknoMEVD9eIye6jXbV
 lJkx8Y8y7AQewQ2uAVOKn6xSXhsAnMGS/BQ1KWITO5rdLhNInkqKfYssaoVroXhA
 2qBtNBoPULWz+pvB8d8J/kljK4o3jPVTZYESpW3cLQ76NolTuXpe9i3zkNHGBj0A
 tZE9ZAumJIXGj0lhnoiOB9ezgqKUIK+LQ1yxrCVUpjZ2rd4ZT1BlQj/Nvoc1witS
 6iq+S/FCSbY=
 =LbkS
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "lockdep:
    - Fix potential deadlock between lockdep and RCU (Zhiguo Niu)
    - Use str_plural() to address Coccinelle warning (Thorsten Blum)
    - Add debuggability enhancement (Luis Claudio R. Goncalves)

  static keys & calls:
    - Fix static_key_slow_dec() yet again (Peter Zijlstra)
    - Handle module init failure correctly in static_call_del_module()
      (Thomas Gleixner)
    - Replace pointless WARN_ON() in static_call_module_notify() (Thomas
      Gleixner)

  <linux/cleanup.h>:
    - Add usage and style documentation (Dan Williams)

  rwsems:
    - Move is_rwsem_reader_owned() and rwsem_owner() under
      CONFIG_DEBUG_RWSEMS (Waiman Long)

  atomic ops, x86:
    - Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak)
    - Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros
      Bizjak)"

Signed-off-by: Ingo Molnar <mingo@kernel.org>

* tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS
  jump_label: Fix static_key_slow_dec() yet again
  static_call: Replace pointless WARN_ON() in static_call_module_notify()
  static_call: Handle module init failure correctly in static_call_del_module()
  locking/lockdep: Simplify character output in seq_line()
  lockdep: fix deadlock issue between lockdep and rcu
  lockdep: Use str_plural() to fix Coccinelle warning
  cleanup: Add usage and style documentation
  lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug
  locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void
  locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
2024-09-29 08:51:30 -07:00
Ingo Molnar
ae39e0bd15 Merge branch 'locking/core' into locking/urgent, to pick up pending commits
Merge all pending locking commits into a single branch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-09-29 08:57:18 +02:00
Linus Torvalds
3efc57369a x86:
* KVM currently invalidates the entirety of the page tables, not just
   those for the memslot being touched, when a memslot is moved or deleted.
   The former does not have particularly noticeable overhead, but Intel's
   TDX will require the guest to re-accept private pages if they are
   dropped from the secure EPT, which is a non starter.  Actually,
   the only reason why this is not already being done is a bug which
   was never fully investigated and caused VM instability with assigned
   GeForce GPUs, so allow userspace to opt into the new behavior.
 
 * Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
   functionality that is on the horizon).
 
 * Rework common MSR handling code to suppress errors on userspace accesses to
   unsupported-but-advertised MSRs.  This will allow removing (almost?) all of
   KVM's exemptions for userspace access to MSRs that shouldn't exist based on
   the vCPU model (the actual cleanup is non-trivial future work).
 
 * Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
   64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
   stores the entire 64-bit value at the ICR offset.
 
 * Fix a bug where KVM would fail to exit to userspace if one was triggered by
   a fastpath exit handler.
 
 * Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
   there's already a pending wake event at the time of the exit.
 
 * Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest
   state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN
   (the SHUTDOWN hits the VM altogether, not the nested guest)
 
 * Overhaul the "unprotect and retry" logic to more precisely identify cases
   where retrying is actually helpful, and to harden all retry paths against
   putting the guest into an infinite retry loop.
 
 * Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in
   the shadow MMU.
 
 * Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for
   adding multi generation LRU support in KVM.
 
 * Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled,
   i.e. when the CPU has already flushed the RSB.
 
 * Trace the per-CPU host save area as a VMCB pointer to improve readability
   and cleanup the retrieval of the SEV-ES host save area.
 
 * Remove unnecessary accounting of temporary nested VMCB related allocations.
 
 * Set FINAL/PAGE in the page fault error code for EPT violations if and only
   if the GVA is valid.  If the GVA is NOT valid, there is no guest-side page
   table walk and so stuffing paging related metadata is nonsensical.
 
 * Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of
   emulating posted interrupt delivery to L2.
 
 * Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.
 
 * Harden eVMCS loading against an impossible NULL pointer deref (really truly
   should be impossible).
 
 * Minor SGX fix and a cleanup.
 
 * Misc cleanups
 
 Generic:
 
 * Register KVM's cpuhp and syscore callbacks when enabling virtualization in
   hardware, as the sole purpose of said callbacks is to disable and re-enable
   virtualization as needed.
 
 * Enable virtualization when KVM is loaded, not right before the first VM
   is created.  Together with the previous change, this simplifies a
   lot the logic of the callbacks, because their very existence implies
   virtualization is enabled.
 
 * Fix a bug that results in KVM prematurely exiting to userspace for coalesced
   MMIO/PIO in many cases, clean up the related code, and add a testcase.
 
 * Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
   the gpa+len crosses a page boundary, which thankfully is guaranteed to not
   happen in the current code base.  Add WARNs in more helpers that read/write
   guest memory to detect similar bugs.
 
 Selftests:
 
 * Fix a goof that caused some Hyper-V tests to be skipped when run on bare
   metal, i.e. NOT in a VM.
 
 * Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.
 
 * Explicitly include one-off assets in .gitignore.  Past Sean was completely
   wrong about not being able to detect missing .gitignore entries.
 
 * Verify userspace single-stepping works when KVM happens to handle a VM-Exit
   in its fastpath.
 
 * Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb201AUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOM1gf+Ij7dpCh0KwoNYlHfW2aCHAv3PqQd
 cKMDSGxoCernbJEyPO/3qXNUK+p4zKedk3d92snW3mKa+cwxMdfthJ3i9d7uoNiw
 7hAgcfKNHDZGqAQXhx8QcVF3wgp+diXSyirR+h1IKrGtCCmjMdNC8ftSYe6voEkw
 VTVbLL+tER5H0Xo5UKaXbnXKDbQvWLXkdIqM8dtLGFGLQ2PnF/DdMP0p6HYrKf1w
 B7LBu0rvqYDL8/pS82mtR3brHJXxAr9m72fOezRLEUbfUdzkTUi/b1vEe6nDCl0Q
 i/PuFlARDLWuetlR0VVWKNbop/C/l4EmwCcKzFHa+gfNH3L9361Oz+NzBw==
 =Q7kz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
 "x86:

   - KVM currently invalidates the entirety of the page tables, not just
     those for the memslot being touched, when a memslot is moved or
     deleted.

     This does not traditionally have particularly noticeable overhead,
     but Intel's TDX will require the guest to re-accept private pages
     if they are dropped from the secure EPT, which is a non starter.

     Actually, the only reason why this is not already being done is a
     bug which was never fully investigated and caused VM instability
     with assigned GeForce GPUs, so allow userspace to opt into the new
     behavior.

   - Advertise AVX10.1 to userspace (effectively prep work for the
     "real" AVX10 functionality that is on the horizon)

   - Rework common MSR handling code to suppress errors on userspace
     accesses to unsupported-but-advertised MSRs

     This will allow removing (almost?) all of KVM's exemptions for
     userspace access to MSRs that shouldn't exist based on the vCPU
     model (the actual cleanup is non-trivial future work)

   - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
     splits the 64-bit value into the legacy ICR and ICR2 storage,
     whereas Intel (APICv) stores the entire 64-bit value at the ICR
     offset

   - Fix a bug where KVM would fail to exit to userspace if one was
     triggered by a fastpath exit handler

   - Add fastpath handling of HLT VM-Exit to expedite re-entering the
     guest when there's already a pending wake event at the time of the
     exit

   - Fix a WARN caused by RSM entering a nested guest from SMM with
     invalid guest state, by forcing the vCPU out of guest mode prior to
     signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
     nested guest)

   - Overhaul the "unprotect and retry" logic to more precisely identify
     cases where retrying is actually helpful, and to harden all retry
     paths against putting the guest into an infinite retry loop

   - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
     rmaps in the shadow MMU

   - Refactor pieces of the shadow MMU related to aging SPTEs in
     prepartion for adding multi generation LRU support in KVM

   - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
     enabled, i.e. when the CPU has already flushed the RSB

   - Trace the per-CPU host save area as a VMCB pointer to improve
     readability and cleanup the retrieval of the SEV-ES host save area

   - Remove unnecessary accounting of temporary nested VMCB related
     allocations

   - Set FINAL/PAGE in the page fault error code for EPT violations if
     and only if the GVA is valid. If the GVA is NOT valid, there is no
     guest-side page table walk and so stuffing paging related metadata
     is nonsensical

   - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
     instead of emulating posted interrupt delivery to L2

   - Add a lockdep assertion to detect unsafe accesses of vmcs12
     structures

   - Harden eVMCS loading against an impossible NULL pointer deref
     (really truly should be impossible)

   - Minor SGX fix and a cleanup

   - Misc cleanups

  Generic:

   - Register KVM's cpuhp and syscore callbacks when enabling
     virtualization in hardware, as the sole purpose of said callbacks
     is to disable and re-enable virtualization as needed

   - Enable virtualization when KVM is loaded, not right before the
     first VM is created

     Together with the previous change, this simplifies a lot the logic
     of the callbacks, because their very existence implies
     virtualization is enabled

   - Fix a bug that results in KVM prematurely exiting to userspace for
     coalesced MMIO/PIO in many cases, clean up the related code, and
     add a testcase

   - Fix a bug in kvm_clear_guest() where it would trigger a buffer
     overflow _if_ the gpa+len crosses a page boundary, which thankfully
     is guaranteed to not happen in the current code base. Add WARNs in
     more helpers that read/write guest memory to detect similar bugs

  Selftests:

   - Fix a goof that caused some Hyper-V tests to be skipped when run on
     bare metal, i.e. NOT in a VM

   - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
     guest

   - Explicitly include one-off assets in .gitignore. Past Sean was
     completely wrong about not being able to detect missing .gitignore
     entries

   - Verify userspace single-stepping works when KVM happens to handle a
     VM-Exit in its fastpath

   - Misc cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  Documentation: KVM: fix warning in "make htmldocs"
  s390: Enable KVM_S390_UCONTROL config in debug_defconfig
  selftests: kvm: s390: Add VM run test case
  KVM: SVM: let alternatives handle the cases when RSB filling is required
  KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
  KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
  KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
  KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
  KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
  KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
  KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
  KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
  KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
  KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
  KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
  KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
  KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
  KVM: x86: Update retry protection fields when forcing retry on emulation failure
  KVM: x86: Apply retry protection to "unprotect on failure" path
  KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
  ...
2024-09-28 09:20:14 -07:00
Linus Torvalds
12cc5240f4 This pull request contains the following changes for UML:
- Removal of dead code (TT mode leftovers, etc.)
 - Fixes for the network vector driver
 - Fixes for time-travel mode
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmb3Bf8WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wZLED/9xfTCB2l8gbiw+vslfXNRKdt6A
 jcqC4MfZ/dxkt5X2iy9pGROBl0dRe2WAbdSJIBIQikVDaTWg4Vix7WcnMs+FTXxk
 gGlJU8xx69mh7MWH0DopJrCHYX1SxA4bxve4J7iljTzuMleYhUUJ6bdqmG9XBRBN
 M1hYWpAodR/BPaDRZtLpzdYuuo3cZPM3ESpke5GupsFEWxqRZqvdlgwdsb9RfrOe
 3HvWZUMrw+hWJg1NkQ7+ljqPjoaafGu8/ic89r44bNtNQqN+o5b1v1E792e9qCJD
 1jRXTQMtTDH+ETgYskzWEIFyMQ4WlRy/N+mKKUHUJrTwAm76zdSpxmvU2Fh8cvLy
 ofWdbtqR127WnKii7UpZLf6kXzmC6pcmLbHU78PohoOnEjk4TeMjEKw6FrcSZY51
 wGgz29mLJOZ33mh7So37bRU/x5OKkq0u+BHyrhZYiHXdcBN8R5KBSJlWvl+A+A7y
 F5VpUvqAazc6H0HazZDWtoPDJ4HpbSEbH/8G4rR3IlZ4DmqyRuOr6f4AeiEPFz2n
 VNQVivgFL59zPflo8eWsLQvK8ZaZSop05RDYRk53uMooUZUNKvhTFmIRCb6bNFT/
 c4Ycoi3qa+YQQxSUEKmrE71dOYxdT1nHvl7YgR0BGABWYt2G1j/UyTjJMJ7L49Ws
 HGpAnI4FdPELu3qIVA==
 =DHl2
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Removal of dead code (TT mode leftovers, etc)

 - Fixes for the network vector driver

 - Fixes for time-travel mode

* tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: fix time-travel syscall scheduling hack
  um: Remove outdated asm/sysrq.h header
  um: Remove the declaration of user_thread function
  um: Remove the call to SUBARCH_EXECVE1 macro
  um: Remove unused mm_fd field from mm_id
  um: Remove unused fields from thread_struct
  um: Remove the redundant newpage check in update_pte_range
  um: Remove unused kpte_clear_flush macro
  um: Remove obsoleted declaration for execute_syscall_skas
  user_mode_linux_howto_v2: add VDE vector support in doc
  vector_user: add VDE support
  um: remove ARCH_NO_PREEMPT_DYNAMIC
  um: vector: Fix NAPI budget handling
  um: vector: Replace locks guarding queue depth with atomics
  um: remove variable stack array in os_rcv_fd_msg()
2024-09-27 12:48:48 -07:00
Linus Torvalds
653608c67a xen: branch for v6.12-rc1a
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZvZ8dgAKCRCAXGG7T9hj
 vhirAQCR1LAU+czZlqmx6jmKRPTGff1ss66vh04XbtgTjH+8PQEA8O5KvD/KnnxY
 AnrOvrx6fTLwR6iTN7ANVvPO3kGK/w0=
 =0Tol
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull more xen updates from Juergen Gross:
 "A second round of Xen related changes and features:

   - a small fix of the xen-pciback driver for a warning issued by
     sparse

   - support PCI passthrough when using a PVH dom0

   - enable loading the kernel in PVH mode at arbitrary addresses,
     avoiding conflicts with the memory map when running as a Xen dom0
     using the host memory layout"

* tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/pvh: Add 64bit relocation page tables
  x86/kernel: Move page table macros to header
  x86/pvh: Set phys_base when calling xen_prepare_pvh()
  x86/pvh: Make PVH entrypoint PIC for x86-64
  xen: sync elfnote.h from xen tree
  xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t
  xen/privcmd: Add new syscall to get gsi from dev
  xen/pvh: Setup gsi for passthrough device
  xen/pci: Add a function to reset device for xen
2024-09-27 09:55:30 -07:00
Al Viro
cb787f4ac0 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 08:18:43 -07:00
Linus Torvalds
348325d644 asm-generic updates for 6.12
These are only two small patches, one cleanup for arch/alpha
 and a preparation patch cleaning up the handling of runtime
 constants in the linker scripts.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmboHV0ACgkQYKtH/8kJ
 UifHfhAAqTHHxxe+HiphGBPHN0ODyLVUs7fOQHtLOSmJlQa6x1TCR/+1nL1kTDbe
 j6EcIRxZrllQZ+jZBA8z2XsAmjjBLUxCB4yu6oxYJh8OdFyqeVM/myZEr2TAyb0o
 A3D9b+rfnY8sr9XaFHSHGWbh4c33cGQhACumHVAjtPvU06Voskq4pAf9ZnpGkNBe
 AdKNTVG6+w84dKUNuzXcexP8d7SnsXNfd6T9+evtW/M+fziWzs3aPQr+GZED96E5
 8IRldXi2nzIwm9LT5IzZAt+QvpVb2Qob1+rej9p5WpptGp840CROTo61SwaYHCMV
 DDxTlmADsApWJQ3B5gDu6QS2jXT4eeOrY3JI2baeCyOV6auj15UXKiWc2QVoHOVU
 6+PzlSFuLatI6WsxXfOcD0o3bfQXMKS6zCC/4eD7Y/SmmMqBbL5+d9sU5lwkiOFl
 swoswF4HTwo5d6NdkSuJOt6KA/V8a68lBhKYBXHu2yuLi/LDNOaipEvBHQLzfnlY
 91e5DtDiHK9CYDNkwiR+bV9rQnhA535JSlfR8VtpU/SJTTjyF+dkt9JGPdivXoIA
 8Zv+DN/oyrahUtCrgzzPXahOuBrfD/WfIajsvpEK6vNPuBhscsZFg/thc70FMIXo
 qn8Dmpi/CnDWFNOy0xO0cbYWrGBGn9E7kzbSZ78tUIjPUmmEKfk=
 =OOMl
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "These are only two small patches, one cleanup for arch/alpha and a
  preparation patch cleaning up the handling of runtime constants in the
  linker scripts"

* tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  runtime constants: move list of constants to vmlinux.lds.h
  alpha: no need to include asm/xchg.h twice
2024-09-26 11:54:40 -07:00
Tony Luck
d1fb034b75 x86/cpu: Add two Intel CPU model numbers
Pantherlake is a mobile CPU. Diamond Rapids next generation Xeon.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240923173750.16874-1-tony.luck%40intel.com
2024-09-26 10:47:49 -07:00
Alexey Gladkov (Intel)
d4fc4d0147 x86/tdx: Fix "in-kernel MMIO" check
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.

However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.

Ensure that the target MMIO address is within the kernel before decoding
instruction.

Fixes: 31d58c4e55 ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.1726237595.git.legion%40kernel.org
2024-09-26 09:45:04 -07:00
Linus Torvalds
5701725692 Rust changes for v6.12
Toolchain and infrastructure:
 
  - Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up objtool
    warnings), teach objtool about 'noreturn' Rust symbols and mimic
    '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we should be
    objtool-warning-free, so enable it to run for all Rust object files.
 
  - KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support.
 
  - Support 'RUSTC_VERSION', including re-config and re-build on change.
 
  - Split helpers file into several files in a folder, to avoid conflicts
    in it. Eventually those files will be moved to the right places with
    the new build system. In addition, remove the need to manually export
    the symbols defined there, reusing existing machinery for that.
 
  - Relax restriction on configurations with Rust + GCC plugins to just
    the RANDSTRUCT plugin.
 
 'kernel' crate:
 
  - New 'list' module: doubly-linked linked list for use with reference
    counted values, which is heavily used by the upcoming Rust Binder.
    This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed
    unique for the given ID), 'AtomicTracker' (tracks whether a 'ListArc'
    exists using an atomic), 'ListLinks' (the prev/next pointers for an
    item in a linked list), 'List' (the linked list itself), 'Iter' (an
    iterator over a 'List'), 'Cursor' (a cursor into a 'List' that allows
    to remove elements), 'ListArcField' (a field exclusively owned by a
    'ListArc'), as well as support for heterogeneous lists.
 
  - New 'rbtree' module: red-black tree abstractions used by the upcoming
    Rust Binder. This includes 'RBTree' (the red-black tree itself),
    'RBTreeNode' (a node), 'RBTreeNodeReservation' (a memory reservation
    for a node), 'Iter' and 'IterMut' (immutable and mutable iterators),
    'Cursor' (bidirectional cursor that allows to remove elements), as
    well as an entry API similar to the Rust standard library one.
 
  - 'init' module: add 'write_[pin_]init' methods and the 'InPlaceWrite'
    trait. Add the 'assert_pinned!' macro.
 
  - 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by
    introducing an associated type in the trait.
 
  - 'alloc' module: add 'drop_contents' method to 'BoxExt'.
 
  - 'types' module: implement the 'ForeignOwnable' trait for
    'Pin<Box<T>>' and improve the trait's documentation. In addition,
    add the 'into_raw' method to the 'ARef' type.
 
  - 'error' module: in preparation for the upcoming Rust support for
    32-bit architectures, like arm, locally allow Clippy lint for those.
 
 Documentation:
 
  - https://rust.docs.kernel.org has been announced, so link to it.
 
  - Enable rustdoc's "jump to definition" feature, making its output a
    bit closer to the experience in a cross-referencer.
 
  - Debian Testing now also provides recent Rust releases (outside of
    the freeze period), so add it to the list.
 
 MAINTAINERS:
 
  - Trevor is joining as reviewer of the "RUST" entry.
 
 And a few other small bits.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmbzNz4ACgkQGXyLc2ht
 IW3muA/9HcPL0QqVB5+SqSRqcatmrFU/wq8Oaa6Z/No0JaynqyikK+R1WNokUd/5
 WpQi4PC1OYV+ekyAuWdkooKmaSqagH5r53XlezNw+cM5zo8y7p0otVlbepQ0t3Ky
 pVEmfDRIeSFXsKrg91BJUKyJf70TQlgSggDVCExlanfOjPz88C1+s3EcJ/XWYGKQ
 cRk/XDdbF5eNaldp2MriVF0fw7XktgIrmVzxt/z0lb4PE7RaCAnO6gSQI+90Vb2d
 zvyOYKS4AkqE3suFvDIIUlPUv+8XbACj0c4wvBZHH5uZGTbgWUffqygJ45GqChEt
 c4fS/+E8VaM1z0EvxNczC0nQkfLwkTc1mgbP+sG3VZJMPVCJ2zQan1/ond7GqCpw
 pt6uQaGvDsAvllm7sbiAIVaAY81icqyYWKfNBXLLEL7DhY5je5Wq+E83XQ8d5u5F
 EuapnZhW3y12d6UCsSe9bD8W45NFoWHPXky1TzT+whTxnX1yH9YsPXbJceGSbbgd
 Lw3GmUtZx2bVAMToVjNFD2lPA3OmPY1e2lk0jwzTuQrEXfnZYuzbjqs3YUijb7xR
 AlsWfIb0IHBwHWpB7da24ezqWP2VD4eaDdD8/+LmDSj6XLngxMNWRLKmXT000eTW
 vIFP9GJrvag2R3YFPhrurgGpRsp8HUTLtvcZROxp2JVQGQ7Z4Ww=
 =52BN
 -----END PGP SIGNATURE-----

Merge tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux

Pull Rust updates from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up
     objtool warnings), teach objtool about 'noreturn' Rust symbols and
     mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we
     should be objtool-warning-free, so enable it to run for all Rust
     object files.

   - KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support.

   - Support 'RUSTC_VERSION', including re-config and re-build on
     change.

   - Split helpers file into several files in a folder, to avoid
     conflicts in it. Eventually those files will be moved to the right
     places with the new build system. In addition, remove the need to
     manually export the symbols defined there, reusing existing
     machinery for that.

   - Relax restriction on configurations with Rust + GCC plugins to just
     the RANDSTRUCT plugin.

  'kernel' crate:

   - New 'list' module: doubly-linked linked list for use with reference
     counted values, which is heavily used by the upcoming Rust Binder.

     This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed
     unique for the given ID), 'AtomicTracker' (tracks whether a
     'ListArc' exists using an atomic), 'ListLinks' (the prev/next
     pointers for an item in a linked list), 'List' (the linked list
     itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor
     into a 'List' that allows to remove elements), 'ListArcField' (a
     field exclusively owned by a 'ListArc'), as well as support for
     heterogeneous lists.

   - New 'rbtree' module: red-black tree abstractions used by the
     upcoming Rust Binder.

     This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a
     node), 'RBTreeNodeReservation' (a memory reservation for a node),
     'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor'
     (bidirectional cursor that allows to remove elements), as well as
     an entry API similar to the Rust standard library one.

   - 'init' module: add 'write_[pin_]init' methods and the
     'InPlaceWrite' trait. Add the 'assert_pinned!' macro.

   - 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by
     introducing an associated type in the trait.

   - 'alloc' module: add 'drop_contents' method to 'BoxExt'.

   - 'types' module: implement the 'ForeignOwnable' trait for
     'Pin<Box<T>>' and improve the trait's documentation. In addition,
     add the 'into_raw' method to the 'ARef' type.

   - 'error' module: in preparation for the upcoming Rust support for
     32-bit architectures, like arm, locally allow Clippy lint for
     those.

  Documentation:

   - https://rust.docs.kernel.org has been announced, so link to it.

   - Enable rustdoc's "jump to definition" feature, making its output a
     bit closer to the experience in a cross-referencer.

   - Debian Testing now also provides recent Rust releases (outside of
     the freeze period), so add it to the list.

  MAINTAINERS:

   - Trevor is joining as reviewer of the "RUST" entry.

  And a few other small bits"

* tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits)
  kasan: rust: Add KASAN smoke test via UAF
  kbuild: rust: Enable KASAN support
  rust: kasan: Rust does not support KHWASAN
  kbuild: rust: Define probing macros for rustc
  kasan: simplify and clarify Makefile
  rust: cfi: add support for CFI_CLANG with Rust
  cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS
  rust: support for shadow call stack sanitizer
  docs: rust: include other expressions in conditional compilation section
  kbuild: rust: replace proc macros dependency on `core.o` with the version text
  kbuild: rust: rebuild if the version text changes
  kbuild: rust: re-run Kconfig if the version text changes
  kbuild: rust: add `CONFIG_RUSTC_VERSION`
  rust: avoid `box_uninit_write` feature
  MAINTAINERS: add Trevor Gross as Rust reviewer
  rust: rbtree: add `RBTree::entry`
  rust: rbtree: add cursor
  rust: rbtree: add mutable iterator
  rust: rbtree: add iterator
  rust: rbtree: add red-black tree implementation backed by the C version
  ...
2024-09-25 10:25:40 -07:00
Jason Andryuk
47ffe0578a x86/pvh: Add 64bit relocation page tables
The PVH entry point is 32bit.  For a 64bit kernel, the entry point must
switch to 64bit mode, which requires a set of page tables.  In the past,
PVH used init_top_pgt.

This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the
page tables are prebuilt for this address.  If the kernel is loaded at a
different address, they need to be adjusted.

__startup_64() adjusts the prebuilt page tables for the physical load
address, but it is 64bit code.  The 32bit PVH entry code can't call it
to adjust the page tables, so it can't readily be re-used.

64bit PVH entry needs page tables set up for identity map, the kernel
high map and the direct map.  pvh_start_xen() enters identity mapped.
Inside xen_prepare_pvh(), it jumps through a pv_ops function pointer
into the highmap.  The direct map is used for __va() on the initramfs
and other guest physical addresses.

Add a dedicated set of prebuild page tables for PVH entry.  They are
adjusted in assembly before loading.

Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation
along with the kernel's loading constraints.  The maximum load address,
KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt
page.  It could be larger with more pages.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-6-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25 16:06:07 +02:00
Jason Andryuk
e3e8cd90f8 x86/kernel: Move page table macros to header
The PVH entry point will need an additional set of prebuild page tables.
Move the macros and defines to pgtable_64.h, so they can be re-used.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Message-ID: <20240823193630.2583107-5-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25 16:06:03 +02:00
Jason Andryuk
b464b461d2 x86/pvh: Set phys_base when calling xen_prepare_pvh()
phys_base needs to be set for __pa() to work in xen_pvh_init() when
finding the hypercall page.  Set it before calling into
xen_prepare_pvh(), which calls xen_pvh_init().  Clear it afterward to
avoid __startup_64() adding to it and creating an incorrect value.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-4-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25 14:15:11 +02:00
Jason Andryuk
1db29f99ed x86/pvh: Make PVH entrypoint PIC for x86-64
The PVH entrypoint is 32bit non-PIC code running the uncompressed
vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000
(16MB).  The kernel is loaded at that physical address inside the VM by
the VMM software (Xen/QEMU).

When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1
into the PVH container.  There exist system firmwares (Coreboot/EDK2)
with reserved memory at 16MB.  This creates a conflict where the PVH
kernel cannot be loaded at that address.

Modify the PVH entrypoint to be position-indepedent to allow flexibility
in load address.  Only the 64bit entry path is converted.  A 32bit
kernel is not PIC, so calling into other parts of the kernel, like
xen_prepare_pvh() and mk_pgtable_32(), don't work properly when
relocated.

This makes the code PIC, but the page tables need to be updated as well
to handle running from the kernel high map.

The UNWIND_HINT_END_OF_STACK is to silence:
vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction
after the lret into 64bit code.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-3-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25 14:15:08 +02:00
Jiqian Chen
b166b8ab41 xen/pvh: Setup gsi for passthrough device
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device to passthrough, proactively setup the gsi
of the device during that process.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Message-ID: <20240924061437.2636766-3-Jiqian.Chen@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25 09:54:52 +02:00
Linus Torvalds
68e5c7d4ce Kbuild updates for v6.12
- Support cross-compiling linux-headers Debian package and kernel-devel
    RPM package
 
  - Add support for the linux-debug Pacman package
 
  - Improve module rebuilding speed by factoring out the common code to
    scripts/module-common.c
 
  - Separate device tree build rules into scripts/Makefile.dtbs
 
  - Add a new script to generate modules.builtin.ranges, which is useful
    for tracing tools to find symbols in built-in modules
 
  - Refactor Kconfig and misc tools
 
  - Update Kbuild and Kconfig documentation
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmby2+QVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGpQ0QALWMgox3OdceNiBT8QieqRFfwKFv
 5jxtsZt+MbTdWNMEfgc4Cq2i5ZAqpYGZh32RwTiZJogBvYEIoO7M4Md9VwoEe/BC
 q8VZ6FhUy7358IX/FCukfB0dYvkziRalBRDrE4iFmMMdhBvZ9nrvMxllqFCMllLj
 DTrBTTiMus3qiiczr4tb5QwaIR6C+yqiEBF++ftLmWvo9dn8YNNUnI65fGjyQM/w
 0wMPwsB3Y2HdnRpLUS6T18gZbjoXsAk4+WX0TpdBfTs3d7AdbzlSMtc0BslEm6Tb
 JjIK6SbJCM3kNC7O0/gsUenOaSBxSbKjjg33gQxn/eNoi0nRt+qnBMMreYiTd95G
 Hq86QcNfKQtWAagKRTppMkYEDqMU2RKH7BmJOsfQyeG9cGpAAu+0HsQv3f/h5QP1
 MlA8o+NP5oQn6RbrhZz1Pqm24+OMxiXaBhmo8XbZ+MXzi/CBR54Eo4ip/FSHzXII
 EGEAQL7t7YU7xu8qMIE6ZQMH7BJsjJNee0vrNiYZa4xHLYyHi6mJl8K6LlHQ3nEx
 WOsPX9MLITtSJwcvIio/0sEnuR7pjcShGfqhbHO5tiOYznsbcSvu3+18HPGCpFRt
 vYFkNIRc298k7++A+Zp2wwdD2TS+SSilrAImmJXMhf0M+Nyg2vnlfAo8t0QSkFlh
 1g9dJuy+8jYRjHXP
 =g4t/
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support cross-compiling linux-headers Debian package and kernel-devel
   RPM package

 - Add support for the linux-debug Pacman package

 - Improve module rebuilding speed by factoring out the common code to
   scripts/module-common.c

 - Separate device tree build rules into scripts/Makefile.dtbs

 - Add a new script to generate modules.builtin.ranges, which is useful
   for tracing tools to find symbols in built-in modules

 - Refactor Kconfig and misc tools

 - Update Kbuild and Kconfig documentation

* tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
  kbuild: doc: replace "gcc" in external module description
  kbuild: doc: describe the -C option precisely for external module builds
  kbuild: doc: remove the description about shipped files
  kbuild: doc: drop section numbering, use references in modules.rst
  kbuild: doc: throw out the local table of contents in modules.rst
  kbuild: doc: remove outdated description of the limitation on -I usage
  kbuild: doc: remove description about grepping CONFIG options
  kbuild: doc: update the description about Kbuild/Makefile split
  kbuild: remove unnecessary export of RUST_LIB_SRC
  kbuild: remove append operation on cmd_ld_ko_o
  kconfig: cache expression values
  kconfig: use hash table to reuse expressions
  kconfig: refactor expr_eliminate_dups()
  kconfig: add comments to expression transformations
  kconfig: change some expr_*() functions to bool
  scripts: move hash function from scripts/kconfig/ to scripts/include/
  kallsyms: change overflow variable to bool type
  kallsyms: squash output_address()
  kbuild: add install target for modules.builtin.ranges
  scripts: add verifier script for builtin module range data
  ...
2024-09-24 13:02:06 -07:00
Linus Torvalds
3a37872316 pci-v6.12-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmbseugUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxdwxAAvdvDyTuiPo2R8pQtvKg4YL2IUnK5
 UR28mBxZDK5DFhLtD/QzmVVG/eaLY6bJHthHgJgTApzekkqU0h9dcRI0eegXrvcz
 I3HRsZK2yatUky9l8O148OLzF897r7vXL3QtGe6qjKU+9D83IEeooLKgBca+GoBC
 bRLvG/fYRzdjOe8UHFqCoeMIg3IOY7CNifvFOihAGpJpxfZQktj6hSKu6q7BL1Rx
 NRgYlxh0eLcb7vAJqz6RZpQ8PRCwhAjlDuu0BOkES8/6EwisD1xUh3qdDxfVgNA6
 FpcAb/53yr46cs4tM9ZTwluka86AskuXj3jwSKf7nE3zqr4nM9OD3sGOSYzK8UdE
 EDBKj+9iEpYRC6rJMk5gNH2AZkR1OEpNUisR6+kEn81A9yNNoTmkHdHUOWo8TuxD
 btc0sTM+eWApvTiZwgL4VjMZulQllV51K8tcfvODRhlMkbOPNWGWdmpWqEbUS2HU
 i7+zzQC3DC5iPlAKgRSeYB0aad6la6brqPW16sGhGovNhgwbzakDLCUJJGn/LNuO
 wd0UNpJTnHlfChbvNh2bBxiMOo0cab1tJ5Jp97STQYhLg2nW93s/dAfdpSAsYO4S
 5YzjSADWeyeuDsHE1RdUdDvYAPMb1VZBUd2OSHis5zw7kmh25c9KYXEkDJ25q/ju
 sVXK4oMNW/Gnd5M=
 =L3s9
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Wait for device readiness after reset by polling Vendor ID and
     looking for Configuration RRS instead of polling the Command
     register and looking for non-error completions, to avoid hardware
     retries done for RRS on non-Vendor ID reads (Bjorn Helgaas)

   - Rename CRS Completion Status to RRS ('Request Retry Status') to
     match PCIe r6.0 spec usage (Bjorn Helgaas)

   - Clear LBMS bit after a manual link retrain so we don't try to
     retrain a link when there's no downstream device anymore (Maciej W.
     Rozycki)

   - Revert to the original link speed after retraining fails instead of
     leaving it restricted to 2.5GT/s, so a future device has a chance
     to use higher speeds (Maciej W. Rozycki)

   - Wait for each level of downstream bus, not just the first, to
     become accessible before restoring devices on that bus (Ilpo
     Järvinen)

   - Add ARCH_PCI_DEV_GROUPS so s390 can add its own attribute_groups
     without having to stomp on the core's pdev->dev.groups (Lukas
     Wunner)

  Driver binding:

   - Export pcim_request_region(), a managed counterpart of
     pci_request_region(), for use by drivers (Philipp Stanner)

   - Export pcim_iomap_region() and deprecate pcim_iomap_regions()
     (Philipp Stanner)

   - Request the PCI BAR used by xboxvideo (Philipp Stanner)

   - Request and map drm/ast BARs with pcim_iomap_region() (Philipp
     Stanner)

  MSI:

   - Add MSI_FLAG_NO_AFFINITY flag for devices that mux MSIs onto a
     single IRQ line and cannot set the affinity of each MSI to a
     specific CPU core (Marek Vasut)

   - Use MSI_FLAG_NO_AFFINITY and remove unnecessary .irq_set_affinity()
     implementations in aardvark, altera, brcmstb, dwc, mediatek-gen3,
     mediatek, mobiveil, plda, rcar, tegra, vmd, xilinx-nwl,
     xilinx-xdma, and xilinx drivers to avoid 'IRQ: set affinity failed'
     warnings (Marek Vasut)

  Power management:

   - Add pwrctl support for ATH11K inside the WCN6855 package (Konrad
     Dybcio)

  PCI device hotplug:

   - Remove unnecessary hpc_ops struct from shpchp (ngn)

   - Check for PCI_POSSIBLE_ERROR(), not 0xffffffff, in cpqphp
     (weiyufeng)

  Virtualization:

   - Mark Creative Labs EMU20k2 INTx masking as broken (Alex Williamson)

   - Add an ACS quirk for Qualcomm SA8775P, which doesn't advertise ACS
     but does provide ACS-like features (Subramanian Ananthanarayanan)

  IOMMU:

   - Add function 0 DMA alias quirk for Glenfly Arise audio function,
     which uses the function 0 Requester ID (WangYuli)

  NPEM:

   - Add Native PCIe Enclosure Management (NPEM) support for sysfs
     control of NVMe RAID storage indicators (ok/fail/locate/
     rebuild/etc) (Mariusz Tkaczyk)

   - Add support for the ACPI _DSM PCIe SSD status LED management, which
     is functionally similar to NPEM but mediated by platform firmware
     (Mariusz Tkaczyk)

  Device trees:

   - Drop minItems and maxItems from ranges in PCI generic host binding
     since host bridges may have several MMIO and I/O port apertures
     (Frank Li)

   - Add kirin, rcar-gen2, uniphier DT binding top-level constraints for
     clocks (Krzysztof Kozlowski)

  Altera PCIe controller driver:

   - Convert altera DT bindings from text to YAML (Matthew Gerlach)

   - Replace TLP_REQ_ID() with macro PCI_DEVID(), which does the same
     thing and is what other drivers use (Jinjie Ruan)

  Broadcom STB PCIe controller driver:

   - Add DT binding maxItems for reset controllers (Jim Quinlan)

   - Use the 'bridge' reset method if described in the DT (Jim Quinlan)

   - Use the 'swinit' reset method if described in the DT (Jim Quinlan)

   - Add 'has_phy' so the existence of a 'rescal' reset controller
     doesn't imply software control of it (Jim Quinlan)

   - Add support for many inbound DMA windows (Jim Quinlan)

   - Rename SoC 'type' to 'soc_base' express the fact that SoCs come in
     families of multiple similar devices (Jim Quinlan)

   - Add Broadcom 7712 DT description and driver support (Jim Quinlan)

   - Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings for
     maintainability (Bjorn Helgaas)

  Freescale i.MX6 PCIe controller driver:

   - Add imx6q-pcie 'dbi2' and 'atu' reg-names for i.MX8M Endpoints
     (Richard Zhu)

   - Fix a code restructuring error that caused i.MX8MM and i.MX8MP
     Endpoints to fail to establish link (Richard Zhu)

   - Fix i.MX8MP Endpoint occasional failure to trigger MSI by enforcing
     outbound alignment requirement (Richard Zhu)

   - Call phy_power_off() in the .probe() error path (Frank Li)

   - Rename internal names from imx6_* to imx_* since i.MX7/8/9 are also
     supported (Frank Li)

   - Manage Refclk by using SoC-specific callbacks instead of switch
     statements (Frank Li)

   - Manage core reset by using SoC-specific callbacks instead of switch
     statements (Frank Li)

   - Expand comments for erratum ERR010728 workaround (Frank Li)

   - Use generic PHY APIs to configure mode, speed, and submode, which
     is harmless for devices that implement their own internal PHY
     management and don't set the generic imx_pcie->phy (Frank Li)

   - Add i.MX8Q (i.MX8QM, i.MX8QXP, and i.MX8DXL) DT binding and driver
     Root Complex support (Richard Zhu)

  Freescale Layerscape PCIe controller driver:

   - Replace layerscape-pcie DT binding compatible fsl,lx2160a-pcie with
     fsl,lx2160ar2-pcie (Frank Li)

   - Add layerscape-pcie DT binding deprecated 'num-viewport' property
     to address a DT checker warning (Frank Li)

   - Change layerscape-pcie DT binding 'fsl,pcie-scfg' to phandle-array
     (Frank Li)

  Loongson PCIe controller driver:

   - Increase max PCI hosts to 8 for Loongson-3C6000 and newer chipsets
     (Huacai Chen)

  Marvell Aardvark PCIe controller driver:

   - Fix issue with emulating Configuration RRS for two-byte reads of
     Vendor ID; previously it only worked for four-byte reads (Bjorn
     Helgaas)

  MediaTek PCIe Gen3 controller driver:

   - Add per-SoC struct mtk_gen3_pcie_pdata to support multiple SoC
     types (Lorenzo Bianconi)

   - Use reset_bulk APIs to manage PHY reset lines (Lorenzo Bianconi)

   - Add DT and driver support for Airoha EN7581 PCIe controller
     (Lorenzo Bianconi)

  Qualcomm PCIe controller driver:

   - Update qcom,pcie-sc7280 DT binding with eight interrupts (Rayyan
     Ansari)

   - Add back DT 'vddpe-3v3-supply', which was incorrectly removed
     earlier (Johan Hovold)

   - Drop endpoint redundant masking of global IRQ events (Manivannan
     Sadhasivam)

   - Clarify unknown global IRQ message and only log it once to avoid a
     flood (Manivannan Sadhasivam)

   - Add 'linux,pci-domain' property to endpoint DT binding (Manivannan
     Sadhasivam)

   - Assign PCI domain number for endpoint controllers (Manivannan
     Sadhasivam)

   - Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for
     endpoint controller (Manivannan Sadhasivam)

   - Add global SPI interrupt for PCIe link events to DT binding
     (Manivannan Sadhasivam)

   - Add global RC interrupt handler to handle 'Link up' events and
     automatically enumerate hot-added devices (Manivannan Sadhasivam)

   - Avoid mirroring of DBI and iATU register space so it doesn't
     overlap BAR MMIO space (Prudhvi Yarlagadda)

   - Enable controller resources like PHY only after PERST# is
     deasserted to partially avoid the problem that the endpoint SoC
     crashes when accessing things when Refclk is absent (Manivannan
     Sadhasivam)

   - Add 16.0 GT/s equalization and RX lane margining settings (Shashank
     Babu Chinta Venkata)

   - Pass domain number to pci_bus_release_domain_nr() explicitly to
     avoid a NULL pointer dereference (Manivannan Sadhasivam)

  Renesas R-Car PCIe controller driver:

   - Make the read-only const array 'check_addr' static (Colin Ian King)

   - Add R-Car V4M (R8A779H0) PCIe host and endpoint to DT binding
     (Yoshihiro Shimoda)

  TI DRA7xx PCIe controller driver:

   - Request IRQF_ONESHOT for 'dra7xx-pcie-main' IRQ since the primary
     handler is NULL (Siddharth Vadapalli)

   - Handle IRQ request errors during root port and endpoint probe
     (Siddharth Vadapalli)

  TI J721E PCIe driver:

   - Add DT 'ti,syscon-acspcie-proxy-ctrl' and driver support to enable
     the ACSPCIE module to drive Refclk for the Endpoint (Siddharth
     Vadapalli)

   - Extract the cadence link setup from cdns_pcie_host_setup() so link
     setup can be done separately during resume (Thomas Richard)

   - Add T_PERST_CLK_US definition for the mandatory delay between
     Refclk becoming stable and PERST# being deasserted (Thomas Richard)

   - Add j721e suspend and resume support (Théo Lebrun)

  TI Keystone PCIe controller driver:

   - Fix NULL pointer checking when applying MRRS limitation quirk for
     AM65x SR 1.0 Errata #i2037 (Dan Carpenter)

  Xilinx NWL PCIe controller driver:

   - Fix off-by-one error in INTx IRQ handler that caused INTx
     interrupts to be lost or delivered as the wrong interrupt (Sean
     Anderson)

   - Rate-limit misc interrupt messages (Sean Anderson)

   - Turn off the clock on probe failure and device removal (Sean
     Anderson)

   - Add DT binding and driver support for enabling/disabling PHYs (Sean
     Anderson)

   - Add PCIe phy bindings for the ZCU102 (Sean Anderson)

  Xilinx XDMA PCIe controller driver:

   - Add support for Xilinx QDMA Soft IP PCIe Root Port Bridge to DT
     binding and xilinx-dma-pl driver (Thippeswamy Havalige)

  Miscellaneous:

   - Fix buffer overflow in kirin_pcie_parse_port() (Alexandra Diupina)

   - Fix minor kerneldoc issues and typos (Bjorn Helgaas)

   - Use PCI_DEVID() macro in aer_inject() instead of open-coding it
     (Jinjie Ruan)

   - Check pcie_find_root_port() return in x86 fixups to avoid NULL
     pointer dereferences (Samasth Norway Ananda)

   - Make pci_bus_type constant (Kunwu Chan)

   - Remove unused declarations of __pci_pme_wakeup() and
     pci_vpd_release() (Yue Haibing)

   - Remove any leftover .*.cmd files with make clean (zhang jiao)

   - Remove unused BILLION macro (zhang jiao)"

* tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (132 commits)
  PCI: Fix typos
  dt-bindings: PCI: qcom: Allow 'vddpe-3v3-supply' again
  tools: PCI: Remove unused BILLION macro
  tools: PCI: Remove .*.cmd files with make clean
  PCI: Pass domain number to pci_bus_release_domain_nr() explicitly
  PCI: dra7xx: Fix error handling when IRQ request fails in probe
  PCI: dra7xx: Fix threaded IRQ request for "dra7xx-pcie-main" IRQ
  PCI: qcom: Add RX lane margining settings for 16.0 GT/s
  PCI: qcom: Add equalization settings for 16.0 GT/s
  PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed
  PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed'
  PCI: qcom-ep: Enable controller resources like PHY only after refclk is available
  PCI: Mark Creative Labs EMU20k2 INTx masking as broken
  dt-bindings: PCI: imx6q-pcie: Add reg-name "dbi2" and "atu" for i.MX8M PCIe Endpoint
  dt-bindings: PCI: altera: msi: Convert to YAML
  PCI: imx6: Add i.MX8Q PCIe Root Complex (RC) support
  PCI: Rename CRS Completion Status to RRS
  PCI: aardvark: Correct Configuration RRS checking
  PCI: Wait for device readiness with Configuration RRS
  PCI: brcmstb: Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings
  ...
2024-09-23 12:47:06 -07:00