Commit Graph

11356 Commits

Author SHA1 Message Date
Peter Zijlstra
5b2fc51576 x86/ibt,xen: Sprinkle the ENDBR
Even though Xen currently doesn't advertise IBT, prepare for when it
will eventually do so and sprinkle the ENDBR dust accordingly.

Even though most of the entry points are IRET-like, the CPL0
hypervisor can set WAIT-FOR-ENDBR and demand ENDBR at these sites.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.873919996@infradead.org
2022-03-15 10:32:35 +01:00
Peter Zijlstra
8b87d8cec1 x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel()
By doing an early rewrite of `jmp native_iret` in
restore_regs_and_return_to_kernel() we can get rid of the last
INTERRUPT_RETURN user and paravirt_iret.

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.815039833@infradead.org
2022-03-15 10:32:34 +01:00
Peter Zijlstra
ba27d1a808 x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch()
Less duplication is better.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.697253958@infradead.org
2022-03-15 10:32:34 +01:00
Peter Zijlstra
bbf92368b0 x86/text-patching: Make text_gen_insn() play nice with ANNOTATE_NOENDBR
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.638561109@infradead.org
2022-03-15 10:32:33 +01:00
Peter Zijlstra
156ff4a544 x86/ibt: Base IBT bits
Add Kconfig, Makefile and basic instruction support for x86 IBT.

(Ab)use __DISABLE_EXPORTS to disable IBT since it's already employed
to mark the compressed kernel and purgatory code. Additionally, mark the
realmode code with it as well to avoid inserting ENDBR instructions
there. While ENDBR is
technically a NOP, inserting them was causing some grief due to code
growth. There's also a problem with using __noendbr in code compiled
without -fcf-protection=branch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.519875203@infradead.org
2022-03-15 10:32:33 +01:00
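
For reference, a condensed sketch of the kind of annotation the commit
above introduces (a hedged simplification, not the verbatim x86 IBT
header; the real definitions may differ in detail):

  /* Sketch: IBT bits are off for compressed/purgatory/realmode code. */
  #if defined(CONFIG_X86_KERNEL_IBT) && !defined(__DISABLE_EXPORTS)
  #define ASM_ENDBR	"endbr64\n\t"
  /* opt a function out of ENDBR placement / indirect-call targeting */
  #define __noendbr	__attribute__((nocf_check))
  #else
  #define ASM_ENDBR
  #define __noendbr
  #endif
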
Ingo Molnar
ccdbf33c23 Merge tag 'v5.17-rc8' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-03-15 10:28:12 +01:00
Jakub Kicinski
1e8a3f0d2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/dsa/dsa2.c
  commit afb3cc1a39 ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
  commit e83d565378 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/

drivers/net/ethernet/intel/ice/ice.h
  commit 97b0129146 ("ice: Fix error with handling of bonding MTU")
  commit 43113ff734 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/

drivers/staging/gdm724x/gdm_lte.c
  commit fc7f750dc9 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
  commit 4bcc4249b4 ("staging: Use netif_rx().")
https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 17:16:56 -08:00
Ionela Voinescu
1132e6de11 x86, ACPI: rename init_freq_invariance_cppc() to arch_init_invariance_cppc()
init_freq_invariance_cppc() was called in acpi_cppc_processor_probe(),
after CPU performance information and controls were populated from the
per-cpu _CPC objects.

But these _CPC objects provide information that helps with both CPU
(u-arch) and frequency invariance. Therefore, change the function name
to a more generic one, while adding the arch_ prefix, as this function
is expected to be defined differently by different architectures.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-10 20:21:58 +01:00
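
As an illustration of the arch-override pattern the new name implies,
a minimal sketch (the empty fallback shown here is an assumption; the
kernel may instead wire the hook up via a macro in the arch headers):

  /* Hedged sketch: generic no-op that an architecture overrides. */
  #ifndef arch_init_invariance_cppc
  static inline void arch_init_invariance_cppc(void) { }
  #endif
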
Huang Rui
eb5616d4ad x86/ACPI: CPPC: Move init_freq_invariance_cppc() into x86 CPPC
The init_freq_invariance_cppc() code doesn't actually need any SMP
functionality, so gating it on CONFIG_SMP is misleading and invites
confusion about CPPC. The x86 CPPC file is also a better place for the
CPPC-related functions. Move init_freq_invariance_cppc() out of smpboot
so that CONFIG_SMP is no longer a mandatory condition; the result is
clearer than before.

Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-08 19:16:43 +01:00
Huang Rui
666f6ecf35 x86: Expose init_freq_invariance() to topology header
The init_freq_invariance() function will be used by the x86 CPPC code,
so expose it in
the topology header.

Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-08 19:16:43 +01:00
Huang Rui
82d8936914 x86/ACPI: CPPC: Move AMD maximum frequency ratio setting function into x86 CPPC
The AMD maximum frequency ratio setting function depends on CPPC, so the
x86 CPPC implementation file is a better place for it.

Signed-off-by: Huang Rui <ray.huang@amd.com>
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-08 19:16:43 +01:00
Suravee Suthikulpanit
4a204f7895 KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255
Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined
by the architecture.  The number of bits consumed by hardware is model
specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM
to enumerate the "true" size.  So, KVM must allow using all bits, else it
risks rejecting completely legal x2APIC IDs on newer CPUs.

This means KVM relies on hardware to not assign x2APIC IDs that exceed the
"true" width of the field, but presumably hardware is smart enough to tie
the width to the max x2APIC ID.  KVM also relies on hardware to support at
least 8 bits, as the legacy xAPIC ID is writable by software.  But, those
assumptions are unavoidable due to the lack of any way to enumerate the
"true" width.

Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Fixes: 44a95dae1d ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220211000851.185799-1-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-08 10:59:12 -05:00
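
A hedged sketch of the widened field described above; the definition
follows the commit text (the full 12 architectural bits), while the
exact macro spelling from KVM's AVIC headers is an assumption:

  /* full 12 architectural bits instead of the old 8-bit (0xFF) mask */
  #define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK	GENMASK_ULL(11, 0)
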
Paolo Bonzini
22b94c4b63 KVM: x86/mmu: Zap invalidated roots via asynchronous worker
Use the system worker threads to zap the roots invalidated
by the TDP MMU's "fast zap" mechanism, implemented by
kvm_tdp_mmu_invalidate_all_roots().

At this point, apart from allowing some parallelism in the zapping of
roots, the workqueue is a glorified linked list: work items are added and
flushed entirely within a single kvm->slots_lock critical section.  However,
the workqueue fixes a latent issue where kvm_mmu_zap_all_invalidated_roots()
assumes that it owns a reference to all invalid roots; therefore, no
one can set the invalid bit outside kvm_mmu_zap_all_fast().  Putting the
invalidated roots on a linked list... erm, on a workqueue ensures that
tdp_mmu_zap_root_work() only puts back those extra references that
kvm_mmu_zap_all_invalidated_roots() had gifted to it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-08 10:55:27 -05:00
Linus Torvalds
4a01e748a5 Merge tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 spectre fixes from Borislav Petkov:

 - Mitigate Spectre v2-type Branch History Buffer attacks on machines
   which support eIBRS, i.e., the hardware-assisted speculation
   restriction after it has been shown that such machines are vulnerable
   even with the hardware mitigation.

 - Do not use the default LFENCE-based Spectre v2 mitigation on AMD as
   it is insufficient to mitigate such attacks. Instead, switch to
   retpolines on all AMD by default.

 - Update the docs and add some warnings for the obviously vulnerable
   cmdline configurations.

* tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
  x86/speculation: Warn about Spectre v2 LFENCE mitigation
  x86/speculation: Update link to AMD speculation whitepaper
  x86/speculation: Use generic retpoline by default on AMD
  x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  Documentation/hw-vuln: Update spectre doc
  x86/speculation: Add eIBRS + Retpoline options
  x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
2022-03-07 17:29:47 -08:00
Paolo Bonzini
0564eeb71b Merge branch 'kvm-bugfixes' into HEAD
Merge bugfixes from 5.17 before merging more tricky work.
2022-03-04 18:39:29 -05:00
Jakub Kicinski
80901bff81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/batman-adv/hard-interface.c
  commit 690bb6fb64 ("batman-adv: Request iflink once in batadv-on-batadv check")
  commit 6ee3c393ee ("batman-adv: Demote batadv-on-batadv skip error message")
https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/

net/smc/af_smc.c
  commit 4d08b7b57e ("net/smc: Fix cleanup when register ULP fails")
  commit 462791bbfa ("net/smc: add sysctl interface for SMC")
https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 11:55:12 -08:00
Suma Hegde
91f410aa67 platform/x86: Add AMD system management interface
The recent Fam19h EPYC server line of processors from AMD supports
system management functionality via the HSMP (Host System Management
Port) interface.

The Host System Management Port (HSMP) is an interface to provide
OS-level software with access to system management functions via a
set of mailbox registers.

More details on the interface can be found in chapter
"7 Host System Management Port (HSMP)" of the following PPR
https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip

This patch adds a new amd_hsmp module under drivers/platform/x86/ which
creates a miscdevice with an IOCTL interface to user space.
/dev/hsmp is used for running the HSMP mailbox commands.

Signed-off-by: Suma Hegde <suma.hegde@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Nathan Fontenot <nathan.fontenot@amd.com>
Link: https://lore.kernel.org/r/20220222050501.18789-1-nchatrad@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-03-02 11:42:36 +01:00
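
A hypothetical userspace sketch of driving such a miscdevice; the
concrete message struct and ioctl request code come from the driver's
UAPI header and are deliberately passed in rather than invented here:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  /* Run one HSMP mailbox command via /dev/hsmp (error details elided). */
  static int hsmp_run(unsigned long ioctl_req, void *msg)
  {
  	int fd = open("/dev/hsmp", O_RDWR);
  	int ret;

  	if (fd < 0)
  		return -1;
  	ret = ioctl(fd, ioctl_req, msg);
  	close(fd);
  	return ret;
  }
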
Sean Christopherson
527d5cd7ee KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped
Zap only obsolete roots when responding to zapping a single root shadow
page.  Because KVM keeps root_count elevated when stuffing a previous
root into its PGD cache, shadowing a 64-bit guest means that zapping any
root causes all vCPUs to reload all roots, even if their current root is
not affected by the zap.

For many kernels, zapping a single root is a frequent operation, e.g. in
Linux it happens whenever an mm is dropped, e.g. process exits, etc...

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220225182248.3812651-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01 08:58:25 -05:00
Paolo Bonzini
0c1c92f15f KVM: x86/mmu: do not pass vcpu to root freeing functions
These functions only operate on a given MMU, of which there is more
than one in a vCPU (we care about two, because the third does not have
any roots and is only used to walk guest page tables).  They do need a
struct kvm in order to lock the mmu_lock, but they do not need anything
else in the struct kvm_vcpu.  So, pass the vcpu->kvm directly to them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:17 -05:00
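
The shape of the interface change, illustratively (argument lists
abbreviated; not the exact hunk):

  /* before: took a vCPU it only used to reach vcpu->kvm */
  void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
  			ulong roots_to_free);

  /* after: takes the struct kvm directly (needed for mmu_lock) */
  void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
  			ulong roots_to_free);
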
Paolo Bonzini
b9e5603c2a KVM: x86: use struct kvm_mmu_root_info for mmu->root
The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info.
Use the struct to have more consistency between mmu->root and
mmu->prev_roots.

The patch is entirely search and replace except for cached_root_available,
which does not need a temporary struct kvm_mmu_root_info anymore.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:16 -05:00
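
The struct in question, roughly as KVM defines it (comments here are
editorial):

  struct kvm_mmu_root_info {
  	gpa_t pgd;	/* guest root, was mmu->root_pgd */
  	hpa_t hpa;	/* host physical root, was mmu->root_hpa */
  };
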
David Dunn
ba7bb663f5 KVM: x86: Provide per VM capability for disabling PMU virtualization
Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of
settings/features to allow userspace to configure PMU virtualization on
a per-VM basis.  For now, support a single flag, KVM_PMU_CAP_DISABLE,
to allow disabling PMU virtualization for a VM even when KVM is configured
with enable_pmu=true at the module level.

To keep KVM simple, disallow changing a VM's PMU configuration after vCPUs
have been created.

Signed-off-by: David Dunn <daviddunn@google.com>
Message-Id: <20220223225743.2703915-2-daviddunn@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:14 -05:00
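
A hedged userspace sketch of using the new capability on a VM file
descriptor, before any vCPU is created (error handling elided):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int vm_disable_pmu(int vm_fd)
  {
  	struct kvm_enable_cap cap = {
  		.cap  = KVM_CAP_PMU_CAPABILITY,
  		.args = { KVM_PMU_CAP_DISABLE },
  	};

  	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }
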
Sean Christopherson
925088781e KVM: x86: Fix pointer mismatch warning when patching RET0 static calls
Cast kvm_x86_ops.func to 'void *' when updating KVM static calls that are
conditionally patched to __static_call_return0().  clang complains about
using mismatching pointers in the ternary operator, which breaks the
build when compiling with CONFIG_KVM_WERROR=y.

  >> arch/x86/include/asm/kvm-x86-ops.h:82:1: warning: pointer type mismatch
  ('bool (*)(struct kvm_vcpu *)' and 'void *') [-Wpointer-type-mismatch]

Fixes: 5be2226f41 ("KVM: x86: allow defining return-0 static calls")
Reported-by: Like Xu <like.xu.linux@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Message-Id: <20220223162355.3174907-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:14 -05:00
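
Roughly, the fix amounts to casting the non-NULL arm of the ternary as
well (a sketch, not the exact kernel hunk):

  /* before: arm types are 'bool (*)(struct kvm_vcpu *)' vs 'void *' */
  static_call_update(kvm_x86_##func, kvm_x86_ops.func ? :
  		   (void *)__static_call_return0);

  /* after: cast so both arms are 'void *' */
  static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ? :
  		   (void *)__static_call_return0);
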
Arnd Bergmann
dd865f090f Merge branch 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic into asm-generic
Christoph Hellwig and a few others spent a huge effort on removing
set_fs() from most of the important architectures, but the work on about
half of the other architectures was never completed even though most of
them don't actually use set_fs() at all.

I did a patch for microblaze at some point, which turned out to be fairly
generic, and now ported it to most other architectures, using new generic
implementations of access_ok() and __{get,put}_kernel_nofault().

Three architectures (sparc64, ia64, and sh) needed some extra work,
which I also completed.

* 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  uaccess: remove CONFIG_SET_FS
  ia64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  sparc64: remove CONFIG_SET_FS support
  lib/test_lockup: fix kernel pointer check for separate address spaces
  uaccess: generalize access_ok()
  uaccess: fix type mismatch warnings from access_ok()
  arm64: simplify access_ok()
  m68k: fix access_ok for coldfire
  MIPS: use simpler access_ok()
  MIPS: Handle address errors for accesses above CPU max virtual user address
  uaccess: add generic __{get,put}_kernel_nofault
  nios2: drop access_ok() check from __put_user()
  x86: use more conventional access_ok() definition
  x86: remove __range_not_ok()
  sparc64: add __{get,put}_kernel_nofault()
  nds32: fix access_ok() checks in get/put_user
  uaccess: fix nios2 and microblaze get_user_8()
  uaccess: fix integer overflow on access_ok()
2022-02-25 11:16:58 +01:00
Arnd Bergmann
12700c17fc uaccess: generalize access_ok()
There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.

Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.

For architectures without CONFIG_SET_FS(), this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.

Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.

Some architectures had a trick of using 33-bit or 65-bit arithmetic on
the addresses to catch overflow; however, this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
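
A sketch of the generic check described above (close to, but not
verbatim, the asm-generic code; the kernel version handles additional
corner cases):

  static inline int __access_ok(const void __user *ptr, unsigned long size)
  {
  	unsigned long limit = TASK_SIZE_MAX;
  	unsigned long addr  = (unsigned long)ptr;

  	/* single compare against a compile-time constant, overflow-safe */
  	return (size <= limit) && (addr <= (limit - size));
  }
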
Arnd Bergmann
34737e2698 uaccess: add generic __{get,put}_kernel_nofault
Nine architectures are still missing __{get,put}_kernel_nofault:
alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa.

Add a generic version that lets everything use the normal
copy_{from,to}_kernel_nofault() code based on these, removing the last
use of get_fs()/set_fs() from architecture-independent code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
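
A hedged sketch of the generic fallback: build the kernel-space accessor
on top of the architecture's user accessor with a __force cast (details
differ slightly in the tree):

  #define __get_kernel_nofault(dst, src, type, err_label)		\
  do {								\
  	type __user *p = (type __force __user *)(src);		\
  	type data;						\
  	if (__get_user(data, p))				\
  		goto err_label;					\
  	*(type *)(dst) = data;					\
  } while (0)
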
Arnd Bergmann
1830a1d6a5 x86: use more conventional access_ok() definition
The way that access_ok() is defined on x86 is slightly different from
most other architectures, and a bit more complex.

The generic version tends to result in the best output on all
architectures, as it results in a single comparison against a constant
limit for calls with a known size.

There are a few callers of __range_not_ok(), all of which use TASK_SIZE
as the limit rather than TASK_SIZE_MAX, but I could not see any reason
for picking this. Changing these to call __access_ok() instead uses the
default limit, but keeps the behavior otherwise.

x86 is the only architecture with a WARN_ON_IN_IRQ() checking
access_ok(), but it's probably best to leave that in place.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Arnd Bergmann
36903abedf x86: remove __range_not_ok()
The __range_not_ok() helper is an x86 (and sparc64) specific interface
that does roughly the same thing as __access_ok(), but with different
calling conventions.

Change this to use the normal interface for consistency as we clean up
all access_ok() implementations.

This changes the limit from TASK_SIZE to TASK_SIZE_MAX, which Al points
out is the right thing to do here anyway.

The callers have to use __access_ok() instead of the normal access_ok()
though, because on x86 that contains a WARN_ON_IN_IRQ() check that cannot
be used inside of NMI context while tracing.

The check in copy_code() is not needed any more, because
copy_from_user_nmi() already performs it.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/lkml/YgsUKcXGR7r4nINj@zeniv-ca.linux.org.uk/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:05 +01:00
Linus Torvalds
1f840c0ef4 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "x86 host:

   - Expose KVM_CAP_ENABLE_CAP since it is supported

   - Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode

   - Ensure async page fault token is nonzero

   - Fix lockdep false negative

   - Fix FPU migration regression from the AMX changes

  x86 guest:

   - Don't use PV TLB/IPI/yield on uniprocessor guests

  PPC:

   - reserve capability id (topic branch for ppc/kvm)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled
  KVM: x86/mmu: make apf token non-zero to fix bug
  KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3
  x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
  x86/kvm: Fix compilation warning in non-x86_64 builds
  x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0
  x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
  KVM: Fix lockdep false negative during host resume
  KVM: x86: Add KVM_CAP_ENABLE_CAP to x86
2022-02-24 14:05:49 -08:00
Brijesh Singh
1e8c5971c2 x86/mm/cpa: Generalize __set_memory_enc_pgtable()
The kernel provides infrastructure to set or clear the encryption mask
from the pages for AMD SEV, but TDX requires a few tweaks.

- TDX and SEV have different requirements for cache and TLB
  flushing.

- TDX has its own routine to notify the VMM about page encryption status changes.

Modify __set_memory_enc_pgtable() and make it flexible enough to cover
both AMD SEV and Intel TDX. The AMD-specific behavior is isolated in the
callbacks under x86_platform.guest. TDX will provide its own version of
said callbacks.

  [ bp: Beat into submission. ]

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20220223043528.2093214-1-brijesh.singh@amd.com
2022-02-23 19:14:29 +01:00
Kirill A. Shutemov
b577f542f9 x86/coco: Add API to handle encryption mask
AMD SME/SEV uses a bit in the page table entries to indicate that the
page is encrypted and not accessible to the VMM.

TDX uses a similar approach, but the polarity of the mask is opposite to
AMD: if the bit is set, the page is accessible to the VMM.

Provide a vendor-neutral API to deal with the mask: cc_mkenc() and
cc_mkdec() modify a given address to make it encrypted/decrypted. They can
be applied to phys_addr_t, pgprotval_t or page table entry values.

pgprot_encrypted() and pgprot_decrypted() are reimplemented using the
new helpers.

The implementation will be extended to cover TDX.

pgprot_decrypted() is used by drivers (i915, virtio_gpu, vfio) and calls
cc_mkdec(), so export cc_mkdec().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220222185740.26228-5-kirill.shutemov@linux.intel.com
2022-02-23 19:14:29 +01:00
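
A hedged sketch of the polarity handling (names approximate; simplified
from the x86 coco code):

  u64 cc_mkenc(u64 val)
  {
  	switch (cc_vendor) {
  	case CC_VENDOR_AMD:
  		return val | cc_mask;	/* SEV: set bit => encrypted */
  	case CC_VENDOR_INTEL:
  		return val & ~cc_mask;	/* TDX: clear bit => private */
  	default:
  		return val;
  	}
  }
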
Kirill A. Shutemov
655a0fa34b x86/coco: Explicitly declare type of confidential computing platform
The kernel derives the confidential computing platform
type it is running as from sme_me_mask on AMD or by using
hv_is_isolation_supported() on HyperV isolation VMs. This detection
process will become more complicated as more platforms get added.

Declare a confidential computing vendor variable explicitly and set it
via cc_set_vendor() on the respective platform.

  [ bp: Massage commit message, fixup HyperV check. ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220222185740.26228-4-kirill.shutemov@linux.intel.com
2022-02-23 19:14:16 +01:00
Christoph Hellwig
4509d950a6 x86/pat: Remove the unused set_pages_array_wt() function
Commit

  623dffb2a2 ("x86/mm/pat: Add set_memory_wt() for Write-Through type")

added it but there were no users.

  [ bp: Add a commit message. ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220223072852.616143-1-hch@lst.de
2022-02-23 13:34:08 +01:00
Ingo Molnar
669f45f19c sched/headers: Add initial new headers as identity mappings
This allows code sharing between the fast-headers tree and the vanilla
scheduler tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23 10:58:28 +01:00
Ingo Molnar
6255b48aeb Merge tag 'v5.17-rc5' into sched/core, to resolve conflicts
New conflicts in sched/core due to the following upstream fixes:

  44585f7bc0 ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n")
  a06247c680 ("psi: Fix uaf issue when psi trigger is destroyed while being polled")

Conflicts:
	include/linux/psi_types.h
	kernel/sched/psi.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-02-21 11:53:51 +01:00
Peter Zijlstra
1e19da8522 x86/speculation: Add eIBRS + Retpoline options
Thanks to the chaps at VUsec it is now clear that eIBRS is not
sufficient, therefore allow enabling of retpolines along with eIBRS.

Add spectre_v2=eibrs, spectre_v2=eibrs,lfence and
spectre_v2=eibrs,retpoline options to explicitly pick your preferred
means of mitigation.

Since there are new mitigations, there are also user-visible changes in
/sys/devices/system/cpu/vulnerabilities/spectre_v2 to reflect these
new mitigations.

  [ bp: Massage commit message, trim error messages,
    do more precise eIBRS mode checking. ]

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2022-02-21 10:21:35 +01:00
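
The resulting command-line choices, as named in the commit above:

  spectre_v2=eibrs
  spectre_v2=eibrs,lfence
  spectre_v2=eibrs,retpoline
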
Peter Zijlstra (Intel)
d45476d983 x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
The RETPOLINE_AMD name is unfortunate since it isn't necessarily
AMD-only; in fact, Hygon also uses it. Furthermore, it will likely be
sufficient for some Intel processors. Therefore rename the thing to
RETPOLINE_LFENCE to better describe what it is.

Add the spectre_v2=retpoline,lfence option as an alias to
spectre_v2=retpoline,amd to preserve existing setups. However, the output
of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed.

  [ bp: Fix typos, massage. ]

Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2022-02-21 10:21:28 +01:00
Mark Rutland
8a69fe0be1 sched/preempt: Refactor sched_dynamic_update()
Currently sched_dynamic_update needs to open-code the enabled/disabled
function names for each preemption model it supports, when in practice
this is a boolean enabled/disabled state for each function.

Make this clearer and avoid repetition by defining the enabled/disabled
states at the function definition, and using helper macros to perform the
static_call_update(). Where x86 currently overrides the enabled
function, it is made to provide both the enabled and disabled states for
consistency, with defaults provided by the core code otherwise.

In subsequent patches this will allow us to support PREEMPT_DYNAMIC
without static calls.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-3-mark.rutland@arm.com
2022-02-19 11:11:07 +01:00
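
A hedged sketch of the helper-macro pattern described above, with the
per-function enabled/disabled targets recorded at the definition site:

  #define preempt_dynamic_enable(f) \
  	static_call_update(f, f##_dynamic_enabled)
  #define preempt_dynamic_disable(f) \
  	static_call_update(f, f##_dynamic_disabled)
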
Sean Christopherson
1bbc60d0c7 KVM: x86/mmu: Remove MMU auditing
Remove mmu_audit.c and all its collateral; the auditing code has suffered
severe bitrot, ironically partly due to shadow paging being more stable
and thus not benefiting as much from auditing, but mostly due to TDP
supplanting shadow paging for non-nested guests and shadowing of nested
TDP not heavily stressing the logic that is being audited.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 13:46:23 -05:00
Paolo Bonzini
5be2226f41 KVM: x86: allow defining return-0 static calls
A few vendor callbacks are only used by VMX, but they return an integer
or bool value.  Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is
NULL in struct kvm_x86_ops, it will be changed to __static_call_return0
when updating static calls.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:44:22 -05:00
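
A hedged sketch of the update step this enables: a NULL vendor op is
turned into a return-0 static call (not the exact kernel hunk):

  #define KVM_X86_OP_OPTIONAL_RET0(func)			\
  	static_call_update(kvm_x86_##func,			\
  			   (void *)kvm_x86_ops.func ? :		\
  			   (void *)__static_call_return0);
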
Paolo Bonzini
abb6d479e2 KVM: x86: make several APIC virtualization callbacks optional
All their invocations are conditional on vcpu->arch.apicv_active,
meaning that they need not be implemented by vendor code: even
though at the moment both vendors implement APIC virtualization,
all of them can be optional.  In fact SVM does not need many of
them, and their implementation can be deleted now.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:41:23 -05:00
Paolo Bonzini
dd2319c618 KVM: x86: warn on incorrectly NULL members of kvm_x86_ops
Use the newly corrected KVM_X86_OP annotations to warn about possible
NULL pointer dereferences as soon as the vendor module is loaded.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:31:43 -05:00
Paolo Bonzini
e4fc23bad8 KVM: x86: remove KVM_X86_OP_NULL and mark optional kvm_x86_ops
The original use of KVM_X86_OP_NULL, which was to mark calls
that do not follow a specific naming convention, is not in use
anymore.  Instead, let's mark calls that are optional because
they are always invoked within conditionals or with static_call_cond.
Those that are _not_, i.e. those that are defined with KVM_X86_OP,
must be defined by both vendor modules or some kind of NULL pointer
dereference is bound to happen at runtime.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:31:27 -05:00
Paolo Bonzini
8a2897853c KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC
The two ioctls used to implement userspace-accelerated TPR,
KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available
even if hardware-accelerated TPR can be used.  So there is
no reason not to report KVM_CAP_VAPIC.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:30:56 -05:00
Jakub Kicinski
6b5567b1b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 11:44:20 -08:00
Leonardo Bras
988896bb61 x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0
kvm_vcpu_arch currently contains the guest supported features in both the
guest_supported_xcr0 and guest_fpu.fpstate->user_xfeatures fields.

Currently both fields are set to the same value in
kvm_vcpu_after_set_cpuid() and are not changed anywhere else after that.

Since it's not good to keep duplicated data, remove guest_supported_xcr0.

To keep the code more readable, introduce kvm_guest_supported_xcr0()
and kvm_guest_supported_xfd() to replace the previous usages of
guest_supported_xcr0.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220217053028.96432-3-leobras@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-17 10:06:49 -05:00
Gustavo A. R. Silva
5224f79096 treewide: Replace zero-length arrays with flexible-array members
There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].

This code was transformed with the help of Coccinelle:
(next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)

@@
identifier S, member, array;
type T1, T2;
@@

struct S {
  ...
  T1 member;
  T2 array[
- 0
  ];
};

UAPI and wireless changes were intentionally excluded from this patch
and will be sent out separately.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/78
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2022-02-17 07:00:39 -06:00
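
The shape of the transformation, mirroring the Coccinelle rule above
(type and field names are illustrative):

  /* before: deprecated zero-length trailing array */
  struct example_old {
  	size_t count;
  	unsigned char data[0];
  };

  /* after: C99 flexible array member */
  struct example_new {
  	size_t count;
  	unsigned char data[];
  };
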
Masahiro Yamada
4a3233c1a6 shmbuf.h: add asm/shmbuf.h to UAPI compile-test coverage
asm/shmbuf.h is currently excluded from the UAPI compile-test because of
errors like the following:

    HDRTEST usr/include/asm/shmbuf.h
  In file included from ./usr/include/asm/shmbuf.h:6,
                   from <command-line>:
  ./usr/include/asm-generic/shmbuf.h:26:33: error: field ‘shm_perm’ has incomplete type
     26 |         struct ipc64_perm       shm_perm;       /* operation perms */
        |                                 ^~~~~~~~
  ./usr/include/asm-generic/shmbuf.h:27:9: error: unknown type name ‘size_t’
     27 |         size_t                  shm_segsz;      /* size of segment (bytes) */
        |         ^~~~~~
  ./usr/include/asm-generic/shmbuf.h:40:9: error: unknown type name ‘__kernel_pid_t’
     40 |         __kernel_pid_t          shm_cpid;       /* pid of creator */
        |         ^~~~~~~~~~~~~~
  ./usr/include/asm-generic/shmbuf.h:41:9: error: unknown type name ‘__kernel_pid_t’
     41 |         __kernel_pid_t          shm_lpid;       /* pid of last operator */
        |         ^~~~~~~~~~~~~~

The errors can be fixed by replacing size_t with __kernel_size_t and by
including the proper headers.

Then, remove the no-header-test entry from usr/include/Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-17 09:09:37 +01:00
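
The fix in miniature (an illustrative fragment, not the full header):

  #include <asm/posix_types.h>	/* defines __kernel_size_t and friends */

  struct example {
  	__kernel_size_t shm_segsz;	/* was: size_t, undefined in UAPI */
  };
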
Masahiro Yamada
72113d0a7d signal.h: add linux/signal.h and asm/signal.h to UAPI compile-test coverage
linux/signal.h and asm/signal.h are currently excluded from the UAPI
compile-test because of errors like the following:

    HDRTEST usr/include/asm/signal.h
  In file included from <command-line>:
  ./usr/include/asm/signal.h:103:9: error: unknown type name ‘size_t’
    103 |         size_t ss_size;
        |         ^~~~~~

The errors can be fixed by replacing size_t with __kernel_size_t.

Then, remove the no-header-test entries from usr/include/Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-17 09:09:36 +01:00
Linus Torvalds
c5d9ae265b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Read HW interrupt pending state from the HW

  x86:

   - Don't truncate the performance event mask on AMD

   - Fix Xen runstate updates to be atomic when preempting vCPU

   - Fix for AMD AVIC interrupt injection race

   - Several other AMD fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
  KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
  KVM: SVM: fix race between interrupt delivery and AVIC inhibition
  KVM: SVM: set IRR in svm_deliver_interrupt
  KVM: SVM: extract avic_ring_doorbell
  selftests: kvm: Remove absent target file
  KVM: arm64: vgic: Read HW interrupt pending state from the HW
  KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU
  KVM: x86: SVM: move avic definitions from AMD's spec to svm.h
  KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
  KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them
  KVM: x86: nSVM: expose clean bit support to the guest
  KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM
  KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
  KVM: x86: nSVM: fix potential NULL dereference on nested migration
  KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case
  Revert "svm: Add warning message for AVIC IPI invalid target"
2022-02-15 11:07:59 -08:00
Alexander Shishkin
161a9a3370 perf/x86/intel/pt: Add a capability and config bit for disabling TNTs
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called TNT-Disable which is enabled by config bit 55.

TNT-Disable disables Taken-Not-Taken packets to reduce the tracing
overhead, but with the result that exact control flow information is
lost.

Add a capability and config bit for TNT-Disable.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20220126104815.2807416-3-adrian.hunter@intel.com
2022-02-15 17:47:11 +01:00
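
A hedged sketch of the new control bit; the macro name and header
location are assumptions, as the commit text only states it is bit 55:

  #define RTIT_CTL_NOTNT	BIT_ULL(55)	/* disable TNT packets */
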