Commit Graph

41677 Commits

Author SHA1 Message Date
Linus Torvalds
af2c9ac240 - Reorganize the perf LBR init code so that a TSX quirk is applied early
enough in order for the LBR MSR access to not #GP
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLdD5sACgkQEsHwGGHe
 VUoaDA//Yiir+zrTjbCOSnma/pgn3wbWQVe2Y1E1+Tj+av3hBuzTdoAqgDDrFTQ7
 zKwJBjGRb+yQduFbLpLCD9BittX2yk2WGIP3gvLB0ffCLMKU/NOCSKPDZ+HDtvPE
 MkJ1usBLhAYbExgoQVI0Cu9dBL5AR1hKy+ZAw/W7uUhRdR7g9B2gLd1Hg261jmLS
 +imOAQiUgbCAl8d7kMk9lB5gCl3j5DZ+FgpCTYWV4n87cvGBIjAESyjhfAKXRwmg
 Y7acUtOWB935LleA9su2HCtejeIhRoLlV5qUrc0oMUiM6GSymkd6PBRykQ16kajL
 MzKb56daTiK8q4sEhmte0grNNNdyY0nVp2YmvzQ1Cf16Au0XvHIjhLg9WWI+WYmd
 rSP1OXrGoUQ9C2Vex008ajIPQFVDCgRV4euzE8CveiBnNzVp6JCScjNlvVpOgaxM
 04Yesok4nAhJYSYLg1IWXuF5caOyNODJYRNj18ypt2FCEJdDxuDwqVL8DW/oqFz5
 JxTRAuyvojhir/0ztDdMiQ9IkMTjdSboIp1gbjkj0C1J0gJq8Bh9tqNK/uChw58J
 UY9rojGnAa2JEXxQ3P5Dj/zSfw5FoCe7VgACrw+lXw0sJ3CKh08ojRjofP4ET+s1
 77ru8deYJ4sy0IyGLml6WpA3hWwUvkjxWVx1cdnyeBHW9oc7fGQ=
 =nYW6
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - Reorganize the perf LBR init code so that a TSX quirk is applied
   early enough in order for the LBR MSR access to not #GP

* tag 'perf_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/lbr: Fix unchecked MSR access error on HSW
2022-07-24 09:55:53 -07:00
Linus Torvalds
05017fed92 - Make retbleed mitigations 64-bit only (32-bit will need a bit more
work if even needed, at all).
 
 - Prevent return thunks patching of the LKDTM modules as it is not needed there
 
 - Avoid writing the SPEC_CTRL MSR on every kernel entry on eIBRS parts
 
 - Enhance error output of apply_returns() when it fails to patch a return thunk
 
 - A sparse fix to the sev-guest module
 
 - Protect EFI fw calls by issuing an IBPB on AMD
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLdCZIACgkQEsHwGGHe
 VUpZHw//frU5tCHiP+4UM/kGdrcDxVy+FutkoprrNzhYszkvVGWOU4GOyExjnbXj
 cS47AlIAv5o2T+UKN6pwtH1n6Jb+Qf2zsaMlyIcxI7t1PiyddkT9wvrgP52GPNyl
 vrxtDVoo1aYF8z7RxTEtjWHzUlfxQTSlo+olQVtW/JYWnqQOPwvPm+bOA+0MBzwA
 gbDV2nM1Z0D9ifo536J9P93/HtYwW/fsrutTj0hICxweRGnEN7Gw11Ci+6cT+Rgq
 TYO5feaFnnbv0hnfDbWQXAuDVdgi0aJ06CDuoC6U/WmKNpQo2rpjAASVXN/EEQyh
 wwy/bqauj8daOpSU/8/hNx1Rwx3gP6FWEzYpylsn3xP7p9y6dWQnBh4i2ZmMhxUt
 zbIlwNqSFWluFJaSpOnaPJJVrQbRj03Yb1ynTPCcuTl1K1ydGr03J351K8SfeXqP
 bhkecr+BiddFs2Y3udaq/NYJEFRHtbZi3CdDcUulbmEfKOZIQRN0JMSAvR7SH1sZ
 4lljsoSCfJgxKmgP9tVhRc19LD6knxTAzVbtjtyuEmsf1dN36rN5r5Y4lc1AN68s
 12ymbV0qc/Qh6Cn8x/Cgv3S6kXjYrKdbSQHVnJXsJE0kz/mqyIn3+LChU+PoIvmu
 7uGeEouIlDroBgT5WWFWnTgxujFPZD6O2rPsJwOXvTkp9uDof4Q=
 =aSDG
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "A couple more retbleed fallout fixes.

  It looks like their urgency is decreasing so it seems like we've
  managed to catch whatever snafus the limited -rc testing has exposed.
  Maybe we're getting ready... :)

   - Make retbleed mitigations 64-bit only (32-bit will need a bit more
     work if even needed, at all).

   - Prevent return thunks patching of the LKDTM modules as it is not
     needed there

   - Avoid writing the SPEC_CTRL MSR on every kernel entry on eIBRS
     parts

   - Enhance error output of apply_returns() when it fails to patch a
     return thunk

   - A sparse fix to the sev-guest module

   - Protect EFI fw calls by issuing an IBPB on AMD"

* tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Make all RETbleed mitigations 64-bit only
  lkdtm: Disable return thunks in rodata.c
  x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
  x86/alternative: Report missing return thunk details
  virt: sev-guest: Pass the appropriate argument type to iounmap()
  x86/amd: Use IBPB for firmware calls
2022-07-24 09:40:17 -07:00
Linus Torvalds
515f71412b * Check for invalid flags to KVM_CAP_X86_USER_SPACE_MSR
* Fix use of sched_setaffinity in selftests
 
 * Sync kernel headers to tools
 
 * Fix KVM_STATS_UNIT_MAX
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLaTFwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO5Dwf/bRHhFs7XXdC5YU687bEFq/8/XCbY
 wczM6cEIsWk0chzx92xIXzjb6DKPhrUFjGNH2C55XhLwHhCCUI+Q0zCfZ89ghjdX
 Fe3fNcs6SAq6aLPjRBkk0+vt1jq233KzIV/GQJ5FivocPlWX562FXVEXoB/T26Ml
 ljTmtPBn4Hd+LIE+7+HED2qCNzvNYtx3KGGTsZR7hcjoQmfFjXg+OTN0Uqsa+enW
 lCEcN/gDMaTWFxY7lII63IJA4mE4WkdfYWjzuzvfUFsNU0IQZk+NrVZpiAP9zXeS
 20o9nzetS7h1enLWqdGvJ+m5ot19l24nJeWZ8QQsS3T4XF2h7vL0lY/WBg==
 =BDnJ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - Check for invalid flags to KVM_CAP_X86_USER_SPACE_MSR

 - Fix use of sched_setaffinity in selftests

 - Sync kernel headers to tools

 - Fix KVM_STATS_UNIT_MAX

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Protect the unused bits in MSR exiting flags
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  KVM: selftests: Fix target thread to be migrated in rseq_test
  KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean stats
2022-07-23 10:22:26 -07:00
Ben Hutchings
b648ab487f x86/speculation: Make all RETbleed mitigations 64-bit only
The mitigations for RETBleed are currently ineffective on x86_32 since
entry_32.S does not use the required macros.  However, for an x86_32
target, the kconfig symbols for them are still enabled by default and
/sys/devices/system/cpu/vulnerabilities/retbleed will wrongly report
that mitigations are in place.

Make all of these symbols depend on X86_64, and only enable RETHUNK by
default on X86_64.

Fixes: f43b9876e8 ("x86/retbleed: Add fine grained Kconfig knobs")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YtwSR3NNsWp1ohfV@decadent.org.uk
2022-07-23 18:45:11 +02:00
Peter Zijlstra
1e9fdf21a4 mmu_gather: Remove per arch tlb_{start,end}_vma()
Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.

 - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
   but does *NOT* want to call it for each VMA.

 - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
   invalidate across multiple VMAs if possible.

With these it is possible to capture the three forms:

  1) empty stubs;
     select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS

  2) start: flush_cache_range(), end: empty;
     select MMU_GATHER_MERGE_VMAS

  3) start: flush_cache_range(), end: flush_tlb_range();
     default

Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-21 10:50:13 -07:00
Kan Liang
b0380e1350 perf/x86/intel/lbr: Fix unchecked MSR access error on HSW
The fuzzer triggers the below trace.

[ 7763.384369] unchecked MSR access error: WRMSR to 0x689
(tried to write 0x1fffffff8101349e) at rIP: 0xffffffff810704a4
(native_write_msr+0x4/0x20)
[ 7763.397420] Call Trace:
[ 7763.399881]  <TASK>
[ 7763.401994]  intel_pmu_lbr_restore+0x9a/0x1f0
[ 7763.406363]  intel_pmu_lbr_sched_task+0x91/0x1c0
[ 7763.410992]  __perf_event_task_sched_in+0x1cd/0x240

On a machine with the LBR format LBR_FORMAT_EIP_FLAGS2, when the TSX is
disabled, a TSX quirk is required to access LBR from registers.
The lbr_from_signext_quirk_needed() is introduced to determine whether
the TSX quirk should be applied. However, the
lbr_from_signext_quirk_needed() is invoked before the
intel_pmu_lbr_init(), which parses the LBR format information. Without
the correct LBR format information, the TSX quirk never be applied.

Move the lbr_from_signext_quirk_needed() into the intel_pmu_lbr_init().
Checking x86_pmu.lbr_has_tsx in the lbr_from_signext_quirk_needed() is
not required anymore.

Both LBR_FORMAT_EIP_FLAGS2 and LBR_FORMAT_INFO have LBR_TSX flag, but
only the LBR_FORMAT_EIP_FLAGS2 requirs the quirk. Update the comments
accordingly.

Fixes: 1ac7fd8159 ("perf/x86/intel/lbr: Support LBR format V7")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220714182630.342107-1-kan.liang@linux.intel.com
2022-07-20 19:24:55 +02:00
Josh Poimboeuf
efc72a665a lkdtm: Disable return thunks in rodata.c
The following warning was seen:

  WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:557 apply_returns (arch/x86/kernel/alternative.c:557 (discriminator 1))
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc4-00008-gee88d363d156 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
  RIP: 0010:apply_returns (arch/x86/kernel/alternative.c:557 (discriminator 1))
  Code: ff ff 74 cb 48 83 c5 04 49 39 ee 0f 87 81 fe ff ff e9 22 ff ff ff 0f 0b 48 83 c5 04 49 39 ee 0f 87 6d fe ff ff e9 0e ff ff ff <0f> 0b 48 83 c5 04 49 39 ee 0f 87 59 fe ff ff e9 fa fe ff ff 48 89

The warning happened when apply_returns() failed to convert "JMP
__x86_return_thunk" to RET.  It was instead a JMP to nowhere, due to the
thunk relocation not getting resolved.

That rodata.o code is objcopy'd to .rodata, and later memcpy'd, so
relocations don't work (and are apparently silently ignored).

LKDTM is only used for testing, so the naked RET should be fine.  So
just disable return thunks for that file.

While at it, disable objtool and KCSAN for the file.

Fixes: 0b53c374b9 ("x86/retpoline: Use -mfunction-return")
Reported-by: kernel test robot <oliver.sang@intel.com>
Debugged-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Ys58BxHxoDZ7rfpr@xsang-OptiPlex-9020/
2022-07-20 19:24:53 +02:00
Pawan Gupta
eb23b5ef91 x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
IBRS mitigation for spectre_v2 forces write to MSR_IA32_SPEC_CTRL at
every kernel entry/exit. On Enhanced IBRS parts setting
MSR_IA32_SPEC_CTRL[IBRS] only once at boot is sufficient. MSR writes at
every kernel entry/exit incur unnecessary performance loss.

When Enhanced IBRS feature is present, print a warning about this
unnecessary performance loss.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/2a5eaf54583c2bfe0edc4fea64006656256cca17.1657814857.git.pawan.kumar.gupta@linux.intel.com
2022-07-20 19:24:53 +02:00
Kees Cook
65cdf0d623 x86/alternative: Report missing return thunk details
Debugging missing return thunks is easier if we can see where they're
happening.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Ys66hwtFcGbYmoiZ@hirez.programming.kicks-ass.net/
2022-07-20 19:24:53 +02:00
Aaron Lewis
cf5029d5dd KVM: x86: Protect the unused bits in MSR exiting flags
The flags for KVM_CAP_X86_USER_SPACE_MSR and KVM_X86_SET_MSR_FILTER
have no protection for their unused bits.  Without protection, future
development for these features will be difficult.  Add the protection
needed to make it possible to extend these features in the future.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20220714161314.1715227-1-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-19 14:04:18 -04:00
Peter Zijlstra
28a99e95f5 x86/amd: Use IBPB for firmware calls
On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.

And because in order to do an EFI call, the kernel maps a whole lot of
the kernel page table into the EFI page table, do an IBPB just in case
in order to prevent the scenario of poisoning the BTB and causing an EFI
call using the unprotected RET there.

  [ bp: Massage. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
2022-07-18 15:38:09 +02:00
Linus Torvalds
59c80f053d - Improve the check whether the kernel supports WP mappings so that it
can accomodate a XenPV guest due to how the latter is setting up the PAT
 machinery
 
 Now that the retbleed nightmare is public, here's the first round of
 fallout fixes:
 
 - Fix a build failure on 32-bit due to missing include
 
 - Remove an untraining point in espfix64 return path
 
 - other small cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLTudcACgkQEsHwGGHe
 VUrY4BAAtWm7wC6T8rzovbsyticj6kcehRMBEXxtlEP5LOeltR0dbNaIGskrS2Li
 Q9YxxtQhbZPXqzqB+xeHVhDPThzsd3+wRvvetmR4fW/c3XCYr+fLLFjHj0NEvX0P
 lQzuY8GKWGU/QTrjKSKclGvqyB692Fvdu4YImlnrGSbR6ywwVttditd3YNJR0w1q
 7S4zq90uwWuX6cTLqXIcbKbhssOjcR1Agj9+bE8i+rzyB2VtNoihJCJh0pTJAn3P
 RaXnxI/7J6Y+5imPf5/ywu8gxhvGBTy5MU/1v2pw939EurU9tmhVkNVWdO2g/qYY
 V+Y1nj9xV0ucL4hlUBqAFdM+5jFC89Ey1X2tSgUgSl+44L/d8IIjVCp6inVoyCzE
 Olbc6q7A/V8PNXfo4g6gDwVc3Ii53Fwgtu8xVHkwPGfjly6+yZ9O/RUBXcBAOnpU
 jfS9LSc/Ro7kxqFy32beUgB7wwhMpkYuHe6ECxrvXj1IK13y3OkdxFzm03ty7S/E
 BwjrkltDia9BQ6i4Ywy+qSBYkSH6+sxxt4pboB+ft6/p2JIw4YJIp9PRqzPAG0jx
 JTjcZ9YAr7zPNlWp5e30BjGawgbuKPq0wkF1r6QD+3VzNf9+SSDmtkYGWeAyTTP2
 SkzLy5QCNBTeeq0FZVYZFm/vcF7wccQhQdwNcezxygKAmihH0xw=
 =ND3p
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Improve the check whether the kernel supports WP mappings so that it
   can accomodate a XenPV guest due to how the latter is setting up the
   PAT machinery

  - Now that the retbleed nightmare is public, here's the first round of
    fallout fixes:

      * Fix a build failure on 32-bit due to missing include

      * Remove an untraining point in espfix64 return path

      * other small cleanups

* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Remove apostrophe typo
  um: Add missing apply_returns()
  x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
  x86/bugs: Mark retbleed_strings static
  x86/pat: Fix x86_has_pat_wp()
  x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
2022-07-17 08:27:30 -07:00
Linus Torvalds
16c957f089 ACPI fix for 5.19-rc7
Fix more fallout from recent changes of the ACPI CPPC handling on AMD
 platforms (Mario Limonciello).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLRsmYSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxw7AP/i0f3x8wTGby77H9aFHxqYD1az5/nUco
 Ymp8+l1Bm9TNbfBa1EqmwGwleho5X2GXUzFX2K3cKqNNp3fG/R0fsp7/+/h8YDmd
 DzwidRaIQPMsqdUzjIncB83UNIhFeh3/gix0n+iZM+cVw2dBdluXHiq88HtDjOFz
 ZkCF0ka5QcJyLJ4DgIbF5/CV/q3hd0ZT3xKLC0HfedZ7YwPRTL1yefUK3sosLZTW
 qDWYL/86+XwoFBntBQ6gfGYS/xQ7nQ50QKa+SD8cqLCSrQFYswLkRnQpDg1XJlMT
 XPc5WRrlbMC5VV+daY5/uMflRVdDLSuBKt5uTyFrvgguvw9S0Bgct8l/tABa8KRL
 AGDMoqW6V0Lc3z1Jyu4xjjTY6lrhyTB+any8K/roGrAxSYqZveMiAgyyMLbZfxNa
 dU1IAIw41g/ucBB/pE0T1jfrVrAIsLVoKXSl4ixpu50yC1DjJrv85zQfOyEVI9jM
 /okPONeRLTbXnB7+xRCA8AB5nXgPOqPqpLYPW7IDHghskacqZiAo6fm7tWEC5ijX
 epiI/JvwU7Pbt9UuS/7Kui9yC/33ErZEdYRC4QOhBQS4vLVpzQMRFRjdF8d50Jzk
 Qm73OzFKDZNm/8ladAWSdyvmvl/Ch8PjAqIBqmZNQYAoBYmWjN1LhuN+d17/ro2I
 jE05pwY8TXQL
 =Y0dz
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Fix more fallout from recent changes of the ACPI CPPC handling on AMD
  platforms (Mario Limonciello)"

* tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory
2022-07-16 10:52:41 -07:00
Thadeu Lima de Souza Cascardo
51a6fa0732 efi/x86: use naked RET on mixed mode call wrapper
When running with return thunks enabled under 32-bit EFI, the system
crashes with:

  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
  BUG: unable to handle page fault for address: 000000005bc02900
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0011) - permissions violation
  PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
  Oops: 0011 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:0x5bc02900
  Code: Unable to access opcode bytes at RIP 0x5bc028d6.
  RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
  RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
  RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
  R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
  R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0018 ES: 0018 CR0: 0000000080050033
  CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
  Call Trace:
   ? efi_set_virtual_address_map+0x9c/0x175
   efi_enter_virtual_mode+0x4a6/0x53e
   start_kernel+0x67c/0x71e
   x86_64_start_reservations+0x24/0x2a
   x86_64_start_kernel+0xe9/0xf4
   secondary_startup_64_no_verify+0xe5/0xeb

That's because it cannot jump to the return thunk from the 32-bit code.

Using a naked RET and marking it as safe allows the system to proceed
booting.

Fixes: aa3d480315 ("x86: Use return-thunk in asm code")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: <stable@vger.kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-16 09:51:24 -07:00
Kim Phillips
bcf163150c x86/bugs: Remove apostrophe typo
Remove a superfluous ' in the mitigation string.

Fixes: e8ec1b6e08 ("x86/bugs: Enable STIBP for JMP2RET")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-16 11:39:23 +02:00
Linus Torvalds
a8ebfcd33c RISC-V:
* Fix missing PAGE_PFN_MASK
 
 * Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
 
 x86:
 
 * Fix for nested virtualization when TSC scaling is active
 
 * Estimate the size of fastcc subroutines conservatively, avoiding disastrous
   underestimation when return thunks are enabled
 
 * Avoid possible use of uninitialized fields of 'struct kvm_lapic_irq'
 
 Generic:
 
 * Mark as such the boolean values available from the statistics file descriptors
 
 * Clarify statistics documentation
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLRVkcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMm4QgAgZHQTSyA4+/xOYs0cBX2Q6YYkDGG
 yUTjiiLmXjzKmjRkfhqKO75aqGhbv08U20hfHRdxxYV5b2Ful/xEnryj+mjyEBmv
 wFO1Q8Tlwi+6Wwen+VN0tjiQwdY/N6+dI39U2Nn4yCtYyLbCALTWSlq3qr6RjhaI
 P8XFXcPweyow3GsFrwgJVJ/vA/gaAhY17NOmdI5icFioTeJbrrAYw88Cbh9PzkGS
 IsgmHn8Yt9a3x/rzo2LhhMbzsXDR87l+OlJhmGCUB5L0kRt8rJz30ysCeKgTpkoz
 QOBZPdODeJ4Pdk4Z82A7NPUAFaaGGxUMkeIoAIXJ0F/VIpKb7+l3AETlZA==
 =x3x6
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "RISC-V:
   - Fix missing PAGE_PFN_MASK

   - Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()

  x86:
   - Fix for nested virtualization when TSC scaling is active

   - Estimate the size of fastcc subroutines conservatively, avoiding
     disastrous underestimation when return thunks are enabled

   - Avoid possible use of uninitialized fields of 'struct
     kvm_lapic_irq'

  Generic:
   - Mark as such the boolean values available from the statistics file
     descriptors

   - Clarify statistics documentation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: emulate: do not adjust size of fastop and setcc subroutines
  KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
  Documentation: kvm: clarify histogram units
  kvm: stats: tell userspace which values are boolean
  x86/kvm: fix FASTOP_SIZE when return thunks are enabled
  KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
  RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
  riscv: Fix missing PAGE_PFN_MASK
2022-07-15 10:31:46 -07:00
Paolo Bonzini
7962918160 KVM: emulate: do not adjust size of fastop and setcc subroutines
Instead of doing complicated calculations to find the size of the subroutines
(which are even more complicated because they need to be stringified into
an asm statement), just hardcode to 16.

It is less dense for a few combinations of IBT/SLS/retbleed, but it has
the advantage of being really simple.

Cc: stable@vger.kernel.org # 5.15.x: 84e7051c0b: x86/kvm: fix FASTOP_SIZE when return thunks are enabled
Cc: stable@vger.kernel.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-15 07:49:40 -04:00
Nathan Chancellor
db88697968 x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Clang warns:

  arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection]
  DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
                      ^
  arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here
  extern u64 x86_spec_ctrl_current;
             ^
  1 error generated.

The declaration should be using DECLARE_PER_CPU instead so all
attributes stay in sync.

Cc: stable@vger.kernel.org
Fixes: fc02735b14 ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-14 14:52:43 -07:00
Vitaly Kuznetsov
8a414f943f KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left
uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally
not needed for APIC_DM_REMRD, they're still referenced by
__apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize
the structure to avoid consuming random stack memory.

Fixes: a183b638b6 ("KVM: x86: make apic_accept_irq tracepoint more generic")
Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220708125147.593975-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 12:09:43 -04:00
Paolo Bonzini
cca3f3381b Merge commit 'kvm-vmx-nested-tsc-fix' into kvm-master
Merge bugfix needed in both 5.19 (because it's bad) and 5.20 (because
it is a prerequisite to test new features).
2022-07-14 10:04:44 -04:00
Paolo Bonzini
1b870fa557 kvm: stats: tell userspace which values are boolean
Some of the statistics values exported by KVM are always only 0 or 1.
It can be useful to export this fact to userspace so that it can track
them specially (for example by polling the value every now and then to
compute a % of time spent in a specific state).

Therefore, add "boolean value" as a new "unit".  While it is not exactly
a unit, it walks and quacks like one.  In particular, using the type
would be wrong because boolean values could be instantaneous or peak
values (e.g. "is the rmap allocated?") or even two-bucket histograms
(e.g. "number of posted vs. non-posted interrupt injections").

Suggested-by: Amneesh Singh <natto@weirdnatto.in>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 08:01:59 -04:00
Thadeu Lima de Souza Cascardo
84e7051c0b x86/kvm: fix FASTOP_SIZE when return thunks are enabled
The return thunk call makes the fastop functions larger, just like IBT
does. Consider a 16-byte FASTOP_SIZE when CONFIG_RETHUNK is enabled.

Otherwise, functions will be incorrectly aligned and when computing their
position for differently sized operators, they will executed in the middle
or end of a function, which may as well be an int3, leading to a crash
like:

[   36.091116] int3: 0000 [#1] SMP NOPTI
[   36.091119] CPU: 3 PID: 1371 Comm: qemu-system-x86 Not tainted 5.15.0-41-generic #44
[   36.091120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[   36.091121] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm]
[   36.091185] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc
[   36.091186] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202
[   36.091188] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[   36.091188] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200
[   36.091189] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002
[   36.091190] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70
[   36.091190] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000
[   36.091191] FS:  00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000
[   36.091192] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.091192] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0
[   36.091195] PKRU: 55555554
[   36.091195] Call Trace:
[   36.091197]  <TASK>
[   36.091198]  ? fastop+0x5a/0xa0 [kvm]
[   36.091222]  x86_emulate_insn+0x7b8/0xe90 [kvm]
[   36.091244]  x86_emulate_instruction+0x2f4/0x630 [kvm]
[   36.091263]  ? kvm_arch_vcpu_load+0x7c/0x230 [kvm]
[   36.091283]  ? vmx_prepare_switch_to_host+0xf7/0x190 [kvm_intel]
[   36.091290]  complete_emulated_mmio+0x297/0x320 [kvm]
[   36.091310]  kvm_arch_vcpu_ioctl_run+0x32f/0x550 [kvm]
[   36.091330]  kvm_vcpu_ioctl+0x29e/0x6d0 [kvm]
[   36.091344]  ? kvm_vcpu_ioctl+0x120/0x6d0 [kvm]
[   36.091357]  ? __fget_files+0x86/0xc0
[   36.091362]  ? __fget_files+0x86/0xc0
[   36.091363]  __x64_sys_ioctl+0x92/0xd0
[   36.091366]  do_syscall_64+0x59/0xc0
[   36.091369]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091370]  ? do_syscall_64+0x69/0xc0
[   36.091371]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091372]  ? __x64_sys_writev+0x1c/0x30
[   36.091374]  ? do_syscall_64+0x69/0xc0
[   36.091374]  ? exit_to_user_mode_prepare+0x37/0xb0
[   36.091378]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091379]  ? do_syscall_64+0x69/0xc0
[   36.091379]  ? do_syscall_64+0x69/0xc0
[   36.091380]  ? do_syscall_64+0x69/0xc0
[   36.091381]  ? do_syscall_64+0x69/0xc0
[   36.091381]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   36.091384] RIP: 0033:0x7efdfe6d1aff
[   36.091390] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[   36.091391] RSP: 002b:00007efdfce8c460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   36.091393] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007efdfe6d1aff
[   36.091393] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c
[   36.091394] RBP: 0000558f1609e220 R08: 0000558f13fb8190 R09: 00000000ffffffff
[   36.091394] R10: 0000558f16b5e950 R11: 0000000000000246 R12: 0000000000000000
[   36.091394] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[   36.091396]  </TASK>
[   36.091397] Modules linked in: isofs nls_iso8859_1 kvm_intel joydev kvm input_leds serio_raw sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler drm msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net net_failover crypto_simd ahci xhci_pci cryptd psmouse virtio_blk libahci xhci_pci_renesas failover
[   36.123271] ---[ end trace db3c0ab5a48fabcc ]---
[   36.123272] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm]
[   36.123319] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc
[   36.123320] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202
[   36.123321] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[   36.123321] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200
[   36.123322] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002
[   36.123322] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70
[   36.123323] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000
[   36.123323] FS:  00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000
[   36.123324] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.123325] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0
[   36.123327] PKRU: 55555554
[   36.123328] Kernel panic - not syncing: Fatal exception in interrupt
[   36.123410] Kernel Offset: 0x1400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   36.135305] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: aa3d480315 ("x86: Use return-thunk in asm code")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Message-Id: <20220713171241.184026-1-cascardo@canonical.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 07:44:38 -04:00
Vitaly Kuznetsov
9948272645 KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to
hang upon boot or shortly after when a non-default TSC frequency was
set for L1. The issue is observed on a host where TSC scaling is
supported. The problem appears to be that Windows doesn't use TSC
frequency for its guests even when the feature is advertised and KVM
filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from
L1's. This leads to L2 running with the default frequency (matching
host's) while L1 is running with an altered one.

Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when
it was set for L1. TSC_MULTIPLIER is already correctly computed and
written by prepare_vmcs02().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220712135009.952805-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 07:44:05 -04:00
Alexandre Chartre
d16e0b2667 x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
UNTRAIN_RET is not needed in native_irq_return_ldt because RET
untraining has already been done at this point.

In addition, when the RETBleed mitigation is IBPB, UNTRAIN_RET clobbers
several registers (AX, CX, DX) so here it trashes user values which are
in these registers.

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/35b0d50f-12d1-10c3-f5e8-d6c140486d4a@oracle.com
2022-07-14 09:45:12 +02:00
Jiapeng Chong
33a8573bdf x86/bugs: Mark retbleed_strings static
This symbol is not used outside of bugs.c, so mark it static.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220714072939.71162-1-jiapeng.chong@linux.alibaba.com
2022-07-14 09:41:30 +02:00
Mario Limonciello
fbd74d1689 ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory
When commit 72f2ecb7ec ("ACPI: bus: Set CPPC _OSC bits for all
and when CPPC_LIB is supported") was introduced, we found collateral
damage that a number of AMD systems that supported CPPC but
didn't advertise support in _OSC stopped having a functional
amd-pstate driver. The _OSC was only enforced on Intel systems at that
time.

This was fixed for the MSR based designs by commit 8b356e536e
("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported")
but some shared memory based designs also support CPPC but haven't
advertised support in the _OSC.  Add support for those designs as well by
hardcoding the list of systems.

Fixes: 72f2ecb7ec ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported")
Fixes: 8b356e536e ("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported")
Link: https://lore.kernel.org/all/3559249.JlDtxWtqDm@natalenko.name/
Cc: 5.18+ <stable@vger.kernel.org> # 5.18+
Reported-and-tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-13 21:13:14 +02:00
Juergen Gross
230ec83d42 x86/pat: Fix x86_has_pat_wp()
x86_has_pat_wp() is using a wrong test, as it relies on the normal
PAT configuration used by the kernel. In case the PAT MSR has been
setup by another entity (e.g. Xen hypervisor) it might return false
even if the PAT configuration is allowing WP mappings. This due to the
fact that when running as Xen PV guest the PAT MSR is setup by the
hypervisor and cannot be changed by the guest. This results in the WP
related entry to be at a different position when running as Xen PV
guest compared to the bare metal or fully virtualized case.

The correct way to test for WP support is:

1. Get the PTE protection bits needed to select WP mode by reading
   __cachemode2pte_tbl[_PAGE_CACHE_MODE_WP] (depending on the PAT MSR
   setting this might return protection bits for a stronger mode, e.g.
   UC-)
2. Translate those bits back into the real cache mode selected by those
   PTE bits by reading __pte2cachemode_tbl[__pte2cm_idx(prot)]
3. Test for the cache mode to be _PAGE_CACHE_MODE_WP

Fixes: f88a68facd ("x86/mm: Extend early_memremap() support with additional attrs")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14
Link: https://lore.kernel.org/r/20220503132207.17234-1-jgross@suse.com
2022-07-13 12:44:04 +02:00
Jiri Slaby
3131ef39fb x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
The build on x86_32 currently fails after commit

  9bb2ec608a (objtool: Update Retpoline validation)

with:

  arch/x86/kernel/../../x86/xen/xen-head.S:35: Error: no such instruction: `annotate_unret_safe'

ANNOTATE_UNRET_SAFE is defined in nospec-branch.h. And head_32.S is
missing this include. Fix this.

Fixes: 9bb2ec608a ("objtool: Update Retpoline validation")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/63e23f80-033f-f64e-7522-2816debbc367@kernel.org
2022-07-13 12:43:26 +02:00
Linus Torvalds
0d8ba24e72 Just when you thought that all the speculation bugs were addressed and
solved and the nightmare is complete, here's the next one: speculating
 after RET instructions and leaking privileged information using the now
 pretty much classical covert channels.
 
 It is called RETBleed and the mitigation effort and controlling
 functionality has been modelled similar to what already existing
 mitigations provide.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLNdDYACgkQEsHwGGHe
 VUrNAw/+OTFF7md0+17Ju6vvagc/nXfUxk/r0lWU9/KzbRXvPTZdPKTW4NN5c0IS
 VnogyUGFFpzU3dKU2os9ejTD4kHNx0oLuBfQt4w7t4qR+g3+nAH0ywNjH/N1VTJt
 iDpww7CxqloV+i9RCsWV+zQPMPfc2VMUhe6xqNB2CgEDrruzFrDASZR6zzarsKxY
 x4rwHn0ZkV7zNJfcNpV2323qktqHgBtAFf7GlZK8hBsgsiSk+xDk9CODkfxfWIV7
 o4BNvNmaUKDJL51hpuzvIzYwDSiRO5AXdjxHG/0CHc3r3dtA6Xt1elHbERAyUMuM
 P+6XievP5ZV/xXXjoZ5Vla67o3bbGKmTo2WluvVGeg8ahzQEwyPGqeXn77hk+of+
 BtasZyLgfdwSeWExxp0n5Nhh972TMpy5K4gqOFXcxvPSuTl6tTw77F1u0UQLaVVH
 QzHNu+RO/2iQ/P30cOM11IbZ9sfcBOj+5mjfoDoR4qCtoCQfyfHK+HlwXjZ+uk98
 xU/FnQbOKPRVxiyCVhrbKFxjW7iL7AIb0nRgxHzGGoIJ6A71Tbwa/5gGakE7WEBz
 e7ce8NW2JFucGBFYyiBab6I6fB7lbvmqbNPerYEVoU5YxZkMu+xxyToqBnsyPfHZ
 lxgEGREUaY8aZmGDfrD9EYyhhtQU/MwdpN+FY3xXQdUJkvkNaLg=
 =0Ca0
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull lockdep fix for x86 retbleed from Borislav Petkov:

 - Fix lockdep complaint for __static_call_fixup()

* tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/static_call: Serialize __static_call_fixup() properly
2022-07-12 08:40:09 -07:00
Thomas Gleixner
c27c753ea6 x86/static_call: Serialize __static_call_fixup() properly
__static_call_fixup() invokes __static_call_transform() without holding
text_mutex, which causes lockdep to complain in text_poke_bp().

Adding the proper locking cures that, but as this is either used during
early boot or during module finalizing, it's not required to use
text_poke_bp(). Add an argument to __static_call_transform() which tells
it to use text_poke_early() for it.

Fixes: ee88d363d1 ("x86,static_call: Use alternative RET encoding")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-12 14:23:32 +02:00
Linus Torvalds
ce114c8668 Just when you thought that all the speculation bugs were addressed and
solved and the nightmare is complete, here's the next one: speculating
 after RET instructions and leaking privileged information using the now
 pretty much classical covert channels.
 
 It is called RETBleed and the mitigation effort and controlling
 functionality has been modelled similar to what already existing
 mitigations provide.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLKqAgACgkQEsHwGGHe
 VUoM5w/8CSvwPZ3otkhmu8MrJPtWc7eLDPjYN4qQP+19e+bt094MoozxeeWG2wmp
 hkDJAYHT2Oik/qDuEdhFgNYwS7XGgbV3Py3B8syO4//5SD5dkOSG+QqFXvXMdFri
 YsVqqNkjJOWk/YL9Ql5RS/xQewsrr0OqEyWWocuI6XAvfWV4kKvlRSd+6oPqtZEO
 qYlAHTXElyIrA/gjmxChk1HTt5HZtK3uJLf4twNlUfzw7LYFf3+sw3bdNuiXlyMr
 WcLXMwGpS0idURwP3mJa7JRuiVBzb4+kt8mWwWqA02FkKV45FRRRFhFUsy667r00
 cdZBaWdy+b7dvXeliO3FN/x1bZwIEUxmaNy1iAClph4Ifh0ySPUkxAr8EIER7YBy
 bstDJEaIqgYg8NIaD4oF1UrG0ZbL0ImuxVaFdhG1hopQsh4IwLSTLgmZYDhfn/0i
 oSqU0Le+A7QW9s2A2j6qi7BoAbRW+gmBuCgg8f8ECYRkFX1ZF6mkUtnQxYrU7RTq
 rJWGW9nhwM9nRxwgntZiTjUUJ2HtyXEgYyCNjLFCbEBfeG5QTg7XSGFhqDbgoymH
 85vsmSXYxgTgQ/kTW7Fs26tOqnP2h1OtLJZDL8rg49KijLAnISClEgohYW01CWQf
 ZKMHtz3DM0WBiLvSAmfGifScgSrLB5AjtvFHT0hF+5/okEkinVk=
 =09fW
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 retbleed fixes from Borislav Petkov:
 "Just when you thought that all the speculation bugs were addressed and
  solved and the nightmare is complete, here's the next one: speculating
  after RET instructions and leaking privileged information using the
  now pretty much classical covert channels.

  It is called RETBleed and the mitigation effort and controlling
  functionality has been modelled similar to what already existing
  mitigations provide"

* tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  x86/speculation: Disable RRSBA behavior
  x86/kexec: Disable RET on kexec
  x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported
  x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry
  x86/bugs: Add Cannon lake to RETBleed affected CPU list
  x86/retbleed: Add fine grained Kconfig knobs
  x86/cpu/amd: Enumerate BTC_NO
  x86/common: Stamp out the stepping madness
  KVM: VMX: Prevent RSB underflow before vmenter
  x86/speculation: Fill RSB on vmexit for IBRS
  KVM: VMX: Fix IBRS handling after vmexit
  KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
  KVM: VMX: Convert launched argument to flags
  KVM: VMX: Flatten __vmx_vcpu_run()
  objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}
  x86/speculation: Remove x86_spec_ctrl_mask
  x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
  x86/speculation: Fix SPEC_CTRL write on SMT state change
  x86/speculation: Fix firmware entry SPEC_CTRL handling
  x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
  ...
2022-07-11 18:15:25 -07:00
Linus Torvalds
74a0032b85 - Prepare for and clear .brk early in order to address XenPV guests
failures where the hypervisor verifies page tables and uninitialized
 data in that range leads to bogus failures in those checks
 
 - Add any potential setup_data entries supplied at boot to the identity
 pagetable mappings to prevent kexec kernel boot failures. Usually, this
 is not a problem for the normal kernel as those mappings are part of
 the initially mapped 2M pages but if kexec gets to allocate the second
 kernel somewhere else, those setup_data entries need to be mapped there
 too.
 
 - Fix objtool not to discard text references from the __tracepoints
 section so that ENDBR validation still works
 
 - Correct the setup_data types limit as it is user-visible, before 5.19
 releases
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLKpf8ACgkQEsHwGGHe
 VUrc5w/8DIVLQ8w+Balf2TGfp5Sl3mPkg+eoARH29qtXhvVBs5KJB9sbT1IGnxao
 nE4yNeiIKhH5SEd17l11E7eWuUtNgZENLsUb3aiAdsItNS+MzOWQuEOPbnAwgJmk
 oKdxiI1SHiVoPy5KVXOcyAS90PSJIkhhxwgR5MInGdmpSUzEFsx5SY82ZfOjOkZU
 L7zCsJzeDfhJdWiR4N0MXWRaFbIvRxI1uXyqgv+Lo6JK5l8dyUUSEdWyLUqZ7E4M
 GFo6LwR3lskQM2bE9vBWS0h1X00d5oDMzfono8kZzRGA/11plZHRI007PCez8yZh
 4sUnnxsfCy2YF8/8hs4IhrHZdcWW9XoN4gTUsjD0wekGTHhOEqu5qpAnVSrXbvvM
 ZfPF8vM+DLPTWQqAT0a4aj1vd1RflDIQPSXKDzJDjeF49zouAj1ae/3KSOYJDzN9
 V6NGiKBnzj1rbtm0+8jOsTQusmh/oDage7uLlmel3hTfNOc2Ay0LXrJWcvqhj66V
 4CtCd12sLeavin+mGptni6lXbsue61EolRtH44RvZJsXLVY8iclM4onl728xOrxj
 CBtJo6bd3oQYy0SQsysXGDVR7BSXtwAYfArYR8BrMTtgHxuyULt/BDoew4r7XADB
 Xxz7ADJZ3DI3Gqza5H6r89Tj6Oi3yXiBWUVUNXFCMYc6ZrqvZc0=
 =tOvF
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Prepare for and clear .brk early in order to address XenPV guests
   failures where the hypervisor verifies page tables and uninitialized
   data in that range leads to bogus failures in those checks

 - Add any potential setup_data entries supplied at boot to the identity
   pagetable mappings to prevent kexec kernel boot failures. Usually,
   this is not a problem for the normal kernel as those mappings are
   part of the initially mapped 2M pages but if kexec gets to allocate
   the second kernel somewhere else, those setup_data entries need to be
   mapped there too.

 - Fix objtool not to discard text references from the __tracepoints
   section so that ENDBR validation still works

 - Correct the setup_data types limit as it is user-visible, before 5.19
   releases

* tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix the setup data types max limit
  x86/ibt, objtool: Don't discard text references from tracepoint section
  x86/compressed/64: Add identity mappings for setup_data entries
  x86: Fix .brk attribute in linker script
  x86: Clear .brk area at early boot
  x86/xen: Use clear_bss() for Xen PV guests
2022-07-10 08:43:52 -07:00
Borislav Petkov
cb8a4beac3 x86/boot: Fix the setup data types max limit
Commit in Fixes forgot to change the SETUP_TYPE_MAX definition which
contains the highest valid setup data type.

Correct that.

Fixes: 5ea98e01ab ("x86/boot: Add Confidential Computing type to setup_data")
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/ddba81dd-cc92-699c-5274-785396a17fb5@zytor.com
2022-07-10 11:17:40 +02:00
Pawan Gupta
4ad3278df6 x86/speculation: Disable RRSBA behavior
Some Intel processors may use alternate predictors for RETs on
RSB-underflow. This condition may be vulnerable to Branch History
Injection (BHI) and intramode-BTI.

Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines,
eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against
such attacks. However, on RSB-underflow, RET target prediction may
fallback to alternate predictors. As a result, RET's predicted target
may get influenced by branch history.

A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback
behavior when in kernel mode. When set, RETs will not take predictions
from alternate predictors, hence mitigating RETs as well. Support for
this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2).

For spectre v2 mitigation, when a user selects a mitigation that
protects indirect CALLs and JMPs against BHI and intramode-BTI, set
RRSBA_DIS_S also to protect RETs for RSB-underflow case.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-09 13:12:45 +02:00
Konrad Rzeszutek Wilk
697977d841 x86/kexec: Disable RET on kexec
All the invocations unroll to __x86_return_thunk and this file
must be PIC independent.

This fixes kexec on 64-bit AMD boxes.

  [ bp: Fix 32-bit build. ]

Reported-by: Edward Tran <edward.tran@oracle.com>
Reported-by: Awais Tanveer <awais.tanveer@oracle.com>
Suggested-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-09 13:12:32 +02:00
Thadeu Lima de Souza Cascardo
2259da159f x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported
There are some VM configurations which have Skylake model but do not
support IBPB. In those cases, when using retbleed=ibpb, userspace is going
to be killed and kernel is going to panic.

If the CPU does not support IBPB, warn and proceed with the auto option. Also,
do not fallback to IBPB on AMD/Hygon systems if it is not supported.

Fixes: 3ebc170068 ("x86/bugs: Add retbleed=ibpb")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-08 12:50:52 +02:00
Peter Zijlstra
2c08b9b38f x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry
Commit

  ee774dac0d ("x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry()")

moved PUSH_AND_CLEAR_REGS out of error_entry, into its own function, in
part to avoid calling error_entry() for XenPV.

However, commit

  7c81c0c921 ("x86/entry: Avoid very early RET")

had to change that because the 'ret' was too early and moved it into
idtentry, bloating the text size, since idtentry is expanded for every
exception vector.

However, with the advent of xen_error_entry() in commit

  d147553b64 ("x86/xen: Add UNTRAIN_RET")

it became possible to remove PUSH_AND_CLEAR_REGS from idtentry, back
into *error_entry().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-07 13:39:42 +02:00
Pawan Gupta
f54d45372c x86/bugs: Add Cannon lake to RETBleed affected CPU list
Cannon lake is also affected by RETBleed, add it to the list.

Fixes: 6ad0ad2bf8 ("x86/bugs: Report Intel retbleed vulnerability")
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-07 12:33:53 +02:00
Michael Roth
b57feed2cc x86/compressed/64: Add identity mappings for setup_data entries
The decompressed kernel initially relies on the identity map set up by
the boot/compressed kernel for accessing things like boot_params. With
the recent introduction of SEV-SNP support, the decompressed kernel
also needs to access the setup_data entries pointed to by
boot_params->hdr.setup_data.

This can lead to a crash in the kexec kernel during early boot due to
these entries not currently being included in the initial identity map,
see thread at Link below.

Include mappings for the setup_data entries in the initial identity map.

  [ bp: Massage commit message and use a helper var for better readability. ]

Fixes: b190a043c4 ("x86/sev: Add SEV-SNP feature detection/setup")
Reported-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/TYCPR01MB694815CD815E98945F63C99183B49@TYCPR01MB6948.jpnprd01.prod.outlook.com
2022-07-06 11:23:39 +02:00
Mario Limonciello
8b356e536e ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported
commit 72f2ecb7ec ("ACPI: bus: Set CPPC _OSC bits for all and
when CPPC_LIB is supported") added support for claiming to
support CPPC in _OSC on non-Intel platforms.

This unfortunately caused a regression on a vartiety of AMD
platforms in the field because a number of AMD platforms don't set
the `_OSC` bit 5 or 6 to indicate CPPC or CPPC v2 support.

As these AMD platforms already claim CPPC support via a dedicated
MSR from `X86_FEATURE_CPPC`, use this enable this feature rather
than requiring the `_OSC` on platforms with a dedicated MSR.

If there is additional breakage on the shared memory designs also
missing this _OSC, additional follow up changes may be needed.

Fixes: 72f2ecb7ec ("Set CPPC _OSC bits for all and when CPPC_LIB is supported")
Reported-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-05 20:36:11 +02:00
Juergen Gross
7e09ac27f4 x86: Fix .brk attribute in linker script
Commit in Fixes added the "NOLOAD" attribute to the .brk section as a
"failsafe" measure.

Unfortunately, this leads to the linker no longer covering the .brk
section in a program header, resulting in the kernel loader not knowing
that the memory for the .brk section must be reserved.

This has led to crashes when loading the kernel as PV dom0 under Xen,
but other scenarios could be hit by the same problem (e.g. in case an
uncompressed kernel is used and the initrd is placed directly behind
it).

So drop the "NOLOAD" attribute. This has been verified to correctly
cover the .brk section by a program header of the resulting ELF file.

Fixes: e32683c6f7 ("x86/mm: Fix RESERVE_BRK() for older binutils")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20220630071441.28576-4-jgross@suse.com
2022-07-01 11:12:43 +02:00
Juergen Gross
38fa5479b4 x86: Clear .brk area at early boot
The .brk section has the same properties as .bss: it is an alloc-only
section and should be cleared before being used.

Not doing so is especially a problem for Xen PV guests, as the
hypervisor will validate page tables (check for writable page tables
and hypervisor private bits) before accepting them to be used.

Make sure .brk is initially zero by letting clear_bss() clear the brk
area, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220630071441.28576-3-jgross@suse.com
2022-07-01 11:11:34 +02:00
Juergen Gross
96e8fc5818 x86/xen: Use clear_bss() for Xen PV guests
Instead of clearing the bss area in assembly code, use the clear_bss()
function.

This requires to pass the start_info address as parameter to
xen_start_kernel() in order to avoid the xen_start_info being zeroed
again.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20220630071441.28576-2-jgross@suse.com
2022-07-01 10:57:52 +02:00
Peter Zijlstra
f43b9876e8 x86/retbleed: Add fine grained Kconfig knobs
Do fine-grained Kconfig for all the various retbleed parts.

NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB, you might not like this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-29 17:43:41 +02:00
Andrew Cooper
26aae8ccbc x86/cpu/amd: Enumerate BTC_NO
BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.

Hypervisors are expected to synthesise BTC_NO when it is appropriate
given the migration pool, to prevent kernels using heuristics.

  [ bp: Massage. ]

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:01 +02:00
Peter Zijlstra
7a05bc95ed x86/common: Stamp out the stepping madness
The whole MMIO/RETBLEED enumeration went overboard on steppings. Get
rid of all that and simply use ANY.

If a future stepping of these models would not be affected, it had
better set the relevant ARCH_CAP_$FOO_NO bit in
IA32_ARCH_CAPABILITIES.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:01 +02:00
Josh Poimboeuf
07853adc29 KVM: VMX: Prevent RSB underflow before vmenter
On VMX, there are some balanced returns between the time the guest's
SPEC_CTRL value is written, and the vmenter.

Balanced returns (matched by a preceding call) are usually ok, but it's
at least theoretically possible an NMI with a deep call stack could
empty the RSB before one of the returns.

For maximum paranoia, don't allow *any* returns (balanced or otherwise)
between the SPEC_CTRL write and the vmenter.

  [ bp: Fix 32-bit build. ]

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Josh Poimboeuf
9756bba284 x86/speculation: Fill RSB on vmexit for IBRS
Prevent RSB underflow/poisoning attacks with RSB.  While at it, add a
bunch of comments to attempt to document the current state of tribal
knowledge about RSB attacks and what exactly is being mitigated.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Josh Poimboeuf
bea7e31a5c KVM: VMX: Fix IBRS handling after vmexit
For legacy IBRS to work, the IBRS bit needs to be always re-written
after vmexit, even if it's already on.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00
Josh Poimboeuf
fc02735b14 KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
On eIBRS systems, the returns in the vmexit return path from
__vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks.

Fix that by moving the post-vmexit spec_ctrl handling to immediately
after the vmexit.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:34:00 +02:00