Commit Graph

1106901 Commits

Author SHA1 Message Date
Corey Minyard
d5d91586be ipmi: Add a sysfs count of total outstanding messages for an interface
Go through each user and add its message count to a total and print the
total.

It would be nice to have a per-user file, but there's no user sysfs
entity at this point to hang it off of.  Probably not worth the effort.

Based on work by Chen Guanqiao <chen.chenchacha@foxmail.com>

Cc: Chen Guanqiao <chen.chenchacha@foxmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-05-12 10:00:03 -05:00
Corey Minyard
f60231885f ipmi: Add a sysfs interface to view the number of users
A count of users is kept for each interface, allow it to be viewed.

Based on work by Chen Guanqiao <chen.chenchacha@foxmail.com>

Cc: Chen Guanqiao <chen.chenchacha@foxmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-05-12 10:00:03 -05:00
Corey Minyard
333730e456 ipmi: Limit the number of message a user may have outstanding
This way a rogue application can't use up a bunch of memory.

Based on work by Chen Guanqiao <chen.chenchacha@foxmail.com>

Cc: Chen Guanqiao <chen.chenchacha@foxmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-05-12 10:00:02 -05:00
Corey Minyard
8e76741c3d ipmi: Add a limit on the number of users that may use IPMI
Each user uses memory, we need limits to avoid a rogue program from
running the system out of memory.

Based on work by Chen Guanqiao <chen.chenchacha@foxmail.com>

Cc: Chen Guanqiao <chen.chenchacha@foxmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-05-12 10:00:02 -05:00
Mario Limonciello
d52848620d ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default
ASUS B1400CEAE fails to resume from suspend to idle by default.  This was
bisected back to commit df4f9bc4fb ("nvme-pci: add support for ACPI
StorageD3Enable property") but this is a red herring to the problem.

Before this commit the system wasn't getting into deepest sleep state.
Presumably this commit is allowing entry into deepest sleep state as
advertised by firmware, but there are some other problems related to
the wakeup.

As it is confirmed the system works properly with S3, set the default for
this system to S3.

Reported-by: Jian-Hong Pan <jhp@endlessos.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Jian-Hong Pan <jhp@endlessos.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-12 16:56:58 +02:00
Ian Cowan
4c19851c70 ACPI: clean up white space in a few places for consistency
This cleans up a few line spaces so that it is consistent with the rest
of the file. There are a few places where a space was added before a
return and two spots where a double line space was made into one line
space.

Signed-off-by: Ian Cowan <ian@linux.cowan.aero>
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-12 16:54:49 +02:00
Nirmal Patel
c94f732e80 PCI: vmd: Revert 2565e5b69c ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
Revert 2565e5b69c ("PCI: vmd: Do not disable MSI-X remapping if
interrupt remapping is enabled by IOMMU.")

The commit 2565e5b69c was added as a workaround to keep MSI-X
remapping enabled if IOMMU enables interrupt remapping. VMD would keep
running in low performance mode. There is no dependency between MSI-X
remapping by VMD and interrupt remapping by IOMMU.

Link: https://lore.kernel.org/r/20220511095707.25403-3-nirmal.patel@linux.intel.com
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-05-12 15:54:14 +01:00
Nirmal Patel
886e67100b PCI: vmd: Assign VMD IRQ domain before enumeration
During the boot process all the PCI devices are assigned default PCI-MSI
IRQ domain including VMD endpoint devices. If interrupt-remapping is
enabled by IOMMU, the PCI devices except VMD get new INTEL-IR-MSI IRQ
domain. And VMD is supposed to create and assign a separate VMD-MSI IRQ
domain for its child devices in order to support MSI-X remapping
capabilities.

Now when MSI-X remapping in VMD is disabled in order to improve
performance, VMD skips VMD-MSI IRQ domain assignment process to its
child devices. Thus the devices behind VMD get default PCI-MSI IRQ
domain instead of INTEL-IR-MSI IRQ domain when VMD creates root bus and
configures child devices.

As a result host OS fails to boot and DMAR errors were observed when
interrupt remapping was enabled on Intel Icelake CPUs. For instance:

  DMAR: DRHD: handling fault status reg 2
  DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request

To fix this issue, dev_msi_info struct in dev struct maintains correct
value of IRQ domain. VMD will use this information to assign proper IRQ
domain to its child devices when it doesn't create a separate IRQ domain.

Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.com
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2022-05-12 15:54:14 +01:00
Rafael J. Wysocki
b7fbf4cebd ACPI: glue: Rearrange find_child_checks()
Notice that it is not necessary to evaluate _STA in find_child_checks()
if the device is expected to have children, but there are none, so
move the children check to the front of the function.

Also notice that FIND_CHILD_MIN_SCORE can be returned right away if
_STA is missing, so make the function do so.

Finally, replace the ternary operator in the return statement argument
with an if () and a standalone return which is somewhat easier to
follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-12 16:52:32 +02:00
Yang Li
516edb456f nilfs2: Fix some kernel-doc comments
The description of @flags in nilfs_dirty_inode() kernel-doc comment is
missing, and some functions had kernel-doc that used a hash instead of a
colon to separate the parameter name from the one line description.

Fix them to remove some warnings found by running scripts/kernel-doc,
which is caused by using 'make W=1'.

fs/nilfs2/inode.c:73: warning: Function parameter or member 'inode' not
described in 'nilfs_get_block'
fs/nilfs2/inode.c:73: warning: Function parameter or member 'blkoff' not
described in 'nilfs_get_block'
fs/nilfs2/inode.c:73: warning: Function parameter or member 'bh_result'
not described in 'nilfs_get_block'
fs/nilfs2/inode.c:73: warning: Function parameter or member 'create' not
described in 'nilfs_get_block'
fs/nilfs2/inode.c:145: warning: Function parameter or member 'file' not
described in 'nilfs_readpage'
fs/nilfs2/inode.c:145: warning: Function parameter or member 'page' not
described in 'nilfs_readpage'
fs/nilfs2/inode.c:968: warning: Function parameter or member 'flags' not
described in 'nilfs_dirty_inode'

Link: https://lkml.kernel.org/r/20220324024215.63479-1-yang.lee@linux.alibaba.com
Link: https://lkml.kernel.org/r/1652276316-7791-1-git-send-email-konishi.ryusuke@gmail.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-12 10:49:23 -04:00
Matthew Wilcox (Oracle)
08104fb0b1 Appoint myself page cache maintainer
This feels like a sufficiently distinct area of responsibility to be
worth separating out from both MM and VFS.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
2022-05-12 10:48:49 -04:00
Ming Lei
a327c341dc blk-mq: fix passthrough plugging
First we can't add request into plug list in blk_mq_request_bypass_insert
which may be called when flushing plug list, so nested plug is caused.

Second if polled passthrough request is inserted via blk_execute_rq(),
it can't be added to plug list too since io polling needs the request
to be issued to driver.

Fixes the two by moving plugging into blk_execute_rq_no_wait().

Cc: Christoph Hellwig <hch@lst.de>
Fixes: 1c2d2fff6d ("block: wire-up support for passthrough plugging")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220512140010.1458645-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-12 08:45:39 -06:00
Geert Uytterhoeven
66d7a40beb mtd: nand: MTD_NAND_ECC_MEDIATEK should depend on ARCH_MEDIATEK
The MediaTek Hardware ECC Engine is only present on MediaTek MT27xx and
MT76xx SoCs.  The driver for this engine is a dependency for the
MediaTek NAND controller (MTD_NAND_MTK) and the MediaTek SPI NAND Flash
Interface (SPI_MTK_SNFI) drivers, both of which already depend on
ARCH_MEDIATEK.

Hence add a dependency on ARCH_MEDIATEK to the Hardware ECC Engine
driver, too, to prevent asking the user about this driver when
configuring a kernel without MediaTek SoC support.

Fixes: 4fd62f15af ("mtd: nand: make mtk_ecc.c a separated module")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/bb9568e825d4bc7506870b03836baa91bcc4b725.1652104136.git.geert+renesas@glider.be
2022-05-12 16:43:04 +02:00
Siddh Raman Pant
75d6fe48a2 spi: Doc fix - Describe add_lock and dma_map_dev in spi_controller
This fixes the corresponding warnings during building the docs.

Signed-off-by: Siddh Raman Pant <siddhpant.gh@gmail.com>
Link: https://lore.kernel.org/r/4e6187a4-d0f8-4750-e407-e09cc1c91789@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-12 15:43:04 +01:00
Minghao Chi
c96f824af0 mtd: rawnand: cs553x: simplify the return expression of cs553x_write_ctrl_byte()
Simplify the return expression.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220505022354.61458-1-chi.minghao@zte.com.cn
2022-05-12 16:43:03 +02:00
Vaishnav Achath
606e5d4081 spi: cadence-quadspi: Handle spi_unregister_master() in remove()
Currently devres managed removal of the spi_controller happens after
removing the power domain of the host platform_device.While this
does not affect the clean removal of the controller, but affects
graceful removal of the child devices if the child  device removal
requires issuing commands over SPI.

Eg. flash device being soft reset to 1S-1S-1S mode before removal
so that on next probe operations in 1S-1S-1S mode is successful.

Failure is seen when `rmmod spi-cadence-quadspi` is performed:

root@j7-evm:~# rmmod spi_cadence_quadspi
[ 49.230996] cadence-qspi 47050000.spi: QSPI is still busy after 500ms timeout.
[ 49.238209] spi-nor spi1.0: operation failed with -110
[ 49.244457] spi-nor spi1.0: Software reset failed: -110

and on subsequent modprobe the OSPI flash probe fails as it
is in 8D-8D-8D mode since the previous soft reset did not happen.

root@j7-evm:~# modprobe spi_cadence_quadspi
[ 73.253536] spi-nor spi0.0: unrecognized JEDEC id bytes: ff ff ff ff ff ff
[ 73.260476] spi-nor: probe of spi0.0 failed with error -2

This commit adds necessary changes to perform spi_unregister_master()
in the host device remove() so that the child devices are gracefully
removed before the power domain is removed.

changes tested on J721E with mt35xu512aba flash.

Signed-off-by: Vaishnav Achath <vaishnav.a@ti.com>
Link: https://lore.kernel.org/r/20220511115516.14894-1-vaishnav.a@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-05-12 15:43:03 +01:00
Rickard x Andersson
773898127e mtd: rawnand: kioxia: Add support for TH58NVG3S0HBAI4
Add timings for Kioxia/Toshiba TH58NVG3S0HBAI4. Timings
for this memory matches the timings selected for
TH58NVG2S3HBAI4.

This patch increases eraseblock write speed from 5248 KiB/s
to 6864 KiB/s and erase block read speed from 8542 KiB/s
to 18360 KiB/s

Tested on i.MX6SX.

Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220429083931.26795-1-rickaran@axis.com
2022-05-12 16:43:01 +02:00
Zucheng Zheng
bc469ddf67 perf/x86/amd: Remove unused variable 'hwc'
'hwc' is never used in amd_pmu_enable_all(), so remove it.

Signed-off-by: Zucheng Zheng <zhengzucheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421111031.174698-1-zhengzucheng@huawei.com
2022-05-12 16:28:46 +02:00
Tiezhu Yang
4bc7800588 objtool: Remove libsubcmd.a when make clean
The file libsubcmd.a still exists after make clean, remove it.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/1652258270-6278-3-git-send-email-yangtiezhu@loongson.cn
2022-05-12 07:28:35 -07:00
Tiezhu Yang
f193c32cad objtool: Remove inat-tables.c when make clean
When build objtool on x86, the generated file inat-tables.c is in
arch/x86/lib instead of arch/x86, use the correct dir to remove it
when make clean.

$ cd tools/objtool
$ make
[...]
  GEN     arch/x86/lib/inat-tables.c
[...]

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/1652258270-6278-2-git-send-email-yangtiezhu@loongson.cn
2022-05-12 07:28:05 -07:00
Dinh Nguyen
3a21c3ac93 dt-bindings: gpio: altera: correct interrupt-cells
update documentation to correctly state the interrupt-cells to be 2.

Cc: stable@vger.kernel.org
Fixes: 4fd9bbc6e0 ("drivers/gpio: Altera soft IP GPIO driver devicetree binding")
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2022-05-12 09:27:07 -05:00
Krzysztof Kozlowski
f9ccf752ed ARM: dts: socfpga: align SPI NOR node name with dtschema
The node names should be generic and SPI NOR dtschema expects "flash".

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2022-05-12 09:23:30 -05:00
Mark Brown
fd4b80044b ASoC: SOF: Add IPC4 FW loader support
Merge series from Ranjani Sridharan <ranjani.sridharan@linux.intel.com>:

The patches in this series add support for FW loading for IPC4 in the SOF
driver.
2022-05-12 15:18:23 +01:00
Sean Christopherson
b28cb0cd2c KVM: x86/mmu: Update number of zapped pages even if page list is stable
When zapping obsolete pages, update the running count of zapped pages
regardless of whether or not the list has become unstable due to zapping
a shadow page with its own child shadow pages.  If the VM is backed by
mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping
the batch count and thus without yielding.  In the worst case scenario,
this can cause a soft lokcup.

 watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [dirty_log_perf_:13020]
   RIP: 0010:workingset_activation+0x19/0x130
   mark_page_accessed+0x266/0x2e0
   kvm_set_pfn_accessed+0x31/0x40
   mmu_spte_clear_track_bits+0x136/0x1c0
   drop_spte+0x1a/0xc0
   mmu_page_zap_pte+0xef/0x120
   __kvm_mmu_prepare_zap_page+0x205/0x5e0
   kvm_mmu_zap_all_fast+0xd7/0x190
   kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
   kvm_page_track_flush_slot+0x5c/0x80
   kvm_arch_flush_shadow_memslot+0xe/0x10
   kvm_set_memslot+0x1a8/0x5d0
   __kvm_set_memory_region+0x337/0x590
   kvm_vm_ioctl+0xb08/0x1040

Fixes: fbb158cb88 ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""")
Reported-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220511145122.3133334-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 10:09:51 -04:00
Vipin Sharma
6ba1e04fa6 KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
Avoid calling handlers on empty rmap entries and skip to the next non
empty rmap entry.

Empty rmap entries are noop in handlers.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220502220347.174664-1-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:45 -04:00
Kai Huang
3c5c32457d KVM: VMX: Include MKTME KeyID bits in shadow_zero_check
Intel MKTME KeyID bits (including Intel TDX private KeyID bits) should
never be set to SPTE.  Set shadow_me_value to 0 and shadow_me_mask to
include all MKTME KeyID bits to include them to shadow_zero_check.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <27bc10e97a3c0b58a4105ff9107448c190328239.1650363789.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:45 -04:00
Kai Huang
e54f1ff244 KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask
Intel Multi-Key Total Memory Encryption (MKTME) repurposes couple of
high bits of physical address bits as 'KeyID' bits.  Intel Trust Domain
Extentions (TDX) further steals part of MKTME KeyID bits as TDX private
KeyID bits.  TDX private KeyID bits cannot be set in any mapping in the
host kernel since they can only be accessed by software running inside a
new CPU isolated mode.  And unlike to AMD's SME, host kernel doesn't set
any legacy MKTME KeyID bits to any mapping either.  Therefore, it's not
legitimate for KVM to set any KeyID bits in SPTE which maps guest
memory.

KVM maintains shadow_zero_check bits to represent which bits must be
zero for SPTE which maps guest memory.  MKTME KeyID bits should be set
to shadow_zero_check.  Currently, shadow_me_mask is used by AMD to set
the sme_me_mask to SPTE, and shadow_me_shadow is excluded from
shadow_zero_check.  So initializing shadow_me_mask to represent all
MKTME keyID bits doesn't work for VMX (as oppositely, they must be set
to shadow_zero_check).

Introduce a new 'shadow_me_value' to replace existing shadow_me_mask,
and repurpose shadow_me_mask as 'all possible memory encryption bits'.
The new schematic of them will be:

 - shadow_me_value: the memory encryption bit(s) that will be set to the
   SPTE (the original shadow_me_mask).
 - shadow_me_mask: all possible memory encryption bits (which is a super
   set of shadow_me_value).
 - For now, shadow_me_value is supposed to be set by SVM and VMX
   respectively, and it is a constant during KVM's life time.  This
   perhaps doesn't fit MKTME but for now host kernel doesn't support it
   (and perhaps will never do).
 - Bits in shadow_me_mask are set to shadow_zero_check, except the bits
   in shadow_me_value.

Introduce a new helper kvm_mmu_set_me_spte_mask() to initialize them.
Replace shadow_me_mask with shadow_me_value in almost all code paths,
except the one in PT64_PERM_MASK, which is used by need_remote_flush()
to determine whether remote TLB flush is needed.  This should still use
shadow_me_mask as any encryption bit change should need a TLB flush.
And for AMD, move initializing shadow_me_value/shadow_me_mask from
kvm_mmu_reset_all_pte_masks() to svm_hardware_setup().

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <f90964b93a3398b1cf1c56f510f3281e0709e2ab.1650363789.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:44 -04:00
Kai Huang
c919e881ba KVM: x86/mmu: Rename reset_rsvds_bits_mask()
Rename reset_rsvds_bits_mask() to reset_guest_rsvds_bits_mask() to make
it clearer that it resets the reserved bits check for guest's page table
entries.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <efdc174b85d55598880064b8bf09245d3791031d.1650363789.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:44 -04:00
Paolo Bonzini
c9f3d9fbcd KVM: x86: a vCPU with a pending triple fault is runnable
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:44 -04:00
Sean Christopherson
1075d41efd KVM: x86/mmu: Expand and clean up page fault stats
Expand and clean up the page fault stats.  The current stats are at best
incomplete, and at worst misleading.  Differentiate between faults that
are actually fixed vs those that result in an MMIO SPTE being created,
track faults that are spurious, faults that trigger emulation, faults
that that are fixed in the fast path, and last but not least, track the
number of faults that are taken.

Note, the number of faults that require emulation for write-protected
shadow pages can roughly be calculated by subtracting the number of MMIO
SPTEs created from the overall number of faults that trigger emulation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:43 -04:00
Sean Christopherson
8d5265b101 KVM: x86/mmu: Use IS_ENABLED() to avoid RETPOLINE for TDP page faults
Use IS_ENABLED() instead of an #ifdef to activate the anti-RETPOLINE fast
path for TDP page faults.  The generated code is identical, and the #ifdef
makes it dangerously difficult to extend the logic (guess who forgot to
add an "else" inside the #ifdef and ran through the page fault handler
twice).

No functional or binary change intented.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:43 -04:00
Sean Christopherson
8a009d5bca KVM: x86/mmu: Make all page fault handlers internal to the MMU
Move kvm_arch_async_page_ready() to mmu.c where it belongs, and move all
of the page fault handling collateral that was in mmu.h purely for the
async #PF handler into mmu_internal.h, where it belongs.  This will allow
kvm_mmu_do_page_fault() to act on the RET_PF_* return without having to
expose those enums outside of the MMU.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:42 -04:00
Sean Christopherson
5276c616ab KVM: x86/mmu: Add RET_PF_CONTINUE to eliminate bool+int* "returns"
Add RET_PF_CONTINUE and use it in handle_abnormal_pfn() and
kvm_faultin_pfn() to signal that the page fault handler should continue
doing its thing.  Aside from being gross and inefficient, using a boolean
return to signal continue vs. stop makes it extremely difficult to add
more helpers and/or move existing code to a helper.

E.g. hypothetically, if nested MMUs were to gain a separate page fault
handler in the future, everything up to the "is self-modifying PTE" check
can be shared by all shadow MMUs, but communicating up the stack whether
to continue on or stop becomes a nightmare.

More concretely, proposed support for private guest memory ran into a
similar issue, where it'll be forced to forego a helper in order to yield
sane code: https://lore.kernel.org/all/YkJbxiL%2FAz7olWlq@google.com.

No functional change intended.

Cc: David Matlack <dmatlack@google.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:42 -04:00
Sean Christopherson
5c64aba517 KVM: x86/mmu: Drop exec/NX check from "page fault can be fast"
Tweak the "page fault can be fast" logic to explicitly check for !PRESENT
faults in the access tracking case, and drop the exec/NX check that
becomes redundant as a result.  No sane hardware will generate an access
that is both an instruct fetch and a write, i.e. it's a waste of cycles.
If hardware goes off the rails, or KVM runs under a misguided hypervisor,
spuriously running throught fast path is benign (KVM has been uknowingly
being doing exactly that for years).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:41 -04:00
Sean Christopherson
54275f74cf KVM: x86/mmu: Don't attempt fast page fault just because EPT is in use
Check for A/D bits being disabled instead of the access tracking mask
being non-zero when deciding whether or not to attempt to fix a page
fault vian the fast path.  Originally, the access tracking mask was
non-zero if and only if A/D bits were disabled by _KVM_ (including not
being supported by hardware), but that hasn't been true since nVMX was
fixed to honor EPTP12's A/D enabling, i.e. since KVM allowed L1 to cause
KVM to not use A/D bits while running L2 despite KVM using them while
running L1.

In other words, don't attempt the fast path just because EPT is enabled.

Note, attempting the fast path for all !PRESENT faults can "fix" a very,
_VERY_ tiny percentage of faults out of mmu_lock by detecting that the
fault is spurious, i.e. has been fixed by a different vCPU, but again the
odds of that happening are vanishingly small.  E.g. booting an 8-vCPU VM
gets less than 10 successes out of 30k+ faults, and that's likely one of
the more favorable scenarios.  Disabling dirty logging can likely lead to
a rash of collisions between vCPUs for some workloads that operate on a
common set of pages, but penalizing _all_ !PRESENT faults for that one
case is unlikely to be a net positive, not to mention that that problem
is best solved by not zapping in the first place.

The number of spurious faults does scale with the number of vCPUs, e.g. a
255-vCPU VM using TDP "jumps" to ~60 spurious faults detected in the fast
path (again out of 30k), but that's all of 0.2% of faults.  Using legacy
shadow paging does get more spurious faults, and a few more detected out
of mmu_lock, but the percentage goes _down_ to 0.08% (and that's ignoring
faults that are reflected into the guest), i.e. the extra detections are
purely due to the sheer number of faults observed.

On the other hand, getting a "negative" in the fast path takes in the
neighborhood of 150-250 cycles.  So while it is tempting to keep/extend
the current behavior, such a change needs to come with hard numbers
showing that it's actually a win in the grand scheme, or any scheme for
that matter.

Fixes: 995f00a619 ("x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:41 -04:00
Li RongQing
91ab933f75 KVM: VMX: clean up pi_wakeup_handler
Passing per_cpu() to list_for_each_entry() causes the macro to be
evaluated N+1 times for N sleeping vCPUs.  This is a very small
inefficiency, and the code is cleaner if the address of the per-CPU
variable is loaded earlier.  Do this for both the list and the spinlock.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Message-Id: <1649244302-6777-1-git-send-email-lirongqing@baidu.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:40 -04:00
Maxim Levitsky
33fbe6befa KVM: x86: fix typo in __try_cmpxchg_user causing non-atomicness
This shows up as a TDP MMU leak when running nested.  Non-working cmpxchg on L0
relies makes L1 install two different shadow pages under same spte, and one of
them is leaked.

Fixes: 1c2361f667 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220512101420.306759-1-mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 09:51:26 -04:00
Minghao Chi
46ecf720f3 platform/x86: toshiba_acpi: use kobj_to_dev()
Use kobj_to_dev() instead of open-coding it.

Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220511021638.1488650-1-chi.minghao@zte.com.cn
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:37:53 +02:00
Minghao Chi
c8ad6a7680 platform/x86: samsung-laptop: use kobj_to_dev()
Use kobj_to_dev() instead of open-coding it.

Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220511021522.1488373-1-chi.minghao@zte.com.cn
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:37:53 +02:00
Frank Crawford
0ca48a2e73 platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
Tested on my systems with module force_load option.

Signed-off-by: Frank Crawford <frank@crawford.emu.id.au>
Link: https://lore.kernel.org/r/20220510120012.2167591-1-frank@crawford.emu.id.au
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:37:53 +02:00
Srinivas Pandruvada
9230a2ac2b tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
Initialize perf_cap struct to avoid warning:

  CC      hfi-events.o
In function ‘process_hfi_event’,
    inlined from ‘handle_event’ at hfi-events.c:220:5:
hfi-events.c:184:9: warning: ‘perf_cap.cpu’ may be used
uninitialized [-Wmaybe-uninitialized]
  184 |         process_level_change(perf_cap->cpu);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hfi-events.c: In function ‘handle_event’:
hfi-events.c:193:25: note: ‘perf_cap.cpu’ was declared here
  193 |         struct perf_cap perf_cap;
      |                         ^~~~~~~~

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20220511171208.211319-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:37:53 +02:00
Srinivas Pandruvada
2da6391dfc tools/power/x86/intel-speed-select: Display error on turbo mode disabled
For Intel SST turbo-freq feature to be enabled, the turbo mode on the
platform must be enabled also. If turbo mode is disabled, display error
while enabling turbo-freq feature.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20220510023421.3930540-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:30 +02:00
Tony Luck
34604d2891 Documentation: In-Field Scan
Add documentation for In-Field Scan (IFS). This documentation
describes the basics of IFS, the loading IFS image, chunk
authentication, running scan and how to check result via sysfs.

The CORE_CAPABILITIES MSR enumerates whether IFS is supported.

The full  github location for distributing the IFS images is
still being decided. So just a placeholder included for now
in the documentation.

Future CPUs will support more than one type of test. Plan for
that now by using a "_0" suffix on the ABI directory names.
Additional test types will use "_1", etc.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-13-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Jithu Joseph
55b52633e1 platform/x86/intel/ifs: add ABI documentation for IFS
Add the sysfs attributes in ABI/testing for In-Field Scan.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220506225410.1652287-12-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Tony Luck
51af802fc0 trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
Add tracing support which may be useful for debugging systems that fail to complete
In Field Scan tests.

Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220506225410.1652287-11-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Jithu Joseph
6f33a92b92 platform/x86/intel/ifs: Add IFS sysfs interface
Implement sysfs interface to trigger ifs test for a specific cpu.
Additional interfaces related to checking the status of the
scan test and seeing the version of the loaded IFS binary
are also added.

The basic usage is as below.
   - To start test, for example on cpu5:
       echo 5 > /sys/devices/platform/intel_ifs/run_test
   - To see the status of the last test
       cat /sys/devices/platform/intel_ifs/status
   - To see the version of the loaded scan binary
       cat /sys/devices/platform/intel_ifs/image_version

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-10-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Jithu Joseph
2b40e654b7 platform/x86/intel/ifs: Add scan test support
In a core, the scan engine is shared between sibling cpus.

When a Scan test (for a particular core) is triggered by the user,
the scan chunks are executed on all the threads on the core using
stop_core_cpuslocked.

Scan may be aborted by some reasons. Scan test will be aborted in certain
circumstances such as when interrupt occurred or cpu does not have enough
power budget for scan. In this case, the kernel restart scan from the chunk
where it stopped. Scan will also be aborted when the test is failed. In
this case, the test is immediately stopped without retry.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-9-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Jithu Joseph
684ec21570 platform/x86/intel/ifs: Authenticate and copy to secured memory
The IFS image contains hashes that will be used to authenticate the ifs
test chunks. First, use WRMSR to copy the hashes and enumerate the number
of test chunks, chunk size and the maximum number of cores that can run
scan test simultaneously.

Next, use WRMSR to authenticate each and every scan test chunk which is
stored in the IFS image. The CPU will check if the test chunks match
the hashes, otherwise failure is indicated to system software. If the test
chunk is authenticated, it is automatically copied to secured memory.

Use schedule_work_on() to perform the hash copy and authentication. Note
this needs only be done on the first logical cpu of each socket.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-8-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Jithu Joseph
846e751ff3 platform/x86/intel/ifs: Check IFS Image sanity
IFS image is designed specifically for a given family, model and
stepping of the processor. Like Intel microcode header, the IFS image
has the Processor Signature, Checksum and Processor Flags that must be
matched with the information returned by the CPUID.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-7-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Jithu Joseph
fb57fc785e platform/x86/intel/ifs: Read IFS firmware image
Driver probe routine allocates structure to communicate status
and parameters between functions in the driver. Also call
load_ifs_binary() to load the scan image file.

There is a separate scan image file for each processor family,
model, stepping combination. This is read from the static path:

  /lib/firmware/intel/ifs/{ff-mm-ss}.scan

Step 1 in loading is to generate the correct path and use
request_firmware_direct() to load into memory.

Subsequent patches will use the IFS MSR interfaces to copy
the image to BIOS reserved memory and validate the SHA256
checksums.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-6-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00