Commit Graph

900356 Commits

Author SHA1 Message Date
Daniel Borkmann
b9e4272488 Merge branch 'bpf-xsk-fixes'
Maciej Fijalkowski says:

====================
Cameron reported [0] that on fresh bpf-next he could not run multiple
xdpsock instances in Tx-only mode on a single network interface with the
i40e driver.

It turns out that Maxim's series [1], which added RCU protection around
ndo_xsk_wakeup, also added a check in i40e_xsk_wakeup() for the
__I40E_CONFIG_BUSY bit being set in pf->state; if it is set, the function
returns -ENETDOWN. Since this bit is set per PF while a UMEM is being
enabled or disabled, Cameron hit the following situation: when he
launched a second xdpsock instance, a second UMEM was registered, which
set __I40E_CONFIG_BUSY. The first xdpsock instance then observed the bit,
so its kick_tx() got -ENETDOWN as errno.

kick_tx() currently does not tolerate -ENETDOWN, so the first application
exited. That exit also unloaded the XDP program and released its
dedicated resources, which caused -ENXIO to be returned in the second
xdpsock instance.

Let's fix the issue from both sides: protect xdpsock from future crashes
by tolerating an -ENETDOWN errno in kick_tx() (patch 3) and, on the
driver side, return -EAGAIN when the PF is busy (patch 1).

Also remove a doubled variable declaration from xdpsock_user.c (patch 2).

Note that ixgbe does not seem to be affected, since UMEM registration sets
the busy/disable bit per ring, not per PF.

[0]: https://www.spinics.net/lists/xdp-newbies/msg01558.html
[1]: https://lore.kernel.org/netdev/20191217162023.16011-1-maximmi@mellanox.com/
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-02-05 22:06:12 +01:00
Maciej Fijalkowski
8ed47e1408 samples: bpf: Allow for -ENETDOWN in xdpsock
ndo_xsk_wakeup() can return -ENETDOWN, and there is no particular reason
to bail out of the whole application in that case. Let's check in
kick_tx() whether errno was set to this value and, if so, let the
application continue processing frames.
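A minimal C sketch of the errno check described above. It assumes the Tx kick is a zero-length `sendto()` on the AF_XDP socket and that ENOBUFS, EAGAIN and EBUSY were already treated as transient before this change; ENETDOWN is the only value this commit adds. The helper name `tx_kick_errno_is_transient()` is hypothetical, not xdpsock's actual code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Sketch only: decide whether a failed Tx kick (sendto() returning -1)
 * is transient, so the application keeps running instead of exiting.
 * Hypothetical helper; the errno list is an assumption except ENETDOWN.
 */
static bool tx_kick_errno_is_transient(int err)
{
	switch (err) {
	case ENOBUFS:
	case EAGAIN:
	case EBUSY:
	case ENETDOWN: /* newly tolerated: the PF may be busy reconfiguring */
		return true;
	default:
		return false;
	}
}
```

A caller would consult it after `sendto()` returns -1 and exit only when it returns false.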

Fixes: 248c7f9c0e ("samples/bpf: convert xdpsock to use libbpf for AF_XDP access")
Reported-by: Cameron Elliott <cameron@cameronelliott.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20200205045834.56795-4-maciej.fijalkowski@intel.com
2020-02-05 22:06:09 +01:00
Maciej Fijalkowski
32c92c15ad samples: bpf: Drop doubled variable declaration in xdpsock
It seems that, by accident, the global variable opt_xdp_bind_flags is
declared twice in xdpsock_user.c. The second declaration has no
initializer, so it was only a tentative definition of the same object and
the compiler effectively ignored it.

To keep things clean, drop the doubled variable.
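The C rule behind this fits in a few lines: a file-scope declaration without an initializer is a tentative definition, and when the translation unit also contains a definition of the same identifier with matching linkage, it simply refers to that object. The name `demo_bind_flags` and its value are hypothetical, not taken from xdpsock.

```c
#include <stdint.h>

static uint32_t demo_bind_flags = 8;  /* definition with an initializer */
static uint32_t demo_bind_flags;      /* tentative definition: same object,
                                         the initializer above still applies */

/* Returns the single object both declarations refer to. */
uint32_t demo_get_bind_flags(void)
{
	return demo_bind_flags;
}
```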

Fixes: c543f54698 ("samples/bpf: add unaligned chunks mode support to xdpsock")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20200205045834.56795-3-maciej.fijalkowski@intel.com
2020-02-05 22:06:09 +01:00
Maciej Fijalkowski
c77e9f0914 i40e: Relax i40e_xsk_wakeup's return value when PF is busy
Return -EAGAIN instead of -ENETDOWN to give user space a milder
indication, so that an application knows to retry the syscall when the
__I40E_CONFIG_BUSY bit is set in pf->state.
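A sketch of the driver-side change, using a plain bit test in place of the kernel's `test_bit()` on `pf->state`. The bit index and function names are hypothetical stand-ins for `__I40E_CONFIG_BUSY` and `i40e_xsk_wakeup()`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit index standing in for __I40E_CONFIG_BUSY. */
#define SKETCH_PF_CONFIG_BUSY 0

static bool sketch_pf_is_busy(unsigned long pf_state)
{
	return (pf_state >> SKETCH_PF_CONFIG_BUSY) & 1UL;
}

/*
 * Early exit of the wakeup path: report "try again" rather than
 * "network down" while the PF is being reconfigured.
 */
static int sketch_xsk_wakeup(unsigned long pf_state)
{
	if (sketch_pf_is_busy(pf_state))
		return -EAGAIN;   /* previously -ENETDOWN */
	return 0;             /* the real driver would kick the queue here */
}
```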

Fixes: b3873a5be7 ("net/i40e: Fix concurrency issues between config flow and XSK")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20200205045834.56795-2-maciej.fijalkowski@intel.com
2020-02-05 22:06:08 +01:00
Song Liu
fc9e34f8de tools/bpf/runqslower: Rebuild libbpf.a on libbpf source change
Add the missing dependency of $(BPFOBJ) on $(LIBBPF_SRC), so that running
make in runqslower/ rebuilds libbpf.a whenever anything changes in libbpf/.

Fixes: 9c01546d26 ("tools/bpf: Add runqslower tool to tools/bpf")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200204215037.2258698-1-songliubraving@fb.com
2020-02-05 22:05:28 +01:00
Linus Torvalds
4c7d00ccf4 pwm: Changes for v5.6-rc1
This set of changes is mostly cleanups and minor improvements, with
 support for new chips in some drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEEiOrDCAFJzPfAjcif3SOs138+s6EFAl46zmoZHHRoaWVycnku
 cmVkaW5nQGdtYWlsLmNvbQAKCRDdI6zXfz6zodGZEACQyt5Es8RbSMws6qmsZXSU
 XgVi1vCq3Mz51t2DMXp3PCe4Kb7MMTIn8nsFhQ6Q7z888ZCYC6auRTZ5kLNUZvwe
 guBqUR2+y2RkG0d9fWvf8xan2FqKuh33axX4YljNYI7lj79tyhQvHIC/B4DFes9y
 FnpvtDsFdVIFJmnDjgynWO5FhGbf7DdO6Mhmj99lQN3yMiUQ6RGmoXXrKoadcbNt
 vWGr5UqFY72PdVvMQFIA7U3ZeM5n8ocVI6TmvqHhX1ozLyB113yWXZr1hLH4LXzz
 f14NHygpo/GhB8ch3elxBZcXWPmPyn9tHxd2tYnlQ8whEY5RSD3C/bmVFMruhOLC
 LkcAmJ4mbk4tOVEoHB23vTcV5oxcVobDqIOj0W9n1pKWsx6Y3J31rIIg2BCPwSB2
 o/xtFWafuvX9E90EVVM0H0mZSsxydZLXWOr17wzb1xlVulCpJCzvnhwJ8JXB6a0z
 9/az3nGpOa6QB17IkR+sRkjVAnUFJ0CLHURIv+O7O+DBHVuykojdjpACzZcyy7kl
 4ZrHhhCKwEbC4gjmtaJTgmQ0L/WGYfOBQKQmBKaQEJWXPUHsyJ4DFJjrIF4INqOe
 mS9BRvC0nRJ6g3ulOR95sVszMVrXKFUaMN6XqzyWzWPibdDk+1QR4EqU1QZQ/PgR
 46RWEV02tG6SK65UzLqZxA==
 =x0aH
 -----END PGP SIGNATURE-----

Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
 "Mostly cleanups and minor improvements with some new chip support for
  some drivers"

* tag 'pwm/for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (37 commits)
  pwm: Remove set but not set variable 'pwm'
  pwm: sun4i: Initialize variables before use
  pwm: stm32: Remove automatic output enable
  pwm: sun4i: Narrow scope of local variable
  pwm: bcm2835: Allow building for ARCH_BRCMSTB
  pwm: imx27: Eliminate error message for defer probe
  pwm: sun4i: Fix inconsistent IS_ERR and PTR_ERR
  pwm: sun4i: Move pwm_calculate() out of spin_lock()
  pwm: omap-dmtimer: Allow compiling with COMPILE_TEST
  pwm: omap-dmtimer: put_device() after of_find_device_by_node()
  pwm: omap-dmtimer: Simplify error handling
  pwm: omap-dmtimer: Remove PWM chip in .remove before making it unfunctional
  pwm: Implement tracing for .get_state() and .apply_state()
  pwm: rcar: Document inability to set duty_cycle = 0
  pwm: rcar: Drop useless call to pwm_get_state()
  pwm: Fix minor Kconfig whitespace issues
  pwm: atmel: Implement .get_state()
  pwm: atmel: Use register accessors for channels
  pwm: atmel: Document known weaknesses of both hardware and software
  pwm: atmel: Replace loop in prescale calculation by ad-hoc calculation
  ...
2020-02-05 18:11:51 +00:00
Linus Torvalds
18ea671ba4 dmaengine fixes for v5.6-rc1
Fixes for:
  - Documentation build error fix
  - Fix dma_request_chan() error return
  - Remove unneeded conversion in idxd driver
  - Fix pointer check for dma_async_device_channel_register()
  - Fix slave-channel symlink cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl463NAACgkQfBQHDyUj
 g0eUOA//SZBP/bMgKujbOb976fVbTQrmVs+WnX+h3swkReG6SIkXZ4SWLvDDBiZ3
 JVGHbibHdMDD6zYxhXwv/uFlU9hCdBlkinRBSvADQFi90cvN5GQNHwPTre7IxqWv
 eg1liuUmoVtR/F26BxUmvEoT5tIUwxPxAQOjnoLNlBBdfMjnJUcN+xI44DQvI29J
 xvKc3CiUiAebsOVcIb01I26z4CRYCTnTKnWT5Qh+3Dgb8r1oJUGb0i7fVb9+gFHM
 RpeWH73bVFRk8/Z89hIxLC87ZNX0HB6THLdf43ADSGVaEogN950mxV/FqDX/0K5D
 Kkao5kRyZrvdoHFF87kBIdv37nD5/jXqJ33uiloCZUFO+FiKvimvcw2rRVpGrMOl
 76+5rjQ7X713hpeifa4w+3gZFMayEHpi7lpsUtYWM0VKYEf3Z1o1DPVkW8h0jwqF
 qfk+sxap2GYbkNcIDxva86O1rYDvqmy9OKQN61xpVdaaIC8uXqR9HT5gVAyinqgA
 t010v7C4LxIouC3/UoGHG9Xl/dzcBQIFsHTfgPuZeSHbSceLGk6aUZ/X4nOiWpru
 EgyClJxtYpRZlLQJiH+z8DYxCfBFa29MCd0U9N4vYvtKtqIDVaoK0+ejBksF0yaO
 Z4HqPkwD5OYBJ/jftKsPuukxCnSS8uStyDHeWsQBtpziQBp6lro=
 =5tXW
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-5.6-rc1' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "Fixes for:

   - Documentation build error fix

   - Fix dma_request_chan() error return

   - Remove unneeded conversion in idxd driver

   - Fix pointer check for dma_async_device_channel_register()

   - Fix slave-channel symlink cleanup"

* tag 'dmaengine-fix-5.6-rc1' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: Cleanups for the slave <-> channel symlink support
  dmaengine: fix null ptr check for __dma_async_device_channel_register()
  dmaengine: idxd: fix boolconv.cocci warnings
  dmaengine: Fix return value for dma_request_chan() in case of failure
  dmaengine: doc: Properly indent metadata title
2020-02-05 18:07:39 +00:00
Kuppuswamy Sathyanarayanan
2e34673be0 PCI/ATS: Use PF PASID for VFs
Per PCIe r5.0, sec 9.3.7.14, if a PF implements the PASID Capability, the
PF PASID configuration is shared by its VFs, and VFs must not implement
their own PASID Capability.  But commit 751035b8dc ("PCI/ATS: Cache PASID
Capability offset") changed pci_max_pasids() and pci_pasid_features() to
use the PASID Capability of the VF device instead of the associated PF
device.  This leads to IOMMU bind failures when pci_max_pasids() and
pci_pasid_features() are called for VFs.

In pci_max_pasids() and pci_pasid_features(), always use the PF PASID
Capability.
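A simplified C sketch of selecting the PF for a VF before reading PASID information. The struct and helpers are hypothetical; in the kernel, the PF of a VF is obtained with `pci_physfn()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct pci_dev. */
struct sketch_pci_dev {
	bool is_virtfn;                 /* true for an SR-IOV VF */
	struct sketch_pci_dev *physfn;  /* owning PF, valid only for VFs */
	int pasid_cap;                  /* PASID Capability offset, 0 if absent */
};

/* VFs share the PF's PASID configuration, so always consult the PF. */
static const struct sketch_pci_dev *
sketch_pasid_dev(const struct sketch_pci_dev *dev)
{
	return dev->is_virtfn ? dev->physfn : dev;
}

static int sketch_pasid_cap(const struct sketch_pci_dev *dev)
{
	return sketch_pasid_dev(dev)->pasid_cap;
}
```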

Fixes: 751035b8dc ("PCI/ATS: Cache PASID Capability offset")
Link: https://lore.kernel.org/r/fe891f9755cb18349389609e7fed9940fc5b081a.1580325170.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org	# v5.5+
2020-02-05 11:58:08 -06:00
Linus Torvalds
4fc2ea6a86 IOMMU Updates for Linux v5.6
Including:
 
 	- Allow compiling the ARM-SMMU drivers as modules.
 
 	- Fixes and cleanups for the ARM-SMMU drivers and io-pgtable code
 	  collected by Will Deacon. The merge-commit (6855d1ba75) has all the
 	  details.
 
 	- Cleanup of the iommu_put_resv_regions() call-backs in various drivers.
 
 	- AMD IOMMU driver cleanups.
 
 	- Update for the x2APIC support in the AMD IOMMU driver.
 
 	- Preparation patches for Intel VT-d nested mode support.
 
 	- RMRR and identity domain handling fixes for the Intel VT-d driver.
 
 	- More small fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl46yTsACgkQK/BELZcB
 GuPldg//bHDyntUWHta11oHFnUh759mveFFsfYto/2dV7hyHOM1lgv2+ZaXdOYxE
 03f9d9K9b4F3amF8wkluu9Z1Lve40JpZwD3WKTTg8sImh0z61nWoJ7+uE8ZjNzmA
 /6pNqIPJ4w8n5Wz2yycTR7GeUekM0X5nvF0IBlVcX3UYFG0DDlL/eldLVSk43Wmw
 P9C2dFwQLHBKhqhquqnXP1Z7YRfG9lX2ONAoMHSj3x2fm7UySSkrioBR0VrI6+sH
 8/LFL+NgqQEZBTK0I8tZRwNuDUoHQ396PAdYEiXFFzWBLrEaadAuoh4WeLnLzlRe
 G601on9ScUg1xy4lQHQiZDNKYqeB7gETMW/V2Q5nCaTrq2i1D5M6IwiNvOiV6crj
 ZtFBqejng1BXfqLOxutHYyjNQBB/KxmWR3zT2p/spcAgxhUAJxS5GkbGm5G3OucN
 2xWSnV+Dyu5jei8Ua5/onCdcuvCFi41xa9aCRrTSymIZ2W6aCNjm2mrctyRr1xmA
 AwHBZ0/dx/v3vZA8GnkBrrvAYVsTWiMbHD/2sChDjnwedO0o0pMooHEjywmDgMON
 /qXQjqR7pDpyxDA0US+30WfA3tkUcpLskza0ugWKtr/8RcHffrt5s9L0PpklgmtN
 ILC2/zNB43mIFw8uGJdDVFw4aYuCAqIFFHkqQ5+hOBGn/ePHawc=
 =HMIT
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Allow compiling the ARM-SMMU drivers as modules.

 - Fixes and cleanups for the ARM-SMMU drivers and io-pgtable code
   collected by Will Deacon. The merge-commit (6855d1ba75) has all the
   details.

 - Cleanup of the iommu_put_resv_regions() call-backs in various
   drivers.

 - AMD IOMMU driver cleanups.

 - Update for the x2APIC support in the AMD IOMMU driver.

 - Preparation patches for Intel VT-d nested mode support.

 - RMRR and identity domain handling fixes for the Intel VT-d driver.

 - More small fixes and cleanups.

* tag 'iommu-updates-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits)
  iommu/amd: Remove the unnecessary assignment
  iommu/vt-d: Remove unnecessary WARN_ON_ONCE()
  iommu/vt-d: Unnecessary to handle default identity domain
  iommu/vt-d: Allow devices with RMRRs to use identity domain
  iommu/vt-d: Add RMRR base and end addresses sanity check
  iommu/vt-d: Mark firmware tainted if RMRR fails sanity check
  iommu/amd: Remove unused struct member
  iommu/amd: Replace two consecutive readl calls with one readq
  iommu/vt-d: Don't reject Host Bridge due to scope mismatch
  PCI/ATS: Add PASID stubs
  iommu/arm-smmu-v3: Return -EBUSY when trying to re-add a device
  iommu/arm-smmu-v3: Improve add_device() error handling
  iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
  iommu/arm-smmu-v3: Add second level of context descriptor table
  iommu/arm-smmu-v3: Prepare for handling arm_smmu_write_ctx_desc() failure
  iommu/arm-smmu-v3: Propagate ssid_bits
  iommu/arm-smmu-v3: Add support for Substream IDs
  iommu/arm-smmu-v3: Add context descriptor tables allocators
  iommu/arm-smmu-v3: Prepare arm_smmu_s1_cfg for SSID support
  ACPI/IORT: Parse SSID property of named component node
  ...
2020-02-05 17:49:54 +00:00
Linus Torvalds
d271ab2923 xen: branch for v5.6-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXjrKegAKCRCAXGG7T9hj
 vkAzAQDtV8yxItCMTC/0vxMZnBUk7t+KFuSg7UIoWkwHPvd2CQEAjlhWeX0u3z9D
 uxwmxdjri1nlrTJBulbvCkJuTfZDYwo=
 =8Q8T
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - fix a bug introduced in 5.5 in the Xen gntdev driver

 - fix the Xen balloon driver when running on ancient Xen versions

 - allow Xen stubdoms to control interrupt enable flags of
   passed-through PCI cards

 - release resources in Xen backends under memory pressure

* tag 'for-linus-5.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/blkback: Consistently insert one empty line between functions
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xenbus/backend: Protect xenbus callback with lock
  xenbus/backend: Add memory pressure handler callback
  xen/gntdev: Do not use mm notifiers with autotranslating guests
  xen/balloon: Support xend-based toolstack take two
  xen-pciback: optionally allow interrupt enable flag writes
2020-02-05 17:44:14 +00:00
Linus Torvalds
2634744bf3 Devicetree fixes for v5.6:
- Fix incorrect $id paths in schemas
 
 - 2 fixes for Intel LGM SoC binding schemas
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAl46iQoQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw1WAD/9vM7kmCWN1T3czpkX2ZOILTY8maVxOR512
 5t4RiFqxbU3a7fvb7WZqzlH1KbunlYOvTsWhMOddIVT0YrujicjjRQ6v1OjL078m
 uqgJ8J6coC1o2iAGW+Jv3jzcQthdUWWURsj78UrdeftYemX/MMDHkKXiVjg+s9L0
 n/wHEaxXG3KT1MjA2meA0SSLUwBqEy8wukPON6t12LSsVW6GVJYlEYXBok9rD3pC
 Wl2zW7Iyza+223oeoU7aUh8ePNdCiHoy3RKg/phx6CsqngMzFug0+vxPVmTSR/as
 QVEDUBiyqRLtOzNyDhpiNUWMNwXaZ2X/romPg4rv2A0mqAdRe0H8IO08ObsnMro9
 hBOX07Yfr2h88QGavktOpGrFJmvWo1f+EvjyN9gm+r6pTT+xVLmjRFbnC8RXcJnq
 id8QzV7KTtH1r/W8iDpEMCCElAfoQ8poKnzfXSKEiWk8Bd8em62h/BRAF7362AEr
 SBwcqqqBEwF2gEZuiYBPTPRI2/Vc4+f7StK8qDnJlDxqg9DSExLg+bpyH4L1zoeA
 4DQlrXFiTyMawc6DC+Fscc/CtMivvX91DYox3biU4d2mYfVrWvn/1ZuDRXzmUNvn
 agh63kuXKbIXirTTP1vhW1q22m3ishlGtiffazVrGV9MtTSNG5DqCeQ0ukB35GAj
 ucFd7PHM5A==
 =ypMI
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-fixes-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Fix incorrect $id paths in schemas

 - Two fixes for Intel LGM SoC binding schemas

* tag 'devicetree-fixes-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: Fix paths in schema $id fields
  dt-bindings: PCI: intel: Fix dt_binding_check compilation failure
  dt-bindings: phy: Fix errors in intel,lgm-emmc-phy example
2020-02-05 17:37:25 +00:00
Stephen Kitt
d1c9038ab5 Allow git builds of Sphinx
When using a non-release version of Sphinx from a local build (with
improvements for kernel-doc handling, why not),

	sphinx-build --version

reports versions of the form

	sphinx-build 3.0.0+/4703d9119972

i.e. the base version, a plus sign, a slash, and the start of the git
hash of whatever repository the command is run in (no, not the hash that
was used to build Sphinx!).

This patch fixes the installation check in sphinx-pre-install to
recognise such version output.
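A C illustration of parsing such output (not the sphinx-pre-install script's actual code): `sscanf()` stops at the first character that cannot extend a `%d` conversion, so the `+/<hash>` suffix is ignored. The function name is hypothetical, and other historical output formats are out of scope.

```c
#include <stdio.h>

/*
 * Extract "major.minor.patch" from a line such as
 * "sphinx-build 3.0.0+/4703d9119972". "%*s" skips the program name.
 * Returns 0 on success, -1 if no version triple was found.
 */
static int parse_sphinx_version(const char *line, int *maj, int *min, int *pat)
{
	if (sscanf(line, "%*s %d.%d.%d", maj, min, pat) != 3)
		return -1;
	return 0;
}
```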

Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20200124183316.1719218-1-steve@sk2.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-05 10:33:44 -07:00
Linus Torvalds
cfb4b571e8 s390 updates for the 5.6 merge window #2
- Add KPROBES_ON_FTRACE support.
 
 - Add EP11 AES secure keys support.
 
 - PAES rework and prerequisites for paes-s390 ciphers selftests.
 
 - Fix page table upgrade for hugetlbfs.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl465KkACgkQjYWKoQLX
 FBiR/wf/e+Fj/mDYHElcZ55MWaORBpp8NT94IYSt0RbII1PEh9cB8NciYLQdFFmc
 bUlNj7u3fHwk1D8S3pOSYKhIaHQQOWDqd/uNTzbCicbbVhuwmslLc+jffnORtlKe
 mCHeQsVAw3NwE8FIPhPMTAKBZV0pLkM4T9PA2xgeuB5cShoMgXgLgUoIwHJ4c2TP
 WwnolIJ/QR0nKpmPI5lp0+PjjSk/8nA/VvmpxgYbJCTQm8dhwhAfePh8Kf6pEp6K
 wETUaIyWkX1a+kI9h2qIBsR7KplqqrKABA5sxnPDQW/kut1Pc/2fWxMOBxux0f/V
 Kk+f6yoVbe7X6VYm+V4AyyAzQMRggQ==
 =9Eeg
 -----END PGP SIGNATURE-----

Merge tag 's390-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:
 "The second round of s390 fixes and features for 5.6:

   - Add KPROBES_ON_FTRACE support

   - Add EP11 AES secure keys support

   - PAES rework and prerequisites for paes-s390 ciphers selftests

   - Fix page table upgrade for hugetlbfs"

* tag 's390-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pkey/zcrypt: Support EP11 AES secure keys
  s390/zcrypt: extend EP11 card and queue sysfs attributes
  s390/zcrypt: add new low level ep11 functions support file
  s390/zcrypt: ep11 structs rework, export zcrypt_send_ep11_cprb
  s390/zcrypt: enable card/domain autoselect on ep11 cprbs
  s390/crypto: enable clear key values for paes ciphers
  s390/pkey: Add support for key blob with clear key value
  s390/crypto: Rework on paes implementation
  s390: support KPROBES_ON_FTRACE
  s390/mm: fix dynamic pagetable upgrade for hugetlbfs
2020-02-05 17:33:35 +00:00
Randy Dunlap
599e6f8d3d Documentation: changes.rst: update several outdated project URLs
Update project URLs in the changes.rst file.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/a9c3c509-8f30-fcc4-d9e0-b53aeaa89e4f@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-05 10:32:57 -07:00
Sameer Rahmani
ff1e81a7e2 Documentation: build warnings related to missing blank lines after explicit markups has been fixed
Fix several documentation build warnings related to missing blank lines
after explicit markup.

Exact warning message:
 WARNING: Explicit markup ends without a blank line; unexpected unindent.

Signed-off-by: Sameer Rahmani <lxsameer@gnu.org>
Link: https://lore.kernel.org/r/20200203201543.24834-1-lxsameer@gnu.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-05 10:30:03 -07:00
Tiezhu Yang
36a375c6df mailmap: add entry for Tiezhu Yang
Add an entry to connect all my email addresses.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1580721045-4988-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-05 10:24:26 -07:00
SeongJae Park
95c472ffca Documentation/ko_KR/howto: Update a broken link
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Link: https://lore.kernel.org/r/20200131205237.29535-5-sj38.park@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-05 10:21:23 -07:00
SeongJae Park
5549c20232 Documentation/ko_KR/howto: Update broken web addresses
Commit 0ea6e61122 ("Documentation: update broken web addresses.")
removed a link to 'http://patchwork.ozlabs.org' in howto, but the change
has not been applied to the Korean translation.  This commit simply
applies the change to the Korean translation.  Note, though, that the
link has since been restored.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
Link: https://lore.kernel.org/r/20200131205237.29535-4-sj38.park@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-05 10:21:19 -07:00
SeongJae Park
4bfdebd620 docs/locking: Fix outdated section names
Commit 2e4f5382d1 ("locking/doc: Rename LOCK/UNLOCK to
ACQUIRE/RELEASE") has not been applied to 'spinlock.rst'.  This commit
updates the document accordingly.

Signed-off-by: SeongJae Park <sjpark@amazon.com>
Link: https://lore.kernel.org/r/20200131205237.29535-2-sj38.park@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-05 10:21:12 -07:00
Miaohe Lin
a8be1ad01b KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
The function vmx_decache_cr0_guest_bits() is only called below its
implementation, so this forward declaration is meaningless and should be
removed.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:44:06 +01:00
Sean Christopherson
d76c7fbc01 KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit
Re-add code to mark CR4.UMIP as reserved if UMIP is not supported by the
host.  The UMIP handling was unintentionally dropped during a recent
refactoring.

Not flagging CR4.UMIP allows the guest to set its CR4.UMIP regardless of
host support or userspace desires.  On CPUs with UMIP support, including
emulated UMIP, this allows the guest to enable UMIP against the wishes
of the userspace VMM.  On CPUs without any form of UMIP, this results in
a failed VM-Enter due to invalid guest state.
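A sketch of how a reserved-bit mask gates guest CR4 values, assuming `umip_allowed` summarizes "UMIP is supported (natively or emulated) and exposed via CPUID". Only the UMIP bit position (11) is architectural; the function names and the `base_reserved` parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR4_UMIP (1ULL << 11)   /* CR4.UMIP is bit 11 */

/* `base_reserved` stands in for all other reserved-bit logic. */
static uint64_t sketch_cr4_reserved(uint64_t base_reserved, bool umip_allowed)
{
	uint64_t reserved = base_reserved;

	if (!umip_allowed)
		reserved |= SKETCH_CR4_UMIP;   /* the check this commit re-adds */
	return reserved;
}

/* A CR4 value is acceptable only if it sets no reserved bits. */
static bool sketch_cr4_valid(uint64_t cr4, uint64_t reserved)
{
	return (cr4 & reserved) == 0;
}
```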

Fixes: 345599f9a2 ("KVM: x86: Add macro to ensure reserved cr4 bits checks stay in sync")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:30:19 +01:00
Paolo Bonzini
bcfcff640c x86: vmxfeatures: rename features for consistency with KVM and manual
Three of the feature bits in vmxfeatures.h have names that are different
from the Intel SDM.  The names were recently adjusted in KVM, but the tip
tree's x86/cpu branch was still using the old names.  Adjust them for
consistency.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:22:59 +01:00
Paolo Bonzini
ef09f4f463 KVM: s390: Fixes and cleanups for 5.6
- fix register corruption
 - ENOTSUPP/EOPNOTSUPP mixed
 - reset cleanups/fixes
 - selftests
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJeNDcAAAoJEBF7vIC1phx8NkcP/2JWMr/9v44LJJ8BfZVFqdP4
 i41pVFIgtI8Ieqjgp+Fuiu/8ELPxfohzBZ1Rm60TPcZlJ+uREmHklG1ZD2iXEJix
 0YqzICadQ4OvJxiFpi/s5+9bzczoxCIEx7CfJ4PTM2V3qtefauFgNtoSMevF9CtK
 6UuPNNjBi6cJuG3uAyqoOZ3vbMNeZ337ffEgBwukR01UxGImXwJ9odPFEwz31hji
 WKEEbnPaXFZUKy2vMSZVcndJKkhb043QFkZBY98D8m5VTSO5UFwpdYuht6QdMSKx
 IrxDN7788e/p4IPOGBWAXuhjYcmAYZh2Ayt7DM53b49XhWifsc6fw4khly2fjr3+
 Wg5Ol13ls2WaeDTGd5c4XQRWpQD27Wnum0yXLaVf2gaTRbTqrrsisWLHL6k/gqyb
 CXqJIr11/sb4zLwlwXPSrOrIz3CRz4DqawF/F0q47rHC7xyGsRzpGU4gP5Aqj8op
 qAMVORoQQjMtH4fVv6/NhIG6srVeonNA5GjI6hkYZ85mEJhy5Nl9lNuyEh4W094D
 fkNSnlWcCG8fyoLih1SHVa7cROVI8G0tfwhk4uSjRCXXtA5B5Rve2LQl3nCP9gUX
 m7Y6Qzm/yusVtaTu+YE8MyXVE2bpvGMR/xeztIR8eYw/LqbodOzxkRLdfeH2cfaD
 VCmFaVuUjTXx5q4xYmIl
 =ZgeW
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes and cleanups for 5.6
- fix register corruption
- ENOTSUPP/EOPNOTSUPP mixed
- reset cleanups/fixes
- selftests
2020-02-05 16:15:05 +01:00
Paolo Bonzini
df7e881892 KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
Userspace that does not know about the AMD_IBRS bit might still
allow the guest to protect itself with MSR_IA32_SPEC_CTRL using
the Intel SPEC_CTRL bit.  However, svm.c disallows this and causes
a #GP in the guest when it writes to the MSR.  Fix this by
loosening the test to allow the Intel CPUID bit, and also allow
the AMD_STIBP bit, since it permits writing to MSR_IA32_SPEC_CTRL
as well.
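A sketch of the relaxed condition; the struct and function names are hypothetical, and any other conditions the real svm.c check applies are omitted. With this check, a guest write would be refused only when none of the three CPUID bits is present.

```c
#include <stdbool.h>

/* Hypothetical summary of the guest CPUID bits relevant to SPEC_CTRL. */
struct sketch_guest_cpuid {
	bool amd_ibrs;    /* AMD IBRS bit */
	bool amd_stibp;   /* AMD STIBP bit */
	bool spec_ctrl;   /* Intel SPEC_CTRL bit */
};

/* Any one of the three bits permits guest access to the MSR. */
static bool sketch_spec_ctrl_allowed(const struct sketch_guest_cpuid *c)
{
	return c->amd_ibrs || c->amd_stibp || c->spec_ctrl;
}
```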

Reported-by: Zhiyi Guo <zhguo@redhat.com>
Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:12:57 +01:00
Eric Hankland
4400cf546b KVM: x86: Fix perfctr WRMSR for running counters
Correct the logic in intel_pmu_set_msr() for fixed and general-purpose
counters. This logic was recently changed to set pmc->counter without
taking into account the value of pmc_read_counter(), which makes the
result incorrect if the counter is currently running and non-zero. Change
back to the old logic, which accounted for the value of currently running
counters.
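A simplified model of why the old logic matters, assuming the guest-visible value is a software base plus the running perf event's count; the struct and function names are hypothetical, not KVM's. Writing `data` must shift the base so that the next read returns `data`, rather than overwrite it.

```c
#include <stdint.h>

/* Guest-visible value = counter (software base) + event_count (running). */
struct sketch_pmc {
	uint64_t counter;
	uint64_t event_count;
};

static uint64_t sketch_pmc_read(const struct sketch_pmc *pmc)
{
	return pmc->counter + pmc->event_count;
}

/* Buggy variant: ignores what the running event has already counted. */
static void sketch_pmc_write_buggy(struct sketch_pmc *pmc, uint64_t data)
{
	pmc->counter = data;
}

/* Fixed variant: adjust the base (mod 2^64) so the next read returns data. */
static void sketch_pmc_write(struct sketch_pmc *pmc, uint64_t data)
{
	pmc->counter += data - sketch_pmc_read(pmc);
}
```

With `counter = 0` and 100 events already counted, the fixed write of 5 reads back as 5, whereas the buggy write reads back as 105.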

Signed-off-by: Eric Hankland <ehankland@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:01:15 +01:00
Vitaly Kuznetsov
a83502314c x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
Sane L1 hypervisors are not supposed to turn on any of the unsupported VMX
controls for their guests, and nested_vmx_check_controls() checks for
that. This check does not, however, cover controls that are supported on
the host but are missing from the enlightened VMCS, when eVMCS is in use.

It would certainly be possible to add these missing checks to
nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
preferable to keep eVMCS-specific code in eVMCS and to reduce the impact
on non-eVMCS guests by doing fewer unrelated checks. Create a separate
nested_evmcs_check_controls() for this purpose.
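A sketch of an eVMCS-only check of the kind described, limited to one control: SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES (bit 0 of the secondary processor-based controls), which has no corresponding eVMCS field. The mask and function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES is bit 0 of the secondary controls. */
#define SKETCH_2NDEXEC_VIRT_APIC_ACCESSES 0x00000001u

/* Hypothetical mask of secondary controls that eVMCS cannot express. */
#define SKETCH_EVMCS_UNSUPPORTED_2NDEXEC SKETCH_2NDEXEC_VIRT_APIC_ACCESSES

/* Reject an L1-requested control value that enables an unsupported control. */
static int sketch_evmcs_check_2ndexec(uint32_t requested)
{
	if (requested & SKETCH_EVMCS_UNSUPPORTED_2NDEXEC)
		return -EINVAL;
	return 0;
}
```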

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:55:26 +01:00
Vitaly Kuznetsov
31de3d2500 x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
With fine-grained VMX feature enablement, QEMU >= 4.2 tries to do
KVM_SET_MSRS with default (matching the CPU model) values, which fails
when eVMCS is also enabled.

It would be possible to drop VMX feature filtering completely and make
this the guest's responsibility: if it decides to use eVMCS, it should know
which fields are available and which are not. Hyper-V mostly complies with
this; however, there are some problematic controls:
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL

which Hyper-V enables. As there are no corresponding fields in eVMCS, we
can't handle this properly in KVM. This is a Hyper-V issue.

Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
and do the bare minimum (only clear controls which are known to cause issues).
This allows userspace to keep setting controls it wants and at the same
time hides them from the guest.
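A sketch of "sanitize on read": VMX control capability MSRs report allowed-1 settings in bits 63:32, so clearing a bit there hides the control from a guest reading the MSR, while the value passed in (standing in for what userspace set) is left unchanged. The function name is hypothetical; the mask reuses the SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES bit (0x1).

```c
#include <stdint.h>

#define SKETCH_2NDEXEC_VIRT_APIC_ACCESSES 0x00000001u

/*
 * Return a copy of a capability MSR value with `unsupported` cleared from
 * the allowed-1 half (bits 63:32); the allowed-0 half is preserved.
 */
static uint64_t sketch_hide_controls(uint64_t cap_msr, uint32_t unsupported)
{
	uint64_t allowed1 = (cap_msr >> 32) & ~(uint64_t)unsupported;

	return (allowed1 << 32) | (cap_msr & 0xffffffffu);
}
```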

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:55:06 +01:00
Ben Gardon
8f79b06495 kvm: mmu: Separate generating and setting mmio ptes
Separate the functions for generating MMIO page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU synchronization model to use atomic
compare-and-exchange operations (which are not guaranteed to succeed)
instead of a monolithic MMU lock.
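A sketch of the split, assuming a hypothetical SPTE encoding: a pure function computes the entry value, and a separate step publishes it with a C11 compare-and-exchange that can fail if the entry changed concurrently. This illustrates the motivation only; it is not KVM's MMU code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: a marker bit plus the GFN and access bits. */
#define SKETCH_MMIO_MARKER (1ULL << 62)

/* Pure "generate" step: computes the SPTE value, touches no shared state. */
static uint64_t sketch_make_mmio_spte(uint64_t gfn, unsigned int access)
{
	return SKETCH_MMIO_MARKER | (gfn << 12) | (access & 0x7u);
}

/*
 * Separate "set" step: install new_spte only if the entry still holds
 * `old`; returns false if another thread changed it first.
 */
static bool sketch_set_spte(_Atomic uint64_t *sptep, uint64_t old,
			    uint64_t new_spte)
{
	return atomic_compare_exchange_strong(sptep, &old, new_spte);
}
```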

No functional change expected.

Tested by running kvm-unit-tests on an Intel Haswell machine. This
commit introduced no new failures.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:54:07 +01:00
Ben Gardon
0a2b64c50d kvm: mmu: Replace unsigned with unsigned int for PTE access
There are several functions which pass an access permission mask for
SPTEs as an unsigned. This works, but checkpatch complains about it.
Switch the occurrences of unsigned to unsigned int to satisfy checkpatch.

No functional change expected.

Tested by running kvm-unit-tests on an Intel Haswell machine. This
commit introduced no new failures.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:54:00 +01:00
Sean Christopherson
ea79a75092 KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
The blurb pertaining to the return value of nested_vmx_load_cr3() no
longer matches reality. Remove it entirely, as the behavior it attempts
to document is quite obvious when reading the actual code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:31:25 +01:00
Sean Christopherson
879a37632b KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()
Fold kvm_mips_comparecount_func() into kvm_mips_comparecount_wakeup() to
eliminate the nondescript function name as well as its unnecessary cast
of a vcpu to "unsigned long" and back to a vcpu.  Presumably func() was
used as a callback at some point during pre-upstream development, as
wakeup() is the only user of func() and has been the only user since
both were introduced by commit 669e846e6c ("KVM/MIPS32: MIPS arch
specific APIs for KVM").

Cc: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:29:55 +01:00
Sean Christopherson
09df630712 KVM: MIPS: Fix a build error due to referencing not-yet-defined function
Hoist kvm_mips_comparecount_wakeup() above its only user,
kvm_arch_vcpu_create(), to fix a compilation error due to referencing an
undefined function.

Fixes: d11dfed5d7 ("KVM: MIPS: Move all vcpu init code into kvm_arch_vcpu_create()")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:29:01 +01:00
Thadeu Lima de Souza Cascardo
64b38bd190 x86/kvm: do not setup pv tlb flush when not paravirtualized
kvm_setup_pv_tlb_flush will waste memory and print a misleading message
when KVM paravirtualization is not available.

The Intel SDM says that when CPUID is used with EAX higher than the
maximum supported value for basic or extended functions, the data for the
highest supported basic function will be returned.

So, on some systems, kvm_arch_para_features will return bogus data,
causing kvm_setup_pv_tlb_flush to detect support for pv tlb flush.

Testing for kvm_para_available works, as it checks for the hypervisor
signature.
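A sketch of the kind of signature check that makes `kvm_para_available` trustworthy here: KVM's hypervisor CPUID leaf (normally 0x40000000) returns "KVMKVMKVM\0\0\0" across EBX, ECX and EDX. The function takes the register values as parameters so it runs without executing CPUID; its name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if EBX:ECX:EDX spell KVM's 12-byte hypervisor signature. */
static bool sketch_is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[12];

	memcpy(sig + 0, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	return memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0;
}
```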

Besides, when the "nopv" command line parameter is used, it should not
continue either, as kvm_guest_init will not be called in that case.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:28:07 +01:00
Zhuang Yanying
7df003c852 KVM: fix overflow of zero page refcount with ksm running
We were testing virtual machines with KSM on a v5.4-rc2 kernel
and found that the zero_page refcount overflows.
The refcount is increased in try_async_pf (get_user_page)
without being decreased in mmu_set_spte()
while handling an EPT violation.
In kvm_release_pfn_clean(), only unreserved pages get put_page()
called on them; however, the zero page is reserved.
So, as VMs are created and destroyed, the refcount of the
zero page keeps increasing until it overflows.
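A toy model of the imbalance described above; the struct and function names are hypothetical. The excerpt is cut off before the patch's own description of the fix, so the `zero_page_aware` path (treating the zero page as not reserved on release) is an assumption about one way to balance the count, not a statement of what this commit does.

```c
#include <stdbool.h>

/* Hypothetical, minimal page model. */
struct sketch_page {
	int refcount;
	bool reserved;   /* the zero page is marked reserved */
	bool is_zero;    /* true only for the shared zero page */
};

static void sketch_get_page(struct sketch_page *p)
{
	p->refcount++;
}

/*
 * Release modelled on "only unreserved pages get put_page()". When
 * zero_page_aware is false, a reserved page (including the zero page)
 * never has its count dropped, so every get/release pair on the zero page
 * leaks one reference. When true, the zero page is treated as unreserved.
 */
static void sketch_release_page(struct sketch_page *p, bool zero_page_aware)
{
	bool treat_as_reserved = p->reserved && !(zero_page_aware && p->is_zero);

	if (!treat_as_reserved)
		p->refcount--;
}
```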

step1:
echo 10000 > /sys/kernel/mm/ksm/pages_to_scan
echo 1 > /sys/kernel/mm/ksm/run
echo 1 > /sys/kernel/mm/ksm/use_zero_pages

step2:
Create several normal QEMU/KVM VMs
and destroy them after 10s.
Repeat this continuously.

After a long period of time, all domains hang because
of the refcount of zero page overflow.

QEMU prints the following error log:
 …
 error: kvm run failed Bad address
 EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd
 ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4
 EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
 SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
 TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
 GDT=     000f7070 00000037
 IDT=     000f70ae 00000000
 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
 DR6=00000000ffff0ff0 DR7=0000000000000400
 EFER=0000000000000000
 Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04
 …

Meanwhile, the kernel emits a warning:

 [40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30
 [40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G           OE     5.2.0-rc2 #5
 [40914.836415] RIP: 0010:try_get_page+0x1f/0x30
 [40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8
 [40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286
 [40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000
 [40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440
 [40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000
 [40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8
 [40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440
 [40914.836423] FS:  00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000
 [40914.836425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0
 [40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [40914.836428] Call Trace:
 [40914.836433]  follow_page_pte+0x302/0x47b
 [40914.836437]  __get_user_pages+0xf1/0x7d0
 [40914.836441]  ? irq_work_queue+0x9/0x70
 [40914.836443]  get_user_pages_unlocked+0x13f/0x1e0
 [40914.836469]  __gfn_to_pfn_memslot+0x10e/0x400 [kvm]
 [40914.836486]  try_async_pf+0x87/0x240 [kvm]
 [40914.836503]  tdp_page_fault+0x139/0x270 [kvm]
 [40914.836523]  kvm_mmu_page_fault+0x76/0x5e0 [kvm]
 [40914.836588]  vcpu_enter_guest+0xb45/0x1570 [kvm]
 [40914.836632]  kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm]
 [40914.836645]  kvm_vcpu_ioctl+0x26e/0x5d0 [kvm]
 [40914.836650]  do_vfs_ioctl+0xa9/0x620
 [40914.836653]  ksys_ioctl+0x60/0x90
 [40914.836654]  __x64_sys_ioctl+0x16/0x20
 [40914.836658]  do_syscall_64+0x5b/0x180
 [40914.836664]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 [40914.836666] RIP: 0033:0x7fb61cb6bfc7

Signed-off-by: LinFeng <linfeng23@huawei.com>
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:27:46 +01:00
Sudarsana Reddy Kalluru
0202d293c2 qed: Fix timestamping issue for L2 unicast ptp packets.
Commit cedeac9df4 ("qed: Add support for Timestamping the unicast
PTP packets.") handles the timestamping of L4 PTP packets only.
This patch adds driver changes to detect and timestamp both L2 and L4
unicast PTP packets.

Fixes: cedeac9df4 ("qed: Add support for Timestamping the unicast PTP packets.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05 15:19:34 +01:00
Sean Christopherson
9b5e85320f KVM: x86: Take a u64 when checking for a valid dr7 value
Take a u64 instead of an unsigned long in kvm_dr7_valid() to fix a build
warning on i386 due to right-shifting a 32-bit value by 32 when checking
for bits being set in dr7[63:32].

Alternatively, the warning could be resolved by rewriting the check to
use an i386-friendly method, but taking a u64 fixes another oddity on
32-bit KVM.  Because KVM implements natural-width VMCS fields as u64s to
avoid layout issues between 32-bit and 64-bit, a devious guest can stuff
vmcs12->guest_dr7 with a 64-bit value even when both the guest and host
are 32-bit kernels.  KVM eventually drops vmcs12->guest_dr7[63:32] when
propagating vmcs12->guest_dr7 to vmcs02, but ideally KVM would not rely
on that behavior for correctness.
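A sketch of the check with a u64 parameter, assuming (as the message states) that bits [63:32] of dr7 are reserved and must be zero. `dr7_valid` is a hypothetical stand-in for kvm_dr7_valid(). In C, shifting a 32-bit value right by 32 is undefined behavior, which is what the i386 build warning flags; with a 64-bit operand the shift is well defined everywhere, and a value with any of bits [63:32] set is rejected regardless of the host's word size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits [63:32] of dr7 are reserved: the value is valid only if
 * they are all zero. Taking uint64_t keeps the shift well defined
 * on 32-bit builds too. */
static bool dr7_valid(uint64_t data)
{
    return !(data >> 32);
}
```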

Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Fixes: ecb697d10f70 ("KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Paolo Bonzini
8171cd6880 KVM: x86: use raw clock values consistently
Commit 53fafdbb8b ("KVM: x86: switch KVMCLOCK base to monotonic raw
clock") changed kvmclock to use tkr_raw instead of tkr_mono.  However,
the default kvmclock_offset for the VM was still based on the monotonic
clock and, if the raw clock drifted enough from the monotonic clock,
this could cause a negative system_time to be written to the guest's
struct pvclock.  RHEL5 does not like it and (if it boots fast enough to
observe a negative time value) it hangs.
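A small numeric model of the drift problem, with hypothetical helpers and made-up nanosecond values: the per-VM offset is chosen at creation so that the guest clock starts at zero, and the guest-visible time is the raw clock plus that offset. If the offset is taken from the monotonic clock while the raw clock lags behind it, the guest briefly sees a negative time; taking the offset from the raw clock keeps it non-negative.

```c
#include <stdint.h>

/* Offset chosen at VM creation so that the guest clock starts at 0.
 * All values are nanoseconds held in signed 64-bit integers. */
static int64_t make_offset(int64_t host_clock_at_creation)
{
    return -host_clock_at_creation;
}

/* Guest-visible system_time: raw host clock plus the per-VM offset. */
static int64_t guest_system_time(int64_t raw_now, int64_t offset)
{
    return raw_now + offset;
}
```

With the raw clock 500 ns behind the monotonic clock at creation, 100 ns later the mono-based offset yields -400 ns, while the raw-based offset yields 100 ns.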

There is another thing to be careful about: getboottime64 returns the
host boot time with tkr_mono frequency, and subtracting the tkr_raw-based
kvmclock value will cause the wallclock to be off if tkr_raw drifts
from tkr_mono.  To avoid this, compute the wallclock delta from the
current time instead of being clever and using getboottime64.

Fixes: 53fafdbb8b ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Paolo Bonzini
917f9475c0 KVM: x86: reorganize pvclock_gtod_data members
We will need a copy of tk->offs_boot in the next patch.  Store it and
clean up the struct: instead of storing tk->tkr_xxx.base with tk->offs_boot
included, store the raw value in struct pvclock_clock and sum it in
do_monotonic_raw and do_realtime.  tk->tkr_xxx.xtime_nsec also moves
to struct pvclock_clock.

While at it, fix a (usually harmless) typo in do_monotonic_raw, which
was using gtod->clock.shift instead of gtod->raw_clock.shift.
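The shift typo matters because a clock's mult and shift are a matched pair: nanoseconds are computed as ((delta * mult) + xtime_nsec) >> shift. The sketch below uses a hypothetical `clock_params` struct loosely modelled on struct pvclock_clock (field names are assumptions) and shows that pairing one clock's mult with another clock's shift scales the result by a power of two. When both clocks happen to use the same shift, the mix-up has no effect, which is presumably why the message calls it usually harmless.

```c
#include <stdint.h>

/* Hypothetical subset of a clock description. */
struct clock_params {
    uint64_t cycle_last;  /* counter value at the last update */
    uint32_t mult;        /* ns per cycle, scaled by 2^shift */
    uint32_t shift;
    uint64_t xtime_nsec;  /* accumulated ns, also scaled by 2^shift */
    uint64_t base_ns;     /* unscaled base in ns */
};

/* Correct conversion: mult and shift come from the same clock. */
static uint64_t clock_ns(const struct clock_params *c, uint64_t cycles)
{
    uint64_t delta = cycles - c->cycle_last;
    return c->base_ns + ((delta * c->mult + c->xtime_nsec) >> c->shift);
}

/* The typo: mult from this clock, shift taken from another one. */
static uint64_t clock_ns_mixed(const struct clock_params *c,
                               uint32_t other_shift, uint64_t cycles)
{
    uint64_t delta = cycles - c->cycle_last;
    return c->base_ns + ((delta * c->mult + c->xtime_nsec) >> other_shift);
}
```

With mult = 256 and shift = 8 (exactly 1 ns per cycle), 1000 cycles convert to 1000 ns; using a shift of 9 instead yields 500 ns.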

Fixes: 53fafdbb8b ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Miaohe Lin
33aabd029f KVM: nVMX: delete meaningless nested_vmx_run() declaration
The declaration of nested_vmx_run() appears below its implementation, so
it is meaningless and should be removed.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Paolo Bonzini
e8ef2a19a0 KVM: SVM: allow AVIC without split irqchip
SVM is now able to disable AVIC dynamically whenever the in-kernel PIT sets
up an ack notifier, so we can enable it even if in-kernel IOAPIC/PIC/PIT
are in use.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit
f458d039db kvm: ioapic: Lazy update IOAPIC EOI
The in-kernel IOAPIC does not receive EOIs with AMD SVM AVIC,
since the processor accelerates writes to the APIC EOI register and
does not trap if the interrupt is edge-triggered.

Work around this by lazily checking for a pending APIC EOI when
setting a new IOAPIC irq, and update the IOAPIC EOI if no APIC EOI
is pending.
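A heavily simplified model of the lazy check, with hypothetical names: `eoi_outstanding` records that the IOAPIC is still waiting for an EOI on a pin, and `apic_eoi_pending` models whether any local APIC still has the vector in service. Because the EOI write was accelerated and never trapped, the IOAPIC state can be stale; when the next interrupt arrives and no APIC EOI is pending, the EOI is processed at that point. This is a sketch of the idea, not the in-kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical per-pin IOAPIC state. */
struct ioapic_pin {
    bool eoi_outstanding;  /* waiting for the guest's EOI */
};

/* Called when a new interrupt is raised on the pin. Returns true if
 * the interrupt is delivered, false if it is held back. */
static bool ioapic_set_irq_lazy(struct ioapic_pin *pin, bool apic_eoi_pending)
{
    /* The EOI may have happened without a trap: if no APIC still has
     * the vector in service, process that EOI lazily now. */
    if (pin->eoi_outstanding && !apic_eoi_pending)
        pin->eoi_outstanding = false;

    if (pin->eoi_outstanding)
        return false;  /* previous interrupt still in service */

    pin->eoi_outstanding = true;  /* deliver and wait for the next EOI */
    return true;
}
```

Delivery is held while the APIC still reports the vector in service, and resumes once the untrapped EOI has happened.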

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit
1ec2405c7c kvm: ioapic: Refactor kvm_ioapic_update_eoi()
Refactor code for handling IOAPIC EOI for subsequent patch.
There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit
e2ed4078a6 kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.
AMD SVM AVIC accelerates EOI writes and does not trap. This causes
in-kernel PIT re-injection mode to fail since it relies on the irq-ack
notifier mechanism. So, APICv is activated only when the in-kernel PIT
is in discard mode, e.g. with the QEMU option:

  -global kvm-pit.lost_tick_policy=discard

Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this
reason.
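The APICV_INHIBIT_REASON_* bits introduced across this series can be thought of as a bitmask in which APICv stays active only while no bit is set. A minimal sketch, where the enum values, struct and function names are hypothetical (only the reason names come from this log):

```c
#include <stdbool.h>

/* Reason names follow this log; the numeric values are assumptions. */
enum apicv_inhibit_reason {
    INHIBIT_HYPERV    = 0,
    INHIBIT_NESTED    = 1,
    INHIBIT_IRQWIN    = 2,
    INHIBIT_PIT_REINJ = 3,
};

/* Hypothetical per-VM state: one bit per active inhibit reason. */
struct apicv_state {
    unsigned long inhibits;
};

/* activate == false sets the reason's bit, activate == true clears it. */
static void apicv_update(struct apicv_state *s,
                         enum apicv_inhibit_reason r, bool activate)
{
    if (activate)
        s->inhibits &= ~(1UL << r);
    else
        s->inhibits |= 1UL << r;
}

/* APICv is active only while no inhibit reason is set. */
static bool apicv_active(const struct apicv_state *s)
{
    return s->inhibits == 0;
}
```

Because each reason is tracked separately, clearing the IRQ-window inhibit does not re-enable APICv while the PIT re-injection inhibit is still set.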

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit
f3515dc3be svm: Temporarily deactivate AVIC during ExtINT handling
AMD AVIC does not support ExtINT. Therefore, AVIC must be temporarily
deactivated and fall back to using legacy interrupt injection via vINTR
and interrupt window.

Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit
9a0bf05430 svm: Deactivate AVIC when launching guest with nested SVM support
Since AVIC does not currently work with nested virtualization,
deactivate AVIC for the guest if CPUID Fn80000001_ECX[SVM] is set
(i.e. the guest is told it supports SVM, which is needed for nested virtualization).
Also, introduce a new APICV_INHIBIT_REASON_NESTED bit to be used for
this reason.

Suggested-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit
f4fdc0a2ed kvm: x86: hyperv: Use APICv update request interface
Since disabling APICv has to be done for all vcpus on AMD-based
systems, adopt the newly introduced kvm_request_apicv_update()
interface, and introduce a new APICV_INHIBIT_REASON_HYPERV.

Also, remove kvm_vcpu_deactivate_apicv() since it is no longer used.

Cc: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit
6c3e4422dd svm: Add support for dynamic APICv
Add the logic necessary to (de)activate AVIC at runtime.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00
Suravee Suthikulpanit
2de9d0ccd0 kvm: x86: Introduce x86 ops hook for pre-update APICv
AMD SVM AVIC needs to update APIC backing page mapping before changing
APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl
function hook to be called prior to the KVM APICv update request to each vcpu.
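A sketch of the ordering this hook provides, using hypothetical types rather than the real struct kvm_x86_ops: an optional per-vendor callback runs once (here it only records whether the APIC backing page would be mapped) before every vCPU is flagged to refresh its APICv state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VM state touched by the vendor hook. */
struct vm_model {
    bool backing_page_mapped;
};

/* Hypothetical ops table; the hook is optional and may be NULL. */
struct x86_ops_model {
    void (*pre_update_apicv_exec_ctrl)(struct vm_model *vm, bool activate);
};

/* SVM-like hook: records whether the APIC backing page is mapped. */
static void svm_like_pre_update(struct vm_model *vm, bool activate)
{
    vm->backing_page_mapped = activate;
}

/* Run the hook once, then flag every vCPU to refresh its APICv state.
 * Returns the number of vCPUs flagged. */
static int request_apicv_update(const struct x86_ops_model *ops,
                                struct vm_model *vm, bool activate,
                                bool *vcpu_pending, int nr_vcpus)
{
    if (ops->pre_update_apicv_exec_ctrl)
        ops->pre_update_apicv_exec_ctrl(vm, activate);
    for (int i = 0; i < nr_vcpus; i++)
        vcpu_pending[i] = true;
    return nr_vcpus;
}
```

Keeping the hook optional lets a vendor that needs no preparation (the no-hook case) share the same request path.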

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00
Suravee Suthikulpanit
ef8efd7a15 kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasons
Inhibit reason bits are used to determine if APICv deactivation is
applicable for a particular hardware virtualization architecture.
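A sketch of per-architecture inhibit checks, with hypothetical names and values: each vendor reports whether a given reason bit applies to it, so a request to deactivate APICv for an inapplicable reason can be ignored. Which reasons each vendor actually supports is an assumption here, chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical reason bit numbers. */
enum {
    REASON_DISABLE,
    REASON_HYPERV,
    REASON_NESTED,
    REASON_IRQWIN,
    REASON_PIT_REINJ,
};

/* Assumed SVM-like table: every reason in this series applies. */
static bool svm_like_check_inhibit(unsigned long bit)
{
    unsigned long supported = (1UL << REASON_DISABLE) |
                              (1UL << REASON_HYPERV) |
                              (1UL << REASON_NESTED) |
                              (1UL << REASON_IRQWIN) |
                              (1UL << REASON_PIT_REINJ);
    return supported & (1UL << bit);
}

/* Assumed VMX-like table: only a subset of reasons applies. */
static bool vmx_like_check_inhibit(unsigned long bit)
{
    unsigned long supported = (1UL << REASON_DISABLE) |
                              (1UL << REASON_HYPERV);
    return supported & (1UL << bit);
}
```

With these assumed tables, PIT re-injection inhibits APICv on the SVM-like side but is not applicable on the VMX-like side.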

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00
Suravee Suthikulpanit
dcbcfa287e KVM: svm: avic: Add support for dynamic setup/teardown of virtual APIC backing page
Refactor avic_init_access_page() into avic_update_access_page() since
activating/deactivating AVIC requires setting/unsetting the memory region used
for virtual APIC backing page (APIC_ACCESS_PAGE_PRIVATE_MEMSLOT).

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:41 +01:00