In the vmbus_on_event loop, 2 jiffies timer will not serve the purpose if
callback_fn takes longer. For effective use move this check inside of
callback functions where needed. Out of all the VMbus drivers using
vmbus_on_event, only storvsc has a high packet volume, thus add this limit
only in storvsc callback for now.
There is no apparent benefit of loop itself because this tasklet will be
scheduled anyway again if there are packets left in ring buffer. This
patch removes this unnecessary loop as well.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1658741848-4210-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Passed through PCI device sometimes misbehave on Gen1 VMs when Hyper-V
DRM driver is also loaded. Looking at IOMEM assignment, we can see e.g.
$ cat /proc/iomem
...
f8000000-fffbffff : PCI Bus 0000:00
f8000000-fbffffff : 0000:00:08.0
f8000000-f8001fff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
...
fe0000000-fffffffff : PCI Bus 0000:00
fe0000000-fe07fffff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
fe0000000-fe07fffff : 2ba2:00:02.0
fe0000000-fe07fffff : mlx4_core
the interesting part is the 'f8000000' region as it is actually the
VM's framebuffer:
$ lspci -v
...
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
...
hv_vmbus: registering driver hyperv_drm
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Synthvid Version major 3, minor 5
hyperv_drm 0000:00:08.0: vgaarb: deactivate vga console
hyperv_drm 0000:00:08.0: BAR 0: can't reserve [mem 0xf8000000-0xfbffffff]
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Cannot request framebuffer, boot fb still active?
Note: "Cannot request framebuffer" is not a fatal error in
hyperv_setup_gen1() as the code assumes there's some other framebuffer
device there but we actually have some other PCI device (mlx4 in this
case) config space there!
The problem appears to be that vmbus_allocate_mmio() can use dedicated
framebuffer region to serve any MMIO request from any device. The
semantics one might assume of a parameter named "fb_overlap_ok"
aren't implemented because !fb_overlap_ok essentially has no effect.
The existing semantics are really "prefer_fb_overlap". This patch
implements the expected and needed semantics, which is to not allocate
from the frame buffer space when !fb_overlap_ok.
Note, Gen2 VMs are usually unaffected by the issue because
framebuffer region is already taken by EFI fb (in case kernel supports
it) but Gen1 VMs may have this region unclaimed by the time Hyper-V PCI
pass-through driver tries allocating MMIO space if Hyper-V DRM/FB drivers
load after it. Devices can be brought up in any sequence so let's
resolve the issue by always ignoring 'fb_mmio' region for non-FB
requests, even if the region is unclaimed.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20220827130345.1320254-4-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
vmbus_reserve_fb() tries reserving framebuffer region iff
'screen_info.lfb_base' is set. Gen2 VMs seem to have it set by EFI
and/or by the kernel EFI FB driver (or, in some edge cases like kexec,
the address where the buffer was moved, see
https://lore.kernel.org/all/20201014092429.1415040-1-kasong@redhat.com/)
but on Gen1 VM it depends on bootloader behavior. With grub, it depends
on 'gfxpayload=' setting but in some cases it is observed to be zero.
That being said, relying on 'screen_info.lfb_base' to reserve
framebuffer region is risky. For Gen1 VMs, it should always be
possible to get the address from the dedicated PCI device instead.
Check for legacy PCI video device presence and reserve the whole
region for framebuffer on Gen1 VMs.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20220827130345.1320254-3-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Allow the guest to know how much it is ballooned by the host.
It is useful when debugging out of memory conditions.
When host gets back memory from the guest it is accounted
as used memory in the guest but the guest have no way to know
how much it is actually ballooned.
Expose current state, flags and max possible memory to the guest.
While at it - fix a 10+ years old typo.
Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220711181825.52318-1-alexander.atanasov@virtuozzo.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Add a flag to indicate that the vmbus is suspended so we should ignore
any offer message. Add a new work_queue for rescind msg, so we could drain
it along with other offer work_queues upon suspension.
It was observed that in some hibernation related scenario testing, after
vmbus_bus_suspend() we get rescind offer message for the vmbus. This would
lead to processing of a rescind message for a channel that has already been
suspended.
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220711041147.GA5569@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Wei Liu <wei.liu@kernel.org>
In case of invalid sub channel, release cpu lock before returning.
Fixes: a949e86c0d ("Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug")
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1654794996-13244-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
When initially assigning a VMbus channel interrupt to a CPU, don’t choose
a managed IRQ isolated CPU (as specified on the kernel boot line with
parameter 'isolcpus=managed_irq,<#cpu>'). Also, when using sysfs to change
the CPU that a VMbus channel will interrupt, don't allow changing to a
managed IRQ isolated CPU.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1653637439-23060-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Here is the set of driver core changes for 5.19-rc1.
Note, I'm not really happy with this pull request as-is, see below for
details, but overall this is all good for everything but a small set of
systems, which we have a fix for already.
Lots of tiny driver core changes and cleanups happened this cycle,
but the two major things were:
- firmware_loader reorganization and additions including the
ability to have XZ compressed firmware images and the ability
for userspace to initiate the firmware load when it needs to,
instead of being always initiated by the kernel. FPGA devices
specifically want this ability to have their firmware changed
over the lifetime of the system boot, and this allows them to
work without having to come up with yet-another-custom-uapi
interface for loading firmware for them.
- physical location support added to sysfs so that devices that
know this information, can tell userspace where they are
located in a common way. Some ACPI devices already support
this today, and more bus types should support this in the
future.
Smaller changes included:
- driver_override api cleanups and fixes
- error path cleanups and fixes
- get_abi script fixes
- deferred probe timeout changes.
It's that last change that I'm the most worried about. It has been
reported to cause boot problems for a number of systems, and I have a
tested patch series that resolves this issue. But I didn't get it
merged into my tree before 5.18-final came out, so it has not gotten any
linux-next testing.
I'll send the fixup patches (there are 2) as a follow-on series to this
pull request if you want to take them directly, _OR_ I can just revert
the probe timeout changes and they can wait for the next -rc1 merge
cycle. Given that the fixes are tested, and pretty simple, I'm leaning
toward that choice. Sorry this all came at the end of the merge window,
I should have resolved this all 2 weeks ago, that's my fault as it was
in the middle of some travel for me.
All have been tested in linux-next for weeks, with no reported issues
other than the above-mentioned boot time outs.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYpnv/A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk/fACgvmenbo5HipqyHnOmTQlT50xQ9EYAn2eTq6ai
GkjLXBGNWOPBa5cU52qf
=yEi/
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of driver core changes for 5.19-rc1.
Lots of tiny driver core changes and cleanups happened this cycle, but
the two major things are:
- firmware_loader reorganization and additions including the ability
to have XZ compressed firmware images and the ability for userspace
to initiate the firmware load when it needs to, instead of being
always initiated by the kernel. FPGA devices specifically want this
ability to have their firmware changed over the lifetime of the
system boot, and this allows them to work without having to come up
with yet-another-custom-uapi interface for loading firmware for
them.
- physical location support added to sysfs so that devices that know
this information, can tell userspace where they are located in a
common way. Some ACPI devices already support this today, and more
bus types should support this in the future.
Smaller changes include:
- driver_override api cleanups and fixes
- error path cleanups and fixes
- get_abi script fixes
- deferred probe timeout changes.
It's that last change that I'm the most worried about. It has been
reported to cause boot problems for a number of systems, and I have a
tested patch series that resolves this issue. But I didn't get it
merged into my tree before 5.18-final came out, so it has not gotten
any linux-next testing.
I'll send the fixup patches (there are 2) as a follow-on series to this
pull request.
All have been tested in linux-next for weeks, with no reported issues
other than the above-mentioned boot time-outs"
* tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
driver core: fix deadlock in __device_attach
kernfs: Separate kernfs_pr_cont_buf and rename_lock.
topology: Remove unused cpu_cluster_mask()
driver core: Extend deferred probe timeout on driver registration
MAINTAINERS: add Russ Weight as a firmware loader maintainer
driver: base: fix UAF when driver_attach failed
test_firmware: fix end of loop test in upload_read_show()
driver core: location: Add "back" as a possible output for panel
driver core: location: Free struct acpi_pld_info *pld
driver core: Add "*" wildcard support to driver_async_probe cmdline param
driver core: location: Check for allocations failure
arch_topology: Trace the update thermal pressure
kernfs: Rename kernfs_put_open_node to kernfs_unlink_open_file.
export: fix string handling of namespace in EXPORT_SYMBOL_NS
rpmsg: use local 'dev' variable
rpmsg: Fix calling device_lock() on non-initialized device
firmware_loader: describe 'module' parameter of firmware_upload_register()
firmware_loader: Move definitions from sysfs_upload.h to sysfs.h
firmware_loader: Fix configs for sysfs split
selftests: firmware: Add firmware upload selftests
...
Add missing cleanup in balloon_probe() if the call to
balloon_connect_vsp() fails. Also correctly handle cleanup in
balloon_remove() when dm_state is DM_INIT_ERROR because
balloon_resume() failed.
Signed-off-by: Shradha Gupta <shradhagupta@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220516045058.GA7933@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The VMbus driver has special case code for running on the first released
versions of Hyper-V: 2008 and 2008 R2/Windows 7. These versions are now
out of support (except for extended security updates) and lack the
performance features needed for effective production usage of Linux
guests.
Simplify the code by removing the negotiation of the VMbus protocol
versions required for these releases of Hyper-V, and by removing the
special case code for handling these VMbus protocol versions.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/1651509391-2058-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
With no users of hv_pkt_iter_next_raw() and no "external" users of
hv_pkt_iter_first_raw(), the iterator functions can be refactored
and simplified to remove some indirection/code.
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220428145107.7878-6-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
So that isolated guests can communicate with the host via hv_sock
channels.
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220428145107.7878-5-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
To abtract the lock and unlock operations on the requestor spin lock.
The helpers will come in handy in hv_pci.
No functional change.
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220419122325.10078-6-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The function can be used to retrieve and clear/remove a transation ID
from a channel requestor, provided the memory address corresponding to
the ID equals a specified address. The function, and its 'lockless'
variant __vmbus_request_addr_match(), will be used by hv_pci.
Refactor vmbus_request_addr() to reuse the 'newly' introduced code.
No functional change.
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220419122325.10078-5-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The function can be used to send a VMbus packet and retrieve the
corresponding transaction ID. It will be used by hv_pci.
No functional change.
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220419122325.10078-4-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
vmbus_request_addr() returns 0 (zero) if the transaction ID passed
to as argument is 0. This is unfortunate for two reasons: first,
netvsc_send_completion() does not check for a NULL cmd_rqst (before
dereferencing the corresponding NVSP message); second, 0 is a *valid*
value of cmd_rqst in netvsc_send_tx_complete(), cf. the call of
vmbus_sendpacket() in netvsc_send_pkt().
vmbus_request_addr() has included the code in question since its
introduction with commit e8b7db3844 ("Drivers: hv: vmbus: Add
vmbus_requestor data structure for VMBus hardening"); such code was
motivated by the early use of vmbus_requestor by hv_storvsc. Since
hv_storvsc moved to a tag-based mechanism to generate and retrieve
transaction IDs with commit bf5fd8cae3 ("scsi: storvsc: Use
blk_mq_unique_tag() to generate requestIDs"), vmbus_request_addr()
can be modified to return VMBUS_RQST_ERROR if the ID is 0. This
change solves the issues in hv_netvsc (and makes the handling of
messages with transaction ID of 0 consistent with the semantics
"the ID is not contained in the requestor/invalid ID").
vmbus_next_request_id(), vmbus_request_addr() should still reserve
the ID of 0 for Hyper-V, because Hyper-V will "ignore" (not respond
to) VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED packets/requests with
transaction ID of 0 from the guest.
Fixes: bf5fd8cae3 ("scsi: storvsc: Use blk_mq_unique_tag() to generate requestIDs")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220419122325.10078-2-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Hyper-V may offer an Initial Machine Configuration (IMC) synthetic
device to guest VMs. The device may be used by Windows guests to get
specialization information, such as the hostname. But the device
is not used in Linux and there is no Linux driver, so it is
unsupported.
Currently, the IMC device GUID is not recognized by the VMbus driver,
which results in an "Unknown GUID" error message during boot. Add
the GUID to the list of known but unsupported devices so that the
error message is not generated. Other than avoiding the error message,
there is no change in guest behavior.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1649818140-100953-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Use a helper to set driver_override to the reduce amount of duplicated
code. Make the driver_override field const char, because it is not
modified by the core and it matches other subsystems.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-5-krzysztof.kozlowski@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Following the recommendation in Documentation/memory-barriers.txt for
virtual machine guests.
Fixes: 8b6a877c06 ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Currently there are known potential issues for balloon and hot-add on
ARM64:
* Unballoon requests from Hyper-V should only unballoon ranges
that are guest page size aligned, otherwise guests cannot handle
because it's impossible to partially free a page. This is a
problem when guest page size > 4096 bytes.
* Memory hot-add requests from Hyper-V should provide the NUMA
node id of the added ranges or ARM64 should have a functional
memory_add_physaddr_to_nid(), otherwise the node id is missing
for add_memory().
These issues require discussions on design and implementation. In the
meanwhile, post_status() is working and essential to guest monitoring.
Therefore instead of disabling the entire hv_balloon driver, the
ballooning (when page size > 4096 bytes) and hot-add are disabled
accordingly for now. Once the issues are fixed, they can be re-enable in
these cases.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220325023212.1570049-3-boqun.feng@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
DM_STATUS_REPORT expects the numbers of pages in the unit of 4k pages
(HV_HYP_PAGE) instead of guest pages, so to make it work when guest page
sizes are larger than 4k, convert the numbers of guest pages into the
numbers of HV_HYP_PAGEs.
Note that the numbers of guest pages are still used for tracing because
tracing is internal to the guest kernel.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220325023212.1570049-2-boqun.feng@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
When reading a packet from a host-to-guest ring buffer, there is no
memory barrier between reading the write index (to see if there is
a packet to read) and reading the contents of the packet. The Hyper-V
host uses store-release when updating the write index to ensure that
writes of the packet data are completed first. On the guest side,
the processor can reorder and read the packet data before the write
index, and sometimes get stale packet data. Getting such stale packet
data has been observed in a reproducible case in a VM on ARM64.
Fix this by using virt_load_acquire() to read the write index,
ensuring that reads of the packet data cannot be reordered
before it. Preventing such reordering is logically correct, and
with this change, getting stale data can no longer be reproduced.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/1648394710-33480-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
VMbus synthetic devices are not represented in the ACPI DSDT -- only
the top level VMbus device is represented. As a result, on ARM64
coherence information in the _CCA method is not specified for
synthetic devices, so they default to not hardware coherent.
Drivers for some of these synthetic devices have been recently
updated to use the standard DMA APIs, and they are incurring extra
overhead of unneeded software coherence management.
Fix this by propagating coherence information from the VMbus node
in ACPI to the individual synthetic devices. There's no effect on
x86/x64 where devices are always hardware coherent.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1648138492-2191-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The vmbus driver relies on the panic notifier infrastructure to perform
some operations when a panic event is detected. Since vmbus can be built
as module, it is required that the driver handles both registering and
unregistering such panic notifier callback.
After commit 74347a99e7 ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
though, the panic notifier registration is done unconditionally in the module
initialization routine whereas the unregistering procedure is conditionally
guarded and executes only if HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE capability
is set.
This patch fixes that by unconditionally unregistering the panic notifier
in the module's exit routine as well.
Fixes: 74347a99e7 ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220315203535.682306-1-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
hv_panic_page might contain guest-sensitive information, do not dump it
over to Hyper-V by default in isolated guests.
While at it, update some comments in hyperv_{panic,die}_event().
Reported-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/20220301141135.2232-1-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmI6MhgTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXiBkB/9FEaebcytvBMNVnz/fprfQWfYxdrUB
wmDB9dnJVPX6UcWg6DuDv7KqatimDp7JrnKrNsrqlgF6Wafn/wI3Cdf8ZUi8CgtZ
Walq8m2QM9j8WCm4EtgmluzGkHjAn6llmjE7XoLmo/A8MYp0FZhJzghKfCEu54EB
9DhgqQX+zzy7n32CxoAWrGrHegnGeqMpj3rhQ4qVuNn+bOzZ/wTyJQPmXTu4hSW1
0PrXRnfzqnODkB6QDUantABZM5bh1VsqwIlENp+xAFQVsQpsc35lW7AUg7JXczVn
ES0+UHG7wVWPJCtIfD4FgIFT1xekPKxRYX9U66LMRbWGYhXfohQX+wFt
=u62V
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20220322' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
"Minor patches from various people"
* tag 'hyperv-next-signed-20220322' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Output host build info as normal Windows version number
hv_balloon: rate-limit "Unhandled message" warning
drivers: hv: log when enabling crash_kexec_post_notifiers
hv_utils: Add comment about max VMbus packet size in VSS driver
Drivers: hv: Compare cpumasks and not their weights in init_vp_index()
Drivers: hv: Rename 'alloced' to 'allocated'
Drivers: hv: vmbus: Use struct_size() helper in kmalloc()
For a couple of times I have encountered a situation where
hv_balloon: Unhandled message: type: 12447
is being flooded over 1 million times per second with various values,
filling the log and consuming cycles, making debugging difficult.
Add rate limiting to the message.
Most other Hyper-V drivers already have similar rate limiting in their
message callbacks.
The cause of the floods in my case was probably fixed by 96d9d1fa5c
("Drivers: hv: balloon: account for vmbus packet header in
max_pkt_size").
Fixes: 9aa8b50b2b ("Drivers: hv: Add Hyper-V balloon driver")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220222141400.98160-1-anssi.hannula@bitwise.fi
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Recently I went down a rabbit hole looking at a race condition in
panic() on a Hyper-V guest. I assumed, since it was missing from the
command line, that crash_kexec_post_notifiers was disabled. Only after
a rather long reproduction and analysis process did I learn that Hyper-V
actually enables this setting unconditionally.
Users and debuggers alike would like to know when these things happen. I
think it would be good to print a message to the kernel log when this
happens, so that a grep for "crash_kexec_post_notifiers" shows relevant
results.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220215013735.358327-1-stephen.s.brennan@oracle.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The VSS driver allocates a VMbus receive buffer significantly
larger than sizeof(hv_vss_msg), with no explanation. To help
prevent future mistakes, add a #define and comment about why
this is done.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1644423070-75125-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Using DMA_BIT_MASK(64) as an initializer for a global variable
causes problems with Clang 12.0.1. The compiler doesn't understand
that value 64 is excluded from the shift at compile time, resulting
in a build error.
While this is a compiler problem, avoid the issue by setting up
the dma_mask memory as part of struct hv_device, and initialize
it using dma_set_mask().
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Vitaly Chikunov <vt@altlinux.org>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 743b237c3a ("scsi: storvsc: Add Isolation VM support for storvsc driver")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/1644176216-12531-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
kobject_init_and_add() takes reference even when it fails.
According to the doc of kobject_init_and_add():
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object.
Fix memory leak by calling kobject_put().
Fixes: c2e5df616e ("vmbus: add per-channel sysfs info")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Juan Vazquez <juvazq@linux.microsoft.com>
Link: https://lore.kernel.org/r/20220203173008.43480-1-linmq006@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The condition is supposed to check whether 'allocated_mask' got fully
exhausted, i.e. there's no free CPU on the NUMA node left so we have
to use one of the already used CPUs. As only bits which correspond
to CPUs from 'cpumask_of_node(numa_node)' get set in 'allocated_mask',
checking for the equal weights is technically correct but not obvious.
Let's compare cpumasks directly.
No functional change intended.
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220128103412.3033736-3-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
'Alloced' is not a real word and only saves us two letters, let's
use 'allocated' instead.
No functional change intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220128103412.3033736-2-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows that,
in the worst scenario, could lead to heap overflows.
Also, address the following sparse warnings:
drivers/hv/vmbus_drv.c:1132:31: warning: using sizeof on a flexible structure
Link: https://github.com/KSPP/linux/issues/174
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220125180131.GA67746@embeddedor
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Commit adae1e931a ("Drivers: hv: vmbus: Copy packets sent by Hyper-V
out of the ring buffer") introduced a notion of maximum packet size in
vmbus channel and used that size to initialize a buffer holding all
incoming packet along with their vmbus packet header. hv_balloon uses
the default maximum packet size VMBUS_DEFAULT_MAX_PKT_SIZE which matches
its maximum message size, however vmbus_open expects this size to also
include vmbus packet header. This leads to 4096 bytes
dm_unballoon_request messages being truncated to 4080 bytes. When the
driver tries to read next packet it starts from a wrong read_index,
receives garbage and prints a lot of "Unhandled message: type:
<garbage>" in dmesg.
Allocate the buffer with HV_HYP_PAGE_SIZE more bytes to make room for
the header.
Fixes: adae1e931a ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer")
Suggested-by: Michael Kelley (LINUX) <mikelley@microsoft.com>
Suggested-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Yanming Liu <yanminglr@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20220119202052.3006981-1-yanminglr@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmHhw7oTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXrjSB/979LV4Dn1PMcFYsSdlFEMeHcjzJdw/
kFnLPXMaPJyfg6QPuf83jxzw9uxw8fcePMdVq/FFBtmVV9fJMAv62B8jaGS1p58c
WnAg+7zsTN+xEoJn+tskSSon8BNMWVrl41zP3K4Ged+5j8UEBk62GB8Orz1qkpwL
fTh3/+xAvczJeD4zZb1dAm4WnmcQJ4vhg45p07jX6owvnwQAikMFl45aSW54I5o8
vAxGzFgdsZ2NtExnRNKh3b3DozA8JUE89KckBSZnDtq4rH8Fyy6Wij56Hc6v6Cml
SUohiNbHX7hsNwit/lxL8wuF97IiA0pQSABobEg3rxfTghTUep51LlaN
=/m4A
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- More patches for Hyper-V isolation VM support (Tianyu Lan)
- Bug fixes and clean-up patches from various people
* tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
scsi: storvsc: Fix storvsc_queuecommand() memory leak
x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
Drivers: hv: vmbus: Initialize request offers message for Isolation VM
scsi: storvsc: Fix unsigned comparison to zero
swiotlb: Add CONFIG_HAS_IOMEM check around swiotlb_mem_remap()
x86/hyperv: Fix definition of hv_ghcb_pg variable
Drivers: hv: Fix definition of hypercall input & output arg variables
net: netvsc: Add Isolation VM support for netvsc driver
scsi: storvsc: Add Isolation VM support for storvsc driver
hyper-v: Enable swiotlb bounce buffer for Isolation VM
x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
swiotlb: Add swiotlb bounce buffer remap function for HV IVM
Since commit
ee3e00e9e7 ("random: use registers from interrupted code for CPU's w/o a cycle counter")
the irq_flags argument is no longer used.
Remove unused irq_flags.
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Initialize memory of request offers message to be sent to the host so
padding or uninitialized fields do not leak guest memory contents.
Signed-off-by: Juan Vazquez <juvazq@linux.microsoft.com>
Link: https://lore.kernel.org/r/20220105192746.23046-1-juvazq@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The percpu variables hyperv_pcpu_input_arg and hyperv_pcpu_output_arg
have been incorrectly defined since their inception. The __percpu
qualifier should be associated with the void * (i.e., a pointer), not
with the target of the pointer. This distinction makes no difference
to gcc and the generated code, but sparse correctly complains. Fix
the definitions in the interest of general correctness in addition
to making sparse happy.
No functional change.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1640662315-22260-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
In Isolation VM, all shared memory with host needs to mark visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
pagebuffer() stills need to be handled. Use DMA API to map/umap
these memory during sending/receiving packet and Hyper-V swiotlb
bounce buffer dma address will be returned. The swiotlb bounce buffer
has been masked to be visible to host during boot up.
rx/tx ring buffer is allocated via vzalloc() and they need to be
mapped into unencrypted address space(above vTOM) before sharing
with host and accessing. Add hv_map/unmap_memory() to map/umap rx
/tx ring buffer.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20211213071407.314309-6-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
In Isolation VM, all shared memory with host needs to mark visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
mpb_desc() still needs to be handled. Use DMA API(scsi_dma_map/unmap)
to map these memory during sending/receiving packet and return swiotlb
bounce buffer dma address. In Isolation VM, swiotlb bounce buffer is
marked to be visible to host and the swiotlb force mode is enabled.
Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to
keep the original data offset in the bounce buffer.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20211213071407.314309-5-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The hyperv utilities use PTP clock interfaces and should depend a
a kconfig symbol such that they will be built as a loadable module or
builtin so that linker errors do not happen.
Prevents these build errors:
ld: drivers/hv/hv_util.o: in function `hv_timesync_deinit':
hv_util.c:(.text+0x37d): undefined reference to `ptp_clock_unregister'
ld: drivers/hv/hv_util.o: in function `hv_timesync_init':
hv_util.c:(.text+0x738): undefined reference to `ptp_clock_register'
Fixes: 3716a49a81 ("hv_utils: implement Hyper-V PTP source")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: linux-hyperv@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20211126023316.25184-1-rdunlap@infradead.org
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Baihua reported an error when boot an ARM64 guest with PAGE_SIZE=64k and
BALLOON is enabled:
hv_vmbus: registering driver hv_balloon
hv_vmbus: probe failed for device 1eccfd72-4b41-45ef-b73a-4a6e44c12924 (-22)
The cause of this is that the ringbuffer size for hv_balloon is not
adjusted with VMBUS_RING_SIZE(), which makes the size not large enough
for ringbuffers on guest with PAGE_SIZE=64k. Therefore use
VMBUS_RING_SIZE() to calculate the ringbuffer size. Note that the old
size (20 * 1024) counts a 4k header in the total size, while
VMBUS_RING_SIZE() expects the parameter as the payload size, so use
16 * 1024.
Cc: <stable@vger.kernel.org> # 5.15.x
Reported-by: Baihua Lu <baihua.lu@microsoft.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20211101150026.736124-1-boqun.feng@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>