Commit 6e0a32d6f3 ("spi: dw: Fix default polarity of native
chipselect") attempted to fix the problem when GPIO active-high
chip-select is utilized to communicate with some SPI slave. It fixed
the problem, but broke the normal native CS support. At the same time
the reversion commit ada9e3fcc1 ("spi: dw: Correct handling of native
chipselect") didn't solve the problem either, since it just inverted
the set_cs() polarity perception without taking into account that
CS-high might be applicable. Here is what is done to finally fix the
problem.
DW SPI controller demands any native CS being set in order to proceed
with data transfer. So in order to activate the SPI communications we
must set any bit in the Slave Select DW SPI controller register no
matter whether the platform requests the GPIO- or native CS. Preferably
it should be the bit corresponding to the SPI slave CS number. But
currently the dw_spi_set_cs() method activates the chip-select
only if the second argument is false. Since the second argument of the
set_cs callback is expected to be a boolean with "is-high" semantics
(actual chip-select pin state value), the bit in the DW SPI Slave
Select register will be set only if SPI core requests the driver
to set the CS in the low state. So this will work for active-low
GPIO-based CS case, and won't work for active-high CS setting
the bit when SPI core actually needs to deactivate the CS.
This commit fixes the problem for all described cases. So no matter
whether an SPI slave needs GPIO- or native-based CS with active-high
or low signal the corresponding bit will be set in SER.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Fixes: ada9e3fcc1 ("spi: dw: Correct handling of native chipselect")
Fixes: 6e0a32d6f3 ("spi: dw: Fix default polarity of native chipselect")
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20200515104758.6934-5-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
Implement two basic tests to verify terse dump functionality of flower
classifier:
- Test that verifies that terse dump works.
- Test that verifies that terse dump doesn't print filter key.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement tcf_proto_ops->terse_dump() callback for flower classifier. Only
dump handle, flags and action data in terse mode.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend tcf_action_dump() with boolean argument 'terse' that is used to
request terse-mode action dump. In terse mode only essential data needed to
identify particular action (action kind, cookie, etc.) and its stats is put
to resulting skb and everything else is omitted. Implement
tcf_exts_terse_dump() helper in cls API that is intended to be used to
request terse dump of all exts (actions) attached to the filter.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse
filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option
is intended to be used to improve performance of TC filter dump when
userland only needs to obtain stats and not the whole classifier/action
data. Extend struct tcf_proto_ops with new terse_dump() callback that must
be defined by supporting classifier implementations.
Support of the options in specific classifiers and actions is
implemented in following patches in the series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The assumption that a device node is associated either with the
netdev's device, or the parent of that device, does not hold for all
drivers. E.g. Freescale's DPAA has two layers of platform devices
above the netdev. Instead, recursively walk up the tree from the
netdev, allowing any parent to match against the sought after node.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For USB sound devices using implicit feedback the endpoint used for
this feedback should be able to be opened twice, once for required
feedback and second time for audio data. This way these devices can be
put in duplex audio mode. Since this only works if the settings of the
endpoint don't change a check is included for this.
This fixes bug 207023 ("MOTU M2 regression on duplex audio") and
should also fix bug 103751 ("M-Audio Fast Track Ultra usb audio device
will not operate full-duplex")
Fixes: c249177944 ("ALSA: usb-audio: add implicit fb quirk for MOTU M Series")
Signed-off-by: Erwin Burema <e.burema@gmail.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207023
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=103751
Link: https://lore.kernel.org/r/2410739.SCZni40SNb@alpha-wolf
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull hwmon fixes from Guenter Roeck:
- Fix ADC access synchronization problem with da9052 driver
- Fix temperature limit and status reporting in nct7904 driver
- Fix drivetemp temperature reporting if SCT is supported but SCT data
tables are not.
* tag 'hwmon-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (da9052) Synchronize access with mfd
hwmon: (nct7904) Fix incorrect range of temperature limit registers
hwmon: (nct7904) Read all SMI status registers in probe function
hwmon: (drivetemp) Fix SCT support if SCT data tables are not supported
Apparently the 128x128 and 256x256 ARGB cursor modes were
only added on LPT/CST.
While the display section of bspec isn't super clear on the
subject, it does highlight these two modes in a different
color, has a few changlog entries indicating the 256x256 mode
was added for a LPT DCN, and that the 128x128 mode was also
added later (though no DCN/platform note there).
The "device dependencies" bspec section does list the 256x256x32
as a new feature for LPT/CST, and goes on to mention that current
hw only has the 64x64x32 mode (which reinforces the notion that
the 128x128 mode was also added at the same time).
Testing on actual hardware confirms all of this. CI shows all
the 128x128 and 256x256 tests failing on GDG, and my ALV
definitely doesn't like them.
So we shall limit GDG/ALV to 64x64 only. And while at it
let's adjust the mobile gen2 case to list the two platforms
explicitly so that the if-ladder looks reasonably uniform.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191028113036.27553-2-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Pull sound fixes from Takashi Iwai:
"Things look good and calming down; the only change to ALSA core is the
fix for racy rawmidi buffer accesses spotted by syzkaller, and the
rest are all small device-specific quirks for HD-audio and USB-audio
devices"
* tag 'sound-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Limit int mic boost for Thinkpad T530
ALSA: hda/realtek - Add COEF workaround for ASUS ZenBook UX431DA
ALSA: hda/realtek: Enable headset mic of ASUS UX581LV with ALC295
ALSA: hda/realtek - Enable headset mic of ASUS UX550GE with ALC295
ALSA: hda/realtek - Enable headset mic of ASUS GL503VM with ALC295
ALSA: hda/realtek: Add quirk for Samsung Notebook
ALSA: rawmidi: Fix racy buffer resize under concurrent accesses
ALSA: usb-audio: add mapping for ASRock TRX40 Creator
ALSA: hda/realtek - Fix S3 pop noise on Dell Wyse
Revert "ALSA: hda/realtek: Fix pop noise on ALC225"
ALSA: firewire-lib: fix 'function sizeof not defined' error of tracepoints format
ALSA: usb-audio: Add control message quirk delay for Kingston HyperX headset
Pull drm fixes from Dave Airlie:
"As mentioned last week an i915 PR came in late, but I left it, so the
i915 bits of this cover 2 weeks, which is why it's likely a bit larger
than usual.
Otherwise it's mostly amdgpu fixes, one tegra fix, one meson fix.
i915:
- Handle idling during i915_gem_evict_something busy loops (Chris)
- Mark current submissions with a weak-dependency (Chris)
- Propagate error from completed fences (Chris)
- Fixes on execlist to avoid GPU hang situation (Chris)
- Fixes couple deadlocks (Chris)
- Timeslice preemption fixes (Chris)
- Fix Display Port interrupt handling on Tiger Lake (Imre)
- Reduce debug noise around Frame Buffer Compression (Peter)
- Fix logic around IPC W/a for Coffee Lake and Kaby Lake (Sultan)
- Avoid dereferencing a dead context (Chris)
tegra:
- tegra120/4 smmu fixes
amdgpu:
- Clockgating fixes
- Fix fbdev with scatter/gather display
- S4 fix for navi
- Soft recovery for gfx10
- Freesync fixes
- Atomic check cursor fix
- Add a gfxoff quirk
- MST fix
amdkfd:
- Fix GEM reference counting
meson:
- error code propogation fix"
* tag 'drm-fixes-2020-05-15' of git://anongit.freedesktop.org/drm/drm: (29 commits)
drm/i915: Handle idling during i915_gem_evict_something busy loops
drm/meson: pm resume add return errno branch
drm/amd/amdgpu: Update update_config() logic
drm/amd/amdgpu: add raven1 part to the gfxoff quirk list
drm/i915: Mark concurrent submissions with a weak-dependency
drm/i915: Propagate error from completed fences
drm/i915/gvt: Fix kernel oops for 3-level ppgtt guest
drm/i915/gvt: Init DPLL/DDI vreg for virtual display instead of inheritance.
drm/amd/display: add basic atomic check for cursor plane
drm/amd/display: Fix vblank and pageflip event handling for FreeSync
drm/amdgpu: implement soft_recovery for gfx10
drm/amdgpu: enable hibernate support on Navi1X
drm/amdgpu: Use GEM obj reference for KFD BOs
drm/amdgpu: force fbdev into vram
drm/amd/powerplay: perform PG ungate prior to CG ungate
drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate
drm/amdgpu: disable MGCG/MGLS also on gfx CG ungate
drm/i915/execlists: Track inflight CCID
drm/i915/execlists: Avoid reusing the same logical CCID
drm/i915/gem: Remove object_is_locked assertion from unpin_from_display_plane
...
Moving forward, platforms are going to need to execute specific "last-man"
operations before a domain idle state can be entered. In one way or the
other, these operations needs to be triggered while walking the
hierarchical topology via runtime PM and genpd, as it's at that point the
last-man becomes known.
Moreover, executing last-man operations needs to be done after the CPU PM
notifications are sent through cpu_pm_enter(), as otherwise it's likely
that some notifications would fail. Therefore, let's re-order the sequence
in psci_enter_domain_idle_state(), so cpu_pm_enter() gets called prior
pm_runtime_put_sync().
Fixes: ce85aef570 ("cpuidle: psci: Manage runtime PM in the idle path")
Reported-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Per the ACPI spec, interrupts in the range [0, 255] may be handled
in AML using individual methods whose naming is based on the format
_Exx or _Lxx, where xx is the hex representation of the interrupt
index.
Add support for this missing feature to our ACPI GED driver.
Cc: v4.9+ <stable@vger.kernel.org> # v4.9+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Two new stats for exposing halt-polling cpu usage:
halt_poll_success_ns
halt_poll_fail_ns
Thus sum of these 2 stats is the total cpu time spent polling. "success"
means the VCPU polled until a virtual interrupt was delivered. "fail"
means the VCPU had to schedule out (either because the maximum poll time
was reached or it needed to yield the CPU).
To avoid touching every arch's kvm_vcpu_stat struct, only update and
export halt-polling cpu usage stats if we're on x86.
Exporting cpu usage as a u64 and in nanoseconds means we will overflow at
~500 years, which seems reasonably large.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200508182240.68440-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The hrtimer used to emulate the VMX-preemption timer must be pinned to
the same logical processor as the vCPU thread to be interrupted if we
want to have any hope of adhering to the architectural specification
of the VMX-preemption timer. Even with this change, the emulated
VMX-preemption timer VM-exit occasionally arrives too late.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20200508203643.85477-4-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The PINNED bit is ignored by hrtimer_init. It is only considered when
starting the timer.
When the hrtimer isn't pinned to the same logical processor as the
vCPU thread to be interrupted, the emulated VMX-preemption timer
often fails to adhere to the architectural specification.
Fixes: f15a75eedc ("KVM: nVMX: make emulated nested preemption timer pinned")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20200508203643.85477-2-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch implements a fastpath for the preemption timer vmexit. The vmexit
can be handled quickly so it can be performed with interrupts off and going
back directly to the guest.
Testing on SKX Server.
cyclictest in guest(w/o mwait exposed, adaptive advance lapic timer is default -1):
5540.5ns -> 4602ns 17%
kvm-unit-test/vmexit.flat:
w/o avanced timer:
tscdeadline_immed: 3028.5 -> 2494.75 17.6%
tscdeadline: 5765.7 -> 5285 8.3%
w/ adaptive advance timer default -1:
tscdeadline_immed: 3123.75 -> 2583 17.3%
tscdeadline: 4663.75 -> 4537 2.7%
Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-8-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace the ad hoc test in vmx_set_hv_timer with a test in the caller,
start_hv_timer. This test is not Intel-specific and would be duplicated
when introducing the fast path for the TSC deadline MSR.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While optimizing posted-interrupt delivery especially for the timer
fastpath scenario, I measured kvm_x86_ops.deliver_posted_interrupt()
to introduce substantial latency because the processor has to perform
all vmentry tasks, ack the posted interrupt notification vector,
read the posted-interrupt descriptor etc.
This is not only slow, it is also unnecessary when delivering an
interrupt to the current CPU (as is the case for the LAPIC timer) because
PIR->IRR and IRR->RVI synchronization is already performed on vmentry
Therefore skip kvm_vcpu_trigger_posted_interrupt in this case, and
instead do vmx_sync_pir_to_irr() on the EXIT_FASTPATH_REENTER_GUEST
fastpath as well.
Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-6-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adds a fastpath_t typedef since enum lines are a bit long, and replace
EXIT_FASTPATH_SKIP_EMUL_INS with two new exit_fastpath_completion enum values.
- EXIT_FASTPATH_EXIT_HANDLED kvm will still go through it's full run loop,
but it would skip invoking the exit handler.
- EXIT_FASTPATH_REENTER_GUEST complete fastpath, guest can be re-entered
without invoking the exit handler or going
back to vcpu_run
Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-4-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use __print_flags() to display the names of VMX flags in VM-Exit traces
and strip the flags when printing the basic exit reason, e.g. so that a
failed VM-Entry due to invalid guest state gets recorded as
"INVALID_STATE FAILED_VMENTRY" instead of "0x80000021".
Opportunstically fix misaligned variables in the kvm_exit and
kvm_nested_vmexit_inject tracepoints.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200508235348.19427-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce generic fastpath handler to handle MSR fastpath, VMX-preemption
timer fastpath etc; move it after vmx_complete_interrupts() in order to
catch events delivered to the guest, and abort the fast path in later
patches. While at it, move the kvm_exit tracepoint so that it is printed
for fastpath vmexits as well.
There is no observed performance effect for the IPI fastpath after this patch.
Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <1588055009-12677-2-git-send-email-wanpengli@tencent.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly truncate the data written to vmcs.SYSENTER_EIP/ESP on WRMSR
if the virtual CPU doesn't support 64-bit mode. The SYSENTER address
fields in the VMCS are natural width, i.e. bits 63:32 are dropped if the
CPU doesn't support Intel 64 architectures. This behavior is visible to
the guest after a VM-Exit/VM-Exit roundtrip, e.g. if the guest sets bits
63:32 in the actual MSR.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428231025.12766-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Improve handle_external_interrupt_irqoff inline assembly in several ways:
- remove unneeded %c operand modifiers and "$" prefixes
- use %rsp instead of _ASM_SP, since we are in CONFIG_X86_64 part
- use $-16 immediate to align %rsp
- remove unneeded use of __ASM_SIZE macro
- define "ss" named operand only for X86_64
The patch introduces no functional changes.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200504155706.2516956-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The index returned by kvm_async_pf_gfn_slot() will be removed when an
async pf gfn is going to be removed. However kvm_async_pf_gfn_slot()
is not reliable in that it can return the last key it loops over even
if the gfn is not found in the async gfn array. It should never
happen, but it's still better to sanity check against that to make
sure no unexpected gfn will be removed.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155910.267514-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
hva_to_pfn_remapped() calls fixup_user_fault(), which has already
handled the retry gracefully. Even if "unlocked" is set to true, it
means that we've got a VM_FAULT_RETRY inside fixup_user_fault(),
however the page fault has already retried and we should have the pfn
set correctly. No need to do that again.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155906.267462-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Forcing the ASYNC_PF_PER_VCPU to be power of two is much easier to be
used rather than calling roundup_pow_of_two() from time to time. Do
this by adding a BUILD_BUG_ON() inside the hash function.
Another point is that generally async pf does not allow concurrency
over ASYNC_PF_PER_VCPU after all (see kvm_setup_async_pf()), so it
does not make much sense either to have it not a power of two or some
of the entries will definitely be wasted.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155859.267366-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a helper, mmu_alloc_root(), to consolidate the allocation of a root
shadow page, which has the same basic mechanics for all flavors of TDP
and shadow paging.
Note, __pa(sp->spt) doesn't need to be protected by mmu_lock, sp->spt
points at a kernel page.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428023714.31923-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace KVM's PT_PAGE_TABLE_LEVEL, PT_DIRECTORY_LEVEL and PT_PDPE_LEVEL
with the kernel's PG_LEVEL_4K, PG_LEVEL_2M and PG_LEVEL_1G. KVM's
enums are borderline impossible to remember and result in code that is
visually difficult to audit, e.g.
if (!enable_ept)
ept_lpage_level = 0;
else if (cpu_has_vmx_ept_1g_page())
ept_lpage_level = PT_PDPE_LEVEL;
else if (cpu_has_vmx_ept_2m_page())
ept_lpage_level = PT_DIRECTORY_LEVEL;
else
ept_lpage_level = PT_PAGE_TABLE_LEVEL;
versus
if (!enable_ept)
ept_lpage_level = 0;
else if (cpu_has_vmx_ept_1g_page())
ept_lpage_level = PG_LEVEL_1G;
else if (cpu_has_vmx_ept_2m_page())
ept_lpage_level = PG_LEVEL_2M;
else
ept_lpage_level = PG_LEVEL_4K;
No functional change intended.
Suggested-by: Barret Rhoden <brho@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change the PSE hugepage handling in walk_addr_generic() to fire on any
page level greater than PT_PAGE_TABLE_LEVEL, a.k.a. PG_LEVEL_4K. PSE
paging only has two levels, so "== 2" and "> 1" are functionally the
same, i.e. this is a nop.
A future patch will drop KVM's PT_*_LEVEL enums in favor of the kernel's
PG_LEVEL_* enums, at which point "walker->level == PG_LEVEL_2M" is
semantically incorrect (though still functionally ok).
No functional change intended.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use an enum for passing around the failure code for a failed VM-Enter
that results in VM-Exit to provide a level of indirection from the final
resting place of the failure code, vmcs.EXIT_QUALIFICATION. The exit
qualification field is an unsigned long, e.g. passing around
'u32 exit_qual' throws up red flags as it suggests KVM may be dropping
bits when reporting errors to L1. This is a red herring because the
only defined failure codes are 0, 2, 3, and 4, i.e. don't come remotely
close to overflowing a u32.
Setting vmcs.EXIT_QUALIFICATION on entry failure is further complicated
by the MSR load list, which returns the (1-based) entry that failed, and
the number of MSRs to load is a 32-bit VMCS field. At first blush, it
would appear that overflowing a u32 is possible, but the number of MSRs
that can be loaded is hardcapped at 4096 (limited by MSR_IA32_VMX_MISC).
In other words, there are two completely disparate types of data that
eventually get stuffed into vmcs.EXIT_QUALIFICATION, neither of which is
an 'unsigned long' in nature. This was presumably the reasoning for
switching to 'u32' when the related code was refactored in commit
ca0bde28f2 ("kvm: nVMX: Split VMCS checks from nested_vmx_run()").
Using an enum for the failure code addresses the technically-possible-
but-will-never-happen scenario where Intel defines a failure code that
doesn't fit in a 32-bit integer. The enum variables and values will
either be automatically sized (gcc 5.4 behavior) or be subjected to some
combination of truncation. The former case will simply work, while the
latter will trigger a compile-time warning unless the compiler is being
particularly unhelpful.
Separating the failure code from the failed MSR entry allows for
disassociating both from vmcs.EXIT_QUALIFICATION, which avoids the
conundrum where KVM has to choose between 'u32 exit_qual' and tracking
values as 'unsigned long' that have no business being tracked as such.
To cement the split, set vmcs12->exit_qualification directly from the
entry error code or failed MSR index instead of bouncing through a local
variable.
Opportunistically rename the variables in load_vmcs12_host_state() and
vmx_set_nested_state() to call out that they're ignored, set exit_reason
on demand on nested VM-Enter failure, and add a comment in
nested_vmx_load_msr() to call out that returning 'i + 1' can't wrap.
No functional change intended.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200511220529.11402-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>