Defer APICv updates that occur while L2 is active until nested VM-Exit,
i.e. until L1 regains control. vmx_refresh_apicv_exec_ctrl() assumes L1
is active and (a) stomps all over vmcs02 and (b) neglects to ever updated
vmcs01. E.g. if vmcs12 doesn't enable the TPR shadow for L2 (and thus no
APICv controls), L1 performs nested VM-Enter APICv inhibited, and APICv
becomes unhibited while L2 is active, KVM will set various APICv controls
in vmcs02 and trigger a failed VM-Entry. The kicker is that, unless
running with nested_early_check=1, KVM blames L1 and chaos ensues.
In all cases, ignoring vmcs02 and always deferring the inhibition change
to vmcs01 is correct (or at least acceptable). The ABSENT and DISABLE
inhibitions cannot truly change while L2 is active (see below).
IRQ_BLOCKING can change, but it is firmly a best effort debug feature.
Furthermore, only L2's APIC is accelerated/virtualized to the full extent
possible, e.g. even if L1 passes through its APIC to L2, normal MMIO/MSR
interception will apply to the virtual APIC managed by KVM.
The exception is the SELF_IPI register when x2APIC is enabled, but that's
an acceptable hole.
Lastly, Hyper-V's Auto EOI can technically be toggled if L1 exposes the
MSRs to L2, but for that to work in any sane capacity, L1 would need to
pass through IRQs to L2 as well, and IRQs must be intercepted to enable
virtual interrupt delivery. I.e. exposing Auto EOI to L2 and enabling
VID for L2 are, for all intents and purposes, mutually exclusive.
Lack of dynamic toggling is also why this scenario is all but impossible
to encounter in KVM's current form. But a future patch will pend an
APICv update request _during_ vCPU creation to plug a race where a vCPU
that's being created doesn't get included in the "all vCPUs request"
because it's not yet visible to other vCPUs. If userspaces restores L2
after VM creation (hello, KVM selftests), the first KVM_RUN will occur
while L2 is active and thus service the APICv update request made during
VM creation.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220420013732.3308816-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Set the DISABLE inhibit, not the ABSENT inhibit, if APICv is disabled via
module param. A recent refactoring to add a wrapper for setting/clearing
inhibits unintentionally changed the flag, probably due to a copy+paste
goof.
Fixes: 4f4c4a3ee5 ("KVM: x86: Trace all APICv inhibit changes and capture overall status")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220420013732.3308816-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Initialize debugfs_entry to its semi-magical -ENOENT value when the VM
is created. KVM's teardown when VM creation fails is kludgy and calls
kvm_uevent_notify_change() and kvm_destroy_vm_debugfs() even if KVM never
attempted kvm_create_vm_debugfs(). Because debugfs_entry is zero
initialized, the IS_ERR() checks pass and KVM derefs a NULL pointer.
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1068b1067 P4D 1068b1067 PUD 1068b0067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 871 Comm: repro Not tainted 5.18.0-rc1+ #825
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:__dentry_path+0x7b/0x130
Call Trace:
<TASK>
dentry_path_raw+0x42/0x70
kvm_uevent_notify_change.part.0+0x10c/0x200 [kvm]
kvm_put_kvm+0x63/0x2b0 [kvm]
kvm_dev_ioctl+0x43a/0x920 [kvm]
__x64_sys_ioctl+0x83/0xb0
do_syscall_64+0x31/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
Modules linked in: kvm_intel kvm irqbypass
Fixes: a44a4cc1c9 ("KVM: Don't create VM debugfs files outside of the VM directory")
Cc: stable@vger.kernel.org
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@google.com>
Reported-by: syzbot+df6fbbd2ee39f21289ef@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20220415004622.2207751-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add wrappers to acquire/release KVM's SRCU lock when stashing the index
in vcpu->src_idx, along with rudimentary detection of illegal usage,
e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx. Because the
SRCU index is (currently) either 0 or 1, illegal nesting bugs can go
unnoticed for quite some time and only cause problems when the nested
lock happens to get a different index.
Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will
likely yell so loudly that it will bring the kernel to its knees.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220415004343.2203171-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the generic kvm_vcpu's srcu_idx instead of using an indentical field
in RISC-V's version of kvm_vcpu_arch. Generic KVM very intentionally
does not touch vcpu->srcu_idx, i.e. there's zero chance of running afoul
of common code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220415004343.2203171-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Don't re-acquire SRCU in complete_emulated_io() now that KVM acquires the
lock in kvm_arch_vcpu_ioctl_run(). More importantly, don't overwrite
vcpu->srcu_idx. If the index acquired by complete_emulated_io() differs
from the one acquired by kvm_arch_vcpu_ioctl_run(), KVM will effectively
leak a lock and hang if/when synchronize_srcu() is invoked for the
relevant grace period.
Fixes: 8d25b7beca ("KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220415004343.2203171-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Remove 's' & 'u' as valid ISA extension
- Restrict the extensions that can be disabled
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmJhdVEACgkQrUjsVaLH
LAdpZQ/+JKICZ9bimd25lNZSDecvVpMApg6FYmNyYFoMT917+VaNIQIQJt54NZk5
9yeP8eBdoTLEzj48Ib0EYHmBCXEve2Tzj8pQ9FiP34YWCHK+zxytagGoRcSaqsXH
VObLiezwzNknBg6on8gzTh7woxW6EE+KQ7N+gxarVizFBUG4dicSGfDL8fXr7W2g
/mJjs1zJOGD02rz9X/02PHMWXFTL1QEQH5HZHCi5NV2Xh8nFiJUcgEjx9eaweY0V
56cDRGEj7/P+QI3Ulr6f5Vy897aGNr1JR9XHEMCQ2qkgeOtinRYSp7w008x5FcCE
1PbfZAr51J4B4VbN+sbgQiX64hRyZJ46SPZIrvYp9mJWfY5FALweI51ikzVzkLbW
TWLNXTM8QofzUjtNmGr7by4Fr0/5XYwS2TTZ6ScQEREg/rMLg0blG3F5AnlVFVF9
TUFeNjaAnenjMRE+tfNSF2BF/K4u43gFkVx/FTL2cFPIXrUwuRvveZ1YCls9zVFt
4xM0EpVAsaYVsnzW9qxDyU7Fs0RInTvSmt+2lu0ZxhWdF4P/NzuRogIJ0M8j58d6
BDcHH77yb+tSDjMTo68112WvEWVLYK1upDYh8RBdRitccrcgflibQDjwgLjf97iP
gRXkt5klHSElAnFZ7yFNKKzXd1tF2TPU8QNvu+2z2T4OMefifY4=
=RKx8
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-fixes-5.18-2' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 5.18, take #2
- Remove 's' & 'u' as valid ISA extension
- Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
The "read_bhrb" global symbol is only called under CONFIG_PPC64 of
arch/powerpc/perf/core-book3s.c but it is compiled for both 32 and 64 bit
anyway (and LLVM fails to link this on 32bit).
This fixes it by moving bhrb.o to obj64 targets.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220421025756.571995-1-aik@ozlabs.ru
When scheduling a group of events, there are constraint checks done to
make sure all events can go in a group. Example, one of the criteria is
that events in a group cannot use the same PMC. But platform specific
PMU supports alternative event for some of the event codes. During
perf_event_open(), if any event group doesn't match constraint check
criteria, further lookup is done to find alternative event.
By current design, the array of alternatives events in PMU code is
expected to be sorted by column 0. This is because in
find_alternative() the return criteria is based on event code
comparison. ie. "event < ev_alt[i][0])". This optimisation is there
since find_alternative() can be called multiple times. In power10 PMU
code, the alternative event array is not sorted properly and hence there
is breakage in finding alternative event.
To work with existing logic, fix the alternative event array to be
sorted by column 0 for power10-pmu.c
Results:
In case where an alternative event is not chosen when we could, events
will be multiplexed. ie, time sliced where it could actually run
concurrently.
Example, in power10 PM_INST_CMPL_ALT(0x00002) has alternative event,
PM_INST_CMPL(0x500fa). Without the fix, if a group of events with PMC1
to PMC4 is used along with PM_INST_CMPL_ALT, it will be time sliced
since all programmable PMC's are consumed already. But with the fix,
when it picks alternative event on PMC5, all events will run
concurrently.
Before:
# perf stat -e r00002,r100fc,r200fa,r300fc,r400fc
Performance counter stats for 'system wide':
328668935 r00002 (79.94%)
56501024 r100fc (79.95%)
49564238 r200fa (79.95%)
376 r300fc (80.19%)
660 r400fc (79.97%)
4.039150522 seconds time elapsed
With the fix, since alternative event is chosen to run on PMC6, events
will be run concurrently.
After:
# perf stat -e r00002,r100fc,r200fa,r300fc,r400fc
Performance counter stats for 'system wide':
23596607 r00002
4907738 r100fc
2283608 r200fa
135 r300fc
248 r400fc
1.664671390 seconds time elapsed
Fixes: a64e697cef ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220419114828.89843-2-atrajeev@linux.vnet.ibm.com
When scheduling a group of events, there are constraint checks done to
make sure all events can go in a group. Example, one of the criteria is
that events in a group cannot use the same PMC. But platform specific
PMU supports alternative event for some of the event codes. During
perf_event_open(), if any event group doesn't match constraint check
criteria, further lookup is done to find alternative event.
By current design, the array of alternatives events in PMU code is
expected to be sorted by column 0. This is because in
find_alternative() the return criteria is based on event code
comparison. ie. "event < ev_alt[i][0])". This optimisation is there
since find_alternative() can be called multiple times. In power9 PMU
code, the alternative event array is not sorted properly and hence there
is breakage in finding alternative events.
To work with existing logic, fix the alternative event array to be
sorted by column 0 for power9-pmu.c
Results:
With alternative events, multiplexing can be avoided. That is, for
example, in power9 PM_LD_MISS_L1 (0x3e054) has alternative event,
PM_LD_MISS_L1_ALT (0x400f0). This is an identical event which can be
programmed in a different PMC.
Before:
# perf stat -e r3e054,r300fc
Performance counter stats for 'system wide':
1057860 r3e054 (50.21%)
379 r300fc (49.79%)
0.944329741 seconds time elapsed
Since both the events are using PMC3 in this case, they are
multiplexed here.
After:
# perf stat -e r3e054,r300fc
Performance counter stats for 'system wide':
1006948 r3e054
182 r300fc
Fixes: 91e0bd1e62 ("powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220419114828.89843-1-atrajeev@linux.vnet.ibm.com
This reverts commit 461fa7b0ac.
We are missing some inter dependencies here so re-introduce the lock
until we have figured out what's missing. Just drop/retake it while
adding dependencies.
v2: still drop the lock while adding dependencies
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> (v1)
Fixes: 461fa7b0ac ("drm/amdgpu: remove ctx->lock")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220419110633.166236-1-christian.koenig@amd.com
There is a deadlock in rr_close(), which is shown below:
(Thread 1) | (Thread 2)
| rr_open()
rr_close() | add_timer()
spin_lock_irqsave() //(1) | (wait a time)
... | rr_timer()
del_timer_sync() | spin_lock_irqsave() //(2)
(wait timer to stop) | ...
We hold rrpriv->lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need rrpriv->lock in position (2) of thread 2.
As a result, rr_close() will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220417125519.82618-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On HP EliteBook 845 G9 and EliteBook 865 G9, the audio LEDs can be enabled by
ALC285_FIXUP_HP_MUTE_LED. So use it accordingly.
Signed-off-by: Andy Chi <andy.chi@canonical.com>
Fixes: 07bcab9394 ("ALSA: hda/realtek: Add support for HP Laptops")
Link: https://lore.kernel.org/r/20220421063606.39772-1-andy.chi@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit '80253168dbfd ("drm: of: Lookup if child node has panel or
bridge")' attempted to simplify the case of expressing a simple panel
under a DSI controller, by assuming that the first non-graph child node
was a panel or bridge.
Unfortunately for non-trivial cases the first child node might not be a
panel or bridge. Examples of this can be a aux-bus in the case of
DisplayPort, or an opp-table represented before the panel node.
In these cases the reverted commit prevents the caller from ever finding
a reference to the panel.
This reverts commit '80253168dbfd ("drm: of: Lookup if child node has
panel or bridge")', in favor of using an explicit graph reference to the
panel in the trivial case as well.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220420231230.58499-2-bjorn.andersson@linaro.org
Commit '80253168dbfd ("drm: of: Lookup if child node has panel or
bridge")' introduced the ability to describe a panel under a display
controller without having to use a graph to connect the controller to
its single child panel (or bridge).
The implementation of this would find the first non-graph node and
attempt to acquire the related panel or bridge. This prevents cases
where any other child node, such as a aux bus for a DisplayPort
controller, or an opp-table to find the referenced panel.
Commit '67bae5f28c89 ("drm: of: Properly try all possible cases for
bridge/panel detection")' attempted to solve this problem by not
bypassing the graph reference lookup before attempting to find the panel
or bridge.
While this does solve the case where a proper graph reference is
present, it does not allow the caller to distinguish between a
yet-to-be-probed panel or bridge and the absence of a reference to a
panel.
One such case is a DisplayPort controller that on some boards have an
explicitly described reference to a panel, but on others have a
discoverable DisplayPort display attached (which doesn't need to be
expressed in DeviceTree).
This reverts commit '67bae5f28c89 ("drm: of: Properly try all possible
cases for bridge/panel detection")', as a step towards reverting commit
'80253168dbfd ("drm: of: Lookup if child node has panel or bridge")'.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220420231230.58499-1-bjorn.andersson@linaro.org
If the device is already in a runtime PM enabled state
pm_runtime_get_sync() will return 1.
Also, we need to call pm_runtime_put_noidle() when pm_runtime_get_sync()
fails, so use pm_runtime_resume_and_get() instead. this function
will handle this.
Fixes: 4078f57571 ("drm/vc4: Add DSI driver")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220420135008.2757-1-linmq006@gmail.com
The LoPAPR spec defines a guest visible IOMMU with a variable page size.
Currently QEMU advertises 4K, 64K, 2M, 16MB pages, a Linux VM picks
the biggest (16MB). In the case of a passed though PCI device, there is
a hardware IOMMU which does not support all pages sizes from the above -
P8 cannot do 2MB and P9 cannot do 16MB. So for each emulated
16M IOMMU page we may create several smaller mappings ("TCEs") in
the hardware IOMMU.
The code wrongly uses the emulated TCE index instead of hardware TCE
index in error handling. The problem is easier to see on POWER8 with
multi-level TCE tables (when only the first level is preallocated)
as hash mode uses real mode TCE hypercalls handlers.
The kernel starts using indirect tables when VMs get bigger than 128GB
(depends on the max page order).
The very first real mode hcall is going to fail with H_TOO_HARD as
in the real mode we cannot allocate memory for TCEs (we can in the virtual
mode) but on the way out the code attempts to clear hardware TCEs using
emulated TCE indexes which corrupts random kernel memory because
it_offset==1<<59 is subtracted from those indexes and the resulting index
is out of the TCE table bounds.
This fixes kvmppc_clear_tce() to use the correct TCE indexes.
While at it, this fixes TCE cache invalidation which uses emulated TCE
indexes instead of the hardware ones. This went unnoticed as 64bit DMA
is used these days and VMs map all RAM in one go and only then do DMA
and this is when the TCE cache gets populated.
Potentially this could slow down mapping, however normally 16MB
emulated pages are backed by 64K hardware pages so it is one write to
the "TCE Kill" per 256 updates which is not that bad considering the size
of the cache (1024 TCEs or so).
Fixes: ca1fc489cf ("KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220420050840.328223-1-aik@ozlabs.ru
This is a partial revert of commit 0faf20a1ad ("powerpc/64s/interrupt:
Don't enable MSR[EE] in irq handlers unless perf is in use").
Prior to that commit, we always set the decrementer in
timer_interrupt(), to clear the timer interrupt. Otherwise we could end
up continuously taking timer interrupts.
When high res timers are enabled there is no problem seen with leaving
the decrementer untouched in timer_interrupt(), because it will be
programmed via hrtimer_interrupt() -> tick_program_event() ->
clockevents_program_event() -> decrementer_set_next_event().
However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we
see a stall/lockup, because tick_nohz_handler() does not cause a
reprogram of the decrementer, leading to endless timer interrupts.
Example trace:
[ 1.898617][ T7] Freeing initrd memory: 2624K^M
[ 22.680919][ C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:^M
[ 22.682281][ C1] rcu: 0-....: (25 ticks this GP) idle=073/0/0x1 softirq=10/16 fqs=1050 ^M
[ 22.682851][ C1] (detected by 1, t=2102 jiffies, g=-1179, q=476)^M
[ 22.683649][ C1] Sending NMI from CPU 1 to CPUs 0:^M
[ 22.685252][ C0] NMI backtrace for cpu 0^M
[ 22.685649][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc2-00185-g0faf20a1ad16 #145^M
[ 22.686393][ C0] NIP: c000000000016d64 LR: c000000000f6cca4 CTR: c00000000019c6e0^M
[ 22.686774][ C0] REGS: c000000002833590 TRAP: 0500 Not tainted (5.16.0-rc2-00185-g0faf20a1ad16)^M
[ 22.687222][ C0] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24000222 XER: 00000000^M
[ 22.688297][ C0] CFAR: c00000000000c854 IRQMASK: 0 ^M
...
[ 22.692637][ C0] NIP [c000000000016d64] arch_local_irq_restore+0x174/0x250^M
[ 22.694443][ C0] LR [c000000000f6cca4] __do_softirq+0xe4/0x3dc^M
[ 22.695762][ C0] Call Trace:^M
[ 22.696050][ C0] [c000000002833830] [c000000000f6cc80] __do_softirq+0xc0/0x3dc (unreliable)^M
[ 22.697377][ C0] [c000000002833920] [c000000000151508] __irq_exit_rcu+0xd8/0x130^M
[ 22.698739][ C0] [c000000002833950] [c000000000151730] irq_exit+0x20/0x40^M
[ 22.699938][ C0] [c000000002833970] [c000000000027f40] timer_interrupt+0x270/0x460^M
[ 22.701119][ C0] [c0000000028339d0] [c0000000000099a8] decrementer_common_virt+0x208/0x210^M
Possibly this should be fixed in the lowres timing code, but that would
be a generic change and could take some time and may not backport
easily, so for now make the programming of the decrementer unconditional
again in timer_interrupt() to avoid the stall/lockup.
Fixes: 0faf20a1ad ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20220420141657.771442-1-mpe@ellerman.id.au
because the copychunk_write might cover a region of the file that has not yet
been sent to the server and thus fail.
A simple way to reproduce this is:
truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile
the issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with
the 'fcollapse 16k 24k' due to write-back caching.
fcollapse is implemented in cifs.ko as a SMB2 IOCTL(COPYCHUNK_WRITE) call
and it will fail serverside since the file is still 0b in size serverside
until the writes have been destaged.
To avoid this we must ensure that we destage any unwritten data to the
server before calling COPYCHUNK_WRITE.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
TCP_Server_Info::origin_fullpath and TCP_Server_Info::leaf_fullpath
are protected by refpath_lock mutex and not cifs_tcp_ses_lock
spinlock.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
v2: Add the last part of the ref count fix which was spotted by
Philipp Sieweck where the ref count of cpu writers is off due to
ERESTARTSYS or EBUSY during bo waits.
The initial GEM port broke refcounting on shareable (prime) surfaces and
memory evictions. The prime surfaces broke because the parent surfaces
weren't increasing the ref count on GEM surfaces, which meant that
the memory backing textures could have been deleted while the texture
was still accessible. The evictions broke due to a typo, the code was
supposed to exit if the passed buffers were not vmw_buffer_object
not if they were. They're tied because the evictions depend on having
memory to actually evict.
This fixes crashes with XA state tracker which is used for xrender
acceleration on xf86-video-vmware, apps/tests which use a lot of
memory (a good test being the piglit's streaming-texture-leak) and
desktops.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Fixes: 8afa13a058 ("drm/vmwgfx: Implement DRIVER_GEM")
Reported-by: Philipp Sieweck <psi@informatik.uni-kiel.de>
Cc: <stable@vger.kernel.org> # v5.17+
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220420040328.1007409-1-zack@kde.org
kmap() is being deprecated and these usages are all local to the thread
so there is no reason kmap_local_page() can't be used.
Replace kmap() calls with kmap_local_page().
Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220419234328.10346-1-eng.alaamohamedsoliman.am@gmail.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
- fix patching CPU selection in patch_text
- fix potential deadlock in ISS platform serial driver
- fix potential register clobbering in coprocessor exception handler
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAmJbgwQTHGpjbXZia2Jj
QGdtYWlsLmNvbQAKCRBR+cyR+D+gRPYwD/9/aVrsMKPkJsxUQDI/A2fX4ZTcM5fj
cnBsKvKGc+CpKwtWDK5YGAlLLGskBicRSUXqqiNNYapUUx6R9nwyRzEz2srKUa+P
V/4gUnKCCVUyBiTMf1gYa9srO+KdrgY2460uDZWCvPsA1PTLotrQQQcZR4lkIKeT
RvHjflEAY5f1l0A96WoH4F3xSNqrDdDInB4QpqwmZ2oWP4qjO9AA1RY25HFsvWVz
xGErCrPKlJeHR6N5q1N0IzJYJdJeT2lKIMi5kwepLurNnn60+8KL9eQwFFd546fz
EHuzcThg8c1q/jOhaOabkp0HWtWxv5dZiWzKmYnGXnrtjZwDQFk4PFdrHqYyY6OU
LKsyV/bQDsDmzxT+oU1ylWC/buUsY3ulI30yfESWGXZzPw3iQDimO9O0jEJJ//W9
FNl6hU+KvVcr5bedYV6UIMrQeAXhGpUzpJL9qebuJmR/66sjiwZJGIYVB5RViF9o
eVutlfkDByQsKY4ATJZyaj+8/p6Px2KNaOdMI4rzphrzpsQGAgUg7bNEgDt+dWMr
vFPBcj+Ad8u1CQbBn6hmdovLutFdT1mmO/ePWepslcavhRQf4AnN7KMY7A8LIoiF
dgegaICNrNA3XRAHp1UkBUQoGgqn++xvVYRZ8biHgjnBF8sVfythooc3tjHT+C6V
RSyz+S+8Yiuh0w==
=zDe5
-----END PGP SIGNATURE-----
Merge tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fixes from Max Filippov:
- fix patching CPU selection in patch_text
- fix potential deadlock in ISS platform serial driver
- fix potential register clobbering in coprocessor exception handler
* tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix a7 clobbering in coprocessor context load/store
arch: xtensa: platforms: Fix deadlock in rs_close()
xtensa: patch_text: Fixup last cpu should be master
- Fix use-after-free of the on-stack z_erofs_decompressqueue;
- Fix sysfs documentation Sphinx warnings.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYl2A0REceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBEe7AQCh7aNhYEncBnoHvrB276HCP0xmMwmc0gPq
6UeSWgarpAEArgfgLyo8tFgS+gvXbhB8/P1FqEMwaRZVLWathXID3QU=
=GFAS
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"One patch to fix a use-after-free race related to the on-stack
z_erofs_decompressqueue, which happens very rarely but needs to be
fixed properly soon.
The other patch fixes some sysfs Sphinx warnings"
* tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors
erofs: fix use-after-free of on-stack io[]
This reverts commit 5a519c8fe4.
It turns out that making the pipe almost arbitrarily large has some
rather unexpected downsides. The kernel test robot reports a kernel
warning that is due to pipe->max_usage now growing to the point where
the iter_file_splice_write() buffer allocation can no longer be
satisfied as a slab allocation, and the
int nbufs = pipe->max_usage;
struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
GFP_KERNEL);
code sequence there will now always fail as a result.
That code could be modified to use kvcalloc() too, but I feel very
uncomfortable making those kinds of changes for a very niche use case
that really should have other options than make these kinds of
fundamental changes to pipe behavior.
Maybe the CRIU process dumping should be multi-threaded, and use
multiple pipes and multiple cores, rather than try to use one larger
pipe to minimize splice() calls.
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The first "if" condition in __memcpy_flushcache is supposed to align the
"dest" variable to 8 bytes and copy data up to this alignment. However,
this condition may misbehave if "size" is greater than 4GiB.
The statement min_t(unsigned, size, ALIGN(dest, 8) - dest); casts both
arguments to unsigned int and selects the smaller one. However, the
cast truncates high bits in "size" and it results in misbehavior.
For example:
suppose that size == 0x100000001, dest == 0x200000002
min_t(unsigned, size, ALIGN(dest, 8) - dest) == min_t(0x1, 0xe) == 0x1;
...
dest += 0x1;
so we copy just one byte "and" dest remains unaligned.
This patch fixes the bug by replacing unsigned with size_t.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The test verifies that packets are correctly flooded by the bridge and
the VXLAN device by matching on the encapsulated packets at the other
end. However, if packets other than those generated by the test also
ingress the bridge (e.g., MLD packets), they will be flooded as well and
interfere with the expected count.
Make the test more robust by making sure that only the packets generated
by the test can ingress the bridge. Drop all the rest using tc filters
on the egress of 'br0' and 'h1'.
In the software data path, the problem can be solved by matching on the
inner destination MAC or dropping unwanted packets at the egress of the
VXLAN device, but this is not currently supported by mlxsw.
Fixes: d01724dd2a ("selftests: mlxsw: spectrum-2: Add a test for VxLAN flooding with IPv6")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test verifies that packets are correctly flooded by the bridge and
the VXLAN device by matching on the encapsulated packets at the other
end. However, if packets other than those generated by the test also
ingress the bridge (e.g., MLD packets), they will be flooded as well and
interfere with the expected count.
Make the test more robust by making sure that only the packets generated
by the test can ingress the bridge. Drop all the rest using tc filters
on the egress of 'br0' and 'h1'.
In the software data path, the problem can be solved by matching on the
inner destination MAC or dropping unwanted packets at the egress of the
VXLAN device, but this is not currently supported by mlxsw.
Fixes: 94d302deae ("selftests: mlxsw: Add a test for VxLAN flooding")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a rawmidi output stream is closed, it calls the drain at first,
then does trigger-off only when the drain returns -ERESTARTSYS as a
fallback. It implies that each driver should turn off the stream
properly after the drain. Meanwhile, USB-audio MIDI interface didn't
change the port->active flag after the drain. This may leave the
output work picking up the port that is closed right now, which
eventually leads to a use-after-free for the already released rawmidi
object.
This patch fixes the bug by properly clearing the port->active flag
after the output drain.
Reported-by: syzbot+70e777a39907d6d5fd0a@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/00000000000011555605dceaff03@google.com
Link: https://lore.kernel.org/r/20220420130247.22062-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add the minItems for interrupts property as well. In the absence of
this, we get warning if interrupts are less than 13
arch/arm64/boot/dts/qcom/qrb5165-rb5.dtb:
dma-controller@800000: interrupts: [[0, 588, 4], [0, 589, 4], [0, 590,
4], [0, 591, 4], [0, 592, 4], [0, 593, 4], [0, 594, 4], [0, 595, 4], [0,
596, 4], [0, 597, 4]] is too short
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220414064235.1182195-1-vkoul@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add a Bug section, indicating preferred mailing method for bug reports,
to NFC Subsystem entry.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the device shows up as read-only configuration, skip the clearing of the
state as the context must be preserved for device re-enable after being
disabled.
Fixes: 0dcfe41e9a ("dmanegine: idxd: cleanup all device related bits after disabling device")
Reported-by: Tony Zhu <tony.zhu@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/164971479479.2200566.13980022473526292759.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Block wq_max_transfer_size_store() when the device is configured as
read-only and not configurable.
Fixes: d7aad5550e ("dmaengine: idxd: add support for configurable max wq xfer size")
Reported-by: Bernice Zhang <bernice.zhang@intel.com>
Tested-by: Bernice Zhang <bernice.zhang@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/164971488154.2200913.10706665404118545941.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Block wq_max_batch_size_store() when the device is configured as read-only
and not configurable.
Fixes: e7184b159d ("dmaengine: idxd: add support for configurable max wq batch size")
Reported-by: Bernice Zhang <bernice.zhang@intel.com>
Tested-by: Bernice Zhang <bernice.zhang@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/164971493551.2201159.1942042593642155209.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
When retries is compared to wq->enqcmds_retries each loop of idxd_enqcmds(),
wq->enqcmds_retries can potentially changed by user. Assign the value
of retries to wq->enqcmds_retries during initialization so it is the
original value set when entering the function.
Fixes: 7930d85535 ("dmaengine: idxd: add knob for enqcmds retries")
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/165031760154.3658664.1983547716619266558.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
wq->enqcmds_retries is defined as unsigned int. However, retries on the
stack is defined as int. Change retries to unsigned int to compare the same
type.
Fixes: 7930d85535 ("dmaengine: idxd: add knob for enqcmds retries")
Suggested-by: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/165031747059.3658198.6035308204505664375.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The init_systime() may be invoked in atomic state. We have observed the
following call trace when running "phc_ctl /dev/ptp0 set" on a Intel
Agilex board.
BUG: sleeping function called from invalid context at drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c:74
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 381, name: phc_ctl
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
Preemption disabled at:
[<ffff80000892ef78>] stmmac_set_time+0x34/0x8c
CPU: 2 PID: 381 Comm: phc_ctl Not tainted 5.18.0-rc2-next-20220414-yocto-standard+ #567
Hardware name: SoCFPGA Agilex SoCDK (DT)
Call trace:
dump_backtrace.part.0+0xc4/0xd0
show_stack+0x24/0x40
dump_stack_lvl+0x7c/0xa0
dump_stack+0x18/0x34
__might_resched+0x154/0x1c0
__might_sleep+0x58/0x90
init_systime+0x78/0x120
stmmac_set_time+0x64/0x8c
ptp_clock_settime+0x60/0x9c
pc_clock_settime+0x6c/0xc0
__arm64_sys_clock_settime+0x88/0xf0
invoke_syscall+0x5c/0x130
el0_svc_common.constprop.0+0x4c/0x100
do_el0_svc+0x7c/0xa0
el0_svc+0x58/0xcc
el0t_64_sync_handler+0xa4/0x130
el0t_64_sync+0x18c/0x190
So we should use readl_poll_timeout_atomic() here instead of
readl_poll_timeout().
Also adjust the delay time to 10us to fix a "__bad_udelay" build error
reported by "kernel test robot <lkp@intel.com>". I have tested this on
Intel Agilex and NXP S32G boards, there is no delay needed at all.
So the 10us delay should be long enough for most cases.
Fixes: ff8ed73786 ("net: stmmac: use readl_poll_timeout() function in init_systime()")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let's describe this sysctl.
Fixes: 5cbf777cfd ("route: add support for directed broadcast forwarding")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the config isa register allows us to disable all allowed
single letter ISA extensions. It shouldn't be the case as vmm shouldn't
be able to disable base extensions (imac).
These extensions should always be enabled as long as they are enabled
in the host ISA.
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Fixes: 92ad82002c ("RISC-V: KVM: Implement
KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls")
If CONFIG_DRM_VC4=y, CONFIG_RASPBERRYPI_FIRMWARE=m, CONFIG_COMPILE_TEST=n,
bulding fails:
drivers/gpu/drm/vc4/vc4_drv.o: In function `vc4_drm_bind':
vc4_drv.c:(.text+0x320): undefined reference to `rpi_firmware_get'
vc4_drv.c:(.text+0x320): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `rpi_firmware_get'
vc4_drv.c:(.text+0x34c): undefined reference to `rpi_firmware_property'
vc4_drv.c:(.text+0x34c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `rpi_firmware_property'
vc4_drv.c:(.text+0x354): undefined reference to `rpi_firmware_put'
vc4_drv.c:(.text+0x354): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `rpi_firmware_put'
Make DRM_VC4 depends on RASPBERRYPI_FIRMWARE || (COMPILE_TEST && !RASPBERRYPI_FIRMWARE) to fix this.
Fixes: c406ad5e4a ("drm/vc4: Notify the firmware when DRM is in charge")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220411024325.3968413-1-zhengbin13@huawei.com
The panel has a prepare call which is before video starts, and an
enable call which is after.
The Toshiba bridge should be configured before video, so move
the relevant power and initialisation calls to prepare.
Fixes: 2f733d6194 ("drm/panel: Add support for the Raspberry Pi 7" Touchscreen.")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220415162513.42190-3-stefan.wahren@i2se.com
If a call to rpi_touchscreen_i2c_write from rpi_touchscreen_probe
fails before mipi_dsi_device_register_full is called, then
in trying to log the error message if uses ts->dsi->dev when
it is still NULL.
Use ts->i2c->dev instead, which is initialised earlier in probe.
Fixes: 2f733d6194 ("drm/panel: Add support for the Raspberry Pi 7" Touchscreen.")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220415162513.42190-2-stefan.wahren@i2se.com
There are no ISA extension defined as 's' & 'u' in RISC-V specifications.
The misa register defines 's' & 'u' bit as Supervisor/User privilege mode
enabled. But it should not appear in the ISA extension in the device tree.
Remove those from the allowed ISA extension for kvm.
Fixes: a33c72faf2 ("RISC-V: KVM: Implement VCPU create, init and
destroy functions")
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
If any of the PSR2 checks after intel_psr2_sel_fetch_config_valid()
fails, enable_psr2_sel_fetch will be kept enabled causing problems
in the functions that only checks for it and not for has_psr2.
So here moving the check that do not depend on enable_psr2_sel_fetch
and for the remaning ones jumping to a section that unset
enable_psr2_sel_fetch in case of failure to support PSR2.
Fixes: 6e43e276b8 ("drm/i915: Initial implementation of PSR2 selective fetch")
Cc: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220414151118.21980-1-jose.souza@intel.com
(cherry picked from commit 554ae8dce1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Add <asm/smp.h> for cpuid_to_hartid_map etc.
This is needed for both SMP and non-SMP builds, but not having it
causes a build error for non-SMP:
drivers/cpuidle/cpuidle-riscv-sbi.c: In function 'sbi_cpuidle_init_cpu':
drivers/cpuidle/cpuidle-riscv-sbi.c:350:26: error: implicit declaration of function 'cpuid_to_hartid_map' [-Werror=implicit-function-declaration]
Fixes: 6abf32f1d9 ("cpuidle: Add RISC-V SBI CPU idle driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
From the perspective of Intel cstate residency counters,
SAPPHIRERAPIDS_X is the same as ICELAKE_X.
Share the code with it. And update the comments for SAPPHIRERAPIDS_X.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/20220415104520.2737004-1-rui.zhang@intel.com
This problem can be reproduced with CONFIG_PERF_USE_VMALLOC enabled on
both x86_64 and aarch64 arch when using sysdig -B(using ebpf)[1].
sysdig -B works fine after rebuilding the kernel with
CONFIG_PERF_USE_VMALLOC disabled.
I tracked it down to the if condition event->rb->nr_pages != nr_pages
in perf_mmap is true when CONFIG_PERF_USE_VMALLOC is enabled where
event->rb->nr_pages = 1 and nr_pages = 2048 resulting perf_mmap to
return -EINVAL. This is because when CONFIG_PERF_USE_VMALLOC is
enabled, rb->nr_pages is always equal to 1.
Arch with CONFIG_PERF_USE_VMALLOC enabled by default:
arc/arm/csky/mips/sh/sparc/xtensa
Arch with CONFIG_PERF_USE_VMALLOC disabled by default:
x86_64/aarch64/...
Fix this problem by using data_page_nr()
[1] https://github.com/draios/sysdig
Fixes: 906010b213 ("perf_event: Provide vmalloc() based mmap() backing")
Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220209145417.6495-1-xiezhipeng1@huawei.com