io_link_timeout_fn() removes REQ_F_LINK_TIMEOUT from the link head's
flags, it's not atomic and may race with what the head is doing.
If io_link_timeout_fn() doesn't clear the flag, as forced by this patch,
then it may happen that for "req -> link_timeout1 -> link_timeout2",
__io_kill_linked_timeout() would find link_timeout2 and try to cancel
it, so miscounting references. Teach it to ignore such double timeouts
by marking the active one with a new flag in io_prep_linked_timeout().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move INIT_HLIST_NODE(&req->hash_node) into __io_arm_poll_handler(), so
that it doesn't duplicated and common poll code would be responsible for
it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_poll_task_handler() doesn't add clarity, inline it in its only user.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_poll_add_prep() doesn't need to verify ->file because it's already
done in io_init_req().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
ctx->cached_cq_overflow is changed only under completion_lock. Convert
it from atomic_t to just int, and mark all places when it's read without
lock with READ_ONCE, which guarantees atomicity (relaxed ordering).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Inline io_fail_links() and kill extra io_cqring_ev_posted().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't take an identity on personality/creds init only to drop it a few
lines after. Extract a function which prepares req->work but leaves it
without identity.
Note: it's safe to not check REQ_F_WORK_INITIALIZED there because it's
nobody had a chance to init it before io_init_req().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use IO_WQ_WORK_CREDS to figure out if req has creds to be used.
Since recently it should rely only on flags, but not value of
work.creds.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A break is not needed if it is preceded by a return.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201019173846.1021-1-trix@redhat.com
On Tigerlake, we are seeing a repeat of commit d8f5053117 ("drm/i915/icl:
Forcibly evict stale csb entries") where, presumably, due to a missing
Global Observation Point synchronisation, the write pointer of the CSB
ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
we see the GPU report more context-switch entries for us to parse, but
those entries have not been written, leading us to process stale events,
and eventually report a hung GPU.
However, this effect appears to be much more severe than we previously
saw on Icelake (though it might be best if we try the same approach
there as well and measure), and Bruce suggested the good idea of resetting
the CSB entry after use so that we can detect when it has been updated by
the GPU. By instrumenting how long that may be, we can set a reliable
upper bound for how long we should wait for:
513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
References: d8f5053117 ("drm/i915/icl: Forcibly evict stale csb entries")
References: HSDES#22011327657, HSDES#1508287568
Suggested-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk
(cherry picked from commit 233c1ae3c8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
A CSB entry is 64b, and it is simpler for us to treat it as an array of
64b entries than as an array of pairs of 32b entries.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk
(cherry picked from commit f24a44e52f)
(cherry picked from commit 3d4dbe0e0f0d04ebcea917b7279586817da8cf46)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Commits
ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
8570978ea0 ("x86/boot/compressed/64: Don't pre-map memory in KASLR code")
set up a new page table in the decompressor stub, but without explicit
mappings for boot_params and the kernel command line, relying on the #PF
handler instead.
This is fragile, as boot_params and the command line mappings are
required for the main kernel. If EARLY_PRINTK and RANDOMIZE_BASE are
disabled, a QEMU/OVMF boot never accesses the command line in the
decompressor stub, and so it never gets mapped. The main kernel accesses
it from the identity mapping if AMD_MEM_ENCRYPT is enabled, and will
crash.
Fix this by adding back the explicit mapping of boot_params and the
command line.
Note: the changes also removed the explicit mapping of the main kernel,
with the result that .bss and .brk may not be in the identity mapping,
but those don't get accessed by the main kernel before it switches to
its own page tables.
[ bp: Pass boot_params with a MOV %rsp... instead of PUSH/POP. Use
block formatting for the comment. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20201016200404.1615994-1-nivedita@alum.mit.edu
perf_event.h has macros that define the field offsets in the
data_src bitmask in perf records. The SNOOPX and REMOTE offsets
were both 37. These are distinct fields, and the bitfield layout
in perf_mem_data_src confirms that SNOOPX should be at offset 38.
Fixes: 52839e653b ("perf tools: Add support for printing new mem_info encodings")
Signed-off-by: Al Grant <al.grant@foss.arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/4ac9f5cc-4388-b34a-9999-418a4099415d@foss.arm.com
We may try to preempt the currently executing request, only to find that
after unravelling all the dependencies that the original executing
context is still the earliest in the topological sort and re-submitted
back to HW (if we do detect some change in the ELSP that requires
re-submission). However, due to the way we check for wrap-around during
the unravelling, we mark any context that has been submitted just once
(i.e. with the rq->wa_tail set, but the ring->tail earlier) as
potentially wrapping and requiring a forced restore on resubmission.
This was expected to be not a problem, as it was anticipated that most
unwinding for preemption would result in a context switch and the few
that did not would be lost in the noise. It did not take long for
someone to find one particular workload where the cost of those extra
context restores was measurable.
However, since we know the wa_tail is of fixed size, and we know that a
request must be larger than the wa_tail itself, we can safely maintain
the check for request wrapping and check against a slightly future point
in the ring that includes an expected wa_tail. (That is if the
ring->tail is already set to rq->wa_tail, including another 8 bytes in
the check does not invalidate the incremental wrap detection.)
Fixes: 8ab3a3812a ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
(cherry picked from commit bb65548e3c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.
Inserting a small delay between the processing of CS events and
submitting the next context, prevents the hang. Naturally it does not
occur with debugging enabled. The suspicion then is that this is related
to the issues with the CS event buffer, and inserting an mmio read of
the CS pointer status appears to be very successful in preventing the
hang. Other registers, or uncached reads, or plain mb, do not prevent
the hang, suggesting that register is key -- but that the hang can be
prevented by a simple udelay, suggests it is just a timing issue like
that encountered by commit 233c1ae3c8 ("drm/i915/gt: Wait for CSB
entries on Tigerlake"). Also note that the hang is not prevented by
applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
(cherry picked from commit 6ca7217dff)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld noted that on more recent systems (such as the parser for
gen9) we may have objects that are larger than expected by the GEM uAPI
(i.e. greater than u32). These objects would have incorrect implicit
batch lengths, causing the parser to reject them for being incomplete,
or worse.
Based on a patch by Matthew Auld.
Reported-by: Matthew Auld <matthew.auld@intel.com>
Fixes: 435e8fc059 ("drm/i915: Allow parsing of unsized batches")
Testcase: igt/gem_exec_params/larger-than-life-batch
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20201015115954.871-1-chris@chris-wilson.co.uk
(cherry picked from commit 57b2d834bf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Currently we leave the cache_level of the initial fb obj
set to NONE. This means on eLLC machines the first pin_to_display()
will try to switch it to WT which requires a vma unbind+bind.
If that happens during the fbdev initialization rcu does not
seem operational which causes the unbind to get stuck. To
most appearances this looks like a dead machine on boot.
Avoid the unbind by already marking the object cache_level
as WT when creating it. We still do an excplicit ggtt pin
which will rewrite the PTEs anyway, so they will match whatever
cache level we set.
Cc: <stable@vger.kernel.org> # v5.7+
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
(cherry picked from commit d46b60a2e8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In order to avoid functional breakage of mis-programmed applications that
have grown to depend on unused MOCS entries, we are programming
those entries to be equal to fully cached ("L3 + LLC") entry.
These reserved and unspecified entries should not be used as they may be
changed to less performant variants with better coherency in the future
if more entries are needed.
v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for
programming default value, setting I915_MOCS_PTE index of tgl_mocs_table
with desired value.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Mathew Alwin <alwin.mathew@intel.com>
Cc: Mcguire Russell W <russell.w.mcguire@intel.com>
Cc: Spruit Neil R <neil.r.spruit@intel.com>
Cc: Zhou Cheng <cheng.zhou@intel.com>
Cc: Benemelis Mike G <mike.g.benemelis@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com
Cc: stable@vger.kernel.org
(cherry picked from commit 4d8a5cfe3b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In commit 7994672309 ("drm/i915: Assume 100% brightness when not in
DPCD control mode"), we fixed the brightness level when DPCD control was
not active to max brightness. This is as good as we can guess since most
backlights go on full when uncontrolled.
However in doing so we changed the semantics of the initial
'backlight.enabled' value. At least on Pixelbooks, they were relying
on the brightness level in DP_EDP_BACKLIGHT_BRIGHTNESS_MSB to be 0 on
boot such that enabled would be false. This causes the device to be
enabled when the brightness is set. Without this, brightness control
doesn't work. So by changing brightness to max, we also flipped enabled
to be true on boot.
To fix this, make enabled a function of brightness and backlight control
mechanism.
Fixes: 7994672309 ("drm/i915: Assume 100% brightness when not in DPCD control mode")
Cc: Lyude Paul <lyude@redhat.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Kevin Chowski <chowski@chromium.org>>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200918002845.32766-1-sean@poorly.run
(cherry picked from commit 4ade8f31c2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Fix an inverted flag for intercepting x2APIC MSRs and intercept writes
by default, even when APICV is enabled.
Fixes: 3eb900173c ("KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied")
Co-developed-by: Peter Xu <peterx@redhat.com>
[sean: added changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201005195532.8674-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recently, in commit 6735b4632d ("Fonts: Support FONT_EXTRA_WORDS macros
for built-in fonts"), we wrapped each of our built-in data buffers in a
`font_data` structure, in order to use the following macros on them, see
include/linux/font.h:
#define REFCOUNT(fd) (((int *)(fd))[-1])
#define FNTSIZE(fd) (((int *)(fd))[-2])
#define FNTCHARCNT(fd) (((int *)(fd))[-3])
#define FNTSUM(fd) (((int *)(fd))[-4])
#define FONT_EXTRA_WORDS 4
Do the same thing to our new 6x8 font. For built-in fonts, currently we
only use FNTSIZE(). Since this is only a temporary solution for an
out-of-bounds issue in the framebuffer layer (see commit 5af0864079
("fbcon: Fix global-out-of-bounds read in fbcon_get_font()")), all the
three other fields are intentionally set to zero in order to discourage
using these negative-indexing macros.
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/926453876c92caac34cba8545716a491754d04d5.1603037079.git.yepeilin.cs@gmail.com
We have the raw cached freq to reduce the chance in calling cpufreq
driver where it could be costly in some arch/SoC.
Currently, the raw cached freq is reset in sugov_update_single() when
it avoids frequency reduction (which is not desirable sometimes), but
it is better to restore the previous value of it in that case,
because it may not change in the next cycle and it is not necessary
to change the CPU frequency then.
Adapted from https://android-review.googlesource.com/1352810/
Signed-off-by: Wei Wang <wvw@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject edit and changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Mikhail reported a lockdep spat detailing how __zram_bvec_read() and
__zram_bvec_write() use zstrm->lock and zspage->lock in opposite order.
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
acpi-cpufreq has a old quirk that overrides the _PSD table supplied by
BIOS on AMD CPUs. However the _PSD table of new AMD CPUs (Family 19h+)
now accurately reports the P-state dependency of CPU cores. Hence this
quirk needs to be fixed in order to support new CPUs' frequency control.
Fixes: acd3162482 ("acpi-cpufreq: Add quirk to disable _PSD usage on all AMD CPUs")
Signed-off-by: Wei Huang <wei.huang2@amd.com>
[ rjw: Subject edit ]
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
commit 021a24460d ("block: add QUEUE_FLAG_NOWAIT") adds a new helper
function blk_queue_nowait() to check if the bdev supports handling of
REQ_NOWAIT or not. Since then bio-based dm device can also support
REQ_NOWAIT, and currently only dm-linear supports that since
commit 6abc49468e ("dm: add support for REQ_NOWAIT and enable it for
linear target").
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The eventfd context is used as our irqbypass token, therefore if an
eventfd is re-used, our token is the same. The irqbypass code will
return an -EBUSY in this case, but we'll still attempt to unregister
the producer, where if that duplicate token still exists, results in
removing the wrong object. Clear the token of failed producers so
that they harmlessly fall out when unregistered.
Fixes: 6d7425f109 ("vfio: Register/unregister irq_bypass_producer")
Reported-by: guomin chen <guomin_chen@sina.com>
Tested-by: guomin chen <guomin_chen@sina.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The vfio_fsl_mc_reflck_attach function may return, on success path,
an uninitialized variable. Fix the problem by initializing the return
variable to 0.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: f2ba7e8c94 ("vfio/fsl-mc: Added lock support in preparation for interrupt handling")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Type casts needed on 32-bit systems are missing in two places in the
GPE register access code, so add them.
Fixes: 7a8379eb41 ("ACPICA: Add support for using logical addresses of GPE blocks")
Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit c40aaaac10 ("iommu/vt-d: Gracefully handle DMAR units
with no supported address widths") dmar.c needs struct iommu_device to
be selected. We can drop this dependency by not dereferencing struct
iommu_device if IOMMU_API is not selected and by reusing the information
stored in iommu->drhd->ignored instead.
This fixes the following build error when IOMMU_API is not selected:
drivers/iommu/intel/dmar.c: In function ‘free_iommu’:
drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no member named ‘ops’
1139 | if (intel_iommu_enabled && iommu->iommu.ops) {
^
Fixes: c40aaaac10 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20201013073055.11262-1-brgl@bgdev.pl
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Even though we use self removing sysfs helper, we still need
to make sure we do the final kobject delete conditionally.
sysfs_remove_file_self() will handle parallel calls to remove
the sysfs attribute file and returns true only in the caller
that removed the attribute file. The other parallel callers
are returned false. Do the final kobject delete checking
the return value of sysfs_remove_file_self().
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201017164236.264713-1-hegdevasant@linux.vnet.ibm.com
Every dump reported by OPAL is exported to userspace through a sysfs
interface and notified using kobject_uevent(). The userspace daemon
(opal_errd) then reads the dump and acknowledges that the dump is
saved safely to disk. Once acknowledged the kernel removes the
respective sysfs file entry causing respective resources to be
released including kobject.
However it's possible the userspace daemon may already be scanning
dump entries when a new sysfs dump entry is created by the kernel.
User daemon may read this new entry and ack it even before kernel can
notify userspace about it through kobject_uevent() call. If that
happens then we have a potential race between
dump_ack_store->kobject_put() and kobject_uevent which can lead to
use-after-free of a kernfs object resulting in a kernel crash.
This patch fixes this race by protecting the sysfs file
creation/notification by holding a reference count on kobject until we
safely send kobject_uevent().
The function create_dump_obj() returns the dump object which if used
by caller function will end up in use-after-free problem again.
However, the return value of create_dump_obj() function isn't being
used today and there is no need as well. Hence change it to return
void to make this fix complete.
Fixes: c7e64b9ce0 ("powerpc/powernv Platform dump interface")
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201017164210.264619-1-hegdevasant@linux.vnet.ibm.com
On 64-bit, the startup_64_setup_env() function added in
866b556efa ("x86/head/64: Install startup GDT")
has stack protection enabled because of set_bringup_idt_handler().
This happens when CONFIG_STACKPROTECTOR_STRONG is enabled. It
also currently needs CONFIG_AMD_MEM_ENCRYPT enabled because then
set_bringup_idt_handler() is not an empty stub but that might change in
the future, when the other vendor adds their similar technology.
At this point, %gs is not yet initialized, and this doesn't cause a
crash only because the #PF handler from the decompressor stub is still
installed and handles the page fault.
Disable stack protection for the whole file, and do it on 32-bit as
well to avoid surprises.
[ bp: Extend commit message with the exact explanation how it happens. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20201008191623.2881677-6-nivedita@alum.mit.edu
Commit
ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
started using a new set of pagetables even without KASLR.
After that commit, initialize_identity_maps() is called before the
5-level paging variables are setup in choose_random_location(), which
will not work if 5-level paging is actually enabled.
Fix this by moving the initialization of __pgtable_l5_enabled,
pgdir_shift and ptrs_per_p4d into cleanup_trampoline(), which is called
immediately after the finalization of whether the kernel is executing
with 4- or 5-level paging. This will be earlier than anything that might
require those variables, and keeps the 4- vs 5-level paging code all in
one place.
Fixes: ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20201010191110.4060905-1-nivedita@alum.mit.edu
Qian Cai reported a regression where CPU Hotplug fails with the latest
powerpc/next
BUG: sleeping function called from invalid context at mm/slab.h:494
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88
no locks held by swapper/88/0.
irq event stamp: 18074448
hardirqs last enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110
hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0
do_idle at kernel/sched/idle.c:253 (discriminator 1)
softirqs last enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0
softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0
CPU: 88 PID: 0 Comm: swapper/88 Tainted: G W 5.9.0-rc8-next-20201007 #1
Call Trace:
[c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable)
[c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310
[c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190
[c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0
slab_alloc_node at mm/slub.c:2817
(inlined by) __kmalloc_node at mm/slub.c:4013
[c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80
kmalloc_node at include/linux/slab.h:577
(inlined by) alloc_cpumask_var_node at lib/cpumask.c:116
[c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800
update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267
(inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387
(inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420
[c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14
Allocating a temporary mask while performing a CPU Hotplug operation
with CONFIG_CPUMASK_OFFSTACK enabled, leads to calling a sleepable
function from a atomic context. Fix this by allocating the temporary
mask with GFP_ATOMIC flag. Also instead of having to allocate twice,
allocate the mask in the caller so that we only have to allocate once.
If the allocation fails, assume the mask to be same as sibling mask, which
will make the scheduler to drop this domain for this CPU.
Fixes: 70a94089d7 ("powerpc/smp: Optimize update_coregroup_mask")
Fixes: 3ab33d6dc3 ("powerpc/smp: Optimize update_mask_by_l2")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201019042716.106234-3-srikar@linux.vnet.ibm.com
Commit 3ab33d6dc3 ("powerpc/smp: Optimize update_mask_by_l2")
introduced submask_fn in update_mask_by_l2 to track the right submask.
However commit f6606cfdfb ("powerpc/smp: Dont assume l2-cache to be
superset of sibling") introduced sibling_mask in update_mask_by_l2 to
track the same submask. Remove sibling_mask in favour of submask_fn.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201019042716.106234-2-srikar@linux.vnet.ibm.com
This Kunit update for Linux 5.10-rc1 consists of:
- add Kunit to kernel_init() and remove KUnit from init calls entirely.
This addresses the concern Kunit would not work correctly during
late init phase.
- add a linker section where KUnit can put references to its test suites.
This patch is the first step in transitioning to dispatching all KUnit
tests from a centralized executor rather than having each as its own
separate late_initcall.
- add a centralized executor to dispatch tests rather than relying on
late_initcall to schedule each test suite separately. Centralized
execution is for built-in tests only; modules will execute tests when
loaded.
- convert bitfield test to use KUnit framework
- Documentation updates for naming guidelines and how kunit_test_suite()
works.
- add test plan to KUnit TAP format
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl+Mr68ACgkQCwJExA0N
Qxy7HxAAuToPP6uUHwTC3KzVVE4hjP9a3t4hiD7kP/gI0umN+2nrccm6Vx6E+r9t
Jkjiv9Yxj3riOkE5jJ8KriAx228mwz3N1yBEDfpp+8iCWOK3iOuFKKTTWOoZY4hf
Enlf7n4Yp2TOEmIH0xwh/H67zl0+3FwT3fGWC6DDPXHuw+X+mGphCl9XPB70rZcT
q/s0dwx1CmWBm30MgFXN+SZ7CgLP13lRAvkVO4t56/O1SkTbpCe7U1zqT2p5UoOY
x7qvzs3pdCaWbpCsAqFWr46iECDHuVQjIgLuddOF/OgWVcCZlv7T7ESd7IDPHUPx
DD3zYG0ODV0jKZHmpwkSojSbu3z6v5FnfhLpAcaHoEMBeRu5UIar7EjPHwqrqiU7
JqE7dBECmcD308sr9u0w44DK15nmsD3+njrBQ/AJmsWdg0wtnMvA01nAHKObbk0n
33aIu4Iny1dH35/rt9dV2DKT09f5r0ANCjoJMX8gu/li66FHGfULOaqr6KLLqi5X
VPgHCKzyT9nD+Bc2LYzRWmhhAj+5Iwyglgpe9ZiOlPQ5i+hLvfPPAZxVYSbVA1Sk
aVZi+ibKUqHSBfXcaLf/OKX7Csf4zni3F+WfFT5ZIC4Y6iEF+0tvS2HW2/pcUAN/
OSPYYmyqhwYIl8tvbQENgBsyU/K1rECxJpqWAznJLRCebkY5a/s=
=0Sco
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more Kunit updates from Shuah Khan:
- add Kunit to kernel_init() and remove KUnit from init calls entirely.
This addresses the concern that Kunit would not work correctly during
late init phase.
- add a linker section where KUnit can put references to its test
suites.
This is the first step in transitioning to dispatching all KUnit
tests from a centralized executor rather than having each as its own
separate late_initcall.
- add a centralized executor to dispatch tests rather than relying on
late_initcall to schedule each test suite separately. Centralized
execution is for built-in tests only; modules will execute tests when
loaded.
- convert bitfield test to use KUnit framework
- Documentation updates for naming guidelines and how
kunit_test_suite() works.
- add test plan to KUnit TAP format
* tag 'linux-kselftest-kunit-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
lib: kunit: Fix compilation test when using TEST_BIT_FIELD_COMPILE
lib: kunit: add bitfield test conversion to KUnit
Documentation: kunit: add a brief blurb about kunit_test_suite
kunit: test: add test plan to KUnit TAP format
init: main: add KUnit to kernel init
kunit: test: create a single centralized executor for all tests
vmlinux.lds.h: add linker section for KUnit test suites
Documentation: kunit: Add naming guidelines
Pull RCU changes from Ingo Molnar:
- Debugging for smp_call_function()
- RT raw/non-raw lock ordering fixes
- Strict grace periods for KASAN
- New smp_call_function() torture test
- Torture-test updates
- Documentation updates
- Miscellaneous fixes
[ This doesn't actually pull the tag - I've dropped the last merge from
the RCU branch due to questions about the series. - Linus ]
* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
smp: Make symbol 'csd_bug_count' static
kernel/smp: Provide CSD lock timeout diagnostics
smp: Add source and destination CPUs to __call_single_data
rcu: Shrink each possible cpu krcp
rcu/segcblist: Prevent useless GP start if no CBs to accelerate
torture: Add gdb support
rcutorture: Allow pointer leaks to test diagnostic code
rcutorture: Hoist OOM registry up one level
refperf: Avoid null pointer dereference when buf fails to allocate
rcutorture: Properly synchronize with OOM notifier
rcutorture: Properly set rcu_fwds for OOM handling
torture: Add kvm.sh --help and update help message
rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
torture: Update initrd documentation
rcutorture: Replace HTTP links with HTTPS ones
locktorture: Make function torture_percpu_rwsem_init() static
torture: document --allcpus argument added to the kvm.sh script
rcutorture: Output number of elapsed grace periods
rcutorture: Remove KCSAN stubs
rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
...
conversion of dt-bindings to json-schema
- mediatek: fix platform_get_irq error handling
- bcm: convert tasklets to use new tasklet_setup api
- core: fix race cause by hrtimer starting inappropriately
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6EwehDt/SOnwFyTyf9lkf8eYP5UFAl+KOnQACgkQf9lkf8eY
P5VgEQ/+I6YNUGGotegZsloTcHwfMwpuAuMMOfy+yZpe3iQPWcR4ZIIXqnieAKSG
Z+y8gWurIq5P46JKAnoCt+TyyllGZVMo+/ioaXCGutx085OEDKMPj6v5XKrPNRQ/
5CqOSfLiD8Zpr6BYjEQgMgxA/qr2shDRNsbiXtk3u3ObgCaCKnXSLbto6TOnoI2n
vB7H1HTjj7Yr24Tkxnebc1KdjhJX2VJYTLftCeRODT/3CeUk4uJDao20jf6TxAZw
ZR7j2kMcbEHQ3tqsSIy3xoukbGK2CktHeQJGqetGR1CK6TQsg7sNFT0WeF8ZwL1w
kX5rqDg8dneIpsN53K+NZCE+uUGYHDV8uKNQN80ZhzUkD1gnk680TPiSDFz+A0T3
4adwxEFJqRyn5ar3rGPFRT4V8wgsNVoGZNE0ExFB1C87HLOHedsln5AH7HXcq7yo
vICLuV9F6ETrHvb2mRcq/2RaG7c7njMhJ0bopoG8epzMKyosppwvaCuNJgC38T8D
nS0zpv0mGG9Ti5FODFyv4vHAKmBjGLBpVNGbaVGQO2RziOYsOHJ5xq+jHzRbdfkd
UGmozyHPYfVzC2wLCW/oLifdZ6+cisRT3jSMoYvxRx2UxGbKvVHOveVhspWXf66v
HTNOOFOASsvHODj8dt80LWLYSI1fTTkOZH+xkWipphISRKOtMkg=
=OcQc
-----END PGP SIGNATURE-----
Merge tag 'mailbox-v5.10' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
- arm: implementation of mhu as a doorbell driver and conversion of
dt-bindings to json-schema
- mediatek: fix platform_get_irq error handling
- bcm: convert tasklets to use new tasklet_setup api
- core: fix race cause by hrtimer starting inappropriately
* tag 'mailbox-v5.10' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: avoid timer start from callback
maiblox: mediatek: Fix handling of platform_get_irq() error
mailbox: arm_mhu: Add ARM MHU doorbell driver
mailbox: arm_mhu: Match only if compatible is "arm,mhu"
dt-bindings: mailbox: add doorbell support to ARM MHU
dt-bindings: mailbox : arm,mhu: Convert to Json-schema
mailbox: bcm: convert tasklets to use new tasklet_setup() API
Ian reports that after upgrade from v5.8.14 to v5.9 only one
of his 4 ixgbe netdevs appear in the system.
Quoting the comment on ixgbe_x550em_a_has_mii():
* Returns true if hw points to lowest numbered PCI B:D.F x550_em_a device in
* the SoC. There are up to 4 MACs sharing a single MDIO bus on the x550em_a,
* but we only want to register one MDIO bus.
This matches the symptoms, since the return value from
ixgbe_mii_bus_init() is no longer ignored we need to handle
the higher ports of x550em without an error.
Fixes: 09ef193fef ("net: ethernet: ixgbe: check the return value of ixgbe_mii_bus_init()")
Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20201016232006.3352947-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In set_ethernet_addr(), if get_registers() succeeds, the ethernet address
that was read must be copied over. Otherwise, a random ethernet address
must be assigned.
get_registers() returns 0 if successful, and negative error number
otherwise. However, in set_ethernet_addr(), this return value is
incorrectly checked.
Since this return value will never be equal to sizeof(node_id), a
random MAC address will always be generated and assigned to the
device; even in cases when get_registers() is successful.
Correctly modifying the condition that checks if get_registers() was
successful or not fixes this problem, and copies the ethernet address
appropriately.
Fixes: b2a0f274e3 ("net: rtl8150: Use the new usb control message API.")
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Link: https://lore.kernel.org/r/20201011173030.141582-1-anant.thazhemadam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When 'rp_filter' is configured in strict mode (1) the tests fail because
packets received from the macvlan netdevs would not be forwarded through
them on the reverse path.
Fix this by disabling the 'rp_filter', meaning no source validation is
performed.
Fixes: 1538812e08 ("selftests: forwarding: Add a test for VXLAN asymmetric routing")
Fixes: 438a4f5665 ("selftests: forwarding: Add a test for VXLAN symmetric routing")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20201015084525.135121-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>