The size of kasan_early_shadow_pte[] is currently PTRS_PER_PTE, which is
defined as 512 for arm. This means that it only covers the preceding Linux
pte entries, but not the HWTABLE pte entries for arm.
The reason it currently works is that the symbol kasan_early_shadow_page
immediately following kasan_early_shadow_pte in memory is page aligned,
which makes kasan_early_shadow_pte look like a 4KB size array. But we
can't ensure the order is always right with different compiler/linker,
or if more bss symbols are introduced.
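The size mismatch can be sketched with a little arithmetic. This is an
illustration only: the toy_* names are hypothetical, and the values (a
4-byte pte, 512 Linux entries followed by 512 hardware entries) are
assumptions chosen to match the 4KB figure mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the toy_* names are hypothetical. */
#define TOY_PTRS_PER_PTE     512u	/* Linux pte entries */
#define TOY_PTE_HWTABLE_PTRS 512u	/* hardware pte entries */

typedef uint32_t toy_pte_t;		/* assumed 4-byte pte */

/* Bytes covered by an array sized for the Linux entries only. */
static size_t toy_linux_only_bytes(void)
{
	return TOY_PTRS_PER_PTE * sizeof(toy_pte_t);
}

/* Bytes needed to cover the Linux and the hardware entries. */
static size_t toy_full_table_bytes(void)
{
	return (TOY_PTRS_PER_PTE + TOY_PTE_HWTABLE_PTRS) * sizeof(toy_pte_t);
}
```

Under these assumptions the array covers 2048 bytes while the table
occupies 4096, so the second half depends on whatever the linker places
after the array.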
We tested this with QEMU + vexpress: we put a 512KB symbol with
attribute __section(".bss..page_aligned") after kasan_early_shadow_pte
and poisoned it after kasan_early_init(). With CONFIG_KASAN enabled, the
kernel then failed to boot.
Link: https://lkml.kernel.org/r/20210109044622.8312-1-hailongliiu@yeah.net
Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
Signed-off-by: Ziliang Guo <guo.ziliang@zte.com.cn>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Boot a CONFIG_MEMCG=y kernel with "cgroup_disabled=memory" and you are
met by a series of warnings from the VM_WARN_ON_ONCE_PAGE(!memcg, page)
recently added to the inline mem_cgroup_page_lruvec().
An earlier attempt to place that warning, in mem_cgroup_lruvec(), had
been careful to do so after weeding out the mem_cgroup_disabled() case;
but was itself invalid because of the mem_cgroup_lruvec(NULL, pgdat) in
clear_pgdat_congested() and age_active_anon().
Warning in mem_cgroup_page_lruvec() was once useful in detecting a KSM
charge bug, so may be worth keeping: but skip if mem_cgroup_disabled().
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101032056260.1093@eggly.anvils
Fixes: 9a1ac2288c ("mm/memcontrol:rewrite mem_cgroup_page_lruvec()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hui Su <sh_def@163.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
acquire_slab() fails if there is contention on the freelist of the page
(probably because some other CPU is concurrently freeing an object from
the page). In that case, it might make sense to look for a different page
(since there might be more remote frees to the page from other CPUs, and
we don't want contention on struct page).
However, the current code accidentally stops looking at the partial list
completely in that case. Especially on kernels without CONFIG_NUMA set,
this means that get_partial() fails and new_slab_objects() falls back to
new_slab(), allocating new pages. This could lead to an unnecessary
increase in memory fragmentation.
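The difference between giving up and skipping can be sketched with a toy
model. This is not the slub code: struct toy_page, toy_acquire() and
toy_get_partial() are hypothetical names, and contention is modelled as a
simple flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a slab page on a partial list (hypothetical type). */
struct toy_page {
	bool contended;	/* true: acquiring its freelist would fail */
};

/* Models an acquire that fails only when the freelist is contended. */
static bool toy_acquire(const struct toy_page *page)
{
	return !page->contended;
}

/*
 * Walk the partial list and return the index of the first page that can
 * be acquired, or -1 if none was found. With stop_on_contention set, the
 * walk gives up at the first contended page; otherwise it skips that
 * page and keeps looking.
 */
static int toy_get_partial(const struct toy_page *list, size_t n,
			   bool stop_on_contention)
{
	for (size_t i = 0; i < n; i++) {
		if (toy_acquire(&list[i]))
			return (int)i;
		if (stop_on_contention)
			break;
	}
	return -1;
}

/* One contended page followed by one usable page. */
static const struct toy_page toy_list[] = { { true }, { false } };
```

With toy_list, stopping on contention finds nothing (so a caller would
fall back to allocating a new page), while skipping finds the second
page.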
Link: https://lkml.kernel.org/r/20201228130853.1871516-1-jannh@google.com
Fixes: 7ced371971 ("slub: Acquire_slab() avoid loop")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 826f328e2b ("net: dcb: Validate netlink message in DCB
handler"), Linux started rejecting RTM_GETDCB netlink messages if they
contained a set-like DCB_CMD_ command.
The reason was that privileges were only verified for RTM_SETDCB messages,
but the value that determined the action to be taken is the command, not
the message type. And validation of message type against the DCB command
was the obvious missing piece.
Unfortunately it turns out that mlnx_qos, a somewhat widely deployed tool
for configuration of DCB, accesses the DCB set-like APIs through
RTM_GETDCB.
Therefore do not bounce the discrepancy between message type and command.
Instead, in addition to validating privileges based on the actual message
type, validate them also based on the expected message type. This closes
the loophole of allowing DCB configuration on non-admin accounts, while
maintaining backward compatibility.
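The double check can be sketched as follows. This is a simplified model,
not the DCB netlink handler: the toy_* names are hypothetical, and the
privilege test is reduced to a boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_msg_type { TOY_GETDCB, TOY_SETDCB };

/* Hypothetical command table entry: the message type a command expects. */
struct toy_cmd {
	enum toy_msg_type expected_type;
};

/* In this model only set-type messages require admin rights. */
static bool toy_permitted(enum toy_msg_type type, bool is_admin)
{
	return type != TOY_SETDCB || is_admin;
}

/*
 * Accept a mismatch between message type and command, but check
 * privileges against both the actual and the expected message type.
 */
static int toy_dcb_doit(enum toy_msg_type actual, const struct toy_cmd *cmd,
			bool is_admin)
{
	if (!toy_permitted(actual, is_admin))
		return -EPERM;
	if (!toy_permitted(cmd->expected_type, is_admin))
		return -EPERM;
	return 0;
}

static const struct toy_cmd toy_set_cmd = { TOY_SETDCB };
static const struct toy_cmd toy_get_cmd = { TOY_GETDCB };
```

A set-like command sent in a get-type message still works for an admin,
which is the mlnx_qos case, but is refused for an unprivileged caller.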
Fixes: 2f90b8657e ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver")
Fixes: 826f328e2b ("net: dcb: Validate netlink message in DCB handler")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/a3edcfda0825f2aa2591801c5232f2bbf2d8a554.1610384801.git.me@pmachata.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Advance the maximum number of arguments from 9 to 15 to account for
all potential feature flags that may be supplied.
Linux 4.19 added "meta_device"
(356d9d52e1) and "recalculate"
(a3fcf72531) flags.
Commit 468dfca38b added
"sectors_per_bit" and "bitmap_flush_interval".
Commit 84597a44a9 added
"allow_discards".
And commit d537858ac8 added
"fix_padding".
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull irqchip fixes from Marc Zyngier:
- Fix the MIPS CPU interrupt controller hierarchy
- Simplify the PRUSS Kconfig entry
- Eliminate trivial build warnings on the MIPS Loongson liointc
- Fix error path in devm_platform_get_irqs_affinity()
- Turn the BCM2836 IPI irq_eoi callback into irq_ack
- Fix initialisation of on-stack msi_alloc_info
- Cleanup spurious comma in irq-sl28cpld
Link: https://lore.kernel.org/r/20210110110001.2328708-1-maz@kernel.org
Due to an integer overflow, RTC synchronization now happens every 2s
instead of the intended 11 minutes. Fix this by forcing 64-bit
arithmetic for the sync period calculation.
Annotate the other place which multiplies seconds for consistency as well.
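The overflow can be reproduced in isolation. The sketch below uses
fixed-width types to stand in for a 32-bit build; the toy_* names and the
TOY_NSEC_PER_SEC constant are illustrative, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000u

/* 32-bit arithmetic: 11 * 60 * 1e9 wraps modulo 2^32. */
static uint32_t toy_sync_period_32(void)
{
	return (uint32_t)11 * 60 * TOY_NSEC_PER_SEC;
}

/* Widening the first operand forces the whole product to 64 bits. */
static uint64_t toy_sync_period_64(void)
{
	return (uint64_t)11 * 60 * TOY_NSEC_PER_SEC;
}
```

In this model the 32-bit product wraps to 2870003712 ns, a period of a
few seconds, while the 64-bit product is the intended 660000000000 ns
(11 minutes).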
Fixes: c9e6189fb0 ("ntp: Make the RTC synchronization more reliable")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210111103956.290378-1-geert+renesas@glider.be
Empty BTFs do come up (e.g., simple kernel modules with no new types and
strings, compared to the vmlinux BTF) and there is nothing technically wrong
with them. So remove unnecessary check preventing loading empty BTFs.
Fixes: d812362450 ("libbpf: Fix BTF data layout checks and allow empty BTF")
Reported-by: Christopher William Snowhill <chris@kode54.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210110070341.1380086-2-andrii@kernel.org
Some modules don't declare any new types and end up with an empty BTF,
containing only valid BTF header and no types or strings sections. This
currently causes BTF validation error. There is nothing wrong with such BTF,
so fix the issue by allowing module BTFs with no types or strings.
Fixes: 36e68442d1 ("bpf: Load and verify kernel module BTFs")
Reported-by: Christopher William Snowhill <chris@kode54.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210110070341.1380086-1-andrii@kernel.org
pm_clk_suspend()/pm_clk_resume() are defined as NULL pointers rather than
empty inline stubs without CONFIG_PM:
drivers/clk/mmp/clk-audio.c:402:16: error: called object type 'void *' is not a function or function pointer
pm_clk_suspend(dev);
drivers/clk/mmp/clk-audio.c:411:15: error: called object type 'void *' is not a function or function pointer
pm_clk_resume(dev);
I tried redefining the helper functions, but that caused additional
problems. This is the simple solution of replacing the __maybe_unused
trick with an #ifdef.
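A reduced model of the #ifdef approach is sketched below. All toy_* and
TOY_* names are hypothetical; TOY_CONFIG_PM stands in for CONFIG_PM, and
the helper is modelled as a NULL macro when it is not set, as described
above.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_cb_t)(void);

/* Build with -DTOY_CONFIG_PM to model a CONFIG_PM=y configuration. */
#ifdef TOY_CONFIG_PM
static int toy_suspend_calls;

static int toy_pm_clk_suspend(void)
{
	toy_suspend_calls++;
	return 0;
}

/* The callback only exists when the helper is a real function. */
static int toy_runtime_suspend(void)
{
	return toy_pm_clk_suspend();
}
#define TOY_SUSPEND_CB toy_runtime_suspend
#else
/* Without PM support the helper is a NULL pointer, so it cannot be called. */
#define toy_pm_clk_suspend NULL
#define TOY_SUSPEND_CB NULL
#endif

/* Returns the suspend callback to register, or NULL if there is none. */
static toy_cb_t toy_suspend_cb(void)
{
	return TOY_SUSPEND_CB;
}
```

Because the callback body is compiled only under the #ifdef, no call
through the NULL macro is ever seen by the compiler.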
Fixes: 725262d291 ("clk: mmp2: Add audio clock controller driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210103135503.3668784-1-arnd@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
optlen == 0 indicates that the kernel should ignore BPF buffer
and use the original one from the user. We, however, forget
to free the temporary buffer that we've allocated for BPF.
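The missing free can be modelled in userspace. This is a sketch, not the
cgroup sockopt code: the toy_* names are hypothetical, and malloc()/free()
with a counter stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

static int toy_live_buffers;	/* outstanding temporary allocations */

static void *toy_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		toy_live_buffers++;
	return p;
}

static void toy_free(void *p)
{
	if (p) {
		toy_live_buffers--;
		free(p);
	}
}

/* Hypothetical context: the temporary buffer handed to the BPF program. */
struct toy_ctx {
	void *optval;
	int optlen;
};

/*
 * Pick the buffer to use after the program ran. An optlen of 0 means
 * "ignore the temporary buffer", so it is released here and the
 * caller's original buffer is used instead.
 */
static void *toy_finish_setsockopt(struct toy_ctx *ctx, void *user_buf)
{
	if (ctx->optlen == 0) {
		toy_free(ctx->optval);	/* the step that was forgotten */
		ctx->optval = NULL;
		return user_buf;
	}
	return ctx->optval;
}

/* Returns the number of leaked buffers, or -1 on a wrong buffer choice. */
static int toy_demo(void)
{
	char user_buf[8] = { 0 };
	struct toy_ctx ctx = { toy_alloc(64), 0 };
	void *used = toy_finish_setsockopt(&ctx, user_buf);

	return used == user_buf ? toy_live_buffers : -1;
}
```

toy_demo() reports no outstanding allocation once the optlen == 0 path
frees the temporary buffer.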
Fixes: d8fe449a9c ("bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE")
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210112162829.775079-1-sdf@google.com
A previous patch introduced a harmless randconfig warning:
WARNING: unmet direct dependencies detected for MXC_CLK_SCU
Depends on [n]: COMMON_CLK [=y] && ARCH_MXC [=n] && IMX_SCU [=y] && HAVE_ARM_SMCCC [=y]
Selected by [m]:
- CLK_IMX8QXP [=m] && COMMON_CLK [=y] && (ARCH_MXC [=n] && ARM64 [=y] || COMPILE_TEST [=y]) && IMX_SCU [=y] && HAVE_ARM_SMCCC [=y]
Since the symbol is now hidden and only selected by other symbols,
just remove the dependencies and require the other drivers to
get it right.
Fixes: 6247e31b75 ("clk: imx: scu: fix MXC_CLK_SCU module build break")
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201230155244.981757-1-arnd@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Sometimes, when dm-crypt executes decryption in a tasklet, we may get
"BUG: KASAN: use-after-free in tasklet_action_common.constprop..."
with a kasan-enabled kernel.
When the decryption fully completes in the tasklet, dm-crypt will call
bio_endio(), which in turn will call clone_endio() from dm.c core code. That
function frees the resources associated with the bio, including per bio private
structures. For dm-crypt it will free the current struct dm_crypt_io, which
contains our tasklet object, causing use-after-free, when the tasklet is being
dequeued by the kernel.
To avoid this, do not call bio_endio() from the current tasklet context, but
delay its execution to the dm-crypt IO workqueue.
Fixes: 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: <stable@vger.kernel.org> # v5.9+
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
With the introduction of a dynamic ZONE_DMA range based on DT or IORT
information, there's no need for CMA allocations from the wider
ZONE_DMA32 since on most platforms ZONE_DMA will cover the 32-bit
addressable range. Remove the arm64_dma32_phys_limit and set
arm64_dma_phys_limit to cover the smallest DMA range required on the
platform. CMA allocation and crashkernel reservation now go in the
dynamically sized ZONE_DMA, allowing correct functionality on RPi4.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> # On RPi4B
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
- Fix parsing of link-local IPv6 addresses
- Fix confusing logging of mount errors that was introduced by the
fsopen() patchset.
- Fix a tracing use after free in _nfs4_do_setlk()
- Layout return-on-close fixes when called from nfs4_evict_inode()
- Layout segments were being leaked in
pnfs_generic_clear_request_commit()
- Don't leak DS commits in pnfs_generic_retry_commit()
- Fix an Oopsable use-after-free when nfs_delegation_find_inode_server()
calls iput() on an inode after the super block has gone away"
* tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: nfs_igrab_and_active must first reference the superblock
NFS: nfs_delegation_find_inode_server must first reference the superblock
NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter
NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()
NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request
pNFS: Stricter ordering of layoutget and layoutreturn
pNFS: Clean up pnfs_layoutreturn_free_lsegs()
pNFS: We want return-on-close to complete when evicting the inode
pNFS: Mark layout for return if return-on-close was not sent
net: sunrpc: interpret the return value of kstrtou32 correctly
NFS: Adjust fs_context error logging
NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
Pull SCSI target fix from Martin Petersen:
"This addresses an issue in the SCSI target subsystem. A connected
initiator could specify IDs for any configured backing store device,
not just the ones explicitly made visible to the host.
The remedy is to honor the access control list when doing ID
descriptor lookups"
* tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: target: Fix XCOPY NAA identifier lookup
A system that uses a Synopsys USB host controller may go to suspend
while a USB audio player is in use. This causes the USB host controller
to continuously send interrupt signals to the system. When the number of
interrupts exceeds 100000, the system forcibly disables the interrupt
and outputs a calltrace error.
When the system goes to suspend, the last interrupt is reported to
the driver. At this time, the system has already set the state to
suspend, so the last interrupt is not processed and the interrupt flag
is not cleared. The uncleared interrupt flag constantly triggers new
interrupt events, so the driver receives more than 100000 interrupts,
which causes the system to forcibly disable interrupt reporting and
report the calltrace error.
So, when the driver goes to sleep and changes the system state to
suspend, the interrupt flag needs to be cleared.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/1610416647-45774-1-git-send-email-liulongfang@huawei.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mika writes:
thunderbolt: Fix for v5.11-rc4
This includes a single format string fix for the firmware connection
manager USB4 NVM authentication proxy implementation introduced in this
merge window.
* tag 'thunderbolt-for-v5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Drop duplicated 0x prefix from format string
Tony Luck has maintained arch/ia64 for the past 16 years, but mentioned
that he no longer has working ia64 machines, nor time to look at patches,
so he is stepping down as the maintainer.
Fenghua Yu came in as a temporary co-maintainer when Tony was on
sabbatical in 2009, but has not worked on it after that either.
This leaves the architecture as Orphaned, meaning that patches
will have to get routed through other trees from now on.
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Link: https://lore.kernel.org/lkml/20210105153603.GA17644@agluck-desk2.amr.corp.intel.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The definition of xchg() causes a harmless warning in some files, like:
In file included from ../arch/ia64/include/uapi/asm/intrinsics.h:22,
from ../arch/ia64/include/asm/intrinsics.h:11,
from ../arch/ia64/include/asm/bitops.h:19,
from ../include/linux/bitops.h:32,
from ../include/linux/kernel.h:11,
from ../fs/nfs/read.c:12:
../fs/nfs/read.c: In function 'nfs_read_completion':
../arch/ia64/include/uapi/asm/cmpxchg.h:57:2: warning: value computed is not used [-Wunused-value]
57 | ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr))))
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../fs/nfs/read.c:196:5: note: in expansion of macro 'xchg'
196 | xchg(&nfs_req_openctx(req)->error, error);
| ^~~~
Change it to a compound expression like the other architectures have
to get a clean defconfig build.
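The compound-expression form can be sketched outside the kernel. This is
not the ia64 macro: toy_xchg() is a hypothetical name built on the
GCC/Clang __atomic_exchange_n() builtin, and it relies on the GNU C
statement-expression extension.

```c
#include <assert.h>

/*
 * Statement-expression form (GNU C extension): the macro still yields
 * the old value, but callers are free to ignore it.
 */
#define toy_xchg(ptr, x) \
	({ (__typeof__(*(ptr)))__atomic_exchange_n((ptr), (x), __ATOMIC_SEQ_CST); })

static int toy_demo_xchg(void)
{
	int error = 1;
	int old;

	toy_xchg(&error, 2);		/* result deliberately ignored */
	old = toy_xchg(&error, 3);	/* result used: old value is 2 */
	return old * 10 + error;	/* 2 * 10 + 3 */
}
```

Both uses compile as ordinary statements or expressions, matching the
call in nfs_read_completion() that ignores the returned value.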
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
A cleanup patch from my legacy timer series broke ia64 and led
to RCU stall errors and a fast system clock:
[ 909.360108] INFO: task systemd-sysv-ge:200 blocked for more than 127 seconds.
[ 909.360108] Not tainted 5.10.0+ #130
[ 909.360108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 909.360108] task:systemd-sysv-ge state:D stack: 0 pid: 200 ppid: 189 flags:0x00000000
[ 909.364108]
[ 909.364108] Call Trace:
[ 909.364423] [<a00000010109b210>] __schedule+0x890/0x21e0
[ 909.364423] sp=e0000100487d7b70 bsp=e0000100487d1748
[ 909.368423] [<a00000010109cc00>] schedule+0xa0/0x240
[ 909.368423] sp=e0000100487d7b90 bsp=e0000100487d16e0
[ 909.368558] [<a00000010109ce70>] io_schedule+0x70/0xa0
[ 909.368558] sp=e0000100487d7b90 bsp=e0000100487d16c0
[ 909.372290] [<a00000010109e1c0>] bit_wait_io+0x20/0xe0
[ 909.372290] sp=e0000100487d7b90 bsp=e0000100487d1698
[ 909.374168] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 909.376290] [<a00000010109d860>] __wait_on_bit+0xc0/0x1c0
[ 909.376290] sp=e0000100487d7b90 bsp=e0000100487d1648
[ 909.374168] rcu: 3-....: (2 ticks this GP) idle=19e/1/0x4000000000000002 softirq=1581/1581 fqs=2
[ 909.374168] (detected by 0, t=5661 jiffies, g=1089, q=3)
[ 909.376290] [<a00000010109da80>] out_of_line_wait_on_bit+0x120/0x140
[ 909.376290] sp=e0000100487d7b90 bsp=e0000100487d1610
[ 909.374168] Task dump for CPU 3:
[ 909.374168] task:khungtaskd state:R running task
Revert most of my patch to make this work again, including the extra
update_process_times()/profile_tick() and the local_irq_enable() in the
loop that I expected not to be needed here.
I have not found out exactly what goes wrong, and would suggest that
someone with hardware access tries to convert this code into a singleshot
clockevent driver, which should give better behavior in all cases.
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Fixes: 2b49ddcef2 ("ia64: convert to legacy_timer_tick")
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The VT-d hardware will ignore those Addr bits which have been masked by
the AM field in the PASID-based-IOTLB invalidation descriptor. As a
result, if the starting address in the descriptor is not aligned with
the address mask, some IOTLB entries might not be invalidated. Hence
people will see errors like the ones below.
[ 1093.704661] dmar_fault: 29 callbacks suppressed
[ 1093.704664] DMAR: DRHD: handling fault status reg 3
[ 1093.712738] DMAR: [DMA Read] Request device [7a:02.0] PASID 2
fault addr 7f81c968d000 [fault reason 113]
SM: Present bit in first-level paging entry is clear
Fix this by using aligned address for PASID-based-IOTLB invalidation.
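The alignment step can be sketched as below. This is an illustration, not
the driver code: toy_align_for_mask() is a hypothetical helper, 4KB pages
are assumed, and an address-mask value of am is taken to cover 2^am
pages.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumes 4KB pages */

/*
 * Round addr down to the boundary implied by the address-mask value am,
 * where the invalidation is assumed to cover 2^am pages.
 */
static uint64_t toy_align_for_mask(uint64_t addr, unsigned int am)
{
	uint64_t align = (uint64_t)1 << (TOY_PAGE_SHIFT + am);

	return addr & ~(align - 1);
}
```

For example, with am = 2 (four pages, 16KB) the address 0x7f81c968d000
is rounded down to 0x7f81c968c000, so the descriptor address and the
masked range agree.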
Fixes: 1c4f88b7f1 ("iommu/vt-d: Shared virtual address in scalable mode")
Reported-and-tested-by: Guo Kaijie <Kaijie.Guo@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201231005323.2178523-2-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
It was found in [1] that bpf_inode_storage_get helper did not check
the nullness of the passed owner ptr which caused an oops when
dereferenced. This change incorporates the example suggested in [1] into
the local storage selftest.
The test is updated to create a temporary directory instead of just
using a tempfile. In order to replicate the issue, the copied rm binary
is renamed, triggering the inode_rename hook with a null pointer for the
new_inode. The logic to verify the setting and deletion of the inode
local storage of the old inode is also moved to this LSM hook.
The change also removes the copy_rm function and simply shells out
to copy files and recursively delete directories and consolidates the
logic of setting the initial inode storage to the bprm_committed_creds
hook and removes the file_open hook.
[1]: https://lore.kernel.org/bpf/CANaYP3HWkH91SN=wTNO9FL_2ztHfqcXKX38SSE-JJ2voh+vssw@mail.gmail.com
Suggested-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075525.256820-2-kpsingh@kernel.org
The verifier allows ARG_PTR_TO_BTF_ID helper arguments to be NULL, so
helper implementations need to check this before dereferencing them.
This was already fixed for the socket storage helpers but not for task
and inode.
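The shape of the check can be sketched with a toy helper. The names are
hypothetical (struct toy_inode, toy_inode_storage_get()); the real
helpers take more arguments and return storage from a map.

```c
#include <assert.h>
#include <stddef.h>

/* Toy owner object with some attached storage (hypothetical type). */
struct toy_inode {
	long storage;
};

/* Returns the owner's storage, or NULL when no owner was passed. */
static long *toy_inode_storage_get(struct toy_inode *inode)
{
	if (!inode)	/* the owner pointer may legitimately be NULL */
		return NULL;
	return &inode->storage;
}

static struct toy_inode toy_old_inode = { 42 };
```

A NULL owner now yields a NULL result instead of a dereference, while a
valid owner behaves as before.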
The issue can be reproduced by attaching an LSM program to
inode_rename hook (called when moving files) which tries to get the
inode of the new file without checking for its nullness and then trying
to move an existing file to a new path:
mv existing_file new_file_does_not_exist
The report including the sample program and the steps for reproducing
the bug:
https://lore.kernel.org/bpf/CANaYP3HWkH91SN=wTNO9FL_2ztHfqcXKX38SSE-JJ2voh+vssw@mail.gmail.com
Fixes: 4cf1bc1f10 ("bpf: Implement task local storage")
Fixes: 8ea636848a ("bpf: Implement bpf_local_storage for inodes")
Reported-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075525.256820-3-kpsingh@kernel.org