When kdump is not configured, a Hyper-V VM might still respond to
network traffic after a kernel panic when kernel parameter panic=0.
The panic CPU goes into an infinite loop with interrupts enabled,
and the VMbus driver interrupt handler still works because the
VMbus connection is unloaded only in the kdump path. The network
responses make the other end of the connection think the VM is
still functional even though it has panic'ed, which could affect any
failover actions that should be taken.
Fix this by unloading the VMbus connection during the panic process.
vmbus_initiate_unload() could then be called twice (e.g., by
hyperv_panic_event() and hv_crash_handler(), so reset the connection
state in vmbus_initiate_unload() to ensure the unload is done only
once.
Fixes: 81b18bce48 ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20200406155331.2105-2-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
After changing the timing between GTT updates and execution on the GPU,
we started seeing sporadic failures on Ironlake. These were narrowed
down to being an insufficiently strong enough barrier/delay after
updating the GTT and scheduling execution on the GPU. By forcing the
uncached read, and adding the missing barrier for the singular
insert_page (relocation paths), the sporadic failures go away.
Fixes: 983d308cb8 ("agp/intel: Serialise after GTT updates")
Fixes: 3497971a71 ("agp/intel: Flush chipset writes after updating a single PTE")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Cc: stable@vger.kernel.org # v4.0+
Link: https://patchwork.freedesktop.org/patch/msgid/20200410083535.25464-1-chris@chris-wilson.co.uk
We can't reliably wait in io_ring_ctx_wait_and_kill(), since the
task_works list isn't ordered (in fact it's LIFO ordered). We could
either fix this with a separate task_works list for io_uring work, or
just punt the wait-and-free to async context. This ensures that
task_work that comes in while we're shutting down is processed
correctly. If we don't go async, we could have work past the fput()
work for the ring that depends on work that won't be executed until
after we're done with the wait-and-free. But as this operation is
blocking, it'll never get a chance to run.
This was reproduced with hundreds of thousands of sockets running
memcached, haven't been able to reproduce this synthetically.
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Daniel Borkmann says:
====================
pull-request: bpf 2020-04-10
The following pull-request contains BPF updates for your *net* tree.
We've added 13 non-merge commits during the last 7 day(s) which contain
a total of 13 files changed, 137 insertions(+), 43 deletions(-).
The main changes are:
1) JIT code emission fixes for riscv and arm32, from Luke Nelson and Xi Wang.
2) Disable vmlinux BTF info if GCC_PLUGIN_RANDSTRUCT is used, from Slava Bacherikov.
3) Fix oob write in AF_XDP when meta data is used, from Li RongQing.
4) Fix bpf_get_link_xdp_id() handling on single prog when flags are specified,
from Andrey Ignatov.
5) Fix sk_assign() BPF helper for request sockets that can have sk_reuseport
field uninitialized, from Joe Stringer.
6) Fix mprotect() test case for the BPF LSM, from KP Singh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This sequence is recommended by PMFW team for the baco reset
with PMFW reloaded. And it seems able to address the random
failure seen on Arcturus.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When "ubifs: introduce UBIFS_ATIME_SUPPORT to ubifs" introduced atime
support to ubifs, it also added lazytime support. As far as I can tell
the lazytime support is terminally broken, as it causes
mark_inode_dirty_sync to be called from __writeback_single_inode, which
will then trigger the locking assert in ubifs_dirty_inode. Just remove
the broken lazytime support for now, it can be added back later,
especially as some infrastructure changes should make that easier soon.
Fixes: 8c1c5f2638 ("ubifs: introduce UBIFS_ATIME_SUPPORT to ubifs")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Commit a0900d0195 ("mtd: spi-nor: Prepare core / manufacturer code
split") moved various files into a new directory, but did not add the new
directory to its parent directory Makefile. The moved files no longer
build, and affected flash chips no longer instantiate.
Adding the new directory to the parent directory Makefile fixes the
problem.
Fixes: a0900d0195 ("mtd: spi-nor: Prepare core / manufacturer code split")
Cc: Boris Brezillon <bbrezillon@kernel.org>
Cc: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull module updates from Jessica Yu:
"Only a small cleanup this time around: a trivial conversion of
zero-length arrays to flexible arrays"
* tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
kernel: module: Replace zero-length array with flexible-array member
`removed_sized` isn't correctly initialized (as the doc comment
suggests) on memory allocation failures. Fix by moving initialization up
a bit.
Fixes: 0c47383ba3 ("kernfs: Add option to enable user xattrs")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Tejun Heo <tj@kernel.org>
This reverts commit 9a9e97b2f1 ("cgroup: Add memory barriers to plug
cgroup_rstat_updated() race window").
The commit was added in anticipation of memcg rstat conversion which needed
synchronous accounting for the event counters (e.g. oom kill count). However,
the conversion didn't get merged due to percpu memory overhead concern which
couldn't be addressed at the time.
Unfortunately, the patch's addition of smp_mb() to cgroup_rstat_updated()
meant that every scheduling event now had to go through an additional full
barrier and Mel Gorman noticed it as 1% regression in netperf UDP_STREAM test.
There's no need to have this barrier in tree now and even if we need
synchronous accounting in the future, the right thing to do is separating that
out to a separate function so that hot paths which don't care about
synchronous behavior don't have to pay the overhead of the full barrier. Let's
revert.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Link: http://lkml.kernel.org/r/20200409154413.GK3818@techsingularity.net
Cc: v4.18+
The noisy posix error message in readdir was supposed
to be an FYI (not enabled by default)
CIFS VFS: XXX dev 66306, reparse 0, mode 755
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Pull arm64 fixes from Catalin Marinas:
- Ensure that the compiler and linker versions are aligned so that ld
doesn't complain about not understanding a .note.gnu.property section
(emitted when pointer authentication is enabled).
- Force -mbranch-protection=none when the feature is not enabled, in
case a compiler may choose a different default value.
- Remove CONFIG_DEBUG_ALIGN_RODATA. It was never in defconfig and
rarely enabled.
- Fix checking 16-bit Thumb-2 instructions checking mask in the
emulation of the SETEND instruction (it could match the bottom half
of a 32-bit Thumb-2 instruction).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
arm64: remove CONFIG_DEBUG_ALIGN_RODATA feature
arm64: Always force a branch protection mode when the compiler has one
arm64: Kconfig: ptrauth: Add binutils version check to fix mismatch
init/kconfig: Add LD_VERSION Kconfig
Pull more powerpc updates from Michael Ellerman:
"The bulk of this is the series to make CONFIG_COMPAT user-selectable,
it's been around for a long time but was blocked behind the
syscall-in-C series.
Plus there's also a few fixes and other minor things.
Summary:
- A fix for a crash in machine check handling on pseries (ie. guests)
- A small series to make it possible to disable CONFIG_COMPAT, and
turn it off by default for ppc64le where it's not used.
- A few other miscellaneous fixes and small improvements.
Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann,
Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven,
Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek,
Nicholas Piggin, Stephen Boyd, Wen Xiong"
* tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Always build the tm-poison test 64-bit
powerpc: Improve ppc_save_regs()
Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory
powerpc/perf: split callchain.c by bitness
powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
powerpc/64: make buildable without CONFIG_COMPAT
powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
powerpc/perf: consolidate read_user_stack_32
powerpc: move common register copy functions from signal_32.c to signal.c
powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig
powerpc/ps3: Remove an unneeded NULL check
powerpc/ps3: Remove duplicate error message
powerpc/powernv: Re-enable imc trace-mode in kernel
powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
powerpc/pseries: Fix MCE handling on pseries
selftests/eeh: Skip ahci adapters
powerpc/64s: Fix doorbell wakeup msgclr optimisation
Pull m68knommu update from Greg Ungerer:
"Only a single commit, to remove all use of the obsolete setup_irq()
calls within the m68knommu architecture code"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: Replace setup_irq() by request_irq()
The error could indicate a problem with the Bluetooth device. It
is easier to investigate if the packet's actual length gets logged,
not just the fact that a discrepancy occurred.
Signed-off-by: Daniels Umanovskis <du@axentia.se>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Pull RISC-V updates from Palmer Dabbelt:
"This contains a handful of new features:
- Partial support for the Kendryte K210.
There are still a few outstanding issues that I have patches for,
but I don't actually have a board to test them so they're not
included yet.
- SBI v0.2 support.
- Fixes to support for building with LLVM-based toolchains. The
resulting images are known not to boot yet.
I don't anticipate a part two, but I'll probably have something early
in the RCs to finish up the K210 support"
* tag 'riscv-for-linus-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (38 commits)
riscv: create a loader.bin boot image for Kendryte SoC
riscv: Kendryte K210 default config
riscv: Add Kendryte K210 device tree
riscv: Select required drivers for Kendryte SOC
riscv: Add Kendryte K210 SoC support
riscv: Add SOC early init support
riscv: Unaligned load/store handling for M_MODE
RISC-V: Support cpu hotplug
RISC-V: Add supported for ordered booting method using HSM
RISC-V: Add SBI HSM extension definitions
RISC-V: Export SBI error to linux error mapping function
RISC-V: Add cpu_ops and modify default booting method
RISC-V: Move relocate and few other functions out of __init
RISC-V: Implement new SBI v0.2 extensions
RISC-V: Introduce a new config for SBI v0.1
RISC-V: Add SBI v0.2 extension definitions
RISC-V: Add basic support for SBI v0.2
RISC-V: Mark existing SBI as 0.1 SBI.
riscv: Use macro definition instead of magic number
riscv: Add support to dump the kernel page tables
...
When CONFIG_IP_MULTICAST is not set and multicast ip is added to the device
with autojoin flag or when multicast ip is deleted kernel will crash.
steps to reproduce:
ip addr add 224.0.0.0/32 dev eth0
ip addr del 224.0.0.0/32 dev eth0
or
ip addr add 224.0.0.0/32 dev eth0 autojoin
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
pc : _raw_write_lock_irqsave+0x1e0/0x2ac
lr : lock_sock_nested+0x1c/0x60
Call trace:
_raw_write_lock_irqsave+0x1e0/0x2ac
lock_sock_nested+0x1c/0x60
ip_mc_config.isra.28+0x50/0xe0
inet_rtm_deladdr+0x1a8/0x1f0
rtnetlink_rcv_msg+0x120/0x350
netlink_rcv_skb+0x58/0x120
rtnetlink_rcv+0x14/0x20
netlink_unicast+0x1b8/0x270
netlink_sendmsg+0x1a0/0x3b0
____sys_sendmsg+0x248/0x290
___sys_sendmsg+0x80/0xc0
__sys_sendmsg+0x68/0xc0
__arm64_sys_sendmsg+0x20/0x30
el0_svc_common.constprop.2+0x88/0x150
do_el0_svc+0x20/0x80
el0_sync_handler+0x118/0x190
el0_sync+0x140/0x180
Fixes: 93a714d6b5 ("multicast: Extend ip address command to enable multicast group join/leave on")
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rds_free_mr(), it calls rds_destroy_mr(mr) directly. But this
defeats the purpose of reference counting and makes MR free handling
impossible. It means that holding a reference does not guarantee that
it is safe to access some fields. For example, In
rds_cmsg_rdma_dest(), it increases the ref count, unlocks and then
calls mr->r_trans->sync_mr(). But if rds_free_mr() (and
rds_destroy_mr()) is called in between (there is no lock preventing
this to happen), r_trans_private is set to NULL, causing a panic.
Similar issue is in rds_rdma_unuse().
Reported-by: zerons <sironhide0null@gmail.com>
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot wrote:
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.6.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> swapper/1/0 just changed the state of lock:
> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
> but this lock took another, SOFTIRQ-unsafe lock in the past:
> (&pid->wait_pidfd){+.+.}-{2:2}
>
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
> Possible interrupt unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&pid->wait_pidfd);
> local_irq_disable();
> lock(tasklist_lock);
> lock(&pid->wait_pidfd);
> <Interrupt>
> lock(tasklist_lock);
>
> *** DEADLOCK ***
>
> 4 locks held by swapper/1/0:
The problem is that because wait_pidfd.lock is taken under the tasklist
lock. It must always be taken with irqs disabled as tasklist_lock can be
taken from interrupt context and if wait_pidfd.lock was already taken this
would create a lock order inversion.
Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock. That should be safe and I think the code will eventually
do that. It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.
It is also true that my pre-merge window testing was insufficient. So
remove the premature optimization and give struct pid a dedicated lock of
it's own for struct pid things. I have verified that lockdep sees all 3
paths where we take the new pid->lock and lockdep does not complain.
It is my current day dream that one day pid->lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals. That will require taking pid->lock with irqs disabled.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/00000000000011d66805a25cd73f@google.com/
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: syzbot+343f75cdeea091340956@syzkaller.appspotmail.com
Reported-by: syzbot+832aabf700bc3ec920b9@syzkaller.appspotmail.com
Reported-by: syzbot+f675f964019f884dbd0f@syzkaller.appspotmail.com
Reported-by: syzbot+a9fb1457d720a55d6dc5@syzkaller.appspotmail.com
Fixes: 7bc3e6e55a ("proc: Use a list of inodes to flush from proc")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The variable ret is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the local node id(qrtr_local_nid) is not modified after its
initialization, it equals to the broadcast node id(QRTR_NODE_BCAST).
So the messages from local node should not be taken as broadcast
and keep the process going to send them out anyway.
The definitions are as follow:
static unsigned int qrtr_local_nid = NUMA_NO_NODE;
Fixes: fdf5fd3975 ("net: qrtr: Broadcast messages only from control port")
Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2020-04-08
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v5.3
('net/mlx5: Fix frequent ioread PCI access during recovery')
('net/mlx5e: Add missing release firmware call')
For -stable v5.4
('net/mlx5e: Fix nest_level for vlan pop action')
('net/mlx5e: Fix pfnum in devlink port attribute')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
At 'make htmldocs' the following warning is thrown:
Documentation/networking/devlink/devlink-trap.rst:302:
WARNING: undefined label: generic-packet-trap-groups
Fixes the warning by setting the label to the specified header,
within the same document.
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although it's not unexpected for drm_dp_check_act_status() to fail due
to DPCD read failures (as the hub may have just been unplugged
suddenly), timeouts are a bit more worrying as they either mean we need
a longer timeout value, or we aren't setting up payload allocations
properly. So, let's start printing errors on timeouts.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Sean Paul <sean@poorly.run>
Reviewed-by: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20200406221253.1307209-5-lyude@redhat.com
Currently we only poll for an ACT up to 30 times, with a busy-wait delay
of 100µs between each attempt - giving us a timeout of 2900µs. While
this might seem sensible, it would appear that in certain scenarios it
can take dramatically longer then that for us to receive an ACT. On one
of the EVGA MST hubs that I have available, I observed said hub
sometimes taking longer then a second before signalling the ACT. These
delays mostly seem to occur when previous sideband messages we've sent
are NAKd by the hub, however it wouldn't be particularly surprising if
it's possible to reproduce times like this simply by introducing branch
devices with large LCTs since payload allocations have to take effect on
every downstream device up to the payload's target.
So, instead of just retrying 30 times we poll for the ACT for up to 3ms,
and additionally use usleep_range() to avoid a very long and rude
busy-wait. Note that the previous retry count of 30 appears to have been
arbitrarily chosen, as I can't find any mention of a recommended timeout
or retry count for ACTs in the DisplayPort 2.0 specification. This also
goes for the range we were previously using for udelay(), although I
suspect that was just copied from the recommended delay for link
training on SST devices.
Changes since v1:
* Use readx_poll_timeout() instead of open-coding timeout loop - Sean
Paul
Changes since v2:
* Increase poll interval to 200us - Sean Paul
* Print status in hex when we timeout waiting for ACT - Sean Paul
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: ad7f8a1f9c ("drm/helper: add Displayport multi-stream helper (v0.6)")
Cc: Sean Paul <sean@poorly.run>
Cc: <stable@vger.kernel.org> # v3.17+
Reviewed-by: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20200406221253.1307209-4-lyude@redhat.com
If completion queue overflow occurs, __io_cqring_fill_event() will
update req->cflags, which is in a union with req->work and happens to
be aliased to req->work.fs. Following io_free_req() ->
io_req_work_drop_env() may get a bunch of different problems (miscount
fs->users, segfault, etc) on cleaning @fs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Intel Comet Lake should use the default LPM policy for mobile chipsets.
So, add the PCI ID to the driver list of supported devices.
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
sdio bus bandwidth is low, sometimes for high performance TX test,
it will lack of ath10k_sdio_bus_request, it will print message:
ath10k_sdio mmc1:0001:1: unable to allocate bus request for async request
change the num from 64 to 1024 will not happen it.
Tested with QCA6174 SDIO with firmware
WLAN.RMH.4.4.1-00017-QCARMSWP-1.
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20200212080415.31265-3-wgong@codeaurora.org
For sdio chip, it is high latency bus, all the TX packet's content will
be tranferred from HOST memory to firmware memory via sdio bus, then it
need much more memory in firmware than low latency bus chip, for low
latency chip, such as PCI-E, it only need to transfer the TX descriptor
via PCI-E bus to firmware memory. For sdio chip, reduce the complexity of
TX logic will help TX efficiency since its memory is limited, and it will
reduce the TX circle's time of each packet and then firmware will have more
memory for TX since TX complete also need memeory.
This patch disable TX complete indication from firmware for htt data
packet, it will not have TX complete indication from firmware to ath10k.
It will cut the cost of bus bandwidth of TX complete and make the TX
logic of firmware simpler, it results in significant performance
improvement on TX path.
Udp TX throughout is 130Mbps without this patch, and it arrives
400Mbps with this patch.
The downside of this patch is the command "iw wlan0 station dump" will
show 0 for "tx retries" and "tx failed" since all tx packet's status
is success.
This patch only effect sdio chip, it will not effect PCI, SNOC etc.
Tested with QCA6174 SDIO with firmware
WLAN.RMH.4.4.1-00017-QCARMSWPZ-1
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20200212080415.31265-2-wgong@codeaurora.org