When filling in hardware intermediate representation tc_setup_flow_action()
directly obtains, checks and takes reference to dev used by mirred action,
instead of using act->ops->get_dev() API created specifically for this
purpose. In order to remove code duplication, refactor flow_action infra to
use action API when obtaining mirred action target dev. Extend get_dev()
with additional argument that is used to provide dev destructor to the
user.
Fixes: 5a6ff4b13d ("net: sched: take reference to action dev before calling offloads")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent patch set that removed rtnl lock dependency from cls hardware
offload API rtnl lock is only taken when reading action data and can be
released after action-specific data is parsed into intermediate
representation. However, sample action psample group is passed by pointer
without obtaining reference to it first, which makes it possible to
concurrently overwrite the action and deallocate object pointed by
psample_group pointer after rtnl lock is released but before driver
finished using the pointer.
To prevent such race condition, obtain reference to psample group while it
is used by flow_action infra. Extend psample API with function
psample_group_take() that increments psample group reference counter.
Extend struct tc_action_ops with new get_psample_group() API. Implement the
API for action sample using psample_group_take() and already existing
psample_group_put() as a destructor. Use it in tc_setup_flow_action() to
take reference to psample group pointed to by entry->sample.psample_group
and release it in tc_cleanup_flow_action().
Disable bh when taking psample_groups_lock. The lock is now taken while
holding action tcf_lock that is used by data path and requires bh to be
disabled, so doing the same for psample_groups_lock is necessary to
preserve SOFTIRQ-irq-safety.
Fixes: 918190f50e ("net: sched: flower: don't take rtnl lock for cls hw offloads API")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generalize flow_action_entry cleanup by extending the structure with
pointer to destructor function. Set the destructor in
tc_setup_flow_action(). Refactor tc_cleanup_flow_action() to call
entry->destructor() instead of using switch that dispatches by entry->id
and manually executes cleanup.
This refactoring is necessary for following patches in this series that
require destructor to use tc_action->ops callbacks that can't be easily
obtained in tc_cleanup_flow_action().
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many FORCEDETH NICs are used in our hosts. Several bugs are fixed and
some features are developed for FORCEDETH NICs. And I have been
reviewing patches for FORCEDETH NIC for several months. Mark me as the
FORCEDETH NIC maintainer. I will send out the patches and maintain
FORCEDETH NIC.
Signed-off-by: Rain River <rain.1986.08.12@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using static analysis, I discovered that the "dpriv->pci_priv->pdev"
pointer is always NULL. This pointer was supposed to be initialized
during probe and is essential for the driver to work. It would be easy
to add a "ppriv->pdev = pdev;" to dscc4_found1() but this driver has
been broken since before we started using git and no one has complained
so probably we should just remove it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My Citrix email address will expire shortly.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to hold rnl lock in suspend and resume callbacks because phylink
requires it. Otherwise we will get a WARN() in suspend and resume.
Also, move phylink start and stop callbacks to inside device's internal
lock so that we prevent concurrent HW accesses.
Fixes: 74371272f9 ("net: stmmac: Convert to phylink and remove phylib logic")
Reported-by: Christophe ROULLIER <christophe.roullier@st.com>
Tested-by: Christophe ROULLIER <christophe.roullier@st.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ip6erspan_tunnel_xmit(), if the skb will not be sent out, it has to
be freed on the tx_err path. Otherwise when deleting a netns, it would
cause dst/dev to leak, and dmesg shows:
unregister_netdevice: waiting for lo to become free. Usage count = 1
Fixes: ef7baf5e08 ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a DP_VERBOSE debug message. Fix it.
(Using American English spelling as this is the most common way
to spell this in the kernel).
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for configuring the per-port egress flooding control for
both Unicast and Multicast traffic.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP reuseport groups can hold a mix unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.
Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.
Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.
New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.
Fixes: e32ea7e747 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few small fixes and one feature that came in since I sent you the
earlier pull request.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl1+v7ETHGJyb29uaWVA
a2VybmVsLm9yZwAKCRAk1otyXVSH0N6DB/4kXPv5zew+Vm6uOz+4WguP+TqettIM
N47HXiWz66TKjM4m+7EWU+iZwOgtXGEisT0VvhVCNHvFiKRugViyURhgR2qNkBI8
KCion+EmehcRKLWesz3U06YXIaEctj54dUcdpi2wpymf1V1wye/wKJAWVDFpRd2z
g1/TcIp4KQaspEnfnTYEiN2jmeV4TGeJsWXMdeu2gbUf7zw0aEHMXt1QaiiNC0Vz
2T689MZBrBd/z0MT62EE0bkHoEqdgB+yxNSOWjoldjtx6yZ9wuz9f2Jc2WlPZT3i
97K2RJ3+TIwdeVtsN5ijZP61ALpz7Qif5q6UiSytyQcBdxYnwegUZlLT
=oi7v
-----END PGP SIGNATURE-----
Merge tag 'asoc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Final merge window fixes for v5.4
A few small fixes and one feature that came in since I sent you the
earlier pull request.
In __blk_mq_end_request() if block stats needs update, we should
ensure now is valid instead of 0 even when iostat is disabled.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently rq->data_len will be decreased by partial completion or
zeroed by completion, so when blk_stat_add() is invoked, data_len
will be zero and there will never be samples in poll_cb because
blk_mq_poll_stats_bkt() will return -1 if data_len is zero.
We could move blk_stat_add() back to __blk_mq_complete_request(),
but that would make the effort of trying to call ktime_get_ns()
once in vain. Instead we can reuse throtl_size field, and use
it for both block stats and block throttle, and adjust the
logic in blk_mq_poll_stats_bkt() accordingly.
Fixes: 4bc6339a58 ("block: move blk_stat_add() to __blk_mq_end_request()")
Tested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit b03755ad6f.
This is sad, and done for all the wrong reasons. Because that commit is
good, and does exactly what it says: avoids a lot of small disk requests
for the inode table read-ahead.
However, it turns out that it causes an entirely unrelated problem: the
getrandom() system call was introduced back in 2014 by commit
c6e9d6f388 ("random: introduce getrandom(2) system call"), and people
use it as a convenient source of good random numbers.
But part of the current semantics for getrandom() is that it waits for
the entropy pool to fill at least partially (unlike /dev/urandom). And
at least ArchLinux apparently has a systemd that uses getrandom() at
boot time, and the improvements in IO patterns means that existing
installations suddenly start hanging, waiting for entropy that will
never happen.
It seems to be an unlucky combination of not _quite_ enough entropy,
together with a particular systemd version and configuration. Lennart
says that the systemd-random-seed process (which is what does this early
access) is supposed to not block any other boot activity, but sadly that
doesn't actually seem to be the case (possibly due bogus dependencies on
cryptsetup for encrypted swapspace).
The correct fix is to fix getrandom() to not block when it's not
appropriate, but that fix is going to take a lot more discussion. Do we
just make it act like /dev/urandom by default, and add a new flag for
"wait for entropy"? Do we add a boot-time option? Or do we just limit
the amount of time it will wait for entropy?
So in the meantime, we do the revert to give us time to discuss the
eventual fix for the fundamental problem, at which point we can re-apply
the ext4 inode table access optimization.
Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All entries in 'rds_ib_stat_names' are stringified versions
of the corresponding "struct rds_ib_statistics" element
without the "s_"-prefix.
Fix entry 'ib_evt_handler_call' to do the same.
Fixes: f4f943c958 ("RDS: IB: ack more receive completions to improve performance")
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tcf_block_get() fails in sfb_init(), q->qdisc is still a NULL
pointer which leads to a crash in sfb_destroy(). Similar for
sch_dsmark.
Instead of fixing each separately, Linus suggested to just accept
NULL pointer in qdisc_put(), which would make callers easier.
(For sch_dsmark, the bug probably exists long before commit
6529eaba33f0.)
Fixes: 6529eaba33 ("net: sched: introduce tcf block infractructure")
Reported-by: syzbot+d5870a903591faaca4ae@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DSA core, DSA taggers and DSA drivers all make use of
module_init(). Hence they get initialised at device_initcall() time.
The ordering is non-deterministic. It can be a DSA driver is bound to
a device before the needed tag driver has been initialised, resulting
in the message:
No tagger for this switch
Rather than have this be fatal, return -EPROBE_DEFER so that it is
tried again later once all the needed drivers have been loaded.
Fixes: d3b8c04988 ("dsa: Add boilerplate helper to register DSA tag driver modules")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test implemented by some_qdisc_is_busy() is somewhat loosy for
NOLOCK qdisc, as we may hit the following scenario:
CPU1 CPU2
// in net_tx_action()
clear_bit(__QDISC_STATE_SCHED...);
// in some_qdisc_is_busy()
val = (qdisc_is_running(q) ||
test_bit(__QDISC_STATE_SCHED,
&q->state));
// here val is 0 but...
qdisc_run(q)
// ... CPU1 is going to run the qdisc next
As a conseguence qdisc_run() in net_tx_action() can race with qdisc_reset()
in dev_qdisc_reset(). Such race is not possible for !NOLOCK qdisc as
both the above bit operations are under the root qdisc lock().
After commit 021a17ed79 ("pfifo_fast: drop unneeded additional lock on dequeue")
the race can cause use after free and/or null ptr dereference, but the root
cause is likely older.
This patch addresses the issue explicitly checking for deactivation under
the seqlock for NOLOCK qdisc, so that the qdisc_run() in the critical
scenario becomes a no-op.
Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(),
but that is safe due to the skb_array() locking, and we can't avoid that
for NOLOCK qdiscs.
Fixes: 021a17ed79 ("pfifo_fast: drop unneeded additional lock on dequeue")
Reported-by: Li Shuang <shuali@redhat.com>
Reported-and-tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the Distribution Kernels track at Linux Plumbers Conference there
was some discussion around the difficulty of making kernel builds
reproducible.
This is a solved problem, but the solutions don't appear to be
documented in one place. This document lists the issues I know about
and the settings needed to ensure reproducibility.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
code that was thought unnecessary; however, since then KVM has grown quite
a few cond_resched()s and for that reason the simplified code is prone to
livelocks---one CPUs tries to empty a list of guest page tables while the
others keep adding to them. This adds back the generation-based zapping of
guest page tables, which was not unnecessary after all.
On top of this, there is a fix for a kernel memory leak and a couple of
s390 fixlets as well.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdfJcxAAoJEL/70l94x66D8ocH/jhz+95+TBNF5j0xGmBWiDbH
zQlWmEMkAQ8o66J+503bc2ltQhM8078Ohumgtmq8HFaRgctDiRdjLBcce6aOr4tH
09gBdlWgkVWLhN8AhydSbHh+SLcCWdZQSAWQrvt1aM2BRdz5WECkUdauSL2oHsTP
P58318EL0JrqQaQZtK4qI4eNDiYWdKq2luMu9BTYm3f1Hnk28gorErgBehScYuxE
LlnM4RYedH54UR8opdUDmhHxO7bGTW4SVz4obq0sSOJs190DuVbCxbaJjhP+tSwk
7hPac3tHoYFUIhVtgGD3N/LrsFgvcShmjawLP0szCrR2sWrtTqmb0R63DLxpmyg=
=yuZ9
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The main change here is a revert of reverts. We recently simplified
some code that was thought unnecessary; however, since then KVM has
grown quite a few cond_resched()s and for that reason the simplified
code is prone to livelocks---one CPUs tries to empty a list of guest
page tables while the others keep adding to them. This adds back the
generation-based zapping of guest page tables, which was not
unnecessary after all.
On top of this, there is a fix for a kernel memory leak and a couple
of s390 fixlets as well"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
KVM: x86: work around leak of uninitialized stack contents
KVM: nVMX: handle page fault in vmread
KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
Some workloads can require far more than 4K oustanding entries. For
example memcached can have ~300K sockets over ~40 cores. Bumping the max
to 32K seems to work pretty well.
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
32 bit build got broken by the latest defence in depth patch.
Revert and we'll try again in the next cycle.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJdfT6AAAoJECgfDbjSjVRpOrUH/09N3JI/VSIvH5y75UYAU1pc
nDrekWI7TLYYIKJCUwfcxIzxskw1EOENxSFNRAtkKyRKZtq7HmhWOBec4NdnHkvy
ze1Cbt2aqQiMfbxJYStri/AYNSpC+aTttFSDAMm4TfE+QxfEjumO3J/HSRk0RYdt
leGziB4H2BjsZM/2JpMD9UFIq/D9SeGZTwd2sZTCTQ+RIvYEN2hGUXoG5hYl/xv6
+g6/Rkj/eAHqilpvyAt2PWFxslqLsQhWt/S/PHa61HpQLuPBCBYAu38O6X0vD93F
8vNQfcedz4qpyyHdfaGB+jquZ800BxqaUvdLdyAdQJXXGaytopAYzncEa1iPyyg=
=s4iF
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"A last minute revert
The 32-bit build got broken by the latest defence in depth patch.
Revert and we'll try again in the next cycle"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "vhost: block speculation of translated descriptors"
Last week, Palmer and I learned that there was an error in the RISC-V
kernel image header format that could make it less compatible with the
ARM64 kernel image header format. I had missed this error during my
original reviews of the patch.
The kernel image header format is an interface that impacts
bootloaders, QEMU, and other user tools. Those packages must be
updated to align with whatever is merged in the kernel. We would like
to avoid proliferating these image formats by keeping the RISC-V
header as close as possible to the existing ARM64 header. Since the
arch/riscv patch that adds support for the image header was merged
with our v5.3-rc1 pull request as commit 0f327f2aaa ("RISC-V: Add
an Image header that boot loader can parse."), we think it wise to try
to fix this error before v5.3 is released.
The fix itself should be backwards-compatible with any project that
has already merged support for premature versions of this interface.
It primarily involves ensuring that the RISC-V image header has
something useful in the same field as the ARM64 image header.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl1870sACgkQx4+xDQu9
Kkt33A//SG4fTIyz0rIGTZpJPKV3nXacBq6XvOFxFsHRHlEvD2f/JkSK1Ab+hV5R
vmTkVGCSCVz1C/OEA+KWsjuiJEglII6eOLIRqST1Wm6KumwAwLc78xdgEb1Sm/SC
E7OTYtSqbUjCqzzD1BFcXfbP4mGF/9IBjWI3OcCnb1UcLuL29Mt35gvxI9fF1FB6
+EU96MBQbk4gVUYjKXObTvaAZwWIYrMkOFQmdRgb4jqk42i0hLmKx//WkI1Ajlp8
FDjE2nIo2NAt0N7pImJ/QtqxkOsQjMtOyOscoTyhB4eGJW0+fTyVrt6FpUdYQDQq
vZI/WS2RFYUi2wfj+JNQ959MgsWZZ8z21KbFWwR0HC4k2xRZaxCO48g/VweJA/QW
3f6+CMxYgwF5KzToHvUjlo0wNMW2Xo/FX9bky3gb8rJPWnSx9uu9lfoh17FUD4Ty
cEknaLtmMALA8Lgr8hwTKbZLg7J1ih5r1SPj0UvjpjEmwDUl2doA0EONuuBroEHM
KDerGitg6D0g4B4VlGsHuLMd6Gj/5r2teno97tPoaf5J9mCZ1v2/Q5OL0QwBYd84
5cp+Ox1aQTY6SJq8gftBOD3MmW2lKCC5tT6H0bJvKBAE7tJaLPv5YIj6dp1jfXKB
klzJUdGRsL60EwlL/cbFOurDfhBeQlq8akdzG5Cg5e8q+mISSTE=
=Jt6U
-----END PGP SIGNATURE-----
Merge tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Paul Walmsley:
"Last week, Palmer and I learned that there was an error in the RISC-V
kernel image header format that could make it less compatible with the
ARM64 kernel image header format. I had missed this error during my
original reviews of the patch.
The kernel image header format is an interface that impacts
bootloaders, QEMU, and other user tools. Those packages must be
updated to align with whatever is merged in the kernel. We would like
to avoid proliferating these image formats by keeping the RISC-V
header as close as possible to the existing ARM64 header. Since the
arch/riscv patch that adds support for the image header was merged
with our v5.3-rc1 pull request as commit 0f327f2aaa ("RISC-V: Add
an Image header that boot loader can parse."), we think it wise to try
to fix this error before v5.3 is released.
The fix itself should be backwards-compatible with any project that
has already merged support for premature versions of this interface.
It primarily involves ensuring that the RISC-V image header has
something useful in the same field as the ARM64 image header"
* tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: modify the Image header to improve compatibility with the ARM64 header
This reverts commit a89db445fb.
I was hasty to include this patch, and it breaks the build on 32 bit.
Defence in depth is good but let's do it properly.
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull networking fixes from David Miller:
1) Don't corrupt xfrm_interface parms before validation, from Nicolas
Dichtel.
2) Revert use of usb-wakeup in btusb, from Mario Limonciello.
3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from
Leonardo Bras.
4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso.
5) Missing ULP check in sock_map, from John Fastabend.
6) Fix receive statistic handling in forcedeth, from Zhu Yanjun.
7) Fix length of SKB allocated in 6pack driver, from Christophe
JAILLET.
8) ip6_route_info_create() returns an error pointer, not NULL. From
Maciej Żenczykowski.
9) Only add RDS sock to the hashes after rs_transport is set, from
Ka-Cheong Poon.
10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets.
11) Presence of transmit IPSEC offload in an SKB is not tested for
correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff
Kirsher.
12) Need rcu_barrier() when register_netdevice() takes one of the
notifier based failure paths, from Subash Abhinov Kasiviswanathan.
13) Fix leak in sctp_do_bind(), from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
cdc_ether: fix rndis support for Mediatek based smartphones
sctp: destroy bucket if failed to bind addr
sctp: remove redundant assignment when call sctp_get_port_local
sctp: change return type of sctp_get_port_local
ixgbevf: Fix secpath usage for IPsec Tx offload
sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
ixgbe: Fix secpath usage for IPsec TX offload.
net: qrtr: fix memort leak in qrtr_tun_write_iter
net: Fix null de-reference of device refcount
ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
tun: fix use-after-free when register netdev failed
tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
ixgbe: fix double clean of Tx descriptors with xdp
ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
mlx4: fix spelling mistake "veify" -> "verify"
net: hns3: fix spelling mistake "undeflow" -> "underflow"
net: lmc: fix spelling mistake "runnin" -> "running"
NFC: st95hf: fix spelling mistake "receieve" -> "receive"
net/rds: An rds_sock is added too early to the hash table
mac80211: Do not send Layer 2 Update frame before authorization
...
lima:
- fix gem_wait ioctl
core:
- constify modes list
i915:
- DP MST high color depth regression
- GPU hangs on vulkan compute workloads
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdemkOAAoJEAx081l5xIa+4ooP/RGV1E+pHsdC1SzsZYLZ+GDp
TLeg0KnNMijxfP8TCqjssPwJFbYVO3RxZHCDKbCQdESRG6TKS5wqJ9bgR2y3jdXZ
rjbhGY8YkxNkZbpAYPsYkzNcRqIrZFOYOnKDeafXP02KnHoiIckf8W3DKuCNr5XF
/htgh4rQnKZnkiFJm1OzGwxdTTI+yv4aqi50cLR5pGAnSUKpxeHm2M+S9P2BmMgL
BvsjDFFz+WGQllQ7nKUiMZ9oPbAGiQjWmICfucBniBUnhdleMk8qiNTuD8YQ1Cwh
m5Ix2wa8vPZBeg1Hb55iJGhC+l4ePGdLPtJiPUUUMM+d8A0QlcMLrFpXM8jvtgA5
h+cZFR9Po8QS1bb6Jic/i9OLzG+KhFAQLSREcBNqGcyqfCZHqKArPSlTqqjUkRRj
KW4OhdGxPrpmIzUHrHInSZKXzZtdWXKF35YFDWbjy0WNo91RW/X4I/qvCsML99uH
WDoPumEG4eolPLK5JCcTnMMhRYULRopulVc8Tw/yTIxlpUEVhWk7fGhc8XAK/F0D
S5bpxx8OQMfSYAaln5tzGhf4vUWVmdFNv0kyIdozZLRCQzYNuECg6bB16gZ9utvN
np8k9x6CKFWTlNerAhO2jpvmh/pVupQ4GylGY5uIc2Z4P9LPMzORltrNU8cgaZB/
8g9NDRvzCpfqW/UWXwFD
=wL1R
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"From the maintainer summit, just some last minute fixes for final:
lima:
- fix gem_wait ioctl
core:
- constify modes list
i915:
- DP MST high color depth regression
- GPU hangs on vulkan compute workloads"
* tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm:
drm/lima: fix lima_gem_wait() return value
drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+
drm/i915: Limit MST to <= 8bpc once again
drm/modes: Make the whitelist more const
Since commit 795fe54c2a ("bfq: Add per-device weight"), bfq uses
blkg_conf_prep() and blkg_conf_finish(), which are not exported. So, it
causes linkage error if bfq compiled as a module.
Fixes: 795fe54c2a ("bfq: Add per-device weight")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Last set of patches for 5.4. wil6210 and rtw88 being most active this
time, but ath9k also having a new module to load devices without
EEPROM.
Major changes:
wil6210
* add support for Enhanced Directional Multi-Gigabit (EDMG) channels 9-11
* add debugfs file to show PCM ring content
* report boottime_ns in scan results
ath9k
* add a separate loader for AR92XX (and older) pci(e) without eeprom
brcmfmac
* use the same wiphy after PCIe reset to not confuse the user space
rtw88
* enable interrupt migration
* enable AMSDU in AMPDU aggregation
* report RX power for each antenna
* enable to DPK and IQK calibration methods to improve performance
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJdfLwvAAoJEG4XJFUm622b8N8H/141YVGazqoNC/l0kaMbWZ//
xbgdS8ujGyEFg86dfXs1DjYPsPG1v+UJwc42GQALcjVrdUacuO7ZEdBKF6q9q0gG
7WsZrzmgMk2dJsZKSCYZHORbEJ/GE7yCgO1W1hS0iTRNXEV6/6u2s7bx4mPB3uhl
voYLTLaDvA2n8+pxuJ/Dl0ewWFuGnWqnhgU5CV3f9MGLkB4BXmvMdA5iqhiqq3Bj
RWfBZCslJj3snh4vsTzRNqnRRvtgTv3Nt/fVV8uH+K5wXBMutlc+MfVQw21kMb7c
RConz/N87usC+pgY+va2UpyNqLis2rUhb0UH4/Sqc7uBmeCBSyE8JulkRLnce+c=
=850h
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2019-09-14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 5.4
Last set of patches for 5.4. wil6210 and rtw88 being most active this
time, but ath9k also having a new module to load devices without
EEPROM.
Major changes:
wil6210
* add support for Enhanced Directional Multi-Gigabit (EDMG) channels 9-11
* add debugfs file to show PCM ring content
* report boottime_ns in scan results
ath9k
* add a separate loader for AR92XX (and older) pci(e) without eeprom
brcmfmac
* use the same wiphy after PCIe reset to not confuse the user space
rtw88
* enable interrupt migration
* enable AMSDU in AMPDU aggregation
* report RX power for each antenna
* enable to DPK and IQK calibration methods to improve performance
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Standard integer promotion is already done and %hx and %hhx is useless
so do not encourage the use of %hh[xudi] or %h[xudi].
As Linus said in:
https://lore.kernel.org/lkml/CAHk-=wgoxnmsj8GEVFJSvTwdnWm8wVJthefNk2n6+4TC=20e0Q@mail.gmail.com/
It's a pointless warning, making for more complex code, and
making people remember esoteric printf format details that have no
reason for existing.
The "h" and "hh" things should never be used. The only reason for them
being used if if you have an "int", but you want to print it out as a
"char" (and honestly, that is a really bad reason, you'd be better off
just using a proper cast to make the code more obvious).
So if what you have a "char" (or unsigned char) you should always just
print it out as an "int", knowing that the compiler already did the
proper type conversion.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Louis Taylor <louis@kragniz.eu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This argument is supported on RISC-V systems and widely used, but was
not documented here.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Remove the clever example about read-write lock because this type of
lock is not recommended anymore (according to the very same document).
So there is no reason to teach clever things that people should not do.
Signed-off-by: Federico Vaga <federico.vaga@vaga.pv.it>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Describe how the comedi minor device numbers are split across comedi
devices and comedi subdevices.
Replace the current, long dead URL with an official URL for the Comedi
project.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
- prevent a user triggerable oops in the migration code
- do not leak kernel stack content
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJdejosAAoJEBF7vIC1phx8ZcYP/09WMmcbOexGvopqyMzIWgAv
xpSHAW0+mGriu9b41OwkxBsMG3MxUzk86b3zL0r5eaigWXSuE2NU0OhScqF9ehMX
pTtoeSzFJsPFwGQrOKIhpgcNzOJ+YfVqTDlf5dxq9uSNYF32suuz0Dw4P9PdFJOg
k8prJXiKu+bL21TcbhWsAAP7Gb5/DA26p4d5KM3wJe351Af9lrLrDF2z+pKe9fbY
v0vMcH3tJoBOOTYUSJeptEWU9OlYljMrJN7kkmXCEC8yklwoXPDNgAC8Yg2SfqYM
xNKVkX/rY97cn1Dq0LpAvEjMDYvu7KbOM1qQE9A67gRLIjuGJnDyEa+j/iB/tOrz
BMmTdut44XRaVZVdDL+d2pg3LKI+1+UV4XTwpD4g1tSpYLar3dJVb9mq00OzdCAg
TsK+pQYTSZig+H4ubtikgm9pFGKOB2Jsp2+FoC7jYxhYQWyj4syBkSoaaUdY0LvE
/Du3NY3RaG4yi2K2XV0yjBVAjpXxYMWqvzJYTC9XlrEQJ5nAmiefTgxZmcg4ZCMw
0YVRigG7vz8oKpVRl/6smGd/U+qTNZN4cXnFgUr71yONiIxsSndUZ/Yledtf+KQR
uzPfvIwYpRzwqVnXkkFb+PNxvJVftCbe2rRI4D549VsbmEJmSadjiB5aW1Rj3fMN
47ZjXZmmGETR8BtQEM37
=LxGy
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
KVM: s390: Fixes for 5.3
- prevent a user triggerable oops in the migration code
- do not leak kernel stack content
James Harvey reported a livelock that was introduced by commit
d012a06ab1 ("Revert "KVM: x86/mmu: Zap only the relevant pages when
removing a memslot"").
The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule. The equivalent of kvm_mmu_zap_all()
that was in use at the time of the reverted commit (4e103134b8, "KVM:
x86/mmu: Zap only the relevant pages when removing a memslot") employed
a fast invalidate mechanism and was not susceptible to the above livelock.
There are three ways to fix the livelock:
- Reverting the revert (commit d012a06ab1) is not a viable option as
the revert is needed to fix a regression that occurs when the guest has
one or more assigned devices. It's unlikely we'll root cause the device
assignment regression soon enough to fix the regression timely.
- Remove the conditional reschedule from kvm_mmu_zap_all(). However, although
removing the reschedule would be a smaller code change, it's less safe
in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
the wild for flushing memslots since the fast invalidate mechanism was
introduced by commit 6ca18b6950 ("KVM: x86: use the fast way to
invalidate all pages"), back in 2013.
- Reintroduce the fast invalidate mechanism and use it when zapping shadow
pages in response to a memslot being deleted/moved, which is what this
patch does.
For all intents and purposes, this is a revert of commit ea145aacf4
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit 7390de1e99 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit 5304b8d37c ("KVM:
MMU: fast invalidate all pages") and commit 6ca18b6950 ("KVM: x86:
use the fast way to invalidate all pages") respectively.
Fixes: d012a06ab1 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
Reported-by: James Harvey <jamespharvey20@gmail.com>
Cc: Alex Willamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Emulation of VMPTRST can incorrectly inject a page fault
when passed an operand that points to an MMIO address.
The page fault will use uninitialized kernel stack memory
as the CR2 and error code.
The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just ensure
that the error code and CR2 are zero.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Cc: stable@vger.kernel.org
[add comment]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The implementation of vmread to memory is still incomplete, as it
lacks the ability to do vmread to I/O memory just like vmptrst.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Part of the intention during the definition of the RISC-V kernel image
header was to lay the groundwork for a future merge with the ARM64
image header. One error during my original review was not noticing
that the RISC-V header's "magic" field was at a different size and
position than the ARM64's "magic" field. If the existing ARM64 Image
header parsing code were to attempt to parse an existing RISC-V kernel
image header format, it would see a magic number 0. This is
undesirable, since it's our intention to align as closely as possible
with the ARM64 header format. Another problem was that the original
"res3" field was not being initialized correctly to zero.
Address these issues by creating a 32-bit "magic2" field in the RISC-V
header which matches the ARM64 "magic" field. RISC-V binaries will
store "RSC\x05" in this field. The intention is that the use of the
existing 64-bit "magic" field in the RISC-V header will be deprecated
over time. Increment the minor version number of the file format to
indicate this change, and update the documentation accordingly. Fix
the assembler directives in head.S to ensure that reserved fields are
properly zero-initialized.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reported-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Karsten Merker <merker@debian.org>
Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u
Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
IPI shorthand is supported now by linux apic/x2apic driver, switch to
IPI shorthand for all excluding self and all including self destination
shorthand in kvm guest, to avoid splitting the target mask into several
PV IPI hypercalls. This patch removes the kvm_send_ipi_all() and
kvm_send_ipi_allbutself() since the callers in APIC codes have already
taken care of apic_use_ipi_shorthand and fallback to ->send_IPI_mask
and ->send_IPI_mask_allbutself if it is false.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull MD fixes from Song.
* 'md-next' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md:
raid5: use bio_end_sector in r5_next_bio
raid5: remove STRIPE_OPS_REQ_PENDING
md: add feature flag MD_FEATURE_RAID0_LAYOUT
md/raid0: avoid RAID0 data corruption due to layout confusion.
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
raid5: don't increment read_errors on EILSEQ return
Actually, we calculate bio's end sector here, so use the common
way for the purpose.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>