Add missing of_node_put() call for device node returned by
of_parse_phandle().
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp GSO uses skb_gro_receive() to append the data into head
skb frag_list. However it actually only needs very few code from
skb_gro_receive(). Besides, NAPI_GRO_CB has to be set while most
of its members are not needed here.
This patch is to add sctp_packet_gso_append() to build GSO frames
instead of skb_gro_receive(), and it would avoid many unnecessary
checks and make the code clearer.
Note that sctp will use page frags instead of frag_list to build
GSO frames in another patch. But it may take time, as sctp's GSO
frames may have different size. skb_segment() can only split it
into the frags with the same size, which would break the border
of sctp chunks.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter patches for your net tree:
1) Fix NULL pointer dereference from nf_nat_decode_session() if NAT is
not loaded, from Prashant Bhole.
2) Fix socket extension module autoload.
3) Don't bogusly reject sets with the NFT_SET_EVAL flag set on from
the dynset extension.
4) Fix races with nf_tables module removal and netns exit path,
patches from Florian Westphal.
5) Don't hit BUG_ON if jumpstack goes too deep, instead hit
WARN_ON_ONCE, from Taehee Yoo.
6) Another NULL pointer dereference from ctnetlink, again if NAT is
not loaded, from Florian Westphal.
7) Fix x_tables match list corruption in xt_connmark module removal
path, also from Florian.
8) nf_conncount doesn't properly deal with conntrack zones, hence
garbage collector may get rid of entries in a different zone.
From Yi-Hung Wei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The max number of slots used in xennet_get_responses() is set to
MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD).
In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This
difference is resulting in frequent messages "too many slots" and a
reduced network throughput for some workloads (factor 10 below that of
a kernel-xen based guest).
Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of
the max number of slots to use solves that problem (tests showed no
more messages "too many slots" and throughput was as high as with the
kernel-xen based guest system).
Replace MAX_SKB_FRAGS-2 by XEN_NETIF_NR_SLOTS_MIN-1 in
netfront_tx_slot_available() for making it clearer what is really being
tested without actually modifying the tested value.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
smc->clcsock is an internal TCP socket, after TCP socket
converts to ->poll_mask, ->poll doesn't exist any more.
So just convert smc socket to ->poll_mask too.
Fixes: 2c7d3daceb ("net/tcp: convert to ->poll_mask")
Reported-by: syzbot+f5066e369b2d5fff630f@syzkaller.appspotmail.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If 'of_device_get_match_data()' fails, we need to release some resources as
done in the other error handling path of this function.
Fixes: efacb568c9 ("net: stmmac: dwmac-meson: extend phy mode setting")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix failures in the 'teardown' stage of test b7b8, probably a leftover of
commit 7c5995b33d ("tc-testing: fixed copy-pasting error in ife tests")
Fixes: a56e6bcd34 ("tc-testing: updated ife test cases")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For each network interface linux network stack issue ndo_set_rx_mode call
in order to configure MAC address filters (e.g. for multicast filtering).
Currently ThunderX NICVF driver has only one ordered workqueue to process
such requests for all VFs.
And because of that it is possible that subsequent call to
ndo_set_rx_mode would corrupt data which is currently in use
by nicvf_set_rx_mode_task. Which in turn could cause following issue:
[...]
[ 48.978341] Unable to handle kernel paging request at virtual address 1fffff0000000000
[ 48.986275] Mem abort info:
[ 48.989058] Exception class = DABT (current EL), IL = 32 bits
[ 48.994965] SET = 0, FnV = 0
[ 48.998020] EA = 0, S1PTW = 0
[ 49.001152] Data abort info:
[ 49.004022] ISV = 0, ISS = 0x00000004
[ 49.007869] CM = 0, WnR = 0
[ 49.010826] [1fffff0000000000] address between user and kernel address ranges
[ 49.017963] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 49.072138] task: ffff800fdd675400 task.stack: ffff000026440000
[ 49.078051] PC is at prefetch_freepointer.isra.37+0x28/0x3c
[ 49.083613] LR is at kmem_cache_alloc_trace+0xc8/0x1fc
[...]
[ 49.272684] [<ffff0000082738f0>] prefetch_freepointer.isra.37+0x28/0x3c
[ 49.279286] [<ffff000008276bc8>] kmem_cache_alloc_trace+0xc8/0x1fc
[ 49.285455] [<ffff0000082c0c0c>] alloc_fdtable+0x78/0x134
[ 49.290841] [<ffff0000082c15c0>] dup_fd+0x254/0x2f4
[ 49.295709] [<ffff0000080d1954>] copy_process.isra.38.part.39+0x64c/0x1168
[ 49.302572] [<ffff0000080d264c>] _do_fork+0xfc/0x3b0
[ 49.307524] [<ffff0000080d29e8>] SyS_clone+0x44/0x50
[...]
This patch is to prevent such concurrent data write with spinlock.
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GPIO MDIO driver now needs only <linux/gpio/consumer.h>
so cut the legacy <linux/gpio.h> and <linux/of_gpio.h>
includes that are no longer used.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
hv_netvsc: notification and namespace fixes
This set of patches addresses two set of fixes. First it backs out
the common callback model which was merged in net-next without
completing all the review feedback or getting maintainer approval.
Then it fixes the transparent VF management code to handle network
namespaces.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When VF is added, the paravirtual device is already present
and may have been moved to another network namespace. For example,
sometimes the management interface is put in another net namespace
in some environments.
The VF should get moved to where the netvsc device is when the
VF is discovered. The user can move it later (if desired).
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When finding the parent netvsc device, the search needs to be across
all netvsc device instances (independent of network namespace).
Find parent device of VF using upper_dev_get routine which
searches only adjacent list.
Fixes: e8ff40d4bf ("hv_netvsc: improve VF device matching")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
netns aware byref
Signed-off-by: David S. Miller <davem@davemloft.net>
The callback model of handling network failover is not suitable
in the current form.
1. It was merged without addressing all the review feedback.
2. It was merged without approval of any of the netvsc maintainers.
3. Design discussion on how to handle PV/VF fallback is still
not complete.
4. IMHO the code model using callbacks is trying to make
something common which isn't.
Revert the netvsc specific changes for now. Does not impact ongoing
development of failover model for virtio.
Revisit this after a simpler library based failover kernel
routines are extracted.
This reverts
commit 9c6ffbacdb ("hv_netvsc: fix error return code in netvsc_probe()")
and
commit 1ff78076d8 ("netvsc: refactor notifier/event handling code to use the failover framework")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: fix a warning, stats, naming and route leak
Various fixes for the NFP. Patch 1 fixes a harmless GCC 8 warning.
Patch 2 ensures statistics are correct after users decrease the number
of channels/rings. Patch 3 restores phy_port_name behaviour for flower,
ndo_get_phy_port_name used to return -EOPNOTSUPP on one of the netdevs,
and we need to keep it that way otherwise interface names may change.
Patch 4 fixes refcnt leak in flower tunnel offload code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to release the refcnt on dst_entry in the route table, otherwise
we will leak the route.
Fixes: 8e6a9046b6 ("nfp: flower vxlan neighbour offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.ndo_get_phys_port_name was recently extended to support multi-vNIC
FWs. These are firmwares which can have more than one vNIC per PF
without associated port (e.g. Adaptive Buffer Management FW), therefore
we need a way of distinguishing the vNICs. Unfortunately, it's too
late to make flower use the same naming. Flower users may depend on
.ndo_get_phys_port_name returning -EOPNOTSUPP, for example the name
udev gave the PF vNIC was just the bare PCI device-based name before
the change, and will have 'nn0' appended after.
To ensure flower's vNIC doesn't have phys_port_name attribute, add
a flag to vNIC struct and set it in flower code. New projects will
not set the flag adhere to the naming scheme from the start.
Fixes: 51c1df83e3 ("nfp: assign vNIC id as phys_port_name of vNICs which are not ports")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are gathering software statistics on per-ring basis.
.ndo_get_stats64 handler adds the rings up. Unfortunately
we are currently only adding up active rings, which means
that if user decreases the number of active rings the
statistics from deactivated rings will no longer be counted
and total interface statistics may go backwards.
Always sum all possible rings, the stats are allocated
statically for max number of rings, so we don't have to
worry about them being removed. We could add the stats
up when user changes the ring count, but it seems unnecessary..
Adding up inactive rings will be very quick since no datapath
will be touching them.
Fixes: 164d1e9e5d ("nfp: add support for ethtool .set_channels")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once upon a time nfp_cpp_resource_find() took a name parameter,
which could be any user-chosen string. Resources are identified
by a CRC32 hash of a 8 byte string, so we had to pad user input
with zeros to make sure CRC32 gave the correct result.
Since then nfp_cpp_resource_find() was made to operate on allocated
resources only (struct nfp_resource). We kzalloc those so there is
no need to pad the strings and use memcmp.
This avoids a GCC 8 stringop-truncation warning:
In function ‘nfp_cpp_resource_find’,
inlined from ‘nfp_resource_try_acquire’ at .../nfpcore/nfp_resource.c:153:8,
inlined from ‘nfp_resource_acquire’ at .../nfpcore/nfp_resource.c:206:9:
.../nfpcore/nfp_resource.c:108:2: warning: strncpy’ output may be truncated copying 8 bytes from a string of length 8 [-Wstringop-truncation]
strncpy(name_pad, res->name, sizeof(name_pad));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert the patch mentioned in the subject because it breaks at least
the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
daemon to fail to start:
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.
Fixes: f396922d86 ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we use check_hlist() for garbage colleciton. However, we
use the ‘zone’ from the counted entry to query the existence of
existing entries in the hlist. This could be wrong when they are in
different zones, and this patch fixes this issue.
Fixes: e59ea3df3f ("netfilter: xt_connlimit: honor conntrack zone if available")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This needs to use xt_unregister_targets, else new revision is left
on the list which then causes list to point to a target struct that has been free'd.
Fixes: 472a73e007 ("netfilter: xt_conntrack: Support bit-shifting for CONNMARK & MARK targets.")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Dan Carpenter points out that deref occurs after NULL check, we should
re-fetch the pointer and check that instead.
Fixes: 2c205dd398 ("netfilter: add struct nf_nat_hook and use it")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When depth of chain is bigger than NFT_JUMP_STACK_SIZE, the nft_do_chain
crashes. But there is no need to crash hard here.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If net namespace is exiting while nf_tables module is being removed
we can oops:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: nf_tables_flowtable_event+0x43/0xf0 [nf_tables]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
Modules linked in: nf_tables(-) nfnetlink [..]
unregister_netdevice_notifier+0xdd/0x130
nf_tables_module_exit+0x24/0x3a [nf_tables]
SyS_delete_module+0x1c5/0x240
do_syscall_64+0x74/0x190
Avoid this by attempting to take reference on the net namespace from
the notifiers. If it fails the namespace is exiting already, and nft
core is taking care of cleanup work.
We also need to make sure the netdev hook type gets removed
before netns ops removal, else notifier might be invoked with device
event for a netns where net->nft was never initialised (because
pernet ops was removed beforehand).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We must first remove the nfnetlink protocol handler when nf_tables module
is unloaded -- we don't want userspace to submit new change requests once
we've started to tear down nft state.
Furthermore, nfnetlink must not call any subsystem function after
call_batch returned -EAGAIN.
EAGAIN means the subsys mutex was dropped, so its unlikely but possible that
nf_tables subsystem was removed due to 'rmmod nf_tables' on another cpu.
Therefore, we must abort batch completely and not move on to next part of
the batch.
Last, we can't invoke ->abort unless we've checked that the subsystem is
still registered.
Change netns exit path of nf_tables to make sure any incompleted
transaction gets removed on exit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
NFT_SET_EVAL is signalling the kernel that this sets can be updated from
the evaluation path, even if there are no expressions attached to the
element. Otherwise, set updates with no expressions fail. Update
description to describe the right semantics.
Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add alias definition for module autoload when adding socket rules.
Fixes: 554ced0a6e ("netfilter: nf_tables: add support for native socket matching")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is not necessary. skb_gro_receive() will never change what
'head' points to.
In it's original implementation (see commit 71d93b39e5 ("net: Add
skb_gro_receive")), it did:
====================
+ *head = nskb;
+ nskb->next = p->next;
+ p->next = NULL;
====================
This sequence was removed in commit 58025e46ea ("net: gro: remove
obsolete code from skb_gro_receive()")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Daniel Borkmann says:
====================
pull-request: bpf 2018-06-12
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Avoid an allocation warning in AF_XDP by adding __GFP_NOWARN for the
umem setup, from Björn.
2) Silence a warning in bpf fs when an application tries to open(2) a
pinned bpf obj due to missing fops. Add a dummy open fop that continues
to just bail out in such case, from Daniel.
3) Fix a BPF selftest urandom_read build issue where gcc complains that
it gets built twice, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-06-11
This series contains fixes to ixgbe IPsec and MACVLAN.
Alex provides the 5 fixes in this series, starting with fixing an issue
where num_rx_pools was not being populated until after the queues and
interrupts were reinitialized when enabling MACVLAN interfaces. Updated
to use CONFIG_XFRM_OFFLOAD instead of CONFIG_XFRM, since the code
requires CONFIG_XFRM_OFFLOAD to be enabled. Moved the IPsec
initialization function to be more consistent with the placement of
similar initialization functions and before the call to reset the
hardware, which will clean up any link issues that may have been
introduced. Fixed the boolean logic that was testing for transmit OR
receive ready bits, when it should have been testing for transmit AND
receive ready bits. Fixed the bit definitions for SECTXSTAT and SECRXSTAT
registers and ensure that if IPsec is disabled on the part, do not
enable it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While hacking on kTLS, I ran into the following panic from an
unprivileged netserver / netperf TCP session:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
Oops: 0010 [#1] SMP KASAN PTI
CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139
Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
RIP: 0010: (null)
Code: Bad RIP value.
RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
FS: 00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? tls_sw_poll+0xa4/0x160 [tls]
? sock_poll+0x20a/0x680
? do_select+0x77b/0x11a0
? poll_schedule_timeout.constprop.12+0x130/0x130
? pick_link+0xb00/0xb00
? read_word_at_a_time+0x13/0x20
? vfs_poll+0x270/0x270
? deref_stack_reg+0xad/0xe0
? __read_once_size_nocheck.constprop.6+0x10/0x10
[...]
Debugging further, it turns out that calling into ctx->sk_poll() is
invalid since sk_poll itself is NULL which was saved from the original
TCP socket in order for tls_sw_poll() to invoke it.
Looks like the recent conversion from poll to poll_mask callback started
in 1525242310 ("net: add support for ->poll_mask in proto_ops") missed
to eventually convert kTLS, too: TCP's ->poll was converted over to the
->poll_mask in commit 2c7d3daceb ("net/tcp: convert to ->poll_mask")
and therefore kTLS wrongly saved the ->poll old one which is now NULL.
Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN |
POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in
tcp_poll_mask() as well that is mangled here.
Fixes: 2c7d3daceb ("net/tcp: convert to ->poll_mask")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Watson <davejwatson@fb.com>
Tested-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzkaller reported a warning from xdp_umem_pin_pages():
WARNING: CPU: 1 PID: 4537 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996
...
__do_kmalloc mm/slab.c:3713 [inline]
__kmalloc+0x25/0x760 mm/slab.c:3727
kmalloc_array include/linux/slab.h:634 [inline]
kcalloc include/linux/slab.h:645 [inline]
xdp_umem_pin_pages net/xdp/xdp_umem.c:205 [inline]
xdp_umem_reg net/xdp/xdp_umem.c:318 [inline]
xdp_umem_create+0x5c9/0x10f0 net/xdp/xdp_umem.c:349
xsk_setsockopt+0x443/0x550 net/xdp/xsk.c:531
__sys_setsockopt+0x1bd/0x390 net/socket.c:1935
__do_sys_setsockopt net/socket.c:1946 [inline]
__se_sys_setsockopt net/socket.c:1943 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:1943
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
This is a warning about attempting to allocate more than
KMALLOC_MAX_SIZE memory. The request originates from userspace, and if
the request is too big, the kernel is free to deny its allocation. In
this patch, the failed allocation attempt is silenced with
__GFP_NOWARN.
Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Reported-by: syzbot+4abadc5d69117b346506@syzkaller.appspotmail.com
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) Reject non-null terminated helper names from xt_CT, from Gao Feng.
2) Fix KASAN splat due to out-of-bound access from commit phase, from
Alexey Kodanev.
3) Missing conntrack hook registration on IPVS FTP helper, from Julian
Anastasov.
4) Incorrect skbuff allocation size in bridge nft_reject, from Taehee Yoo.
5) Fix inverted check on packet xmit to non-local addresses, also from
Julian.
6) Fix ebtables alignment compat problems, from Alin Nastac.
7) Hook mask checks are not correct in xt_set, from Serhey Popovych.
8) Fix timeout listing of element in ipsets, from Jozsef.
9) Cap maximum timeout value in ipset, also from Jozsef.
10) Don't allow family option for hash:mac sets, from Florent Fourcot.
11) Restrict ebtables to work with NFPROTO_BRIDGE targets only, this
Florian.
12) Another bug reported by KASAN in the rbtree set backend, from
Taehee Yoo.
13) Missing __IPS_MAX_BIT update doesn't include IPS_OFFLOAD_BIT.
From Gao Feng.
14) Missing initialization of match/target in ebtables, from Florian
Westphal.
15) Remove useless nft_dup.h file in include path, from C. Labbe.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When pskb_trim_rcsum fails, the lack of error-handling code may
cause unexpected results.
This patch adds error-handling code after calling pskb_trim_rcsum.
Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPVS setups with local client and remote tunnel server need
to create exception for the local virtual IP. What we do is to
change PMTU from 64KB (on "lo") to 1460 in the common case.
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Fixes: 45e4fd2668 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Fixes: 7343ff31eb ("ipv6: Don't create clones of host routes.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses two issues. First it adds the correct bit definitions
for the SECTXSTAT and SECRXSTAT registers. Then it makes use of those
definitions to test for if IPsec has been disabled on the part and if so we
do not enable it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reported-by: Andre Tomt <andre@tomt.net>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes two issues. First we add an early test for the Tx and Rx
security block ready bits. By doing this we can avoid the need for waits or
loopback in the event that the security block is already flushed out.
Secondly we fix the boolean logic that was testing for the Tx OR Rx ready
bits being set and change it so that we only exit if the Tx AND Rx ready
bits are both set.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch moves the IPsec init function in ixgbe_sw_init. This way it is a
bit more consistent with the placement of similar initialization functions
and is placed before the reset_hw call which should allow us to clean up
any link issues that may be introduced by the fact that we force the link
up if somehow the device had IPsec still enabled before the driver was
loaded.
In addition to the function move it is necessary to change the assignment
of netdev->features. The easiest way to do this is to just test for the
existence of adapter->ipsec and if it is present we set the feature bits.
Fixes: 49a94d74d9 ("ixgbe: add ipsec engine start and stop routines")
Reported-by: Andre Tomt <andre@tomt.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There is no point in adding code if CONFIG_XFRM is defined that we won't
use unless CONFIG_XFRM_OFFLOAD is defined. So instead of leaving this code
floating around I am replacing the ifdef with what I believe is the correct
one so that we only include the code and variables if they will actually be
used.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we were enabling macvlan interfaces we weren't correctly configuring
things until ixgbe_setup_tc was called a second time either by tweaking the
number of queues or increasing the macvlan count past 15.
The issue came down to the fact that num_rx_pools is not populated until
after the queues and interrupts are reinitialized.
Instead of trying to set it sooner we can just move the call to setup at
least 1 traffic class to the SR-IOV/VMDq setup function so that we just set
it for this one case. We already had a spot that was configuring the queues
for TC 0 in the code here anyway so it makes sense to also set the number
of TCs here as well.
Fixes: 49cfbeb7a9 ("ixgbe: Fix handling of macvlan Tx offload")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
gcc complains that urandom_read gets built twice.
gcc -o tools/testing/selftests/bpf/urandom_read
-static urandom_read.c -Wl,--build-id
gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf
-I../../../../include/generated -I../../../include urandom_read.c
urandom_read -lcap -lelf -lrt -lpthread -o
tools/testing/selftests/bpf/urandom_read
gcc: fatal error: input file
‘tools/testing/selftests/bpf/urandom_read’ is the
same as output file
compilation terminated.
../lib.mk:110: recipe for target
'tools/testing/selftests/bpf/urandom_read' failed
To fix this issue remove the urandom_read target and so target
TEST_CUSTOM_PROGS gets used.
Fixes: 81f77fd0de ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull networking fixes from David Miller:
1) Fix several bpfilter/UMH bugs, in particular make the UMH build not
depend upon X86 specific Kconfig symbols. From Alexei Starovoitov.
2) Fix handling of modified context pointer in bpf verifier, from
Daniel Borkmann.
3) Kill regression in ifdown/ifup sequences for hv_netvsc driver, from
Dexuan Cui.
4) When the bonding primary member name changes, we have to re-evaluate
the bond->force_primary setting, from Xiangning Yu.
5) Eliminate possible padding beyone end of SKB in cdc_ncm driver, from
Bjørn Mork.
6) RX queue length reported for UDP sockets in procfs and socket diag
are inaccurate, from Paolo Abeni.
7) Fix br_fdb_find_port() locking, from Petr Machata.
8) Limit sk_rcvlowat values properly in TCP, from Soheil Hassas
Yeganeh.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
tcp: limit sk_rcvlowat by the maximum receive buffer
net: phy: dp83822: use BMCR_ANENABLE instead of BMSR_ANEGCAPABLE for DP83620
socket: close race condition between sock_close() and sockfs_setattr()
net: bridge: Fix locking in br_fdb_find_port()
udp: fix rx queue len reported by diag and proc interface
cdc_ncm: avoid padding beyond end of skb
net/sched: act_simple: fix parsing of TCA_DEF_DATA
net: fddi: fix a possible null-ptr-deref
net: aquantia: fix unsigned numvecs comparison with less than zero
net: stmmac: fix build failure due to missing COMMON_CLK dependency
bpfilter: fix race in pipe access
bpf, xdp: fix crash in xdp_umem_unaccount_pages
xsk: Fix umem fill/completion queue mmap on 32-bit
tools/bpf: fix selftest get_cgroup_id_user
bpfilter: fix OUTPUT_FORMAT
umh: fix race condition
net: mscc: ocelot: Fix uninitialized error in ocelot_netdevice_event()
bonding: re-evaluate force_primary when the primary slave name changes
ip_tunnel: Fix name string concatenate in __ip_tunnel_create()
hv_netvsc: Fix a network regression after ifdown/ifup
...
Subsystem:
- rework of the rtc-test driver which allows to test the core more thoroughly
- rtc_set_alarm() now fails early when alarms are not supported
Drivers:
- mktime is now replaced by mktime64
- RTC range added for 88pm80x, ab-b5ze-s3, at91rm9200, brcmstb-waketimer,
ds1685, ftrtc010, ls1x, mxc_v2, rx8581, sprd, st-lpc, tps6586x, tps65910 and
vr41xx
- Fixed a possible race condition in probe functions
- pxa: fix the probe function that is broken since v4.3
- stm32: now supports stm32mp1
-----BEGIN PGP SIGNATURE-----
iQIyBAABCgAdFiEEXx9Viay1+e7J/aM4AyWl4gNJNJIFAlsdkVAACgkQAyWl4gNJ
NJJMfA/3YzFFxsZZdcf84e3LWMwgA12c/YNM24nlQ3S+Fo23bAerGZyKEroBAaiq
HVL7j6OwYkVrGJHbqvq7J0UhI0J9Fjbtp8suj7Cj5wBKOG3wUeTkpzBiHZN42WBB
PpPC97z9HRTVjxAOmWC0wbbf622ZBOZyEti3kMVh5DwER+8iNoPJWUS6nmZdOVqR
PjT/c79WCT3q7n2j9t+ZjQfVOqPlqTTty3WuCpYDu3ce3W7uUO/cISc3M4HA4A5d
dw6gDcd9WQcf4qZESjlci84pn4Ktha317fX5QlkaKM2ul3x33652pbH8Yv6ynZsq
ZlmIyE9vSWThBWUj7R4lo9/y2IsVL5FtMRIN5bvG6ms/tPuZGeX/qAZEBgMggN+r
PgFY5U+k/1WkOeSMd4OpuE9g308wzR3xGIhtuiJOa006hvHNvyMunIeMURDWjceW
fh1uu1eUQqf4yKt8ceB9s38pYcPrvtEOh9006VcHMp/JJpoOjIn93jdsaxCmlUZc
poDAYgH+RudVaaMZ4VvZjzlrD/diSwjh51MpBf0ImdQut4ehfZdGna4WOzddenAT
1nsVKRp/qxR0b9kolQYCpSVsJKHME4pZPEKY0f5UyCZgEy/l3SkMPVOkjXlAzZAd
ZX0l857UGeVbWP5sRDTc9J1sw2QAVO2oBsSOEeK0z9kvbuz/uQ==
=I5F0
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Setting the supported range from drivers for RTCs failing soon has
started. A few fixes are developed along the way. Some drivers have
been switched to SPDX by their maintainers.
Subsystem:
- rework of the rtc-test driver which allows to test the core more
thoroughly
- rtc_set_alarm() now fails early when alarms are not supported
Drivers:
- mktime() is now replaced by mktime64()
- RTC range added for 88pm80x, ab-b5ze-s3, at91rm9200,
brcmstb-waketimer, ds1685, ftrtc010, ls1x, mxc_v2, rx8581, sprd,
st-lpc, tps6586x, tps65910 and vr41xx
- fixed a possible race condition in probe functions
- pxa: fix the probe function that is broken since v4.3
- stm32: now supports stm32mp1"
* tag 'rtc-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (78 commits)
rtc: pxa: fix probe function
rtc: cros-ec: Switch to SPDX identifier.
rtc: cros-ec: Make license text and module license match.
rtc: ensure rtc_set_alarm fails when alarms are not supported
rtc: test: remove alarm support from the first device
rtc: test: convert to devm_rtc_allocate_device
rtc: ftrtc010: let the core handle range
rtc: ftrtc010: handle dates after 2106
rtc: ftrtc010: switch to devm_rtc_allocate_device
rtc: mrst: switch to devm functions
rtc: sunxi: fix possible race condition
rtc: test: remove irq sysfs file
rtc: test: emulate alarms using timers
rtc: test: store time as an offset to system time
rtc: test: allow registering many devices
rtc: test: remove useless proc info
rtc: ds1685: Add range
rtc: ds1685: fix possible race condition
rtc: sprd: Add new RTC power down check method
rtc: sun6i: Fix bit_idx value for clk_register_gate
...
- The UBI on-disk format header file is now dual licensed
- New way to detect Fastmap problems during runtime
- Bugfix for Fastmap
- Minor updates for UBIFS (spelling, comments, vm_fault_t, ...)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlsdi6AWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wVimEADam9dNKs7hVAlSR5G8fgoFMiPI
PgP/cQ719PAYh8Zub0w8fIIvtSCaVZmPX/K40w4KAgP/jdQ/v4QD3ZYdHO3vn3/M
6WI/cEB/Cn5lJp78tt14cGlqG3+xnlwyc9EEeiMCPDWmoSf66J7fwuL3xRq4dDet
CBISYrF8ZMe6BJohT9kAYeqYoYLzALUmZoOx1DGhX7b2SEaLphCyW0A0rkxoHbYG
sxAfUiR4/bdaFfqLAmEMgoovIHGYmOdnMJ3Hk+uUVrPEfBzxDQuGfpevDJ1u99hS
D4qdvY3+VAcTpTK0XJUBgPaVpkYYMS5nfHtpObbyOT/kBmgCloCAmtvVcznxUwJ7
R+8UBZH/SrxUw+R9JCP1hFgb/PlEAE0+Vsc2jgvqWOQ8xnOqojG5FdqAueGanX+l
/on2/HNoZp5C8x15+cB9SiQwfdoZrH7vr9uzi1lqaWSV+ZMDpvG/QFMX1QX1gLl9
TFeQJBojE41y+hjTnnUQZdy1oPaisffU44ymKlf1FR2SdzW6d7wIRBd0jnB+7Pjz
KocSAA2kqDUcCjUtLFSSVfE7kUL9wwJQUfUxeQFViTGDvAnLcmy1a6mqQ1K5YWq9
e5mpMQRWzoq0XrPyaVI5EEhvaqWalJwIa6GA7ZksOvS4W1pqFLLyFOKkHejUPou8
lxcw0dDsyZdmnDRuMA==
=b0lt
-----END PGP SIGNATURE-----
Merge tag 'upstream-4.18-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
- the UBI on-disk format header file is now dual licensed
- new way to detect Fastmap problems during runtime
- bugfix for Fastmap
- minor updates for UBIFS (spelling, comments, vm_fault_t, ...)
* tag 'upstream-4.18-rc1' of git://git.infradead.org/linux-ubifs:
mtd: ubi: Update ubi-media.h to dual license
ubi: fastmap: Detect EBA mismatches on-the-fly
ubi: fastmap: Check each mapping only once
ubi: fastmap: Correctly handle interrupted erasures in EBA
ubi: fastmap: Cancel work upon detach
ubifs: lpt: Fix wrong pnode number range in comment
ubifs: gc: Fix typo
ubifs: log: Some spelling fixes
ubifs: Spelling fix someting -> something
ubifs: journal: Remove wrong comment
ubifs: remove set but never used variable
ubifs, xattr: remove misguided quota flags
fs: ubifs: Adding new return type vm_fault_t
The user-provided value to setsockopt(SO_RCVLOWAT) can be
larger than the maximum possible receive buffer. Such values
mute POLLIN signals on the socket which can stall progress
on the socket.
Limit the user-provided value to half of the maximum receive
buffer, i.e., half of sk_rcvbuf when the receive buffer size
is set by the user, or otherwise half of sysctl_tcp_rmem[2].
Fixes: d1361840f8 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
xfcp, hisi_sas, cxlflash, qla2xxx. In the absence of Nic, we're also
taking target updates which are mostly minor except for the tcmu
refactor. The only real core change to worry about is the removal of
high page bouncing (in sas, storvsc and iscsi). This has been well
tested and no problems have shown up so far.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWx1pbCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishUucAP42pccS
ziKyiOizuxv9fZ4Q+nXd1A9zhI5tqqpkHjcQegEA40qiZSi3EKGKR8W0UpX7Ntmo
tqrZJGojx9lnrAM2RbQ=
=NMXg
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
xfcp, hisi_sas, cxlflash, qla2xxx.
In the absence of Nic, we're also taking target updates which are
mostly minor except for the tcmu refactor.
The only real core change to worry about is the removal of high page
bouncing (in sas, storvsc and iscsi). This has been well tested and no
problems have shown up so far"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits)
scsi: lpfc: update driver version to 12.0.0.4
scsi: lpfc: Fix port initialization failure.
scsi: lpfc: Fix 16gb hbas failing cq create.
scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc
scsi: lpfc: correct oversubscription of nvme io requests for an adapter
scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx)
scsi: hisi_sas: Mark PHY as in reset for nexus reset
scsi: hisi_sas: Fix return value when get_free_slot() failed
scsi: hisi_sas: Terminate STP reject quickly for v2 hw
scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command
scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot
scsi: hisi_sas: Try wait commands before before controller reset
scsi: hisi_sas: Init disks after controller reset
scsi: hisi_sas: Create a scsi_host_template per HW module
scsi: hisi_sas: Reset disks when discovered
scsi: hisi_sas: Add LED feature for v3 hw
scsi: hisi_sas: Change common allocation mode of device id
scsi: hisi_sas: change slot index allocation mode
scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
scsi: hisi_sas: fix a typo in hisi_sas_task_prep()
...
DP83620 register set is compatible with the DP83848, but it also supports
100base-FX. When the hardware is configured such as that fiber mode is
enabled, autonegotiation is not possible.
The chip, however, doesn't expose this information via BMSR_ANEGCAPABLE.
Instead, this bit is always set high, even if the particular hardware
configuration makes it so that auto negotiation is not possible [1]. Under
these circumstances, the phy subsystem keeps trying for autonegotiation to
happen, without success.
Hereby, we inspect BMCR_ANENABLE bit after genphy_config_init, which on
reset is set to 0 when auto negotiation is disabled, and so we use this
value instead of BMSR_ANEGCAPABLE.
[1] https://e2e.ti.com/support/interface/ethernet/f/903/p/697165/2571170
Signed-off-by: Alvaro Gamez Machado <alvaro.gamez@hazent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fchownat() doesn't even hold refcnt of fd until it figures out
fd is really needed (otherwise is ignored) and releases it after
it resolves the path. This means sock_close() could race with
sockfs_setattr(), which leads to a NULL pointer dereference
since typically we set sock->sk to NULL in ->release().
As pointed out by Al, this is unique to sockfs. So we can fix this
in socket layer by acquiring inode_lock in sock_close() and
checking against NULL in sockfs_setattr().
sock_release() is called in many places, only the sock_close()
path matters here. And fortunately, this should not affect normal
sock_close() as it is only called when the last fd refcnt is gone.
It only affects sock_close() with a parallel sockfs_setattr() in
progress, which is not common.
Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Reported-by: shankarapailoor <shankarapailoor@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>