There are follow comment errors:
1 The function name is wrong in p9_release_pages() comment.
2 The function name and variable name is wrong in p9_poll_workfn() comment.
3 There is no variable dm_mr and lkey in struct p9_trans_rdma.
4 The function name is wrong in rdma_create_trans() comment.
5 There is no variable initialized in struct virtio_chan.
6 The variable name is wrong in p9_virtio_zc_request() comment.
Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
i386 builds report:
net/xdp/xdp_umem.o: In function `xdp_umem_reg':
xdp_umem.c:(.text+0x47e): undefined reference to `__udivdi3'
This fix uses div_u64 instead of the GCC built-in.
Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
It's fairly easy for offloaded XDP programs to select the RX queue
packets go to. We need a way of expressing this in the software.
Allow write to the rx_queue_index field of struct xdp_md for
device-bound programs.
Skip convert_ctx_access callback entirely for offloads.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
INFINIBAND_ADDR_TRANS depends on INFINIBAND. So there's no need for
options which depend INFINIBAND_ADDR_TRANS to also depend on INFINIBAND.
Remove the unnecessary INFINIBAND depends.
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When removing a rule that jumps to chain and such chain in the same
batch, this bogusly hits EBUSY. Add activate and deactivate operations
to expression that can be called from the preparation and the
commit/abort phases.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
currently matchinfo gets stored in the expression, but some xt matches
are very large.
To handle those we either need to switch nft core to kvmalloc and increase
size limit, or allocate the info blob of large matches separately.
This does the latter, this limits the scope of the changes to
nft_compat.
I picked a threshold of 192, this allows most matches to work as before and
handle only few ones via separate alloation (cgroup, u32, sctp, rt).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Next patch will make it possible for *info to be stored in
a separate allocation instead of the expr private area.
This removes the 'expr priv area is info blob' assumption
from the match init/destroy/eval functions.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch makes it so that if a destructor is not present we avoid trying
to update the skb socket or any reference counting that would be associated
with the NULL socket and/or descriptor. By doing this we can support
traffic coming from another namespace without any issues.
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for a software provided checksum and GSO_PARTIAL
segmentation support. With this we can offload UDP segmentation on devices
that only have partial support for tunnels.
Since we are no longer needing the hardware checksum we can drop the checks
in the segmentation code that were verifying if it was present.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows us to take care of unrolling the first segment and the
last segment of the loop for processing the segmented skb. Part of the
motivation for this is that it makes it easier to process the fact that the
first fame and all of the frames in between should be mostly identical
in terms of header data, and the last frame has differences in the length
and partial checksum.
In addition I am dropping the header length calculation since we don't
really need it for anything but the last frame and it can be easily
obtained by just pulling the data_len and offset of tail from the transport
header.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is meant to allow us to avoid having to recompute the checksum
from scratch and have it passed as a parameter.
Instead of taking that approach we can take advantage of the fact that the
length that was used to compute the existing checksum is included in the
UDP header.
Finally to avoid the need to invert the result we can just call csum16_add
and csum16_sub directly. By doing this we can avoid a number of
instructions in the loop that is handling segmentation.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in passing MSS as a parameter for for the GSO
segmentation call as it is already available via the shared info for the
skb itself.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to record the number of segments that will be generated when this
frame is segmented. The expectation is that if gso_size is set then
gso_segs is set as well. Without this some drivers such as ixgbe get
confused if they attempt to offload this as they record 0 segments for the
entire packet instead of the correct value.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Schmidt says:
====================
pull-request: ieee802154 2018-05-08
An update from ieee802154 for your *net* tree.
Two fixes for the mcr20a driver, which was being added in the 4.17 merge window,
by Gustavo and myself.
The atusb driver got a change to GFP_KERNEL where no GFP_ATOMIC is needed by
Jia-Ju.
The last and most important fix is from Alex to get IPv6 reassembly working
again for the ieee802154 6lowpan adaptation. This got broken in 4.16 so please
queue this one also up for the 4.16 stable tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The multiplication of 100000 * cfg80211_calculate_bitrate() is a 32 bit
operation and can overflow if cfg80211_calculate_bitrate is greater
than 42949. Although I don't believe this is occurring at present, it
would be safer to avoid the potential overflow by making the constant
100000 an ULL to ensure a 64 multiplication occurs.
Detected by CoverityScan, CID#1468643 ("Unintentional integer overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nft_chain_stats_replace() and all other spots assume ->stats can be
NULL, but nft_update_chain_stats does not. It must do this check,
just because the jump label is set doesn't mean all basechains have stats
assigned.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The icmp matches are implemented in ip_tables and ip6_tables,
respectively, so for normal iptables they are always available:
those modules are loaded once iptables calls getsockopt() to fetch
available module revisions.
In iptables-over-nftables case probing occurs via nfnetlink, so
these modules might not be loaded. Add aliases so modprobe can load
these when icmp/icmp6 is requested.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
fixes these warnings:
'nfnl_cthelper_create' at net/netfilter/nfnetlink_cthelper.c:237:2,
'nfnl_cthelper_new' at net/netfilter/nfnetlink_cthelper.c:450:9:
./include/linux/string.h:246:9: warning: '__builtin_strncpy' specified bound 16 equals destination size [-Wstringop-truncation]
return __builtin_strncpy(p, q, size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Moreover, strncpy assumes null-terminated source buffers, but thats
not the case here.
Unlike strlcpy, nla_strlcpy *does* pad the destination buffer
while also considering nla attribute size.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Local clients are not properly synchronized on 32-bit CPUs when
updating stats (3.10+). Now it is possible estimation_timer (timer),
a stats reader, to interrupt the local client in the middle of
write_seqcount_{begin,end} sequence leading to loop (DEADLOCK).
The same interrupt can happen from received packet (SoftIRQ)
which updates the same per-CPU stats.
Fix it by disabling BH while updating stats.
Found with debug:
WARNING: inconsistent lock state
4.17.0-rc2-00105-g35cb6d7-dirty #2 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-R} -> {SOFTIRQ-ON-W} usage.
ftp/2545 [HC0[0]:SC0[0]:HE1:SE1] takes:
86845479 (&syncp->seq#6){+.+-}, at: ip_vs_schedule+0x1c5/0x59e [ip_vs]
{IN-SOFTIRQ-R} state was registered at:
lock_acquire+0x44/0x5b
estimation_timer+0x1b3/0x341 [ip_vs]
call_timer_fn+0x54/0xcd
run_timer_softirq+0x10c/0x12b
__do_softirq+0xc1/0x1a9
do_softirq_own_stack+0x1d/0x23
irq_exit+0x4a/0x64
smp_apic_timer_interrupt+0x63/0x71
apic_timer_interrupt+0x3a/0x40
default_idle+0xa/0xc
arch_cpu_idle+0x9/0xb
default_idle_call+0x21/0x23
do_idle+0xa0/0x167
cpu_startup_entry+0x19/0x1b
start_secondary+0x133/0x182
startup_32_smp+0x164/0x168
irq event stamp: 42213
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&syncp->seq#6);
<Interrupt>
lock(&syncp->seq#6);
*** DEADLOCK ***
Fixes: ac69269a45 ("ipvs: do not disable bh for long time")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Taehee Yoo reported following bug:
iptables-compat -I OUTPUT -m cpu --cpu 0
iptables-compat -F
lsmod |grep xt_cpu
xt_cpu 16384 1
Quote:
"When above command is given, a netlink message has two expressions that
are the cpu compat and the nft_counter.
The nft_expr_type_get() in the nf_tables_expr_parse() successes
first expression then, calls select_ops callback.
(allocates memory and holds module)
But, second nft_expr_type_get() in the nf_tables_expr_parse()
returns -EAGAIN because of request_module().
In that point, by the 'goto err1',
the 'module_put(info[i].ops->type->owner)' is called.
There is no release routine."
The core problem is that unlike all other expression,
nft_compat select_ops has side effects.
1. it allocates dynamic memory which holds an nft ops struct.
In all other expressions, ops has static storage duration.
2. It grabs references to the xt module that it is supposed to
invoke.
Depending on where things go wrong, error unwinding doesn't
always do the right thing.
In the above scenario, a new nft_compat_expr is created and
xt_cpu module gets loaded with a refcount of 1.
Due to to -EAGAIN, the netlink messages get re-parsed.
When that happens, nft_compat finds that xt_cpu is already present
and increments module refcount again.
This fixes the problem by making select_ops to have no visible
side effects and removes all extra module_get/put.
When select_ops creates a new nft_compat expression, the new
expression has a refcount of 0, and the xt module gets its refcount
incremented.
When error happens, the next call finds existing entry, but will no
longer increase the reference count -- the presence of existing
nft_xt means we already hold a module reference.
Because nft_xt_put is only called from nft_compat destroy hook,
it will never see the initial zero reference count.
->destroy can only be called after ->init(), and that will increase the
refcount.
Lastly, we now free nft_xt struct with kfree_rcu.
Else, we get use-after free in nf_tables_rule_destroy:
while (expr != nft_expr_last(rule) && expr->ops) {
nf_tables_expr_destroy(ctx, expr);
expr = nft_expr_next(expr); // here
nft_expr_next() dereferences expr->ops. This is safe
for all users, as ops have static storage duration.
In nft_compat case however, its ->destroy callback can
free the memory that hold the ops structure.
Tested-by: Taehee Yoo <ap420073@gmail.com>
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This adds support to mac80211 to export TXQ stats via the newly added
cfg80211 API.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This adds support for exporting the mac80211 TXQ stats via nl80211 by
way of a nested TXQ stats attribute, as well as for configuring the
quantum and limits that were previously only changeable through debugfs.
This commit adds just the nl80211 API, a subsequent commit adds support to
mac80211 itself.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch removes "experimental" from the help text where depends on
CONFIG_EXPERIMENTAL was already removed.
Signed-off-by: Georg Hofmann <georg@hofmannsweb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change fixes a couple of type mismatch reported by the sparse
tool, explicitly using the requested type for the offending arguments.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the core networking needs to detect the transport offset in a given
packet and parse it explicitly, a full-blown flow_keys struct is used for
storage.
This patch introduces a smaller keys store, rework the basic flow dissect
helper to use it, and apply this new helper where possible - namely in
skb_probe_transport_header(). The used flow dissector data structures
are renamed to match more closely the new role.
The above gives ~50% performance improvement in micro benchmarking around
skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due
to the smaller memset. Small, but measurable improvement is measured also
in macro benchmarking.
v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(),
as per DaveM suggestion
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2018-05-07
1) Always verify length of provided sadb_key to fix a
slab-out-of-bounds read in pfkey_add. From Kevin Easton.
2) Make sure that all states are really deleted
before we check that the state lists are empty.
Otherwise we trigger a warning.
3) Fix MTU handling of the VTI6 interfaces on
interfamily tunnels. From Stefano Brivio.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add GRO capability for IPv6 GRE tunnel and ip6erspan tap, via gro_cells
infrastructure.
Performance testing: 55% higher badwidth.
Measuring bandwidth of 1 thread IPv4 TCP traffic over IPv6 GRE tunnel
while GRO on the physical interface is disabled.
CPU: Intel Xeon E312xx (Sandy Bridge)
NIC: Mellanox Technologies MT27700 Family [ConnectX-4]
Before (GRO not working in tunnel) : 2.47 Gbits/sec
After (GRO working in tunnel) : 3.85 Gbits/sec
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 'an' into 'and', and use a comma instead of a period.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case of writing a partial tls record we forgot to clear the
ctx->in_tcp_sendpages flag, causing some connections to stall.
Fixes: c212d2c7fc ("net/tls: Don't recursively call push_record during tls_write_space callbacks")
Signed-off-by: Andre Tomt <andre@tomt.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp only delays the authentication for the normal cookie-echo
chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
authentication first based on the old asoc, which will definitely
fail due to the different auth info in the old asoc.
The duplicated cookie-echo chunk will create a new asoc with the
auth info from this chunk, and the authentication should also be
done with the new asoc's auth info for all of the collision 'A',
'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
will never pass the authentication and create the new connection.
This issue exists since very beginning, and this fix is to make
sctp_assoc_bh_rcv() follow the way sctp_endpoint_bh_rcv() does
for the normal cookie-echo chunk to delay the authentication.
While at it, remove the unused params from sctp_sf_authenticate()
and define sctp_auth_chunk_verify() used for all the places that
do the delayed authentication.
v1->v2:
fix the typo in changelog as Marcelo noticed.
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The v9fs_get_trans_by_name(char *s) variable name is not "name" but "s".
Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vlan_flags enum is defined in include/uapi/linux/if_vlan.h file.
not in include/linux/if_vlan.h file.
Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor conflict, a CHECK was placed into an if() statement
in net-next, whilst a newline was added to that CHECK
call in 'net'. Thanks to Daniel for the merge resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver will process the RSSI if available and send it to mac80211.
mac80211 will compute the weighted average of ack RSSI for stations.
Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Average ack rssi will be given to userspace via NL80211 interface
if firmware is capable. Userspace tool ‘iw’ can process this
information and give the output as one of the fields in
‘iw dev wlanX station dump’.
Example output :
localhost ~ #iw dev wlan-5000mhz station dump Station
34:f3:9a:aa:3b:29 (on wlan-5000mhz)
inactive time: 5370 ms
rx bytes: 85321
rx packets: 576
tx bytes: 14225
tx packets: 71
tx retries: 0
tx failed: 2
beacon loss: 0
rx drop misc: 0
signal: -54 dBm
signal avg: -53 dBm
tx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
rx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
avg ack signal: -56 dBm
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 203 seconds
Main use case is to measure the signal strength of a connected station
to AP. Data packet transmit rates and bandwidth used by station can vary
a lot even if the station is at fixed location, especially if the rates
used are multi stream(2stream, 3stream) rates with different bandwidth(20/40/80 Mhz).
These multi stream rates are sensitive and station can use different transmit power
for each of the rate and bandwidth combinations. RSSI measured from these RX packets
on AP will be not stable and can vary a lot with in a short time.
Whereas 802.11 ack frames from station are sent relatively at a constant
rate (6/12/24 Mbps) with constant bandwidth(20 Mhz).
So average rssi of the ack packets is good and more accurate.
Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently the regulatory core does not call the regulatory callback
reg_notifier for self managed wiphys, but regulatory_hint_user() call is
independent of wiphy and is meant for all wiphys in the system. Even a
self managed wiphy may be interested in regulatory_hint_user() to know
the country code from a trusted regulatory domain change like a cellular
base station. Therefore, for the regulatory source
NL80211_REGDOM_SET_BY_USER and the user hint type
NL80211_USER_REG_HINT_CELL_BASE, call the regulatory notifier.
No current wlan driver uses the REGULATORY_WIPHY_SELF_MANAGED flag while
also registering the reg_notifier regulatory callback, therefore there
will be no impact on existing drivers without them being explicitly
modified to take advantage of this new possibility.
Signed-off-by: Amar Singhal <asinghal@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will serve userspace entity to maintain its regulatory limitation.
More specifcally APs can use this data to calculate the WMM IE when
building: beacons, probe responses, assoc responses etc...
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The data structure is initialized to all zeroes, and
we already rely on that in other places, so remove the
pointless assignment to 0.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Rather than just setting the valid flags to 0 set the
whole struct to 0 since other places might rely on it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no need to do the same thing three times in
the different switch cases, pull that out to a single
place.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since all the HE data won't fit into struct ieee80211_rx_status,
we'll (have to) move that into the SKB proper. This means we'll
need to skip over more things in the future, so rename this to
remove the vendor-only notion from it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016 spec, section 10.24.2 specifies that the block ack
timeout in the ADD BA request is advisory.
That means we should check the value in the response and
act upon it (same as buffer size).
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>