Commit Graph

57946 Commits

Author SHA1 Message Date
Dominique Martinet
91a76be37f 9p: add a per-client fcall kmem_cache
Having a specific cache for the fcall allocations helps speed up
end-to-end latency.

The caches will automatically be merged if there are multiple caches
of items with the same size so we do not need to try to share a cache
between different clients of the same size.

Since the msize is negotiated with the server, only allocate the cache
after that negotiation has happened - previous allocations or
allocations of different sizes (e.g. zero-copy fcall) are made with
kmalloc directly.

Some figures on two beefy VMs with Connect-IB (sriov) / trans=rdma,
with ior running 32 processes in parallel doing small 32 bytes IOs:
 - no alloc (4.18-rc7 request cache): 65.4k req/s
 - non-power of two alloc, no patch: 61.6k req/s
 - power of two alloc, no patch: 62.2k req/s
 - non-power of two alloc, with patch: 64.7k req/s
 - power of two alloc, with patch: 65.1k req/s

Link: http://lkml.kernel.org/r/1532943263-24378-2-git-send-email-asmadeus@codewreck.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Greg Kurz <groug@kaod.org>
2018-09-08 01:39:47 +09:00
Dominique Martinet
523adb6cc1 9p: embed fcall in req to round down buffer allocs
'msize' is often a power of two, or at least page-aligned, so avoiding
an overhead of two dozen bytes for each allocation will help the
allocator do its work and reduce memory fragmentation.

Link: http://lkml.kernel.org/r/1533825236-22896-1-git-send-email-asmadeus@codewreck.org
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
2018-09-08 01:39:45 +09:00
Marek Lindner
4c4af69008 batman-adv: fix hardif_neigh refcount on queue_work() failure
The hardif_neigh refcounter is to be decreased by the queued work and
currently is never decreased if the queue_work() call fails.
Fix by checking the queue_work() return value and decrease refcount
if necessary.

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-07 08:14:26 +02:00
Marek Lindner
5af96b9c59 batman-adv: fix backbone_gw refcount on queue_work() failure
The backbone_gw refcounter is to be decreased by the queued work and
currently is never decreased if the queue_work() call fails.
Fix by checking the queue_work() return value and decrease refcount
if necessary.

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-07 08:14:26 +02:00
Jesper Dangaard Brouer
47b123ed9e xdp: split code for map vs non-map redirect
The compiler does an efficient job of inlining static C functions.
Perf top clearly shows that almost everything gets inlined into the
function call xdp_do_redirect.

The function xdp_do_redirect end-up containing and interleaving the
map and non-map redirect code.  This is sub-optimal, as it would be
strange for an XDP program to use both types of redirect in the same
program. The two use-cases are separate, and interleaving the code
just cause more instruction-cache pressure.

I would like to stress (again) that the non-map variant bpf_redirect
is very slow compared to the bpf_redirect_map variant, approx half the
speed.  Measured with driver i40e the difference is:

- map     redirect: 13,250,350 pps
- non-map redirect:  7,491,425 pps

For this reason, the function name of the non-map variant of redirect
have been called xdp_do_redirect_slow.  This hopefully gives a hint
when using perf, that this is not the optimal XDP redirect operating mode.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-06 22:34:08 -07:00
Jesper Dangaard Brouer
2a68d85fe1 xdp: explicit inline __xdp_map_lookup_elem
The compiler chooses to not-inline the function __xdp_map_lookup_elem,
because it can see that it is used by both Generic-XDP and native-XDP
do redirect calls (xdp_do_generic_redirect_map and xdp_do_redirect_map).

The compiler cannot know that this is a bad choice, as it cannot know
that a net device cannot run both XDP modes (Generic or Native) at the
same time.  Thus, mark this function inline, even-though we normally
leave this up-to the compiler.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-06 22:34:08 -07:00
Jesper Dangaard Brouer
e1302542e3 xdp: unlikely instrumentation for xdp map redirect
Notice the compiler generated ASM code layout was suboptimal.  It
assumed map enqueue errors as the likely case, which is shouldn't.
It assumed that xdp_do_flush_map() was a likely case, due to maps
changing between packets, which should be very unlikely.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-06 22:34:08 -07:00
Cong Wang
8f5c5fcf35 tipc: call start and done ops directly in __tipc_nl_compat_dumpit()
__tipc_nl_compat_dumpit() uses a netlink_callback on stack,
so the only way to align it with other ->dumpit() call path
is calling tipc_dump_start() and tipc_dump_done() directly
inside it. Otherwise ->dumpit() would always get NULL from
cb->args[].

But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve
net pointer, the cb->skb here doesn't set skb->sk, the net pointer
is saved in msg->net instead, so introduce a helper function
__tipc_dump_start() to pass in msg->net.

Ying pointed out cb->args[0...3] are already used by other
callbacks on this call path, so we can't use cb->args[0] any
more, use cb->args[4] instead.

Fixes: 9a07efa9ae ("tipc: switch to rhashtable iterator")
Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 21:49:18 -07:00
Yi-Hung Wei
fa642f0883 openvswitch: Derive IP protocol number for IPv6 later frags
Currently, OVS only parses the IP protocol number for the first
IPv6 fragment, but sets the IP protocol number for the later fragments
to be NEXTHDF_FRAGMENT.  This patch tries to derive the IP protocol
number for the IPV6 later frags so that we can match that.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 21:47:49 -07:00
Sven Eckelmann
ae3cdc97dc batman-adv: Prevent duplicated tvlv handler
The function batadv_tvlv_handler_register is responsible for adding new
tvlv_handler to the handler_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: ef26157747 ("batman-adv: tvlv - basic infrastructure")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 14:37:13 +02:00
Sven Eckelmann
e7136e48ff batman-adv: Prevent duplicated global TT entry
The function batadv_tt_global_orig_entry_add is responsible for adding new
tt_orig_list_entry to the orig_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: d657e621a0 ("batman-adv: add reference counting for type batadv_tt_orig_list_entry")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 14:37:12 +02:00
Sven Eckelmann
94cb82f594 batman-adv: Prevent duplicated softif_vlan entry
The function batadv_softif_vlan_get is responsible for adding new
softif_vlan to the softif_vlan_list. It first checks whether the entry
already is in the list or not. If it is, then the creation of a new entry
is aborted.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: 5d2c05b213 ("batman-adv: add per VLAN interface attribute framework")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 14:37:05 +02:00
Sven Eckelmann
fa122fec86 batman-adv: Prevent duplicated nc_node entry
The function batadv_nc_get_nc_node is responsible for adding new nc_nodes
to the in_coding_list and out_coding_list. It first checks whether the
entry already is in the list or not. If it is, then the creation of a new
entry is aborted.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: d56b1705e2 ("batman-adv: network coding - detect coding nodes and remove these after timeout")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:55:58 +02:00
Sven Eckelmann
dff9bc42ab batman-adv: Prevent duplicated gateway_node entry
The function batadv_gw_node_add is responsible for adding new gw_node to
the gateway_list. It is expecting that the caller already checked that
there is not already an entry with the same key or not.

But the lock for the list is only held when the list is really modified.
This could lead to duplicated entries because another context could create
an entry with the same key between the check and the list manipulation.

The check and the manipulation of the list must therefore be in the same
locked code section.

Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:55:20 +02:00
Sven Eckelmann
a25bab9d72 batman-adv: Fix segfault when writing to sysfs elp_interval
The per hardif sysfs file "batman_adv/elp_interval" is using the generic
functions to store/show uint values. The helper __batadv_store_uint_attr
requires the softif net_device as parameter to print the resulting change
as info text when the users writes to this file. It uses the helper
function batadv_info to add it at the same time to the kernel ring buffer
and to the batman-adv debug log (when CONFIG_BATMAN_ADV_DEBUG is enabled).

The function batadv_info requires as first parameter the batman-adv softif
net_device. This parameter is then used to find the private buffer which
contains the debug log for this batman-adv interface. But
batadv_store_throughput_override used as first argument the slave
net_device. This slave device doesn't have the batadv_priv private data
which is access by batadv_info.

Writing to this file with CONFIG_BATMAN_ADV_DEBUG enabled can either lead
to a segfault or to memory corruption.

Fixes: 0744ff8fa8 ("batman-adv: Add hard_iface specific sysfs wrapper macros for UINT")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:54:48 +02:00
Sven Eckelmann
b9fd14c208 batman-adv: Fix segfault when writing to throughput_override
The per hardif sysfs file "batman_adv/throughput_override" prints the
resulting change as info text when the users writes to this file. It uses
the helper function batadv_info to add it at the same time to the kernel
ring buffer and to the batman-adv debug log (when CONFIG_BATMAN_ADV_DEBUG
is enabled).

The function batadv_info requires as first parameter the batman-adv softif
net_device. This parameter is then used to find the private buffer which
contains the debug log for this batman-adv interface. But
batadv_store_throughput_override used as first argument the slave
net_device. This slave device doesn't have the batadv_priv private data
which is access by batadv_info.

Writing to this file with CONFIG_BATMAN_ADV_DEBUG enabled can either lead
to a segfault or to memory corruption.

Fixes: 0b5ecc6811 ("batman-adv: add throughput override attribute to hard_ifaces")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:54:48 +02:00
Sven Eckelmann
88d0895d0e batman-adv: Avoid probe ELP information leak
The probe ELPs for WiFi interfaces are expanded to contain at least
BATADV_ELP_MIN_PROBE_SIZE bytes. This is usually a lot more than the
number of bytes which the template ELP packet requires.

These extra padding bytes were not initialized and thus could contain data
which were previously stored at the same location. It is therefore required
to set it to some predefined or random values to avoid leaking private
information from the system transmitting these kind of packets.

Fixes: e4623c913508 ("batman-adv: Avoid probe ELP information leak")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-06 13:54:47 +02:00
Julian Wiedmann
b7f4156554 net/iucv: declare iucv_path_table_empty() as static
Fixes a compile warning.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:22 -07:00
Julian Wiedmann
b2f543949a net/af_iucv: fix skb handling on HiperTransport xmit error
When sending an skb, afiucv_hs_send() bails out on various error
conditions. But currently the caller has no way of telling whether the
skb was freed or not - resulting in potentially either
a) leaked skbs from iucv_send_ctrl(), or
b) double-free's from iucv_sock_sendmsg().

As dev_queue_xmit() will always consume the skb (even on error), be
consistent and also free the skb from all other error paths. This way
callers no longer need to care about managing the skb.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:22 -07:00
Julian Wiedmann
222440996d net/af_iucv: drop inbound packets with invalid flags
Inbound packets may have any combination of flag bits set in their iucv
header. If we don't know how to handle a specific combination, drop the
skb instead of leaking it.

To clarify what error is returned in this case, replace the hard-coded
0 with the corresponding macro.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:32:21 -07:00
Christian Brauner
203651b665 ipv6: add inet6_fill_args
inet6_fill_if{addr,mcaddr, acaddr}() already took 6 arguments which
meant the 7th argument would need to be pushed onto the stack on x86.
Add a new struct inet6_fill_args which holds common information passed
to inet6_fill_if{addr,mcaddr, acaddr}() and shortens the functions to
three pointer arguments.

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
978a46fa6c ipv4: add inet_fill_args
inet_fill_ifaddr() already took 6 arguments which meant the 7th argument
would need to be pushed onto the stack on x86.
Add a new struct inet_fill_args which holds common information passed
to inet_fill_ifaddr() and shortens the function to three pointer arguments.

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
7e4a8d5a93 rtnetlink: s/IFLA_IF_NETNSID/IFLA_TARGET_NETNSID/g
IFLA_TARGET_NETNSID is the new alias for IFLA_IF_NETNSID. This commit
replaces all occurrences of IFLA_IF_NETNSID with the new alias to
indicate that this identifier is the preferred one.

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
87ccbb1f94 rtnetlink: move type calculation out of loop
I don't see how the type - which is one of
RTM_{GETADDR,GETROUTE,GETNETCONF} - can change. So do the message type
calculation once before entering the for loop.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
6ecf4c37eb ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR
- Backwards Compatibility:
  If userspace wants to determine whether ipv6 RTM_GETADDR requests
  support the new IFA_TARGET_NETNSID property it should verify that the
  reply includes the IFA_TARGET_NETNSID property. If it does not
  userspace should assume that IFA_TARGET_NETNSID is not supported for
  ipv6 RTM_GETADDR requests on this kernel.
- From what I gather from current userspace tools that make use of
  RTM_GETADDR requests some of them pass down struct ifinfomsg when they
  should actually pass down struct ifaddrmsg. To not break existing
  tools that pass down the wrong struct we will do the same as for
  RTM_GETLINK | NLM_F_DUMP requests and not error out when the
  nlmsg_parse() fails.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
d38071455f ipv4: enable IFA_TARGET_NETNSID for RTM_GETADDR
- Backwards Compatibility:
  If userspace wants to determine whether ipv4 RTM_GETADDR requests
  support the new IFA_TARGET_NETNSID property it should verify that the
  reply includes the IFA_TARGET_NETNSID property. If it does not
  userspace should assume that IFA_TARGET_NETNSID is not supported for
  ipv4 RTM_GETADDR requests on this kernel.
- From what I gather from current userspace tools that make use of
  RTM_GETADDR requests some of them pass down struct ifinfomsg when they
  should actually pass down struct ifaddrmsg. To not break existing
  tools that pass down the wrong struct we will do the same as for
  RTM_GETLINK | NLM_F_DUMP requests and not error out when the
  nlmsg_parse() fails.

- Security:
  Callers must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Christian Brauner
c383edc424 rtnetlink: add rtnl_get_net_ns_capable()
get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to
retrieve network namespaces based on network namespace identifiers. So
remove the static declaration and export in the rtnetlink header. Also,
rename it to rtnl_get_net_ns_capable() to make it obvious what this
function is doing.
Export rtnl_get_net_ns_capable() so it can be used when ipv6 is built as
a module.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:27:11 -07:00
Davide Caratti
ee28bb56ac net/sched: fix memory leak in act_tunnel_key_init()
If users try to install act_tunnel_key 'set' rules with duplicate values
of 'index', the tunnel metadata are allocated, but never released. Then,
kmemleak complains as follows:

 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 # echo clear > /sys/kernel/debug/kmemleak
 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 Error: TC IDR already exists.
 We have an error talking to the kernel
 # echo scan > /sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
 unreferenced object 0xffff8800574e6c80 (size 256):
   comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff  ................
     81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00  .$..............
   backtrace:
     [<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
     [<000000007d98fccd>] tcf_action_init_1+0x698/0xac0
     [<0000000099b8f7cc>] tcf_action_init+0x15c/0x590
     [<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2
     [<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0
     [<000000000bfe7575>] netlink_rcv_skb+0x124/0x350
     [<00000000edab656f>] netlink_unicast+0x40f/0x5d0
     [<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0
     [<0000000063d9d490>] sock_sendmsg+0xb3/0xf0
     [<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960
     [<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170
     [<00000000ce72e4b0>] do_syscall_64+0xa5/0x470
     [<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
     [<00000000fac1b476>] 0xffffffffffffffff

This problem theoretically happens also in case users attempt to setup a
geneve rule having wrong configuration data, or when the kernel fails to
allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
metadata also in the above conditions.

Addresses-Coverity-ID: 1373974 ("Resource leak")
Fixes: d0f6dd8a91 ("net/sched: Introduce act_tunnel_key")
Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:18:54 -07:00
Cong Wang
0a3b8b2b21 tipc: orphan sock in tipc_release()
Before we unlock the sock in tipc_release(), we have to
detach sk->sk_socket from sk, otherwise a parallel
tipc_sk_fill_sock_diag() could stil read it after we
free this socket.

Fixes: c30b70deb5 ("tipc: implement socket diagnostics for AF_TIPC")
Reported-and-tested-by: syzbot+48804b87c16588ad491d@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:14:00 -07:00
Dmitry Safonov
428f944bd5 netlink: Make groups check less stupid in netlink_bind()
As Linus noted, the test for 0 is needless, groups type can follow the
usual kernel style and 8*sizeof(unsigned long) is BITS_PER_LONG:

> The code [..] isn't technically incorrect...
> But it is stupid.
> Why stupid? Because the test for 0 is pointless.
>
> Just doing
>        if (nlk->ngroups < 8*sizeof(groups))
>                groups &= (1UL << nlk->ngroups) - 1;
>
> would have been fine and more understandable, since the "mask by shift
> count" already does the right thing for a ngroups value of 0. Now that
> test for zero makes me go "what's special about zero?". It turns out
> that the answer to that is "nothing".
[..]
> The type of "groups" is kind of silly too.
>
> Yeah, "long unsigned int" isn't _technically_ wrong. But we normally
> call that type "unsigned long".

Cleanup my piece of pointlessness.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Fairly-blamed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:11:33 -07:00
Vincent Whitchurch
fa788d986a packet: add sockopt to ignore outgoing packets
Currently, the only way to ignore outgoing packets on a packet socket is
via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
if the filter run from packet_rcv() would reject them.  So the presence
of a packet socket on the interface takes away the benefits of
MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
cloned, but the cost for that is much lower.)

Add a socket option to allow AF_PACKET sockets to ignore outgoing
packets to solve this.  Note that the *BSDs already have something
similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.

The first intended user is lldpd.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:09:37 -07:00
Bob Copeland
6eae4a6c2b mac80211: fix pending queue hang due to TX_DROP
In our environment running lots of mesh nodes, we are seeing the
pending queue hang periodically, with the debugfs queues file showing
lines such as:

    00: 0x00000000/348

i.e. there are a large number of frames but no stop reason set.

One way this could happen is if queue processing from the pending
tasklet exited early without processing all frames, and without having
some future event (incoming frame, stop reason flag, ...) to reschedule
it.

Exactly this can occur today if ieee80211_tx() returns false due to
packet drops or power-save buffering in the tx handlers.  In the
past, this function would return true in such cases, and the change
to false doesn't seem to be intentional.  Fix this case by reverting
to the previous behavior.

Fixes: bb42f2d13f ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue")
Signed-off-by: Bob Copeland <bobcopeland@fb.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 12:54:27 +02:00
Stanislaw Gruszka
014f5a250f cfg80211: validate wmm rule when setting
Add validation check for wmm rule when copy rules from fwdb and print
error when rule is invalid.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:16:59 +02:00
Gustavo A. R. Silva
40b5a0f8c6 mac80211: remove unnecessary NULL check
Both old and new cannot be NULL at the same time, hence checking
new when old is not NULL is unnecessary.

Also, notice that new is being dereferenced before it is checked:

	idx = new->conf.keyidx;

The above triggers a static code analysis warning.

Address this by removing the NULL check on new and adding a code
comment based on the following piece of code:

387        /* caller must provide at least one old/new */
388        if (WARN_ON(!new && !old))
389                return 0;

Addresses-Coverity-ID: 1473176 ("Dereference before null check")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:15:48 +02:00
Sara Sharon
9739fe29a2 mac80211: add an option for drivers to check if packets can be aggregated
Some hardwares have limitations on the packets' type in AMSDU.
Add an optional driver callback to determine if two skbs can
be used in the same AMSDU or not.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:11:50 +02:00
Sara Sharon
edba6bdad6 mac80211: allow AMSDU size limitation per-TID
Some drivers may have AMSDU size limitation per TID, due to
HW constrains. Add an option to set this limit.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:10:26 +02:00
Sara Sharon
0eeb2b674f mac80211: add an option for station management TXQ
We have a TXQ abstraction for non-data packets that need
powersave buffering. Since the AP cannot sleep, in case
of station we can use this TXQ for all management frames,
regardless if they are bufferable. Add HW flag to allow
that.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:10:11 +02:00
Shaul Triebitz
add7453ad6 wireless: align to draft 11ax D3.0
Align to new 11ax draft D3.0.  Change/add new MAC and PHY capabilities
and update drivers' 11ax capabilities and mac80211's debugfs
accordingly.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:09:50 +02:00
Naftali Goldstein
77cbbc35a4 mac80211: fix saving a few HE values
After masking the he_oper_params, to get the requested values as
integers one must rshift and not lshift.  Fix that by using the
le32_get_bits() macro.

Fixes: 41cbb0f5a2 ("mac80211: add support for HE")
Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
[converted to use le32_get_bits()]
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:08:42 +02:00
Shaul Triebitz
c3d1f87528 mac80211: support reporting 0-length PSDU in radiotap
For certain sounding frames, it may be useful to report them
to userspace even though they don't have a PSDU in order to
determine the PHY parameters (e.g. VHT rate/stream config.)
Add support for this to mac80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:08:25 +02:00
Andrei Otcheretianski
24f33e64fc cfg80211: reg: Init wiphy_idx in regulatory_hint_core()
Core regulatory hints didn't set wiphy_idx to WIPHY_IDX_INVALID. Since
the regulatory request is zeroed, wiphy_idx was always implicitly set to
0. This resulted in updating only phy #0.
Fix that.

Fixes: 806a9e3967 ("cfg80211: make regulatory_request use wiphy_idx instead of wiphy")
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[add fixes tag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:07:53 +02:00
Andrei Otcheretianski
94a5b3acd0 mac80211: Don't wake up from PS for offchannel TX
Otherwise the offchannel frame might be queued due to
IEEE80211_QUEUE_STOP_REASON_PS and later dropped (in
ieee80211_tx_frags()).  Anyway, it doesn't make much sense to wake up
the device during ROC.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:05:02 +02:00
Andrei Otcheretianski
8682250b3c mac80211: Always report TX status
If a frame is dropped for any reason, mac80211 wouldn't report the TX
status back to user space.

As the user space may rely on the TX_STATUS to kick its state
machines, resends etc, it's better to just report this frame as not
acked instead.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:04:51 +02:00
Alexander Wetzel
62872a9b9a mac80211: Fix PTK rekey freezes and clear text leak
Rekeying PTK keys without "Extended Key ID for Individually Addressed
Frames" did use a procedure not suitable to replace in-use keys and
could caused the following issues:

 1) Freeze caused by incoming frames:
    If the local STA installed the key prior to the remote STA we still
    had the old key active in the hardware when mac80211 switched over
    to the new key.
    Therefore there was a window where the card could hand over frames
    decoded with the old key to mac80211 and bump the new PN (IV) value
    to an incorrect high number. When it happened the local replay
    detection silently started to drop all frames sent with the new key.

 2) Freeze caused by outgoing frames:
    If mac80211 was providing the PN (IV) and handed over a clear text
    frame for encryption to the hardware prior to a key change the
    driver/card could have processed the queued frame after switching
    to the new key. This bumped the PN value on the remote STA to an
    incorrect high number, tricking the remote STA to discard all frames
    we sent later.

 3) Freeze caused by RX aggregation reorder buffer:
    An aggregation session started with the old key and ending after the
    switch to the new key also bumped the PN to an incorrect high number,
    freezing the connection quite similar to 1).

 4) Freeze caused by repeating lost frames in an aggregation session:
    A driver could repeat a lost frame and encrypt it with the new key
    while in a TX aggregation session without updating the PN for the
    new key. This also could freeze connections similar to 2).

 5) Clear text leak:
    Removing encryption offload from the card cleared the encryption
    offload flag only after the card had deleted the key and we did not
    stop TX during the rekey. The driver/card could therefore get
    unencrypted frames from mac80211 while no longer be instructed to
    encrypt them.

To prevent those issues the key install logic has been changed:
 - Mac80211 divers known to be able to rekey PTK0 keys have to set
   @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0,
 - mac80211 stops queuing frames depending on the key during the replace
 - the key is first replaced in the hardware and after that in mac80211
 - and mac80211 stops/blocks new aggregation sessions during the rekey.

For drivers not setting
@NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 the user space must avoid PTK
rekeys if "Extended Key ID for Individually Addressed Frames" is not
being used. Rekeys for mac80211 drivers without this flag will generate a
warning and use an extra call to ieee80211_flush_queues() to both
highlight and try to prevent the issues with not updated drivers.

The core of the fix changes the key install procedure from:
 - atomic switch over to the new key in mac80211
 - remove the old key in the hardware (stops encryption offloading, fall
   back to software encryption with a potential clear text packet leak
   in between)
 - delete the inactive old key in mac80211
 - enable hardware encryption offloading for the new key
to:
 - if it's a PTK mark the old key as tainted to drop TX frames with the
   outgoing key
 - replace the key in hardware with the new one
 - atomic switch over to the new (not marked as tainted) key in
   mac80211 (which also resumes TX)
 - delete the inactive old key in mac80211

With the new sequence the hardware will be unable to decrypt frames
encrypted with the old key prior to switching to the new key in mac80211
and thus prevent PNs from packets decrypted with the old key to be
accounted against the new key.

For that to work the drivers have to provide a clear boundary.
Mac80211 drivers setting @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 confirm
to provide it and mac80211 will then be able to correctly rekey in-use
PTK keys with those drivers.

The mac80211 requirements for drivers to set the flag have been added to
the "Hardware crypto acceleration" documentation section. It drills down
to:
The drivers must not hand over frames decrypted with the old key to
mac80211 once the call to set_key() with %DISABLE_KEY has been
completed. It's allowed to either drop or continue to use the old key
for any outgoing frames which are already in the queues, but it must not
send out any of them unencrypted or encrypted with the new key.

Even with the new boundary in place aggregation sessions with the
reorder buffer are problematic:
RX aggregation session started prior and completed after the rekey could
still dump frames received with the old key at mac80211 after it
switched over to the new key. This is side stepped by stopping all (RX
and TX) aggregation sessions when replacing a PTK key and hardware key
offloading.
Stopping TX aggregation sessions avoids the need to get
the PNs (IVs) updated in frames prepared for the old key and
(re)transmitted after the switch to the new key. As a bonus it improves
the compatibility when the remote STA is not handling rekeys as it
should.

When using software crypto aggregation sessions are not stopped.
Mac80211 won't be able to decode the dangerous frames and discard them
without special handling.

Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
[trim overly long rekey warning]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:17 +02:00
Shaul Triebitz
d1332e7be2 mac80211: support radiotap L-SIG data
As before with HE, the data needs to be provided by the
driver in the skb head, since there's not enough space
in the skb CB.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:15 +02:00
Wen Gong
70e53669c4 mac80211: Store sk_pacing_shift in ieee80211_hw
Make it possibly for drivers to adjust the default skb_pacing_shift
by storing it in the hardware struct.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
[adjust commit log, move & adjust comment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:15 +02:00
Johannes Berg
e80d642552 mac80211: copy VHT EXT NSS BW Support/Capable data to station
When taking VHT capabilities for a station, copy the new
fields if we support them as a transmitter. Also adjust
the maximum bandwidth the station supports appropriately.

Also, since it was missing, copy tx_highest and rx_highest.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00
Johannes Berg
7eb26df297 mac80211: add ability to parse CCFS2
With newer VHT implementations, it's necessary to look at the
HT operation's CCFS2 field to identify the actual bandwidth
used.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00
Johannes Berg
09b4a4faf9 mac80211: introduce capability flags for VHT EXT NSS support
Depending on whether or not rate control supports selecting
rates depending on the bandwidth, we can use VHT extended
NSS support. In essence, this is dot11VHTExtendedNSSBWCapable
from the spec, since depending on that we'll need to parse
the bandwidth.

If needed, also set/clear the VHT Capability Element bit for
this capability so that we don't advertise it erroneously or
don't advertise it when we actually use it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00
Johannes Berg
b0aa75f0b1 ieee80211: add new VHT capability fields/parsing
IEEE 802.11-2016 extended the VHT capability fields to allow
indicating the number of spatial streams depending on the
actually used bandwidth, add support for decoding this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00