Commit Graph

44103 Commits

Author SHA1 Message Date
Jouni Malinen
6ec63612c3 mac80211: Allow AUTH_DATA to be used for FILS
The special SAE case should be limited only for SAE since the more
generic AUTH_DATA can now be used with other authentication algorithms
as well.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:21 +02:00
Jouni Malinen
11b6b5a4ce cfg80211: Rename SAE_DATA to more generic AUTH_DATA
This adds defines and nl80211 extensions to allow FILS Authentication to
be implemented similarly to SAE. FILS does not need the special rules
for the Authentication transaction number and Status code fields, but it
does need to add non-IE fields. The previously used
NL80211_ATTR_SAE_DATA can be reused for this to avoid having to
duplicate that implementation. Rename that attribute to more generic
NL80211_ATTR_AUTH_DATA (with backwards compatibility define for
NL80211_SAE_DATA).

Also document the special rules related to the Authentication
transaction number and Status code fiels.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:20 +02:00
Johannes Berg
bfe2c7b1cc nl80211: use nla_parse_nested() instead of nla_parse()
It's just an inline doing the same thing, but the code
is nicer with it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:20 +02:00
Johannes Berg
1794899e8b nl80211: move unsplit command advertising to a separate function
When we split the wiphy dump because it got too large, I added a
comment and asked that all new command advertising be done only
for userspace clients capable of receiving split data, in order
to not break older ones (which can't use the new commands anyway)

This mostly worked, and we haven't added many new commands, but
I occasionally get patches that modify the wrong place.

Make this easier to detect and understand by splitting out the
old commands to a separate function that makes it more clear it
should never be modified again.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:19 +02:00
Johannes Berg
ac668afe41 mac80211: validate new interface's beacon intervals
As part of interface combination checking, verify any new
interface's beacon intervals. In fact, just always add the
beacon interval since that's harmless.

With this patch, mac80211 is prepared for drivers that set
the min_beacon_int_gcd parameter in interface combinations.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:18:07 +02:00
Johannes Berg
4c8dea638c cfg80211: validate beacon int as part of iface combinations
Remove the pointless checking against interface combinations in
the initial basic beacon interval validation, that currently isn't
taking into account radar detection or channels properly. Instead,
just validate the basic range there, and then delay real checking
to the interface combination validation that drivers must do.

This means that drivers wanting to use the beacon_int_min_gcd will
now have to pass the new_beacon_int when validating the AP/mesh
start.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:18:07 +02:00
Johannes Berg
56271da29c cfg80211: disallow beacon_int_min_gcd with IBSS
This can't really be supported right now, because the IBSS
interface may change its beacon interval at any time due to
joining another network; thus, there's already "support"
for different beacon intervals here, implicitly.

Until we figure out how we should handle this case (continue
to allow it to arbitrarily join? Join only if compatible?)
disallow advertising that different beacon intervals are
supported if IBSS is allowed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:08:45 +02:00
Johannes Berg
275fcf62c2 cfg80211: mesh: track (and thus validate) beacon interval
This is needed for beacon interval validation; if we don't
store it, then new interfaces added won't validate that the
beacon interval is the same as existing ones. Fix this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:08:45 +02:00
Johannes Berg
0507a3ac6e cfg80211: fix beacon interval in interface combination iteration
We shouldn't abort the iteration with an error when one of the
potential combinations can't accomodate the beacon interval
request, we should just skip that particular combination. Fix
the code to do so.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:08:44 +02:00
Arend Van Spriel
73c7da3dae cfg80211: add generic helper to check interface is running
Add a helper using wdev to check if interface is running. This
deals with both non-netdev and netdev interfaces. In struct
wireless_dev replace 'p2p_started' and 'nan_started' by
'is_running' as those are mutually exclusive anyway, and unify
all the code to use wdev_running().

Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:08:44 +02:00
Johannes Berg
8f20542386 wireless: deprecate WDS and disable by default
The old WDS 4-addr frame support is very limited, e.g.
 * no encryption is possible on such links
 * it cannot support rate/HT/VHT negotiation
 * management APIs are very restricted

These make the WDS legacy mode useless in practice.

All of these are resolved by the 4-addr AP/client support,
so there's also no reason to improve WDS in the future.

Therefore, add a Kconfig option to disable legacy WDS.
This gives people an "emergency valve" while they migrate
to the better-supported 4-addr AP/client option; we plan
to remove it (and the associated cfg80211/mac80211 code,
which is the ultimate goal) in the future.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:08:43 +02:00
Eric Dumazet
10df8e6152 udp: fix IP_CHECKSUM handling
First bug was added in commit ad6f939ab1 ("ip: Add offset parameter to
ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on
AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));

Then commit e6afc8ace6 ("udp: remove headers from UDP packets before
queueing") forgot to adjust the offsets now UDP headers are pulled
before skb are put in receive queue.

Fixes: ad6f939ab1 ("ip: Add offset parameter to ip_cmsg_recv")
Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sam Kumar <samanthakumar@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:33:22 -04:00
Xin Long
ecc515d723 sctp: fix the panic caused by route update
Commit 7303a14750 ("sctp: identify chunks that need to be fragmented
at IP level") made the chunk be fragmented at IP level in the next round
if it's size exceed PMTU.

But there still is another case, PMTU can be updated if transport's dst
expires and transport's pmtu_pending is set in sctp_packet_transmit. If
the new PMTU is less than the chunk, the same issue with that commit can
be triggered.

So we should drop this packet and let it retransmit in another round
where it would be fragmented at IP level.

This patch is to fix it by checking the chunk size after PMTU may be
updated and dropping this packet if it's size exceed PMTU.

Fixes: 90017accff ("sctp: Add GSO support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:32:19 -04:00
Elad Raz
6edf10173a devlink: Prevent port_type_set() callback when it's not needed
When a port_type_set() is been called and the new port type set is the same
as the old one, just return success.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:30:32 -04:00
Sven Eckelmann
701470baba batman-adv: Revert "use core MTU range checking in misc drivers"
The maximum MTU is defined via the slave devices of an batman-adv
interface. Thus it is not possible to calculate the max_mtu during the
creation of the batman-adv device when no slave devices are attached. Doing
so would for example break non-fragmentation setups which then
(incorrectly) allow an MTU of 1500 even when underlying device cannot
transport 1500 bytes + batman-adv headers.

Checking the dynamically calculated max_mtu via the minimum of the slave
devices MTU during .ndo_change_mtu is also used by the bridge interface.

Cc: Jarod Wilson <jarod@redhat.com>
Fixes: b3e3893e12 ("net: use core MTU range checking in misc drivers")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:19:07 -04:00
Eric Dumazet
73e42ff72a sch_htb: do not report fake rate estimators
When I prepared commit d250a5f90e ("pkt_sched: gen_estimator: Dont
report fake rate estimators"), htb still had an implicit rate estimator
for all its classes.

Then later, I made this rate estimator optional in commit 64153ce0a7
("net_sched: htb: do not setup default rate estimators"), but I forgot
to update htb use of gnet_stats_copy_rate_est()

After this patch, "tc -s qdisc ..." no longer report fake rate
estimators for HTB classes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:13:49 -04:00
Pablo Neira Ayuso
254432613c netfilter: nft_ct: add notrack support
This patch adds notrack support.

I decided to add a new expression, given that this doesn't fit into the
existing set operation. Notrack doesn't need a source register, and an
hypothetical NFT_CT_NOTRACK key makes no sense since matching the
untracked state is done through NFT_CT_STATE.

I'm placing this new notrack expression into nft_ct.c, I think a single
module is too much.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26 16:35:16 +02:00
Liping Zhang
96d9f2a72c netfilter: nft_meta: permit pkttype mangling in ip/ip6 prerouting
After supporting this, we can combine it with hash expression to emulate
the 'cluster match'.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26 16:35:16 +02:00
Liping Zhang
0ecba4d9d1 netfilter: nft_numgen: start round robin from zero
Currently we start round robin from 1, but it's better to start round
robin from 0. This is to keep consistent with xt_statistic in iptables.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26 16:35:16 +02:00
Florian Westphal
5efa0fc6d7 netfilter: nf_tables: allow expressions to return STOLEN
Currently not supported, we'd oops as skb was (or is) free'd elsewhere.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26 16:35:15 +02:00
Calvin Owens
0813fbc913 netfilter: nfnetlink_log: Use GFP_NOWARN for skb allocation
Since the code explicilty falls back to a smaller allocation when the
large one fails, we shouldn't complain when that happens.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26 16:35:15 +02:00
Gao Feng
dd2602d00f netfilter: xt_multiport: Use switch case instead of multiple condition checks
There are multiple equality condition checks in the original codes, so it
is better to use switch case instead of them.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26 16:35:15 +02:00
Johannes Berg
e1957dba5b cfg80211: process events caused by suspend before suspending
When suspending without WoWLAN, cfg80211 will ask drivers to
disconnect. Even when the driver does this synchronously, and
immediately returns with a notification, cfg80211 schedules
the handling thereof to a workqueue, and may then call back
into the driver when the driver was already suspended/ing.

Fix this by processing all events caused by cfg80211_leave_all()
directly after that function returns. The driver still needs to
do the right thing here and wait for the firmware response, but
that is - at least - true for mwifiex where this occurred.

Reported-by: Amitkumar Karwar <akarwar@marvell.com>
Tested-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-26 07:59:52 +02:00
Cyrill Gorcunov
432490f9d4 net: ip, diag -- Add diag interface for raw sockets
In criu we are actively using diag interface to collect sockets
present in the system when dumping applications. And while for
unix, tcp, udp[lite], packet, netlink it works as expected,
the raw sockets do not have. Thus add it.

v2:
 - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
 - implement @destroy for diag requests (by dsa@)

v3:
 - add export of raw_abort for IPv6 (by dsa@)
 - pass net-admin flag into inet_sk_diag_fill due to
   changes in net-next branch (by dsa@)

v4:
 - use @pad in struct inet_diag_req_v2 for raw socket
   protocol specification: raw module carries sockets
   which may have custom protocol passed from socket()
   syscall and sole @sdiag_protocol is not enough to
   match underlied ones
 - start reporting protocol specifed in socket() call
   when sockets are raw ones for the same reason: user
   space tools like ss may parse this attribute and use
   it for socket matching

v5 (by eric.dumazet@):
 - use sock_hold in raw_sock_get instead of atomic_inc,
   we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
   when looking up so counter won't be zero here.

v6:
 - use sdiag_raw_protocol() helper which will access @pad
   structure used for raw sockets protocol specification:
   we can't simply rename this member without breaking uapi

v7:
 - sine sdiag_raw_protocol() helper is not suitable for
   uapi lets rather make an alias structure with proper
   names. __check_inet_diag_req_raw helper will catch
   if any of structure unintentionally changed.

CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrey Vagin <avagin@openvz.org>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 19:35:24 -04:00
David S. Miller
44060abe1d Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2016-10-21

Here are some more Bluetooth fixes for the 4.9 kernel:

 - Fix to btwilink driver probe function return value
 - Power management fix to hci_bcm
 - Fix to encoding name in scan response data

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 17:51:38 -04:00
Thomas Graf
f76a9db351 lwt: Remove unused len field
The field is initialized by ILA and MPLS but never used. Remove it.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 17:45:01 -04:00
Jiri Slaby
a4b8e71b05 net: sctp, forbid negative length
Most of getsockopt handlers in net/sctp/socket.c check len against
sizeof some structure like:
        if (len < sizeof(int))
                return -EINVAL;

On the first look, the check seems to be correct. But since len is int
and sizeof returns size_t, int gets promoted to unsigned size_t too. So
the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
false.

Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.

Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.

If not checked, this is the result:
UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
shift exponent 52 is too large for 32-bit type 'int'
CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
Call Trace:
 [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
...
 [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
 [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
 [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 17:43:15 -04:00
Jason A. Donenfeld
b678aa578c ipv6: do not increment mac header when it's unset
Otherwise we'll overflow the integer. This occurs when layer 3 tunneled
packets are handed off to the IPv6 layer.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 17:38:58 -04:00
Andrey Vagin
7281a66590 net: allow to kill a task which waits net_mutex in copy_new_ns
net_mutex can be locked for a long time. It may be because many
namespaces are being destroyed or many processes decide to create
a network namespace.

Both these operations are heavy, so it is better to have an ability to
kill a process which is waiting net_mutex.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 17:33:39 -04:00
Shmulik Ladkani
d65f2fa680 net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames
META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
is zero, then no vlan accel tag is present.

This is incorrect for zero VID vlan accel packets, making the following
match fail:
  tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...

Apparently 'int_vlan_tag' was implemented prior VLAN_TAG_PRESENT was
introduced in 05423b2 "vlan: allow null VLAN ID to be used"
(and at time introduced, the 'vlan_tx_tag_get' call in em_meta was not
 adapted).

Fix, testing skb_vlan_tag_present instead of testing skb_vlan_tag_get's
value.

Fixes: 05423b2413 ("vlan: allow null VLAN ID to be used")
Fixes: 1a31f2042e ("netsched: Allow meta match on vlan tag on receive")

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 17:31:25 -04:00
Daniel Borkmann
2d0e30c30f bpf: add helper for retrieving current numa node id
Use case is mainly for soreuseport to select sockets for the local
numa node, but since generic, lets also add this for other networking
and tracing program types.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:52 -04:00
Paolo Abeni
850cbaddb5 udp: use it's own memory accounting schema
Completely avoid default sock memory accounting and replace it
with udp-specific accounting.

Since the new memory accounting model encapsulates completely
the required locking, remove the socket lock on both enqueue and
dequeue, and avoid using the backlog on enqueue.

Be sure to clean-up rx queue memory on socket destruction, using
udp its own sk_destruct.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr readers      Kpps (vanilla)  Kpps (patched)
1               170             440
3               1250            2150
6               3000            3650
9               4200            4450
12              5700            6250

v4 -> v5:
  - avoid unneeded test in first_packet_length

v3 -> v4:
  - remove useless sk_rcvqueues_full() call

v2 -> v3:
  - do not set the now unsed backlog_rcv callback

v1 -> v2:
  - add memory pressure support
  - fixed dropwatch accounting for ipv6

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:05 -04:00
Paolo Abeni
f970bd9e3a udp: implement memory accounting helpers
Avoid using the generic helpers.
Use the receive queue spin lock to protect the memory
accounting operation, both on enqueue and on dequeue.

On dequeue perform partial memory reclaiming, trying to
leave a quantum of forward allocated memory.

On enqueue use a custom helper, to allow some optimizations:
- use a plain spin_lock() variant instead of the slightly
  costly spin_lock_irqsave(),
- avoid dst_force check, since the calling code has already
  dropped the skb dst
- avoid orphaning the skb, since skb_steal_sock() already did
  the work for us

The above needs custom memory reclaiming on shutdown, provided
by the udp_destruct_sock().

v5 -> v6:
  - don't orphan the skb on enqueue

v4 -> v5:
  - replace the mem_lock with the receive queue spin lock
  - ensure that the bh is always allowed to enqueue at least
    a skb, even if sk_rcvbuf is exceeded

v3 -> v4:
  - reworked memory accunting, simplifying the schema
  - provide an helper for both memory scheduling and enqueuing

v1 -> v2:
  - use a udp specific destrctor to perform memory reclaiming
  - remove a couple of helpers, unneeded after the above cleanup
  - do not reclaim memory on dequeue if not under memory
    pressure
  - reworked the fwd accounting schema to avoid potential
    integer overflow

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:05 -04:00
Paolo Abeni
f8c3bf00d4 net/socket: factor out helpers for memory and queue manipulation
Basic sock operations that udp code can use with its own
memory accounting schema. No functional change is introduced
in the existing APIs.

v4 -> v5:
  - avoid whitespace changes

v2 -> v4:
  - avoid exporting __sock_enqueue_skb

v1 -> v2:
  - avoid export sock_rmem_free

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:05 -04:00
WANG Cong
396a30cce1 ipv4: use the right lock for ping_group_range
This reverts commit a681574c99
("ipv4: disable BH in set_ping_group_range()") because we never
read ping_group_range in BH context (unlike local_port_range).

Then, since we already have a lock for ping_group_range, those
using ip_local_ports.lock for ping_group_range are clearly typos.

We might consider to share a same lock for both ping_group_range
and local_port_range w.r.t. space saving, but that should be for
net-next.

Fixes: a681574c99 ("ipv4: disable BH in set_ping_group_range()")
Fixes: ba6b918ab2 ("ping: move ping_group_range out of CONFIG_SYSCTL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 16:23:12 -04:00
Paul Moore
2a73306b60 netns: revert "netns: avoid disabling irq for netns id"
This reverts commit bc51dddf98 ("netns: avoid disabling irq for
netns id") as it was found to cause problems with systems running
SELinux/audit, see the mailing list thread below:

 * http://marc.info/?t=147694653900002&r=1&w=2

Eventually we should be able to reintroduce this code once we have
rewritten the audit multicast code to queue messages much the same
way we do for unicast messages.  A tracking issue for this can be
found below:

 * https://github.com/linux-audit/audit-kernel/issues/23

Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Elad Raz <e@eladraz.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 16:16:29 -04:00
Jarod Wilson
8b1efc0f83 net: remove MTU limits on a few ether_setup callers
These few drivers call ether_setup(), but have no ndo_change_mtu, and thus
were overlooked for changes to MTU range checking behavior. They
previously had no range checks, so for feature-parity, set their min_mtu
to 0 and max_mtu to ETH_MAX_MTU (65535), instead of the 68 and 1500
inherited from the ether_setup() changes. Fine-tuning can come after we get
back to full feature-parity here.

CC: netdev@vger.kernel.org
Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
CC: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
CC: R Parameswaran <parameswaran.r7@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21 13:57:50 -04:00
WANG Cong
8651be8f14 ipv6: fix a potential deadlock in do_ipv6_setsockopt()
Baozeng reported this deadlock case:

       CPU0                    CPU1
       ----                    ----
  lock([  165.136033] sk_lock-AF_INET6);
                               lock([  165.136033] rtnl_mutex);
                               lock([  165.136033] sk_lock-AF_INET6);
  lock([  165.136033] rtnl_mutex);

Similar to commit 87e9f03159
("ipv4: fix a potential deadlock in mcast getsockopt() path")
this is due to we still have a case, ipv6_sock_mc_close(),
where we acquire sk_lock before rtnl_lock. Close this deadlock
with the similar solution, that is always acquire rtnl lock first.

Fixes: baf606d9c9 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21 11:29:02 -04:00
David S. Miller
8dbad1a811 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from
   Geert Uytterhoeven.

2) Fix wrong timeout in set elements added from packet path via
   nft_dynset, from Anders K. Pedersen.

3) Remove obsolete nf_conntrack_events_retry_timeout sysctl
   documentation, from Nicolas Dichtel.

4) Ensure proper initialization of log flags via xt_LOG, from
   Liping Zhang.

5) Missing alias to autoload ipcomp, also from Liping Zhang.

6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping.

7) Wrong integer type in the new nft_parse_u32_check() function,
   from Dan Carpenter.

8) Another wrong integer type declaration in nft_exthdr_init, also
   from Dan Carpenter.

9) Fix insufficient mode validation in nft_range.

10) Fix compilation warning in nft_range due to possible uninitialized
    value, from Arnd Bergmann.

11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to
    calm down kmemcheck, from Florian Westphal.

12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached,
    from Nicolas Dichtel.

13) Fix nf_queue() after conversion to single-linked hook list, related
    to incorrect bypass flag handling and incorrect hook point of
    reinjection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21 10:25:22 -04:00
Linus Lüssing
9799c50372 batman-adv: fix splat on disabling an interface
As long as there is still a reference for a hard interface held, there might
still be a forwarding packet relying on its attributes.

Therefore avoid setting hard_iface->soft_iface to NULL when disabling a hard
interface.

This fixes the following, potential splat:

    batman_adv: bat0: Interface deactivated: eth1
    batman_adv: bat0: Removing interface: eth1
    cgroup: new mount options do not match the existing superblock, will be ignored
    batman_adv: bat0: Interface deactivated: eth3
    batman_adv: bat0: Removing interface: eth3
    ------------[ cut here ]------------
    WARNING: CPU: 3 PID: 1986 at ./net/batman-adv/bat_iv_ogm.c:549 batadv_iv_send_outstanding_bat_ogm_packet+0x145/0x643 [batman_adv]
    Modules linked in: batman_adv(O-) <...>
    CPU: 3 PID: 1986 Comm: kworker/u8:2 Tainted: G        W  O    4.6.0-rc6+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet [batman_adv]
     0000000000000000 ffff88001d93bca0 ffffffff8126c26b 0000000000000000
     0000000000000000 ffff88001d93bcf0 ffffffff81051615 ffff88001f19f818
     000002251d93bd68 0000000000000046 ffff88001dc04a00 ffff88001becbe48
    Call Trace:
     [<ffffffff8126c26b>] dump_stack+0x67/0x90
     [<ffffffff81051615>] __warn+0xc7/0xe5
     [<ffffffff8105164b>] warn_slowpath_null+0x18/0x1a
     [<ffffffffa0356f24>] batadv_iv_send_outstanding_bat_ogm_packet+0x145/0x643 [batman_adv]
     [<ffffffff8108b01f>] ? __lock_is_held+0x32/0x54
     [<ffffffff810689a2>] process_one_work+0x2a8/0x4f5
     [<ffffffff81068856>] ? process_one_work+0x15c/0x4f5
     [<ffffffff81068df2>] worker_thread+0x1d5/0x2c0
     [<ffffffff81068c1d>] ? process_scheduled_works+0x2e/0x2e
     [<ffffffff81068c1d>] ? process_scheduled_works+0x2e/0x2e
     [<ffffffff8106dd90>] kthread+0xc0/0xc8
     [<ffffffff8144de82>] ret_from_fork+0x22/0x40
     [<ffffffff8106dcd0>] ? __init_kthread_worker+0x55/0x55
    ---[ end trace 647f9f325123dc05 ]---

What happened here is, that there was still a forw_packet (here: a BATMAN IV
OGM) in the queue of eth3 with the forw_packet->if_incoming set to eth1 and the
forw_packet->if_outgoing set to eth3.

When eth3 is to be deactivated and removed, then this thread waits for the
forw_packet queued on eth3 to finish. Because eth1 was deactivated and removed
earlier and by that had forw_packet->if_incoming->soft_iface, set to NULL, the
splat when trying to send/flush the OGM on eth3 occures.

Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
[sven@narfation.org: Reduced size of Oops message]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-21 14:47:02 +02:00
Jarod Wilson
b96f9afee4 ipv4/6: use core net MTU range checking
ipv4/ip_tunnel:
- min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen
- preserve all ndo_change_mtu checks for now to prevent regressions

ipv6/ip6_tunnel:
- min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len
- preserve all ndo_change_mtu checks for now to prevent regressions

ipv6/ip6_vti:
- min_mtu = 1280, max_mtu = 65535
- remove redundant vti6_change_mtu

ipv6/sit:
- min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen
- remove redundant ipip6_tunnel_change_mtu

CC: netdev@vger.kernel.org
CC: "David S. Miller" <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:10 -04:00
Jarod Wilson
b3e3893e12 net: use core MTU range checking in misc drivers
firewire-net:
- set min/max_mtu
- remove fwnet_change_mtu

nes:
- set max_mtu
- clean up nes_netdev_change_mtu

xpnet:
- set min/max_mtu
- remove xpnet_dev_change_mtu

hippi:
- set min/max_mtu
- remove hippi_change_mtu

batman-adv:
- set max_mtu
- remove batadv_interface_change_mtu
- initialization is a little async, not 100% certain that max_mtu is set
  in the optimal place, don't have hardware to test with

rionet:
- set min/max_mtu
- remove rionet_change_mtu

slip:
- set min/max_mtu
- streamline sl_change_mtu

um/net_kern:
- remove pointless ndo_change_mtu

hsi/clients/ssi_protocol:
- use core MTU range checking
- remove now redundant ssip_pn_set_mtu

ipoib:
- set a default max MTU value
- Note: ipoib's actual max MTU can vary, depending on if the device is in
  connected mode or not, so we'll just set the max_mtu value to the max
  possible, and let the ndo_change_mtu function continue to validate any new
  MTU change requests with checks for CM or not. Note that ipoib has no
  min_mtu set, and thus, the network core's mtu > 0 check is the only lower
  bounds here.

mptlan:
- use net core MTU range checking
- remove now redundant mpt_lan_change_mtu

fddi:
- min_mtu = 21, max_mtu = 4470
- remove now redundant fddi_change_mtu (including export)

fjes:
- min_mtu = 8192, max_mtu = 65536
- The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to
  get past the core net MTU range checks so fjes_change_mtu can validate a
  new MTU against what it supports (see fjes_support_mtu in fjes_hw.c)

hsr:
- min_mtu = 0 (calls ether_setup, max_mtu is 1500)

f_phonet:
- min_mtu = 6, max_mtu = 65541

u_ether:
- min_mtu = 14, max_mtu = 15412

phonet/pep-gprs:
- min_mtu = 576, max_mtu = 65530
- remove redundant gprs_set_mtu

CC: netdev@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: Stefan Richter <stefanr@s5r6.in-berlin.de>
CC: Faisal Latif <faisal.latif@intel.com>
CC: linux-rdma@vger.kernel.org
CC: Cliff Whickman <cpw@sgi.com>
CC: Robin Holt <robinmholt@gmail.com>
CC: Jes Sorensen <jes@trained-monkey.org>
CC: Marek Lindner <mareklindner@neomailbox.ch>
CC: Simon Wunderlich <sw@simonwunderlich.de>
CC: Antonio Quartulli <a@unstable.cc>
CC: Sathya Prakash <sathya.prakash@broadcom.com>
CC: Chaitra P B <chaitra.basappa@broadcom.com>
CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
CC: MPT-FusionLinux.pdl@broadcom.com
CC: Sebastian Reichel <sre@kernel.org>
CC: Felipe Balbi <balbi@kernel.org>
CC: Arvid Brodin <arvid.brodin@alten.se>
CC: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:10 -04:00
Jarod Wilson
91572088e3 net: use core MTU range checking in core net infra
geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straight-forward as others, could use some
  closer inspection and testing

macvlan:
- set min/max_mtu

tun:
- set min/max_mtu, remove tun_net_change_mtu

vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function
- This one is also not as straight-forward and could use closer inspection
  and testing from vxlan folks

bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function

openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
  is the largest possible size supported

sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

macsec:
- min_mtu = 0, max_mtu = 65535

macvlan:
- min_mtu = 0, max_mtu = 65535

ntb_netdev:
- min_mtu = 0, max_mtu = 65535

veth:
- min_mtu = 68, max_mtu = 65535

8021q:
- min_mtu = 0, max_mtu = 65535

CC: netdev@vger.kernel.org
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Tom Herbert <tom@herbertland.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Jiri Benc <jbenc@redhat.com>
CC: WANG Cong <xiyou.wangcong@gmail.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Pravin B Shelar <pshelar@ovn.org>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Pravin Shelar <pshelar@nicira.com>
CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:09 -04:00
Jarod Wilson
8b6b4135e4 net: use core MTU range checking in WAN drivers
- set min/max_mtu in all hdlc drivers, remove hdlc_change_mtu
- sent max_mtu in lec driver, remove lec_change_mtu
- set min/max_mtu in x25_asy driver

CC: netdev@vger.kernel.org
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: Krzysztof Halasa <khalasa@piap.pl>
CC: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
CC: Francois Romieu <romieu@fr.zoreil.com>
CC: Kevin Curtis <kevin.curtis@farsite.co.uk>
CC: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:09 -04:00
Jarod Wilson
9c22b4a34e net: use core MTU range checking in wireless drivers
- set max_mtu in wil6210 driver
- set max_mtu in atmel driver
- set min/max_mtu in cisco airo driver, remove airo_change_mtu
- set min/max_mtu in ipw2100/ipw2200 drivers, remove libipw_change_mtu
- set min/max_mtu in p80211netdev, remove wlan_change_mtu
- set min/max_mtu in net/mac80211/iface.c and remove ieee80211_change_mtu
- set min/max_mtu in wimax/i2400m and remove i2400m_change_mtu
- set min/max_mtu in intersil/hostap and remove prism2_change_mtu
- set min/max_mtu in intersil/orinoco
- set min/max_mtu in tty/n_gsm and remove gsm_change_mtu

CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
CC: Maya Erez <qca_merez@qca.qualcomm.com>
CC: Simon Kelley <simon@thekelleys.org.uk>
CC: Stanislav Yakovlev <stas.yakovlev@gmail.com>
CC: Johannes Berg <johannes@sipsolutions.net>
CC: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:08 -04:00
Eric Dumazet
a681574c99 ipv4: disable BH in set_ping_group_range()
In commit 4ee3bd4a8c ("ipv4: disable BH when changing ip local port
range") Cong added BH protection in set_local_port_range() but missed
that same fix was needed in set_ping_group_range()

Fixes: b8f1a55639 ("udp: Add function to make source port for UDP tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:49:32 -04:00
Eric Dumazet
286c72deab udp: must lock the socket in udp_disconnect()
Baozeng Ding reported KASAN traces showing uses after free in
udp_lib_get_port() and other related UDP functions.

A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.

I could write a reproducer with two threads doing :

static int sock_fd;
static void *thr1(void *arg)
{
	for (;;) {
		connect(sock_fd, (const struct sockaddr *)arg,
			sizeof(struct sockaddr_in));
	}
}

static void *thr2(void *arg)
{
	struct sockaddr_in unspec;

	for (;;) {
		memset(&unspec, 0, sizeof(unspec));
	        connect(sock_fd, (const struct sockaddr *)&unspec,
			sizeof(unspec));
        }
}

Problem is that udp_disconnect() could run without holding socket lock,
and this was causing list corruptions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:45:52 -04:00
Sabrina Dubroca
fcd91dd449 net: add recursion limit to GRO
Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.  This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.

Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.

Fixes: CVE-2016-7039
Fixes: 9b174d88c2 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:32:22 -04:00
Jiri Bohac
7aa8e63f0d ipv6: properly prevent temp_prefered_lft sysctl race
The check for an underflow of tmp_prefered_lft is always false
because tmp_prefered_lft is unsigned. The intention of the check
was to guard against racing with an update of the
temp_prefered_lft sysctl, potentially resulting in an underflow.

As suggested by David Miller, the best way to prevent the race is
by reading the sysctl variable using READ_ONCE.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Fixes: 76506a986d ("IPv6: fix DESYNC_FACTOR")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:29:11 -04:00
Pablo Neira Ayuso
7034b566a4 netfilter: fix nf_queue handling
nf_queue handling is broken since e3b37f11e6 ("netfilter: replace
list_head with single linked list") for two reasons:

1) If the bypass flag is set on, there are no userspace listeners and
   we still have more hook entries to iterate over, then jump to the
   next hook. Otherwise accept the packet. On nf_reinject() path, the
   okfn() needs to be invoked.

2) We should not re-enter the same hook on packet reinjection. If the
   packet is accepted, we have to skip the current hook from where the
   packet was enqueued, otherwise the packets gets enqueued over and
   over again.

This restores the previous list_for_each_entry_continue() behaviour
happening from nf_iterate() that was dealing with these two cases.
This patch introduces a new nf_queue() wrapper function so this fix
becomes simpler.

Fixes: e3b37f11e6 ("netfilter: replace list_head with single linked list")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-20 19:59:59 +02:00
Nicolas Dichtel
7bb6615d39 netfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reached
When the maximum evictions number is reached, do not wait 5 seconds before
the next run.

CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-20 19:59:53 +02:00
Eric Dumazet
9652dc2eb9 tcp: relax listening_hash operations
softirq handlers use RCU protection to lookup listeners,
and write operations all happen from process context.
We do not need to block BH for dump operations.

Also SYN_RECV since request sockets are stored in the ehash table :

 1) inet_diag_dump_icsk() no longer need to clear
    cb->args[3] and cb->args[4] that were used as cursors while
    iterating the old per listener hash table.

 2) Also factorize a test : No need to scan listening_hash[]
    if r->id.idiag_dport is not zero.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 11:24:32 -04:00
Gavin Shan
22d8aa93d7 net/ncsi: Improve HNCDSC AEN handler
This improves AEN handler for Host Network Controller Driver Status
Change (HNCDSC):

   * The channel's lock should be hold when accessing its state.
   * Do failover when host driver isn't ready.
   * Configure channel when host driver becomes ready.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 11:23:08 -04:00
Gavin Shan
bbc7c01e95 net/ncsi: Choose hot channel as active one if necessary
The issue was found on BCM5718 which has two NCSI channels in one
package: C0 and C1. C0 is in link-up state while C1 is in link-down
state. C0 is chosen as active channel until unplugging and plugging
C0's cable:  On unplugging C0's cable, LSC (Link State Change) AEN
packet received on C0 to report link-down event. After that, C1 is
chosen as active channel. LSC AEN for link-up event is lost on C0
when plugging C0's cable back. We lose the network even C0 is usable.

This resolves the issue by recording the (hot) channel that was ever
chosen as active one. The hot channel is chosen to be active one
if none of available channels in link-up state. With this, C0 is still
the active one after unplugging C0's cable. LSC AEN packet received
on C0 when plugging its cable back.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 11:23:07 -04:00
Gavin Shan
008a424a24 net/ncsi: Fix stale link state of inactive channels on failover
The issue was found on BCM5718 which has two NCSI channels in one
package: C0 and C1. Both of them are connected to different LANs,
means they are in link-up state and C0 is chosen as the active one
until resetting BCM5718 happens as below.

Resetting BCM5718 results in LSC (Link State Change) AEN packet
received on C0, meaning LSC AEN is missed on C1. When LSC AEN packet
received on C0 to report link-down, it fails over to C1 because C1
is in link-up state as software can see. However, C1 is in link-down
state in hardware. It means the link state is out of synchronization
between hardware and software, resulting in inappropriate channel (C1)
selected as active one.

This resolves the issue by sending separate GLS (Get Link Status)
commands to all channels in the package before trying to do failover.
The last link states of all channels in the package are retrieved.
With it, C0 (not C1) is selected as active one as expected.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 11:23:07 -04:00
Gavin Shan
7ba5c003db net/ncsi: Avoid if statements in ncsi_suspend_channel()
There are several if/else statements in the state machine implemented
by switch/case in ncsi_suspend_channel() to avoid duplicated code. It
makes the code a bit hard to be understood.

This drops if/else statements in ncsi_suspend_channel() to improve the
code readability as Joel Stanley suggested. Also, it becomes easy to
add more states in the state machine without affecting current code.
No logical changes introduced by this.

Suggested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 11:23:07 -04:00
Thomas Graf
c5098ebbd6 ila: Fix tailroom allocation of lwtstate
Tailroom is supposed to be of length sizeof(struct ila_lwt) but
sizeof(struct ila_params) is currently allocated.

This leads to the dst_cache and connected member of ila_lwt being
referenced out of bounds.

struct ila_lwt {
	struct ila_params p;
	struct dst_cache dst_cache;
	u32 connected : 1;
};

Fixes: 65d7ab8de5 ("net: Identifier Locator Addressing module")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 11:17:55 -04:00
Paul Blakey
5712bf9c5c net/sched: act_mirred: Use passed lastuse argument
stats_update callback is called by NIC drivers doing hardware
offloading of the mirred action. Lastuse is passed as argument
to specify when the stats was actually last updated and is not
always the current time.

Fixes: 9798e6fe4f ('net: act_mirred: allow statistic updates from offloaded actions')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 11:14:24 -04:00
Jiri Benc
76e4cc7731 openvswitch: remove unnecessary EXPORT_SYMBOLs
Some symbols exported to other modules are really used only by
openvswitch.ko. Remove the exports.

Tested by loading all 4 openvswitch modules, nothing breaks.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 15:11:55 -04:00
Jiri Benc
f33eb0cf99 openvswitch: remove unused functions
ovs_vport_deferred_free is not used anywhere. It's the only caller of
free_vport_rcu thus this one can be removed, too.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 15:11:55 -04:00
Michał Narajowski
f61851f64b Bluetooth: Fix append max 11 bytes of name to scan rsp data
Append maximum of 10 + 1 bytes of name to scan response data.
Complete name is appended only if exists and is <= 10 characters.
Else append short name if exists or shorten complete name if not.
This makes sure name is consistent across multiple advertising
instances.

Signed-off-by: Michał Narajowski <michal.narajowski@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-10-19 18:42:37 +02:00
Florian Westphal
1ecc281ec2 netfilter: x_tables: suppress kmemcheck warning
Markus Trippelsdorf reports:

WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480)
4055601e0088ffff000000000000000090686d81ffffffff0000000000000000
 u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
 ^
|RIP: 0010:[<ffffffff8166e561>]  [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
[..]
 [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
 [<ffffffff8166eaaf>] nf_register_net_hooks+0x3f/0xa0
 [<ffffffff816d6715>] ipt_register_table+0xe5/0x110
[..]

This warning is harmless; we copy 'uninitialized' data from the hook ops
but it will not be used.
Long term the structures keeping run-time data should be disentangled
from those only containing config-time data (such as where in the list
to insert a hook), but thats -next material.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-19 18:32:24 +02:00
Linus Torvalds
63ae602cea Merge branch 'gup_flag-cleanups'
Merge the gup_flags cleanups from Lorenzo Stoakes:
 "This patch series adjusts functions in the get_user_pages* family such
  that desired FOLL_* flags are passed as an argument rather than
  implied by flags.

  The purpose of this change is to make the use of FOLL_FORCE explicit
  so it is easier to grep for and clearer to callers that this flag is
  being used.  The use of FOLL_FORCE is an issue as it overrides missing
  VM_READ/VM_WRITE flags for the VMA whose pages we are reading
  from/writing to, which can result in surprising behaviour.

  The patch series came out of the discussion around commit 38e0885465
  ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
  which addressed a BUG_ON() being triggered when a page was faulted in
  with PROT_NONE set but having been overridden by FOLL_FORCE.
  do_numa_page() was run on the assumption the page _must_ be one marked
  for NUMA node migration as an actual PROT_NONE page would have been
  dealt with prior to this code path, however FOLL_FORCE introduced a
  situation where this assumption did not hold.

  See

      https://marc.info/?l=linux-mm&m=147585445805166

  for the patch proposal"

Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.

[ This branch was rebased recently to add a few more acked-by's and
  reviewed-by's ]

* gup_flag-cleanups:
  mm: replace access_process_vm() write parameter with gup_flags
  mm: replace access_remote_vm() write parameter with gup_flags
  mm: replace __access_remote_vm() write parameter with gup_flags
  mm: replace get_user_pages_remote() write/force parameters with gup_flags
  mm: replace get_user_pages() write/force parameters with gup_flags
  mm: replace get_vaddr_frames() write/force parameters with gup_flags
  mm: replace get_user_pages_locked() write/force parameters with gup_flags
  mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
  mm: remove write/force parameters from __get_user_pages_unlocked()
  mm: remove write/force parameters from __get_user_pages_locked()
  mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
2016-10-19 08:39:47 -07:00
Eric Dumazet
82454581d7 tcp: do not export sysctl_tcp_low_latency
Since commit b2fb4f54ec ("tcp: uninline tcp_prequeue()") we no longer
access sysctl_tcp_low_latency from a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 11:12:41 -04:00
Ido Schimmel
97c242902c switchdev: Execute bridge ndos only for bridge ports
We recently got the following warning after setting up a vlan device on
top of an offloaded bridge and executing 'bridge link':

WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
[...]
 CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1
 Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015
  0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3
  0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b
  0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990
 Call Trace:
  [<ffffffff8135eaa3>] dump_stack+0x63/0x90
  [<ffffffff8108c43b>] __warn+0xcb/0xf0
  [<ffffffff8108c56d>] warn_slowpath_null+0x1d/0x20
  [<ffffffffa01420d5>] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
  [<ffffffffa0142195>] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum]
  [<ffffffff816f151f>] switchdev_port_attr_get+0x4f/0x140
  [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
  [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
  [<ffffffff816f1d6b>] switchdev_port_bridge_getlink+0x5b/0xc0
  [<ffffffff816f2680>] ? switchdev_port_fdb_dump+0x90/0x90
  [<ffffffff815f5427>] rtnl_bridge_getlink+0xe7/0x190
  [<ffffffff8161a1b2>] netlink_dump+0x122/0x290
  [<ffffffff8161b0df>] __netlink_dump_start+0x15f/0x190
  [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
  [<ffffffff815fab46>] rtnetlink_rcv_msg+0x1a6/0x220
  [<ffffffff81208118>] ? __kmalloc_node_track_caller+0x208/0x2c0
  [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
  [<ffffffff815fa9a0>] ? rtnl_newlink+0x890/0x890
  [<ffffffff8161cf54>] netlink_rcv_skb+0xa4/0xc0
  [<ffffffff815f56f8>] rtnetlink_rcv+0x28/0x30
  [<ffffffff8161c92c>] netlink_unicast+0x18c/0x240
  [<ffffffff8161ccdb>] netlink_sendmsg+0x2fb/0x3a0
  [<ffffffff815c5a48>] sock_sendmsg+0x38/0x50
  [<ffffffff815c6031>] SYSC_sendto+0x101/0x190
  [<ffffffff815c7111>] ? __sys_recvmsg+0x51/0x90
  [<ffffffff815c6b6e>] SyS_sendto+0xe/0x10
  [<ffffffff817017f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4

The problem is that the 8021q module propagates the call to
ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't
recognize the netdev, as it's not offloaded.

While we can ignore calls being made to non-bridge ports inside the
driver, a better fix would be to push this check up to the switchdev
layer.

Note that these ndos can be called for non-bridged netdev, but this only
happens in certain PF drivers which don't call the corresponding
switchdev functions anyway.

Fixes: 99f44bb352 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 10:58:04 -04:00
Ido Schimmel
e4961b0768 net: core: Correctly iterate over lower adjacency list
Tamir reported the following trace when processing ARP requests received
via a vlan device on top of a VLAN-aware bridge:

 NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[...]
 CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
 Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
 task: ffff88017edfea40 task.stack: ffff88017ee10000
 RIP: 0010:[<ffffffff815dcc73>]  [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60
[...]
 Call Trace:
  <IRQ>
  [<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
  [<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
  [<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70
  [<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20
  [<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20
  [<ffffffff815f2eb6>] neigh_update+0x306/0x740
  [<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0
  [<ffffffff8165ea3f>] arp_process+0x66f/0x700
  [<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c
  [<ffffffff8165ec29>] arp_rcv+0x139/0x1d0
  [<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320
  [<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0
  [<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20
  [<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge]
  [<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge]
  [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
  [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
  [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
  [<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge]
  [<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge]
  [<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge]
  [<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge]
  [<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge]
  [<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0
  [<ffffffff81374815>] ? find_next_bit+0x15/0x20
  [<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50
  [<ffffffff810c9968>] ? load_balance+0x178/0x9b0
  [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
  [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
  [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
  [<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
  [<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
  [<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
  [<ffffffff81091986>] tasklet_action+0xf6/0x110
  [<ffffffff81704556>] __do_softirq+0xf6/0x280
  [<ffffffff8109213f>] irq_exit+0xdf/0xf0
  [<ffffffff817042b4>] do_IRQ+0x54/0xd0
  [<ffffffff8170214c>] common_interrupt+0x8c/0x8c

The problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.

Fix this by advancing the iterator and avoid the infinite loop.

Fixes: 7ce856aaaf ("mlxsw: spectrum: Add couple of lower device helper functions")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 10:38:08 -04:00
Eric Garver
3805a938a6 flow_dissector: Check skb for VLAN only if skb specified.
Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.

Fixes: d5709f7ab7 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
Signed-off-by: Eric Garver <e@erig.me>
Reviewed-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 10:35:46 -04:00
Andrei Otcheretianski
0ea2a2ee8d cfg80211: allow vendor commands to be sent to nan interface
Allow vendor commands that require WIPHY_VENDOR_CMD_NEED_RUNNING flag
to be sent to NAN interface.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:16:02 +02:00
Ilan Peer
0711d63878 cfg80211: allow aborting in-progress connection atttempts
On a disconnect request from userspace, cfg80211 currently calls
called rdev_disconnect() only in case that 'current_bss' was set,
i.e. connection had been established.

Change this to allow the userspace call to succeed and call the
driver's disconnect() method also while the connection attempt is
in progress, to be able to abort attempts.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[change commit subject/message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:15:38 +02:00
Emmanuel Grumbach
f438ceb81d mac80211: uapsd_queues is in QoS IE order
The uapsd_queue field is in QoS IE order and not in
IEEE80211_AC_*'s order.
This means that mac80211 would get confused between
BK and BE which is certainly not such a big deal but
needs to be fixed.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:13:54 +02:00
Sara Sharon
f3fe4e93dd mac80211: add a HW flag for supporting HW TX fragmentation
Currently mac80211 determines whether HW does fragmentation
by checking whether the set_frag_threshold callback is set
or not.
However, some drivers may want to set the HW fragmentation
capability depending on HW generation.
Allow this by checking a HW flag instead of checking the
callback.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
[added the flag to ath10k and wlcore]
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:12:44 +02:00
Emmanuel Grumbach
0aa419ec6e mac80211: allow the driver not to pass the tid to ieee80211_sta_uapsd_trigger
iwlwifi will check internally that the tid maps to an AC
that is trigger enabled, but can't know what tid exactly.
Allow the driver to pass a generic tid and make mac80211
assume that a trigger frame was received.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:12:19 +02:00
Johannes Berg
b473b8f12a mac80211: improve RX aggregation data in debugfs
When the driver sets the SUPPORTS_REORDERING_BUFFER hardware flag,
the debugfs data for RX aggregation sessions won't even indicate
that a session is open. Since the previous fix to store the dialog
token separately, we can indicate that it's open and add the token
so that there's at least some data (ssn is not available.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:12:11 +02:00
Johannes Berg
1c3d185a9a mac80211: fix tid_agg_rx NULL dereference
On drivers setting the SUPPORTS_REORDERING_BUFFER hardware flag,
we crash when the peer sends an AddBA request while we already
have a session open on the seame TID; this is because on those
drivers, the tid_agg_rx is left NULL even though the session is
valid, and the agg_session_valid bit is set.

To fix this, store the dialog tokens outside the tid_agg_rx to
be able to compare them to the received AddBA request.

Fixes: f89e07d4cf ("mac80211: agg-rx: refuse ADDBA Request with timeout update")
Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:11:49 +02:00
Sven Eckelmann
4c7da0f6db batman-adv: Avoid precedence issues in macros
It must be avoided that arguments to a macro are evaluated ungrouped (which
enforces normal operator precendence). Otherwise the result of the macro
is not well defined.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-19 08:37:54 +02:00
Sven Eckelmann
507b37cf71 batman-adv: Use octal permissions instead of macros
Linus prefers to have octal permission numbers instead of combinations of
macro names ("random line noise"). Also old existing "bad symbolic
permission bit macro use" should be converted to octal numbers.
(http://lkml.kernel.org/r/CA+55aFw5v23T-zvDZp-MmD_EYxF8WbafwwB59934FV7g21uMGQ@mail.gmail.com)

Also remove the S_IFREG bit from the octal representation because it is
filtered out by debugfs_create.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-19 08:37:53 +02:00
Sven Eckelmann
70ea5cee95 batman-adv: Use proper name for gateway list head
The batman-adv codebase is using "list" for the list node (prev/next) and
<list content descriptor>+"_list" for the head of a list. Not using this
naming scheme can up in confusions when reading the code.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-19 08:37:53 +02:00
Sven Eckelmann
176e5b772b batman-adv: Use proper name for fragments list head
The batman-adv codebase is using "list" for the list node (prev/next) and
<list content descriptor>+"_list" for the head of a list. Not using this
naming scheme can up in confusions because list_head is used for both the
head of the list and the list node (prev/next) in each item of the list.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-19 08:37:52 +02:00
Sven Eckelmann
422d2f7780 batman-adv: Remove needless init of variables on stack
Some variables are overwritten immediatelly in a functions. These don't
have to be initialized to a specific value on the stack because the value
will be overwritten before they will be used anywhere.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-19 08:37:52 +02:00
Lorenzo Stoakes
c164154f66 mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18 14:13:37 -07:00
Gao Feng
9c403b6bee net: vlan: Use sizeof instead of literal number
Use sizeof variable instead of literal number to enhance the readability.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 14:23:23 -04:00
Eric Dumazet
41ee9c557e soreuseport: do not export reuseport_add_sock()
reuseport_add_sock() is not used from a module,
no need to export it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 14:18:23 -04:00
Nikolay Aleksandrov
7cb3f9214d bridge: multicast: restore perm router ports on multicast enable
Satish reported a problem with the perm multicast router ports not getting
reenabled after some series of events, in particular if it happens that the
multicast snooping has been disabled and the port goes to disabled state
then it will be deleted from the router port list, but if it moves into
non-disabled state it will not be re-added because the mcast snooping is
still disabled, and enabling snooping later does nothing.

Here are the steps to reproduce, setup br0 with snooping enabled and eth1
added as a perm router (multicast_router = 2):
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
<empty>

At this point we have mcast enabled and eth1 as a perm router (value = 2)
but it is not in the router list which is incorrect.

After this change:
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
router ports on br0: eth1

Note: we can directly do br_multicast_enable_port for all because the
querier timer already has checks for the port state and will simply
expire if it's in blocking/disabled. See the comment added by
commit 9aa6638216 ("bridge: multicast: add a comment to
br_port_state_selection about blocking state")

Fixes: 561f1103a2 ("bridge: Add multicast_snooping sysfs toggle")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 13:52:13 -04:00
David Ahern
67b62f98a1 net: dev: Improve debug statements for adjacency tracking
Adjacency code only has debugs for the insert case. Add debugs for
the remove path and make both consistently worded to make it easier
to follow the insert and removal with reference counts.

In addition, change the BUG to a WARN_ON. A missing adjacency at
removal time is not cause for a panic.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 11:45:00 -04:00
David Ahern
0f524a80ff net: Add warning if any lower device is still in adjacency list
Lower list should be empty just like upper.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 11:45:00 -04:00
David Ahern
f1170fd462 net: Remove all_adj_list and its references
Only direct adjacencies are maintained. All upper or lower devices can
be learned via the new walk API which recursively walks the adj_list for
upper devices or lower devices.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 11:45:00 -04:00
David Ahern
1a3f060c1a net: Introduce new api for walking upper and lower devices
This patch introduces netdev_walk_all_upper_dev_rcu,
netdev_walk_all_lower_dev and netdev_walk_all_lower_dev_rcu. These
functions recursively walk the adj_list of devices to determine all upper
and lower devices.

The functions take a callback function that is invoked for each device
in the list. If the callback returns non-0, the walk is terminated and
the functions return that code back to callers.

v3
- simplified netdev_has_upper_dev_all_rcu and __netdev_has_upper_dev and
  removed typecast as suggested by Stephen

v2
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
  version.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 11:44:58 -04:00
David Ahern
790510d99f net: Remove refnr arg when inserting link adjacencies
Commit 93409033ae ("net: Add netdev all_adj_list refcnt propagation to
fix panic") propagated the refnr to insert and remove functions tracking
the netdev adjacency graph. However, for the insert path the refnr can
only be 1. Accordingly, remove the refnr argument to make that clear.
ie., the refnr arg in 93409033ae was only needed for the remove path.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 11:44:58 -04:00
Arnd Bergmann
d2e4d59351 netfilter: nf_tables: avoid uninitialized variable warning
The newly added nft_range_eval() function handles the two possible
nft range operations, but as the compiler warning points out,
any unexpected value would lead to the 'mismatch' variable being
used without being initialized:

net/netfilter/nft_range.c: In function 'nft_range_eval':
net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This removes the variable in question and instead moves the
condition into the switch itself, which is potentially more
efficient than adding a bogus 'default' clause as in my
first approach, and is nicer than using the 'uninitialized_var'
macro.

Fixes: 0f3cd9b369 ("netfilter: nf_tables: add range expression")
Link: http://patchwork.ozlabs.org/patch/677114/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-18 17:35:10 +02:00
Tobias Klauser
7ab488951a tcp: Remove unused but set variable
Remove the unused but set variable icsk in listening_get_next to fix the
following GCC warning when building with 'W=1':

  net/ipv4/tcp_ipv4.c: In function ‘listening_get_next’:
  net/ipv4/tcp_ipv4.c:1890:31: warning: variable ‘icsk’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 10:34:27 -04:00
Tobias Klauser
7e1670c15c ipv4: Remove unused but set variable
Remove the unused but set variable dev in ip_do_fragment to fix the
following GCC warning when building with 'W=1':

  net/ipv4/ip_output.c: In function ‘ip_do_fragment’:
  net/ipv4/ip_output.c:541:21: warning: variable ‘dev’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 10:30:28 -04:00
Tobias Klauser
8d324fd9e1 net/hsr: Remove unused but set variable
Remove the unused but set variable master_dev in check_local_dest to fix
the following GCC warning when building with 'W=1':

  net/hsr/hsr_forward.c: In function ‘check_local_dest’:
  net/hsr/hsr_forward.c:303:21: warning: variable ‘master_dev’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 10:28:18 -04:00
David S. Miller
5cbee55736 This is relatively small, mostly to get the SG/crypto
from stack removal fix that crashes things when VMAP
 stack is used in conjunction with software crypto.
 
 Aside from that, we have:
  * a fix for AP_VLAN usage with the nl80211 frame command
  * two fixes (and two preparation patches) for A-MSDU, one
    to discard group-addressed (multicast) and unexpected
    4-address A-MSDUs, the other to validate A-MSDU inner
    MAC addresses properly to prevent controlled port bypass
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYBcgKAAoJEGt7eEactAAdhUkP/jMVQbLMZ1Jcc9+lsPVGUIga
 I9GeQ4lcnD+4ASeJUhTtemC1IMNL4zMVqaIxbznDXKP7rZRrODVvCPk2TYIw9c5S
 rzF/TRierMFttLu3xY757nAsYg6T7F03JdOQ3SKIb3xOD8pXCWQoVRN14ldroRno
 4stOAtDrpD5wvK2JhlWv1EYlxGVLqLcakZt/BwgDX/cJGkAx49Q/s29FUnesB9Ep
 sCH5chffeQskOL9CrSwboNmucgt4HGQORc4UL/KtPOEBtyfu/LCXEKSqAKVyQZtZ
 OerouOHWqQE5lT2K6qD/KKFW4lV2t1h+xzqsvZk4ZR5o3s+PAGai6D/wf+JgY9Hk
 uor9ju/e0htcI9m0aFdHDnltV0OOwIhR2bxWTuBBUkyFVtdQQY+1MRTTtuunWIB4
 SDYv6LrNL/0HAIuTlPQH99rnsFNnRZCtTpdbT7GRckAMeWMvy19bF2ZB1FXuSn+h
 5dxIo0qkw8nv4Y9wQ6QmgOcSzYyidUrCgLTO516qXVAKY0kl/u4q/zPr0Fmx/qfY
 oxspelDv0qd2NMQwJ/AmwjAjkQBulv5DVLu+cDXdOMkc/EbhzWyvetcHiNukxjHn
 mukCBxTlLoDLug2LFkAPIddEutj+VUEefkf/pD/js8uYuyd9ZnPjiIh6fG25il9a
 cHbMYtANt2EnZjwI9Z74
 =T6t1
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
This is relatively small, mostly to get the SG/crypto
from stack removal fix that crashes things when VMAP
stack is used in conjunction with software crypto.

Aside from that, we have:
 * a fix for AP_VLAN usage with the nl80211 frame command
 * two fixes (and two preparation patches) for A-MSDU, one
   to discard group-addressed (multicast) and unexpected
   4-address A-MSDUs, the other to validate A-MSDU inner
   MAC addresses properly to prevent controlled port bypass
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 10:26:15 -04:00
Tobias Klauser
403f0727cb vlan: Remove unnecessary comparison of unsigned against 0
args.u.name_type is of type unsigned int and is always >= 0.

This fixes the following GCC warning:

  net/8021q/vlan.c: In function ‘vlan_ioctl_handler’:
  net/8021q/vlan.c:574:14: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18 10:25:34 -04:00
Markus Elfring
393b299d2c batman-adv: Less function calls in batadv_is_ap_isolated() after error detection
The variables "tt_local_entry" and "tt_global_entry" were eventually
checked again despite of a corresponding null pointer test before.

* Avoid this double check by reordering a function call sequence
  and the better selection of jump targets.

* Omit the initialisation for these variables at the beginning then.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-18 15:02:43 +02:00
Wei Yongjun
320c975f18 cfg80211: fix possible memory leak in cfg80211_iter_combinations()
'limits' is malloced in cfg80211_iter_combinations() and should be freed
before leaving from the error handling cases, otherwise it will cause
memory leak.

Fixes: 0c317a02ca ("cfg80211: support virtual interfaces with different beacon intervals")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-18 08:52:00 +02:00
Jakub Kicinski
a0e65de715 net: report right mtu value in error message
Check is for max_mtu but message reports min_mtu.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-17 13:13:23 -04:00
Pablo Neira Ayuso
ccca6607c5 netfilter: nft_range: validate operation netlink attribute
Use nft_parse_u32_check() to make sure we don't get a value over the
unsigned 8-bit integer. Moreover, make sure this value doesn't go over
the two supported range comparison modes.

Fixes: 9286c2eb1fda ("netfilter: nft_range: validate operation netlink attribute")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 18:57:02 +02:00
Dan Carpenter
21a9e0f156 netfilter: nft_exthdr: fix error handling in nft_exthdr_init()
"err" needs to be signed for the error handling to work.

Fixes: 36b701fae1 ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 17:43:54 +02:00
Dan Carpenter
09525a09ad netfilter: nf_tables: underflow in nft_parse_u32_check()
We don't want to allow negatives here.

Fixes: 36b701fae1 ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 17:43:53 +02:00
Liping Zhang
5751e175c6 netfilter: nft_hash: add missing NFTA_HASH_OFFSET's nla_policy
Missing the nla_policy description will also miss the validation check
in kernel.

Fixes: 70ca767ea1 ("netfilter: nft_hash: Add hash offset value")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 17:43:53 +02:00
Liping Zhang
f434ed0a00 netfilter: xt_ipcomp: add "ip[6]t_ipcomp" module alias name
Otherwise, user cannot add related rules if xt_ipcomp.ko is not loaded:
  # iptables -A OUTPUT -p 108 -m ipcomp --ipcompspi 1
  iptables: No chain/target/match by that name.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 17:38:19 +02:00
Liping Zhang
6d19375b58 netfilter: xt_NFLOG: fix unexpected truncated packet
Justin and Chris spotted that iptables NFLOG target was broken when they
upgraded the kernel to 4.8: "ulogd-2.0.5- IPs are no longer logged" or
"results in segfaults in ulogd-2.0.5".

Because "struct nf_loginfo li;" is a local variable, and flags will be
filled with garbage value, not inited to zero. So if it contains 0x1,
packets will not be logged to the userspace anymore.

Fixes: 7643507fe8 ("netfilter: xt_NFLOG: nflog-range does not truncate packets")
Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Reported-by: Chris Caputo <ccaputo@alt.net>
Tested-by: Chris Caputo <ccaputo@alt.net>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 17:38:19 +02:00
Anders K. Pedersen
a8b1e36d0d netfilter: nft_dynset: fix element timeout for HZ != 1000
With HZ=100 element timeout in dynamic sets (i.e. flow tables) is 10 times
higher than configured.

Add proper conversion to/from jiffies, when interacting with userspace.

I tested this on Linux 4.8.1, and it applies cleanly to current nf and
nf-next trees.

Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 17:29:39 +02:00
Geert Uytterhoeven
1b203c138c netfilter: xt_hashlimit: Add missing ULL suffixes for 64-bit constants
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):

    net/netfilter/xt_hashlimit.c: In function ‘user2credits’:
    net/netfilter/xt_hashlimit.c:476: warning: integer constant is too large for ‘long’ type
    ...
    net/netfilter/xt_hashlimit.c:478: warning: integer constant is too large for ‘long’ type
    ...
    net/netfilter/xt_hashlimit.c:480: warning: integer constant is too large for ‘long’ type
    ...

    net/netfilter/xt_hashlimit.c: In function ‘rateinfo_recalc’:
    net/netfilter/xt_hashlimit.c:513: warning: integer constant is too large for ‘long’ type

Fixes: 11d5f15723 ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 17:25:23 +02:00
Joe Perches
9c7cbcf5a8 rds: Remove duplicate prefix from rds_conn_path_error use
rds_conn_path_error already prefixes "RDS:" to the output.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-17 11:07:22 -04:00
Joe Perches
e81c7b69c8 rds: Remove unused rds_conn_error
This macro's last use was removed in commit d769ef81d5
("RDS: Update rds_conn_shutdown to work with rds_conn_path")
so make the macro and the __rds_conn_error function definition
and declaration disappear.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-17 11:07:22 -04:00
Eric Dumazet
9a0b1e8ba4 net: pktgen: remove rcu locking in pktgen_change_name()
After Jesper commit back in linux-3.18, we trigger a lockdep
splat in proc_create_data() while allocating memory from
pktgen_change_name().

This patch converts t->if_lock to a mutex, since it is now only
used from control path, and adds proper locking to pktgen_change_name()

1) pktgen_thread_lock to protect the outer loop (iterating threads)
2) t->if_lock to protect the inner loop (iterating devices)

Note that before Jesper patch, pktgen_change_name() was lacking proper
protection, but lockdep was not able to detect the problem.

Fixes: 8788370a1d ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-17 10:52:59 -04:00
Tom Herbert
ab3a70bef8 ila: Don't use dest cache when gateway is set
If the gateway is set on an ILA route we don't need to bother with using
the destination cache in the ILA route. Translation does not change the
routing in this case so we can stick with orig_output in the lwstate
output function.

Tested: Ran netperf with and without gateway for LWT route.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-17 10:30:18 -04:00
Linus Lüssing
0566df307b batman-adv: Allow selecting BATMAN V if CFG80211 is not built
With the new stub for cfg80211_get_station(), we can now build the
BATMAN V protocol even with a kernel that was built without any
wireless support.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:48 +02:00
Antonio Quartulli
843314c977 batman-adv: remove unsed argument from batadv_dbg_arp() function
The argument "type" passed to the batadv_dbg_arp() function is
never used. Remove it.

Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:47 +02:00
Linus Lüssing
6020a86a4a batman-adv: fix batadv_forw_packet kerneldoc for list attribute
The forw_packet list node is wrongly attributed to the icmp socket code.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:47 +02:00
Sven Eckelmann
bf093191db batman-adv: Remove unused batadv_icmp_user_cmd_type
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:46 +02:00
Sven Eckelmann
c408c1b9d4 batman-adv: Move batadv_sum_counter to soft-interface.c
The function batadv_sum_counter is only used in soft-interface.c and has no
special relevance for main.h.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:46 +02:00
Sven Eckelmann
739ae86cd2 batman-adv: Remove unused function batadv_hash_delete
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:45 +02:00
David Ahern
a04a480d43 net: Require exact match for TCP socket lookups if dif is l3mdev
Currently, socket lookups for l3mdev (vrf) use cases can match a socket
that is bound to a port but not a device (ie., a global socket). If the
sysctl tcp_l3mdev_accept is not set this leads to ack packets going out
based on the main table even though the packet came in from an L3 domain.
The end result is that the connection does not establish creating
confusion for users since the service is running and a socket shows in
ss output. Fix by requiring an exact dif to sk_bound_dev_if match if the
skb came through an interface enslaved to an l3mdev device and the
tcp_l3mdev_accept is not set.

skb's through an l3mdev interface are marked by setting a flag in
inet{6}_skb_parm. The IPv6 variant is already set; this patch adds the
flag for IPv4. Using an skb flag avoids a device lookup on the dif. The
flag is set in the VRF driver using the IP{6}CB macros. For IPv4, the
inet_skb_parm struct is moved in the cb per commit 971f10eca1, so the
match function in the TCP stack needs to use TCP_SKB_CB. For IPv6, the
move is done after the socket lookup, so IP6CB is used.

The flags field in inet_skb_parm struct needs to be increased to add
another flag. There is currently a 1-byte hole following the flags,
so it can be expanded to u16 without increasing the size of the struct.

Fixes: 193125dbd8 ("net: Introduce VRF device driver")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-17 10:17:05 -04:00
Ard Biesheuvel
f4a067f9ff mac80211: move struct aead_req off the stack
Some crypto implementations (such as the generic CCM wrapper in crypto/)
use scatterlists to map fields of private data in their struct aead_req.
This means these data structures cannot live in the vmalloc area, which
means that they cannot live on the stack (with CONFIG_VMAP_STACK.)

This currently occurs only with the generic software implementation, but
the private data and usage is implementation specific, so move the whole
data structures off the stack into heap by allocating every time we need
to use them.

In addition, take care not to put any of our own stack allocations into
scatterlists. This involves reserving some extra room when allocating the
aead_request structures, and referring to those allocations in the scatter-
lists (while copying the data from the stack before the crypto operation)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-17 16:14:04 +02:00
Sven Eckelmann
611b975b5c batman-adv: Add BATADV_DBG_TP_METER to BATADV_DBG_ALL
The BATADV_DBG_ALL has to contain the bit of BATADV_DBG_TP_METER to really
support all available debug messages.

Fixes: 33a3bb4a33 ("batman-adv: throughput meter implementation")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:11:56 +02:00
Sven Eckelmann
9ca488dd53 batman-adv: Modify neigh_list only with rcu-list functions
The batadv_hard_iface::neigh_list is accessed via rcu based primitives.
Thus all operations done on it have to fulfill the requirements by RCU. So
using non-RCU mechanisms like hlist_add_head is not allowed because it
misses the barriers required to protect concurrent readers when accessing
the data behind the pointer.

Fixes: cef63419f7 ("batman-adv: add list of unique single hop neighbors per hard-interface")
Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:10:53 +02:00
Simon Wunderlich
4a2d09e31d batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 14:10:28 +02:00
Michael Braun
a3e2f4b6ed mac80211: fix A-MSDU outer SA/DA
According to IEEE 802.11-2012 section 8.3.2 table 8-19, the outer SA/DA
of A-MSDU frames need to be changed depending on FromDS/ToDS values.

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
[use ether_addr_copy and add alignment annotations]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-17 11:43:33 +02:00
Michael Braun
06f2bb1e01 mac80211: avoid extra memcpy in A-MSDU head creation
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-17 11:41:23 +02:00
Johannes Berg
f83ace3b1e nl80211: ifdef WoWLAN related policies
To avoid unused variable warnings when CONFIG_PM isn't set,
add the appropriate ifdef to the policies that are only used
for WoWLAN, which can only be invoked when CONFIG_PM is set.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-17 08:04:07 +02:00
Johannes Berg
1609d18de6 nl80211: correctly use nl80211_nan_srf_policy
This was clearly intended to be used in the attribute parsing,
so do that instead of leaving the attribute policy unused.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-17 08:02:48 +02:00
Tom Herbert
79ff2fc31e ila: Cache a route to translated address
Add a dst_cache to ila_lwt structure. This holds a cached route for the
translated address. In ila_output we now perform a route lookup after
translation and if possible (destination in original route is full 128
bits) we set the dst_cache. Subsequent calls to ila_output can then use
the cache to avoid the route lookup.

This eliminates the need to set the gateway on ILA routes as previously
was being done. Now we can do something like:

./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
    csum-mode neutral-map dev eth0  ## No via needed!

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-15 17:33:41 -04:00
Tom Herbert
1104d9ba44 lwtunnel: Add destroy state operation
Users of lwt tunnels may set up some secondary state in build_state
function. Add a corresponding destroy_state function to allow users to
clean up state. This destroy state function is called from lwstate_free.
Also, we now free lwstate using kfree_rcu so user can assume structure
is not freed before rcu.

Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-15 17:33:41 -04:00
Linus Torvalds
9ffc66941d This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot time as
 possible, hoping to capitalize on any possible variation in CPU operation
 (due to runtime data differences, hardware differences, SMP ordering,
 thermal timing variation, cache behavior, etc).
 
 At the very least, this plugin is a much more comprehensive example for
 how to manipulate kernel code using the gcc plugin internals.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJX/BAFAAoJEIly9N/cbcAmzW8QALFbCs7EFFkML+M/M/9d8zEk
 1QbUs/z8covJTTT1PjSdw7JUrAMulI3S00owpcQVd/PcWjRPU80QwfsXBgIB0tvC
 Kub2qxn6Oaf+kTB646zwjFgjdCecw/USJP+90nfcu2+LCnE8ReclKd1aUee+Bnhm
 iDEUyH2ONIoWq6ta2Z9sA7+E4y2ZgOlmW0iga3Mnf+OcPtLE70fWPoe5E4g9DpYk
 B+kiPDrD9ql5zsHaEnKG1ldjiAZ1L6Grk8rGgLEXmbOWtTOFmnUhR+raK5NA/RCw
 MXNuyPay5aYPpqDHFm+OuaWQAiPWfPNWM3Ett4k0d9ZWLixTcD1z68AciExwk7aW
 SEA8b1Jwbg05ZNYM7NJB6t6suKC4dGPxWzKFOhmBicsh2Ni5f+Az0BQL6q8/V8/4
 8UEqDLuFlPJBB50A3z5ngCVeYJKZe8Bg/Swb4zXl6mIzZ9darLzXDEV6ystfPXxJ
 e1AdBb41WC+O2SAI4l64yyeswkGo3Iw2oMbXG5jmFl6wY/xGp7dWxw7gfnhC6oOh
 afOT54p2OUDfSAbJaO0IHliWoIdmE5ZYdVYVU9Ek+uWyaIwcXhNmqRg+Uqmo32jf
 cP5J9x2kF3RdOcbSHXmFp++fU+wkhBtEcjkNpvkjpi4xyA47IWS7lrVBBebrCq9R
 pa/A7CNQwibIV6YD8+/p
 =1dUK
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc plugins update from Kees Cook:
 "This adds a new gcc plugin named "latent_entropy". It is designed to
  extract as much possible uncertainty from a running system at boot
  time as possible, hoping to capitalize on any possible variation in
  CPU operation (due to runtime data differences, hardware differences,
  SMP ordering, thermal timing variation, cache behavior, etc).

  At the very least, this plugin is a much more comprehensive example
  for how to manipulate kernel code using the gcc plugin internals"

* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  latent_entropy: Mark functions with __latent_entropy
  gcc-plugins: Add latent_entropy plugin
2016-10-15 10:03:15 -07:00
Jiri Bohac
76506a986d IPv6: fix DESYNC_FACTOR
The IPv6 temporary address generation uses a variable called DESYNC_FACTOR
to prevent hosts updating the addresses at the same time. Quoting RFC 4941:

   ... The value DESYNC_FACTOR is a random value (different for each
   client) that ensures that clients don't synchronize with each other and
   generate new addresses at exactly the same time ...

DESYNC_FACTOR is defined as:

   DESYNC_FACTOR -- A random value within the range 0 - MAX_DESYNC_FACTOR.
   It is computed once at system start (rather than each time it is used)
   and must never be greater than (TEMP_VALID_LIFETIME - REGEN_ADVANCE).

First, I believe the RFC has a typo in it and meant to say: "and must
never be greater than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE)"

The reason is that at various places in the RFC, DESYNC_FACTOR is used in
a calculation like (TEMP_PREFERRED_LIFETIME - DESYNC_FACTOR) or
(TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR). It needs to be
smaller than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE) for the result of
these calculations to be larger than zero. It's never used in a
calculation together with TEMP_VALID_LIFETIME.

I already submitted an errata to the rfc-editor:
https://www.rfc-editor.org/errata_search.php?rfc=4941

The Linux implementation of DESYNC_FACTOR is very wrong:
max_desync_factor is used in places DESYNC_FACTOR should be used.
max_desync_factor is initialized to the RFC-recommended value for
MAX_DESYNC_FACTOR (600) but the whole point is to get a _random_ value.

And nothing ensures that the value used is not greater than
(TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE), which leads to underflows.  The
effect can easily be observed when setting the temp_prefered_lft sysctl
e.g. to 60. The preferred lifetime of the temporary addresses will be
bogus.

TEMP_PREFERRED_LIFETIME and REGEN_ADVANCE are not constants and can be
influenced by these three sysctls: regen_max_retry, dad_transmits and
temp_prefered_lft. Thus, the upper bound for desync_factor needs to be
re-calculated each time a new address is generated and if desync_factor is
larger than the new upper bound, a new random value needs to be
re-generated.

And since we already have max_desync_factor configurable per interface, we
also need to calculate and store desync_factor per interface.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:59:15 -04:00
Jiri Bohac
9d6280da39 IPv6: Drop the temporary address regen_timer
The randomized interface identifier (rndid) was periodically updated from
the regen_timer timer. Simplify the code by updating the rndid only when
needed by ipv6_try_regen_rndid().

This makes the follow-up DESYNC_FACTOR fix much simpler.  Also it fixes a
reference counting error in this error path, where an in6_dev_put was
missing:
		err = addrconf_sysctl_register(ndev);
		if (err) {
			ipv6_mc_destroy_dev(ndev);
	-               del_timer(&ndev->regen_timer);
			snmp6_unregister_dev(ndev);
			goto err_release;

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:59:15 -04:00
David S. Miller
f1f081cef0 RxRPC rewrite
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAV/+wwfSw1s6N8H32AQLpVA/+NByreKyI8cHL1zgz816iTrzYbEP5Gbtw
 RI9TI5iweUa9ySe4PFQUw+VC0yaAP9brY8tTtss8KHk808Wu4xhlg8fAClOaZXwy
 WmHASdwnRaDWguEpPHyHRST+s9ZO/VD5vwGhREB/hojzdzd135bq1d6GKaHoLFx2
 XDwDeyZc1z+aSzdMCoQuKJlqw9mfujsIOK5xZJ/h/JquYJ3iER55vdofettNCT+S
 hjueVBgWV988oORBtduPrfUYBbQI83QyiWl0xdo6QXWAoN784NfpVti8YAA9B7To
 qfup5aE6ky3LhuRD8GS00yWb96b43FGuPqt27LTH7SGnALX7KbETbBcgasyWqFeV
 UvPbVlk5R+0OXLxqOHvn20gRFS2c6HIdVAW6h7QHB0qwnLS1JJJToAczhK4QTsHQ
 eOXSGK4Gj0CiTd23bL7egaULKnD7eiZbagoty1UL05k8TAgPRnXXaMCRc0cupvKo
 5Amk7xT7ZN1Iyh9TQSt2MRIBIG6AOJogsmjSqgKJZY4YdCD5rYAWyAzc7K2l/L0e
 oUwDaPFWwNAfVYxJIXSd223squyxaXen0B+NlY501tqw4Ce4NnuEssqsSWk6OTHZ
 R9HCT200HvvZthPOStP7rdFiQ6VRDH3aSdl3zU4ila7cxr4wfdVslShuj91OiLqI
 /7zodCmls/8=
 =7krl
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-rewrite-20161013' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fixes

This set of patches contains a bunch of fixes:

 (1) Fix use of kunmap() after change from kunmap_atomic() within AFS.

 (2) Don't use of ERR_PTR() with an always zero value.

 (3) Check the right error when using ip6_route_output().

 (4) Be consistent about whether call->operation_ID is BE or CPU-E within
     AFS.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:44:45 -04:00
Shmulik Ladkani
53592b3640 net/sched: act_mirred: Implement ingress actions
Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).

This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.

This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:23:07 -04:00
Shmulik Ladkani
dcf800344a net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header
Move detection logic that tests whether device expects skb data to point
at mac_header upon xmit into a function.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:23:06 -04:00
Shmulik Ladkani
165779231f net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit and make it a bool
'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
egress to the target device (if action is performed at ingress).

Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
the target device (and use a bool instead of int).

This allows to decouple the attribute from the action to be taken.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:23:06 -04:00
Nicolas Dichtel
a220445f9f ipv6: correctly add local routes when lo goes up
The goal of the patch is to fix this scenario:
 ip link add dummy1 type dummy
 ip link set dummy1 up
 ip link set lo down ; ip link set lo up

After that sequence, the local route to the link layer address of dummy1 is
not there anymore.

When the loopback is set down, all local routes are deleted by
addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
exists, because the corresponding idev has a reference on it. After the rcu
grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
set obsolete to DST_OBSOLETE_DEAD.

In this case, init_loopback() is called before dst_rcu_free(), thus
obsolete is still sets to something <= 0. So, the function doesn't add the
route again. To avoid that race, let's check the rt6 refcnt instead.

Fixes: 25fb6ca4ed ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
Fixes: a881ae1f62 ("ipv6: don't call addrconf_dst_alloc again when enable lo")
Fixes: 33d99113b1 ("ipv6: reallocate addrconf router for ipv6 address when lo device up")
Reported-by: Francesco Santoro <francesco.santoro@6wind.com>
Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com>
CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Weilong Chen <chenweilong@huawei.com>
CC: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:05:14 -04:00
stephen hemminger
cf53b1da73 Revert "net: Add driver helper functions to determine checksum offloadability"
This reverts commit 6ae23ad362.

The code has been in kernel since 4.4 but there are no in tree
code that uses. Unused code is broken code, remove it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:02:56 -04:00
Vadim Fedorenko
68d00f332e ip6_tunnel: fix ip6_tnl_lookup
The commit ea3dc9601b ("ip6_tunnel: Add support for wildcard tunnel
endpoints.") introduces support for wildcards in tunnels endpoints,
but in some rare circumstances ip6_tnl_lookup selects wrong tunnel
interface relying only on source or destination address of the packet
and not checking presence of wildcard in tunnels endpoints. Later in
ip6_tnl_rcv this packets can be dicarded because of difference in
ipproto even if fallback device have proper ipproto configuration.

This patch adds checks of wildcard endpoint in tunnel avoiding such
behavior

Fixes: ea3dc9601b ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Signed-off-by: Vadim Fedorenko <junk@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:01:26 -04:00
David S. Miller
8eed1cd4cd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-14 10:00:27 -04:00
Linus Torvalds
29fbff8698 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix various build warnings in tlan/qed/xen-netback drivers, from
    Arnd Bergmann.

 2) Propagate proper error code in strparser's strp_recv(), from Geert
    Uytterhoeven.

 3) Fix accidental broadcast of RTM_GETTFILTER responses, from Eric
    Dumazret.

 4) Need to use list_for_each_entry_safe() in qed driver, from Wei
    Yongjun.

 5) Openvswitch 802.1AD bug fixes from Jiri Benc.

 6) Cure BUILD_BUG_ON() in mlx5 driver, from Tom Herbert.

 7) Fix UDP ipv6 checksumming in netvsc driver, from Stephen Hemminger.

 8) stmmac driver fixes from Giuseppe CAVALLARO.

 9) Fix access to mangled IP6CB in tcp, from Eric Dumazet.

10) Fix info leaks in tipc and rtnetlink, from Dan Carpenter.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  net: bridge: add the multicast_flood flag attribute to brport_attrs
  net: axienet: Remove unused parameter from __axienet_device_reset
  liquidio: CN23XX: fix a loop timeout
  net: rtnl: info leak in rtnl_fill_vfinfo()
  tipc: info leak in __tipc_nl_add_udp_addr()
  net: ipv4: Do not drop to make_route if oif is l3mdev
  net: phy: Trigger state machine on state change and not polling.
  ipv6: tcp: restore IP6CB for pktoptions skbs
  netvsc: Remove mistaken udp.h inclusion.
  xen-netback: fix type mismatch warning
  stmmac: fix error check when init ptp
  stmmac: fix ptp init for gmac4
  qed: fix old-style function definition
  netvsc: fix checksum on UDP IPV6
  net_sched: reorder pernet ops and act ops registrations
  xen-netback: fix guest Rx stall detection (after guest Rx refactor)
  drivers/ptp: Fix kernel memory disclosure
  net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
  qmi_wwan: add support for Quectel EC21 and EC25
  openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
  ...
2016-10-13 21:40:23 -07:00
Linus Torvalds
c4a86165d1 NFS client updates for Linux 4.9
Highlights include:
 
 Stable bugfixes:
 - sunrpc: fix writ espace race causing stalls
 - NFS: Fix inode corruption in nfs_prime_dcache()
 - NFSv4: Don't report revoked delegations as valid in
   nfs_have_delegation()
 - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is
   invalid
 - NFSv4: Open state recovery must account for file permission changes
 - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
 
 Features:
 - Add support for tracking multiple layout types with an ordered list
 - Add support for using multiple backchannel threads on the client
 - Add support for pNFS file layout session trunking
 - Delay xprtrdma use of DMA API (for device driver removal)
 - Add support for xprtrdma remote invalidation
 - Add support for larger xprtrdma inline thresholds
 - Use a scatter/gather list for sending xprtrdma RPC calls
 - Add support for the CB_NOTIFY_LOCK callback
 - Improve hashing sunrpc auth_creds by using both uid and gid
 
 Bugfixes:
 - Fix xprtrdma use of DMA API
 - Validate filenames before adding to the dcache
 - Fix corruption of xdr->nwords in xdr_copy_to_scratch
 - Fix setting buffer length in xdr_set_next_buffer()
 - Don't deadlock the state manager on the SEQUENCE status flags
 - Various delegation and stateid related fixes
 - Retry operations if an interrupted slot receives EREMOTEIO
 - Make nfs boot time y2038 safe
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJX/+ZfAAoJENfLVL+wpUDr5MUP/16s2Kp9ZZZZ7ICi3yrHOzb0
 9WpCOmbKUIELXl8YgkxlvPUYMzTQTIc32TwbVgdFV0g41my/0+O3z3+IiTrUGxH5
 8LgouMWBZ9KKmyUB//+KQAXr3j/bvDdF6Li6wJfz8a2o+9xT4oTkK1+Js8p0kn6e
 HNKfRknfCKwvE+j4tPCLfs2RX5qDyBFILXwWhj1fAbmT3rbnp+QqkXD4mWUrXb9z
 DBgxciXRhOkOQQAD2KQBFd2kUqWDZ5ED23b+aYsu9D3VCW45zitBqQFAxkQWL0hp
 x8Mp+MDCxlgdEaGQPUmUiDtPkG1X9ZxUJCAwaJWWsZaItwR2Il+en2sETctnTZ1X
 0IAxZVFdolzSeLzIfNx3OG32JdWJdaNjUzkIZam8gO6i1f6PAmK4alR0J3CT31nJ
 /OEN76o1E7acGWRMmj+MAZ2U5gPfR7EitOzyE8ZUPcHgyeGMiynjwi56WIpeSvT2
 F/Sp5kRe5+D5gtnYuppGp7Srp5vYdtFaz1zgPDUKpDLcxfDweO8AHGjJf3Zmrunx
 X24yia4A14CnfcUy4vKpISXRykmkG/3Z0tpWwV53uXZm4nlQfRc7gPibiW7Ay521
 af8sDoItW98K3DK5NQU7IUn83ua1TStzpoqlAEafRw//g9zPMTbhHvNvOyrRfrcX
 kjWn6hNblMu9M34JOjtu
 =XOrF
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Stable bugfixes:
   - sunrpc: fix writ espace race causing stalls
   - NFS: Fix inode corruption in nfs_prime_dcache()
   - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
   - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
   - NFSv4: Open state recovery must account for file permission changes
   - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

  Features:
   - Add support for tracking multiple layout types with an ordered list
   - Add support for using multiple backchannel threads on the client
   - Add support for pNFS file layout session trunking
   - Delay xprtrdma use of DMA API (for device driver removal)
   - Add support for xprtrdma remote invalidation
   - Add support for larger xprtrdma inline thresholds
   - Use a scatter/gather list for sending xprtrdma RPC calls
   - Add support for the CB_NOTIFY_LOCK callback
   - Improve hashing sunrpc auth_creds by using both uid and gid

  Bugfixes:
   - Fix xprtrdma use of DMA API
   - Validate filenames before adding to the dcache
   - Fix corruption of xdr->nwords in xdr_copy_to_scratch
   - Fix setting buffer length in xdr_set_next_buffer()
   - Don't deadlock the state manager on the SEQUENCE status flags
   - Various delegation and stateid related fixes
   - Retry operations if an interrupted slot receives EREMOTEIO
   - Make nfs boot time y2038 safe"

* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
  NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
  fs: nfs: Make nfs boot time y2038 safe
  sunrpc: replace generic auth_cred hash with auth-specific function
  sunrpc: add RPCSEC_GSS hash_cred() function
  sunrpc: add auth_unix hash_cred() function
  sunrpc: add generic_auth hash_cred() function
  sunrpc: add hash_cred() function to rpc_authops struct
  Retry operation on EREMOTEIO on an interrupted slot
  pNFS: Fix atime updates on pNFS clients
  sunrpc: queue work on system_power_efficient_wq
  NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
  NFSv4: If recovery failed for a specific open stateid, then don't retry
  NFSv4: Fix retry issues with nfs41_test/free_stateid
  NFSv4: Open state recovery must account for file permission changes
  NFSv4: Mark the lock and open stateids as invalid after freeing them
  NFSv4: Don't test open_stateid unless it is set
  NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
  NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
  NFSv4: Fix a race when updating an open_stateid
  NFSv4: Fix a race in nfs_inode_reclaim_delegation()
  ...
2016-10-13 21:28:20 -07:00
Linus Torvalds
2778556474 Some RDMA work and some good bugfixes, and two new features that could
benefit from user testing:
 
 Anna Schumacker contributed a simple NFSv4.2 COPY implementation.  COPY
 is already supported on the client side, so a call to copy_file_range()
 on a recent client should now result in a server-side copy that doesn't
 require all the data to make a round trip to the client and back.
 
 Jeff Layton implemented callbacks to notify clients when contended locks
 become available, which should reduce latency on workloads with
 contended locks.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJX/mcsAAoJECebzXlCjuG+MU0P/3SzTLGYXU5yOTAorx255/uf
 fUVKQQhTzzaA2xj3gWWWztYx3y0ZJUVgwU56a+Ap5Z8/goqDQ78H+ePEc+MG7BT/
 /UXS/bITvt0MP/dvPrDzhSltvqx/wpelLPBo29hGLlAQ2dsnD4Y75IbOOQccWqcC
 iD2v6x7lnpWZ7j9Zhwzg/JNQHwISIb7tiLoYBjfcdNDEMU76KIyhxD0Cx9MSeBzH
 9Rq/oEdwGDFS5WqVfNe2jxbngoauq1IupziQ2eQGv2D/POyXCx8fphoYjDz1XaW8
 PxaJtJtM2owPGG+z2CxklJqNaS1Z4F+oppjg+nf4i/ibxmIBaTy8NluASX3vMh69
 CDO1+ly+TiF0l1VqMOQJWRnqn1qGk6fLpF6P1Ac62B0oWpeLGU7nmik7XN1ORgsi
 8ksxRKNAWeprZo3wl5xNrADu/wlZ7XCJTc4QoHEgYT04aHF+j8EMCHv+mtZ8+Bwn
 WWiA8iItZOgXV4vitCRJlvsixjYvmF3djPIoI2Lt5KDWIg+eL89sKwzTALSfeC4m
 Vjb0svzPX1MmZCNP1rCStFbl3gZYXZyqPk+uA6M7H8mjAjVeKxRPowWpMBgvYZHr
 FjCPb878bAuqCeBVbIyOLLcKWBLTw8PsUWZAor3gNg454JGkMjLUyJ/S22Cz5Nbo
 HdjoiTJtbPrHnCwTMXwa
 =nozl
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Some RDMA work and some good bugfixes, and two new features that could
  benefit from user testing:

   - Anna Schumacker contributed a simple NFSv4.2 COPY implementation.
     COPY is already supported on the client side, so a call to
     copy_file_range() on a recent client should now result in a
     server-side copy that doesn't require all the data to make a round
     trip to the client and back.

   - Jeff Layton implemented callbacks to notify clients when contended
     locks become available, which should reduce latency on workloads
     with contended locks"

* tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux:
  NFSD: Implement the COPY call
  nfsd: handle EUCLEAN
  nfsd: only WARN once on unmapped errors
  exportfs: be careful to only return expected errors.
  nfsd4: setclientid_confirm with unmatched verifier should fail
  nfsd: randomize SETCLIENTID reply to help distinguish servers
  nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies
  nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant
  nfsd: add a LRU list for blocked locks
  nfsd: have nfsd4_lock use blocking locks for v4.1+ locks
  nfsd: plumb in a CB_NOTIFY_LOCK operation
  NFSD: fix corruption in notifier registration
  svcrdma: support Remote Invalidation
  svcrdma: Server-side support for rpcrdma_connect_private
  rpcrdma: RDMA/CM private message data structure
  svcrdma: Skip put_page() when send_reply() fails
  svcrdma: Tail iovec leaves an orphaned DMA mapping
  nfsd: fix dprintk in nfsd4_encode_getdeviceinfo
  nfsd: eliminate cb_minorversion field
  nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
2016-10-13 21:04:42 -07:00
Nikolay Aleksandrov
4eb6753c33 net: bridge: add the multicast_flood flag attribute to brport_attrs
When I added the multicast flood control flag, I also added an attribute
for it for sysfs similar to other flags, but I forgot to add it to
brport_attrs.

Fixes: b6cb5ac833 ("net: bridge: add per-port multicast flood flag")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 12:16:36 -04:00
Dan Carpenter
775f4f0550 net: rtnl: info leak in rtnl_fill_vfinfo()
The "vf_vlan_info" struct ends with a 2 byte struct hole so we have to
memset it to ensure that no stack information is revealed to user space.

Fixes: 79aab093a0 ('net: Update API for VF vlan protocol 802.1ad support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 12:12:04 -04:00
Dan Carpenter
7307616245 tipc: info leak in __tipc_nl_add_udp_addr()
We should clear out the padding and unused struct members so that we
don't expose stack information to userspace.

Fixes: fdb3accc2c ('tipc: add the ability to get UDP options via netlink')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 12:10:01 -04:00
David Ahern
6104e112f4 net: ipv4: Do not drop to make_route if oif is l3mdev
Commit e0d56fdd73 was a bit aggressive removing l3mdev calls in
the IPv4 stack. If the fib_lookup fails we do not want to drop to
make_route if the oif is an l3mdev device.

Also reverts 19664c6a00 ("net: l3mdev: Remove netif_index_is_l3_master")
which removed netif_index_is_l3_master.

Fixes: e0d56fdd73 ("net: l3mdev: remove redundant calls")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 12:05:26 -04:00
Eric Dumazet
8ce48623f0 ipv6: tcp: restore IP6CB for pktoptions skbs
Baozeng Ding reported following KASAN splat :

BUG: KASAN: use-after-free in ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 at addr ffff880029c84ec8
Read of size 1 by task poc/25548
Call Trace:
 [<ffffffff82cf43c9>] dump_stack+0x12e/0x185 /lib/dump_stack.c:15
 [<     inline     >] print_address_description /mm/kasan/report.c:204
 [<ffffffff817ced3b>] kasan_report_error+0x48b/0x4b0 /mm/kasan/report.c:283
 [<     inline     >] kasan_report /mm/kasan/report.c:303
 [<ffffffff817ced9e>] __asan_report_load1_noabort+0x3e/0x40 /mm/kasan/report.c:321
 [<ffffffff85c71da1>] ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 /net/ipv6/datagram.c:687
 [<ffffffff85c734c3>] ip6_datagram_recv_ctl+0x33/0x40
 [<ffffffff85c0b07c>] do_ipv6_getsockopt.isra.4+0xaec/0x2150
 [<ffffffff85c0c7f6>] ipv6_getsockopt+0x116/0x230
 [<ffffffff859b5a12>] tcp_getsockopt+0x82/0xd0 /net/ipv4/tcp.c:3035
 [<ffffffff855fb385>] sock_common_getsockopt+0x95/0xd0 /net/core/sock.c:2647
 [<     inline     >] SYSC_getsockopt /net/socket.c:1776
 [<ffffffff855f8ba2>] SyS_getsockopt+0x142/0x230 /net/socket.c:1758
 [<ffffffff8685cdc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
Memory state around the buggy address:
 ffff880029c84d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff880029c84e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff880029c84e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                              ^
 ffff880029c84f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff880029c84f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

He also provided a syzkaller reproducer.

Issue is that ip6_datagram_recv_specific_ctl() expects to find IP6CB
data that was moved at a different place in tcp_v6_rcv()

This patch moves tcp_v6_restore_cb() up and calls it from
tcp_v6_do_rcv() when np->pktoptions is set.

Fixes: 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 11:07:34 -04:00
Roopa Prabhu
de1dfeefef bridge: add address and vlan to fdb warning messages
This patch adds vlan and address to warning messages printed
in the bridge fdb code for debuggability.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 10:49:54 -04:00
WANG Cong
ab102b80ce net_sched: reorder pernet ops and act ops registrations
Krister reported a kernel NULL pointer dereference after
tcf_action_init_1() invokes a_o->init(), it is a race condition
where one thread calling tcf_register_action() to initialize
the netns data after putting act ops in the global list and
the other thread searching the list and then calling
a_o->init(net, ...).

Fix this by moving the pernet ops registration before making
the action ops visible. This is fine because: a) we don't
rely on act_base in pernet ops->init(), b) in the worst case we
have a fully initialized netns but ops is still not ready so
new actions still can't be created.

Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 10:26:43 -04:00
Jiri Benc
3145c037e7 openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev
The internal device does support 802.1AD offloading since 018c1dda5f
("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink
attributes").

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 10:03:23 -04:00
Jiri Benc
72ec108d70 openvswitch: fix vlan subtraction from packet length
When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN
header is not counted in skb->len. It doesn't make sense to subtract it.

Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 10:03:23 -04:00
Jiri Benc
20ecf1e4e3 openvswitch: vlan: remove wrong likely statement
This code is called whenever flow key is being extracted from the packet.
The packet may be as likely vlan tagged as not.

Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13 10:03:23 -04:00