Commit Graph

55319 Commits

Author SHA1 Message Date
wenxu
6e6b904ad4 ip_tunnel: Fix route fl4 init in ip_md_tunnel_xmit
Init the gre_key from tuninfo->key.tun_id and init the mark
from the skb->mark, set the oif to zero in the collect metadata
mode.

Fixes: cfc7381b30 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-26 09:43:03 -08:00
wenxu
c8b34e680a ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit
Add tnl_update_pmtu in ip_md_tunnel_xmit to dynamic modify
the pmtu which packet send through collect_metadata mode
ip tunnel

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-26 09:43:03 -08:00
wenxu
f46fe4f8d7 ip_tunnel: Add ip tunnel dst_cache in ip_md_tunnel_xmit
Add ip tunnel dst cache in ip_md_tunnel_xmit to make more
efficient for the route lookup.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-26 09:43:03 -08:00
Willem de Bruijn
f859a44847 tcp: allow zerocopy with fastopen
Accept MSG_ZEROCOPY in all the TCP states that allow sendmsg. Remove
the explicit check for ESTABLISHED and CLOSE_WAIT states.

This requires correctly handling zerocopy state (uarg, sk_zckey) in
all paths reachable from other TCP states. Such as the EPIPE case
in sk_stream_wait_connect, which a sendmsg() in incorrect state will
now hit. Most paths are already safe.

Only extension needed is for TCP Fastopen active open. This can build
an skb with data in tcp_send_syn_data. Pass the uarg along with other
fastopen state, so that this skb also generates a zerocopy
notification on release.

Tested with active and passive tcp fastopen packetdrill scripts at
1747eef03d

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 22:41:08 -08:00
Peter Oskolkov
997dd96471 net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c
Currently, IPv6 defragmentation code drops non-last fragments that
are smaller than 1280 bytes: see
commit 0ed4229b08 ("ipv6: defrag: drop non-last frags smaller than min mtu")

This behavior is not specified in IPv6 RFCs and appears to break
compatibility with some IPv6 implemenations, as reported here:
https://www.spinics.net/lists/netdev/msg543846.html

This patch re-uses common IP defragmentation queueing and reassembly
code in IP6 defragmentation in nf_conntrack, removing the 1280 byte
restriction.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 21:37:11 -08:00
Peter Oskolkov
d4289fcc9b net: IP6 defrag: use rbtrees for IPv6 defrag
Currently, IPv6 defragmentation code drops non-last fragments that
are smaller than 1280 bytes: see
commit 0ed4229b08 ("ipv6: defrag: drop non-last frags smaller than min mtu")

This behavior is not specified in IPv6 RFCs and appears to break
compatibility with some IPv6 implemenations, as reported here:
https://www.spinics.net/lists/netdev/msg543846.html

This patch re-uses common IP defragmentation queueing and reassembly
code in IPv6, removing the 1280 byte restriction.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 21:37:11 -08:00
Peter Oskolkov
c23f35d19d net: IP defrag: encapsulate rbtree defrag code into callable functions
This is a refactoring patch: without changing runtime behavior,
it moves rbtree-related code from IPv4-specific files/functions
into .h/.c defrag files shared with IPv6 defragmentation code.

Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 21:37:11 -08:00
Liangwei Dong
6c900360e7 nl80211: Allow set/del pmksa operations for AP
Host drivers may offload authentication to the user space
through the commit ("cfg80211: Authentication offload to
user space in AP mode").

This interface can be used to implement SAE by having the
userspace do authentication/PMKID key derivation and driver
handle the association.

A step ahead, this interface can get further optimized if the
PMKID is passed to the host driver and also have it respond to
the association request by the STA on a valid PMKID.

This commit enables the userspace to pass the PMKID to the host
drivers through the set/del pmksa operations in AP mode.

Set/Del pmksa is now restricted to STA/P2P client mode only and
thus the drivers might not expect them in any other(AP) mode.

This commit also introduces a feature flag
NL80211_EXT_FEATURE_AP_PMKSA_CACHING (johannes: renamed) to
maintain the backward compatibility of such an expectation by
the host drivers. These operations are allowed in AP mode only
when the drivers advertize the capability through this flag.

Signed-off-by: Liangwei Dong <liangwei@codeaurora.org>
Signed-off-by: Srinivas Dasari <dasaris@codeaurora.org>
[rename flag to NL80211_EXT_FEATURE_AP_PMKSA_CACHING]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 21:09:21 +01:00
Srinivas Dasari
fe4943702c cfg80211: Authentication offload to user space in AP mode
commit 40cbfa9021 ("cfg80211/nl80211: Optional authentication
offload to userspace")' introduced authentication offload to user
space by the host drivers in station mode. This commit extends
the same for the AP mode too.

Extend NL80211_ATTR_EXTERNAL_AUTH_SUPPORT to also claim the
support of external authentication from the user space in AP mode.
A new flag parameter is introduced in cfg80211_ap_settings to
intend the same while "start ap".

Host driver to use NL80211_CMD_FRAME interface to transmit and
receive the authentication frames to / from the user space.

Host driver to indicate the flag NL80211_RXMGMT_FLAG_EXTERNAL_AUTH
while sending the authentication frame to the user space. This
intends to the user space that the driver wishes it to process
the authentication frame for certain protocols, though it had
initially advertised the support for SME functionality.

User space shall accordingly do the authentication and indicate
its final status through the command NL80211_CMD_EXTERNAL_AUTH.
Allow the command even if userspace doesn't include the attribute
NL80211_ATTR_SSID for AP interface.

Host driver shall continue with the association sequence and
indicate the STA connection status through cfg80211_new_sta.

To facilitate the host drivers in AP mode for matching the pmkid
by the stations during the association, NL80211_CMD_EXTERNAL_AUTH
is also enhanced to include the pmkid to drivers after
the authentication.
This pmkid can also be used in the STA mode to include in the
association request.

Also modify nl80211_external_auth to not mandate SSID in AP mode.

Signed-off-by: Srinivas Dasari <dasaris@codeaurora.org>
[remove useless nla_get_flag() usage]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 21:08:05 +01:00
David S. Miller
517952756e Merge tag 'mac80211-for-davem-2019-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
Just a few small fixes:
 * avoid trying to operate TDLS when not connection,
   this is not valid and led to issues
 * count TTL-dropped frames in mesh better
 * deal with new WiGig channels in regulatory code
 * remove a WARN_ON() that can trigger due to benign
   races during device/driver registration
 * fix nested netlink policy maxattrs (syzkaller)
 * fix hwsim n_limits (syzkaller)
 * propagate __aligned(2) to a surrounding struct
 * return proper error in virt_wifi error path
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 10:59:36 -08:00
David S. Miller
30e5c2c6bf net: Revert devlink health changes.
This reverts the devlink health changes from 9/17/2019,
Jiri wants things to be designed differently and it was
agreed that the easiest way to do this is start from the
beginning again.

Commits reverted:

cb5ccfbe73
880ee82f03
c7af343b4e
ff253fedab
6f9d56132e
fcd852c69d
8a66704a13
12bd0dcefe
aba25279c1
ce019faa70
b8c45a033a

And the follow-on build fix:

o33a0efa4baecd689da9474ce0e8b673eb6931c60

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 10:53:23 -08:00
Veerendranath Jakkam
ab4dfa2053 cfg80211: Allow drivers to advertise supported AKM suites
There was no such capability advertisement from the driver and thus the
current user space has to assume the driver to support all the AKMs. While
that may be the case with some drivers (e.g., mac80211-based ones), there
are cfg80211-based drivers that implement SME and have constraints on
which AKMs can be supported (e.g., such drivers may need an update to
support SAE AKM using NL80211_CMD_EXTERNAL_AUTH). Allow such drivers to
advertise the exact set of supported AKMs so that user space tools can
determine what network profile options should be allowed to be configured.

Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
[pmsr data might be big, start a new netlink message section]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 14:05:31 +01:00
Sriram R
c82c06ce43 cfg80211: Notify all User Hints To self managed wiphys
Currently Self Managed WIPHY's are not notified on any
hints other than user cell base station hints.
Self Managed wiphy's basically rely on hints from firmware
and its local regdb for regulatory management, so hints from wireless
core can be ignored. But all user hints needs to be notified
to them to provide flexibility to these drivers to honour or
ignore these user hints.

Currently none of the drivers supporting self managed wiphy
register a notifier with cfg80211. Hence this change does not affect
any other driver behavior.

Signed-off-by: Sriram R <srirrama@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 14:05:31 +01:00
Gustavo A. R. Silva
4af217500e cfg80211: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

net/wireless/wext-compat.c:1327:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/wireless/wext-compat.c:1341:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 14:05:31 +01:00
Toke Høiland-Jørgensen
390298e86f mac80211: Expose ieee80211_schedule_txq() function
Since we reworked ieee80211_return_txq() so it assumes that the caller
takes care of logging, we need another function that can be called without
holding any locks. Introduce ieee80211_schedule_txq() which serves this
purpose.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 14:05:27 +01:00
Chaitanya Tata
93183bdbe7 cfg80211: extend range deviation for DMG
Recently, DMG frequency bands have been extended till 71GHz, so extend
the range check till 20GHz (45-71GHZ), else some channels will be marked
as disabled.

Signed-off-by: Chaitanya Tata <Chaitanya.Tata@bluwireless.co.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 10:18:51 +01:00
Chaitanya Tata
faae54ad41 cfg80211: reg: remove warn_on for a normal case
If there are simulatenous queries of regdb, then there might be a case
where multiple queries can trigger request_firmware_no_wait and can have
parallel callbacks being executed asynchronously. In this scenario we
might hit the WARN_ON.

So remove the warn_on, as the code already handles multiple callbacks
gracefully.

Signed-off-by: Chaitanya Tata <chaitanya.tata@bluwireless.co.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 10:18:02 +01:00
Mathieu Malaterre
7c53eb5d87 mac80211: Add attribute aligned(2) to struct 'action'
During refactor in commit 9e478066ea ("mac80211: fix MU-MIMO
follow-MAC mode") a new struct 'action' was declared with packed
attribute as:

  struct {
          struct ieee80211_hdr_3addr hdr;
          u8 category;
          u8 action_code;
  } __packed action;

But since struct 'ieee80211_hdr_3addr' is declared with an aligned
keyword as:

  struct ieee80211_hdr {
  	__le16 frame_control;
  	__le16 duration_id;
  	u8 addr1[ETH_ALEN];
  	u8 addr2[ETH_ALEN];
  	u8 addr3[ETH_ALEN];
  	__le16 seq_ctrl;
  	u8 addr4[ETH_ALEN];
  } __packed __aligned(2);

Solve the ambiguity of placing aligned structure in a packed one by
adding the aligned(2) attribute to struct 'action'.

This removes the following warning (W=1):

  net/mac80211/rx.c:234:2: warning: alignment 1 of 'struct <anonymous>' is less than 2 [-Wpacked-not-aligned]

Cc: Johannes Berg <johannes.berg@intel.com>
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 10:17:25 +01:00
Balaji Pothunoori
7ed5285396 mac80211: don't initiate TDLS connection if station is not associated to AP
Following call trace is observed while adding TDLS peer entry in driver
during TDLS setup.

Call Trace:
[<c1301476>] dump_stack+0x47/0x61
[<c10537d2>] __warn+0xe2/0x100
[<fa22415f>] ? sta_apply_parameters+0x49f/0x550 [mac80211]
[<c1053895>] warn_slowpath_null+0x25/0x30
[<fa22415f>] sta_apply_parameters+0x49f/0x550 [mac80211]
[<fa20ad42>] ? sta_info_alloc+0x1c2/0x450 [mac80211]
[<fa224623>] ieee80211_add_station+0xe3/0x160 [mac80211]
[<c1876fe3>] nl80211_new_station+0x273/0x420
[<c170f6d9>] genl_rcv_msg+0x219/0x3c0
[<c170f4c0>] ? genl_rcv+0x30/0x30
[<c170ee7e>] netlink_rcv_skb+0x8e/0xb0
[<c170f4ac>] genl_rcv+0x1c/0x30
[<c170e8aa>] netlink_unicast+0x13a/0x1d0
[<c170ec18>] netlink_sendmsg+0x2d8/0x390
[<c16c5acd>] sock_sendmsg+0x2d/0x40
[<c16c6369>] ___sys_sendmsg+0x1d9/0x1e0

Fixing this by allowing TDLS setup request only when we have completed
association.

Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 10:13:22 +01:00
Johannes Berg
a8b5c6d692 nl80211: fix NLA_POLICY_NESTED() arguments
syzbot reported an out-of-bounds read when passing certain
malformed messages into nl80211. The specific place where
this happened isn't interesting, the problem is that nested
policy parsing was referring to the wrong maximum attribute
and thus the policy wasn't long enough.

Fix this by referring to the correct attribute. Since this
is really not necessary, I'll come up with a separate patch
to just pass the policy instead of both, in the common case
we can infer the maxattr from the size of the policy array.

Reported-by: syzbot+4157b036c5f4713b1f2f@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-01-25 09:26:32 +01:00
Felix Fietkau
7d652669b6 batman-adv: release station info tidstats
With the addition of TXQ stats in the per-tid statistics the struct
station_info grew significantly. This resulted in stack size warnings
due to the structure itself being above the limit for the warnings.

To work around this, the TID array was allocated dynamically. Also a
function to free this content was introduced with commit 7ea3e110f2
("cfg80211: release station info tidstats where needed") but the necessary
changes were not provided for batman-adv's B.A.T.M.A.N. V implementation.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Fixes: 8689c051a2 ("cfg80211: dynamically allocate per-tid stats for station info")
[sven@narfation.org: add commit message]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-01-25 09:04:41 +01:00
Colin Ian King
1e4b6e91b4 Bluetooth: make hw_err static, reduces object code size
Don't populate the const array hw_err on the stack but instead make
it static. Makes the object code smaller by 45 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
 100880	  21090	   1088	 123058	  1e0b2	linux/net/bluetooth/hci_core.o

After:
   text	   data	    bss	    dec	    hex	filename
 100739	  21186	   1088	 123013	  1e085	linux/net/bluetooth/hci_core.o

(gcc version 8.2.0 x86_64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-01-25 08:53:58 +01:00
Rajat Jain
e2bef3847e Bluetooth: Allow driver specific cmd timeout handling
Add a hook to allow the BT driver to do device or command specific
handling in case of timeouts. This is to be used by Intel driver to
reset the device after certain number of timeouts.

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-01-25 08:46:32 +01:00
YueHaibing
0ba9480cff bridge: remove duplicated include from br_multicast.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:49:57 -08:00
Zhaolong Zhang
8eab6dac8d tipc: remove dead code in struct tipc_topsrv
max_rcvbuf_size is no longer used since commit "414574a0af36".

Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:31:23 -08:00
Priyaranjan Jha
78dc70ebaa tcp_bbr: adapt cwnd based on ack aggregation estimation
Aggregation effects are extremely common with wifi, cellular, and cable
modem link technologies, ACK decimation in middleboxes, and LRO and GRO
in receiving hosts. The aggregation can happen in either direction,
data or ACKs, but in either case the aggregation effect is visible
to the sender in the ACK stream.

Previously BBR's sending was often limited by cwnd under severe ACK
aggregation/decimation because BBR sized the cwnd at 2*BDP. If packets
were acked in bursts after long delays (e.g. one ACK acking 5*BDP after
5*RTT), BBR's sending was halted after sending 2*BDP over 2*RTT, leaving
the bottleneck idle for potentially long periods. Note that loss-based
congestion control does not have this issue because when facing
aggregation it continues increasing cwnd after bursts of ACKs, growing
cwnd until the buffer is full.

To achieve good throughput in the presence of aggregation effects, this
algorithm allows the BBR sender to put extra data in flight to keep the
bottleneck utilized during silences in the ACK stream that it has evidence
to suggest were caused by aggregation.

A summary of the algorithm: when a burst of packets are acked by a
stretched ACK or a burst of ACKs or both, BBR first estimates the expected
amount of data that should have been acked, based on its estimated
bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max
filter to estimate the recent level of observed ACK aggregation. Then cwnd
is increased by the ACK aggregation estimate. The larger cwnd avoids BBR
being cwnd-limited in the face of ACK silences that recent history suggests
were caused by aggregation. As a sanity check, the ACK aggregation degree
is upper-bounded by the cwnd (at the time of measurement) and a global max
of BW * 100ms. The algorithm is further described by the following
presentation:
https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00

In our internal testing, we observed a significant increase in BBR
throughput (measured using netperf), in a basic wifi setup.
- Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi)
- 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps
- 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps

Also, this code is running globally on YouTube TCP connections and produced
significant bandwidth increases for YouTube traffic.

This is based on Ian Swett's max_ack_height_ algorithm from the
QUIC BBR implementation.

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:27:27 -08:00
Priyaranjan Jha
232aa8ec3e tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning
Because bbr_target_cwnd() is really a general-purpose BBR helper for
computing some volume of inflight data as a function of the estimated
BDP, refactor it into following helper functions:
- bbr_bdp()
- bbr_quantization_budget()
- bbr_inflight()

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 22:27:26 -08:00
David S. Miller
9620d6f683 Merge tag 'linux-can-fixes-for-5.0-20190122' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:

====================
pull-request: can 2019-01-22

this is a pull request of 4 patches for net/master.

The first patch by is by Manfred Schlaegl and reverts a patch that caused wrong
warning messages in certain use cases. The next patch is by Oliver Hartkopp for
the bcm that adds sanity checks for the timer value before using it to detect
potential interger overflows. The last two patches are for the flexcan driver,
YueHaibing's patch fixes the the return value in the error path of the
flexcan_setup_stop_mode() function. The second patch is by Uwe Kleine-König and
fixes a NULL pointer deref on older flexcan cores in flexcan_chip_start().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 21:52:37 -08:00
Xin Long
ecf938fe7d sctp: set flow sport from saddr only when it's 0
Now sctp_transport_pmtu() passes transport->saddr into .get_dst() to set
flow sport from 'saddr'. However, transport->saddr is set only when
transport->dst exists in sctp_transport_route().

If sctp_transport_pmtu() is called without transport->saddr set, like
when transport->dst doesn't exists, the flow sport will be set to 0
from transport->saddr, which will cause a wrong route to be got.

Commit 6e91b578bf ("sctp: re-use sctp_transport_pmtu in
sctp_transport_route") made the issue be triggered more easily
since sctp_transport_pmtu() would be called in sctp_transport_route()
after that.

In gerneral, fl4->fl4_sport should always be set to
htons(asoc->base.bind_addr.port), unless transport->asoc doesn't exist
in sctp_v4/6_get_dst(), which is the case:

  sctp_ootb_pkt_new() ->
    sctp_transport_route()

For that, we can simply handle it by setting flow sport from saddr only
when it's 0 in sctp_v4/6_get_dst().

Fixes: 6e91b578bf ("sctp: re-use sctp_transport_pmtu in sctp_transport_route")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 18:13:57 -08:00
Xin Long
4ff40b8626 sctp: set chunk transport correctly when it's a new asoc
In the paths:

  sctp_sf_do_unexpected_init() ->
    sctp_make_init_ack()
  sctp_sf_do_dupcook_a/b()() ->
    sctp_sf_do_5_1D_ce()

The new chunk 'retval' transport is set from the incoming chunk 'chunk'
transport. However, 'retval' transport belong to the new asoc, which
is a different one from 'chunk' transport's asoc.

It will cause that the 'retval' chunk gets set with a wrong transport.
Later when sending it and because of Commit b9fd683982 ("sctp: add
sctp_packet_singleton"), sctp_packet_singleton() will set some fields,
like vtag to 'retval' chunk from that wrong transport's asoc.

This patch is to fix it by setting 'retval' transport correctly which
belongs to the right asoc in sctp_make_init_ack() and
sctp_sf_do_5_1D_ce().

Fixes: b9fd683982 ("sctp: add sctp_packet_singleton")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 18:13:57 -08:00
Xin Long
8220c870cb sctp: improve the events for sctp stream adding
This patch is to improve sctp stream adding events in 2 places:

  1. In sctp_process_strreset_addstrm_out(), move up SCTP_MAX_STREAM
     and in stream allocation failure checks, as the adding has to
     succeed after reconf_timer stops for the in stream adding
     request retransmission.

  3. In sctp_process_strreset_addstrm_in(), no event should be sent,
     as no in or out stream is added here.

Fixes: 50a41591f1 ("sctp: implement receiver-side procedures for the Add Outgoing Streams Request Parameter")
Fixes: c5c4ebb3ab ("sctp: implement receiver-side procedures for the Add Incoming Streams Request Parameter")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 18:13:57 -08:00
Xin Long
2e6dc4d951 sctp: improve the events for sctp stream reset
This patch is to improve sctp stream reset events in 4 places:

  1. In sctp_process_strreset_outreq(), the flag should always be set with
     SCTP_STREAM_RESET_INCOMING_SSN instead of OUTGOING, as receiver's in
     stream is reset here.
  2. In sctp_process_strreset_outreq(), move up SCTP_STRRESET_ERR_WRONG_SSN
     check, as the reset has to succeed after reconf_timer stops for the
     in stream reset request retransmission.
  3. In sctp_process_strreset_inreq(), no event should be sent, as no in
     or out stream is reset here.
  4. In sctp_process_strreset_resp(), SCTP_STREAM_RESET_INCOMING_SSN or
     OUTGOING event should always be sent for stream reset requests, no
     matter it fails or succeeds to process the request.

Fixes: 8105447645 ("sctp: implement receiver-side procedures for the Outgoing SSN Reset Request Parameter")
Fixes: 16e1a91965 ("sctp: implement receiver-side procedures for the Incoming SSN Reset Request Parameter")
Fixes: 11ae76e67a ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 18:13:57 -08:00
wenxu
d71b57532d ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
ip l add dev tun type gretap key 1000
ip a a dev tun 10.0.0.1/24

Packets with tun-id 1000 can be recived by tun dev. But packet can't
be sent through dev tun for non-tunnel-dst

With this patch: tunnel-dst can be get through lwtunnel like beflow:
ip r a 10.0.0.7 encap ip dst 172.168.0.11 dev tun

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 17:54:12 -08:00
Björn Töpel
a36b38aa2a xsk: add sock_diag interface for AF_XDP
This patch adds the sock_diag interface for querying sockets from user
space. Tools like iproute2 ss(8) can use this interface to list open
AF_XDP sockets.

The user-space ABI is defined in linux/xdp_diag.h and includes netlink
request and response structs. The request can query sockets and the
response contains socket information about the rings, umems, inode and
more.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-25 01:50:03 +01:00
Björn Töpel
50e74c0131 xsk: add id to umem
This commit adds an id to the umem structure. The id uniquely
identifies a umem instance, and will be exposed to user-space via the
socket monitoring interface.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-25 01:50:03 +01:00
Björn Töpel
1d0dc06930 net: xsk: track AF_XDP sockets on a per-netns list
Track each AF_XDP socket in a per-netns list. This will be used later
by the sock_diag interface for querying sockets from userspace.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-25 01:50:03 +01:00
ZhangXiaoxu
53ab60baa1 ipvs: Fix signed integer overflow when setsockopt timeout
There is a UBSAN bug report as below:
UBSAN: Undefined behaviour in net/netfilter/ipvs/ip_vs_ctl.c:2227:21
signed integer overflow:
-2147483647 * 1000 cannot be represented in type 'int'

Reproduce program:
	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/socket.h>

	#define IPPROTO_IP 0
	#define IPPROTO_RAW 255

	#define IP_VS_BASE_CTL		(64+1024+64)
	#define IP_VS_SO_SET_TIMEOUT	(IP_VS_BASE_CTL+10)

	/* The argument to IP_VS_SO_GET_TIMEOUT */
	struct ipvs_timeout_t {
		int tcp_timeout;
		int tcp_fin_timeout;
		int udp_timeout;
	};

	int main() {
		int ret = -1;
		int sockfd = -1;
		struct ipvs_timeout_t to;

		sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
		if (sockfd == -1) {
			printf("socket init error\n");
			return -1;
		}

		to.tcp_timeout = -2147483647;
		to.tcp_fin_timeout = -2147483647;
		to.udp_timeout = -2147483647;

		ret = setsockopt(sockfd,
				 IPPROTO_IP,
				 IP_VS_SO_SET_TIMEOUT,
				 (char *)(&to),
				 sizeof(to));

		printf("setsockopt return %d\n", ret);
		return ret;
	}

Return -EINVAL if the timeout value is negative or max than 'INT_MAX / HZ'.

Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-24 13:38:54 +01:00
Eric Dumazet
d9ff286a0f bpf: allow BPF programs access skb_shared_info->gso_segs field
This adds the ability to read gso_segs from a BPF program.

v3: Use BPF_REG_AX instead of BPF_REG_TMP for the temporary register,
    as suggested by Martin.

v2: refined Eddie Hao patch to address Alexei feedback.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eddie Hao <eddieh@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-24 10:49:05 +01:00
Eric Dumazet
63530aba78 ax25: fix possible use-after-free
syzbot found that ax25 routes where not properly protected
against concurrent use [1].

In this particular report the bug happened while
copying ax25->digipeat.

Fix this problem by making sure we call ax25_get_route()
while ax25_route_lock is held, so that no modification
could happen while using the route.

The current two ax25_get_route() callers do not sleep,
so this change should be fine.

Once we do that, ax25_get_route() no longer needs to
grab a reference on the found route.

[1]
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
BUG: KASAN: use-after-free in memcpy include/linux/string.h:352 [inline]
BUG: KASAN: use-after-free in kmemdup+0x42/0x60 mm/util.c:113
Read of size 66 at addr ffff888066641a80 by task syz-executor2/531

ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
CPU: 1 PID: 531 Comm: syz-executor2 Not tainted 5.0.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x24/0x50 mm/kasan/common.c:130
 memcpy include/linux/string.h:352 [inline]
 kmemdup+0x42/0x60 mm/util.c:113
 kmemdup include/linux/string.h:425 [inline]
 ax25_rt_autobind+0x25d/0x750 net/ax25/ax25_route.c:424
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1224
 __sys_connect+0x357/0x490 net/socket.c:1664
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect net/socket.c:1672 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1672
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f870ee22c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458099
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f870ee236d4
R13: 00000000004be48e R14: 00000000004ce9a8 R15: 00000000ffffffff

Allocated by task 526:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc mm/kasan/common.c:496 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
 kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
 kmalloc include/linux/slab.h:545 [inline]
 ax25_rt_add net/ax25/ax25_route.c:95 [inline]
 ax25_rt_ioctl+0x3b9/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
Freed by task 550:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
 __cache_free mm/slab.c:3487 [inline]
 kfree+0xcf/0x230 mm/slab.c:3806
 ax25_rt_add net/ax25/ax25_route.c:92 [inline]
 ax25_rt_ioctl+0x304/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888066641a80
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
 96-byte region [ffff888066641a80, ffff888066641ae0)
The buggy address belongs to the page:
page:ffffea0001999040 count:1 mapcount:0 mapping:ffff88812c3f04c0 index:0x0
flags: 0x1fffc0000000200(slab)
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
raw: 01fffc0000000200 ffffea0001817948 ffffea0002341dc8 ffff88812c3f04c0
raw: 0000000000000000 ffff888066641000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                   ^
 ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23 11:18:00 -08:00
Gustavo A. R. Silva
6317950c1b Bluetooth: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

net/bluetooth/rfcomm/core.c:479:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/bluetooth/l2cap_core.c:4223:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-01-23 18:27:16 +01:00
Gustavo A. R. Silva
f79e3365bc tipc: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

net/tipc/link.c:1125:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/tipc/socket.c:736:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
net/tipc/socket.c:2418:7: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-23 09:06:35 -08:00
Marcel Holtmann
7c9cbd0b5e Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer
The function l2cap_get_conf_opt will return L2CAP_CONF_OPT_SIZE + opt->len
as length value. The opt->len however is in control over the remote user
and can be used by an attacker to gain access beyond the bounds of the
actual packet.

To prevent any potential leak of heap memory, it is enough to check that
the resulting len calculation after calling l2cap_get_conf_opt is not
below zero. A well formed packet will always return >= 0 here and will
end with the length value being zero after the last option has been
parsed. In case of malformed packets messing with the opt->len field the
length value will become negative. If that is the case, then just abort
and ignore the option.

In case an attacker uses a too short opt->len value, then garbage will
be parsed, but that is protected by the unknown option handling and also
the option parameter size checks.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2019-01-23 13:35:07 +02:00
Yafang Shao
c9e4576743 bpf: sock recvbuff must be limited by rmem_max in bpf_setsockopt()
When sock recvbuff is set by bpf_setsockopt(), the value must by
limited by rmem_max. It is the same with sendbuff.

Fixes: 8c4b4c7e9f ("bpf: Add setsockopt helper function to bpf")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-23 12:22:22 +01:00
Marcel Holtmann
af3d5d1c87 Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt
When doing option parsing for standard type values of 1, 2 or 4 octets,
the value is converted directly into a variable instead of a pointer. To
avoid being tricked into being a pointer, check that for these option
types that sizes actually match. In L2CAP every option is fixed size and
thus it is prudent anyway to ensure that the remote side sends us the
right option size along with option paramters.

If the option size is not matching the option type, then that option is
silently ignored. It is a protocol violation and instead of trying to
give the remote attacker any further hints just pretend that option is
not present and proceed with the default values. Implementation
following the specification and its qualification procedures will always
use the correct size and thus not being impacted here.

To keep the code readable and consistent accross all options, a few
cosmetic changes were also required.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2019-01-23 12:53:20 +02:00
Gustavo A. R. Silva
3bbe8b1a4a 9p: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

net/9p/trans_xen.c:514:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Link: http://lkml.kernel.org/r/20190123071632.GA8039@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
2019-01-23 16:29:54 +09:00
Nathan Chancellor
33a0efa4ba devlink: Use DIV_ROUND_UP_ULL in DEVLINK_HEALTH_SIZE_TO_BUFFERS
When building this code on a 32-bit platform such as ARM, there is a
link time error (lld error shown, happpens with ld.bfd too):

ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by devlink.c
>>>               net/core/devlink.o:(devlink_health_buffers_create) in archive built-in.a

This happens when using a regular division symbol with a u64 dividend.
Use DIV_ROUND_UP_ULL, which wraps do_div, to avoid this situation.

Fixes: cb5ccfbe73 ("devlink: Add health buffer support")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 21:00:37 -08:00
Lubomir Rintel
7c62b8dd5c net/ipv6: lower the level of "link is not ready" messages
This message gets logged far too often for how interesting is it.

Most distributions nowadays configure NetworkManager to use randomly
generated MAC addresses for Wi-Fi network scans. The interfaces end up
being periodically brought down for the address change. When they're
subsequently brought back up, the message is logged, eventually flooding
the log.

Perhaps the message is not all that helpful: it seems to be more
interesting to hear when the addrconf actually start, not when it does
not. Let's lower its level.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-By: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 20:42:39 -08:00
YueHaibing
ed175d9c6f devlink: Add missing check of nlmsg_put
nlmsg_put may fail, this fix add a check of its return value.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 20:08:23 -08:00
Jakub Kicinski
1518039f6b net/ipv6: don't return positive numbers when nothing was dumped
in6_dump_addrs() returns a positive 1 if there was nothing to dump.
This return value can not be passed as return from inet6_dump_addr()
as is, because it will confuse rtnetlink, resulting in NLMSG_DONE
never getting set:

$ ip addr list dev lo
EOF on netlink
Dump terminated

v2: flip condition to avoid a new goto (DaveA)

Fixes: 7c1e8a3817 ("netlink: fixup regression in RTM_GETADDR")
Reported-by: Brendan Galloway <brendan.galloway@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:24:18 -08:00
Linus Lüssing
4b3087c7e3 bridge: Snoop Multicast Router Advertisements
When multiple multicast routers are present in a broadcast domain then
only one of them will be detectable via IGMP/MLD query snooping. The
multicast router with the lowest IP address will become the selected and
active querier while all other multicast routers will then refrain from
sending queries.

To detect such rather silent multicast routers, too, RFC4286
("Multicast Router Discovery") provides a standardized protocol to
detect multicast routers for multicast snooping switches.

This patch implements the necessary MRD Advertisement message parsing
and after successful processing adds such routers to the internal
multicast router list.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:09 -08:00