Commit Graph

49907 Commits

Author SHA1 Message Date
Xin Long
de60fe9105 sctp: implement handle_ftsn for sctp_stream_interleave
handle_ftsn is added as a member of sctp_stream_interleave, used to skip
ssn for data or mid for idata, called for SCTP_CMD_PROCESS_FWDTSN cmd.

sctp_handle_iftsn works for ifwdtsn, and sctp_handle_fwdtsn works for
fwdtsn. Note that different from sctp_handle_fwdtsn, sctp_handle_iftsn
could do stream abort pd.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
Xin Long
47b20a8856 sctp: implement report_ftsn for sctp_stream_interleave
report_ftsn is added as a member of sctp_stream_interleave, used to
skip tsn from tsnmap, remove old events from reasm or lobby queue,
and abort pd for data or idata, called for SCTP_CMD_REPORT_FWDTSN
cmd and asoc reset.

sctp_report_iftsn works for ifwdtsn, and sctp_report_fwdtsn works
for fwdtsn. Note that sctp_report_iftsn doesn't do asoc abort_pd,
as stream abort_pd will be done when handling ifwdtsn. But when
ftsn is equal with ftsn, which means asoc reset, asoc abort_pd has
to be done.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
Xin Long
0fc2ea922c sctp: implement validate_ftsn for sctp_stream_interleave
validate_ftsn is added as a member of sctp_stream_interleave, used to
validate ssn/chunk type for fwdtsn or mid (message id)/chunk type for
ifwdtsn, called in sctp_sf_eat_fwd_tsn, just as validate_data.

If this check fails, an abort packet will be sent, as said in section
2.3.1 of RFC8260.

As ifwdtsn and fwdtsn chunks have different length, it also defines
ftsn_chunk_len for sctp_stream_interleave to describe the chunk size.
Then it replaces all sizeof(struct sctp_fwdtsn_chunk) with
sctp_ftsnchk_len.

It also adds the process for ifwdtsn in rx path. As Marcelo pointed
out, there's no need to add event table for ifwdtsn, but just share
prsctp_chunk_event_table with fwdtsn's. It would drop fwdtsn chunk
for ifwdtsn and drop ifwdtsn chunk for fwdtsn by calling validate_ftsn
in sctp_sf_eat_fwd_tsn.

After this patch, the ifwdtsn can be accepted.

Note that this patch also removes the sctp.intl_enable check for
idata chunks in sctp_chunk_event_lookup, as it will do this check
in validate_data later.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
Xin Long
8e0c3b73ce sctp: implement generate_ftsn for sctp_stream_interleave
generate_ftsn is added as a member of sctp_stream_interleave, used to
create fwdtsn or ifwdtsn chunk according to abandoned chunks, called
in sctp_retransmit and sctp_outq_sack.

sctp_generate_iftsn works for ifwdtsn, and sctp_generate_fwdtsn is
still used for making fwdtsn.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:21 -05:00
Xin Long
2d07a49ade sctp: add basic structures and make chunk function for ifwdtsn
sctp_ifwdtsn_skip, sctp_ifwdtsn_hdr and sctp_ifwdtsn_chunk are used to
define and parse I-FWD TSN chunk format, and sctp_make_ifwdtsn is a
function to build the chunk.

The I-FORWARD-TSN Chunk Format is defined in section 2.3.1 of RFC8260.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:21 -05:00
Yuval Mintz
428a68af3a net: sched: Move to new offload indication in RED
Let RED utilize the new internal flag, TCQ_F_OFFLOADED,
to mark a given qdisc as offloaded instead of using a dedicated
indication.

Also, change internal logic into looking at said flag when possible.

Fixes: 602f3baf22 ("net_sch: red: Add offload ability to RED qdisc")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:35:36 -05:00
Yuval Mintz
7a4fa29106 net: sched: Add TCA_HW_OFFLOAD
Qdiscs can be offloaded to HW, but current implementation isn't uniform.
Instead, qdiscs either pass information about offload status via their
TCA_OPTIONS or omit it altogether.

Introduce a new attribute - TCA_HW_OFFLOAD that would form a uniform
uAPI for the offloading status of qdiscs.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:35:36 -05:00
William Tu
94d7d8f292 ip6_gre: add erspan v2 support
Similar to support for ipv4 erspan, this patch adds
erspan v2 to ip6erspan tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:34:00 -05:00
William Tu
f551c91de2 net: erspan: introduce erspan v2 for ip_gre
The patch adds support for erspan version 2.  Not all features are
supported in this patch.  The SGT (security group tag), GRA (timestamp
granularity), FT (frame type) are set to fixed value.  Only hardware
ID and direction are configurable.  Optional subheader is also not
supported.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:34:00 -05:00
William Tu
1d7e2ed22f net: erspan: refactor existing erspan code
The patch refactors the existing erspan implementation in order
to support erspan version 2, which has additional metadata.  So, in
stead of having one 'struct erspanhdr' holding erspan version 1,
breaks it into 'struct erspan_base_hdr' and 'struct erspan_metadata'.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:33:59 -05:00
Willem de Bruijn
35b99dffc3 sock: free skb in skb_complete_tx_timestamp on error
skb_complete_tx_timestamp must ingest the skb it is passed. Call
kfree_skb if the skb cannot be enqueued.

Fixes: b245be1f4d ("net-timestamp: no-payload only sysctl")
Fixes: 9ac25fc063 ("net: fix socket refcounting in skb_complete_tx_timestamp()")
Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:30:36 -05:00
Sven Eckelmann
ff15c27c97 batman-adv: Add kernel-doc to externally visible functions
According to the kernel-doc documentation, externally visible functions
should be documented. This refers to all all non-static function which can
(and will) be used by functions in other sources files.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:24 +01:00
Sven Eckelmann
e57acf8e93 batman-adv: Add kernel-doc to functions in headers
Externally visible functions should be documented with kernel-doc. This
usually refers to non-static functions but also static inline files in
headers are visible in other files and should therefore be documented.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:23 +01:00
Sven Eckelmann
73844a8c78 batman-adv: Add kernel-doc to enums in headers
All enums in types.h are already documented. But some other headers
still have private enums which also should be documented.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:22 +01:00
Sven Eckelmann
c93effcf72 batman-adv: Add kernel-doc to structs in headers
All structs in types.h are already documented. But some other headers
still have private structs which also should be documented.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:22 +01:00
Sven Eckelmann
a07369d7fb batman-adv: Fix kernel-doc references to struct members
The correct syntax to create references in kernel-doc to a struct member is
not "struct_name::member"" but "&struct_name->member" or
"&struct_name.member". The correct syntax is required to get the correct
cross-referencing in the reStructuredText text output.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:21 +01:00
Sven Eckelmann
8b84cc4fb5 batman-adv: Use inline kernel-doc for enum/struct
The inline kernel-doc comments make it easier to keep changes to the
struct/enum synchronized with the documentation of the it. And it makes it
easier for larger structures like struct batadv_priv to read the
documentation inside the code.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:20 +01:00
Sven Eckelmann
7e9a8c2ce7 batman-adv: Use parentheses in function kernel-doc
The documentation describing kernel-doc comments for functions ("How to
format kernel-doc comments") uses parentheses at the end of the function
name. Using this format allows to use a consistent style when adding
documentation to a function and when referencing this function in a
different kernel-doc section.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:20 +01:00
Sven Eckelmann
6a3038f07c batman-adv: Add missing kernel-doc to packet.h
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:19 +01:00
Sven Eckelmann
eface060c2 batman-adv: Remove unused sched.h include
The linux/wait.h include was removed with commit 421d988b2c08 ("batman-adv:
Consolidate logging related functions"). The previously required (but not
unused) linux/sched.h include can also be dropped now.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:18 +01:00
Sven Eckelmann
8cfba951a0 batman-adv: include kobject.h for kobject_* functions
The linux/kobject.h provides the kobject_* function declarations and should
therefore be included directly before they are used.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:18 +01:00
Sven Eckelmann
3a64469e4f batman-adv: Include net.h for net_ratelimited_function
The linux/net.h provides the net_ratelimited_function. It should
therefore be included directly before it is used.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:17 +01:00
Sven Eckelmann
ecc36f5ee6 batman-adv: include build_bug.h for BUILD_BUG_ON define
commit bc6245e5ef ("bug: split BUILD_BUG stuff out into
<linux/build_bug.h>") added a new header for BUILD_BUG_ON. It should
therefore be included instead of linux/bug.h

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:29:15 +01:00
Sven Eckelmann
b92b94ac73 batman-adv: include gfp.h for GFP_* defines
The linux/gfp.h provides the GFP_ATOMIC and GFP_KERNEL define. It should
therefore be included instead of linux/fs.h.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:24:10 +01:00
Sven Eckelmann
9969ffa85f batman-adv: Add license header to Kconfig
The last remaining file without license notice and/or SPDX license
identifier under net/batman-adv/ is the Kconfig. It should have been
licensed under the same conditions as the rest of batman-adv and the
Makefile which uses the CONFIG_* variables from Kconfig.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:24:08 +01:00
Sven Eckelmann
7db7d9f369 batman-adv: Add SPDX license identifier above copyright header
The "Linux kernel licensing rules" require that each file has a SPDX
license identifier as first line (and sometimes as second line).

The FSFE REUSE practices [1] would also require the same tags but have no
restrictions on the placement in the source file. Using the "Linux kernel
licensing rules" is therefore also fulfilling the FSFE REUSE practices
requirements at the same time.

[1] https://reuse.software/practices/

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-12-15 17:22:45 +01:00
David S. Miller
8ce38aeb55 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-12-15

1) Currently we can add or update socket policies, but
   not clear them. Support clearing of socket policies
   too. From Lorenzo Colitti.

2) Add documentation for the xfrm device offload api.
   From Shannon Nelson.

3) Fix IPsec extended sequence numbers (ESN) for
   IPsec offloading. From Yossef Efraim.

4) xfrm_dev_state_add function returns success even for
   unsupported options, fix this to fail in such cases.
   From Yossef Efraim.

5) Remove a redundant xfrm_state assignment.
   From Aviv Heller.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:10:27 -05:00
David S. Miller
0f546ffcd0 Merge tag 'batadv-net-for-davem-20171215' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - Initialize the fragment headers, by Sven Eckelmann

 - Fix a NULL check in BATMAN V, by Sven Eckelmann

 - Fix kernel doc for the time_setup() change, by Sven Eckelmann

 - Use the right lock in BATMAN IV OGM Update, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:02:11 -05:00
Sean Wang
f0af34317f net: dsa: mediatek: combine MediaTek tag with VLAN tag
In order to let MT7530 switch can recognize well those egress packets
having both special tag and VLAN tag, the information about the special
tag should be carried on the existing VLAN tag. On the other hand, it's
unnecessary for extra handling for ingress packets when VLAN tag is
present since it is able to put the VLAN tag after the special tag and
then follow the existing way to parse.

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 10:31:54 -05:00
Song Liu
2d17d8d79e xdp: linearize skb in netif_receive_generic_xdp()
In netif_receive_generic_xdp(), it is necessary to linearize all
nonlinear skb. However, in current implementation, skb with
troom <= 0 are not linearized. This patch fixes this by calling
skb_linearize() for all nonlinear skb.

Fixes: de8f3a83b0 ("bpf: add meta pointer for direct access")
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-15 14:34:36 +01:00
Thiago Rafael Becker
bdcf0a423e kernel: make groups_sort calling a responsibility group_info allocators
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:49 -08:00
Eric Dumazet
4688eb7cf3 tcp: refresh tcp_mstamp from timers callbacks
Only the retransmit timer currently refreshes tcp_mstamp

We should do the same for delayed acks and keepalives.

Even if RFC 7323 does not request it, this is consistent to what linux
did in the past, when TS values were based on jiffies.

Fixes: 385e20706f ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Mike Maloney <maloney@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by:  Mike Maloney <maloney@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:04:04 -05:00
Wei Wang
9ee11bd03c tcp: fix potential underestimation on rcv_rtt
When ms timestamp is used, current logic uses 1us in
tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us.
This could cause rcv_rtt underestimation.
Fix it by always using a min value of 1ms if ms timestamp is used.

Fixes: 645f4c6f2e ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:01:17 -05:00
Yuchung Cheng
7268586baa tcp: pause Fast Open globally after third consecutive timeout
Prior to this patch, active Fast Open is paused on a specific
destination IP address if the previous connections to the
IP address have experienced recurring timeouts . But recent
experiments by Microsoft (https://goo.gl/cykmn7) and Mozilla
browsers indicate the isssue is often caused by broken middle-boxes
sitting close to the client. Therefore it is much better user
experience if Fast Open is disabled out-right globally to avoid
experiencing further timeouts on connections toward other
destinations.

This patch changes the destination-IP disablement to global
disablement if a connection experiencing recurring timeouts
or aborts due to timeout.  Repeated incidents would still
exponentially increase the pause time, starting from an hour.
This is extremely conservative but an unfortunate compromise to
minimize bad experience due to broken middle-boxes.

Reported-by: Dragana Damjanovic <ddamjanovic@mozilla.com>
Reported-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:51:12 -05:00
Willem de Bruijn
8d74e9f88d net: avoid skb_warn_bad_offload on IS_ERR
skb_warn_bad_offload warns when packets enter the GSO stack that
require skb_checksum_help or vice versa. Do not warn on arbitrary
bad packets. Packet sockets can craft many. Syzkaller was able to
demonstrate another one with eth_type games.

In particular, suppress the warning when segmentation returns an
error, which is for reasons other than checksum offload.

See also commit 36c9247449 ("net: WARN if skb_checksum_help() is
called on skb requiring segmentation") for context on this warning.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:14:10 -05:00
Nikolay Aleksandrov
eb7935830d net: bridge: use rhashtable for fdbs
Before this patch the bridge used a fixed 256 element hash table which
was fine for small use cases (in my tests it starts to degrade
above 1000 entries), but it wasn't enough for medium or large
scale deployments. Modern setups have thousands of participants in a
single bridge, even only enabling vlans and adding a few thousand vlan
entries will cause a few thousand fdbs to be automatically inserted per
participating port. So we need to scale the fdb table considerably to
cope with modern workloads, and this patch converts it to use a
rhashtable for its operations thus improving the bridge scalability.
Tests show the following results (10 runs each), at up to 1000 entries
rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
is 2 times faster and at 30000 it is 50 times faster.
Obviously this happens because of the properties of the two constructs
and is expected, rhashtable keeps pretty much a constant time even with
10000000 entries (tested), while the fixed hash table struggles
considerably even above 10000.
As a side effect this also reduces the net_bridge struct size from 3248
bytes to 1344 bytes. Also note that the key struct is 8 bytes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 15:10:01 -05:00
Eric Dumazet
ec94c2696f tcp/dccp: avoid one atomic operation for timewait hashdance
First, rename __inet_twsk_hashdance() to inet_twsk_hashdance()

Then, remove one inet_twsk_put() by setting tw_refcnt to 3 instead
of 4, but adding a fat warning that we do not have the right to access
tw anymore after inet_twsk_hashdance()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 14:33:10 -05:00
Andy Shevchenko
22b371cbb9 Bluetooth: introduce DEFINE_SHOW_ATTRIBUTE() macro
This macro deduplicates a lot of similar code across the hci_debugfs.c
module. Targeting to be moved to seq_file.h eventually.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-12-13 20:20:34 +01:00
David S. Miller
d6da83813f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The follow patchset contains Netfilter fixes for your net tree,
they are:

1) Fix compilation warning in x_tables with clang due to useless
   redundant reassignment, from Colin Ian King.

2) Add bugtrap to net_exit to catch uninitialized lists, patch
   from Vasily Averin.

3) Fix out of bounds memory reads in H323 conntrack helper, this
   comes with an initial patch to remove replace the obscure
   CHECK_BOUND macro as a dependency. From Eric Sesterhenn.

4) Reduce retransmission timeout when window is 0 in TCP conntrack,
   from Florian Westphal.

6) ctnetlink clamp timeout to INT_MAX if timeout is too large,
   otherwise timeout wraps around and it results in killing the
   entry that is being added immediately.

7) Missing CAP_NET_ADMIN checks in cthelper and xt_osf, due to
   no netns support. From Kevin Cernekee.

8) Missing maximum number of instructions checks in xt_bpf, patch
   from Jann Horn.

9) With no CONFIG_PROC_FS ipt_CLUSTERIP compilation breaks,
   patch from Arnd Bergmann.

10) Missing netlink attribute policy in nftables exthdr, from
    Florian Westphal.

11) Enable conntrack with IPv6 MASQUERADE rules, as a357b3f80b
    should have done in first place, from Konstantin Khlebnikov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 14:12:20 -05:00
Neal Cardwell
b4f70c3d4e tcp: allow TLP in ECN CWR
This patch enables tail loss probe in cwnd reduction (CWR) state
to detect potential losses. Prior to this patch, since the sender
uses PRR to determine the cwnd in CWR state, the combination of
CWR+PRR plus tcp_tso_should_defer() could cause unnecessary stalls
upon losses: PRR makes cwnd so gentle that tcp_tso_should_defer()
defers sending wait for more ACKs. The ACKs may not come due to
packet losses.

Disallowing TLP when there is unused cwnd had the primary effect
of disallowing TLP when there is TSO deferral, Nagle deferral,
or we hit the rwin limit. Because basically every application
write() or incoming ACK will cause us to run tcp_write_xmit()
to see if we can send more, and then if we sent something we call
tcp_schedule_loss_probe() to see if we should schedule a TLP. At
that point, there are a few common reasons why some cwnd budget
could still be unused:

(a) rwin limit
(b) nagle check
(c) TSO deferral
(d) TSQ

For (d), after the next packet tx completion the TSQ mechanism
will allow us to send more packets, so we don't really need a
TLP (in practice it shouldn't matter whether we schedule one
or not). But for (a), (b), (c) the sender won't send any more
packets until it gets another ACK. But if the whole flight was
lost, or all the ACKs were lost, then we won't get any more ACKs,
and ideally we should schedule and send a TLP to get more feedback.
In particular for a long time we have wanted some kind of timer for
TSO deferral, and at least this would give us some kind of timer

Reported-by: Steve Ibanez <sibanez@stanford.edu>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Nandita Dukkipati <nanditad@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 13:59:21 -05:00
Cong Wang
039af9c66b net_sched: switch to exit_batch for action pernet ops
Since we now hold RTNL lock in tc_action_net_exit(), it is good to
batch them to speedup tc action dismantle.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 13:58:41 -05:00
Kevin Cernekee
a46182b002 net: igmp: Use correct source address on IGMPv3 reports
Closing a multicast socket after the final IPv4 address is deleted
from an interface can generate a membership report that uses the
source IP from a different interface.  The following test script, run
from an isolated netns, reproduces the issue:

    #!/bin/bash

    ip link add dummy0 type dummy
    ip link add dummy1 type dummy
    ip link set dummy0 up
    ip link set dummy1 up
    ip addr add 10.1.1.1/24 dev dummy0
    ip addr add 192.168.99.99/24 dev dummy1

    tcpdump -U -i dummy0 &
    socat EXEC:"sleep 2" \
        UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 &

    sleep 1
    ip addr del 10.1.1.1/24 dev dummy0
    sleep 5
    kill %tcpdump

RFC 3376 specifies that the report must be sent with a valid IP source
address from the destination subnet, or from address 0.0.0.0.  Add an
extra check to make sure this is the case.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 13:51:27 -05:00
Jon Maloy
c545a945d0 tipc: eliminate potential memory leak
In the function tipc_sk_mcast_rcv() we call refcount_dec(&skb->users)
on received sk_buffers. Since the reference counter might hit zero at
this point, we have a potential memory leak.

We fix this by replacing refcount_dec() with kfree_skb().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 13:44:36 -05:00
Pravin Shedge
83593010d3 net: remove duplicate includes
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 13:18:46 -05:00
Eric Dumazet
b5476022bb ipv4: igmp: guard against silly MTU values
IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.

Fix this by reading device mtu once and checking IPv4 minimal MTU.

This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 13:13:58 -05:00
Eric Dumazet
b9b312a7a4 ipv6: mcast: better catch silly mtu values
syzkaller reported crashes in IPv6 stack [1]

Xin Long found that lo MTU was set to silly values.

IPv6 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in mld code where it is assumed
the mtu is suitable.

Fix this by reading device mtu once and checking IPv6 minimal MTU.

[1]
 skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20
 head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo
 ------------[ cut here ]------------
 kernel BUG at net/core/skbuff.c:104!
 invalid opcode: 0000 [#1] SMP KASAN
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Modules linked in:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100
 RSP: 0018:ffff8801db307508 EFLAGS: 00010286
 RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000
 RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95
 RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020
 R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540
 FS:  0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <IRQ>
  skb_over_panic net/core/skbuff.c:109 [inline]
  skb_put+0x181/0x1c0 net/core/skbuff.c:1694
  add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695
  add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817
  mld_send_cr net/ipv6/mcast.c:1903 [inline]
  mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448
  call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320
  expire_timers kernel/time/timer.c:1357 [inline]
  __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660
  run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d3/0x210 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:540 [inline]
  smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 13:13:15 -05:00
Łukasz Rymanowski
9e1e9f20ca Bluetooth: Add support to advertise when connected
So far, kernel did not allow to advertise when there was a connection
established. With this patch kernel does allow it if controller
supports it.

If controller supports non-connectable advertising when connected, then
only non-connectable advertising instances will be advertised.

Signed-off-by: Łukasz Rymanowski <lukasz.rymanowski@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-12-13 09:41:37 +01:00
Jaganath Kanakkassery
94386b6a5b Bluetooth: Remove redundant disable_advertising()
There is already __hci_req_disable_advertising() function for disabling,
so use it.

Signed-off-by: Jaganath Kanakkassery <jaganathx.kanakkassery@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-12-13 00:28:42 +01:00
Andy Shevchenko
8a95079448 Bluetooth: Utilize %*ph specifier
Instead of open coding byte-by-byte printing, re-use %*ph specifier.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-12-13 00:28:42 +01:00
Markus Elfring
1b259904a2 Bluetooth: Use common error handling code in bt_init()
* Improve jump targets so that a bit of exception handling can be better
  reused at the end of this function.

* Adjust five condition checks.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-12-13 00:28:40 +01:00