linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-24 03:42:52 +00:00

Author	SHA1	Message	Date
Florian Westphal	d7591f0c41	netfilter: x_tables: introduce and use xt_copy_counters_from_user The three variants use same copy&pasted code, condense this into a helper and use that. Make sure info.name is 0-terminated. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:41 +02:00
Florian Westphal	aded9f3e9f	netfilter: x_tables: remove obsolete check Since 'netfilter: x_tables: validate targets of jumps' change we validate that the target aligns exactly with beginning of a rule, so offset test is now redundant. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:41 +02:00
Florian Westphal	95609155d7	netfilter: x_tables: remove obsolete overflow check for compat case too commit `9e67d5a739` ("[NETFILTER]: x_tables: remove obsolete overflow check") left the compat parts alone, but we can kill it there as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:40 +02:00
Florian Westphal	09d9686047	netfilter: x_tables: do compat validation via translate_table This looks like refactoring, but its also a bug fix. Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few sanity tests that are done in the normal path. For example, we do not check for underflows and the base chain policies. While its possible to also add such checks to the compat path, its more copy&pastry, for instance we cannot reuse check_underflow() helper as e->target_offset differs in the compat case. Other problem is that it makes auditing for validation errors harder; two places need to be checked and kept in sync. At a high level 32 bit compat works like this: 1- initial pass over blob: validate match/entry offsets, bounds checking lookup all matches and targets do bookkeeping wrt. size delta of 32/64bit structures assign match/target.u.kernel pointer (points at kernel implementation, needed to access ->compatsize etc.) 2- allocate memory according to the total bookkeeping size to contain the translated ruleset 3- second pass over original blob: for each entry, copy the 32bit representation to the newly allocated memory. This also does any special match translations (e.g. adjust 32bit to 64bit longs, etc). 4- check if ruleset is free of loops (chase all jumps) 5-first pass over translated blob: call the checkentry function of all matches and targets. The alternative implemented by this patch is to drop steps 3&4 from the compat process, the translation is changed into an intermediate step rather than a full 1:1 translate_table replacement. In the 2nd pass (step #3), change the 64bit ruleset back to a kernel representation, i.e. put() the kernel pointer and restore ->u.user.name . This gets us a 64bit ruleset that is in the format generated by a 64bit iptables userspace -- we can then use translate_table() to get the 'native' sanity checks. This has two drawbacks: 1. we re-validate all the match and target entry structure sizes even though compat translation is supposed to never generate bogus offsets. 2. we put and then re-lookup each match and target. THe upside is that we get all sanity tests and ruleset validations provided by the normal path and can remove some duplicated compat code. iptables-restore time of autogenerated ruleset with 300k chains of form -A CHAIN0001 -m limit --limit 1/s -j CHAIN0002 -A CHAIN0002 -m limit --limit 1/s -j CHAIN0003 shows no noticeable differences in restore times: old: 0m30.796s new: 0m31.521s 64bit: 0m25.674s Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:40 +02:00
Florian Westphal	0188346f21	netfilter: x_tables: xt_compat_match_from_user doesn't need a retval Always returned 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:40 +02:00
Florian Westphal	8dddd32756	netfilter: arp_tables: simplify translate_compat_table args Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:39 +02:00
Florian Westphal	329a080712	netfilter: ip6_tables: simplify translate_compat_table args Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:39 +02:00
Florian Westphal	7d3f843eed	netfilter: ip_tables: simplify translate_compat_table args Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:38 +02:00
Florian Westphal	13631bfc60	netfilter: x_tables: validate all offsets and sizes in a rule Validate that all matches (if any) add up to the beginning of the target and that each match covers at least the base structure size. The compat path should be able to safely re-use the function as the structures only differ in alignment; added a BUILD_BUG_ON just in case we have an arch that adds padding as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:38 +02:00
Florian Westphal	ce683e5f9d	netfilter: x_tables: check for bogus target offset We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:37 +02:00
Florian Westphal	7ed2abddd2	netfilter: x_tables: check standard target size too We have targets and standard targets -- the latter carries a verdict. The ip/ip6tables validation functions will access t->verdict for the standard targets to fetch the jump offset or verdict for chainloop detection, but this happens before the targets get checked/validated. Thus we also need to check for verdict presence here, else t->verdict can point right after a blob. Spotted with UBSAN while testing malformed blobs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:37 +02:00
Florian Westphal	fc1221b3a1	netfilter: x_tables: add compat version of xt_check_entry_offsets 32bit rulesets have different layout and alignment requirements, so once more integrity checks get added to xt_check_entry_offsets it will reject well-formed 32bit rulesets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:36 +02:00
Florian Westphal	a08e4e190b	netfilter: x_tables: assert minimum target size The target size includes the size of the xt_entry_target struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:36 +02:00
Florian Westphal	aa412ba225	netfilter: x_tables: kill check_entry helper Once we add more sanity testing to xt_check_entry_offsets it becomes relvant if we're expecting a 32bit 'config_compat' blob or a normal one. Since we already have a lot of similar-named functions (check_entry, compat_check_entry, find_and_check_entry, etc.) and the current incarnation is short just fold its contents into the callers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:36 +02:00
Florian Westphal	7d35812c32	netfilter: x_tables: add and use xt_check_entry_offsets Currently arp/ip and ip6tables each implement a short helper to check that the target offset is large enough to hold one xt_entry_target struct and that t->u.target_size fits within the current rule. Unfortunately these checks are not sufficient. To avoid adding new tests to all of ip/ip6/arptables move the current checks into a helper, then extend this helper in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:35 +02:00
Florian Westphal	3647234101	netfilter: x_tables: validate targets of jumps When we see a jump also check that the offset gets us to beginning of a rule (an ipt_entry). The extra overhead is negible, even with absurd cases. 300k custom rules, 300k jumps to 'next' user chain: [ plus one jump from INPUT to first userchain ]: Before: real 0m24.874s user 0m7.532s sys 0m16.076s After: real 0m27.464s user 0m7.436s sys 0m18.840s Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:35 +02:00
Florian Westphal	f24e230d25	netfilter: x_tables: don't move to non-existent next rule Ben Hawkes says: In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it is possible for a user-supplied ipt_entry structure to have a large next_offset field. This field is not bounds checked prior to writing a counter value at the supplied offset. Base chains enforce absolute verdict. User defined chains are supposed to end with an unconditional return, xtables userspace adds them automatically. But if such return is missing we will move to non-existent next rule. Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:34 +02:00
Andrew Lunn	74c3e2a54b	dsa: Rename phys_port_mask to enabled_port_mask The phys in phys_port_mask suggests this mask is about PHYs. In fact, it means physical ports. Rename to enabled_port_mask, indicating external enabled ports of the switch, which is hopefully less confusing. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 18:15:23 -04:00
Andrew Lunn	5feebd0a8a	net: dsa: Remove allocation of driver private memory The drivers now allocate their own memory for private usage. Remove the allocation from the core code. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 18:15:23 -04:00
Andrew Lunn	7543a6d535	net: dsa: Have the switch driver allocate there own private memory Now the switch devices have a dev pointer, make use of it for allocating the drivers private data structures using a devm_kzalloc(). Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 18:15:23 -04:00
Andrew Lunn	bbb8d79399	net: dsa: Pass the dsa device to the switch drivers By passing a device structure to the switch devices, it allows them to use devm_* methods for resource management. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 18:15:22 -04:00
David S. Miller	71bbe25d01	To synchronize with Kalle, here's just a big change that affects all drivers - removing the duplicated enum ieee80211_band and replacing it by enum nl80211_band. On top of that, just a small documentation update. -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJXDp6hAAoJEGt7eEactAAd/xUQAJtKNwp9CLsx+QFx6lMoXX4x r0XA8DFgLp1BflS9P05/g1m0NiQxm3YuRtpze/FdPglb6AAVjLcqksf+vTkU+Lng p7rIkb/fQv5s5aoYPxNrD5zgwALVv9y5fI7rV7scj355iesCC0PmAP34own2Dihi eBVSammsh5ZNTQKLBk8vXECb0UKWsDBMgp4uQc35Bpw8XSx5Nrtl5JI/hMcckte0 a/FQyQKjmjl3O/nRLn3kzGPv1OnRiJOMb5fMWB+Xm2cLtmKPHIErgVk2l/CMaiYj sRJR8KaZQpQsyWiQU59UNpywlejy7Z1RsSWmuPhm0xTGzIF1wVIgHJSsRI/gNGD2 8Ey1P+RXkM8NVxrQr/0fis9XWyWfE8ne4tFsPiPOD3VmBiStIB9fAukJHLrvTmKU JrkXCePUkfNY/PqJqlP/RONBcysI253/snVF49oZ7LMBZiGDPhdRcEEcCaS0tmMM Qa+a78XvaH5xaKuMIDZ4qMdnMMcdv4g8G1DQeA1mb0EIGL1Gtu9BJsu9q8PqmjQU 1ZAf4MlWJWdYk+CtTNT4slSIQVKAN78s6j1HSB/bNcpWk9y93wBhJW0FdP7FtJ1I pjJGIVcLU98FKdqi2jqPEezbDXXzOz0gNQDbqfJyM9/R7ijnJcaPllviaWjEg/O7 8jMBOg87Hn7kq7JJGpKA =2xfe -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2016-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== To synchronize with Kalle, here's just a big change that affects all drivers - removing the duplicated enum ieee80211_band and replacing it by enum nl80211_band. On top of that, just a small documentation update. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 17:58:51 -04:00
Jon Paul Maloy	7d45a04cbc	tipc: remove remnants of old broadcast code We remove a couple of leftover fields in struct tipc_bearer. Those were used by the old broadcast implementation, and are not needed any longer. There is no functional changes in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 17:49:11 -04:00
Alexander Aring	edc73417d8	6lowpan: move mac802154 header In case of link-layer specific handling for 802.15.4 we need to cast to 802.15.4 sepcific structures. Simple add this header when include the 6lowpan header. Signed-off-by: Alexander Aring <aar@pengutronix.de> Reviewed-by: Stefan Schmidt<stefan@osg.samsung.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:10 +02:00
Alexander Aring	2732363181	6lowpan: add lowpan_is_ll function This patch adds the lowpan_is_ll function, which can be used to make a special 6lowpan linklayer handling for a specific 6lowpan linklayer type. Signed-off-by: Alexander Aring <aar@pengutronix.de> Reviewed-by: Stefan Schmidt<stefan@osg.samsung.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:10 +02:00
Alexander Aring	a5862f2aba	6lowpan: move eui64 uncompress function This function will be use in later functionality in other branches than generic 6lowpan, so we move it to the global 6lowpan header. Signed-off-by: Alexander Aring <aar@pengutronix.de> Reviewed-by: Stefan Schmidt<stefan@osg.samsung.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:10 +02:00
Alexander Aring	2bc068c3d6	6lowpan: iphc: remove unnecessary zero data This patch removes unnecessary zero data for a stack variable. Signed-off-by: Alexander Aring <aar@pengutronix.de> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:09 +02:00
Alexander Aring	7115a968b7	6lowpan: iphc: rename add lowpan prefix This patch adds a lowpan prefix to each functions which doesn't have such prefix currently. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:09 +02:00
Alexander Aring	353c224e28	6lowpan: move lowpan_802154_dev to 6lowpan This patch moves the 802.15.4 link layer specific structures to generic 6lowpan. This is necessary for special 802.15.4 6lowpan handling in 6lowpan generic layer. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:09 +02:00
Alexander Aring	2e4d60cbcf	6lowpan: change naming for lowpan private data This patch changes the naming for interface private data for lowpan intefaces. The current private data scheme is: ------------------------------------------------- \| 6LoWPAN Generic \| LinkLayer 6LoWPAN \| ------------------------------------------------- the current naming schemes are: - 6LoWPAN Generic: - lowpan_priv - LinkLayer 6LoWPAN: - BTLE - lowpan_dev - 802.15.4: - lowpan_dev_info the new naming scheme with this patch will be: - 6LoWPAN Generic: - lowpan_dev - LinkLayer 6LoWPAN: - BTLE - lowpan_btle_dev - 802.15.4: - lowpan_802154_dev Signed-off-by: Alexander Aring <aar@pengutronix.de> Reviewed-by: Stefan Schmidt<stefan@osg.samsung.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:09 +02:00
Alexander Aring	5a7f97e570	ieee802154: 6lowpan: fix short addr hash The short address is unique in combination with the panid. This patch will add the panid for generating an ieee802154 address hash. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:09 +02:00
Alexander Aring	9e3b71f343	nl802154: avoid address change while running lowpan The generation of autoconfigured IPv6 link-local addresses starts with a notification on interface up. To handle autoconfiguration according to RFC 4944 requires pan id and short address to generate an autoconfigured link-local address. This patch will avoid changing of these link-layer address configuration while lowpan interface is up. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-13 10:41:09 +02:00
David S. Miller	da0caadf0a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains the first batch of Netfilter updates for your net-next tree. 1) Define pr_fmt() in nf_conntrack, from Weongyo Jeong. 2) Define and register netfilter's afinfo for the bridge family, this comes in preparation for native nfqueue's bridge for nft, from Stephane Bryant. 3) Add new attributes to store layer 2 and VLAN headers to nfqueue, also from Stephane Bryant. 4) Parse new NFQA_VLAN and NFQA_L2HDR nfqueue netlink attributes coming from userspace, from Stephane Bryant. 5) Use net->ipv6.devconf_all->hop_limit instead of hardcoded hop_limit in IPv6 SYNPROXY, from Liping Zhang. 6) Remove unnecessary check for dst == NULL in nf_reject_ipv6, from Haishuang Yan. 7) Deinline ctnetlink event report functions, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-12 22:34:56 -04:00
Phil Sutter	bcf4934288	netfilter: ebtables: Fix extension lookup with identical name If a requested extension exists as module and is not loaded, ebt_check_match() might accidentally use an NFPROTO_UNSPEC one with same name and fail. Reproduced with limit match: Given xt_limit and ebt_limit both built as module, the following would fail: modprobe xt_limit ebtables -I INPUT --limit 1/s -j ACCEPT The fix is to make ebt_check_match() distrust a found NFPROTO_UNSPEC extension and retry after requesting an appropriate module. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-13 01:16:57 +02:00
Florian Westphal	ecdfb48cdd	netfilter: conntrack: move expectation event helper to ecache.c Not performance critical, it is only invoked when an expectation is added/destroyed. While at it, kill unused nf_ct_expect_event() wrapper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-12 23:01:57 +02:00
Florian Westphal	3c435e2e41	netfilter: conntrack: de-inline nf_conntrack_eventmask_report Way too large; move it to nf_conntrack_ecache.c. Reduces total object size by 1216 byte on my machine. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-12 23:01:52 +02:00
David S. Miller	69fb78121b	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-04-12 Here's a set of Bluetooth & 802.15.4 patches intended for the 4.7 kernel: - Fix for race condition in vhci driver - Memory leak fix for ieee802154/adf7242 driver - Improvements to deal with single-mode (LE-only) Bluetooth controllers - Fix for allowing the BT_SECURITY_FIPS security level - New BCM2E71 ACPI ID - NULL pointer dereference fix fox hci_ldisc driver Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-12 11:57:53 -04:00
Johannes Berg	57fbcce37b	cfg80211: remove enum ieee80211_band This enum is already perfectly aliased to enum nl80211_band, and the only reason for it is that we get IEEE80211_NUM_BANDS out of it. There's no really good reason to not declare the number of bands in nl80211 though, so do that and remove the cfg80211 one. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-12 15:56:15 +02:00
Dmitry Ivanov	8f815cdde3	nl80211: check netlink protocol in socket release notification A non-privileged user can create a netlink socket with the same port_id as used by an existing open nl80211 netlink socket (e.g. as used by a hostapd process) with a different protocol number. Closing this socket will then lead to the notification going to nl80211's socket release notification handler, and possibly cause an action such as removing a virtual interface. Fix this issue by checking that the netlink protocol is NETLINK_GENERIC. Since generic netlink has no notifier chain of its own, we can't fix the problem more generically. Fixes: `026331c4d9` ("cfg80211/mac80211: allow registering for and sending action frames") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Ivanov <dima@ubnt.com> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-12 15:39:06 +02:00
David Ahern	4f7f34eaab	net: vrf: Fix dev refcnt leak due to IPv6 prefix route ifupdown2 found a kernel bug with IPv6 routes and movement from the main table to the VRF table. Sequence of events: Create the interface and add addresses: ip link add dev eth4.105 link eth4 type vlan id 105 ip addr add dev eth4.105 8.105.105.10/24 ip -6 addr add dev eth4.105 2008:105:105::10/64 At this point IPv6 has inserted a prefix route in the main table even though the interface is 'down'. From there the VRF device is created: ip link add dev vrf105 type vrf table 105 ip addr add dev vrf105 9.9.105.10/32 ip -6 addr add dev vrf105 2000:9:105::10/128 ip link set vrf105 up Then the interface is enslaved, while still in the 'down' state: ip link set dev eth4.105 master vrf105 Since the device is down the VRF driver cycling the device does not send the NETDEV_UP and NETDEV_DOWN but rather the NETDEV_CHANGE event which does not flush the routes inserted prior. When the link is brought up ip link set dev eth4.105 up the prefix route is added in the VRF table, but does not remove the route from the main table. Fix by handling the NETDEV_CHANGEUPPER event similar what was implemented for IPv4 in `7f49e7a38b` ("net: Flush local routes when device changes vrf association") Fixes: `35402e3136` ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:56:20 -04:00
David Ahern	9ab179d83b	net: vrf: Fix dst reference counting Vivek reported a kernel exception deleting a VRF with an active connection through it. The root cause is that the socket has a cached reference to a dst that is destroyed. Converting the dst_destroy to dst_release and letting proper reference counting kick in does not work as the dst has a reference to the device which needs to be released as well. I talked to Hannes about this at netdev and he pointed out the ipv4 and ipv6 dst handling has dst_ifdown for just this scenario. Rather than continuing with the reinvented dst wheel in VRF just remove it and leverage the ipv4 and ipv6 versions. Fixes: `193125dbd8` ("net: Introduce VRF device driver") Fixes: `35402e3136` ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:56:20 -04:00
David Howells	e0e4d82f3b	rxrpc: Create a null security type and get rid of conditional calls Create a null security type for security index 0 and get rid of all conditional calls to the security operations. We expect normally to be using security, so this should be of little negative impact. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:34:41 -04:00
David Howells	648af7fca1	rxrpc: Absorb the rxkad security module Absorb the rxkad security module into the af_rxrpc module so that there's only one module file. This avoids a circular dependency whereby rxkad pins af_rxrpc and cached connections pin rxkad but can't be manually evicted (they will expire eventually and cease pinning). With this change, af_rxrpc can just be unloaded, despite having cached connections. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:34:41 -04:00
David Howells	6dd050f88d	rxrpc: Don't assume transport address family and size when using it Don't assume transport address family and size when using the peer address to send a packet. Instead, use the start of the transport address rather than any particular element of the union and use the transport address length noted inside the sockaddr_rxrpc struct. This will be necessary when IPv6 support is introduced. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:34:41 -04:00
David Howells	843099cac0	rxrpc: Don't pass gfp around in incoming call handling functions Don't pass gfp around in incoming call handling functions, but rather hard code it at the points where we actually need it since the value comes from within the rxrpc driver and is always the same. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:34:41 -04:00
David Howells	dc44b3a09a	rxrpc: Differentiate local and remote abort codes in structs In the rxrpc_connection and rxrpc_call structs, there's one field to hold the abort code, no matter whether that value was generated locally to be sent or was received from the peer via an abort packet. Split the abort code fields in two for cleanliness sake and add an error field to hold the Linux error number to the rxrpc_call struct too (sometimes this is generated in a context where we can't return it to userspace directly). Furthermore, add a skb mark to indicate a packet that caused a local abort to be generated so that recvmsg() can pick up the correct abort code. A future addition will need to be to indicate to userspace the difference between aborts via a control message. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:34:40 -04:00
David Howells	5b3e87f19e	rxrpc: Static arrays of strings should be const char const[] Static arrays of strings should be const char const[]. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:34:40 -04:00
David Howells	8e688d9c16	rxrpc: Move some miscellaneous bits out into their own file Move some miscellaneous bits out into their own file to make it easier to split the call handling. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:34:40 -04:00
David Howells	8f7e6e75d3	rxrpc: Disable a debugging statement that has been left enabled. Disable a debugging statement that has been left enabled Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:34:40 -04:00
Willem de Bruijn	4d0fc73ebe	rxrpc: do not pull udp headers on receive Commit `e6afc8ace6` modified the udp receive path by pulling the udp header before queuing an skbuff onto the receive queue. Rxrpc also calls skb_recv_datagram to dequeue an skb from a udp socket. Modify this receive path to also no longer expect udp headers. Fixes: `e6afc8ace6` ("udp: remove headers from UDP packets before queueing") Signed-off-by: Willem de Bruijn <willemb@google.com> Tested-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:31:33 -04:00
Willem de Bruijn	1da8c681d5	sunrpc: do not pull udp headers on receive Commit `e6afc8ace6` modified the udp receive path by pulling the udp header before queuing an skbuff onto the receive queue. Sunrpc also calls skb_recv_datagram to dequeue an skb from a udp socket. Modify this receive path to also no longer expect udp headers. Fixes: `e6afc8ace6` ("udp: remove headers from UDP packets before queueing") Reported-by: Franklin S Cooper Jr. <fcooper@ti.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Tested-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:31:33 -04:00
Erik Hugne	ddb1d33969	tipc: purge deferred updates from dead nodes If a peer node becomes unavailable, in addition to removing the nametable entries from this node we also need to purge all deferred updates associated with this node. Signed-off-by: Erik Hugne <erik.hugne@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:22:20 -04:00
Erik Hugne	541726abe7	tipc: make dist queue pernet Nametable updates received from the network that cannot be applied immediately are placed on a defer queue. This queue is global to the TIPC module, which might cause problems when using TIPC in containers. To prevent nametable updates from escaping into the wrong namespace, we make the queue pernet instead. Signed-off-by: Erik Hugne <erik.hugne@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:22:20 -04:00
David Ahern	a6db4494d2	net: ipv4: Consider failed nexthops in multipath routes Multipath route lookups should consider knowledge about next hops and not select a hop that is known to be failed. Example: [h2] [h3] 15.0.0.5 \| \| 3\| 3\| [SP1] [SP2]--+ 1 2 1 2 \| \| /-------------+ \| \| \ / \| \| X \| \| / \ \| \| / \---------------\ \| 1 2 1 2 12.0.0.2 [TOR1] 3-----------------3 [TOR2] 12.0.0.3 4 4 \ / \ / \ / -------\| \|-----/ 1 2 [TOR3] 3\| \| [h1] 12.0.0.1 host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5: root@h1:~# ip ro ls ... 12.0.0.0/24 dev swp1 proto kernel scope link src 12.0.0.1 15.0.0.0/16 nexthop via 12.0.0.2 dev swp1 weight 1 nexthop via 12.0.0.3 dev swp1 weight 1 ... If the link between tor3 and tor1 is down and the link between tor1 and tor2 then tor1 is effectively cut-off from h1. Yet the route lookups in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and ssh 15.0.0.5 gets the other. Connections that attempt to use the 12.0.0.2 nexthop fail since that neighbor is not reachable: root@h1:~# ip neigh show ... 12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE 12.0.0.2 dev swp1 FAILED ... The failed path can be avoided by considering known neighbor information when selecting next hops. If the neighbor lookup fails we have no knowledge about the nexthop, so give it a shot. If there is an entry then only select the nexthop if the state is sane. This is similar to what fib_detect_death does. To maintain backward compatibility use of the neighbor information is based on a new sysctl, fib_multipath_use_neigh. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-11 15:16:13 -04:00
Dmitry Ivanov	e272602039	netlink: don't send NETLINK_URELEASE for unbound sockets All existing users of NETLINK_URELEASE use it to clean up resources that were previously allocated to a socket via some command. As a result, no users require getting this notification for unbound sockets. Sending it for unbound sockets, however, is a problem because any user (including unprivileged users) can create a socket that uses the same ID as an existing socket. Binding this new socket will fail, but if the NETLINK_URELEASE notification is generated for such sockets, the users thereof will be tricked into thinking the socket that they allocated the resources for is closed. In the nl80211 case, this will cause destruction of virtual interfaces that still belong to an existing hostapd process; this is the case that Dmitry noticed. In the NFC case, it will cause a poll abort. In the case of netlink log/queue it will cause them to stop reporting events, as if NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called. Fix this problem by checking that the socket is bound before generating the NETLINK_URELEASE notification. Cc: stable@vger.kernel.org Signed-off-by: Dmitry Ivanov <dima@ubnt.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-10 23:32:23 -04:00
David S. Miller	a36a0d4008	decnet: Do not build routes to devices without decnet private data. In particular, make sure we check for decnet private presence for loopback devices. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-10 23:01:30 -04:00
Marcelo Ricardo Leitner	ba6f5e33bd	sctp: avoid refreshing heartbeat timer too often Currently on high rate SCTP streams the heartbeat timer refresh can consume quite a lot of resources as timer updates are costly and it contains a random factor, which a) is also costly and b) invalidates mod_timer() optimization for not editing a timer to the same value. It may even cause the timer to be slightly advanced, for no good reason. As suggested by David Laight this patch now removes this timer update from hot path by leaving the timer on and re-evaluating upon its expiration if the heartbeat is still needed or not, similarly to what is done for TCP. If it's not needed anymore the timer is re-scheduled to the new timeout, considering the time already elapsed. For this, we now record the last tx timestamp per transport, updated in the same spots as hb timer was restarted on tx. Also split up sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the heartbeat one. On loopback with MTU of 65535 and data chunks with 1636, so that we have a considerable amount of chunks without stressing system calls, netperf -t SCTP_STREAM -l 30, perf looked like this before: Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000 Overhead Command Shared Object Symbol + 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore - _raw_write_unlock_irqrestore - 96,54% _raw_spin_unlock_irqrestore - 36,14% mod_timer + 97,24% sctp_transport_reset_timers + 2,76% sctp_do_sm + 33,65% __wake_up_sync_key + 28,77% sctp_ulpq_tail_event + 1,40% del_timer - 1,84% mod_timer + 99,03% sctp_transport_reset_timers + 0,97% sctp_do_sm + 1,50% sctp_ulpq_tail_event And after this patch, now with netperf -l 60: Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000 Overhead Command Shared Object Symbol + 5,65% netperf [kernel.vmlinux] [k] memcpy_erms + 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore - _raw_spin_unlock_irqrestore + 49,89% __wake_up_sync_key + 45,68% sctp_ulpq_tail_event - 2,85% mod_timer + 76,51% sctp_transport_reset_t3_rtx + 23,49% sctp_do_sm + 1,55% del_timer + 2,50% netperf [sctp] [k] sctp_datamsg_from_user + 2,26% netperf [sctp] [k] sctp_sendmsg Throughput-wise, from 6800mbps without the patch to 7050mbps with it, ~3.7%. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-10 22:22:34 -04:00
David S. Miller	ae95d71261	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-04-09 17:41:41 -04:00
Eric Dumazet	03c5b53418	ipv6: fix inet6_lookup_listener() A stupid refactoring bug in inet6_lookup_listener() needs to be fixed in order to get proper SO_REUSEPORT behavior. Fixes: `3b24d854cb` ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-09 16:53:52 -04:00
Linus Torvalds	9ef11ceb0d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP, from Haishuang Yan. 2) Fix multicast frame handling in mac80211 AP code, from Felix Fietkau. 3) mac80211 station hashtable insert errors not handled properly, fix from Johannes Berg. 4) Fix TX descriptor count limit handling in e1000, from Alexander Duyck. 5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas. 6) Must assign rtnl_link_ops of the device before registering it, fix in ip6_tunnel from Thadeu Lima de Souza Cascardo. 7) Memory leak fix in tc action net exit, from WANG Cong. 8) Add missing AF_KCM entries to name tables, from Dexuan Cui. 9) Fix regression in GRE handling of csums wrt. FOU, from Alexander Duyck. 10) Fix memory allocation alignment and congestion map corruption in RDS, from Shamir Rabinovitch. 11) Fix default qdisc regression in tuntap driver, from Jason Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) bridge, netem: mark mailing lists as moderated tuntap: restore default qdisc mpls: find_outdev: check for err ptr in addition to NULL check ipv6: Count in extension headers in skb->network_header RDS: fix congestion map corruption for PAGE_SIZE > 4k RDS: memory allocated must be align to 8 GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU net: add the AF_KCM entries to family name tables MAINTAINERS: intel-wired-lan list is moderated lib/test_bpf: Add additional BPF_ADD tests lib/test_bpf: Add test to check for result of 32-bit add that overflows lib/test_bpf: Add tests for unsigned BPF_JGT lib/test_bpf: Fix JMP_JSET tests VSOCK: Detach QP check should filter out non matching QPs. stmmac: fix adjust link call in case of a switch is attached af_packet: tone down the Tx-ring unsupported spew. net_sched: fix a memory leak in tc action samples/bpf: Enable powerpc support samples/bpf: Use llc in PATH, rather than a hardcoded value samples/bpf: Fix build breakage with map_perf_test_user.c ...	2016-04-09 10:50:44 -07:00
Vivien Didelot	4d5770b397	net: dsa: make the VLAN add function return void The switchdev design implies that a software error should not happen in the commit phase since it must have been previously reported in the prepare phase. If an hardware error occurs during the commit phase, there is nothing switchdev can do about it. The DSA layer separates port_vlan_prepare and port_vlan_add for simplicity and convenience. If an hardware error occurs during the commit phase, there is no need to report it outside the driver itself. Make the DSA port_vlan_add routine return void for explicitness. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:50:41 -04:00
Vivien Didelot	8497aa618d	net: dsa: make the FDB add function return void The switchdev design implies that a software error should not happen in the commit phase since it must have been previously reported in the prepare phase. If an hardware error occurs during the commit phase, there is nothing switchdev can do about it. The DSA layer separates port_fdb_prepare and port_fdb_add for simplicity and convenience. If an hardware error occurs during the commit phase, there is no need to report it outside the DSA driver itself. Make the DSA port_fdb_add routine return void for explicitness. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:50:40 -04:00
Vivien Didelot	43c44a9f65	net: dsa: make the STP state function return void The DSA layer doesn't care about the return code of the port_stp_update routine, so make it void in the layer and the DSA drivers. Replace the useless dsa_slave_stp_update function with a dsa_slave_stp_state function used to reply to the switchdev SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute. In the meantime, rename port_stp_update to port_stp_state_set to explicit the state change. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:50:40 -04:00
David S. Miller	1089ac6977	For the 4.6 cycle, we have a number of changes: * Bob's mesh mode rhashtable conversion, this includes the rhashtable API change for allocation flags * BSSID scan, connect() command reassoc support (Jouni) * fast (optimised data only) and support for RSS in mac80211 (myself) * various smaller changes -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJXBQ4GAAoJEGt7eEactAAdWiMP/ibaP3I79NDc0s7wCDA+KRkm hx0Qx4a0wwm7lDFlnGBjY6yKr+XFDliCvdGX7XGpLSsTioNg7eXPpwx5FQoj6RiV 8+5RKE9fTguN9ofUzqAwHd9sVOaxvdlXbKfb/N93Gzjpw/meYk58wXdF7Almkroa ukgJeMzIlIh+6D96zFEA+Ofzp5chwh+x2Dn0wXutEe9P9fOERA859veAvx65b+Ql IRGTqyuY5B/wcbkr4o+DWQwgrdt7Vop9nYVPNWtMHm2JTzfuCSaQ2cD9TnVAK/bg /vtqC46KKNLyBRGexAPqdftY9PWcfipgE+n7k+Et4iGSmNm7Z3dEyewgXmqli7XJ X8Uiaq+N6Fpe06DVSU7aSRt8NLV64A44jXSfKRI9U2POUqKMn/PMdm8bhPW8qCdM ra6myWpQGHWK9e0TQQdShq0NQKGxCZAiSRiiIrbbvXl1CwXxkPCG39wAC3Sh1tEN ou4lGraeywGnTjaq+mwLEtHLoug8Y2x+Fz+Ze4Cu2enXxna9lp4lr+rFlc+2+0Er o9oPxkTk8krZGIj9M6PNc5W+InMwchaFX3076n67hnFHzFRlOQzkfffbPYlhKJDQ f8c9JiNZIoX/fD1TAKsrdO1+EKm/xo7w7pLgbMwQal8Jr88SkITDg0i3oXc56vNQ ZK2gUzwvrD/jh0AUyDfN =sj7y -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the 4.7 cycle, we have a number of changes: * Bob's mesh mode rhashtable conversion, this includes the rhashtable API change for allocation flags * BSSID scan, connect() command reassoc support (Jouni) * fast (optimised data only) and support for RSS in mac80211 (myself) * various smaller changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:42:31 -04:00
David S. Miller	30d237a6c2	For the current RC series, we have the following fixes: * TDLS fixes from Arik and Ilan * rhashtable fixes from Ben and myself * documentation fixes from Luis * U-APSD fixes from Emmanuel * a TXQ fix from Felix * and a compiler warning suppression from Jeff -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJXBQyAAAoJEGt7eEactAAdymsP/i2zU+VQFpkB8RG+nn/AYogY x2RXgKjCWOs6FwUcu+VHMx0whKuMADjqbMABdlsGGK62Xa5aYYcObvY+CgCUAI+m unV7kYDIBUHudiTpxXgZYUvylhIvW37VYjc6BDoaq4Jc1rz/L69zSrNHmoNiQv+Y 113T0Ft5EEmEO1LP4s2GLMZTPqwgi2FnaP6UYFdTr2/ZfaRHlj2xRRG62WiT/q1x DMT2KWHvETCftpK3GwwkMSr0Au8CVV1soiQOoioTPRaevYbBFVi60GVXQeDlvFEV PqVCOEfsSvsw84phfHrW1bOxBeVsNYHbY/T4eVlC0zssUzz6KNH5jAfpyla1p0lP WniSqAaWxMcUYWCEBiOLa5LV2XVpXOuTpI84xcgc/BmprgzNyLgDAiCDtehpxALf Qmhc/rPR5BbLhTNY8z5qANG6mhQCHCo+52ypvBLMhZoajkPjgyBabwoqaRGje2ub vgzbAfqEguJmCAszw04KZ2UHInBcCDAZ5aOKiinawWvpftkDN0IJmO/HW5vnh4xV kawpo1eh1JzDcEyYfSjySHHFmR/5qnaDzaPL2cJthcOY/fGywibJHCQmoDPyH5jz Bkk0F0rMEcdQWs9pJLIMMzkA7BAlxYLYip0J9QImHL77sWK0QwUDCoryrgD6lL1D v2V31g1TZwPF2Noe9Rk7 =r3Hr -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== For the current RC series, we have the following fixes: * TDLS fixes from Arik and Ilan * rhashtable fixes from Ben and myself * documentation fixes from Luis * U-APSD fixes from Emmanuel * a TXQ fix from Felix * and a compiler warning suppression from Jeff ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:41:28 -04:00
Jiri Pirko	1fc2257e83	devlink: share user_ptr pointer for both devlink and devlink_port Ptr to devlink structure can be easily obtained from devlink_port->devlink. So share user_ptr[0] pointer for both and leave user_ptr[1] free for other users. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:40:08 -04:00
Jiri Pirko	a9844881ba	devlink: remove implicit type set in port register As we rely on caller zeroing or correctly set the struct before the call, this implicit type set is either no-op (DEVLINK_PORT_TYPE_NOTSET is 0) or it rewrites wanted value. So remove this. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Alexander Aring	feb2add323	6lowpan: iphc: fix handling of link-local compression This patch fixes handling in case of link-local address compression. A IPv6 link-local address is defined as fe80::/10 prefix which is also what ipv6_addr_type checks for link-local addresses. But IPHC compression for link-local addresses are for fe80::/64 types only. This patch adds additional checks for zero padded bits in case of link-local address compression to match on a fe80::/64 address only. Signed-off-by: Alexander Aring <aar@pengutronix.de> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-08 19:28:13 +02:00
Patrik Flykt	a164cee111	Bluetooth: Allow setting BT_SECURITY_FIPS with setsockopt Update the security level check to allow setting BT_SECURITY_FIPS for an L2CAP socket. Signed-off-by: Patrik Flykt <patrik.flykt@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-08 19:10:57 +02:00
Johan Hedberg	56b40fbf61	Bluetooth: Ignore unknown advertising packet types In case of buggy controllers send advertising packet types that we don't know of we should simply ignore them instead of trying to react to them in some (potentially wrong) way. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-08 18:51:44 +02:00
Johan Hedberg	f18ba58f53	Bluetooth: Fix setting NO_BREDR advertising flag If we're dealing with a single-mode controller or BR/EDR is disable for a dual-mode one, the NO_BREDR flag needs to be unconditionally present in the advertising data. This patch moves it out from behind an extra condition to be always set in the create_instance_adv_data() function if BR/EDR is disabled. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2016-04-08 18:50:40 +02:00
Roopa Prabhu	94a57f1f8a	mpls: find_outdev: check for err ptr in addition to NULL check find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to find the output device. In case of an error, inet{,6}_fib_lookup_dev() returns error pointer and dev_get_by_index() returns NULL. But the function only checks for NULL and thus can end up calling dev_put on an ERR_PTR. This patch adds an additional check for err ptr after the NULL check. Before: Trying to add an mpls route with no oif from user, no available path to 10.1.1.8 and no default route: $ip -f mpls route add 100 as 200 via inet 10.1.1.8 [ 822.337195] BUG: unable to handle kernel NULL pointer dereference at 00000000000003a3 [ 822.340033] IP: [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] PGD 1db38067 PUD 1de9e067 PMD 0 [ 822.340033] Oops: 0000 [#1] SMP [ 822.340033] Modules linked in: [ 822.340033] CPU: 0 PID: 11148 Comm: ip Not tainted 4.5.0-rc7+ #54 [ 822.340033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 822.340033] task: ffff88001db82580 ti: ffff88001dad4000 task.ti: ffff88001dad4000 [ 822.340033] RIP: 0010:[<ffffffff8148781e>] [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] RSP: 0018:ffff88001dad7a88 EFLAGS: 00010282 [ 822.340033] RAX: ffffffffffffff9b RBX: ffffffffffffff9b RCX: 0000000000000002 [ 822.340033] RDX: 00000000ffffff9b RSI: 0000000000000008 RDI: 0000000000000000 [ 822.340033] RBP: ffff88001ddc9ea0 R08: ffff88001e9f1768 R09: 0000000000000000 [ 822.340033] R10: ffff88001d9c1100 R11: ffff88001e3c89f0 R12: ffffffff8187e0c0 [ 822.340033] R13: ffffffff8187e0c0 R14: ffff88001ddc9e80 R15: 0000000000000004 [ 822.340033] FS: 00007ff9ed798700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 [ 822.340033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 822.340033] CR2: 00000000000003a3 CR3: 000000001de89000 CR4: 00000000000006f0 [ 822.340033] Stack: [ 822.340033] 0000000000000000 0000000100000000 0000000000000000 0000000000000000 [ 822.340033] 0000000000000000 0801010a00000000 0000000000000000 0000000000000000 [ 822.340033] 0000000000000004 ffffffff8148749b ffffffff8187e0c0 000000000000001c [ 822.340033] Call Trace: [ 822.340033] [<ffffffff8148749b>] ? mpls_rt_alloc+0x2b/0x3e [ 822.340033] [<ffffffff81488e66>] ? mpls_rtm_newroute+0x358/0x3e2 [ 822.340033] [<ffffffff810e7bbc>] ? get_page+0x5/0xa [ 822.340033] [<ffffffff813b7d94>] ? rtnetlink_rcv_msg+0x17e/0x191 [ 822.340033] [<ffffffff8111794e>] ? __kmalloc_track_caller+0x8c/0x9e [ 822.340033] [<ffffffff813c9393>] ? rht_key_hashfn.isra.20.constprop.57+0x14/0x1f [ 822.340033] [<ffffffff813b7c16>] ? __rtnl_unlock+0xc/0xc [ 822.340033] [<ffffffff813cb794>] ? netlink_rcv_skb+0x36/0x82 [ 822.340033] [<ffffffff813b4507>] ? rtnetlink_rcv+0x1f/0x28 [ 822.340033] [<ffffffff813cb2b1>] ? netlink_unicast+0x106/0x189 [ 822.340033] [<ffffffff813cb5b3>] ? netlink_sendmsg+0x27f/0x2c8 [ 822.340033] [<ffffffff81392ede>] ? sock_sendmsg_nosec+0x10/0x1b [ 822.340033] [<ffffffff81393df1>] ? ___sys_sendmsg+0x182/0x1e3 [ 822.340033] [<ffffffff810e4f35>] ? __alloc_pages_nodemask+0x11c/0x1e4 [ 822.340033] [<ffffffff8110619c>] ? PageAnon+0x5/0xd [ 822.340033] [<ffffffff811062fe>] ? __page_set_anon_rmap+0x45/0x52 [ 822.340033] [<ffffffff810e7bbc>] ? get_page+0x5/0xa [ 822.340033] [<ffffffff810e85ab>] ? __lru_cache_add+0x1a/0x3a [ 822.340033] [<ffffffff81087ea9>] ? current_kernel_time64+0x9/0x30 [ 822.340033] [<ffffffff813940c4>] ? __sys_sendmsg+0x3c/0x5a [ 822.340033] [<ffffffff8148f597>] ? entry_SYSCALL_64_fastpath+0x12/0x6a [ 822.340033] Code: 83 08 04 00 00 65 ff 00 48 8b 3c 24 e8 40 7c f2 ff eb 13 48 c7 c3 9f ff ff ff eb 0f 89 ce e8 f1 ae f1 ff 48 89 c3 48 85 db 74 15 <48> 8b 83 08 04 00 00 65 ff 08 48 81 fb 00 f0 ff ff 76 0d eb 07 [ 822.340033] RIP [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182 [ 822.340033] RSP <ffff88001dad7a88> [ 822.340033] CR2: 00000000000003a3 [ 822.435363] ---[ end trace 98cc65e6f6b8bf11 ]--- After patch: $ip -f mpls route add 100 as 200 via inet 10.1.1.8 RTNETLINK answers: Network is unreachable Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reported-by: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 12:43:20 -04:00
Jakub Sitnicki	3ba3458fb9	ipv6: Count in extension headers in skb->network_header When sending a UDPv6 message longer than MTU, account for the length of fragmentable IPv6 extension headers in skb->network_header offset. Same as we do in alloc_new_skb path in __ip6_append_data(). This ensures that later on __ip6_make_skb() will make space in headroom for fragmentable extension headers: /* move skb->data to ip header from ext header */ if (skb->data < skb_network_header(skb)) __skb_pull(skb, skb_network_offset(skb)); Prevents a splat due to skb_under_panic: skbuff: skb_under_panic: text:ffffffff8143397b len:2126 put:14 \ head:ffff880005bacf50 data:ffff880005bacf4a tail:0x48 end:0xc0 dev:lo ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] KASAN CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65 [...] Call Trace: [<ffffffff813eb7b9>] skb_push+0x79/0x80 [<ffffffff8143397b>] eth_header+0x2b/0x100 [<ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310 [<ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0 [<ffffffff814efe3a>] ip6_output+0x16a/0x280 [<ffffffff815440c1>] ip6_local_out+0xb1/0xf0 [<ffffffff814f1115>] ip6_send_skb+0x45/0xd0 [<ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0 [<ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090 [...] Reported-by: Ji Jianwen <jiji@redhat.com> Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 22:41:37 -04:00
Jon Paul Maloy	5b7066c3dd	tipc: stricter filtering of packets in bearer layer Resetting a bearer/interface, with the consequence of resetting all its pertaining links, is not an atomic action. This becomes particularly evident in very large clusters, where a lot of traffic may happen on the remaining links while we are busy shutting them down. In extreme cases, we may even see links being re-created and re-established before we are finished with the job. To solve this, we now introduce a solution where we temporarily detach the bearer from the interface when the bearer is reset. This inhibits all packet reception, while sending still is possible. For the latter, we use the fact that the device's user pointer now is zero to filter out which packets can be sent during this situation; i.e., outgoing RESET messages only. This filtering serves to speed up the neighbors' detection of the loss event, and saves us from unnecessary probing. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 17:00:13 -04:00
Jon Paul Maloy	4e801fa14f	tipc: eliminate buffer leak in bearer layer When enabling a bearer we create a 'neigbor discoverer' instance by calling the function tipc_disc_create() before the bearer is actually registered in the list of enabled bearers. Because of this, the very first discovery broadcast message, created by the mentioned function, is lost, since it cannot find any valid bearer to use. Furthermore, the used send function, tipc_bearer_xmit_skb() does not free the given buffer when it cannot find a bearer, resulting in the leak of exactly one send buffer each time a bearer is enabled. This commit fixes this problem by introducing two changes: 1) Instead of attemting to send the discovery message directly, we let tipc_disc_create() return the discovery buffer to the calling function, tipc_enable_bearer(), so that the latter can send it when the enabling sequence is finished. 2) In tipc_bearer_xmit_skb(), as well as in the two other transmit functions at the bearer layer, we now free the indicated buffer or buffer chain when a valid bearer cannot be found. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 17:00:13 -04:00
shamir rabinovitch	579ba85552	RDS: fix congestion map corruption for PAGE_SIZE > 4k When PAGE_SIZE > 4k single page can contain 2 RDS fragments. If 'rds_ib_cong_recv' ignore the RDS fragment offset in to the page it then read the data fragment as far congestion map update and lead to corruption of the RDS connection far congestion map. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:58:28 -04:00
shamir rabinovitch	e98499ac63	RDS: memory allocated must be align to 8 Fix issue in 'rds_ib_cong_recv' when accessing unaligned memory allocated by 'rds_page_remainder_alloc' using uint64_t pointer. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:58:27 -04:00
Alexander Duyck	a0ca153f98	GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU This patch fixes an issue I found in which we were dropping frames if we had enabled checksums on GRE headers that were encapsulated by either FOU or GUE. Without this patch I was barely able to get 1 Gb/s of throughput. With this patch applied I am now at least getting around 6 Gb/s. The issue is due to the fact that with FOU or GUE applied we do not provide a transport offset pointing to the GRE header, nor do we offload it in software as the GRE header is completely skipped by GSO and treated like a VXLAN or GENEVE type header. As such we need to prevent the stack from generating it and also prevent GRE from generating it via any interface we create. Fixes: `c3483384ee` ("gro: Allow tunnel stacking in the case of FOU/GUE") Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:56:33 -04:00
Tom Herbert	46aa2f30aa	udp: Remove udp_offloads Now that the UDP encapsulation GRO functions have been moved to the UDP socket we not longer need the udp_offload insfrastructure so removing it. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:53:30 -04:00
Tom Herbert	d92283e338	fou: change to use UDP socket GRO Adapt gue_gro_receive, gue_gro_complete to take a socket argument. Don't set udp_offloads any more. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:53:29 -04:00
Tom Herbert	38fd2af24f	udp: Add socket based GRO and config Add gro_receive and gro_complete to struct udp_tunnel_sock_cfg. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:53:29 -04:00
Tom Herbert	a6024562ff	udp: Add GRO functions to UDP socket This patch adds GRO functions (gro_receive and gro_complete) to UDP sockets. udp_gro_receive is changed to perform socket lookup on a packet. If a socket is found the related GRO functions are called. This features obsoletes using UDP offload infrastructure for GRO (udp_offload). This has the advantage of not being limited to provide offload on a per port basis, GRO is now applied to whatever individual UDP sockets are bound to. This also allows the possbility of "application defined GRO"-- that is we can attach something like a BPF program to a UDP socket to perfrom GRO on an application layer protocol. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:53:29 -04:00
Tom Herbert	63058308cd	udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb Add externally visible functions to lookup a UDP socket by skb. This will be used for GRO in UDP sockets. These functions also check if skb->dst is set, and if it is not skb->dev is used to get dev_net. This allows calling lookup functions before dst has been set on the skbuff. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:53:14 -04:00
Hannes Frederic Sowa	8ced425ee6	tun: use socket locks for sk_{attach,detatch}_filter This reverts commit `5a5abb1fa3` ("tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter") and replaces it to use lock_sock around sk_{attach,detach}_filter. The checks inside filter.c are updated with lockdep_sock_is_held to check for proper socket locks. It keeps the code cleaner by ensuring that only one lock governs the socket filter instead of two independent locks. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:44:14 -04:00
Hannes Frederic Sowa	1e1d04e678	net: introduce lockdep_is_held and update various places to use it The socket is either locked if we hold the slock spin_lock for lock_sock_fast and unlock_sock_fast or we own the lock (sk_lock.owned != 0). Check for this and at the same time improve that the current thread/cpu is really holding the lock. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:44:14 -04:00
Hannes Frederic Sowa	61881cfb5a	sock: fix lockdep annotation in release_sock During release_sock we use callbacks to finish the processing of outstanding skbs on the socket. We actually are still locked, sk_locked.owned == 1, but we already told lockdep that the mutex is released. This could lead to false positives in lockdep for lockdep_sock_is_held (we don't hold the slock spinlock during processing the outstanding skbs). I took over this patch from Eric Dumazet and tested it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:44:14 -04:00
Haishuang Yan	85f1e7c29a	netfilter: ipv6: unnecessary to check whether ip6_route_output() returns NULL ip6_route_output() never returns NULL, so it is not appropriate to check if the return value is NULL. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-07 18:53:08 +02:00
Jozsef Kadlecsik	644c7e48cb	netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options Baozeng Ding reported a KASAN stack out of bounds issue - it uncovered that the TCP option parsing routines in netfilter TCP connection tracking could read one byte out of the buffer of the TCP options. Therefore in the patch we check that the available data length is large enough to parse both TCP option code and size. Reported-by: Baozeng Ding <sploving1@gmail.com> Tested-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-07 18:42:37 +02:00
Eric Dumazet	8501786929	tcp/dccp: fix inet_reuseport_add_sock() David Ahern reported panics in __inet_hash() caused by my recent commit. The reason is inet_reuseport_add_sock() was still using sk_nulls_for_each_rcu() instead of sk_for_each_rcu(). SO_REUSEPORT enabled listeners were causing an instant crash. While chasing this bug, I found that I forgot to clear SOCK_RCU_FREE flag, as it is inherited from the parent at clone time. Fixes: `3b24d854cb` ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 12:02:33 -04:00
Florian Westphal	ff76def3bd	netfilter: arp_tables: register table in initns arptables is broken since we didn't register the table anymore -- even 'arptables -L' fails. Fixes: `b9e69e1273` ("netfilter: xtables: don't hook tables by default") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-07 11:58:49 +02:00
Dexuan Cui	0a1a37b6d6	net: add the AF_KCM entries to family name tables This is for the recent kcm driver, which introduces AF_KCM(41) in b7ac4eb(kcm: Kernel Connection Multiplexor module). Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 16:59:01 -04:00
Jiri Benc	a6d5bbf34e	ip_tunnel: implement __iptunnel_pull_header Allow calling of iptunnel_pull_header without special casing ETH_P_TEB inner protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 16:50:32 -04:00
Jorgen Hansen	8ab18d71de	VSOCK: Detach QP check should filter out non matching QPs. The check in vmci_transport_peer_detach_cb should only allow a detach when the qp handle of the transport matches the one in the detach message. Testing: Before this change, a detach from a peer on a different socket would cause an active stream socket to register a detach. Reviewed-by: George Zhang <georgezhang@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 16:39:09 -04:00
Dave Jones	6ae81ced37	af_packet: tone down the Tx-ring unsupported spew. Trinity and other fuzzers can hit this WARN on far too easily, resulting in a tainted kernel that hinders automated fuzzing. Replace it with a rate-limited printk. Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 16:05:20 -04:00
David S. Miller	32fa270c8a	Revert "bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr" This reverts commit `c862cc9b70`. Patch lacks a real-name Signed-off-by. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 15:42:45 -04:00
Jeff Mahoney	b4201cc4fc	mac80211: fix "warning: ‘target_metric’ may be used uninitialized" This fixes: net/mac80211/mesh_hwmp.c:603:26: warning: ‘target_metric’ may be used uninitialized in this function target_metric is only consumed when reply = true so no bug exists here, but not all versions of gcc realize it. Initialize to 0 to remove the warning. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 15:10:25 +02:00
Jouni Malinen	4ce2bd9c4c	cfg80211: Allow reassociation to be requested with internal SME If the user space issues a NL80211_CMD_CONNECT with NL80211_ATTR_PREV_BSSID when there is already a connection, allow this to proceed as a reassociation instead of rejecting the new connect command with EALREADY. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> [validate prev_bssid] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 15:09:28 +02:00
Jouni Malinen	ba6fbacf9c	cfg80211: Add option to specify previous BSSID for Connect command This extends NL80211_CMD_CONNECT to allow the NL80211_ATTR_PREV_BSSID attribute to be used similarly to way this was already allowed with NL80211_CMD_ASSOCIATE. This allows user space to request reassociation (instead of association) when already connected to an AP. This provides an option to reassociate within an ESS without having to disconnect and associate with the AP. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:21 +02:00
Felix Fietkau	918fe04b28	mac80211: minstrel_ht: set A-MSDU tx limits based on selected max_prob_rate Prevents excessive A-MSDU aggregation at low data rates or bad conditions. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:20 +02:00
Felix Fietkau	6e0456b545	mac80211: add A-MSDU tx support Requires software tx queueing and fast-xmit support. For good performance, drivers need frag_list support as well. This avoids the need for copying data of aggregated frames. Running without it is only supported for debugging purposes. To avoid performance and packet size issues, the rate control module or driver needs to limit the maximum A-MSDU size by setting max_rc_amsdu_len in struct ieee80211_sta. Signed-off-by: Felix Fietkau <nbd@openwrt.org> [fix locking issue] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:19 +02:00
Johannes Berg	c9c5962b56	mac80211: enable collecting station statistics per-CPU If the driver advertises the new HW flag USE_RSS, make the station statistics on the fast-rx path per-CPU. This will enable calling the RX in parallel, only hitting locking or shared cachelines when the fast-RX path isn't available. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:19 +02:00
Johannes Berg	49ddf8e6e2	mac80211: add fast-rx path The regular RX path has a lot of code, but with a few assumptions on the hardware it's possible to reduce the amount of code significantly. Currently the assumptions on the driver are the following: * hardware/driver reordering buffer (if supporting aggregation) * hardware/driver decryption & PN checking (if using encryption) * hardware/driver did de-duplication * hardware/driver did A-MSDU deaggregation * AP_LINK_PS is used (in AP mode) * no client powersave handling in mac80211 (in client mode) of which some are actually checked per packet: * de-duplication * PN checking * decryption and additionally packets must * not be A-MSDU (have been deaggregated by driver/device) * be data packets * not be fragmented * be unicast * have RFC 1042 header Additionally dynamically we assume: * no encryption or CCMP/GCMP, TKIP/WEP/other not allowed * station must be authorized * 4-addr format not enabled Some data needed for the RX path is cached in a new per-station "fast_rx" structure, so that we only need to look at this and the packet, no other memory when processing packets on the fast RX path. After doing the above per-packet checks, the data path collapses down to a pretty simple conversion function taking advantage of the data cached in the small fast_rx struct. This should speed up the RX processing, and will make it easier to reason about parallelizing RX (for which statistics will need to be per-CPU still.) Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:18 +02:00
Johannes Berg	0f9c5a61d4	mac80211: fix RX u64 stats consistency on 32-bit platforms On 32-bit platforms, the 64-bit counters we keep need to be protected to be consistently read. Use the u64_stats_sync mechanism to do that. In order to not end up with overly long lines, refactor the tidstats assignments a bit. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:17 +02:00
Johannes Berg	4f6b1b3daa	mac80211: fix last RX rate data consistency When storing the last_rate_* values in the RX code, there's nothing to guarantee consistency, so a concurrent reader could see, e.g. last_rate_idx on the new value, but last_rate_flag still on the old, getting completely bogus values in the end. To fix this, I lifted the sta_stats_encode_rate() function from my old rate statistics code, which encodes the entire rate data into a single 16-bit value, avoiding the consistency issue. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:17 +02:00
Johannes Berg	b8da6b6a99	mac80211: add separate last_ack variable Instead of touching the rx_stats.last_rx from the status path, introduce and use a status_stats.last_ack variable. This will make rx_stats.last_rx indicate when the last frame was received, making it available for real "last_rx" and statistics gathering; statistics, when done per-CPU, will need to figure out which place was updated last for those items where the "last" value is exposed. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:16 +02:00
Johannes Berg	2df8bfd724	mac80211: remove rx_stats.last_rx update after sta alloc There's no need to update rx_stats.last_rx after allocating a station since it's already updated during allocation. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:15 +02:00
Johannes Berg	0be6ed1338	mac80211: move averaged values out of rx_stats Move the averaged values out of rx_stats and into rx_stats_avg, to cleanly split them out. The averaged ones cannot be supported for parallel RX in a per-CPU fashion, while the other values can be collected per CPU and then combined/selected when needed. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:15 +02:00
Johannes Berg	8ebaa5b0a7	mac80211: move semicolon out of CALL_RXH macro Move the semicolon, people typically assume that and once line already put a semicolon behind the "call". Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:14 +02:00
Johannes Berg	de8f18d3a8	mac80211: count MSDUs in A-MSDU properly For the RX MSDU statistics, we need to count the number of MSDUs created and accepted from an A-MSDU. Right now, all frames in any A-MSDUs were completely ignored. Fix this by moving the RX MSDU statistics accounting into the deliver function. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:13 +02:00
Johannes Berg	d63b548fff	mac80211: allow passing transmitter station on RX Sometimes drivers already looked up, or know out-of-band from their device, which station transmitted a given RX frame. Allow them to pass the station pointer to mac80211 to save the extra lookup. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-06 13:18:13 +02:00
Aaron Conole	4da46cebbd	net/core/dev: Warn on a too-short GRO frame When signaling that a GRO frame is ready to be processed, the network stack correctly checks length and aborts processing when a frame is less than 14 bytes. However, such a condition is really indicative of a broken driver, and should be loudly signaled, rather than silently dropped as the case is today. Convert the condition to use net_warn_ratelimited() to ensure the stack loudly complains about such broken drivers. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 19:58:39 -04:00
Thadeu Lima de Souza Cascardo	b6ee376cb0	ip6_tunnel: set rtnl_link_ops before calling register_netdevice When creating an ip6tnl tunnel with ip tunnel, rtnl_link_ops is not set before ip6_tnl_create2 is called. When register_netdevice is called, there is no linkinfo attribute in the NEWLINK message because of that. Setting rtnl_link_ops before calling register_netdevice fixes that. Fixes: `0b11245722` ("ip6tnl: add support of link creation via rtnl") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 19:48:51 -04:00
Bjorn Helgaas	727ceaa49b	Revert "netpoll: Fix extra refcount release in netpoll_cleanup()" This reverts commit `543e3a8da5`. Direct callers of __netpoll_setup() depend on it to set np->dev, so we can't simply move that assignment up to netpoll_stup(). Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 19:34:44 -04:00
samanthakumar	627d2d6b55	udp: enable MSG_PEEK at non-zero offset Enable peeking at UDP datagrams at the offset specified with socket option SOL_SOCKET/SO_PEEK_OFF. Peek at any datagram in the queue, up to the end of the given datagram. Implement the SO_PEEK_OFF semantics introduced in commit `ef64a54f6e` ("sock: Introduce the SO_PEEK_OFF sock option"). Increase the offset on peek, decrease it on regular reads. When peeking, always checksum the packet immediately, to avoid recomputation on subsequent peeks and final read. The socket lock is not held for the duration of udp_recvmsg, so peek and read operations can run concurrently. Only the last store to sk_peek_off is preserved. Signed-off-by: Sam Kumar <samanthakumar@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 16:29:37 -04:00
samanthakumar	e6afc8ace6	udp: remove headers from UDP packets before queueing Remove UDP transport headers before queueing packets for reception. This change simplifies a follow-up patch to add MSG_PEEK support. Signed-off-by: Sam Kumar <samanthakumar@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 16:29:37 -04:00
Marcelo Ricardo Leitner	e43569e6d3	sctp: flush if we can't fit another DATA chunk There is no point on delaying the packet if we can't fit a single byte of data on it anymore. So lets just reduce the threshold by the amount that a data chunk with 4 bytes (rounding) would use. v2: based on the right tree Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 15:39:44 -04:00
Bob Copeland	e596af8279	mac80211: mesh: flush paths outside of plink lock Lockdep warned of a lock dependency between the mesh_plink lock and the internal lock for the rhashtable. The problem is that the rhashtable code uses a spin lock with softirqs enabled, while mesh_plink_timer executes a walk (to flush paths on a state change) inside a softirq with the plink lock held. This leads to the following deadlock if the timer fires while rht lock is held on this CPU, and plink lock is held on another CPU: CPU0 CPU1 ---- ---- lock(&(&ht->lock)->rlock); local_irq_disable(); lock(&(&sta->mesh->plink_lock)->rlock); lock(&(&ht->lock)->rlock); <Interrupt> lock(&(&sta->mesh->plink_lock)->rlock); * DEADLOCK * Fix by waiting until we drop the plink lock to flush paths. Fixes: d48a1b7cd439 ("mac80211: mesh: convert path table to rhashtable") Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:54 +02:00
Bob Copeland	0371a08fbb	mac80211: mesh: fix cleanup for mesh pathtable The mesh path table needs to be around for the entire time the interface is in mesh mode, as users can perform an mpath dump at any time. The existing path table lifetime is instead tied to the mesh BSS which can cause crashes when different MBSSes are joined in the context of a single interface, or when the path table is dumped when no MBSS is joined. Introduce a new function to perform the final teardown of the interface and perform path table cleanup there. We already free the individual path elements when the leaving the mesh so no additional cleanup is needed there. This fixes the following crash: [ 47.753026] BUG: unable to handle kernel paging request at fffffff0 [ 47.753026] IP: [<c0239765>] kthread_data+0xa/0xe [ 47.753026] pde = 00741067 pte = 00000000 [ 47.753026] Oops: 0000 [#4] PREEMPT [ 47.753026] Modules linked in: ppp_generic slhc 8021q garp mrp sch_fq_codel iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables ath9k_htc ath5k 8139too ath10k_pci ath10k_core arc4 ath9k ath9k_common ath9k_hw mac80211 ath cfg80211 cpufreq_powersave br_netfilter bridge stp llc ipw usb_wwan sierra_net usbnet af_alg natsemi via_rhine mii iTCO_wdt iTCO_vendor_support gpio_ich sierra coretemp pcspkr i2c_i801 lpc_ich ata_generic ata_piix libata ide_pci_generic piix e1000e igb i2c_algo_bit ptp pps_core [last unloaded: 8139too] [ 47.753026] CPU: 0 PID: 12 Comm: kworker/u2:1 Tainted: G D W 4.5.0-wt-V3 #6 [ 47.753026] Hardware name: To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080016 11/07/2014 [ 47.753026] task: f645a0c0 ti: f6462000 task.ti: f6462000 [ 47.753026] EIP: 0060:[<c0239765>] EFLAGS: 00010002 CPU: 0 [ 47.753026] EIP is at kthread_data+0xa/0xe [ 47.753026] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000 [ 47.753026] ESI: f645a0c0 EDI: f645a2fc EBP: f6463a80 ESP: f6463a78 [ 47.753026] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 [ 47.753026] CR0: 8005003b CR2: 00000014 CR3: 353e5000 CR4: 00000690 [ 47.753026] Stack: [ 47.753026] c0236866 00000000 f6463aac c05768b4 00000009 f6463ba8 f6463ab0 c0247010 [ 47.753026] 00000000 f645a0c0 f6464000 00000009 f6463ba8 f6463ab8 c0576eb2 f645a0c0 [ 47.753026] f6463aec c0228be4 c06335a4 f6463adc f6463ad0 c06c06d4 f6463ae4 c02471b0 [ 47.753026] Call Trace: [ 47.753026] [<c0236866>] ? wq_worker_sleeping+0xb/0x78 [ 47.753026] [<c05768b4>] __schedule+0xda/0x587 [ 47.753026] [<c0247010>] ? vprintk_default+0x12/0x14 [ 47.753026] [<c0576eb2>] schedule+0x72/0x89 [ 47.753026] [<c0228be4>] do_exit+0xb8/0x71d [ 47.753026] [<c02471b0>] ? kmsg_dump+0xa9/0xae [ 47.753026] [<c0203576>] oops_end+0x69/0x70 [ 47.753026] [<c021dcdb>] no_context+0x1bb/0x1c5 [ 47.753026] [<c021de1b>] __bad_area_nosemaphore+0x136/0x140 [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c021de32>] bad_area_nosemaphore+0xd/0x10 [ 47.753026] [<c021e0a1>] __do_page_fault+0x26c/0x320 [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c021e2fa>] do_page_fault+0xb/0xd [ 47.753026] [<c05798f8>] error_code+0x58/0x60 [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c0239765>] ? kthread_data+0xa/0xe [ 47.753026] [<c0236866>] ? wq_worker_sleeping+0xb/0x78 [ 47.753026] [<c05768b4>] __schedule+0xda/0x587 [ 47.753026] [<c0247010>] ? vprintk_default+0x12/0x14 [ 47.753026] [<c0576eb2>] schedule+0x72/0x89 [ 47.753026] [<c0228be4>] do_exit+0xb8/0x71d [ 47.753026] [<c02471b0>] ? kmsg_dump+0xa9/0xae [ 47.753026] [<c0203576>] oops_end+0x69/0x70 [ 47.753026] [<c021dcdb>] no_context+0x1bb/0x1c5 [ 47.753026] [<c021de1b>] __bad_area_nosemaphore+0x136/0x140 [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c021de32>] bad_area_nosemaphore+0xd/0x10 [ 47.753026] [<c021e0a1>] __do_page_fault+0x26c/0x320 [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c021e2fa>] do_page_fault+0xb/0xd [ 47.753026] [<c05798f8>] error_code+0x58/0x60 [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c0239765>] ? kthread_data+0xa/0xe [ 47.753026] [<c0236866>] ? wq_worker_sleeping+0xb/0x78 [ 47.753026] [<c05768b4>] __schedule+0xda/0x587 [ 47.753026] [<c0391e32>] ? put_io_context_active+0x6d/0x95 [ 47.753026] [<c0576eb2>] schedule+0x72/0x89 [ 47.753026] [<c02291f8>] do_exit+0x6cc/0x71d [ 47.753026] [<c0203576>] oops_end+0x69/0x70 [ 47.753026] [<c021dcdb>] no_context+0x1bb/0x1c5 [ 47.753026] [<c021de1b>] __bad_area_nosemaphore+0x136/0x140 [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c021de32>] bad_area_nosemaphore+0xd/0x10 [ 47.753026] [<c021e0a1>] __do_page_fault+0x26c/0x320 [ 47.753026] [<c03b9160>] ? debug_smp_processor_id+0x12/0x16 [ 47.753026] [<c02015e2>] ? __switch_to+0x24/0x40e [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c021e2fa>] do_page_fault+0xb/0xd [ 47.753026] [<c05798f8>] error_code+0x58/0x60 [ 47.753026] [<c021e2ef>] ? vmalloc_sync_all+0x19a/0x19a [ 47.753026] [<c03b59d2>] ? rhashtable_walk_init+0x5c/0x93 [ 47.753026] [<f9843221>] mesh_path_tbl_expire.isra.24+0x19/0x82 [mac80211] [ 47.753026] [<f984408b>] mesh_path_expire+0x11/0x1f [mac80211] [ 47.753026] [<f9842bb7>] ieee80211_mesh_work+0x73/0x1a9 [mac80211] [ 47.753026] [<f98207d1>] ieee80211_iface_work+0x2ff/0x311 [mac80211] [ 47.753026] [<c0235fa3>] process_one_work+0x14b/0x24e [ 47.753026] [<c0236313>] worker_thread+0x249/0x343 [ 47.753026] [<c02360ca>] ? process_scheduled_works+0x24/0x24 [ 47.753026] [<c0239359>] kthread+0x9e/0xa3 [ 47.753026] [<c0578e50>] ret_from_kernel_thread+0x20/0x40 [ 47.753026] [<c02392bb>] ? kthread_parkme+0x18/0x18 [ 47.753026] Code: 6b c0 85 c0 75 05 e8 fb 74 fc ff 89 f8 84 c0 75 08 8d 45 e8 e8 34 dd 33 00 83 c4 28 5b 5e 5f 5d c3 55 8b 80 10 02 00 00 89 e5 5d <8b> 40 f0 c3 55 b9 04 00 00 00 89 e5 52 8b 90 10 02 00 00 8d 45 [ 47.753026] EIP: [<c0239765>] kthread_data+0xa/0xe SS:ESP 0068:f6463a78 [ 47.753026] CR2: 00000000fffffff0 [ 47.753026] ---[ end trace 867ca0bdd0767790 ]--- Fixes: 3b302ada7f0a ("mac80211: mesh: move path tables into if_mesh") Reported-by: Fred Veldini <fred.veldini@gmail.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:53 +02:00
Bob Copeland	68bb54b47e	mac80211: mesh: fix mesh path kerneldoc Several of the mesh path fields are undocumented and some of the documentation is no longer correct or relevant after the switch to rhashtable. Clean up the kernel doc accordingly and reorder some fields to match the structure layout. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:53 +02:00
Bob Copeland	3257523bed	mac80211: mesh: reorder structure members Reduce padding waste in struct mesh_table and struct rmc_entry by moving the smaller fields to the end. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:52 +02:00
Bob Copeland	18b27ff7d2	mac80211: mesh: embed gates hlist head directly Since we have converted the mesh path tables to rhashtable, we are no longer swapping out the entire mesh_pathtbl pointer with RCU. As a result, we no longer need indirection to the hlist head for the gates list and can simply embed it, saving a pair of pointer-sized allocations. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:51 +02:00
Bob Copeland	47a0489ce1	mac80211: mesh: use hlist for rmc cache The RMC cache has 256 list heads plus a u32, which puts it at the unfortunate size of 4104 bytes with padding. kmalloc() will then round this up to the next power-of-two, so we wind up actually using two pages here where most of the second is wasted. Switch to hlist heads here to reduce the structure size down to fit within a page. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:51 +02:00
Bob Copeland	0aa7fabbd5	mac80211: mesh: handle failed alloc for rmc cache In the unlikely case that mesh_rmc_init() fails with -ENOMEM, the rmc pointer will be left as NULL but the interface is still operational because ieee80211_mesh_init_sdata() is not allowed to fail. If this happens, we would blindly dereference rmc when checking whether a multicast frame is in the cache. Instead just drop the frames in the forwarding path. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:50 +02:00
Bob Copeland	749329594b	mac80211: mesh: fix crash in mesh_path_timer The mesh_path_reclaim() function, called from an rcu callback, cancels the mesh_path_timer associated with a mesh path. Unfortunately, this call can happen much later, perhaps after the hash table itself is destroyed. Such a situation led to the following crash in mesh_path_send_to_gates() when dereferencing the tbl pointer: [ 23.901661] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 23.905516] IP: [<ffffffff814c910b>] mesh_path_send_to_gates+0x2b/0x740 [ 23.908757] PGD 99ca067 PUD 99c4067 PMD 0 [ 23.910789] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 23.913485] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.5.0-rc6-wt+ #43 [ 23.916675] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 23.920471] task: ffffffff81685500 ti: ffffffff81678000 task.ti: ffffffff81678000 [ 23.922619] RIP: 0010:[<ffffffff814c910b>] [<ffffffff814c910b>] mesh_path_send_to_gates+0x2b/0x740 [ 23.925237] RSP: 0018:ffff88000b403d30 EFLAGS: 00010286 [ 23.926739] RAX: 0000000000000000 RBX: ffff880009bc0d20 RCX: 0000000000000102 [ 23.928796] RDX: 000000000000002e RSI: 0000000000000001 RDI: ffff880009bc0d20 [ 23.930895] RBP: ffff88000b403e18 R08: 0000000000000001 R09: 0000000000000001 [ 23.932917] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880009c20940 [ 23.936370] R13: ffff880009bc0e70 R14: ffff880009c21c40 R15: ffff880009bc0d20 [ 23.939823] FS: 0000000000000000(0000) GS:ffff88000b400000(0000) knlGS:0000000000000000 [ 23.943688] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 23.946429] CR2: 0000000000000008 CR3: 00000000099c5000 CR4: 00000000000006b0 [ 23.949861] Stack: [ 23.950840] 000000000000002e ffff880009c20940 ffff88000b403da8 ffffffff8109e551 [ 23.954467] ffffffff82711be2 000000000000002e 0000000000000000 ffffffff8166a5f5 [ 23.958141] 0000000000685ce8 0000000000000246 ffff880009bc0d20 ffff880009c20940 [ 23.961801] Call Trace: [ 23.962987] <IRQ> [ 23.963963] [<ffffffff8109e551>] ? vprintk_emit+0x351/0x5e0 [ 23.966782] [<ffffffff8109e8ff>] ? vprintk_default+0x1f/0x30 [ 23.969529] [<ffffffff810ffa41>] ? printk+0x48/0x50 [ 23.971956] [<ffffffff814ceef3>] mesh_path_timer+0x133/0x160 [ 23.974707] [<ffffffff814cedc0>] ? mesh_nexthop_resolve+0x230/0x230 [ 23.977775] [<ffffffff810b04ee>] call_timer_fn+0xce/0x330 [ 23.980448] [<ffffffff810b0425>] ? call_timer_fn+0x5/0x330 [ 23.983126] [<ffffffff814cedc0>] ? mesh_nexthop_resolve+0x230/0x230 [ 23.986091] [<ffffffff810b097c>] run_timer_softirq+0x22c/0x390 Instead of cancelling in the RCU callback, set a new flag to prevent the timer from being rearmed, and then cancel the timer synchronously when freeing the mesh path. This leaves mesh_path_reclaim() doing nothing but kfree, so switch to kfree_rcu(). Fixes: 3b302ada7f0a ("mac80211: mesh: move path tables into if_mesh") Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:49 +02:00
Ayala Beker	52cfa1d614	mac80211: track and tell driver about GO client P2P PS abilities Legacy clients don't support P2P power save mechanism, and thus if a P2P GO has a legacy client connected to it, it should disable P2P PS mechanisms. Let the driver know about this with a new bss_conf parameter. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:49 +02:00
Ayala Beker	17b9424786	cfg80211: allow userspace to specify client P2P PS support Legacy clients don't support P2P power save mechanisms, and thus if a P2P GO has a legacy client connected to it, it has to make some changes in the PS behavior. To handle this, add an attribute to specify whether a station supports P2P PS or not. If the attribute was not specified cfg80211 will assume that station supports it for P2P GO interface, and does NOT support it for AP interface, matching the current assumptions in the code. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:47 +02:00
Johannes Berg	b100e5d622	mac80211: avoid useless memory write on each frame RX In the likely case that probe_count is 0, don't write to the memory there. Also use ifmgd consistently in the function, instead of using sdata->u.mgd as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 21:34:21 +02:00
Johannes Berg	2c61cf9c56	mac80211: fix cipher scheme function name The code is only used with iwlwifi, but still should have proper mac80211 naming scheme; fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 12:12:41 +02:00
Johannes Berg	c84387d2f2	mac80211: clean up station flags debugfs Avoid the really strange %s%s%s expression, use an array of flag names and check that all flags are present. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 12:12:26 +02:00
Johannes Berg	602fae425c	mac80211: don't start dynamic PS timer if not needed If the device implements dynamic PS itself, there's no need to ever start the dynamic powersave timer on RX. While at it, fix up some indentation in this code. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 12:11:54 +02:00
Johannes Berg	fc4a25c5b7	mac80211: remove sta_info debugfs sub-struct Since the previous patch, the struct only has a single member, so remove the struct and leave just the single member. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:59:05 +02:00
Mohammed Shafi Shajakhan	96f321c9d4	mac80211: Remove unused variable in per STA debugfs struct Remove unused variable in per STA debugfs structure, 'commit `34e895075e` ("mac80211: allow station add/remove to sleep")' removed the only user of 'add_has_run'. Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:57:11 +02:00
Sara Sharon	1e0bbebaae	mac80211: enable starting BA session with custom timeout Currently the debugfs entry for starting aggregation session starts it with timeout of 5 seconds. Allow opening a session with a custom timeout (according to spec 0 is no timeout). while at it, refactor the function and remove the magic numbers. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:46:05 +02:00
Sara Sharon	e03521232d	mac80211: add NETIF_F_RXCSUM to features white list NETIF_F_RXCSUM is not in the white list, though some drivers may want to set it in order to enable seeing the actual RX checksum status in ethtool. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:45:51 +02:00
Emmanuel Grumbach	f278ce4ffa	mac80211: Set global RRM capability Allow publishing RRM capabilities for features that are not HW dependent. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:44:46 +02:00
Felix Fietkau	06171e9c0b	mac80211: minstrel_ht: improve sample rate skip logic There were a few issues that were slowing down the process of finding the optimal rate, especially on devices with multi-rate retry limitations: When max_tp_rate[0] was slower than max_tp_rate[1], the code did not sample max_tp_rate[1], which would often allow it to switch places with max_tp_rate[0] (e.g. if only the first sampling attempts were bad, but the rate is otherwise good). Also, sample attempts of rates between max_tp_rate[0] and [1] were being ignored in this case, because the code only checked if the rate was slower than [1]. Fix this by checking against the fastest / second fastest max_tp_rate instead of assuming a specific order between the two. In my tests this patch significantly reduces the time until minstrel_ht finds the optimal rate right after assoc Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:40:06 +02:00
Emmanuel Grumbach	f6d4671a08	mac80211: close the SP when we enqueue frames during the SP Since we enqueued the frame that was supposed to be sent during the SP, and that frame may very well cary the IEEE80211_TX_STATUS_EOSP bit, we may never close the SP (WLAN_STA_SP will never be cleared). If that happens, we will not open any new SP and will never respond to any poll frame from the client. Clear WLAN_STA_SP manually if a frame that was polled during the SP is queued because of a starting A-MPDU session. The client may not see the EOSP bit, but it will at least be able to poll new frames in another SP. Reported-by: Alesya Shapira <alesya.shapira@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> [remove erroneous comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:33:49 +02:00
Ilan Peer	4b559ec0bf	mac80211: Fix BW upgrade for TDLS peers It is possible that the station is connected to an AP with bandwidth of 80+80MHz or 160MHz. In such cases there is no need to perform an upgrade as the maximal supported bandwidth is 80MHz. In addition, when upgrading and setting center_freq1 and bandwidth to 80MHz also set center_freq2 to 0. Fixes: `0fabfaafec` ("mac80211: upgrade BW of TDLS peers when possible" Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:26:33 +02:00
Emmanuel Grumbach	facde7f332	mac80211: don't send deferred frames outside the SP Frames that are sent between ampdu_action(IEEE80211_AMPDU_TX_START) and the move to the HT_AGG_STATE_OPERATIONAL state are buffered. If we try to start an A-MPDU session while the peer is sleeping and polling frames with U-APSD, we may have frames that will be buffered by ieee80211_tx_prep_agg. These frames have IEEE80211_TX_CTL_NO_PS_BUFFER set since they are sent to a sleeping client and possibly IEEE80211_TX_STATUS_EOSP. If the frame is buffered, we need clear these two flags since they will be re-sent after the move to HT_AGG_STATE_OPERATIONAL state which is very likely to happen after the SP ends. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:16:50 +02:00
Luis de Bethencourt	c2d45923e3	mac80211: remove description of dropped member Commit `976bd9efda` ("mac80211: move beacon_loss_count into ifmgd") removed the member from the sta_info struct but the description stayed lingering. Remove it. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:12:09 +02:00
Ben Greear	b6bf8c688e	mac80211: ensure no limits on station rhashtable By default, the rhashtable logic will fail to insert objects if the key-chains are too long and un-balanced. In the degenerate case where mac80211 is creating many virtual interfaces connected to the same peer(s), this case can happen. St insecure_elasticity to true to allow chains to grow as long as needed. Signed-off-by: Ben Greear <greearb@candelatech.com> [remove message, change commit message slightly] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 11:06:58 +02:00
Johannes Berg	62b14b241c	mac80211: properly deal with station hashtable insert errors The original hand-implemented hash-table in mac80211 couldn't result in insertion errors, and while converting to rhashtable I evidently forgot to check the errors. This surfaced now only because Ben is adding many identical keys and that resulted in hidden insertion errors. Cc: stable@vger.kernel.org Fixes: `7bedd0cfad` ("mac80211: use rhashtable for station table") Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:58:30 +02:00
Felix Fietkau	07310a6314	mac80211: do not pass injected frames without a valid rate to the driver Fall back to rate control if the requested bitrate was not found. Fixes: `dfdfc2beb0` ("mac80211: Parse legacy and HT rate in injected frames") Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:58:21 +02:00
Sven Eckelmann	f66b60f652	mac80211: fix parsing of 40Mhz in injected radiotap header The MCS bandwidth part of the radiotap header is 2 bits wide. The full 2 bit have to compared against IEEE80211_RADIOTAP_MCS_BW_40 and not only if the first bit is set. Otherwise IEEE80211_RADIOTAP_MCS_BW_40 can be confused with IEEE80211_RADIOTAP_MCS_BW_20U. Fixes: `dfdfc2beb0` ("mac80211: Parse legacy and HT rate in injected frames") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:58:16 +02:00
Arend van Spriel	38de03d2a2	nl80211: add feature for BSS selection support Introducing a new feature that the driver can use to indicate the driver/firmware supports configuration of BSS selection criteria upon CONNECT command. This can be useful when multiple BSS-es are found belonging to the same ESS, ie. Infra-BSS with same SSID. The criteria can then be used to offload selection of a preferred BSS. Reviewed-by: Hante Meuleman <meuleman@broadcom.com> Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com> Reviewed-by: Lei Zhang <leizh@broadcom.com> Signed-off-by: Arend van Spriel <arend@broadcom.com> [move wiphy support check into parse_bss_select()] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:34 +02:00
Sara Sharon	f59374eb42	mac80211: synchronize driver rx queues before removing a station Some devices, like iwlwifi, have RSS queues. This may cause a situation where a disassociation is handled in control path and results in station removal while there are prior RX frames that were still not processed in other queues. When they will be processed the station will be gone, and the frames will be dropped. Add a synchronization interface to avoid that. When driver returns from the synchronization mac80211 may remove the station. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:34 +02:00
Bob Copeland	60854fd945	mac80211: mesh: convert path table to rhashtable In the time since the mesh path table was implemented as an RCU-traversable, dynamically growing hash table, a generic RCU hashtable implementation was added to the kernel. Switch the mesh path table over to rhashtable to remove some code and also gain some features like automatic shrinking. Cc: Thomas Graf <tgraf@suug.ch> Cc: netdev@vger.kernel.org Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:33 +02:00
Bob Copeland	8f6fd83c6c	rhashtable: accept GFP flags in rhashtable_walk_init In certain cases, the 802.11 mesh pathtable code wants to iterate over all of the entries in the forwarding table from the receive path, which is inside an RCU read-side critical section. Enable walks inside atomic sections by allowing GFP_ATOMIC allocations for the walker state. Change all existing callsites to pass in GFP_KERNEL. Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Bob Copeland <me@bobcopeland.com> [also adjust gfs2/glock.c and rhashtable tests] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:32 +02:00
Bob Copeland	947c2a0ecc	mac80211: mesh: embed known gates list in struct mesh_path The mesh path table uses a struct mesh_node in its hlists in order to support a resizable hash table: the mesh_node provides an indirection to the actual mesh path so that two different bucket lists can point to the same path entry. However, for the known gates list, we don't need this indirection because there is ever only one list. So we can just embed the hlist_node in the mesh path itself, which simplifies things a bit and saves a linear search whenever we need to find an item in the list. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:32 +02:00
Bob Copeland	b15dc38b98	mac80211: mesh: factor out common mesh path allocation code Remove duplicate code to allocate and initialize a mesh path or mesh proxy path. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:31 +02:00
Bob Copeland	443954815b	mac80211: mesh: don't hash sdata in mpath tables Now that the sdata pointer is the same for all entries of a path table, hashing it is pointless, so hash only the address. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:31 +02:00
Bob Copeland	2bdaf386f9	mac80211: mesh: move path tables into if_mesh The mesh path and mesh gate hashtables are global, containing all of the mpaths for every mesh interface, but the paths are all tied logically to a single interface. The common case is just a single mesh interface, so optimize for that by moving the global hashtable into the per-interface struct. Doing so allows us to drop sdata pointer comparisons inside the lookups and also saves a few bytes of BSS and data. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:30 +02:00
Jouni Malinen	e345f44f2b	mac80211: Support a scan request for a specific BSSID If the cfg80211 scan trigger operation specifies a single BSSID, use that value instead of the wildcard BSSID in the Probe Request frames. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:28 +02:00
Jouni Malinen	818965d391	cfg80211: Allow a scan request for a specific BSSID This allows scans for a specific BSSID to be optimized by the user space application by requesting the driver to set the Probe Request frame BSSID field (Address 3) to the specified BSSID instead of the wildcard BSSID. This prevents other APs from replying which reduces airtime need and latency in getting the response from the target AP through. This is an optimization and as such, it is acceptable for some of the drivers not to support the mechanism. If not supported, the wildcard BSSID will be used and more responses may be received. Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:28 +02:00
Arik Nemtsov	aa507a7bc5	mac80211: recalc min_def chanctx even when chandef is identical The min_def chanctx is affected not only by the current chandef, but sometimes also by other stations on the vif. There's a valid scenario where a TDLS peer can widen its BW, thereby causing the min_def to increase. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:51:08 +02:00
Arik Nemtsov	59021c6759	mac80211: TDLS: change BW calculation for WIDER_BW peers The previous approach simply ignored chandef restrictions when calculating the appropriate peer BW for a WIDER_BW peer. This could result in a regulatory violation if both peers indicated 80MHz support, but the regdomain forbade it. Change the approach to setting a WIDER_BW peer's BW. Don't exempt it from the chandef width at first. If during TDLS negotiation the chandef width is upgraded, update the peer's BW to match. Fixes: `0fabfaafec` ("mac80211: upgrade BW of TDLS peers when possible") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:50:52 +02:00
Arik Nemtsov	db8d99774c	mac80211: TDLS: always downgrade invalid chandefs Even if the current chandef width is equal to the station's max-BW, it doesn't mean it's a valid width for TDLS. Make sure to always check regulatory constraints in these cases. Fixes: `0fabfaafec` ("mac80211: upgrade BW of TDLS peers when possible") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:50:32 +02:00
Felix Fietkau	c3732a7b37	mac80211: fix AP buffered multicast frames with queue control and txq Buffered multicast frames must be passed to the driver directly via drv_tx instead of going through the txq, otherwise they cannot easily be scheduled to be sent after DTIM. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:50:17 +02:00
Sara Sharon	f980ebc058	mac80211: allow not sending MIC up from driver for HW crypto When HW crypto is used, there's no need for the CCMP/GCMP MIC to be available to mac80211, and the hardware might have removed it already after checking. The MIC is also useless to have when the frame is already decrypted, so allow indicating that it's not present. Since we are running out of bits in mac80211_rx_flags, make the flags field a u64. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:48:56 +02:00
Johannes Berg	162dd6a725	mac80211: allow drivers to report CLOCK_BOOTTIME for scan results This was requested by Android, and the appropriate cfg80211 API had been added by Dmitry. Support it in mac80211, allowing drivers to provide the timestamp. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:48:55 +02:00
Lorenzo Bianconi	646e76bb5d	mac80211: parse VHT info in injected frames Add VHT radiotap parsing support to ieee80211_parse_tx_radiotap(). That capability has been tested using a d-link dir-860l rev b1 running OpenWrt trunk and mt76 driver Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:48:54 +02:00
João Paulo Rechi Vita	1948b2a2ec	rfkill: Use switch to demux userspace operations Using a switch to handle different ev.op values in rfkill_fop_write() makes the code easier to extend, as out-of-range values can always be handled by the default case. Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> [roll in fix for RFKILL_OP_CHANGE from Jouni] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:48:53 +02:00
Johannes Berg	98bd147d79	wext: unregister_pernet_subsys() on notifier registration failure If register_netdevice_notifier() fails (which in practice it can't right now), we should call unregister_pernet_subsys(). Do that. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:48:44 +02:00
Eric Dumazet	4ce7e93cb3	tcp: rate limit ACK sent by SYN_RECV request sockets Attackers like to use SYNFLOOD targeting one 5-tuple, as they hit a single RX queue (and cpu) on the victim. If they use random sequence numbers in their SYN, we detect they do not match the expected window and send back an ACK. This patch adds a rate limitation, so that the effect of such attacks is limited to ingress only. We roughly double our ability to absorb such attacks. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Maciej Żenczykowski <maze@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:20 -04:00
Eric Dumazet	a9d6532b56	ipv4: tcp: set SOCK_USE_WRITE_QUEUE for ip_send_unicast_reply() TCP uses per cpu 'sockets' to send some packets : - RST packets ( tcp_v4_send_reset()) ) - ACK packets for SYN_RECV and TIMEWAIT sockets By setting SOCK_USE_WRITE_QUEUE flag, we tell sock_wfree() to not call sk_write_space() since these internal sockets do not care. This gives a small performance improvement, merely by allowing cpu to properly predict the sock_wfree() conditional branch, and avoiding one atomic operation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:20 -04:00
Eric Dumazet	9caad86415	tcp: increment sk_drops for listeners Goal: packets dropped by a listener are accounted for. This adds tcp_listendrop() helper, and clears sk_drops in sk_clone_lock() so that children do not inherit their parent drop count. Note that we no longer increment LINUX_MIB_LISTENDROPS counter when sending a SYNCOOKIE, since the SYN packet generated a SYNACK. We already have a separate LINUX_MIB_SYNCOOKIESSENT Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:20 -04:00
Eric Dumazet	532182cd61	tcp: increment sk_drops for dropped rx packets Now ss can report sk_drops, we can instruct TCP to increment this per socket counter when it drops an incoming frame, to refine monitoring and debugging. Following patch takes care of listeners drops. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:20 -04:00
Eric Dumazet	15239302ed	sock_diag: add SK_MEMINFO_DROPS Reporting sk_drops to user space was available for UDP sockets using /proc interface. Add this to sock_diag, so that we can have the same information available to ss users, and we'll be able to add sk_drops indications for TCP sockets as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:20 -04:00
Eric Dumazet	3b24d854cb	tcp/dccp: do not touch listener sk_refcnt under synflood When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes. By letting listeners use SOCK_RCU_FREE infrastructure, we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets, only listeners are impacted by this change. Peak performance under SYNFLOOD is increased by ~33% : On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps Most consuming functions are now skb_set_owner_w() and sock_wfree() contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:20 -04:00
Eric Dumazet	2d331915a0	tcp/dccp: use rcu locking in inet_diag_find_one_icsk() RX packet processing holds rcu_read_lock(), so we can remove pairs of rcu_read_lock()/rcu_read_unlock() in lookup functions if inet_diag also holds rcu before calling them. This is needed anyway as __inet_lookup_listener() and inet6_lookup_listener() will soon no longer increment refcount on the found listener. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:19 -04:00
Eric Dumazet	ee3cf32a4a	tcp/dccp: remove BH disable/enable in lookup Since linux 2.6.29, lookups only use rcu locking. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:19 -04:00
Eric Dumazet	ca065d0cf8	udp: no longer use SLAB_DESTROY_BY_RCU Tom Herbert would like not touching UDP socket refcnt for encapsulated traffic. For this to happen, we need to use normal RCU rules, with a grace period before freeing a socket. UDP sockets are not short lived in the high usage case, so the added cost of call_rcu() should not be a concern. This actually removes a lot of complexity in UDP stack. Multicast receives no longer need to hold a bucket spinlock. Note that ip early demux still needs to take a reference on the socket. Same remark for functions used by xt_socket and xt_PROXY netfilter modules, but this might be changed later. Performance for a single UDP socket receiving flood traffic from many RX queues/cpus. Simple udp_rx using simple recvfrom() loop : 438 kpps instead of 374 kpps : 17 % increase of the peak rate. v2: Addressed Willem de Bruijn feedback in multicast handling - keep early demux break in __udp4_lib_demux_lookup() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Willem de Bruijn <willemb@google.com> Tested-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:19 -04:00
Eric Dumazet	a4298e4522	net: add SOCK_RCU_FREE socket flag We want a generic way to insert an RCU grace period before socket freeing for cases where RCU_SLAB_DESTROY_BY_RCU is adding too much overhead. SLAB_DESTROY_BY_RCU strict rules force us to take a reference on the socket sk_refcnt, and it is a performance problem for UDP encapsulation, or TCP synflood behavior, as many CPUs might attempt the atomic operations on a shared sk_refcnt UDP sockets and TCP listeners can set SOCK_RCU_FREE so that their lookup can use traditional RCU rules, without refcount changes. They can set the flag only once hashed and visible by other cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Tested-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:19 -04:00
Bastien Philbert	c862cc9b70	bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr This fixes the incorrect variable assignment on error path in br_sysfs_addbr for when the call to kobject_create_and_add fails to assign the value of -EINVAL to the returned variable of err rather then incorrectly return zero making callers think this function has succeededed due to the previous assignment being assigned zero when assigning it the successful return value of the call to sysfs_create_group which is zero. Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 16:12:37 -04:00
Haishuang Yan	be447f3054	ipv6: l2tp: fix a potential issue in l2tp_ip6_recv pskb_may_pull() can change skb->data, so we have to load ptr/optr at the right place. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 16:00:28 -04:00
Haishuang Yan	5745b8232e	ipv4: l2tp: fix a potential issue in l2tp_ip_recv pskb_may_pull() can change skb->data, so we have to load ptr/optr at the right place. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 16:00:28 -04:00
Soheil Hassas Yeganeh	c14ac9451c	sock: enable timestamping using control messages Currently, SOL_TIMESTAMPING can only be enabled using setsockopt. This is very costly when users want to sample writes to gather tx timestamps. Add support for enabling SO_TIMESTAMPING via control messages by using tsflags added in `struct sockcm_cookie` (added in the previous patches in this series) to set the tx_flags of the last skb created in a sendmsg. With this patch, the timestamp recording bits in tx_flags of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg. Please note that this is only effective for overriding the recording timestamps flags. Users should enable timestamp reporting (e.g., SOF_TIMESTAMPING_SOFTWARE \| SOF_TIMESTAMPING_OPT_ID) using socket options and then should ask for SOF_TIMESTAMPING_TX_* using control messages per sendmsg to sample timestamps for each write. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 15:50:30 -04:00
Soheil Hassas Yeganeh	ad1e46a837	ipv6: process socket-level control messages in IPv6 Process socket-level control messages by invoking __sock_cmsg_send in ip6_datagram_send_ctl for control messages on the SOL_SOCKET layer. This makes sure whenever ip6_datagram_send_ctl is called for udp and raw, we also process socket-level control messages. This is a bit uglier than IPv4, since IPv6 does not have something like ipcm_cookie. Perhaps we can later create a control message cookie for IPv6? Note that this commit interprets new control messages that were ignored before. As such, this commit does not change the behavior of IPv6 control messages. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 15:50:30 -04:00
Soheil Hassas Yeganeh	24025c465f	ipv4: process socket-level control messages in IPv4 Process socket-level control messages by invoking __sock_cmsg_send in ip_cmsg_send for control messages on the SOL_SOCKET layer. This makes sure whenever ip_cmsg_send is called in udp, icmp, and raw, we also process socket-level control messages. Note that this commit interprets new control messages that were ignored before. As such, this commit does not change the behavior of IPv4 control messages. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 15:50:30 -04:00
Soheil Hassas Yeganeh	3dd17e63f5	sock: accept SO_TIMESTAMPING flags in socket cmsg Accept SO_TIMESTAMPING in control messages of the SOL_SOCKET level as a basis to accept timestamping requests per write. This implementation only accepts TX recording flags (i.e., SOF_TIMESTAMPING_TX_HARDWARE, SOF_TIMESTAMPING_TX_SOFTWARE, SOF_TIMESTAMPING_TX_SCHED, and SOF_TIMESTAMPING_TX_ACK) in control messages. Users need to set reporting flags (e.g., SOF_TIMESTAMPING_OPT_ID) per socket via socket options. This commit adds a tsflags field in sockcm_cookie which is set in __sock_cmsg_send. It only override the SOF_TIMESTAMPING_TX_* bits in sockcm_cookie.tsflags allowing the control message to override the recording behavior per write, yet maintaining the value of other flags. This patch implements validating the control message and setting tsflags in struct sockcm_cookie. Next commits in this series will actually implement timestamping per write for different protocols. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 15:50:30 -04:00
Soheil Hassas Yeganeh	6b084928ba	tcp: use one bit in TCP_SKB_CB to mark ACK timestamps Currently, to avoid a cache line miss for accessing skb_shinfo, tcp_ack_tstamp skips socket that do not have SOF_TIMESTAMPING_TX_ACK bit set in sk_tsflags. This is implemented based on an implicit assumption that the SOF_TIMESTAMPING_TX_ACK is set via socket options for the duration that ACK timestamps are needed. To implement per-write timestamps, this check should be removed and replaced with a per-packet alternative that quickly skips packets missing ACK timestamps marks without a cache-line miss. To enable per-packet marking without a cache line miss, use one bit in TCP_SKB_CB to mark a whether a SKB might need a ack tx timestamp or not. Further checks in tcp_ack_tstamp are not modified and work as before. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 15:50:29 -04:00
Soheil Hassas Yeganeh	6db8b963a7	tcp: accept SOF_TIMESTAMPING_OPT_ID for passive TFO SOF_TIMESTAMPING_OPT_ID is set to get data-independent IDs to associate timestamps with send calls. For TCP connections, tp->snd_una is used as the starting point to calculate relative IDs. This socket option will fail if set before the handshake on a passive TCP fast open connection with data in SYN or SYN/ACK, since setsockopt requires the connection to be in the ESTABLISHED state. To address these, instead of limiting the option to the ESTABLISHED state, accept the SOF_TIMESTAMPING_OPT_ID option as long as the connection is not in LISTEN or CLOSE states. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 15:50:29 -04:00
Willem de Bruijn	39771b127b	sock: break up sock_cmsg_snd into __sock_cmsg_snd and loop To process cmsg's of the SOL_SOCKET level in addition to cmsgs of another level, protocols can call sock_cmsg_send(). This causes a double walk on the cmsghdr list, one for SOL_SOCKET and one for the other level. Extract the inner demultiplex logic from the loop that walks the list, to allow having this called directly from a walker in the protocol specific code. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 15:50:29 -04:00
Linus Torvalds	4a2d057e4f	Merge branch 'PAGE_CACHE_SIZE-removal' Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov: "PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. Let's stop pretending that pages in page cache are special. They are not. The first patch with most changes has been done with coccinelle. The second is manual fixups on top. The third patch removes macros definition" [ I was planning to apply this just before rc2, but then I spaced out, so here it is right _after_ rc2 instead. As Kirill suggested as a possibility, I could have decided to only merge the first two patches, and leave the old interfaces for compatibility, but I'd rather get it all done and any out-of-tree modules and patches can trivially do the converstion while still also working with older kernels, so there is little reason to try to maintain the redundant legacy model. - Linus ] * PAGE_CACHE_SIZE-removal: mm: drop PAGE_CACHE_* and page_cache_{get,release} definition mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	2016-04-04 10:50:24 -07:00
Kirill A. Shutemov	ea1754a084	mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Herbert Xu	ef609c238a	sunrpc: Fix skcipher/shash conversion The skcpiher/shash conversion introduced a number of bugs in the sunrpc code: 1) Missing calls to skcipher_request_set_tfm lead to crashes. 2) The allocation size of shash_desc is too small which leads to memory corruption. Fixes: `3b5cf20cf4` ("sunrpc: Use skcipher and ahash/shash") Reported-by: J. Bruce Fields <bfields@fieldses.org> Tested-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-04-04 21:45:51 +08:00
Haishuang Yan	7822ce73e6	netlink: use nla_get_in_addr and nla_put_in_addr for ipv4 address Since nla_get_in_addr and nla_put_in_addr were implemented, so use them appropriately. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-02 20:15:58 -04:00
Yuchung Cheng	2349262397	tcp: remove cwnd moderation after recovery For non-SACK connections, cwnd is lowered to inflight plus 3 packets when the recovery ends. This is an optional feature in the NewReno RFC 2582 to reduce the potential burst when cwnd is "re-opened" after recovery and inflight is low. This feature is questionably effective because of PRR: when the recovery ends (i.e., snd_una == high_seq) NewReno holds the CA_Recovery state for another round trip to prevent false fast retransmits. But if the inflight is low, PRR will overwrite the moderated cwnd in tcp_cwnd_reduction() later regardlessly. So if a receiver responds bogus ACKs (i.e., acking future data) to speed up transfer after recovery, it can only induce a burst up to a window worth of data packets by acking up to SND.NXT. A restart from (short) idle or receiving streched ACKs can both cause such bursts as well. On the other hand, if the recovery ends because the sender detects the losses were spurious (e.g., reordering). This feature unconditionally lowers a reverted cwnd even though nothing was lost. By principle loss recovery module should not update cwnd. Further pacing is much more effective to reduce burst. Hence this patch removes the cwnd moderation feature. v2 changes: revised commit message on bogus ACKs and burst, and missing signature Signed-off-by: Matt Mathis <mattmathis@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-02 20:11:43 -04:00
Linus Torvalds	05cf8077e5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Missing device reference in IPSEC input path results in crashes during device unregistration. From Subash Abhinov Kasiviswanathan. 2) Per-queue ISR register writes not being done properly in macb driver, from Cyrille Pitchen. 3) Stats accounting bugs in bcmgenet, from Patri Gynther. 4) Lightweight tunnel's TTL and TOS were swapped in netlink dumps, from Quentin Armitage. 5) SXGBE driver has off-by-one in probe error paths, from Rasmus Villemoes. 6) Fix race in save/swap/delete options in netfilter ipset, from Vishwanath Pai. 7) Ageing time of bridge not set properly when not operating over a switchdev device. Fix from Haishuang Yan. 8) Fix GRO regression wrt nested FOU/GUE based tunnels, from Alexander Duyck. 9) IPV6 UDP code bumps wrong stats, from Eric Dumazet. 10) FEC driver should only access registers that actually exist on the given chipset, fix from Fabio Estevam. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits) net: mvneta: fix changing MTU when using per-cpu processing stmmac: fix MDIO settings Revert "stmmac: Fix 'eth0: No PHY found' regression" stmmac: fix TX normal DESC net: mvneta: use cache_line_size() to get cacheline size net: mvpp2: use cache_line_size() to get cacheline size net: mvpp2: fix maybe-uninitialized warning tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter net: usb: cdc_ncm: adding Telit LE910 V2 mobile broadband card rtnl: fix msg size calculation in if_nlmsg_size() fec: Do not access unexisting register in Coldfire net: mvneta: replace MVNETA_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES net: mvpp2: replace MVPP2_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES net: dsa: mv88e6xxx: Clear the PDOWN bit on setup net: dsa: mv88e6xxx: Introduce _mv88e6xxx_phy_page_{read, write} bpf: make padding in bpf_tunnel_key explicit ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates bnxt_en: Fix ethtool -a reporting. bnxt_en: Fix typo in bnxt_hwrm_set_pause_common(). bnxt_en: Implement proper firmware message padding. ...	2016-04-01 20:03:33 -05:00
Daniel Borkmann	5a5abb1fa3	tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter Sasha Levin reported a suspicious rcu_dereference_protected() warning found while fuzzing with trinity that is similar to this one: [ 52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage! [ 52.765688] other info that might help us debug this: [ 52.765695] rcu_scheduler_active = 1, debug_locks = 1 [ 52.765701] 1 lock held by a.out/1525: [ 52.765704] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20 [ 52.765721] stack backtrace: [ 52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264 [...] [ 52.765768] Call Trace: [ 52.765775] [<ffffffff813e488d>] dump_stack+0x85/0xc8 [ 52.765784] [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110 [ 52.765792] [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90 [ 52.765801] [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun] [ 52.765810] [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun] [ 52.765818] [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210 [ 52.765827] [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun] [ 52.765834] [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690 [ 52.765843] [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60 [ 52.765850] [<ffffffff81261519>] SyS_ioctl+0x79/0x90 [ 52.765858] [<ffffffff81003ba2>] do_syscall_64+0x62/0x140 [ 52.765866] [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25 Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH, DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices. Since the fix in `f91ff5b9ff` ("net: sk_{detach\|attach}_filter() rcu fixes") sk_attach_filter()/sk_detach_filter() now dereferences the filter with rcu_dereference_protected(), checking whether socket lock is held in control path. Since its introduction in `9940516259` ("tun: socket filter support"), tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the sock_owned_by_user(sk) doesn't apply in this specific case and therefore triggers the false positive. Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair that is used by tap filters and pass in lockdep_rtnl_is_held() for the rcu_dereference_protected() checks instead. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-01 14:33:46 -04:00
Nicolas Dichtel	c57c7a95da	rtnl: fix msg size calculation in if_nlmsg_size() Size of the attribute IFLA_PHYS_PORT_NAME was missing. Fixes: `db24a9044e` ("net: add support for phys_port_name") CC: David Ahern <dsahern@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-31 16:49:54 -04:00
Daniel Borkmann	c0e760c9c6	bpf: make padding in bpf_tunnel_key explicit Make the 2 byte padding in struct bpf_tunnel_key between tunnel_ttl and tunnel_label members explicit. No issue has been observed, and gcc/llvm does padding for the old struct already, where tunnel_label was not yet present, so the current code works, but since it's part of uapi, make sure we don't introduce holes in structs. Therefore, add tunnel_ext that we can use generically in future (f.e. to flag OAM messages for backends, etc). Also add the offset to the compat tests to be sure should some compilers not padd the tail of the old version of bpf_tunnel_key. Fixes: `4018ab1875` ("bpf: support flow label for bpf_skb_{set, get}_tunnel_key") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-30 19:01:33 -04:00
Eric Dumazet	2d4212261f	ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates IPv6 counters updates use a different macro than IPv4. Fixes: `36cbb2452c` ("udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Rick Jones <rick.jones2@hp.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-30 19:01:33 -04:00
Alexander Duyck	c3483384ee	gro: Allow tunnel stacking in the case of FOU/GUE This patch should fix the issues seen with a recent fix to prevent tunnel-in-tunnel frames from being generated with GRO. The fix itself is correct for now as long as we do not add any devices that support NETIF_F_GSO_GRE_CSUM. When such a device is added it could have the potential to mess things up due to the fact that the outer transport header points to the outer UDP header and not the GRE header as would be expected. Fixes: `fac8e0f579` ("tunnels: Don't apply GRO to multiple layers of encapsulation.") Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-30 16:02:33 -04:00
Marcelo Ricardo Leitner	28fd34985b	sctp: really allow using GFP_KERNEL on sctp_packet_transmit Somehow my patch for commit `cea8768f33` ("sctp: allow sctp_transmit_packet and others to use gfp") missed two important chunks, which are now added. Fixes: `cea8768f33` ("sctp: allow sctp_transmit_packet and others to use gfp") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-By: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-30 15:41:22 -04:00
Haishuang Yan	5e263f7126	bridge: Allow set bridge ageing time when switchdev disabled When NET_SWITCHDEV=n, switchdev_port_attr_set will return -EOPNOTSUPP, we should ignore this error code and continue to set the ageing time. Fixes: `c62987bbd8` ("bridge: push bridge setting ageing_time down to switchdev") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-30 15:38:13 -04:00
Liping Zhang	8fef24ca90	netfilter: ip6t_SYNPROXY: remove magic number for hop_limit Replace '64' with the per-net ipv6_devconf_all's hop_limit when building the ipv6 header. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-29 13:31:17 +02:00
Stephane Bryant	8d45ff22f1	netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR This makes nf queues use NFQA_VLAN and NFQA_L2HDR in verdict to modify the original skb Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-29 13:29:30 +02:00
Stephane Bryant	15824ab29f	netfilter: bridge: pass L2 header and VLAN as netlink attributes in queues to userspace - This creates 2 netlink attribute NFQA_VLAN and NFQA_L2HDR. - These are filled up for the PF_BRIDGE family on the way to userspace. - NFQA_VLAN is a nested attribute, with the NFQA_VLAN_PROTO and the NFQA_VLAN_TCI carrying the corresponding vlan_proto and vlan_tci fields from the skb using big endian ordering (and using the CFI bit as the VLAN_TAG_PRESENT flag in vlan_tci as in the skb) Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-29 13:26:38 +02:00

... 2 3 4 5 6 ...

41725 Commits