linux

Author	SHA1	Message	Date
Ben Hutchings	baceced932	batman-adv: Fix double-put of vlan object Each batadv_tt_local_entry hold a single reference to a batadv_softif_vlan. In case a new entry cannot be added to the hash table, the error path puts the reference, but the reference will also now be dropped by batadv_tt_local_entry_release(). Fixes: `a33d970d0b` ("batman-adv: Fix reference counting of vlan object for tt_local_entry") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-29 04:01:47 -04:00
Sven Eckelmann	9c4604a298	batman-adv: Fix use-after-free/double-free of tt_req_node The tt_req_node is added and removed from a list inside a spinlock. But the locking is sometimes removed even when the object is still referenced and will be used later via this reference. For example batadv_send_tt_request can create a new tt_req_node (including add to a list) and later re-acquires the lock to remove it from the list and to free it. But at this time another context could have already removed this tt_req_node from the list and freed it. CPU#0 batadv_batman_skb_recv from net_device 0 -> batadv_iv_ogm_receive -> batadv_iv_ogm_process -> batadv_iv_ogm_process_per_outif -> batadv_tvlv_ogm_receive -> batadv_tvlv_ogm_receive -> batadv_tvlv_containers_process -> batadv_tvlv_call_handler -> batadv_tt_tvlv_ogm_handler_v1 -> batadv_tt_update_orig -> batadv_send_tt_request -> batadv_tt_req_node_new spin_lock(...) allocates new tt_req_node and adds it to list spin_unlock(...) return tt_req_node CPU#1 batadv_batman_skb_recv from net_device 1 -> batadv_recv_unicast_tvlv -> batadv_tvlv_containers_process -> batadv_tvlv_call_handler -> batadv_tt_tvlv_unicast_handler_v1 -> batadv_handle_tt_response spin_lock(...) tt_req_node gets removed from list and is freed spin_unlock(...) CPU#0 <- returned to batadv_send_tt_request spin_lock(...) tt_req_node gets removed from list and is freed MEMORY CORRUPTION/SEGFAULT/... spin_unlock(...) This can only be solved via reference counting to allow multiple contexts to handle the list manipulation while making sure that only the last context holding a reference will free the object. Fixes: `a73105b8d4` ("batman-adv: improved client announcement mechanism") Signed-off-by: Sven Eckelmann <sven@narfation.org> Tested-by: Martin Weinelt <martin@darmstadt.freifunk.net> Tested-by: Amadeus Alfa <amadeus@chemnitz.freifunk.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-29 04:01:47 -04:00
Simon Wunderlich	0b3dd7dfb8	batman-adv: replace WARN with rate limited output on non-existing VLAN If a VLAN tagged frame is received and the corresponding VLAN is not configured on the soft interface, it will splat a WARN on every packet received. This is a quite annoying behaviour for some scenarios, e.g. if bat0 is bridged with eth0, and there are arbitrary VLAN tagged frames from Ethernet coming in without having any VLAN configuration on bat0. The code should probably create vlan objects on the fly and transparently transport these VLAN-tagged Ethernet frames, but until this is done, at least the WARN splat should be replaced by a rate limited output. Fixes: `354136bcc3` ("batman-adv: fix kernel crash due to missing NULL checks") Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-29 04:01:47 -04:00
daniel	0888d5f3c0	Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address The bridge is falsly dropping ipv6 mulitcast packets if there is: 1. No ipv6 address assigned on the brigde. 2. No external mld querier present. 3. The internal querier enabled. When the bridge fails to build mld queries, because it has no ipv6 address, it slilently returns, but keeps the local querier enabled. This specific case causes confusing packet loss. Ipv6 multicast snooping can only work if: a) An external querier is present OR b) The bridge has an ipv6 address an is capable of sending own queries Otherwise it has to forward/flood the ipv6 multicast traffic, because snooping cannot work. This patch fixes the issue by adding a flag to the bridge struct that indicates that there is currently no ipv6 address assinged to the bridge and returns a false state for the local querier in __br_multicast_querier_exists(). Special thanks to Linus Lüssing. Fixes: `d1d81d4c3d` ("bridge: check return value of ipv6_dev_get_saddr()") Signed-off-by: Daniel Danzberger <daniel@dd-wrt.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-28 08:03:04 -04:00
Jouni Malinen	126e755732	mac80211: Fix mesh estab_plinks counting in STA removal case If a user space program (e.g., wpa_supplicant) deletes a STA entry that is currently in NL80211_PLINK_ESTAB state, the number of established plinks counter was not decremented and this could result in rejecting new plink establishment before really hitting the real maximum plink limit. For !user_mpm case, this decrementation is handled by mesh_plink_deactive(). Fix this by decrementing estab_plinks on STA deletion (mesh_sta_cleanup() gets called from there) so that the counter has a correct value and the Beacon frame advertisement in Mesh Configuration element shows the proper value for capability to accept additional peers. Cc: stable@vger.kernel.org Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>	2016-06-28 12:39:50 +02:00
Amitoj Kaur Chawla	56e2f23b72	caif: Remove unneeded header file Drop redundant include of moduleparam.h The Coccinelle semantic patch used to make this change is as follows: @ includesmodule @ @@ #include <linux/module.h> @ depends on includesmodule @ @@ - #include <linux/moduleparam.h> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-28 05:26:14 -04:00
David Ahern	637c841dd7	net: diag: Add support to filter on device index Add support to inet_diag facility to filter sockets based on device index. If an interface index is in the filter only sockets bound to that index (sk_bound_dev_if) are returned. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-28 05:25:04 -04:00
Tom Goff	70a0dec451	ipmr/ip6mr: Initialize the last assert time of mfc entries. This fixes wrong-interface signaling on 32-bit platforms for entries created when jiffies > 2^31 + MFC_ASSERT_THRESH. Signed-off-by: Tom Goff <thomas.goff@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-28 04:14:09 -04:00
Heiko Carstens	eb090ad2ad	s390/iucv: use basic blocks for iucv inline assemblies Use only simple inline assemblies which consist of a single basic block if the register asm construct is being used. Otherwise gcc would generate broken code if the compiler option --sanitize-coverage=trace-pc would be used. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2016-06-28 09:32:31 +02:00
Huw Davies	3f09354ac8	netlabel: Implement CALIPSO config functions for SMACK. SMACK uses similar functions to control CIPSO, these are the equivalent functions for CALIPSO and follow exactly the same semantics. int netlbl_cfg_calipso_add(struct calipso_doi doi_def, struct netlbl_audit audit_info) Adds a CALIPSO doi. void netlbl_cfg_calipso_del(u32 doi, struct netlbl_audit audit_info) Removes a CALIPSO doi. int netlbl_cfg_calipso_map_add(u32 doi, const char domain, const struct in6_addr addr, const struct in6_addr mask, struct netlbl_audit *audit_info) Creates a mapping between a domain and a CALIPSO doi. If addr and mask are non-NULL this creates an address-selector type mapping. This also extends netlbl_cfg_map_del() to remove IPv6 address-selector mappings. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:06:18 -04:00
Huw Davies	4fee5242bf	calipso: Add a label cache. This works in exactly the same way as the CIPSO label cache. The idea is to allow the lsm to cache the result of a secattr lookup so that it doesn't need to perform the lookup for every skbuff. It introduces two sysctl controls: calipso_cache_enable - enables/disables the cache. calipso_cache_bucket_size - sets the size of a cache bucket. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:06:17 -04:00
Huw Davies	2e532b7028	calipso: Add validation of CALIPSO option. Lengths, checksum and the DOI are checked. Checking of the level and categories are left for the socket layer. CRC validation is performed in the calipso module to avoid unconditionally linking crc_ccitt() into ipv6. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:06:17 -04:00
Huw Davies	a04e71f631	netlabel: Pass a family parameter to netlbl_skbuff_err(). This makes it possible to route the error to the appropriate labelling engine. CALIPSO is far less verbose than CIPSO when encountering a bogus packet, so there is no need for a CALIPSO error handler. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:06:16 -04:00
Huw Davies	2917f57b6b	calipso: Allow the lsm to label the skbuff directly. In some cases, the lsm needs to add the label to the skbuff directly. A NF_INET_LOCAL_OUT IPv6 hook is added to selinux to match the IPv4 behaviour. This allows selinux to label the skbuffs that it requires. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:06:15 -04:00
Huw Davies	0868383b82	ipv6: constify the skb pointer of ipv6_find_tlv(). Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:06:15 -04:00
Huw Davies	e1adea9270	calipso: Allow request sockets to be relabelled by the lsm. Request sockets need to have a label that takes into account the incoming connection as well as their parent's label. This is used for the outgoing SYN-ACK and for their child full-socket. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:05:29 -04:00
Huw Davies	56ac42bc94	ipv6: Allow request socks to contain IPv6 options. If set, these will take precedence over the parent's options during both sending and child creation. If they're not set, the parent's options (if any) will be used. This is to allow the security_inet_conn_request() hook to modify the IPv6 options in just the same way that it already may do for IPv4. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:05:28 -04:00
Huw Davies	ceba1832b1	calipso: Set the calipso socket label to match the secattr. CALIPSO is a hop-by-hop IPv6 option. A lot of this patch is based on the equivalent CISPO code. The main difference is due to manipulating the options in the hop-by-hop header. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:51 -04:00
Huw Davies	3faa8f982f	netlabel: Move bitmap manipulation functions to the NetLabel core. This is to allow the CALIPSO labelling engine to use these. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:51 -04:00
Huw Davies	e67ae213c7	ipv6: Add ipv6_renew_options_kern() that accepts a kernel mem pointer. The functionality is equivalent to ipv6_renew_options() except that the newopt pointer is in kernel, not user, memory The kernel memory implementation will be used by the CALIPSO network labelling engine, which needs to be able to set IPv6 hop-by-hop options. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:50 -04:00
Huw Davies	d7cce01504	netlabel: Add support for removing a CALIPSO DOI. Remove a specified DOI through the NLBL_CALIPSO_C_REMOVE command. It requires the attribute: NLBL_CALIPSO_A_DOI. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:49 -04:00
Huw Davies	dc7de73f19	netlabel: Add support for creating a CALIPSO protocol domain mapping. This extends the NLBL_MGMT_C_ADD and NLBL_MGMT_C_ADDDEF commands to accept CALIPSO protocol DOIs. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:49 -04:00
Huw Davies	e1ce69df7e	netlabel: Add support for enumerating the CALIPSO DOI list. Enumerate the DOI list through the NLBL_CALIPSO_C_LISTALL command. It takes no attributes. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:48 -04:00
Huw Davies	a5e34490c3	netlabel: Add support for querying a CALIPSO DOI. Query a specified DOI through the NLBL_CALIPSO_C_LIST command. It requires the attribute: NLBL_CALIPSO_A_DOI. The reply will contain: NLBL_CALIPSO_A_MTYPE Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:47 -04:00
Huw Davies	cb72d38211	netlabel: Initial support for the CALIPSO netlink protocol. CALIPSO is a packet labelling protocol for IPv6 which is very similar to CIPSO. It is specified in RFC 5570. Much of the code is based on the current CIPSO code. This adds support for adding passthrough-type CALIPSO DOIs through the NLBL_CALIPSO_C_ADD command. It requires attributes: NLBL_CALIPSO_A_TYPE which must be CALIPSO_MAP_PASS. NLBL_CALIPSO_A_DOI. In passthrough mode the CALIPSO engine will map MLS secattr levels and categories directly to the packet label. At this stage, the major difference between this and the CIPSO code is that IPv6 may be compiled as a module. To allow for this the CALIPSO functions are registered at module init time. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:46 -04:00
Huw Davies	8f18e675c3	netlabel: Add an address family to domain hash entries. The reason is to allow different labelling protocols for different address families with the same domain. This requires the addition of an address family attribute in the netlink communication protocol. It is used in several messages: NLBL_MGMT_C_ADD and NLBL_MGMT_C_ADDDEF take it as an optional attribute for the unlabelled protocol. It may be one of AF_INET, AF_INET6 or AF_UNSPEC (to specify both address families). If it is missing, it defaults to AF_UNSPEC. NLBL_MGMT_C_LISTALL and NLBL_MGMT_C_LISTDEF return it as part of the enumeration of each item. Addtionally, it may be sent to LISTDEF to specify which address family to return. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:46 -04:00
Huw Davies	96a8f7f88d	netlabel: Mark rcu pointers with __rcu. This fixes sparse errors of the form: incompatible types in comparison expression (different address spaces) Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2016-06-27 15:02:45 -04:00
Stefan Hajnoczi	4192f672fa	vsock: make listener child lock ordering explicit There are several places where the listener and pending or accept queue child sockets are accessed at the same time. Lockdep is unhappy that two locks from the same class are held. Tell lockdep that it is safe and document the lock ordering. Originally Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> sent a similar patch asking whether this is safe. I have audited the code and also covered the vsock_pending_work() function. Suggested-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-27 10:44:46 -04:00
Paolo Abeni	48f1dcb55a	ipv6: enforce egress device match in per table nexthop lookups with the commit `8c14586fc3` ("net: ipv6: Use passed in table for nexthop lookups"), net hop lookup is first performed on route creation in the passed-in table. However device match is not enforced in table lookup, so the found route can be later discarded due to egress device mismatch and no global lookup will be performed. This cause the following to fail: ip link add dummy1 type dummy ip link add dummy2 type dummy ip link set dummy1 up ip link set dummy2 up ip route add 2001:db8:8086::/48 dev dummy1 metric 20 ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20 ip route add 2001:db8:8086::/48 dev dummy2 metric 21 ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21 RTNETLINK answers: No route to host This change fixes the issue enforcing device lookup in ip6_nh_lookup_table() v1->v2: updated commit message title Fixes: `8c14586fc3` ("net: ipv6: Use passed in table for nexthop lookups") Reported-and-tested-by: Beniamino Galvani <bgalvani@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-27 10:37:20 -04:00
David S. Miller	5db15872c5	Merge tag 'linux-can-next-for-4.8-20160623' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2016-06-17 this is a pull request of 4 patches for net-next/master. Arnd Bergmann's patch fixes a regresseion in af_can introduced in linux-can-next-for-4.8-20160617. There are two patches by Ramesh Shanmugasundaram, which add CAN-2.0 support to the rcar_canfd driver. And a patch by Ed Spiridonov that adds better error diagnoses messages to the Ed Spiridonov driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-27 10:33:42 -04:00
Amitoj Kaur Chawla	810bf11033	tipc: Use kmemdup instead of kmalloc and memcpy Replace calls to kmalloc followed by a memcpy with a direct call to kmemdup. The Coccinelle semantic patch used to make this change is as follows: @@ expression from,to,size,flag; statement S; @@ - to = $kmalloc\\|kzalloc$(size,flag); + to = kmemdup(from,size,flag); if (to==NULL \|\| ...) S - memcpy(to, from, size); Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-27 09:56:58 -04:00
David S. Miller	2b7c4f7a0e	Merge tag 'rxrpc-rewrite-20160622-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Get rid of conn bundle and transport structs Here's the next part of the AF_RXRPC rewrite. The primary purpose of this set is to get rid of the rxrpc_conn_bundle and rxrpc_transport structs. This simplifies things for future development of the connection handling. To this end, the following significant changes are made: (1) The rxrpc_connection struct is given pointers to the local and peer endpoints, inside the rxrpc_conn_parameters struct. Pointers to the transport's copy of these pointers are then redirected to the connection struct. (2) Exclusive connection handling is fixed. Exclusive connections should do just one call and then be retired. They are used in security negotiations and, I believe, the idea is to avoid reuse of negotiated security contexts. The current code is doing a single connection per socket and doing all the calls over that. With this change it gets a new connection for each call made. (3) A new sendmsg() control message marker is added to make individual calls operate over exclusive connections. This should be used in future in preference to the sockopt that marks a socket as "exclusive connection". (4) IDs for client connections initiated by a machine are now allocated from a global pool using the IDR facility and are unique across all client connections, no matter their destination. The IDR facility is then used to look up a connection on the connection ID alone. Other parameters are then verified afterwards. Note that the IDR facility may use a lot of memory if the IDs it holds are widely scattered. Given this, in a future commit, client connections will be retired if they are more than a certain distance from the last ID allocated. The client epoch is advanced by 1 each time the client ID counter wraps. Connections outside the current epoch will also be retired in a future commit. (5) The connection bundle concept is removed and the client connection tree is moved into the local endpoint. The queue for waiting for a call channel is moved to the rxrpc_connection struct as there can only be one connection for any particular key going to any particular peer now. (6) The rxrpc_transport struct is removed and the service connection tree is moved into the peer struct. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-26 16:01:54 -04:00
Eric Dumazet	4d202a0d31	net_sched: generalize bulk dequeue When qdisc bulk dequeue was added in linux-3.18 (commit `5772e9a346` "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"), it was constrained to some specific qdiscs. With some extra care, we can extend this to all qdiscs, so that typical traffic shaping solutions can benefit from small batches (8 packets in this patch). For example, HTB is often used on some multi queue device. And bonding/team are multi queue devices... Idea is to bulk-dequeue packets mapping to the same transmit queue. This brings between 35 and 80 % performance increase in HTB setup under pressure on a bonding setup : 1) NUMA node contention : 610,000 pps -> 1,110,000 pps 2) No node contention : 1,380,000 pps -> 1,930,000 pps Now we should work to add batches on the enqueue() side ;) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-25 12:19:35 -04:00
Eric Dumazet	338ed9b4de	net_sched: sch_htb: export class backlog in dumps We already get child qdisc qlen, we also can get its backlog so that class dumps can report it. Also replace qstats by a single drop counter, but move it in a separate cache line so that drops do not dirty useful cache lines. Tested: $ tc -s cl sh dev eth0 class htb 1:1 root leaf 3: prio 0 rate 1Gbit ceil 1Gbit burst 500000b cburst 500000b Sent 2183346912 bytes 9021815 pkt (dropped 2340774, overlimits 0 requeues 0) rate 1001Mbit 517543pps backlog 120758b 499p requeues 0 lended: 9021770 borrowed: 0 giants: 0 tokens: 9 ctokens: 9 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-25 12:19:35 -04:00
Eric Dumazet	008830bc32	net_sched: fq_codel: cache skb->truesize into skb->cb Now we defer skb drops, it makes sense to keep a copy of skb->truesize in struct codel_skb_cb to avoid one cache line miss per dropped skb in fq_codel_drop(), to reduce latencies a bit further. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-25 12:19:35 -04:00
Eric Dumazet	520ac30f45	net_sched: drop packets after root qdisc lock is released Qdisc performance suffers when packets are dropped at enqueue() time because drops (kfree_skb()) are done while qdisc lock is held, delaying a dequeue() draining the queue. Nominal throughput can be reduced by 50 % when this happens, at a time we would like the dequeue() to proceed as fast as possible. Even FQ is vulnerable to this problem, while one of FQ goals was to provide some flow isolation. This patch adds a 'struct sk_buff **to_free' parameter to all qdisc->enqueue(), and in qdisc_drop() helper. I measured a performance increase of up to 12 %, but this patch is a prereq so that future batches in enqueue() can fly. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-25 12:19:35 -04:00
Jiri Slaby	a2a5f1a2d2	net: ircomm, cleanup TIOCGSERIAL In ircomm_tty_get_serial_info, struct serial_struct is memset to 0 and then some members set to 0 explicitly. Remove the latter as it is obviously superfluous. And remove the retinfo check against NULL. copy_to_user will take care of that. Part of hub6 cleanup series. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Samuel Ortiz <samuel@sortiz.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-06-25 08:56:30 -07:00
Jarno Rajahalme	7d904c7bcd	openvswitch: Only set mark and labels with a commit flag. Only set conntrack mark or labels when the commit flag is specified. This makes sure we can not set them before the connection has been persisted, as in that case the mark and labels would be lost in an event of an userspace upcall. OVS userspace already requires the commit flag to accept setting ct_mark and/or ct_labels. Validate for this in the kernel API. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-25 11:55:51 -04:00
Jarno Rajahalme	1c1779fa54	openvswitch: Set mark and labels before confirming. Set conntrack mark and labels right before committing so that the initial conntrack NEW event has the mark and labels. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-25 11:55:51 -04:00
Arturo Borrero	0071e184a5	netfilter: nf_tables: add support for inverted logic in nft_lookup Introduce a new configuration option for this expression, which allows users to invert the logic of set lookups. In _init() we will now return EINVAL if NFT_LOOKUP_F_INV is in anyway related to a map lookup. The code in the _eval() function has been untangled and updated to sopport the XOR of options, as we should consider 4 cases: * lookup false, invert false -> NFT_BREAK * lookup false, invert true -> return w/o NFT_BREAK * lookup true, invert false -> return w/o NFT_BREAK * lookup true, invert true -> NFT_BREAK Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:29 +02:00
Pablo Neira Ayuso	82bec71d46	netfilter: nf_tables: get rid of NFT_BASECHAIN_DISABLED This flag was introduced to restore rulesets from the new netdev family, but since `5ebe0b0eec` ("netfilter: nf_tables: destroy basechain and rules on netdevice removal") the ruleset is released once the netdev is gone. This also removes nft_register_basechain() and nft_unregister_basechain() since they have no clients anymore after this rework. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:28 +02:00
Florian Westphal	3183ab8997	netfilter: conntrack: allow increasing bucket size via sysctl too No need to restrict this to module parameter. We export a copy of the real hash size -- when user alters the value we allocate the new table, copy entries etc before we update the real size to the requested one. This is also needed because the real size is used by concurrent readers and cannot be changed without synchronizing the conntrack generation seqcnt. We only allow changing this value from the initial net namespace. Tested using http-client-benchmark vs. httpterm with concurrent while true;do echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets done Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:28 +02:00
Pablo Neira Ayuso	8eee54be73	netfilter: nft_hash: support deletion of inactive elements New elements are inactive in the preparation phase, and its NFT_SET_ELEM_BUSY_MASK flag is set on. This busy flag doesn't allow us to delete it from the same transaction, following a sequence like: begin transaction add element X delete element X end transaction This sequence is valid and may be triggered by robots. To resolve this problem, allow deactivating elements that are active in the current generation (ie. those that has been just added in this batch). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:27 +02:00
Pablo Neira Ayuso	4e5001651f	netfilter: nft_rbtree: check for next generation when deactivating elements set->ops->deactivate() is invoked from nft_del_setelem() that happens from the transaction path, so we have to check if the object is active in the next generation, not the current. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:26 +02:00
Pablo Neira Ayuso	37a9cc5255	netfilter: nf_tables: add generation mask to sets Similar to ("netfilter: nf_tables: add generation mask to tables"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:26 +02:00
Pablo Neira Ayuso	664b0f8cd8	netfilter: nf_tables: add generation mask to chains Similar to ("netfilter: nf_tables: add generation mask to tables"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:25 +02:00
Pablo Neira Ayuso	f2a6d76676	netfilter: nf_tables: add generation mask to tables This patch addresses two problems: 1) The netlink dump is inconsistent when interfering with an ongoing transaction update for several reasons: 1.a) We don't honor the internal NFT_TABLE_INACTIVE flag, and we should be skipping these inactive objects in the dump. 1.b) We perform speculative deletion during the preparation phase, that may result in skipping active objects. 1.c) The listing order changes, which generates noise when tracking incremental ruleset update via tools like git or our own testsuite. 2) We don't allow to add and to update the object in the same batch, eg. add table x; add table x { flags dormant\; }. In order to resolve these problems: 1) If the user requests a deletion, the object becomes inactive in the next generation. Then, ignore objects that scheduled to be deleted from the lookup path, as they will be effectively removed in the next generation. 2) From the get/dump path, if the object is not currently active, we skip it. 3) Support 'add X -> update X' sequence from a transaction. After this update, we obtain a consistent list as long as we stay in the same generation. The userspace side can detect interferences through the generation counter so it can restart the dumping. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:24 +02:00
Pablo Neira Ayuso	889f7ee7c6	netfilter: nf_tables: add generic macros to check for generation mask Thus, we can reuse these to check the genmask of any object type, not only rules. This is required now that tables, chain and sets will get a generation mask field too in follow up patches. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:24 +02:00
Vishwanath Pai	7643507fe8	netfilter: xt_NFLOG: nflog-range does not truncate packets li->u.ulog.copy_len is currently ignored by the kernel, we should truncate the packet to either li->u.ulog.copy_len (if set) or copy_range before sending it to userspace. 0 is a valid input for copy_len, so add a new flag to indicate whether this was option was specified by the user or not. Add two flags to indicate whether nflog-size/copy_len was set or not. XT_NFLOG_F_COPY_LEN is for XT_NFLOG and NFLOG_F_COPY_LEN for nfnetlink_log On the userspace side, this was initially represented by the option nflog-range, this will be replaced by --nflog-size now. --nflog-range would still exist but does not do anything. Reported-by: Joe Dollard <jdollard@akamai.com> Reviewed-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:23 +02:00
Liping Zhang	e1dbbc5907	netfilter: nf_reject_ipv4: don't send tcp RST if the packet is non-TCP In iptables, if the user add a rule to send tcp RST and specify the non-TCP protocol, such as UDP, kernel will reject this request. But in nftables, this validity check only occurs in nft tool, i.e. only in userspace. This means that user can add such a rule like follows via nfnetlink: "nft add rule filter forward ip protocol udp reject with tcp reset" This will generate some confusing tcp RST packets. So we should send tcp RST only when it is TCP packet. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-24 11:03:22 +02:00

... 308 309 310 311 312 ...

57946 Commits