linux

Author	SHA1	Message	Date
Shlomo Pongratz	a1d0cd8ed5	net/udp_offload: Handle static checker complaints Fixed few issues around using __rcu prefix and rcu_assign_pointer, also fixed a warning print to use ntohs(port) and not htons(port). net/ipv4/udp_offload.c:112:9: error: incompatible types in comparison expression (different address spaces) net/ipv4/udp_offload.c:113:9: error: incompatible types in comparison expression (different address spaces) net/ipv4/udp_offload.c:176:19: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-23 12:59:08 -08:00
Christoph Paasch	3ad88cf70a	tcp: metrics: Handle v6/v4-mapped sockets in tcp-metrics A socket may be v6/v4-mapped. In that case sk->sk_family is AF_INET6, but the IP being used is actually an IPv4-address. Current's tcp-metrics will thus represent it as an IPv6-address: root@server:~# ip tcp_metrics ::ffff:10.1.1.2 age 22.920sec rtt 18750us rttvar 15000us cwnd 10 10.1.1.2 age 47.970sec rtt 16250us rttvar 10000us cwnd 10 This patch modifies the tcp-metrics so that they are able to handle the v6/v4-mapped sockets correctly. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-23 12:48:28 -08:00
Linus Torvalds	5ee7a81a9f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse update from Miklos Szeredi: "This contains a fix for a potential use-after-module-unload bug noticed by Al and caching improvements for read-only fuse filesystems by Andrew Gallagher" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: support clients that don't implement 'open' fuse: don't invalidate attrs when not using atime fuse: fix SetPageUptodate() condition in STORE fuse: fix pipe_buf_operations	2014-01-23 09:22:58 -08:00
Yann Droneaud	5ae5e991ee	6lowpan: add a license to 6lowpan_iphc module Since commit `8df8c56a5a`, 6lowpan_iphc is a module of its own. Unfortunately, it lacks some infrastructure to behave like a good kernel citizen: kernel: 6lowpan_iphc: module license 'unspecified' taints kernel. kernel: Disabling lock debugging due to kernel taint This patch adds the basic MODULE_LICENSE(); with GPL license: the code was copied from net/ieee802154/6lowpan.c which is GPL and the module exports symbol with EXPORT_SYMBOL_GPL();. Cc: Jukka Rissanen <jukka.rissanen@linux.intel.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:17 -08:00
Jiri Pirko	3bad540ed8	bonding: convert netlink to use slave data info api Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:17 -08:00
Jiri Pirko	ba7d49b1f0	rtnetlink: provide api for getting and setting slave info Recent patch bonding: add netlink attributes to slave link dev (`1d3ee88ae0`) Introduced yet another device specific way to access slave information over rtnetlink. There is one already there for bridge. This patch introduces generic way to do this, for getting and setting info as well by extending link_ops. Later on, this new interface will be used for bridge ports as well. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
Jiri Pirko	df7dbcbbaf	rtnetlink: put "BOND" into nl attribute names which are related to bonding Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
viresh kumar	f618002b0b	net/neighbour: queue work on power efficient wq Workqueue used in neighbour layer have no real dependency of scheduling these on the cpu which scheduled them. On a idle system, it is observed that an idle cpu wakes up many times just to service this work. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one. This patch replaces normal workqueues with power efficient versions. This doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is enabled. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
viresh kumar	906e073f3e	net/ipv4: queue work on power efficient wq Workqueue used in ipv4 layer have no real dependency of scheduling these on the cpu which scheduled them. On a idle system, it is observed that an idle cpu wakes up many times just to service this work. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one. This patch replaces normal workqueues with power efficient versions. This doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is enabled. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
FX Le Bail	7c90cc2d40	ipv6: enable anycast addresses as source addresses for datagrams This change allows to consider an anycast address valid as source address when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item. So, when sending a datagram with ancillary data, the unicast and anycast addresses are handled in the same way. - Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local on given interface or is global. - Uses it in ip6_datagram_send_ctl(). Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
Toshiaki Makita	bdf4351bbc	bridge: Remove unnecessary vlan_put_tag in br_handle_vlan br_handle_vlan() pushes HW accelerated vlan tag into skbuff when outgoing port is the bridge device. This is unnecessary because __netif_receive_skb_core() can handle skbs with HW accelerated vlan tag. In current implementation, __netif_receive_skb_core() needs to extract the vlan tag embedded in skb data. This could cause low network performance especially when receiving frames at a high frame rate on the bridge device. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:29:27 -08:00
Christoph Paasch	00ca9c5b2b	tcp: metrics: Fix rcu-race when deleting multiple entries In `bbf852b96e` I introduced the tmlist, which allows to delete multiple entries from the cache that match a specified destination if no source-IP is specified. However, as the cache is an RCU-list, we should not create this tmlist, as it will change the tcpm_next pointer of the element that will be deleted and so a thread iterating over the cache's entries while holding the RCU-lock might get "redirected" to this tmlist. This patch fixes this, by reverting back to the old behavior prior to `bbf852b96e`, which means that we simply change the tcpm_next pointer of the previous element (pp) to jump over the one we are deleting. The difference is that we call kfree_rcu() directly on the cache entry, which allows us to delete multiple entries from the list. Fixes: `bbf852b96e` (tcp: metrics: Delete all entries matching a certain destination) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:26:16 -08:00
Linus Torvalds	bb1281f2aa	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual rocket science stuff from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) neighbour.h: fix comment sched: Fix warning on make htmldocs caused by wait.h slab: struct kmem_cache is protected by slab_mutex doc: Fix typo in USB Gadget Documentation of/Kconfig: Spelling s/one/once/ mkregtable: Fix sscanf handling lp5523, lp8501: comment improvements thermal: rcar: comment spelling treewide: fix comments and printk msgs IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart() Documentation: update /proc/uptime field description Documentation: Fix size parameter for snprintf arm: fix comment header and macro name asm-generic: uaccess: Spelling s/a ny/any/ mtd: onenand: fix comment header doc: driver-model/platform.txt: fix a typo drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text doc: Fix typo (acces_process_vm -> access_process_vm) treewide: Fix typos in printk drivers/gpu/drm/qxl/Kconfig: reformat the help text ...	2014-01-22 21:21:55 -08:00
Harry Mason	29824310ce	sch_htb: let skb->priority refer to non-leaf class If the class in skb->priority is not a leaf, apply filters from the selected class, not the qdisc. This lets netfilter or user space partially classify the packet. Signed-off-by: Harry Mason <harry.mason@smoothwall.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 17:39:48 -08:00
Neil Horman	2d36097d26	af_packet: Add Queue mapping mode to af_packet fanout operation This patch adds a queue mapping mode to the fanout operation of af_packet sockets. This allows user space af_packet users to better filter on flows ingressing and egressing via a specific hardware queue, and avoids the potential packet reordering that can occur when FANOUT_CPU is being used and irq affinity varies. Tested successfully by myself. applies to net-next Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 17:35:50 -08:00
Miklos Szeredi	28a625cbc2	fuse: fix pipe_buf_operations Having this struct in module memory could Oops when if the module is unloaded while the buffer still persists in a pipe. Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal merge them into nosteal_pipe_buf_ops (this is the same as default_pipe_buf_ops except stealing the page from the buffer is not allowed). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2014-01-22 19:36:57 +01:00
Hannes Frederic Sowa	809fa972fd	reciprocal_divide: update/correction of the algorithm Jakub Zawadzki noticed that some divisions by reciprocal_divide() were not correct [1][2], which he could also show with BPF code after divisions are transformed into reciprocal_value() for runtime invariance which can be passed to reciprocal_divide() later on; reverse in BPF dump ended up with a different, off-by-one K in some situations. This has been fixed by Eric Dumazet in commit `aee636c480` ("bpf: do not use reciprocal divide"). This follow-up patch improves reciprocal_value() and reciprocal_divide() to work in all cases by using Granlund and Montgomery method, so that also future use is safe and without any non-obvious side-effects. Known problems with the old implementation were that division by 1 always returned 0 and some off-by-ones when the dividend and divisor where very large. This seemed to not be problematic with its current users, as far as we can tell. Eric Dumazet checked for the slab usage, we cannot surely say so in the case of flex_array. Still, in order to fix that, we propose an extension from the original implementation from commit `6a2d7a955d` resp. [3][4], by using the algorithm proposed in "Division by Invariant Integers Using Multiplication" [5], Torbjörn Granlund and Peter L. Montgomery, that is, pseudocode for q = n/d where q, n, d is in u32 universe: 1) Initialization: int l = ceil(log_2 d) uword m' = floor((1<<32)((1<<l)-d)/d)+1 int sh_1 = min(l,1) int sh_2 = max(l-1,0) 2) For q = n/d, all uword: uword t = (nm')>>32 q = (t+((n-t)>>sh_1))>>sh_2 The assembler implementation from Agner Fog [6] also helped a lot while implementing. We have tested the implementation on x86_64, ppc64, i686, s390x; on x86_64/haswell we're still half the latency compared to normal divide. Joint work with Daniel Borkmann. [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c [3] https://gmplib.org/~tege/division-paper.pdf [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556 [6] http://www.agner.org/optimize/asmlib.zip Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: Jesse Gross <jesse@nicira.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
Daniel Borkmann	89770b0a69	net: introduce reciprocal_scale helper and convert users As David Laight suggests, we shouldn't necessarily call this reciprocal_divide() when users didn't requested a reciprocal_value(); lets keep the basic idea and call it reciprocal_scale(). More background information on this topic can be found in [1]. Joint work with Hannes Frederic Sowa. [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html Suggested-by: David Laight <david.laight@aculab.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
Daniel Borkmann	f337db64af	random32: add prandom_u32_max and convert open coded users Many functions have open coded a function that returns a random number in range [0,N-1]. Under the assumption that we have a PRNG such as taus113 with being well distributed in [0, ~0U] space, we can implement such a function as uword t = (n*m')>>32, where m' is a random number obtained from PRNG, n the right open interval border and t our resulting random number, with n,m',t in u32 universe. Lets go with Joe and simply call it prandom_u32_max(), although technically we have an right open interval endpoint, but that we have documented. Other users can further be migrated to the new prandom_u32_max() function later on; for now, we need to make sure to migrate reciprocal_divide() users for the reciprocal_divide() follow-up fixup since their function signatures are going to change. Joint work with Hannes Frederic Sowa. Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
David S. Miller	14e481445d	net: Missing change from the ether_addr_copy() fixups. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 22:54:01 -08:00
David S. Miller	9be68c1ae0	net: Fix some fallout from the etner_addr_copy() changes. net/appletalk/aarp.c: In function ‘__aarp_send_query’: net/appletalk/aarp.c:137:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration] ... net/atm/lec.c: In function ‘send_to_lecd’: net/atm/lec.c:524:3: warning: passing argument 1 of ‘ether_addr_copy’ from incompatible pointer type [enabled by default] In file included from net/atm/lec.c:17:0: include/linux/etherdevice.h:227:20: note: expected ‘u8 ’ but argument is of type ‘unsigned char ()[6]’ ... net/caif/caif_usb.c: In function ‘cfusbl_create’: net/caif/caif_usb.c:108:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration] Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:57:26 -08:00
wangweidong	5bc1d1b4a2	sctp: remove macros sctp_bh_[un]lock_sock Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:41:36 -08:00
wangweidong	048ed4b626	sctp: remove macros sctp_{lock\|release}_sock Redefined {lock\|release}_sock to sctp_{lock\|release}_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:41:36 -08:00
wangweidong	387602dfdc	sctp: remove macros sctp_write_[un]_lock Redefined write_[un]lock to sctp_write_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:41 -08:00
wangweidong	3c8e43ba9f	sctp: remove macros sctp_spin_[un]lock Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:41 -08:00
wangweidong	79b91130a2	sctp: remove macros sctp_local_bh_{disable\|enable} Redefined local_bh_{disable\|enable} to sctp_local_bh_{disable\|enable} for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:40 -08:00
Joe Perches	d08f161a10	dsa: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:05 -08:00
Joe Perches	9ea08b1299	pktgen: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:05 -08:00
Joe Perches	c62326abac	netpoll: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:05 -08:00
Joe Perches	34b2cff4ee	caif_usb: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:04 -08:00
Joe Perches	116e853f7f	atm: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:04 -08:00
Joe Perches	90ccb6aa40	appletalk: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Convert struct aarp_entry.hwaddr[6] to hwaddr[ETH_ALEN]. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:04 -08:00
Joe Perches	07fc67befd	8021q: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:04 -08:00
Or Gerlitz	e27a2f8395	net: Export gro_find_by_type helpers Export the gro_find_receive/complete_by_type helpers to they can be invoked by the gro callbacks of encapsulation protocols such as vxlan. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:05:04 -08:00
Or Gerlitz	b582ef0990	net: Add GRO support for UDP encapsulating protocols Add GRO handlers for protocols that do UDP encapsulation, with the intent of being able to coalesce packets which encapsulate packets belonging to the same TCP session. For GRO purposes, the destination UDP port takes the role of the ether type field in the ethernet header or the next protocol in the IP header. The UDP GRO handler will only attempt to coalesce packets whose destination port is registered to have gro handler. Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive code twice on a packet. This solves the problem of udp encapsulated packets whose inner VM packet is udp and happen to carry a port which has registered offloads. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:05:04 -08:00
Linus Torvalds	f075e0f699	Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "The bulk of changes are cleanups and preparations for the upcoming kernfs conversion. - cgroup_event mechanism which is and will be used only by memcg is moved to memcg. - pidlist handling is updated so that it can be served by seq_file. Also, the list is not sorted if sane_behavior. cgroup documentation explicitly states that the file is not sorted but it has been for quite some time. - All cgroup file handling now happens on top of seq_file. This is to prepare for kernfs conversion. In addition, all operations are restructured so that they map 1-1 to kernfs operations. - Other cleanups and low-pri fixes" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits) cgroup: trivial style updates cgroup: remove stray references to css_id doc: cgroups: Fix typo in doc/cgroups cgroup: fix fail path in cgroup_load_subsys() cgroup: fix missing unlock on error in cgroup_load_subsys() cgroup: remove for_each_root_subsys() cgroup: implement for_each_css() cgroup: factor out cgroup_subsys_state creation into create_css() cgroup: combine css handling loops in cgroup_create() cgroup: reorder operations in cgroup_create() cgroup: make for_each_subsys() useable under cgroup_root_mutex cgroup: css iterations and css_from_dir() are safe under cgroup_mutex cgroup: unify pidlist and other file handling cgroup: replace cftype->read_seq_string() with cftype->seq_show() cgroup: attach cgroup_open_file to all cgroup files cgroup: generalize cgroup_pidlist_open_file cgroup: unify read path so that seq_file is always used cgroup: unify cgroup_write_X64() and cgroup_write_string() cgroup: remove cftype->read(), ->read_map() and ->write() hugetlb_cgroup: convert away from cftype->read() ...	2014-01-21 17:51:34 -08:00
Dan Carpenter	08d4d217df	rxrpc: out of bound read in debug code Smatch complains because we are using an untrusted index into the rxrpc_acks[] array. It's just a read and it's only in the debug code, but it's simple enough to add a check and fix it. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 17:02:52 -08:00
Yegor Yefremov	2fa053a0a2	8021q: update description Replace deprecated 'vconfig' tool with 'ip' from 'iproute2'. Add some beautifications like replacing 'ethernet' with 'Ethernet' and removing unneeded spaces. Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 17:01:25 -08:00
Hannes Frederic Sowa	82b276cd2b	ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow connecting and binding to them. sendmsg logic does already check msg->name for this but must trust already connected sockets which could be set up for connection to ipv4 address family. Per-socket flag ipv6only is of no use here, as it is under users control by setsockopt. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:59:19 -08:00
FX Le Bail	446fab5933	ipv6: enable anycast addresses as source addresses in ICMPv6 error messages - Uses ipv6_anycast_destination() in icmp6_send(). Suggested-by: Bill Fink <billfink@mindspring.com> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:53:23 -08:00
Peter Pan(潘卫平)	4d83e17730	tcp: delete redundant calls of tcp_mtup_init() As tcp_rcv_state_process() has already calls tcp_mtup_init() for non-fastopen sock, we can delete the redundant calls of tcp_mtup_init() in tcp_{v4,v6}_syn_recv_sock(). Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:52:31 -08:00
Daniel Borkmann	f0d4eb29d1	packet: fix a couple of cppcheck warnings Doesn't bring much, but also doesn't hurt us to fix 'em: 1) In tpacket_rcv() flush dcache page we can restirct the scope for start and end and remove one layer of indent. 2) In tpacket_destruct_skb() we can restirct the scope for ph. 3) In alloc_one_pg_vec_page() we can remove the NULL assignment and change spacing a bit. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:51:42 -08:00
Duan Jiong	967680e027	ipv4: remove the useless argument from ip_tunnel_hash() Since commit c544193214("GRE: Refactor GRE tunneling code") introduced function ip_tunnel_hash(), the argument itn is no longer in use, so remove it. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:48:18 -08:00
Sabrina Dubroca	f14fe8a848	net: remove unnecessary initializations in net_dev_init softnet_data is already set to 0, no need to use memset or initialize specific fields to 0 or NULL afterwards. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:44:45 -08:00
WANG Cong	6e6a50c254	net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup() So that we will not expose struct tcf_common to modules. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 14:43:16 -08:00
WANG Cong	c779f7af99	net_sched: act: fetch hinfo from a->ops->hinfo Every action ops has a pointer to hash info, so we don't need to hard-code it in each module. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 14:43:16 -08:00
Linus Torvalds	8e50966072	GPIO tree bulk changes for v3.14 A big set this merge window, as we have much going on in this subsystem. Major changes this time: - Some core improvements and cleanups to the new GPIO descriptor API. This seems to be working now so we can start the exodus to this API, moving gradually away from the global GPIO numberspace. - Incremental improvements to the ACPI GPIO core, and move the few GPIO ACPI clients we have to the GPIO descriptor API right now before we go any further. We actually managed to contain this before we started to litter the kernel with yet another hackish global numberspace for the ACPI GPIOs, which is a big win. - The RFkill GPIO driver and all platforms using it have been migrated to use the GPIO descriptors rather than fixed number assignments. Tegra machine has been migrated as part of this. - New drivers for MOXA ART, Xtensa GPIO32 and SMSC SCH311x. Those should be really good examples of how I expect a nice GPIO driver to look these days. - Do away with custom GPIO implementations on a major part of the ARM machines: ks8695, lpc32xx, mv78xx0. Make a first step towards the same in the horribly convoluted Samsung S3C include forest. We expect to continue to clean this up as we move forward. - Flag GPIO lines used for IRQ on adnp, bcm-kona, em, intel-mid and lynxpoint. This makes the GPIOlib core aware that a certain GPIO line is used for IRQs and can then enforce some semantics such as disallowing a GPIO line marked as in use for IRQ to be switched to output mode. - Drop all use of irq_set_chip_and_handler_name(). The name provided in these cases were just unhelpful tags like "mux" or "demux". - Extend the MCP23s08 driver to handle interrupts. - Minor incremental improvements for rcar, lynxpoint, em 74x164 and msm drivers. - Some non-urgent bug fixes here and there, duplicate #includes and that usual kind of cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJS3i/MAAoJEEEQszewGV1zVB8P/Rjzgx8To0gQPn49M4u/A1Mk mAzpUoKa05ILTKBm/bpWPYZPpg9PDqUxOYPsIDEAkc70BKMPTXxrYiE+LSfIzwaJ a8IRwOzNL7Iwc+zPNS/GrmRJyxymb4lmMD/fypk/YaumZ6j4Hbo+9R8Zct9gbZ5Q ZbKtz6kLhbkbNCc71bVMgk6yacSBx1ak8Xpd12HlW85NgOCoBj7/DI1Lb61x1ImY NYpSpmtfGGTkQLtBl5dTLefZOvL1dKSct9TMOsA2jzNqf3zA1YA6XOxPGHK/qtjq 3s9cN1sIVF/g7sm1+qoKXe0OTQrXHT7SX8BH9/tb3MiKO8ItactlQUJlYNR3WFSN zm1PNe5zWr+GWzV0iUrqoMN4XX8nThiFDOxZpOwBTZcUD6qtDFIZp41M3qxwFTbJ hCtSQ8gUO1Ce+xtOQYYOwEkRS7FZa1Z+p/lendTFuGDh6DcXy97SrKkTktM4Q98B LhqrwUzCdES0ecNDi2+P5y4Fc7M0cMMn9SnFvbSBObLB89TF9uzMIn8jUBCZMvrM eAeZlRBYk8F+6F12higaWqZyiBKIEubXo/Z8T0L2KEDm/z/ddJvhQgBKvWlf3rqi RToD446rda+RhFBnxLZ3mTui5nZ2WyKTOqhVqeBuriJhE/cTUaQHUBUrbOwx20kE Xb9mQ2n3GRk2157n1CLY =lW2i -----END PGP SIGNATURE----- Merge tag 'gpio-v3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO tree bulk changes from Linus Walleij: "A big set this merge window, as we have much going on in this subsystem. The changes to other subsystems (notably a slew of ARM machines as I am doing away with their custom APIs) have all been ACKed to the extent possible. Major changes this time: - Some core improvements and cleanups to the new GPIO descriptor API. This seems to be working now so we can start the exodus to this API, moving gradually away from the global GPIO numberspace. - Incremental improvements to the ACPI GPIO core, and move the few GPIO ACPI clients we have to the GPIO descriptor API right now before we go any further. We actually managed to contain this before we started to litter the kernel with yet another hackish global numberspace for the ACPI GPIOs, which is a big win. - The RFkill GPIO driver and all platforms using it have been migrated to use the GPIO descriptors rather than fixed number assignments. Tegra machine has been migrated as part of this. - New drivers for MOXA ART, Xtensa GPIO32 and SMSC SCH311x. Those should be really good examples of how I expect a nice GPIO driver to look these days. - Do away with custom GPIO implementations on a major part of the ARM machines: ks8695, lpc32xx, mv78xx0. Make a first step towards the same in the horribly convoluted Samsung S3C include forest. We expect to continue to clean this up as we move forward. - Flag GPIO lines used for IRQ on adnp, bcm-kona, em, intel-mid and lynxpoint. This makes the GPIOlib core aware that a certain GPIO line is used for IRQs and can then enforce some semantics such as disallowing a GPIO line marked as in use for IRQ to be switched to output mode. - Drop all use of irq_set_chip_and_handler_name(). The name provided in these cases were just unhelpful tags like "mux" or "demux". - Extend the MCP23s08 driver to handle interrupts. - Minor incremental improvements for rcar, lynxpoint, em 74x164 and msm drivers. - Some non-urgent bug fixes here and there, duplicate #includes and that usual kind of cleanups" Fix up broken Kconfig file manually to make this all compile. * tag 'gpio-v3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (71 commits) gpio: mcp23s08: fix casting caused build warning gpio: mcp23s08: depend on OF_GPIO gpio: mcp23s08: Add irq functionality for i2c chips ARM: S5P[v210\|c100\|64x0]: Fix build error gpio: pxa: clamp gpio get value to [0,1] ARM: s3c24xx: explicit dependency on <plat/gpio-cfg.h> ARM: S3C[24\|64]xx: move includes back under <mach/> scope Documentation / ACPI: update to GPIO descriptor API gpio / ACPI: get rid of acpi_gpio.h gpio / ACPI: register to ACPI events automatically mmc: sdhci-acpi: convert to use GPIO descriptor API ARM: s3c24xx: fix build error gpio: f7188x: set can_sleep attribute gpio: samsung: Update documentation gpio: samsung: Remove hardware.h inclusion gpio: xtensa: depend on HAVE_XTENSA_GPIO32 gpio: clps711x: Enable driver compilation with COMPILE_TEST gpio: clps711x: Use of_match_ptr() net: rfkill: gpio: convert to descriptor-based GPIO interface leds: s3c24xx: Fix build failure ...	2014-01-21 10:09:12 -08:00
Linus Torvalds	a0fa1dd3cd	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: - Add the initial implementation of SCHED_DEADLINE support: a real-time scheduling policy where tasks that meet their deadlines and periodically execute their instances in less than their runtime quota see real-time scheduling and won't miss any of their deadlines. Tasks that go over their quota get delayed (Available to privileged users for now) - Clean up and fix preempt_enable_no_resched() abuse all around the tree - Do sched_clock() performance optimizations on x86 and elsewhere - Fix and improve auto-NUMA balancing - Fix and clean up the idle loop - Apply various cleanups and fixes * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) sched: Fix __sched_setscheduler() nice test sched: Move SCHED_RESET_ON_FORK into attr::sched_flags sched: Fix up attr::sched_priority warning sched: Fix up scheduler syscall LTP fails sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls sched/core: Fix htmldocs warnings sched/deadline: No need to check p if dl_se is valid sched/deadline: Remove unused variables sched/deadline: Fix sparse static warnings m68k: Fix build warning in mac_via.h sched, thermal: Clean up preempt_enable_no_resched() abuse sched, net: Fixup busy_loop_us_clock() sched, net: Clean up preempt_enable_no_resched() abuse sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding sched/preempt, locking: Rework local_bh_{dis,en}able() sched/clock, x86: Avoid a runtime condition in native_sched_clock() sched/clock: Fix up clear_sched_clock_stable() sched/clock, x86: Use a static_key for sched_clock_stable sched/clock: Remove local_irq_disable() from the clocks sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs ...	2014-01-20 10:42:08 -08:00
Weilong Chen	82ef3d5d5f	net: fix "queues" uevent between network namespaces When I create a new namespace with 'ip netns add net0', or add/remove new links in a namespace with 'ip link add/delete type veth', rx/tx queues events can be got in all namespaces. That is because rx/tx queue ktypes do not have namespace support, and their kobj parents are setted to NULL. This patch is to fix it. Reported-by: Libo Chen <chenlibo@huawei.com> Signed-off-by: Libo Chen <chenlibo@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 20:02:02 -08:00
WANG Cong	671314a5ab	net_sched: act: remove capab from struct tc_action_ops It is not actually implemented. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 19:58:07 -08:00
Jason Wang	9d08dd3d32	net: document accel_priv parameter for __dev_queue_xmit() To silent "make htmldocs" warning. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 19:55:51 -08:00
Hannes Frederic Sowa	602582ca7a	ipv6: optimize link local address search ipv6_link_dev_addr sorts newly added addresses by scope in ifp->addr_list. Smaller scope addresses are added to the tail of the list. Use this fact to iterate in reverse over addr_list and break out as soon as a higher scoped one showes up, so we can spare some cycles on machines with lot's of addresses. The ordering of the addresses is not relevant and we are more likely to get the eui64 generated address with this change anyway. Suggested-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 19:55:50 -08:00
Hannes Frederic Sowa	4b261c75a9	ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data for IPv4 datagrams on IPv6 sockets. This patch splits the ip6_datagram_recv_ctl into two functions, one which handles both protocol families, AF_INET and AF_INET6, while the ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data. ip6_datagram_recv_*_ctl never reported back any errors, so we can make them return void. Also provide a helper for protocols which don't offer dual personality to further use ip6_datagram_recv_ctl, which is exported to modules. I needed to shuffle the code for ping around a bit to make it easier to implement dual personality for ping ipv6 sockets in future. Reported-by: Gert Doering <gert@space.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 19:53:18 -08:00
Yang Yingliang	a6e2fe17eb	sch_netem: replace magic numbers with enumerate Replace some magic numbers which describe states of 4-state model loss generator with enumerate. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 17:17:34 -08:00
Florent Fourcot	6444f72b4b	ipv6: add flowlabel_consistency sysctl With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of flow label unicity. This patch introduces a new sysctl to protect the old behaviour, enable by default. Changelog of V3: * rename ip6_flowlabel_consistency to flowlabel_consistency * use net_info_ratelimited() * checkpatch cleanups Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 17:12:31 -08:00
Florent Fourcot	46e5f40176	ipv6: add a flag to get the flow label used remotly This information is already available via IPV6_FLOWINFO of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label information. But it is probably logical and easier for users to add this here, and to control both sent/received flow label values with the IPV6_FLOWLABEL_MGR option. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 17:12:31 -08:00
Florent Fourcot	df3687ffc6	ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET With this option, the socket will reply with the flow label value read on received packets. The goal is to have a connection with the same flow label in both direction of the communication. Changelog of V4: * Do not erase the flow label on the listening socket. Use pktopts to store the received value Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-19 17:12:31 -08:00
Eric Dumazet	3acfa1e73c	ipv4: be friend with drop monitor Replace some dev_kfree_skb() with kfree_skb() calls when we drop one skb, this might help bug tracking. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-18 23:08:02 -08:00
Steffen Hurrle	342dfc306f	net: add build-time checks for msg->msg_name size This is a follow-up patch to `f3d3342602` ("net: rework recvmsg handler msg_name and msg_namelen logic"). DECLARE_SOCKADDR validates that the structure we use for writing the name information to is not larger than the buffer which is reserved for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR consistently in sendmsg code paths. Signed-off-by: Steffen Hurrle <steffen@hurrle.net> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-18 23:04:16 -08:00
Michal Sekletar	ea02f9411d	net: introduce SO_BPF_EXTENSIONS For user space packet capturing libraries such as libpcap, there's currently only one way to check which BPF extensions are supported by the kernel, that is, commit `aa1113d9f8` ("net: filter: return -EINVAL if BPF_S_ANC* operation is not supported"). For querying all extensions at once this might be rather inconvenient. Therefore, this patch introduces a new option which can be used as an argument for getsockopt(), and allows one to obtain information about which BPF extensions are supported by the current kernel. As David Miller suggests, we do not need to define any bits right now and status quo can just return 0 in order to state that this versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later additions to BPF extensions need to add their bits to the bpf_tell_extensions() function, as documented in the comment. Signed-off-by: Michal Sekletar <msekleta@redhat.com> Cc: David Miller <davem@davemloft.net> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-18 19:08:58 -08:00
David S. Miller	4180442058	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c net/ipv4/tcp_metrics.c Overlapping changes between the "don't create two tcp metrics objects with the same key" race fix in net and the addition of the destination address in the lookup key in net-next. Minor overlapping changes in bnx2x driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-18 00:55:41 -08:00
Stephen Warren	c91514972b	Bluetooth: remove direct compilation of 6lowpan_iphc.c It's now built as a separate utility module, and enabling BT selects that module in Kconfig. This fixes: net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data' net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress' net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here net/ieee802154/built-in.o: In function `lowpan_header_compress': net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress' net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here net/ieee802154/built-in.o: In function `lowpan_process_data': net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data' net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here make[1]: *** [net/built-in.o] Error 1 (this change probably simply wasn't "git add"d to `a53d34c346`) Fixes: `a53d34c346` ("net: move 6lowpan compression code to separate module") Fixes: `18722c2470` ("Bluetooth: Enable 6LoWPAN support for BT LE devices") Signed-off-by: Stephen Warren <swarren@nvidia.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 19:13:49 -08:00
sfeldma@cumulusnetworks.com	1d3ee88ae0	bonding: add netlink attributes to slave link dev If link is IFF_SLAVE, extend link dev netlink attributes to include slave attributes with new IFLA_SLAVE nest. Add netlink notification (RTM_NEWLINK) when slave status changes from backup to active, or visa-versa. Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE attributes. Currently only used by bonding driver, but could be used by other aggregating devices with slaves. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 18:51:58 -08:00
Eric Dumazet	6c7e7610ff	ipv4: fix a dst leak in tunnels This patch : 1) Remove a dst leak if DST_NOCACHE was set on dst Fix this by holding a reference only if dst really cached. 2) Remove a lockdep warning in __tunnel_dst_set() This was reported by Cong Wang. 3) Remove usage of a spinlock where xchg() is enough 4) Remove some spurious inline keywords. Let compiler decide for us. Fixes: `7d442fab0a` ("ipv4: Cache dst in tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 18:36:39 -08:00
Flavio Leitner	6a7cc41872	ipv6: send Change Status Report after DAD is completed The RFC 3810 defines two type of messages for multicast listeners. The "Current State Report" message, as the name implies, refreshes the current state to the querier. Since the querier sends Query messages periodically, there is no need to retransmit the report. On the other hand, any change should be reported immediately using "State Change Report" messages. Since it's an event triggered by a change and that it can be affected by packet loss, the rfc states it should be retransmitted [RobVar] times to make sure routers will receive timely. Currently, we are sending "Current State Reports" after DAD is completed. Before that, we send messages using unspecified address (::) which should be silently discarded by routers. This patch changes to send "State Change Report" messages after DAD is completed fixing the behavior to be RFC compliant and also to pass TAHI IPv6 testsuite. Signed-off-by: Flavio Leitner <fbl@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 18:12:29 -08:00
Hannes Frederic Sowa	11ffff752c	ipv6: simplify detection of first operational link-local address on interface In commit `1ec047eb47` ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses") I build the detection of the first operational link-local address much to complex. Additionally this code now has a race condition. Replace it with a much simpler variant, which just scans the address list when duplicate address detection completes, to check if this is the first valid link local address and send RS and MLD reports then. Fixes: `1ec047eb47` ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses") Reported-by: Jiri Pirko <jiri@resnulli.us> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 18:10:01 -08:00
Christoph Paasch	77f99ad16a	tcp: metrics: Avoid duplicate entries with the same destination-IP Because the tcp-metrics is an RCU-list, it may be that two soft-interrupts are inside __tcp_get_metrics() for the same destination-IP at the same time. If this destination-IP is not yet part of the tcp-metrics, both soft-interrupts will end up in tcpm_new and create a new entry for this IP. So, we will have two tcp-metrics with the same destination-IP in the list. This patch checks twice __tcp_get_metrics(). First without holding the lock, then while holding the lock. The second one is there to confirm that the entry has not been added by another soft-irq while waiting for the spin-lock. Fixes: `51c5d0c4b1` (tcp: Maintain dynamic metrics in local cache.) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 18:05:34 -08:00
Florent Fourcot	1d13a96c74	ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT This patch is following the commit `b903d324be` (ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAIT). For the same reason than tclass, we have to store the flow label in the inet_timewait_sock to provide consistency of flow label on the last ACK. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 17:56:33 -08:00
Gerald Schaefer	c196403b79	net: rds: fix per-cpu helper usage commit `ae4b46e9d` "net: rds: use this_cpu_* per-cpu helper" broke per-cpu handling for rds. chpfirst is the result of __this_cpu_read(), so it is an absolute pointer and not __percpu. Therefore, __this_cpu_write() should not operate on chpfirst, but rather on cache->percpu->first, just like __this_cpu_read() did before. Cc: <stable@vger.kernel.org> # 3.8+ Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 17:52:22 -08:00
John W. Linville	7916a07557	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2014-01-17 14:43:17 -05:00
Michael Dalton	a953be53ce	net-sysfs: add support for device-specific rx queue sysfs attributes Extend existing support for netdevice receive queue sysfs attributes to permit a device-specific attribute group. Initial use case for this support will be to allow the virtio-net device to export per-receive queue mergeable receive buffer size. Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 23:46:06 -08:00
Michael Dalton	097b4f19e5	net: allow > 0 order atomic page alloc in skb_page_frag_refill skb_page_frag_refill currently permits only order-0 page allocs unless GFP_WAIT is used. Change skb_page_frag_refill to attempt higher-order page allocations whether or not GFP_WAIT is used. If memory cannot be allocated, the allocator will fall back to successively smaller page allocs (down to order-0 page allocs). This change brings skb_page_frag_refill in line with the existing page allocation strategy employed by netdev_alloc_frag, which attempts higher-order page allocations whether or not GFP_WAIT is set, falling back to successively lower-order page allocations on failure. Part of migration of virtio-net to per-receive queue page frag allocators. Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 23:46:06 -08:00
Wei Yongjun	722e47d792	net_sched: fix error return code in fw_change_attrs() The error code was not set if change indev fail, so the error condition wasn't reflected in the return value. Fix to return a negative error code from this error handling case instead of 0. Fixes: `2519a602c2` ('net_sched: optimize tcf_match_indev()') Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:12:03 -08:00
Ying Xue	9bbb4ecc68	tipc: standardize recvmsg routine Standardize the behaviour of waiting for events in TIPC recvmsg() so that all variables of socket or port structures are protected within socket lock, allowing the process of calling recvmsg() to be woken up at appropriate time. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
Ying Xue	391a6dd1da	tipc: standardize sendmsg routine of connected socket Standardize the behaviour of waiting for events in TIPC send_packet() so that all variables of socket or port structures are protected within socket lock, allowing the process of calling sendmsg() to be woken up at appropriate time. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
Ying Xue	3f40504f7e	tipc: standardize sendmsg routine of connectionless socket Comparing the behaviour of how to wait for events in TIPC sendmsg() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. For instance, sk_sleep() and tport->congested variables associated with socket are exposed without socket lock protection while wait_event_interruptible_timeout() accesses them. So standardizing it with similar implementation in other stacks can help us correct these errors which the process of calling sendmsg() cannot be woken up event if an expected event arrive at socket or improperly woken up although the wake condition doesn't match. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
Ying Xue	6398e23cdb	tipc: standardize accept routine Comparing the behaviour of how to wait for events in TIPC accept() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. As sk_sleep() and sk->sk_receive_queue variables associated with socket are not protected by socket lock, the process of calling accept() may be woken up improperly or sometimes cannot be woken up at all. After standardizing it with inet_csk_wait_for_connect routine, we can get benefits including: avoiding 'thundering herd' phenomenon, adding a timeout mechanism for accept(), coping with a pending signal, and having sk_sleep() and sk->sk_receive_queue being always protected within socket lock scope and so on. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
Ying Xue	78eb3a5379	tipc: standardize connect routine Comparing the behaviour of how to wait for events in TIPC connect() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. For instance, as both sock->state and sk_sleep() are directly fed to wait_event_interruptible_timeout() as its arguments, and socket lock has to be released before we call wait_event_interruptible_timeout(), the two variables associated with socket are exposed out of socket lock protection, thereby probably getting stale values so that the process of calling connect() cannot be woken up exactly even if correct event arrives or it is woken up improperly even if the wake condition is not satisfied in practice. Therefore, standardizing its behaviour with sk_stream_wait_connect routine can avoid these risks. Additionally the implementation of connect routine is simplified as a whole, allowing it to return correct values in all different cases. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
wangweidong	abfce3ef58	sctp: remove the unnecessary assignment When go the right path, the status is 0, no need to assign it again. So just remove the assignment. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 17:31:49 -08:00
WANG Cong	6c80563c2f	net_sched: act: pick a different type for act_xt In tcf_register_action() we check either ->type or ->kind to see if there is an existing action registered, but ipt action registers two actions with same type but different kinds. They should have different types too. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 17:24:11 -08:00
David S. Miller	7dff08bbda	Included change: - properly format already existing kerneldoc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJS1xdYAAoJEEKTMo6mOh1VNv0P/R73udErvBaHTUIAVSJ3pH8/ rQ673H+/ZDi6zJQNlNe5UHLu5/IrK/ARCz5Ldl71lNpNHjRNGloSmjIUsdOZY01q D8ZR9YdWxpVcTCNSKxGnDrhVyShM3bN70h2KuYWv01/IsTGJKvkkhnG0SXqJqKYm XBRrEyVnyn52mQM7BhWSkCrX+wSb8OyEtie4Gdn8LjKJ0i2LYeQZ7vnXSR4aORXt pqYe7O+cwWZoj7Vpqea8Ye7srt6BEATEGwuezcQYH+FvaIEepORHWsqqZm6z/o91 wyjjoPcFVyKoEUASdrkC0Hq9eqoaRBVo3yZ/AWlFJ06O/eVhTNKIBN6HykmPJ+vW 9QESaHjZ8Gackz6ehU86h7mu6Mtrbfk73BZfG23SVL6cZltXazE539v0Pv8Ge+P/ 2NKdDjRp1vCHViRLNCwX20VmZFg7bHVDHgYV0gBH/VgZAihhRhSQk9LEEhk1fbij UQjBMtRnWOoxNHi311Khsxqm+0X82/mBN2mw5VdV5S9ZnBXbbCNRAuxKlZWiZNGV MJ2oPUrzs4RIlhKuCc1Ckjr2n7xtbU/j05ASZ4iJgbDkulkGEniDBL+K1bOFIPJl Kb0T4F/0ucz2IcbeDijG8Xe+5XZmaFNbDQ0ooBiHrOAEVM9vqsiQ76P5uetWPnWr Gop6D4HE4fCGjWcONp1u =1KKs -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Included change: - properly format already existing kerneldoc Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 17:22:58 -08:00
WANG Cong	fb1d598d48	net_sched: act: use tcf_hash_release() in net/sched/act_police.c Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 17:21:55 -08:00
David S. Miller	8c12ec7411	Included change: - properly compute the batman-adv header overhead. Such result is later used to initialize the hard_header_len member of the soft-interface netdev object -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJS1xUQAAoJEEKTMo6mOh1VStwP/RsjepUZEgjIXs/SrKC0Pkdk UCnznIyn8b7sLIQokimNESDxADbTphyIBjMZC2XI87udVIRHD3v1NlHWRUzWYvqq KvYqxuQTfbRnHPwzWiuOzQb9tGJxF+cG4CX8E4ydKqm8LmrNgMX+hL3gq50t5FHU jtFOiTcqjrZTPcXsdtOyoH9eYDv+9TTqgeetCF4iOZ7G+W/3d4h/EGAaWiCmSjvP 4vdxRFokCdr0OkU/a790SbtHNMWcjCxFCtRSoK9cZFrRjCE+ersOg1g0vWA100Vi O0yRkZWqODnTu12skhcA2aojFENiT9CFFvZENMpfUxJKnSortFebpKXo+vyW19gC H6uYR1/KLk00pFOzuKu27hQBvARiKwwkuubeSNthas9jadDSO5ZgZGrDeE4U3GNZ KIkHunf2HVS2zpylRdnb9A9B/mkO+iW5y6rVtDdXTkrzyhmqTpJPeJqi8j1S10Zl oyoJIpS+wN7Bslf3fo0f8Pag7cXT28GD4/iPmCkJDpc1GGz7A/HJF/Vpx06L8BWB pl9TFkxZutBOpCjsFCJ+DvRnmgjolGVqyyyDhizeBVrKMvDW/oJ1PHR6Tx5S3S35 QG3ZNmWIaiZ73bJxzJAY5KxNoehGbPzITEBdTlpo2AMuP1+XFadDF+lquqmeAJSQ TAI0Paz0VP/RRXxhksKZ =opzN -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included change: - properly compute the batman-adv header overhead. Such result is later used to initialize the hard_header_len member of the soft-interface netdev object Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 17:16:43 -08:00
Veaceslav Falico	1d486bfb66	net: add NETDEV_PRECHANGEMTU to notify before mtu change happens Currently, if a device changes its mtu, first the change happens (invloving all the side effects), and after that the NETDEV_CHANGEMTU is sent so that other devices can catch up with the new mtu. However, if they return NOTIFY_BAD, then the change is reverted and error returned. This is a really long and costy operation (sometimes). To fix this, add NETDEV_PRECHANGEMTU notification which is called prior to any change actually happening, and if any callee returns NOTIFY_BAD - the change is aborted. This way we're skipping all the playing with apply/revert the mtu. CC: "David S. Miller" <davem@davemloft.net> CC: Jiri Pirko <jiri@resnulli.us> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 17:15:41 -08:00
Tom Herbert	0b4cec8c2e	net: Check skb->rxhash in gro_receive When initializing a gro_list for a packet, first check the rxhash of the incoming skb against that of the skb's in the list. This should be a very strong inidicator of whether the flow is going to be matched, and potentially allows a lot of other checks to be short circuited. Use skb_hash_raw so that we don't force the hash to be calculated. Tested by running netperf 200 TCP_STREAMs between two machines with GRO, HW rxhash, and 1G. Saw no performance degration, slight reduction of time in dev_gro_receive. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 16:22:54 -08:00
Daniel Borkmann	b013840810	packet: use percpu mmap tx frame pending refcount In PF_PACKET's packet mmap(), we can avoid using one atomic_inc() and one atomic_dec() call in skb destructor and use a percpu reference count instead in order to determine if packets are still pending to be sent out. Micro-benchmark with [1] that has been slightly modified (that is, protcol = 0 in socket(2) and bind(2)), example on a rather crappy testing machine; I expect it to scale and have even better results on bigger machines: ./packet_mm_tx -s7000 -m7200 -z700000 em1, avg over 2500 runs: With patch: 4,022,015 cyc Without patch: 4,812,994 cyc time ./packet_mm_tx -s64 -c10000000 em1 > /dev/null, stable: With patch: real 1m32.241s user 0m0.287s sys 1m29.316s Without patch: real 1m38.386s user 0m0.265s sys 1m35.572s In function tpacket_snd(), it is okay to use packet_read_pending() since in fast-path we short-circuit the condition already with ph != NULL, since we have next frames to process. In case we have MSG_DONTWAIT, we also do not execute this path as need_wait is false here anyway, and in case of _no_ MSG_DONTWAIT flag, it is okay to call a packet_read_pending(), because when we ever reach that path, we're done processing outgoing frames anyway and only look if there are skbs still outstanding to be orphaned. We can stay lockless in this percpu counter since it's acceptable when we reach this path for the sum to be imprecise first, but we'll level out at 0 after all pending frames have reached the skb destructor eventually through tx reclaim. When people pin a tx process to particular CPUs, we expect overflows to happen in the reference counter as on one CPU we expect heavy increase; and distributed through ksoftirqd on all CPUs a decrease, for example. As David Laight points out, since the C language doesn't define the result of signed int overflow (i.e. rather than wrap, it is allowed to saturate as a possible outcome), we have to use unsigned int as reference count. The sum over all CPUs when tx is complete will result in 0 again. The BUG_ON() in tpacket_destruct_skb() we can remove as well. It can _only_ be set from inside tpacket_snd() path and we made sure to increase tx_ring.pending in any case before we called po->xmit(skb). So testing for tx_ring.pending == 0 is not too useful. Instead, it would rather have been useful to test if lower layers didn't orphan the skb so that we're missing ring slots being put back to TP_STATUS_AVAILABLE. But such a bug will be caught in user space already as we end up realizing that we do not have any TP_STATUS_AVAILABLE slots left anymore. Therefore, we're all set. Btw, in case of RX_RING path, we do not make use of the pending member, therefore we also don't need to use up any percpu memory here. Also note that __alloc_percpu() already returns a zero-filled percpu area, so initialization is done already. [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 16:17:12 -08:00
Daniel Borkmann	87a2fd286a	packet: don't unconditionally schedule() in case of MSG_DONTWAIT In tpacket_snd(), when we've discovered a first frame that is not in status TP_STATUS_SEND_REQUEST, and return a NULL buffer, we exit the send routine in case of MSG_DONTWAIT, since we've finished traversing the mmaped send ring buffer and don't care about pending frames. While doing so, we still unconditionally call an expensive schedule() in the packet_current_frame() "error" path, which is unnecessary in this case since it's enough to just quit the function. Also, in case MSG_DONTWAIT is not set, we should rather test for need_resched() first and do schedule() only if necessary since meanwhile pending frames could already have finished processing and called skb destructor. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 16:17:11 -08:00
Daniel Borkmann	902fefb82e	packet: improve socket create/bind latency in some cases Most people acquire PF_PACKET sockets with a protocol argument in the socket call, e.g. libpcap does so with htons(ETH_P_ALL) for all its sockets. Most likely, at some point in time a subsequent bind() call will follow, e.g. in libpcap with ... memset(&sll, 0, sizeof(sll)); sll.sll_family = AF_PACKET; sll.sll_ifindex = ifindex; sll.sll_protocol = htons(ETH_P_ALL); ... as arguments. What happens in the kernel is that already in socket() syscall, we install a proto hook via register_prot_hook() if our protocol argument is != 0. Yet, in bind() we're almost doing the same work by doing a unregister_prot_hook() with an expensive synchronize_net() call in case during socket() the proto was != 0, plus follow-up register_prot_hook() with a bound device to it this time, in order to limit traffic we get. In the case when the protocol and user supplied device index (== 0) does not change from socket() to bind(), we can spare us doing the same work twice. Similarly for re-binding to the same device and protocol. For these scenarios, we can decrease create/bind latency from ~7447us (sock-bind-2 case) to ~89us (sock-bind-1 case) with this patch. Alternatively, for the first case, if people care, they should simply create their sockets with proto == 0 argument and define the protocol during bind() as this saves a call to synchronize_net() as well (sock-bind-3 case). In all other cases, we're tied to user space behaviour we must not change, also since a bind() is not strictly required. Thus, we need the synchronize_net() to make sure no asynchronous packet processing paths still refer to the previous elements of po->prot_hook. In case of mmap()ed sockets, the workflow that includes bind() is socket() -> setsockopt(<ring>) -> bind(). In that case, a pair of {__unregister, register}_prot_hook is being called from setsockopt() in order to install the new protocol receive handler. Thus, when we call bind and can skip a re-hook, we have already previously installed the new handler. For fanout, this is handled different entirely, so we should be good. Timings on an i7-3520M machine: * sock-bind-1: 89 us * sock-bind-2: 7447 us * sock-bind-3: 75 us sock-bind-1: socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3 bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=all(0), pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0 sock-bind-2: socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3 bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1), pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0 sock-bind-3: socket(PF_PACKET, SOCK_RAW, 0) = 3 bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1), pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 16:17:11 -08:00
Paul Gortmaker	cf17228372	net/ipv4: don't use module_init in non-modular gre_offload Recent commit `438e38fadc` ("gre_offload: statically build GRE offloading support") added new module_init/module_exit calls to the gre_offload.c file. The file is obj-y and can't be anything other than built-in. Currently it can never be built modular, so using module_init as an alias for __initcall can be somewhat misleading. Fix this up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. We also make the inclusion explicit. Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of device_initcall directly in this change means that the runtime impact is zero -- it will remain at level 6 in initcall ordering. As for the module_exit, rather than replace it with __exitcall, we simply remove it, since it appears only UML does anything with those, and even for UML, there is no relevant cleanup to be done here. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 16:08:27 -08:00
Eric Dumazet	0864c15883	net: eth_type_trans() should use skb_header_pointer() eth_type_trans() can read uninitialized memory as drivers do not necessarily pull more than 14 bytes in skb->head before calling it. As David suggested, we can use skb_header_pointer() to fix this without breaking some drivers that might not expect eth_type_trans() pulling 2 additional bytes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 15:30:31 -08:00
David S. Miller	5ff1dd2416	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== This small batch contains several Netfilter fixes for your net-next tree, more specifically: * Fix compilation warning in nft_ct in NF_CONNTRACK_MARK is not set, from Kristian Evensen. * Add dependency to IPV6 for NF_TABLES_INET. This one has been reported by the several robots that are testing .config combinations, from Paul Gortmaker. * Fix default base chain policy setting in nf_tables, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 11:44:54 -08:00
Jiri Pirko	89740ca74f	neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is not initialized yet. This might cause confusion for the people who use NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for usage in ndo_neigh_setup. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 11:31:58 -08:00
Eric Dumazet	aee636c480	bpf: do not use reciprocal divide At first Jakub Zawadzki noticed that some divisions by reciprocal_divide were not correct. (off by one in some cases) http://www.wireshark.org/~darkjames/reciprocal-buggy.c He could also show this with BPF: http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c The reciprocal divide in linux kernel is not generic enough, lets remove its use in BPF, as it is not worth the pain with current cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Daniel Borkmann <dxchgb@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Matt Evans <matt@ozlabs.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 17:02:08 -08:00
Thomas Haller	5b84efecb7	ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE Refactor the deletion/update of prefix routes when removing an address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address present with this flag, to not cleanup the route. Instead, assume that userspace is taking care of this route. Also perform the same cleanup, when userspace changes an existing address to add NOPREFIXROUTE (to an address that didn't have this flag). This is done because when the address was added, a prefix route was created for it. Since the user now wants to handle this route by himself, we cleanup this route. This cleanup of the route is not totally robust. There is no guarantee, that the route we are about to delete was really the one added by the kernel. This behavior does not change by the patch, and in practice it should work just fine. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 17:00:40 -08:00
Thomas Haller	761aac737e	ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes When adding/modifying an IPv6 address, the userspace application needs a way to suppress adding a prefix route. This is for example relevant together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf generated addresses, but depending on on-link, no route for the prefix should be added. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 17:00:40 -08:00
David S. Miller	6631c5cea8	Revert "batman-adv: drop dependency against CRC16" This reverts commit `12afc36e38`. The dependency is actually still necessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 16:54:14 -08:00
wangweidong	0ea5e4df7b	sctp: create helper function to enable\|disable sackdelay add sctp_spp_sackdelay_{enable\|disable} helper function for avoiding code duplication. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 16:47:48 -08:00
Li RongQing	d76ed22b22	ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h, and use this macro as possible. And define ip6_tclass helper to return tclass Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 15:53:18 -08:00
Dmitry Eremin-Solenikov	a53d34c346	net: move 6lowpan compression code to separate module IEEE 802.15.4 and Bluetooth networking stacks share 6lowpan compression code. Instead of introducing Makefile/Kconfig hacks, build this code as a separate module referenced from both ieee802154 and bluetooth modules. This fixes the following build error observed in some kernel configurations: net/built-in.o: In function `header_create': 6lowpan.c:(.text+0x166149): undefined reference to `lowpan_header_compress' net/built-in.o: In function `bt_6lowpan_recv': (.text+0x166b3c): undefined reference to `lowpan_process_data' Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dmitry Eremin-Solenikov <dmitry_eremin@mentor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 15:36:38 -08:00
Veaceslav Falico	5bb025fae5	net: rename sysfs symlinks on device name change Currently, we don't rename the upper/lower_ifc symlinks in /sys/class/net/*/ , which might result stale/duplicate links/names. Fix this by adding netdev_adjacent_rename_links(dev, oldname) which renames all the upper/lower interface's links to dev from the upper/lower_oldname to the new name. We don't need a rollback because only we control these symlinks and if we fail to rename them - sysfs will anyway complain. Reported-by: Ding Tianhong <dingtianhong@huawei.com> CC: Ding Tianhong <dingtianhong@huawei.com> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 15:16:20 -08:00
Veaceslav Falico	3ee3270756	net: add sysfs helpers for netdev_adjacent logic They clean up the code a bit and can be used further. CC: Ding Tianhong <dingtianhong@huawei.com> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 15:16:19 -08:00
Simon Wunderlich	1b371d1307	batman-adv: use consistent kerneldoc style Reported-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-16 00:16:00 +01:00
Marek Lindner	1df0cbd509	batman-adv: fix batman-adv header overhead calculation Batman-adv prepends a full ethernet header in addition to its own header. This has to be reflected in the MTU calculation, especially since the value is used to set dev->hard_header_len. Introduced by `411d6ed93a` ("batman-adv: consider network coding overhead when calculating required mtu") Reported-by: cmsv <cmsv@wirelesspt.net> Reported-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-15 23:54:20 +01:00
Jiri Pirko	3977458c9c	neigh: split lines for NEIGH_VAR_SET so they are not too long introduced by: commit `1f9248e560` "neigh: convert parms to an array" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 14:46:00 -08:00
Kristian Evensen	847c8e2959	netfilter: nft_ct: fix compilation warning if NF_CONNTRACK_MARK is not set net/netfilter/nft_ct.c: In function 'nft_ct_set_eval': net/netfilter/nft_ct.c:136:6: warning: unused variable 'value' [-Wunused-variable] Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-15 11:00:14 +01:00
Eric Dumazet	b53c733600	tcp: do not export tcp_gso_segment() and tcp_gro_receive() tcp_gso_segment() and tcp_gro_receive() no longer need to be exported. IPv4 and IPv6 offloads are statically linked. Note that tcp_gro_complete() is still used by bnx2x, unfortunately. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:53:48 -08:00
Ying Xue	7f2b8562c2	net: nl80211: __dev_get_by_index instead of dev_get_by_index to find interface As __cfg80211_rdev_from_attrs(), nl80211_dump_wiphy_parse() and nl80211_set_wiphy() are all under rtnl_lock protection, __dev_get_by_index() instead of dev_get_by_index() should be used to find interface handler in them allowing us to avoid to change interface reference counter. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:50:47 -08:00
Ying Xue	5af28de353	can: use __dev_get_by_index instead of dev_get_by_index to find interface As cgw_create_job() is always under rtnl_lock protection, __dev_get_by_index() instead of dev_get_by_index() should be used to find interface handler in it having us avoid to change interface reference counter. Cc: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:50:47 -08:00
Ying Xue	a74e942694	caif: __dev_get_by_index instead of dev_get_by_index to find interface The following call chains indicate that chnl_net_open() is under rtnl_lock protection as __dev_open() is protected by rtnl_lock. So if __dev_get_by_index() instead of dev_get_by_index() is used to find interface handler in it, this would help us avoid to change interface reference counter. __dev_open() chnl_net_open() Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:50:47 -08:00
Ying Xue	16b77695ed	batman-adv: use __dev_get_by_index instead of dev_get_by_index to find interface The following call chains indicate that batadv_is_on_batman_iface() is always under rtnl_lock protection as call_netdevice_notifier() is protected by rtnl_lock. So if __dev_get_by_index() rather than dev_get_by_index() is used to find interface handler in it, this would help us avoid to change interface reference counter. call_netdevice_notifier() batadv_hard_if_event() batadv_hardif_add_interface() batadv_is_valid_iface() batadv_is_on_batman_iface() Cc: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:50:47 -08:00
Ying Xue	d4c5fba2f6	decnet: use __dev_get_by_index instead of dev_get_by_index to find interface The following call chain we can identify that dn_cache_getroute() is protected under rtnl_lock. So if we use __dev_get_by_index() instead of dev_get_by_index() to find interface handlers in it, this would help us avoid to change interface reference counter. rtnetlink_rcv() rtnl_lock() netlink_rcv_skb() dn_cache_getroute() rtnl_unlock() Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:50:46 -08:00
Ying Xue	d9ac62be57	dcb: use __dev_get_by_name instead of dev_get_by_name to find interface The following call chain indicates that dcb_doit() is protected under rtnl_lock. So if we use __dev_get_by_name() instead of dev_get_by_name() to find interface handlers in it, this would help us avoid to change interface reference counter. rtnetlink_rcv() rtnl_lock() netlink_rcv_skb() dcb_doit() rtnl_unlock() Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:50:46 -08:00
FX Le Bail	ec35b61ea5	IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6 This change move anycast_src_echo_reply sysctl with other ipv6 sysctls. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:18:22 -08:00
Dan Carpenter	0e864b21e5	sctp: remove a redundant NULL check It confuses Smatch when we check "sinit" for NULL and then non-NULL and that causes a false positive warning later. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:18:22 -08:00
stephen hemminger	963a185539	tipc: spelling fixes Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:18:22 -08:00
stephen hemminger	db9c7c3943	ipv6: addrconf spelling fixes Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:18:22 -08:00
Hannes Frederic Sowa	95f4a45de1	net: avoid reference counter overflows on fib_rules in multicast forwarding Bob Falken reported that after 4G packets, multicast forwarding stopped working. This was because of a rule reference counter overflow which freed the rule as soon as the overflow happend. This patch solves this by adding the FIB_LOOKUP_NOREF flag to fib_rules_lookup calls. This is safe even from non-rcu locked sections as in this case the flag only implies not taking a reference to the rule, which we don't need at all. Rules only hold references to the namespace, which are guaranteed to be available during the call of the non-rcu protected function reg_vif_xmit because of the interface reference which itself holds a reference to the net namespace. Fixes: `f0ad0860d0` ("ipv4: ipmr: support multiple tables") Fixes: `d1db275dd3` ("ipv6: ip6mr: support multiple tables") Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Thomas Graf <tgraf@suug.ch> Cc: Julian Anastasov <ja@ssi.bg> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 17:37:25 -08:00
Geert Uytterhoeven	88bfe6ea00	net: Spelling s/transmition/transmission/ Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 17:11:26 -08:00
Christian Engelmayer	267d29a69c	ieee802154: Fix memory leak in ieee802154_add_iface() Fix a memory leak in the ieee802154_add_iface() error handling path. Detected by Coverity: CID 710490. Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 15:40:56 -08:00
Aruna-Hewapathirane	63862b5bef	net: replace macros net_random and net_srandom with direct calls to prandom This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 15:15:25 -08:00
Hannes Frederic Sowa	825edac4e7	ipv6: copy traffic class from ping request to reply Suggested-by: Simon Schneider <simon-schneider@gmx.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 15:08:52 -08:00
WANG Cong	72c1d3bdd5	ipv4: register igmp_notifier even when !CONFIG_PROC_FS We still need this notifier even when we don't config PROC_FS. It should be rare to have a kernel without PROC_FS, so just for completeness. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 15:03:33 -08:00
Ben Hutchings	ae78dbfa40	net: Add trace events for all receive entry points, exposing more skb fields The existing net/netif_rx and net/netif_receive_skb trace events provide little information about the skb, nor do they indicate how it entered the stack. Add trace events at entry of each of the exported functions, including most fields that are likely to be interesting for debugging driver datapath behaviour. Split netif_rx() and netif_receive_skb() so that internal calls are not traced. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 14:46:02 -08:00
Ben Hutchings	d87d04a785	net: Add net_dev_start_xmit trace event, exposing more skb fields The existing net/net_dev_xmit trace event provides little information about the skb that has been passed to the driver, and it is not simple to add more since the skb may already have been freed at the point the event is emitted. Add a separate trace event before the skb is passed to the driver, including most fields that are likely to be interesting for debugging driver datapath behaviour. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 14:46:02 -08:00
Ben Hutchings	20567661a1	net: Fix indentation in dev_hard_start_xmit() Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 14:45:42 -08:00
David S. Miller	0a379e21c5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-01-14 14:42:42 -08:00
Paul Durrant	ed1f50c3a7	net: add skb_checksum_setup This patch adds a function to set up the partial checksum offset for IP packets (and optionally re-calculate the pseudo-header checksum) into the core network code. The implementation was previously private and duplicated between xen-netback and xen-netfront, however it is not xen-specific and is potentially useful to any network driver. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 14:24:19 -08:00
Paul Gortmaker	419331d8ff	netfilter: Add dependency on IPV6 for NF_TABLES_INET Commit `1d49144c0a` ("netfilter: nf_tables: add "inet" table for IPv4/IPv6") allows creation of non-IPV6 enabled .config files that will fail to configure/link as follows: warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES) warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES) warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES) net/built-in.o: In function `nft_reject_eval': nft_reject.c:(.text+0x651e8): undefined reference to `nf_ip6_checksum' nft_reject.c:(.text+0x65270): undefined reference to `ip6_route_output' nft_reject.c:(.text+0x656c4): undefined reference to `ip6_dst_hoplimit' make: *** [vmlinux] Error 1 Since the feature is to allow for a mixed IPV4 and IPV6 table, it seems sensible to make it depend on IPV6. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-14 16:33:00 +01:00
Ilya Dryomov	f2be82b005	libceph: fix preallocation check in get_reply() The check that makes sure that we have enough memory allocated to read in the entire header of the message in question is currently busted. It compares front_len of the incoming message with iov_len field of ceph_msg::front structure, which is used primarily to indicate the amount of data already read in, and not the size of the allocated buffer. Under certain conditions (e.g. a short read from a socket followed by that socket's shutdown and owning ceph_connection reset) this results in a warning similar to [85688.975866] libceph: get_reply front 198 > preallocated 122 (4#0) and, through another bug, leads to forever hung tasks and forced reboots. Fix this by comparing front_len with front_alloc_len field of struct ceph_msg, which stores the actual size of the buffer. Fixes: http://tracker.ceph.com/issues/5425 Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-01-14 11:27:47 +02:00
Ilya Dryomov	3f0a4ac55f	libceph: rename front to front_len in get_reply() Rename front local variable to front_len in get_reply() to make its purpose more clear. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-01-14 11:27:42 +02:00
Ilya Dryomov	3cea4c3071	libceph: rename ceph_msg::front_max to front_alloc_len Rename front_max field of struct ceph_msg to front_alloc_len to make its purpose more clear. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-01-14 11:27:26 +02:00
WANG Cong	b86f81cca9	bridge: move br_net_exit() to br.c And it can become static. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 23:42:39 -08:00
David S. Miller	aef2b45fe4	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Conflicts: net/xfrm/xfrm_policy.c Steffen Klassert says: ==================== This pull request has a merge conflict between commits `be7928d20b` ("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and `da7c224b1b` ("net: xfrm: xfrm_policy: silence compiler warning") from the net-next tree and commit `2f3ea9a95c` ("xfrm: checkpatch erros with inline keyword position") from the ipsec-next tree. The version from net-next can be used, like it is done in linux-next. 1) Checkpatch cleanups, from Weilong Chen. 2) Fix lockdep complaints when pktgen is used with IPsec, from Fan Du. 3) Update pktgen to allow any combination of IPsec transport/tunnel mode and AH/ESP/IPcomp type, from Fan Du. 4) Make pktgen_dst_metrics static, Fengguang Wu. 5) Compile fix for pktgen when CONFIG_XFRM is not set, from Fan Du. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 23:14:25 -08:00
Neal Cardwell	70315d22d3	inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock (not just TIME_WAIT), and for such sockets the tw_substate field holds the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2. This brings the inet_diag state-matching code in line with the field it uses to populate idiag_state. This is also analogous to the info exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state and get_timewait4_sock() exports tw->tw_substate. Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi state time-wait" would also return sockets in state TCP_FIN_WAIT2. This is an old bug that predates `05dbc7b` ("tcp/dccp: remove twchain"). Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 22:35:46 -08:00
David S. Miller	853dc21bfe	Included changes: - drop dependency against CRC16 - move to new release version - add size check at compile time for packet structs - update copyright years in every file - implement new bonding/interface alternation feature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJS0p53AAoJEEKTMo6mOh1VZ7sP/RDgh5PMXC5l1LNTG9gtsV5Z zzqs+kEfeTCivcMtONXhBhuli7wlW3eZehscB/Hn6VOKf40Ktwb0tNjrJZ+OMEHC hwR7i56ucNOKA5L1cswWwprIY3tV3dK+tI/Y7Oc0d/HAkhU2j3wHWdzCdUMa2yBp PurZZrRFXrqcKtIKP+AK1DkkQ2TyUZVNB6dZDf1AifMxhcfFf6Vxg1JGj6AKgvKF zf9q8SC5x33qGfvx4VML1b0JNChAwt01PecY/Eo154W3eOWHNXh9UxQEQBRoChbJ A/hCoo/LWGFeyqZwEeWBIjR+nGi/5zbCg310FGTgjWZaJ+BBD/mjKAjDhtszX2sI Lf0TJ7ytfCn/qij0Xmqg3R5RnmstJpb+weZ7gMqk63o9I06RtvV6/x58tPB+7+Y1 AJ5egjL0yUypn7LtFUL3z1S7Np6m+FC9KuH47Yc1FyXR8tSwDWYszdlAloqPeYVI CDMw73T+6vsWr2UPBnXqudy0BtG3XT8LFXAUC9GMkz0k8GY8RZK7MeUun9Sax+ra c7MC+yZYDhwKgNSiEw3J86n0ozsIGVCheAQGuenmdicQdirJa8KImj1CZCoGqWBk kWsyNhw68SVLZFSfCiKPjJjbWijpVfQKM/TSB30Cmj2L1hyWeEtBn0McHig6srdX hf1Xh4tFh1yGwMYdCcAq =BKos -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - drop dependency against CRC16 - move to new release version - add size check at compile time for packet structs - update copyright years in every file - implement new bonding/interface alternation feature Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 21:50:27 -08:00
Eric Paris	4440e85481	audit: convert all sessionid declaration to unsigned int Right now the sessionid value in the kernel is a combination of u32, int, and unsigned int. Just use unsigned int throughout. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:31:46 -05:00
Veaceslav Falico	2315dc91a5	net: make dev_set_mtu() honor notification return code Currently, after changing the MTU for a device, dev_set_mtu() calls NETDEV_CHANGEMTU notification, however doesn't verify it's return code - which can be NOTIFY_BAD - i.e. some of the net notifier blocks refused this change, and continues nevertheless. To fix this, verify the return code, and if it's an error - then revert the MTU to the original one, notify again and pass the error code. CC: Jiri Pirko <jiri@resnulli.us> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 15:19:26 -08:00
stephen hemminger	6daaf0de2f	sctp: make sctp_addto_chunk_fixed local Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 14:42:30 -08:00
stephen hemminger	b5d2b2858f	l2tp: make local functions static Avoid needless export of local functions Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 12:00:16 -08:00
Neal Cardwell	b884b1a46f	gre_offload: simplify GRE header length calculation in gre_gso_segment() Simplify the GRE header length calculation in gre_gso_segment(). Switch to an approach that is simpler, faster, and more general. The new approach will continue to be correct even if we add support for the optional variable-length routing info that may be present in a GRE header. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:53:49 -08:00
WANG Cong	7eb8896df0	net_sched: act: remove struct tcf_act_hdr It is not necessary at all. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:50:15 -08:00
WANG Cong	a8701a6c7a	net_sched: avoid casting void pointer tp->root is a void* pointer, no need to cast it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:50:15 -08:00
WANG Cong	2519a602c2	net_sched: optimize tcf_match_indev() tcf_match_indev() is called in fast path, it is not wise to search for a netdev by ifindex and then compare by its name, just compare the ifindex. Also, dev->name could be changed by user-space, therefore the match would be always fail, but dev->ifindex could be consistent. BTW, this will also save some bytes from the core struct of u32. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:50:15 -08:00
WANG Cong	832d1d5bfa	net_sched: add struct net pointer to tcf_proto_ops->dump It will be needed by the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:50:14 -08:00
WANG Cong	a56e19538d	net_sched: act: clean up notification functions Refactor tcf_add_notify() and factor out tcf_del_notify(). Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:50:14 -08:00
WANG Cong	ddafd34f41	net_sched: act: move idx_gen into struct tcf_hashinfo There is no need to store the index separatedly since tcf_hashinfo is allocated statically too. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:50:14 -08:00
Luis R. Rodriguez	4f7b91404c	cfg80211: make regulatory_hint() remove REGULATORY_CUSTOM_REG The REGULATORY_CUSTOM_REG can be used during early init with the goal of overriding the wiphy's default regulatory settings in case the alpha2 of the device is not known. In the case that the alpha2 becomes known lets avoid having drivers having to clear the REGULATORY_CUSTOM_REG flag by doing it for them when regulatory_hint() is used. Cc: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-01-13 14:46:58 -05:00
Eric Dumazet	600adc18eb	net: gro: change GRO overflow strategy GRO layer has a limit of 8 flows being held in GRO list, for performance reason. When a packet comes for a flow not yet in the list, and list is full, we immediately give it to upper stacks, lowering aggregation performance. With TSO auto sizing and FQ packet scheduler, this situation happens more often. This patch changes strategy to simply evict the oldest flow of the list. This works better because of the nature of packet trains for which GRO is efficient. This also has the effect of lowering the GRO latency if many flows are competing. Tested : Used a 40Gbps NIC, with 4 RX queues, and 200 concurrent TCP_STREAM netperf. Before patch, aggregate rate is 11Gbps (while a single flow can reach 30Gbps) After patch, line rate is reached. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:43:46 -08:00
John W. Linville	f13352519e	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next	2014-01-13 14:40:59 -05:00
Wei Yongjun	d10dbad2aa	gre_offload: fix sparse non static symbol warning Fixes the following sparse warning: net/ipv4/gre_offload.c:253:5: warning: symbol 'gre_gro_complete' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:36:43 -08:00
John W. Linville	ec665facde	This is the first NFC pull request for 3.14 It includes: * A new NFC driver for Marvell's 8897, and a few NCI fixes and improvements needed to support this chipset. * An LLCP fix for how we were setting the default MIU on a p2p link. If there is no explicit MIU extension announced at connection time, we must use the default one and not the one announced at LLCP link establishement time. * A pn544 EEPROM config update. Some of the currently EEPROM configured values are overwriting the firmware ones while other should not be set by the driver itself. * Some NFC digital stack fixes and improvements. Asynchronous functions are better documented, RF technologies and CRC functions are set upon PSL_REQ reception, and a few minor bugs are fixed. * Minor and miscelaneous pn533, mei_phy and port100 fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJSzfOKAAoJEIqAPN1PVmxKODEP/i1tmx6bwSjuR0gMyvIkqcBJ 1mM7BwdXlTKTvS/HaKTqaftS5S9Kj/IYSsHPjqRAJp3ipZdc39D8rR3jiyhWzKyD A/o6whTBTnyAgt8/enNp+h8S/Iq+E3itL/51KUOeeFIKSpGqqfcssZ1/3qhvoYZQ 75zck2OPiEs8KBl1bCrrzK1kP4s8aEH6PepmXd7WS8njKe+dcyl3erw0IVN4WPfP FKFemvL/HP8+cUyshdiQGRiSw+TyD1VLaZinhyoeJxVRcXUjcodLwtCIATwqvu54 2fMk1ccFineAQZGFfZGbtMAjHQLUeOpHHxFfdkW1g7P9IBp4zjtEiNOhNvPnKlR2 p4g4R/vPdXxbQWjIoWzXI8qw/eFq8xIVC0ap37W/Y65532ParnXESAwk29BJ6770 kqpHTjfZTUmW2POuvqhEKUKPPVp5nt0ArgfnjvHOS1wxcT885vWeu/YOxpOm9VdU rjFSBBaBDC43vGkCHn5szU9sEwu4O1/JFHElSToXsu+bRtS0tA3O62Kv732RZmbm 1SCbZ63o1ivZr8Q37bY1NDW1/YdwUJMNbEb/t/wDLBqYx0vQcD0aDUWCwoACi2Du FbUElc975E1ChvM7VfV7uqFN0Pc++M1IEHLcM2BiXmSIGjziFY02FFDkqHiea/tC xlYfYrrUoIj0v/u7q1t4 =ydBy -----END PGP SIGNATURE----- Merge tag 'nfc-next-3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz <sameo@linux.intel.com> says: "This is the first NFC pull request for 3.14 It includes: * A new NFC driver for Marvell's 8897, and a few NCI fixes and improvements needed to support this chipset. * An LLCP fix for how we were setting the default MIU on a p2p link. If there is no explicit MIU extension announced at connection time, we must use the default one and not the one announced at LLCP link establishement time. * A pn544 EEPROM config update. Some of the currently EEPROM configured values are overwriting the firmware ones while other should not be set by the driver itself. * Some NFC digital stack fixes and improvements. Asynchronous functions are better documented, RF technologies and CRC functions are set upon PSL_REQ reception, and a few minor bugs are fixed. * Minor and miscelaneous pn533, mei_phy and port100 fixes." Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-01-13 14:36:42 -05:00
Hannes Frederic Sowa	8ed1dc44d3	ipv4: introduce hardened ip_no_pmtu_disc mode This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors to be honored by protocols which do more stringent validation on the ICMP's packet payload. This knob is useful for people who e.g. want to run an unmodified DNS server in a namespace where they need to use pmtu for TCP connections (as they are used for zone transfers or fallback for requests) but don't want to use possibly spoofed UDP pmtu information. Currently the whitelisted protocols are TCP, SCTP and DCCP as they check if the returned packet is in the window or if the association is valid. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Suggested-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:22:55 -08:00
Hannes Frederic Sowa	0954cf9c61	ipv6: introduce ip6_dst_mtu_forward and protect forwarding path with it In the IPv6 forwarding path we are only concerend about the outgoing interface MTU, but also respect locked MTUs on routes. Tunnel provider or IPSEC already have to recheck and if needed send PtB notifications to the sending host in case the data does not fit into the packet with added headers (we only know the final header sizes there, while also using path MTU information). The reason for this change is, that path MTU information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv6 forwarding path to wrongfully emit Packet-too-Big errors and drop IPv6 packets. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:22:54 -08:00
Hannes Frederic Sowa	f87c10a8aa	ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing While forwarding we should not use the protocol path mtu to calculate the mtu for a forwarded packet but instead use the interface mtu. We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was introduced for multicast forwarding. But as it does not conflict with our usage in unicast code path it is perfect for reuse. I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular dependencies because of IPSKB_FORWARDED. Because someone might have written a software which does probe destinations manually and expects the kernel to honour those path mtus I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone can disable this new behaviour. We also still use mtus which are locked on a route for forwarding. The reason for this change is, that path mtus information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv4 forwarding path to wrongfully emit fragmentation needed notifications or start to fragment packets along a path. Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED won't be set and further fragmentation logic will use the path mtu to determine the fragmentation size. They also recheck packet size with help of path mtu discovery and report appropriate errors. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:22:54 -08:00
Terry Lam	6c76a07a71	HHF qdisc: fix jiffies-time conversion. This is to be compatible with the use of "get_time" (i.e. default time unit in us) in iproute2 patch for HHF as requested by Stephen. Signed-off-by: Terry Lam <vtlam@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 11:20:39 -08:00
Peter Zijlstra	1774e9f3e5	sched, net: Clean up preempt_enable_no_resched() abuse The only valid use of preempt_enable_no_resched() is if the very next line is schedule() or if we know preemption cannot actually be enabled by that statement due to known more preempt_count 'refs'. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 17:39:04 +01:00
Antonio Quartulli	12afc36e38	batman-adv: drop dependency against CRC16 The crc16 functionality is not used anymore, therefore we can safely remove the dependency in the Kbuild file. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:21 +01:00
Simon Wunderlich	8d90d775ca	batman-adv: Start new development cycle Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:20 +01:00
Simon Wunderlich	e19f9759ed	batman-adv: update copyright years for 2014 Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:19 +01:00
Simon Wunderlich	031ace8d05	batman-adv: add build checks for packet sizes With unrolling the batadv_header into the respective structures, the offsetof checks are now useless. Instead, add build checks for all packet types which go over the wire to avoid problems with wrong sizes or compatibility issues on some architectures which don't use every day. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:19 +01:00
Antonio Quartulli	82ab33214e	batman-adv: remove returns at the end of void functions Return at the end of void functions is not needed. Since most of the void functions in the code do not do so, make all the others consistent by removing the useless returns. Actually all the functions to be "fixed" are in network-coding.h only. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-12 14:41:18 +01:00
Simon Wunderlich	cb1c92ec37	batman-adv: add debugfs support to view multiif tables Show tables for the multi interface operation. Originator tables are added per hard interface. This patch also changes the API by adding the interface to the bat_orig_print() parameters. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:16 +01:00
Simon Wunderlich	5bc7c1eb44	batman-adv: add debugfs structure for information per interface To show information per interface, add a debugfs hardif structure similar to the system in sysfs. Hard interface folders will be created in "$debugfs/batman-adv/". Files are not yet added. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:15 +01:00
Simon Wunderlich	f3b3d90189	batman-adv: add bonding again With the new interface alternating, the first hop may send packets in a round robin fashion to it's neighbors because it has multiple valid routes built by the multi interface optimization. This patch enables the feature if bonding is selected. Note that unlike the bonding implemented before, this version is much simpler and may even enable multi path routing to a certain degree. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:15 +01:00
Simon Wunderlich	ef0a937f7a	batman-adv: consider outgoing interface in OGM sending The current OGM sending an aggregation functionality decides on which interfaces a packet should be sent when it parses the forward packet struct. However, with the network wide multi interface optimization the outgoing interface is decided by the OGM processing function. This is reflected by moving the decision in the OGM processing function and add the outgoing interface in the forwarding packet struct. This practically implies that an OGM may be added multiple times (once per outgoing interface), and this also affects aggregation which needs to consider the outgoing interface as well. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:14 +01:00
Simon Wunderlich	c039876892	batman-adv: add WiFi penalty If the same interface is used for sending and receiving, there might be throughput degradation on half-duplex interfaces such as WiFi. Add a penalty if the same interface is used to reflect this problem in the metric. At the same time, change the hop penalty from 30 to 15 so there will be no change for single wifi mesh network. the effective hop penalty will stay at 30 due to the new wifi penalty for these networks. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:13 +01:00
Simon Wunderlich	7351a4822d	batman-adv: split out router from orig_node For the network wide multi interface optimization there are different routers for each outgoing interface (outgoing from the OGM perspective, incoming for payload traffic). To reflect this, change the router and associated data to a list of routers. While at it, rename batadv_orig_node_get_router() to batadv_orig_router_get() to follow the new naming scheme. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:11 +01:00
Simon Wunderlich	89652331c0	batman-adv: split tq information in neigh_node struct For the network wide multi interface optimization it is required to save metrics per outgoing interface in one neighbor. Therefore a new type is introduced to keep interface-specific information. This also requires some changes in access and list management. The compare and equiv_or_better API calls are changed to take the outgoing interface into consideration. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:10 +01:00
Simon Wunderlich	f6c8b71173	batman-adv: remove bonding and interface alternating Remove bonding and interface alternating code - it will be replaced by a new, network-wide multi interface optimization which enables both bonding and interface alternating in a better way. Keep the sysfs and find router function though, this will be needed later. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-12 14:41:10 +01:00
David S. Miller	45593c2bd2	Included changes: - substitute FSF address with URL - deselect current bat-GW when GW-client mode gets deactivated - send every DHCP packet using bat-unicast messages when GW-client mode is enabled - implement the Extended Isolation mechanism (it is an enhancement of the already existing batman-AP-isolation). This mechanism allows the user to drop packets exchanged by selected clients by using netfilter marks. - fix typ0 in header guard - minor code cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJSzk9EAAoJEEKTMo6mOh1VT9UP/0Mdcg7VM7oKSUEkUszAAQsW HwYVxFo89bwxMVauv4qAnmC6J4mV1IeciXFnpTeon8Bqr7isRDz8gCpDV/6m9AZp Rh3PFEWkJEE7xZy2sSOyn2cZrgP/Wd/zYTxac+XAf0I+cjSYo40vGGc1/9/EN7to lo1A4ru+BQJvQkt500a859Z5PAAsVXolYtLqJcxD0eGDbzR1kHTQmUDJEEkNwzUP 55vLu1KsSbYxw/T4A8ABKwCvkGRTJhgKmKKvwymeH9PHc5ODAZeInw1HTupKwIOQ W+WJxksJ0oBEuZB7y2NVXBRyPC2bF3D10C/7yZlul0PEntmT8vWV/eeO+Lw59YS3 rzFi+wpvdHwkjuBKpr+mc8lMPE0nWU31HqpFJP3y5IzsjL31kWT6sioWHxhY1zo9 hvZpb2/F8BniSgT5o3vpMcfInBQefViXP6ELjyB5i6+z2Pf8TqPukNGxCEWpOF6O r8HUHlPjbwERohK8/x4LRA8F7VpNagvMJ8kSHRUeR1j5QfcpbqFj3xi5LEBciakT WHok0AJdNrFUBVuj2n9z0hHFTTGF4Yxqf61A/vHJkROEthqwvoqtTOX5L1c1ZC/f DV6q4m2mLWuUxuRLGVWlXoN2XK8min+Of4RABzicX47EUIjTiPHOh38oymQUXG7D FS7/kYo5ZMIJmJgNKp+4 =6EjB -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - substitute FSF address with URL - deselect current bat-GW when GW-client mode gets deactivated - send every DHCP packet using bat-unicast messages when GW-client mode is enabled - implement the Extended Isolation mechanism (it is an enhancement of the already existing batman-AP-isolation). This mechanism allows the user to drop packets exchanged by selected clients by using netfilter marks. - fix typ0 in header guard - minor code cleanups Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 17:59:34 -05:00
Christoph Paasch	3e7013ddf5	tcp: metrics: Allow selective get/del of tcp-metrics based on src IP We want to be able to get/del tcp-metrics based on the src IP. This patch adds the necessary parsing of the netlink attribute and if the source address is set, it will match on this one too. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 17:38:18 -05:00
Christoph Paasch	bbf852b96e	tcp: metrics: Delete all entries matching a certain destination As we now can have multiple entries per destination-IP, the "ip tcp_metrics delete address ADDRESS" command deletes all of them. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 17:38:18 -05:00
Christoph Paasch	8a59359cb8	tcp: metrics: New netlink attribute for src IP and dumped in netlink reply This patch adds a new netlink attribute for the source-IP and appends it to the netlink reply. Now, iproute2 can have access to the source-IP. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 17:38:18 -05:00
Christoph Paasch	a544302820	tcp: metrics: Add source-address to tcp-metrics We add the source-address to the tcp-metrics, so that different metrics will be used per source/destination-pair. We use the destination-hash to store the metric inside the hash-table. That way, deleting and dumping via "ip tcp_metrics" is easy. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 17:38:18 -05:00
Christoph Paasch	324fd55a19	tcp: metrics: rename tcpm_addr to tcpm_daddr As we will add also the source-address, we rename all accesses to the tcp-metrics address to use "daddr". Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 17:38:18 -05:00
David S. Miller	1a6c1e5bd2	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please pull these updates for the 3.14 stream! For the mac80211 bits, Johannes says: "Felix adds some helper functions for P2P NoA software tracking, Joe fixes alignment (but as this apparently never caused issues I didn't send it to 3.13), Kyeyoon/Jouni add QoS-mapping support (a Hotspot 2.0 feature), Weilong fixed a bunch of checkpatch errors and I get to play fire-fighter or so and clean up other people's locking issues. I also added nl80211 vendor-specific events, as we'd discussed at the wireless summit." For the iwlwifi bits, Emmanuel says: "I have here a rework of the interrupt handling to meet RT kernel requirements - basically we don't take any lock in the primary interrupt handler. This gave me a good reason to clean things up a bit on the way. There is also a fix of the QoS mapping along with a few workarounds for hardware / firmware issues that are hard to hit. Three fixes suggested by static analyzers, and other various stuff. Most importantly, I update the Copyright note to include the new year." For the bluetooth bits, Gustavo says: "More patches to 3.14. The bulk of changes here is the 6LoWPAN support for Bluetooth LE Devices. The commits that touches net/ieee802154/ are already acked by David Miller. Other than that we have some RFCOMM fixes and improvements plus fixes and clean ups all over the tree." Beyond that, ath9k, brcmfmac, mwifiex, and wil6210 get their usual level of attention. The wl1251 driver gets a number of updates, and there are a handful of other bits here and there. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 14:53:33 -05:00
David S. Miller	ef8570d859	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== This batch contains one single patch with the l2tp match for xtables, from James Chapman. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 14:50:02 -05:00
Jason Wang	f663dd9aaf	net: core: explicitly select a txq before doing l2 forwarding Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan instead of lower device which misses the necessary txq synchronization for lower device such as txq stopping or frozen required by dev watchdog or control path. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog. - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a new param for .ndo_select_queue() for just selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also extended to accept this parameter and dev_queue_xmit_accel() was used to do l2 forwarding transmission. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of dev_queue_xmit() to do the transmission. In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 13:23:08 -05:00
David S. Miller	c4d7099867	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== For the mac80211 bits, Johannes says: "I have a fix from Javier for mac80211_hwsim when used with wmediumd userspace, and a fix from Felix for buffering in AP mode." For the NFC bits, Samuel says: "This pull request only contains one fix for a regression introduced with commit `e29a9e2ae1`. Without this fix, we can not establish a p2p link in target mode. Only initiator mode works." For the iwlwifi bits, Emmanuel says: "It only includes new device IDs so it's not vital. If you have a pull request to net.git anyway, I'd happy to have this in." ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 13:21:22 -05:00
Pablo Neira Ayuso	8f46df184c	netfilter: nf_tables: fix missing byteorder conversion in policy When fetching the policy attribute, the byteorder conversion was missing, breaking the chain policy setting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-10 18:26:13 +01:00
John W. Linville	235f939228	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: net/ieee802154/6lowpan.c	2014-01-10 10:59:40 -05:00
Johannes Berg	b77cf4f8e1	mac80211: handle MMPDUs at EOSP correctly If a uAPSD service period ends with an MMPDU, we currently just send that MMPDU, but it obviously won't get the EOSP bit set as it doesn't have a QoS header. This contradicts the standard, so add a QoS-nulldata frame after the MMPDU to properly terminate the service period with a frame that has EOSP set. Also fix a bug wrt. the TID for the MMPDU, it shouldn't be set to 0 unconditionally but use the actual TID that was assigned. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-10 09:50:02 +01:00
Johannes Berg	03c8c06f2d	mac80211: reset TX info flags when frame will be reprocessed The temporary TX info flags need to be cleared if the frame will be processed through the TX handlers again, otherwise it can get messed up. This fixes a bug that happened when an aggregation session was stopped while the station was sleeping - some frames might get transmitted marked as aggregation erroneously without this fix. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-10 09:43:34 +01:00
Johannes Berg	f9f760b488	mac80211: release multiple ACs in uAPSD, fix more-data bug When a response for PS-Poll or a uAPSD trigger frame is sent, the more-data bit should be set according to 802.11-2012 11.2.1.5 h), meaning that it should indicate more data on the relevant ACs (delivery-enabled or nondelivery-enabled for uAPSD or PS-Poll.) In, for example, the following scenario: * 1 frame on VO queue (either in driver or in mac80211) * at least 1 frame on VI queue (in the driver) * both VO/VI are delivery-enabled * uAPSD trigger frame received The more-data flag to the driver would not be set, even though it should be. While fixing this, I noticed that we should really release frames from multiple ACs where there's data buffered in the driver for the corresponding TIDs. To address all this, restructure the code a bit to consider all ACs if we only release driver frames or only buffered frames. This also addresses the more-data bug described above as now the TIDs will all be marked as released, so the driver will have to check the number of frames. While at it, clarify some code and comments and remove the found variable, replacing it with the appropriate sw/hw release check. Reported-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-10 09:43:34 +01:00
Johannes Berg	0a1cb80975	mac80211: fix PS-Poll driver release TID Using ffs() for the PS-Poll release TID is wrong, it will cause frames to be released in order 0 1 2 3 4 5 6 7 instead of the correct 7 6 5 4 3 0 2 1. Fix this by adding a new function that implements "highest priority TID" properly. Reported-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-10 09:43:34 +01:00
Fan Du	6bae919003	{xfrm,pktgen} Fix compiling error when CONFIG_XFRM is not set 0-DAY kernel build testing backend reported below error: All error/warnings: net/core/pktgen.c: In function 'pktgen_if_write': >> >> net/core/pktgen.c:1487:10: error: 'struct pktgen_dev' has no member named 'spi' >> >> net/core/pktgen.c:1488:43: error: 'struct pktgen_dev' has no member named 'spi' Fix this by encapuslating the code with CONFIG_XFRM. Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-10 07:46:24 +01:00
Hannes Frederic Sowa	07edd741c8	ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIME In the past the IFA_PERMANENT flag indicated, that the valid and preferred lifetime where ignored. Since change `fad8da3e08` ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity") we honour at least the preferred lifetime on those addresses. As such the valid lifetime gets recalculated and updated to 0. If loopback address is added manually this problem does not occur. Also if NetworkManager manages IPv6, those addresses will get added via inet6_rtm_newaddr and thus will have a correct lifetime, too. Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com> Reported-by: Damien Wyart <damien.wyart@gmail.com> Fixes: `fad8da3e08` ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity") Cc: Yasushi Asano <yasushi.asano@jp.fujitsu.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-09 23:07:47 -05:00
David S. Miller	751fcac19a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== nf_tables updates for net-next The following patchset contains the following nf_tables updates, mostly updates from Patrick McHardy, they are: * Add the "inet" table and filter chain type for this new netfilter family: NFPROTO_INET. This special table/chain allows IPv4 and IPv6 rules, this should help to simplify the burden in the administration of dual stack firewalls. This also includes several patches to prepare the infrastructure for this new table and a new meta extension to match the layer 3 and 4 protocol numbers, from Patrick McHardy. * Load both IPv4 and IPv6 conntrack modules in nft_ct if the rule is used in NFPROTO_INET, as we don't certainly know which one would be used, also from Patrick McHardy. * Do not allow to delete a table that contains sets, otherwise these sets become orphan, from Patrick McHardy. * Hold a reference to the corresponding nf_tables family module when creating a table of that family type, to avoid the module deletion when in use, from Patrick McHardy. * Update chain counters before setting the chain policy to ensure that we don't leave the chain in inconsistent state in case of errors (aka. restore chain atomicity). This also fixes a possible leak if it fails to allocate the chain counters if no counters are passed to be restored, from Patrick McHardy. * Don't check for overflows in the table counter if we are just renaming a chain, from Patrick McHardy. * Replay the netlink request after dropping the nfnl lock to load the module that supports provides a chain type, from Patrick. * Fix chain type module references, from Patrick. * Several cleanups, function renames, constification and code refactorizations also from Patrick McHardy. * Add support to set the connmark, this can be used to set it based on the meta mark (similar feature to -j CONNMARK --restore), from Kristian Evensen. * A couple of fixes to the recently added meta/set support and nft_reject, and fix missing chain type unregistration if we fail to register our the family table/filter chain type, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-09 21:36:01 -05:00
Pablo Neira Ayuso	cf4dfa8539	netfilter: nf_tables: fix error path in the init functions We have to unregister chain type if this fails to register netns. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 23:25:48 +01:00
James Chapman	74f77a6b2b	netfilter: introduce l2tp match extension Introduce an xtables add-on for matching L2TP packets. Supports L2TPv2 and L2TPv3 over IPv4 and IPv6. As well as filtering on L2TP tunnel-id and session-id, the filtering decision can also include the L2TP packet type (control or data), protocol version (2 or 3) and encapsulation type (UDP or IP). The most common use for this will likely be to filter L2TP data packets of individual L2TP tunnels or sessions. While a u32 match can be used, the L2TP protocol headers are such that field offsets differ depending on bits set in the header, making rules for matching generic L2TP connections cumbersome. This match extension takes care of all that. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 21:36:39 +01:00
Wei Yongjun	d0eb1f7e66	ip_tunnel: fix sparse non static symbol warning Fixes the following sparse warning: net/ipv4/ip_tunnel.c:116:18: warning: symbol 'tunnel_dst_check' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-09 14:31:47 -05:00
Wei Yongjun	ece37c87ab	openvswitch: Use kmem_cache_free() instead of kfree() memory allocated by kmem_cache_alloc() should be freed using kmem_cache_free(), not kfree(). Fixes: `e298e50570` ('openvswitch: Per cpu flow stats.') Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-09 14:26:39 -05:00
Patrick McHardy	3876d22dba	netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain() We don't encode argument types into function names and since besides nft_do_chain() there are only AF-specific versions, there is no risk of confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:16 +01:00
Patrick McHardy	44a6f0df03	netfilter: nf_tables: prohibit deletion of a table with existing sets We currently leak the set memory when deleting a table that still has sets in it. Return EBUSY when attempting to delete a table with sets. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:16 +01:00
Patrick McHardy	7047f9d052	netfilter: nf_tables: take AF module reference when creating a table The table refers to data of the AF module, so we need to make sure the module isn't unloaded while the table exists. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:16 +01:00
Patrick McHardy	c5c1f975ad	netfilter: nf_tables: perform flags validation before table allocation Simplifies error handling. Additionally use the correct type u32 for the host byte order flags value. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:15 +01:00
Patrick McHardy	fa2c1de0bb	netfilter: nf_tables: minor nf_chain_type cleanups Minor nf_chain_type cleanups: - reorder struct to plug a hoe - rename struct module member to "owner" for consistency - rename nf_hookfn array to "hooks" for consistency - reorder initializers for better readability Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:15 +01:00
Patrick McHardy	2a37d755b8	netfilter: nf_tables: constify chain type definitions and pointers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:15 +01:00
Patrick McHardy	93b0806f00	netfilter: nf_tables: replay request after dropping locks to load chain type To avoid races, we need to replay to request after dropping the nfnl_mutex to auto-load the chain type module. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:14 +01:00
Patrick McHardy	88ce65a71c	netfilter: nf_tables: add missing module references to chain types In some cases we neither take a reference to the AF info nor to the chain type, allowing the module to be unloaded while in use. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:14 +01:00
Patrick McHardy	baae3e62f3	netfilter: nf_tables: fix chain type module reference handling The chain type module reference handling makes no sense at all: we take a reference immediately when the module is registered, preventing the module from ever being unloaded. Fix by taking a reference when we're actually creating a chain of the chain type and release the reference when destroying the chain. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:14 +01:00
Patrick McHardy	758206760c	netfilter: nf_tables: fix check for table overflow The table use counter is only increased for new chains, so move the check to the correct position. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:13 +01:00
Patrick McHardy	4401a86200	netfilter: nf_tables: restore chain change atomicity Chain counter validation is performed after the chain policy has potentially been changed. Move counter validation/setting before changing of the chain policy to fix this. Additionally fix a memory leak if chain counter allocation fails for new chains, remove an unnecessary free_percpu() and move counter allocation for new chains Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:13 +01:00
Patrick McHardy	57de2a0cd9	netfilter: nf_tables: split chain policy validation from actually setting it Currently nf_tables_newchain() atomicity is broken because of having validation of some netlink attributes performed after changing attributes of the chain. The chain policy is (currently) fine, but split it up as preparation for the following fixes and to avoid future mistakes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:13 +01:00
Pablo Neira Ayuso	b38895c577	netfilter: nft_meta: fix lack of validation of the input register We have to validate that the input register is in the range of allowed registers, otherwise we can take a incorrect register value as input that may lead us to a crash. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:04:16 +01:00
Kristian Evensen	c4ede3d382	netfilter: nft_ct: Add support to set the connmark This patch adds kernel support for setting properties of tracked connections. Currently, only connmark is supported. One use-case for this feature is to provide the same functionality as -j CONNMARK --save-mark in iptables. Some restructuring was needed to implement the set op. The new structure follows that of nft_meta. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 19:07:44 +01:00
Ujjal Roy	f5aa0d21dd	cfg80211: add sanity check for retry limit in wext-compat Block setting the wrong values through iwconfig retry command. Add sanity checking before sending the retry limit to the driver. Signed-off-by: Ujjal Roy <royujjal@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-09 17:05:28 +01:00
John W. Linville	0f74d82d80	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-01-09 10:19:01 -05:00
Ilan Peer	bdfbec2d2d	cfg80211: Add a function to get the number of supported channels Add a utility function to get the number of channels supported by the device, and update the places in the code that need this data. Signed-off-by: Ilan Peer <ilan.peer@intel.com> [replace another occurrence in libertas, fix kernel-doc, fix bugs] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-09 14:24:24 +01:00
David S. Miller	54b553e2c1	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains three Netfilter updates, they are: * Fix wrong usage of skb_header_pointer in the DCCP protocol helper that has been there for quite some time. It was resulting in copying the dccp header to a pointer allocated in the stack. Fortunately, this pointer provides room for the dccp header is 4 bytes long, so no crashes have been reported so far. From Daniel Borkmann. * Use format string to print in the invocation of nf_log_packet(), again in the DCCP helper. Also from Daniel Borkmann. * Revert "netfilter: avoid get_random_bytes call" as prandom32 does not guarantee enough entropy when being calling this at boot time, that may happen when reloading the rule. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-08 15:04:56 -05:00
Antonio Quartulli	42cb0bef01	batman-adv: set the isolation mark in the skb if needed If a broadcast packet is coming from a client marked as isolated, then mark the skb using the isolation mark so that netfilter (or any other application) can recognise them. The mark is written in the skb based on the mask value: only bits set in the mask are substitued by those in the mark value Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:46 +01:00
Antonio Quartulli	eceb22ae0b	batman-adv: create helper function to get AP isolation status The AP isolation status may be evaluated in different spots. Create an helper function to avoid code duplication. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:45 +01:00
Antonio Quartulli	2d2fcc2a3f	batman-adv: extend the ap_isolation mechanism Change the AP isolation mechanism to not only "isolate" WIFI clients but also all those marked with the more generic "isolation flag" (BATADV_TT_CLIENT_ISOLA). The result is that when AP isolation is on any unicast packet originated by an "isolated" client and directed to another "isolated" client is dropped at the source node. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:44 +01:00
Antonio Quartulli	dd24ddb265	batman-adv: print the new BATADV_TT_CLIENT_ISOLA flag Print the new BATADV_TT_CLIENT_ISOLA flag properly in the Local and Global Translation Table output. The character 'I' is used in the flags column to indicate that the entry is marked as isolated. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:44 +01:00
Antonio Quartulli	9464d07188	batman-adv: mark a local client as isolated when needed A client sending packets which mark matches the value configured via sysfs has to be identified as isolated using the TT_CLIENT_ISOLA flag. The match is mask based, meaning that only bits set in the mask are compared with those in the mark value. If the configured mask is equal to 0 no operation is performed. Such flag is then advertised within the classic client announcement mechanism. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:43 +01:00
Antonio Quartulli	c42edfe382	batman-adv: add isolation_mark sysfs attribute This attribute can be used to set and read the value and the mask of the skb mark which will be used to classify the source non-mesh client as ISOLATED. In this way a client can be advertised as such and the mark can potentially be restored at the receiving node before delivering the skb. This can be helpful for creating network wide netfilter policies. This sysfs file expects a string of the shape "$mark/$mask". Where $mark has to be a 32-bit number in any base, while $mask must be a 32bit mask expressed in hex base. Only bits in $mark covered by the bitmask are really stored. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:42 +01:00
Antonio Quartulli	6c413b1c22	batman-adv: send every DHCP packet as bat-unicast In different situations it is possible that the DHCP server or client uses broadcast Ethernet frames to send messages to each other. The GW component in batman-adv takes care of using bat-unicast packets to bring broadcast DHCP Discover/Requests to the "best" server. On the way back the DHCP server usually sends unicasts, but upon client request it may decide to use broadcasts as well. This patch improves the GW component so that it now snoops and sends as unicast all the DHCP packets, no matter if they were generated by a DHCP server or client. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:42 +01:00
Antonio Quartulli	36484f84d5	batman-adv: remove parenthesis from return statements Remove parenthesis around return expression as suggested by checkpatch. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:41 +01:00
Antonio Quartulli	4e820e72db	batman-adv: rename gw_deselect() to gw_reselect() The function batadv_gw_deselect() is actually not deselecting anything. It is just informing the GW code to perform a re-election procedure when possible. The current gateway is not being touched at all and therefore the name of this function is rather misleading. Rename it to batadv_gw_reselect() to batadv_gw_reselect() to make its behaviour easier to grasp. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:41 +01:00
Antonio Quartulli	f316318157	batman-adv: deselect current GW on client mode switch off When switching from gw_mode client to either off or server the current selected gateway has to be deselected. In this way when client mode is enabled again a gateway re-election is forced and a GW_ADD event is consequently sent. The current behaviour instead is to keep the current gateway leading to no GW_ADD event when gw_mode client is selected for a second time Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-08 20:49:40 +01:00
Antonio Quartulli	ebf38fb7ab	batman-adv: remove FSF address from GPL disclaimer As suggested by checkpatch, remove all the references to the FSF address since the kernel already has one reference in its documentation. In this way it is easier to update it in case of future changes. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:39 +01:00
Antonio Quartulli	3fba7325bb	batman-adv: don't switch byte order too often if not needed If possible, operations like ntohs/ntohl should not be performed too often. Use a variable to locally store the converted value and then use it. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-01-08 20:49:39 +01:00
Antonio Quartulli	a48bcacdb3	batman-adv: properly rename define in distributed arp table header file Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-01-08 20:49:38 +01:00
John W. Linville	300e5fd160	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2014-01-08 13:44:29 -05:00
John W. Linville	2eff7c791a	This is the first NFC fixes pull request for 3.13. It only contains one fix for a regression introduced with commit `e29a9e2ae1`. Without this fix, we can not establish a p2p link in target mode. Only initiator mode works. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJSywyaAAoJEIqAPN1PVmxK3eMP/2XKZa2qzcMZfWZBLEMXQHce UO36BqZF0nre0ZUZQmaXzM5L0PM/UNxvhf2DsLx3s54/Tk/o0wHYSM/7GfX58dkX YAYSbG3viQM9L1dRhgfGQwRxXUcd8M85fLUH9SdJeUzCIgxXEDFnlpkeaKPE+UaM PgCHRXbW+cDje4DSO5JXVzIRFzsCtAwgd5bx6u/5bM5PLzxGwHJHkHe+lZ9b4Hbe ZLsqbU0HRy0rB8hmpro6NcrIoNHEXYupdd1gwslb88jdA+BDOUAZj7htcBBcC9Lw 7D8RQTI6YNDsOzcLJyWmmfqTKd1j3RWPcTSuWkAGTvy04VOhGEc1LcgOOVLvhsLZ Hw412d0lYOiIwtjeIwS1etY42+f7tPOHOuhWFO3EQX0/1fIQ2H9V18DkM2qFPCWT L5GwV70YWbYfeRF1i3kGWKN15qLh9/toxB6rgE8eM2vCCGJXbt52yz9oSNM3QvZf 2rHnzAHCEKGv4xS7oHhJSnkwH3Nd7WR6gjWatbfswrjtaDU2PKlsL703Uxk1gIPz o3itmJv/Ej1j6TqUpUz2Vs/sr4dltCDiD7tiaiv8zprHn0LZcNyp1/E+p31pMHBP 8IW3BXxpBh3S8gpJ932LVAYe40ymQ0m1ZhssddbH1CC90jfs6m/HfaUC+GsgBX/x 0iaxEH1jwwRIZC9p1Qso =YkKJ -----END PGP SIGNATURE----- Merge tag 'nfc-fixes-3.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixes Samuel Ortiz <sameo@linux.intel.com> says: "This is the first NFC fixes pull request for 3.13. It only contains one fix for a regression introduced with commit `e29a9e2ae1`. Without this fix, we can not establish a p2p link in target mode. Only initiator mode works." Signed-off-by: John W. Linville <linville@tuxdriver.com>	2014-01-08 13:36:17 -05:00
Ying Xue	da7c224b1b	net: xfrm: xfrm_policy: silence compiler warning Fix below compiler warning: net/xfrm/xfrm_policy.c:1644:12: warning: ‘xfrm_dst_alloc_copy’ defined but not used [-Wunused-function] Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 22:45:26 -05:00
Jon Paul Maloy	581465fa28	tipc: make link start event synchronous When a link is created we delay the start event by launching it to be executed later in a tasklet. As we hold all the necessary locks at the moment of creation, and there is no risk of deadlock or contention, this delay serves no purpose in the current code. We remove this obsolete indirection step, and the associated function link_start(). At the same time, we rename the function tipc_link_stop() to the more appropriate tipc_link_purge_queues(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:26 -05:00
Ying Xue	f9a2c80b8b	tipc: introduce new spinlock to protect struct link_req Currently, only 'bearer_lock' is used to protect struct link_req in the function disc_timeout(). This is unsafe, since the member fields 'num_nodes' and 'timer_intv' might be accessed by below three different threads simultaneously, none of them grabbing bearer_lock in the critical region: link_activate() tipc_bearer_add_dest() tipc_disc_add_dest() req->num_nodes++; tipc_link_reset() tipc_bearer_remove_dest() tipc_disc_remove_dest() req->num_nodes-- disc_update() read req->num_nodes write req->timer_intv disc_timeout() read req->num_nodes read/write req->timer_intv Without lock protection, the only symptom of a race is that discovery messages occasionally may not be sent out. This is not fatal, since such messages are best-effort anyway. On the other hand, since discovery messages are not time critical, adding a protecting lock brings no serious overhead either. So we add a new, dedicated spinlock in order to guarantee absolute data consistency in link_req objects. This also helps reduce the overall role of the bearer_lock, which we want to remove completely in a later commit series. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
Jon Paul Maloy	b9d4c33935	tipc: remove 'has_redundant_link' flag from STATE link protocol messages The flag 'has_redundant_link' is defined only in RESET and ACTIVATE protocol messages. Due to an ambiguity in the protocol specification it is currently also transferred in STATE messages. Its value is used to initialize a link state variable, 'permit_changeover', which is used to inhibit futile link failover attempts when it is known that the peer node has no working links at the moment, although the local node may still think it has one. The fact that 'has_redundant_link' incorrectly is read from STATE messages has the effect that 'permit_changeover' sometimes gets a wrong value, and permanently blocks any links from being re-established. Such failures can only occur in in dual-link systems, and are extremely rare. This bug seems to have always been present in the code. Furthermore, since commit `b4b5610223` ("tipc: Ensure both nodes recognize loss of contact between them"), the 'permit_changeover' field serves no purpose any more. The task of enforcing 'lost contact' cycles at both peer endpoints is now taken by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN in struct tipc_node to abort unnecessary failover attempts. We therefore remove the 'has_redundant_link' flag from STATE messages, as well as the now redundant 'permit_changeover' variable. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
Jon Paul Maloy	170b3927b4	tipc: rename functions related to link failover and improve comments The functionality related to link addition and failover is unnecessarily hard to understand and maintain. We try to improve this by renaming some of the functions, at the same time adding or improving the explanatory comments around them. Names such as "tipc_rcv()" etc. also align better with what is used in other networking components. The changes in this commit are purely cosmetic, no functional changes are made. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
David S. Miller	a04c0e2c0d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== The following patchset contains two patches: * fix the IRC NAT helper which was broken when adding (incomplete) IPv6 support, from Daniel Borkmann. * Refine the previous bugtrap that Jesper added to catch problems for the usage of the sequence adjustment extension in IPVs in Dec 16th, it may spam messages in case of finding a real bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:38:17 -05:00
Daniel Borkmann	be7928d20b	net: xfrm: xfrm_policy: fix inline not at beginning of declaration Fix three warnings related to: net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] Just removing the inline keyword is sufficient as the compiler will decide on its own about inlining or not. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:34:00 -05:00
Patrick McHardy	9638f33ecf	netfilter: nft_ct: load both IPv4 and IPv6 conntrack modules for NFPROTO_INET The ct expression can currently not be used in the inet family since we don't have a conntrack module for NFPROTO_INET, so nf_ct_l3proto_try_module_get() fails. Add some manual handling to load the modules for both NFPROTO_IPV4 and NFPROTO_IPV6 if the ct expression is used in the inet family. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:57:32 +01:00
Patrick McHardy	4566bf2706	netfilter: nft_meta: add l4proto support For L3-proto independant rules we need to get at the L4 protocol value directly. Add it to the nft_pktinfo struct and use the meta expression to retrieve it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:57:31 +01:00
Patrick McHardy	124edfa9e0	netfilter: nf_tables: add nfproto support to meta expression Needed by multi-family tables to distinguish IPv4 and IPv6 packets. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:57:30 +01:00
Patrick McHardy	1d49144c0a	netfilter: nf_tables: add "inet" table for IPv4/IPv6 This patch adds a new table family and a new filter chain that you can use to attach IPv4 and IPv6 rules. This should help to simplify rule-set maintainance in dual-stack setups. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:57:25 +01:00
Patrick McHardy	115a60b173	netfilter: nf_tables: add support for multi family tables Add support to register chains to multiple hooks for different address families for mixed IPv4/IPv6 tables. Signed-off-by: Patrick McHardy <kaber@trash.net>	2014-01-07 23:55:46 +01:00
Patrick McHardy	c9484874e7	netfilter: nf_tables: add hook ops to struct nft_pktinfo Multi-family tables need the AF from the hook ops. Add a pointer to the hook ops and replace usage of the hooknum member in struct nft_pktinfo. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:50:43 +01:00
Patrick McHardy	3b088c4bc0	netfilter: nf_tables: make chain types override the default AF functions Currently the AF-specific hook functions override the chain-type specific hook functions. That doesn't make too much sense since the chain types are a special case of the AF-specific hooks. Make the AF-specific hook functions the default and make the optional chain type hooks override them. As a side effect, the necessary code restructuring reduces the code size, f.i. in case of nf_tables_ipv4.o: nf_tables_ipv4_init_net \| -24 nft_do_chain_ipv4 \| -113 2 functions changed, 137 bytes removed, diff: -137 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:50:43 +01:00
Pablo Neira Ayuso	688d18636f	netfilter: nft_reject: fix compilation warning if NF_TABLES_IPV6 is disabled net/netfilter/nft_reject.c: In function 'nft_reject_eval': net/netfilter/nft_reject.c:37:14: warning: unused variable 'net' [-Wunused-variable] Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:50:43 +01:00
Jerry Chu	bf5a755f5e	net-gre-gro: Add GRE support to the GRO stack This patch built on top of Commit `299603e837` ("net-gro: Prepare GRO stack for the upcoming tunneling support") to add the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO stack. It also serves as an example for supporting other encapsulation protocols in the GRO stack in the future. The patch supports version 0 and all the flags (key, csum, seq#) but will flush any pkt with the S (seq#) flag. This is because the S flag is not support by GSO, and a GRO pkt may end up in the forwarding path, thus requiring GSO support to break it up correctly. Currently the "packet_offload" structure only contains L3 (ETH_P_IP/ ETH_P_IPV6) GRO offload support so the encapped pkts are limited to IP pkts (i.e., w/o L2 hdr). But support for other protocol type can be easily added, so is the support for GRE variations like NVGRE. The patch also support csum offload. Specifically if the csum flag is on and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE), the code will take advantage of the csum computed by the h/w when validating the GRE csum. Note that commit `60769a5dcd` "ipv4: gre: add GRO capability" already introduces GRO capability to IPv4 GRE tunnels, using the gro_cells infrastructure. But GRO is done after GRE hdr has been removed (i.e., decapped). The following patch applies GRO when pkts first come in (before hitting the GRE tunnel code). There is some performance advantage for applying GRO as early as possible. Also this approach is transparent to other subsystem like Open vSwitch where GRE decap is handled outside of the IP stack hence making it harder for the gro_cells stuff to apply. On the other hand, some NICs are still not capable of hashing on the inner hdr of a GRE pkt (RSS). In that case the GRO processing of pkts from the same remote host will all happen on the same CPU and the performance may be suboptimal. I'm including some rough preliminary performance numbers below. Note that the performance will be highly dependent on traffic load, mix as usual. Moreover it also depends on NIC offload features hence the following is by no means a comprehesive study. Local testing and tuning will be needed to decide the best setting. All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs. (super_netperf 50 -H 192.168.1.18 -l 30) An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1 mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123) is configured. The GRO support for pkts AFTER decap are controlled through the device feature of the GRE device (e.g., ethtool -K gre1 gro on/off). 1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off thruput: 9.16Gbps CPU utilization: 19% 1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off thruput: 5.9Gbps CPU utilization: 15% 1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on thruput: 9.26Gbps CPU utilization: 12-13% 1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on thruput: 9.26Gbps CPU utilization: 10% The following tests were performed on a different NIC that is capable of csum offload. I.e., the h/w is capable of computing IP payload csum (CHECKSUM_COMPLETE). 2.1 ethtool -K gre1 gro on (hence will use gro_cells) 2.1.1 ethtool -K eth0 gro off; csum offload disabled thruput: 8.53Gbps CPU utilization: 9% 2.1.2 ethtool -K eth0 gro off; csum offload enabled thruput: 8.97Gbps CPU utilization: 7-8% 2.1.3 ethtool -K eth0 gro on; csum offload disabled thruput: 8.83Gbps CPU utilization: 5-6% 2.1.4 ethtool -K eth0 gro on; csum offload enabled thruput: 8.98Gbps CPU utilization: 5% 2.2 ethtool -K gre1 gro off 2.2.1 ethtool -K eth0 gro off; csum offload disabled thruput: 5.93Gbps CPU utilization: 9% 2.2.2 ethtool -K eth0 gro off; csum offload enabled thruput: 5.62Gbps CPU utilization: 8% 2.2.3 ethtool -K eth0 gro on; csum offload disabled thruput: 7.69Gbps CPU utilization: 8% 2.2.4 ethtool -K eth0 gro on; csum offload enabled thruput: 8.96Gbps CPU utilization: 5-6% Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 16:21:31 -05:00
Benjamin Poirier	cdb3f4a31b	net: Do not enable tx-nocache-copy by default There are many cases where this feature does not improve performance or even reduces it. For example, here are the results from tests that I've run using 3.12.6 on one Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are from the Xeon, but they're similar on the i7. All numbers report the mean±stddev over 10 runs of 10s. 1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache copy from user on transmit" There is no statistically significant difference between tx-nocache-copy on/off. nic irqs spread out (one queue per cpu) 200x netperf -r 1400,1 tx-nocache-copy off 692000±1000 tps 50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3 tx-nocache-copy on 693000±1000 tps 50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7 200x netperf -r 14000,14000 tx-nocache-copy off 86450±80 tps 50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40 tx-nocache-copy on 86110±60 tps 50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20 2) single stream throughput tests tx-nocache-copy leads to higher service demand throughput cpu0 cpu1 demand (Gb/s) (Gcycle) (Gcycle) (cycle/B) nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send) tx-nocache-copy off 9402±5 9.4±0.2 0.80±0.01 tx-nocache-copy on 9403±3 9.85±0.04 0.838±0.004 nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send) tx-nocache-copy off 9401±5 5.83±0.03 5.0±0.1 0.923±0.007 tx-nocache-copy on 9404±2 5.74±0.03 5.523±0.009 0.958±0.002 As a second example, here are some results from Eric Dumazet with latest net-next. tx-nocache-copy also leads to higher service demand (cpu is Intel(R) Xeon(R) CPU X5660 @ 2.80GHz) lpq83:~# ./ethtool -K eth0 tx-nocache-copy on lpq83:~# perf stat ./netperf -H lpq84 -c MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB 87380 16384 16384 10.00 9407.44 2.50 -1.00 0.522 -1.000 Performance counter stats for './netperf -H lpq84 -c': 4282.648396 task-clock # 0.423 CPUs utilized 9,348 context-switches # 0.002 M/sec 88 CPU-migrations # 0.021 K/sec 355 page-faults # 0.083 K/sec 11,812,797,651 cycles # 2.758 GHz [82.79%] 9,020,522,817 stalled-cycles-frontend # 76.36% frontend cycles idle [82.54%] 4,579,889,681 stalled-cycles-backend # 38.77% backend cycles idle [67.33%] 6,053,172,792 instructions # 0.51 insns per cycle # 1.49 stalled cycles per insn [83.64%] 597,275,583 branches # 139.464 M/sec [83.70%] 8,960,541 branch-misses # 1.50% of all branches [83.65%] 10.128990264 seconds time elapsed lpq83:~# ./ethtool -K eth0 tx-nocache-copy off lpq83:~# perf stat ./netperf -H lpq84 -c MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB 87380 16384 16384 10.00 9412.45 2.15 -1.00 0.449 -1.000 Performance counter stats for './netperf -H lpq84 -c': 2847.375441 task-clock # 0.281 CPUs utilized 11,632 context-switches # 0.004 M/sec 49 CPU-migrations # 0.017 K/sec 354 page-faults # 0.124 K/sec 7,646,889,749 cycles # 2.686 GHz [83.34%] 6,115,050,032 stalled-cycles-frontend # 79.97% frontend cycles idle [83.31%] 1,726,460,071 stalled-cycles-backend # 22.58% backend cycles idle [66.55%] 2,079,702,453 instructions # 0.27 insns per cycle # 2.94 stalled cycles per insn [83.22%] 363,773,213 branches # 127.757 M/sec [83.29%] 4,242,732 branch-misses # 1.17% of all branches [83.51%] 10.128449949 seconds time elapsed CC: Tom Herbert <therbert@google.com> Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 16:20:19 -05:00
Erik Hugne	732256b933	tipc: correctly unlink packets from deferred packet queue When we pull a received packet from a link's 'deferred packets' queue for processing, its 'next' pointer is not cleared, and still refers to the next packet in that queue, if any. This is incorrect, but caused no harm before commit `40ba3cdf54` ("tipc: message reassembly using fragment chain") was introduced. After that commit, it may sometimes lead to the following oops: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: tipc CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W 3.13.0-rc2+ #6 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880017af4880 ti: ffff880017aee000 task.ti: ffff880017aee000 RIP: 0010:[<ffffffff81710694>] [<ffffffff81710694>] skb_try_coalesce+0x44/0x3d0 RSP: 0018:ffff880016603a78 EFLAGS: 00010212 RAX: 6b6b6b6bd6d6d6d6 RBX: ffff880013106ac0 RCX: ffff880016603ad0 RDX: ffff880016603ad7 RSI: ffff88001223ed00 RDI: ffff880013106ac0 RBP: ffff880016603ab8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88001223ed00 R13: ffff880016603ad0 R14: 000000000000058c R15: ffff880012297650 FS: 0000000000000000(0000) GS:ffff880016600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000805b000 CR3: 0000000011f5d000 CR4: 00000000000006e0 Stack: ffff880016603a88 ffffffff810a38ed ffff880016603aa8 ffff88001223ed00 0000000000000001 ffff880012297648 ffff880016603b68 ffff880012297650 ffff880016603b08 ffffffffa0006c51 ffff880016603b08 00ffffffa00005fc Call Trace: <IRQ> [<ffffffff810a38ed>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffa0006c51>] tipc_link_recv_fragment+0xd1/0x1b0 [tipc] [<ffffffffa0007214>] tipc_recv_msg+0x4e4/0x920 [tipc] [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc] [<ffffffffa000177c>] tipc_l2_rcv_msg+0xcc/0x250 [tipc] [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc] [<ffffffff8171e65b>] __netif_receive_skb_core+0x80b/0xd00 [<ffffffff8171df94>] ? __netif_receive_skb_core+0x144/0xd00 [<ffffffff8171eb76>] __netif_receive_skb+0x26/0x70 [<ffffffff8171ed6d>] netif_receive_skb+0x2d/0x200 [<ffffffff8171fe70>] napi_gro_receive+0xb0/0x130 [<ffffffff815647c2>] e1000_clean_rx_irq+0x2c2/0x530 [<ffffffff81565986>] e1000_clean+0x266/0x9c0 [<ffffffff81985f7b>] ? notifier_call_chain+0x2b/0x160 [<ffffffff8171f971>] net_rx_action+0x141/0x310 [<ffffffff81051c1b>] __do_softirq+0xeb/0x480 [<ffffffff819817bb>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810b8c42>] ? handle_fasteoi_irq+0x72/0x100 [<ffffffff81052346>] irq_exit+0x96/0xc0 [<ffffffff8198cbc3>] do_IRQ+0x63/0xe0 [<ffffffff81981def>] common_interrupt+0x6f/0x6f <EOI> This happens when the last fragment of a message has passed through the the receiving link's 'deferred packets' queue, and at least one other packet was added to that queue while it was there. After the fragment chain with the complete message has been successfully delivered to the receiving socket, it is released. Since 'next' pointer of the last fragment in the released chain now is non-NULL, we get the crash shown above. We fix this by clearing the 'next' pointer of all received packets, including those being pulled from the 'deferred' queue, before they undergo any further processing. Fixes: `40ba3cdf54` ("tipc: message reassembly using fragment chain") Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 16:15:24 -05:00
J. Bruce Fields	bba0f88bf7	minor svcauth_gss.c cleanup	2014-01-07 16:01:16 -05:00
Jiri Pirko	dfd1582d1e	ipv4: loopback device: ignore value changes after device is upped When lo is brought up, new ifa is created. Then, devconf and neigh values bitfield should be set so later changes of default values would not affect lo values. Note that the same behaviour is in ipv6. Also note that this is likely not an issue in many distros (for example Fedora 19) because userspace sets address to lo manually before bringing it up. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 15:55:17 -05:00
FX Le Bail	509aba3b0d	IPv6: add the option to use anycast addresses as source addresses in echo reply This change allows to follow a recommandation of RFC4942. - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses as source addresses for ICMPv6 echo reply. This sysctl is false by default to preserve existing behavior. - Add inline check ipv6_anycast_destination(). - Use them in icmpv6_echo_reply(). Reference: RFC4942 - IPv6 Transition/Coexistence Security Considerations (http://tools.ietf.org/html/rfc4942#section-2.1.6) 2.1.6. Anycast Traffic Identification and Security [...] To avoid exposing knowledge about the internal structure of the network, it is recommended that anycast servers now take advantage of the ability to return responses with the anycast address as the source address if possible. Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 15:51:39 -05:00
Li RongQing	657e5d1965	ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.c initialise pcpu_tstats.syncp to kill the calltrace [ 11.973950] Call Trace: [ 11.973950] [<819bbaff>] dump_stack+0x48/0x60 [ 11.973950] [<819bbaff>] dump_stack+0x48/0x60 [ 11.973950] [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10 [ 11.973950] [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10 [ 11.973950] [<81079fa7>] lock_acquire+0x77/0xa0 [ 11.973950] [<81079fa7>] lock_acquire+0x77/0xa0 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230 [ 11.973950] [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<811cf8c1>] ? __nla_reserve+0x21/0xd0 [ 11.973950] [<811cf8c1>] ? __nla_reserve+0x21/0xd0 [ 11.973950] [<817ca7ab>] dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] dev_get_stats+0xcb/0x130 [ 11.973950] [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20 [ 11.973950] [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20 [ 11.973950] [<810352e0>] ? kvm_clock_read+0x20/0x30 [ 11.973950] [<810352e0>] ? kvm_clock_read+0x20/0x30 [ 11.973950] [<81008e38>] ? sched_clock+0x8/0x10 [ 11.973950] [<81008e38>] ? sched_clock+0x8/0x10 [ 11.973950] [<8106ba45>] ? sched_clock_local+0x25/0x170 [ 11.973950] [<8106ba45>] ? sched_clock_local+0x25/0x170 [ 11.973950] [<810da6bd>] ? __kmalloc+0x3d/0x90 [ 11.973950] [<810da6bd>] ? __kmalloc+0x3d/0x90 [ 11.973950] [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70 [ 11.973950] [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70 [ 11.973950] [<810da81a>] ? slob_alloc_node+0x2a/0x60 [ 11.973950] [<810da81a>] ? slob_alloc_node+0x2a/0x60 [ 11.973950] [<817b919a>] ? __alloc_skb+0x6a/0x2b0 [ 11.973950] [<817b919a>] ? __alloc_skb+0x6a/0x2b0 [ 11.973950] [<817d8795>] rtmsg_ifinfo+0x65/0xe0 [ 11.973950] [<817d8795>] rtmsg_ifinfo+0x65/0xe0 [ 11.973950] [<817cbd31>] register_netdevice+0x531/0x5a0 [ 11.973950] [<817cbd31>] register_netdevice+0x531/0x5a0 [ 11.973950] [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90 [ 11.973950] [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90 [ 11.973950] [<817cbdb6>] register_netdev+0x16/0x30 [ 11.973950] [<817cbdb6>] register_netdev+0x16/0x30 [ 11.973950] [<81f574a6>] vti6_init_net+0x1c4/0x1d4 [ 11.973950] [<81f574a6>] vti6_init_net+0x1c4/0x1d4 [ 11.973950] [<81f573af>] ? vti6_init_net+0xcd/0x1d4 [ 11.973950] [<81f573af>] ? vti6_init_net+0xcd/0x1d4 [ 11.973950] [<817c16df>] ops_init.constprop.11+0x17f/0x1c0 [ 11.973950] [<817c16df>] ops_init.constprop.11+0x17f/0x1c0 [ 11.973950] [<817c1779>] register_pernet_operations.isra.9+0x59/0x90 [ 11.973950] [<817c1779>] register_pernet_operations.isra.9+0x59/0x90 [ 11.973950] [<817c18d1>] register_pernet_device+0x21/0x60 [ 11.973950] [<817c18d1>] register_pernet_device+0x21/0x60 [ 11.973950] [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4 [ 11.973950] [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4 [ 11.973950] [<81f574c7>] vti6_tunnel_init+0x11/0x68 [ 11.973950] [<81f574c7>] vti6_tunnel_init+0x11/0x68 [ 11.973950] [<81f572a1>] ? mip6_init+0x73/0xb4 [ 11.973950] [<81f572a1>] ? mip6_init+0x73/0xb4 [ 11.973950] [<81f0cba4>] do_one_initcall+0xbb/0x15b [ 11.973950] [<81f0cba4>] do_one_initcall+0xbb/0x15b [ 11.973950] [<811a00d8>] ? sha_transform+0x528/0x1150 [ 11.973950] [<811a00d8>] ? sha_transform+0x528/0x1150 [ 11.973950] [<81f0c544>] ? repair_env_string+0x12/0x51 [ 11.973950] [<81f0c544>] ? repair_env_string+0x12/0x51 [ 11.973950] [<8105c30d>] ? parse_args+0x2ad/0x440 [ 11.973950] [<8105c30d>] ? parse_args+0x2ad/0x440 [ 11.973950] [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 11.973950] [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 11.973950] [<81f0cd27>] kernel_init_freeable+0xe3/0x182 [ 11.973950] [<81f0cd27>] kernel_init_freeable+0xe3/0x182 [ 11.973950] [<81f0c532>] ? do_early_param+0x7a/0x7a [ 11.973950] [<81f0c532>] ? do_early_param+0x7a/0x7a [ 11.973950] [<819b5b1b>] kernel_init+0xb/0x100 [ 11.973950] [<819b5b1b>] kernel_init+0xb/0x100 [ 11.973950] [<819cebf7>] ret_from_kernel_thread+0x1b/0x28 [ 11.973950] [<819cebf7>] ret_from_kernel_thread+0x1b/0x28 [ 11.973950] [<819b5b10>] ? rest_init+0xc0/0xc0 [ 11.973950] [<819b5b10>] ? rest_init+0xc0/0xc0 Before `469bdcefdc` ("ipv6: fix the use of pcpu_tstats in ip6_vti.c"), the pcpu_tstats.syncp is not used to pretect the 64bit elements of pcpu_tstats, so not appear this calltrace. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 14:12:46 -05:00
Thierry Escande	b711ad524b	NFC: digital: Set rf tech and crc functions when receiving a PSL_REQ This patch sets the correct rf tech value and crc functions in target mode when receiving a PSL_REQ, as done when receiving an ATR_REQ. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 18:48:12 +01:00
Thierry Escande	48e1044515	NFC: digital: Set current target active on activate_target() call The curr_protocol field of nfc_digital_dev structure used to determine if a target is currently active was set too soon, immediately when a target is found. This is not good since there is no other way than deactivate_target() to reset curr_protocol and if activate_target() is not called, the target remains active and it's not possible to put the device in poll mode anymore. With this patch curr_protocol is set when nfc core activates a target, puts a device up, or when an ATR_REQ is received in target mode. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 18:48:12 +01:00
Heikki Krogerus	2111cad655	net: rfkill: gpio: convert to descriptor-based GPIO interface Convert to the safer gpiod_* family of API functions. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Stephen Warren <swarren@nvidia.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2014-01-07 18:45:20 +01:00
Emmanuel Grumbach	349b196044	mac80211: allow to set smps mode to OFF in AP mode In managed mode, we should not ask for OFF mode because the power settings may still require DYNAMIC. In AP mode, this should be allowed since the default settings is OFF and AUTOMATIC is not allowed. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-07 16:25:49 +01:00
Johannes Berg	3c2723f503	mac80211: clean up prepare_for_handlers() return value Using an int with 0/1 is not very common, make the function return a bool instead with the same values (false/true). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-07 16:23:24 +01:00
Emmanuel Grumbach	40791942ec	mac80211: simplify code in ieee80211_prepare_and_rx_handle No need to assign the return value of prepare_for_handlers to a variable if the only usage is to test it. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-07 16:22:15 +01:00
Emmanuel Grumbach	87ee475ef6	mac80211: clean up garbage in comment Not clear how this landed here. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-07 16:21:56 +01:00
Masanari Iida	8faaaead62	treewide: fix comments and printk msgs This patch fixed several typo in printk from various part of kernel source. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-01-07 15:06:07 +01:00
Claudio Takahasi	e825eb1d7e	Bluetooth: Fix 6loWPAN peer lookup This patch fixes peer address lookup for 6loWPAN over Bluetooth Low Energy links. ADDR_LE_DEV_PUBLIC, and ADDR_LE_DEV_RANDOM are the values allowed for "dst_type" field in the hci_conn struct for LE links. Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2014-01-07 11:32:15 -02:00
Claudio Takahasi	b071a62099	Bluetooth: Fix setting Universal/Local bit This patch fixes the Bluetooth Low Energy Address type checking when setting Universal/Local bit for the 6loWPAN network device or for the peer device connection. ADDR_LE_DEV_PUBLIC or ADDR_LE_DEV_RANDOM are the values allowed for "src_type" and "dst_type" in the hci_conn struct. The Bluetooth link type can be obtainned reading the "type" field in the same struct. Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2014-01-07 11:32:11 -02:00
Eric Dumazet	438e38fadc	gre_offload: statically build GRE offloading support GRO/GSO layers can be enabled on a node, even if said node is only forwarding packets. This patch permits GSO (and upcoming GRO) support for GRE encapsulated packets, even if the host has no GRE tunnel setup. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 20:28:34 -05:00
David S. Miller	39b6b2992f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== [GIT net-next] Open vSwitch Open vSwitch changes for net-next/3.14. Highlights are: * Performance improvements in the mechanism to get packets to userspace using memory mapped netlink and skb zero copy where appropriate. * Per-cpu flow stats in situations where flows are likely to be shared across CPUs. Standard flow stats are used in other situations to save memory and allocation time. * A handful of code cleanups and rationalization. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 19:48:38 -05:00
Amitkumar Karwar	22c15bf30b	NFC: NCI: Add set_config API This API can be used by drivers to send their custom configuration using SET_CONFIG NCI command to the device. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 01:32:40 +01:00
Amitkumar Karwar	86e8586ed5	NFC: NCI: Add setup handler Some drivers require special configuration while initializing. This patch adds setup handler for this custom configuration. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 01:32:40 +01:00
Amitkumar Karwar	1907299867	NFC: NCI: Don't reverse local general bytes Local general bytes returned by nfc_get_local_general_bytes() are already in correct order. We don't need to reverse them. Remove local_gb[] local array as it's not needed any more. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 01:32:40 +01:00
Stephen Hemminger	443cd88c8a	ovs: make functions local Several functions and datastructures could be local Found with 'make namespacecheck' Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:54:39 -08:00
Thomas Graf	09c5e6054e	openvswitch: Compute checksum in skb_gso_segment() if needed The copy & csum optimization is no longer present with zerocopy enabled. Compute the checksum in skb_gso_segment() directly by dropping the HW CSUM capability from the features passed in. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:53:24 -08:00
Thomas Graf	bda56f143c	openvswitch: Use skb_zerocopy() for upcall Use of skb_zerocopy() can avoid the expensive call to memcpy() when copying the packet data into the Netlink skb. Completes checksum through skb_checksum_help() if not already done in GSO segmentation. Zerocopy is only performed if user space supported unaligned Netlink messages. memory mapped netlink i/o is preferred over zerocopy if it is set up. Cost of upcall is significantly reduced from: + 7.48% vhost-8471 [k] memcpy + 5.57% ovs-vswitchd [k] memcpy + 2.81% vhost-8471 [k] csum_partial_copy_generic to: + 5.72% ovs-vswitchd [k] memcpy + 3.32% vhost-5153 [k] memcpy + 0.68% vhost-5153 [k] skb_zerocopy (megaflows disabled) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:53:17 -08:00
Thomas Graf	8055a89cfa	openvswitch: Pass datapath into userspace queue functions Allows removing the net and dp_ifindex argument and simplify the code. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:53:07 -08:00
Thomas Graf	44da5ae5fb	openvswitch: Drop user features if old user space attempted to create datapath Drop user features if an outdated user space instance that does not understand the concept of user_features attempted to create a new datapath. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:53:00 -08:00
Thomas Graf	43d4be9cb5	openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:53 -08:00
Thomas Graf	af2806f8f9	net: Export skb_zerocopy() to zerocopy from one skb to another Make the skb zerocopy logic written for nfnetlink queue available for use by other modules. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:42 -08:00
Wei Yongjun	5f03f47c9c	openvswitch: remove duplicated include from flow_table.c Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:35 -08:00
Daniel Borkmann	11d6c461b3	net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb,acts_callback} As we're only doing a kfree() anyway in the RCU callback, we can simply use kfree_rcu, which does the same job, and remove the function rcu_free_sw_flow_mask_cb() and rcu_free_acts_callback(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:30 -08:00
Pravin B Shelar	e298e50570	openvswitch: Per cpu flow stats. With mega flow implementation ovs flow can be shared between multiple CPUs which makes stats updates highly contended operation. This patch uses per-CPU stats in cases where a flow is likely to be shared (if there is a wildcard in the 5-tuple and therefore likely to be spread by RSS). In other situations, it uses the current strategy, saving memory and allocation time. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:24 -08:00
Thomas Graf	795449d8b8	openvswitch: Enable memory mapped Netlink i/o Use memory mapped Netlink i/o for all unicast openvswitch communication if a ring has been set up. Benchmark * pktgen -> ovs internal port * 5M pkts, 5M flows * 4 threads, 8 cores Before: Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags) 74163pps 5339Mb/sec (5339736000bps) errors: 0 + 2.98% ovs-vswitchd [k] copy_user_generic_string + 2.49% ovs-vswitchd [k] memcpy + 1.84% kpktgend_2 [k] memcpy + 1.81% kpktgend_1 [k] memcpy + 1.81% kpktgend_3 [k] memcpy + 1.78% kpktgend_0 [k] memcpy After: Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags) 206358pps 14857Mb/sec (14857776000bps) errors: 0 + 2.80% ovs-vswitchd [k] memcpy + 1.31% kpktgend_2 [k] memcpy + 1.23% kpktgend_0 [k] memcpy + 1.09% kpktgend_1 [k] memcpy + 1.04% kpktgend_3 [k] memcpy + 0.96% ovs-vswitchd [k] copy_user_generic_string Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:12 -08:00
Thomas Graf	aae9f0e22c	netlink: Avoid netlink mmap alloc if msg size exceeds frame size An insufficent ring frame size configuration can lead to an unnecessary skb allocation for every Netlink message. Check frame size before taking the queue lock and allocating the skb and re-check with lock to be safe. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:06 -08:00
Thomas Graf	bb9b18fb55	genl: Add genlmsg_new_unicast() for unicast message allocation Allocates a new sk_buff large enough to cover the specified payload plus required Netlink headers. Will check receiving socket for memory mapped i/o capability and use it if enabled. Will fall back to non-mapped skb if message size exceeds the frame size of the ring. Signed-of-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:51:53 -08:00
Jesse Gross	663efa3696	openvswitch: Silence RCU lockdep checks from flow lookup. Flow lookup can happen either in packet processing context or userspace context but it was annotated as requiring RCU read lock to be held. This also allows OVS mutex to be held without causing warnings. Reported-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Reviewed-by: Thomas Graf <tgraf@redhat.com>	2014-01-06 15:51:48 -08:00
Andy Zhou	5bb506324d	openvswitch: Change ovs_flow_tbl_lookup_xx() APIs API changes only for code readability. No functional chnages. This patch removes the underscored version. Added a new API ovs_flow_tbl_lookup_stats() that returns the n_mask_hits. Reported by: Ben Pfaff <blp@nicira.com> Reviewed-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:51:41 -08:00
Ben Pfaff	8f49ce1135	openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit). We won't normally have a ton of flow masks but using a size_t to store values no bigger than sizeof(struct sw_flow_key) seems excessive. This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit systems. On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but sw_flow_mask only by 8 bytes due to padding. Compile tested only. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:51:27 -08:00
Ben Pfaff	d1211908b9	openvswitch: Correct comment. Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:51:21 -08:00
David S. Miller	56a4342dfe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 17:37:45 -05:00
Gianluca Anzolin	f86772af6a	Bluetooth: Remove rfcomm_carrier_raised() Remove the rfcomm_carrier_raised() definition as that function isn't used anymore. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 13:51:45 -08:00
Gianluca Anzolin	4a2fb3ecc7	Bluetooth: Always wait for a connection on RFCOMM open() This patch fixes two regressions introduced with the recent rfcomm tty rework. The current code uses the carrier_raised() method to wait for the bluetooth connection when a process opens the tty. However processes may open the port with the O_NONBLOCK flag or set the CLOCAL termios flag: in these cases the open() syscall returns immediately without waiting for the bluetooth connection to complete. This behaviour confuses userspace which expects an established bluetooth connection. The patch restores the old behaviour by waiting for the connection in rfcomm_dev_activate() and removes carrier_raised() from the tty_port ops. As a side effect the new code also fixes the case in which the rfcomm tty device is created with the flag RFCOMM_REUSE_DLC: the old code didn't call device_move() and ModemManager skipped the detection probe. Now device_move() is always called inside rfcomm_dev_activate(). Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com> Reported-by: Beson Chow <blc+bluez@mail.vanade.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 13:51:45 -08:00
Gianluca Anzolin	e228b63390	Bluetooth: Move rfcomm_get_device() before rfcomm_dev_activate() This is a preparatory patch which moves the rfcomm_get_device() definition before rfcomm_dev_activate() where it will be used. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 13:51:45 -08:00
Gianluca Anzolin	5b89924187	Bluetooth: Release RFCOMM port when the last user closes the TTY This patch fixes a userspace regression introduced by the commit `29cd718b`. If the rfcomm device was created with the flag RFCOMM_RELEASE_ONHUP the user space expects that the tty_port is released as soon as the last process closes the tty. The current code attempts to release the port in the function rfcomm_dev_state_change(). However it won't get a reference to the relevant tty to send a HUP: at that point the tty is already destroyed and therefore NULL. This patch fixes the regression by taking over the tty refcount in the tty install method(). This way the tty_port is automatically released as soon as the tty is destroyed. As a consequence the check for RFCOMM_RELEASE_ONHUP flag in the hangup() method is now redundant. Instead we have to be careful with the reference counting in the rfcomm_release_dev() function. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reported-by: Alexander Holler <holler@ahsoftware.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 13:51:45 -08:00
Jamal Hadi Salim	805c1f4aed	net_sched: act: action flushing missaccounting action flushing missaccounting Account only for deleted actions Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:46:56 -05:00
Jamal Hadi Salim	63acd6807c	net_sched: Remove unnecessary checks for act->ops Remove unnecessary checks for act->ops (suggested by Eric Dumazet). Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:46:32 -05:00
sfeldma@cumulusnetworks.com	fbf2671bb8	bridge: use DEVICE_ATTR_xx macros Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:40:46 -05:00
Curt Brune	fe0d692bbc	bridge: use spin_lock_bh() in br_multicast_set_hash_max br_multicast_set_hash_max() is called from process context in net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function. br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock), which can deadlock the CPU if a softirq that also tries to take the same lock interrupts br_multicast_set_hash_max() while the lock is held . This can happen quite easily when any of the bridge multicast timers expire, which try to take the same lock. The fix here is to use spin_lock_bh(), preventing other softirqs from executing on this CPU. Steps to reproduce: 1. Create a bridge with several interfaces (I used 4). 2. Set the "multicast query interval" to a low number, like 2. 3. Enable the bridge as a multicast querier. 4. Repeatedly set the bridge hash_max parameter via sysfs. # brctl addbr br0 # brctl addif br0 eth1 eth2 eth3 eth4 # brctl setmcqi br0 2 # brctl setmcquerier br0 1 # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done Signed-off-by: Curt Brune <curt@cumulusnetworks.com> Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:39:47 -05:00
Eric Dumazet	996b175e39	tcp: out_of_order_queue do not use its lock TCP out_of_order_queue lock is not used, as queue manipulation happens with socket lock held and we therefore use the lockless skb queue routines (as __skb_queue_head()) We can use __skb_queue_head_init() instead of skb_queue_head_init() to make this more consistent. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:34:34 -05:00
Hannes Frederic Sowa	88ad31491e	ipv6: don't install anycast address for /128 addresses on routers It does not make sense to create an anycast address for an /128-prefix. Suppress it. As `32019e651c` ("ipv6: Do not leave router anycast address for /127 prefixes.") shows we also may not leave them, because we could accidentally remove an anycast address the user has allocated or got added via another prefix. Cc: François-Xavier Le Bail <fx.lebail@yahoo.com> Cc: Thomas Haller <thaller@redhat.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:32:43 -05:00
Jeff Layton	0fdc26785d	sunrpc: get rid of use_gssp_lock We can achieve the same result with a cmpxchg(). This also fixes a potential race in use_gss_proxy(). The value of sn->use_gss_proxy could go from -1 to 1 just after we check it in use_gss_proxy() but before we acquire the spinlock. The procfile write would end up returning success but the value would flip to 0 soon afterward. With this method we not only avoid locking but the first "setter" always wins. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-01-06 15:14:18 -05:00
Jeff Layton	a92e5eb110	sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt An nfsd thread can call use_gss_proxy and find it set to '1' but find gssp_clnt still NULL, so that when it attempts the upcall the result will be an unnecessary -EIO. So, ensure that gssp_clnt is created first, and set the use_gss_proxy variable only if that succeeds. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-01-06 15:14:17 -05:00
Jeff Layton	1654a04cd7	sunrpc: don't wait for write before allowing reads from use-gss-proxy file It doesn't make much sense to make reads from this procfile hang. As far as I can tell, only gssproxy itself will open this file and it never reads from it. Change it to just give the present setting of sn->use_gss_proxy without waiting for anything. Note that we do not want to call use_gss_proxy() in this codepath since an inopportune read of this file could cause it to be disabled prematurely. Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-01-06 15:14:16 -05:00
Vijay Subramanian	d4b36210c2	net: pkt_sched: PIE AQM scheme Proportional Integral controller Enhanced (PIE) is a scheduler to address the bufferbloat problem. >From the IETF draft below: " Bufferbloat is a phenomenon where excess buffers in the network cause high latency and jitter. As more and more interactive applications (e.g. voice over IP, real time video streaming and financial transactions) run in the Internet, high latency and jitter degrade application performance. There is a pressing need to design intelligent queue management schemes that can control latency and jitter; and hence provide desirable quality of service to users. We present here a lightweight design, PIE(Proportional Integral controller Enhanced) that can effectively control the average queueing latency to a target value. Simulation results, theoretical analysis and Linux testbed results have shown that PIE can ensure low latency and achieve high link utilization under various congestion situations. The design does not require per-packet timestamp, so it incurs very small overhead and is simple enough to implement in both hardware and software. " Many thanks to Dave Taht for extensive feedback, reviews, testing and suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and suggestions. Naeem Khademi and Dave Taht independently contributed to ECN support. For more information, please see technical paper about PIE in the IEEE Conference on High Performance Switching and Routing 2013. A copy of the paper can be found at ftp://ftpeng.cisco.com/pie/. Please also refer to the IETF draft submission at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00 All relevant code, documents and test scripts and results can be found at ftp://ftpeng.cisco.com/pie/. For problems with the iproute2/tc or Linux kernel code, please contact Vijay Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu (mysuryan@cisco.com) Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Mythili Prabhu <mysuryan@cisco.com> CC: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 15:13:01 -05:00
John W. Linville	a50f9d5ee0	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-01-06 14:19:18 -05:00
Thomas Pedersen	057d5f4ba1	mac80211: sync dtim_count to TSF On starting a mesh or AP BSS, the interface dtim_count countdown should match that of the driver TSF. Signed-off-by: Thomas Pedersen <twpedersen@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 20:10:47 +01:00
John W. Linville	9d1cd503c7	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2014-01-06 14:08:41 -05:00
Ujjal Roy	60a4fe0ae9	cfg80211: fix wext-compat for getting retry value While getting the retry limit, wext-compat returns the value without updating the flag for retry->flags is 0. Also in this case, it updates long retry flag when short and long retry value are unequal. So, iwconfig never showing "Retry short limit" and showing "Retry long limit" when both values are unequal. Updated the flags and corrected the condition properly. Signed-off-by: Ujjal Roy <royujjal@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 20:00:12 +01:00
David S. Miller	83111e7fe8	netfilter: Fix build failure in nfnetlink_queue_core.c. net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid': net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function) net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1 Just a missing include of net/tcp_states.h Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 13:36:06 -05:00
David S. Miller	9aa28f2b71	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: <pablo@netfilter.org> ==================== nftables updates for net-next The following patchset contains nftables updates for your net-next tree, they are: * Add set operation to the meta expression by means of the select_ops() infrastructure, this allows us to set the packet mark among other things. From Arturo Borrero Gonzalez. * Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel Borkmann. * Add new queue expression to nf_tables. These comes with two previous patches to prepare this new feature, one to add mask in nf_tables_core to evaluate the queue verdict appropriately and another to refactor common code with xt_NFQUEUE, from Eric Leblond. * Do not hide nftables from Kconfig if nfnetlink is not enabled, also from Eric Leblond. * Add the reject expression to nf_tables, this adds the missing TCP RST support. It comes with an initial patch to refactor common code with xt_NFQUEUE, again from Eric Leblond. * Remove an unused variable assignment in nf_tables_dump_set(), from Michal Nazarewicz. * Remove the nft_meta_target code, now that Arturo added the set operation to the meta expression, from me. * Add help information for nf_tables to Kconfig, also from me. * Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is available to other nf_tables objects, requested by Arturo, from me. * Expose the table usage counter, so we can know how many chains are using this table without dumping the list of chains, from Tomasz Bursztyka. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 13:29:30 -05:00

... 4 5 6 7 8 ...

31568 Commits