linux

Author	SHA1	Message	Date
Nikolay Aleksandrov	fe9ef3ce39	net: ipmr: make ip_mroute_getsockopt more understandable Use a switch to determine if optname is correct and set val accordingly. This produces a much more straight-forward and readable code. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 15:06:38 -05:00
Nikolay Aleksandrov	7ef8f65df9	net: ipmr: fix code and comment style Trivial code and comment style fixes, also removed some extra newlines, spaces and tabs. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 15:06:38 -05:00
Nikolay Aleksandrov	c316c629f1	net: ipmr: remove some pimsm ifdefs and simplify Add the helper pimsm_enabled() which replaces the old CONFIG_IP_PIMSM define and is used to check if any version of PIM-SM has been enabled. Use a single if defined(CONFIG_IP_PIMSM_V1) || defined(CONFIG_IP_PIMSM_V2) for the pim-sm shared code. This is okay w.r.t IGMPMSG_WHOLEPKT because only a VIFF_REGISTER device can send such packet, and it can't be created if pimsm_enabled() is false. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 15:06:38 -05:00
Nikolay Aleksandrov	f3d431810e	net: ipmr: always define mroute_reg_vif_num Before mroute_reg_vif_num was defined only if any of the CONFIG_PIMSM_ options were set, but that's not really necessary as the size of the struct is the same in both cases (checked with pahole, both cases size is 3256 bytes) and we can remove some unnecessary ifdefs to simplify the code. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 15:06:37 -05:00
Nikolay Aleksandrov	1113ebbcf9	net: ipmr: move the tbl id check in ipmr_new_table Move the table id check in ipmr_new_table and make it return error pointer. We need this change for the upcoming netlink table manipulation support in order to avoid code duplication and a race condition. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 15:06:37 -05:00
David S. Miller	930d3142b8	Merge branch 'rhashtable-test-enhancements' Phil Sutter says: ==================== improve fault-tolerance of rhashtable runtime-test The following series aims to improve lib/test_rhashtable in different situations: Patch 1 allows the kernel to reschedule so the test does not block too long on slow systems. Patch 2 fixes behaviour under pressure, retrying inserts in non-permanent error case (-EBUSY). Patch 3 auto-adjusts the upper table size limit according to the number of threads (in concurrency test). In fact, the current default is already too small. Patch 4 makes it possible to retry inserts even in supposedly permanent error case (-ENOMEM) to expose rhashtable's remaining problem of -ENOMEM being not as permanent as it is expected to be. Changes since v1: - Introduce insert_retry() which is then used in single-threaded test as well. - Do not retry inserts by default if -ENOMEM was returned. - Rename the retry counter to be a bit more verbose about what it contains. - Add patch 4 as a debugging aid. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 12:36:09 -05:00
Phil Sutter	d662e037fc	rhashtable-test: allow to retry even if -ENOMEM was returned This is rather a hack to expose the current issue with rhashtable, which under high pressure sometimes returns -ENOMEM even though system memory is not exhausted and a subsequent insert may succeed. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 12:36:08 -05:00
Phil Sutter	95e435afef	rhashtable-test: calculate max_entries value by default A maximum table size of 64k entries is insufficient for the multiple threads test even in default configuration (10 threads * 50000 objects = 500000 objects in total). Since we know how many objects will be inserted, calculate the max size unless overridden by parameter. Note that specifying the exact number of objects upon table init won't suffice as that value is being rounded down to the next power of two - anticipate this by rounding up to the next power of two beforehand. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 12:36:08 -05:00
Phil Sutter	9e9089e5a2	rhashtable-test: retry insert operations After adding cond_resched() calls to threadfunc(), a surprisingly high rate of insert failures occurred probably due to table resizes getting a better chance to run in background. To not soften up the remaining tests, retry inserts until they either succeed or fail permanently. Also change the non-threaded test to retry insert operations, too. Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 12:36:08 -05:00
Phil Sutter	cd5b318daf	rhashtable-test: add cond_resched() to thread test This should fix soft lockup bugs triggered on slow systems. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 12:36:08 -05:00
David S. Miller	3d40e44361	Merge branch 'dsa-gpio-reset' Andrew Lunn says: ==================== DSA: GPIO to reset switches These two patches add support for using a GPIO to hard reset a switch during reset. v2: Thanks to a clue from Neil Armstrong, I figured out how to convert the gpio into a gpiod, while keeping the ACTIVE_LOW flag, so simplifying the set/reset code. I have not included the Tested-by: from Phil Reid, since I made a lot of changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 11:53:11 -05:00
Andrew Lunn	c8c1b39a86	dsa: mv88e6xxx.c: Hardware reset the chip if available The device tree binding now allows a gpio to be specified which is attached to the switch chips reset line. If it is defined, perform a hardware reset on the switch during setup. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 11:53:10 -05:00
Andrew Lunn	cc30c16344	net: dsa: Add support for a switch reset gpio Some boards have a gpio line tied to the switch reset pin. Allow this gpio to be retrieved from the device tree, and take the switch out of reset before performing the probe. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-23 11:53:10 -05:00
Saurabh Sengar	3f8c0f7efb	gianfar: use of_property_read_bool() use of_property_read_bool() for testing bool property Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-22 20:47:14 -05:00
Yuval Mintz	5e091e7ad0	bnx2x: Utilize FW 7.13.1.0. Commit 46e8a249423ff "bnx2x: Add FW 7.13.1.0" added said .bin FW to linux-firmware; This patch incorporates the FW in the bnx2x driver. This introduces 2 fixes/enhancements: - In some management protocols there are outer-vlan configurations that can be dynamically changed while device is running. This fixes some corner cases where such a change did not take effect. - Prevent VFs from sending MAC control frames; FW would treat a VF sending such a packet as malicious and block any further communication done by the VF. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-22 12:19:06 -05:00
David Ahern	b811580d91	net: IPv6 fib lookup tracepoint Add tracepoint to show fib6 table lookups and result. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-22 11:54:10 -05:00
Eric Dumazet	e2f9dc3bd2	net: avoid NULL deref in napi_get_frags() napi_alloc_skb() can return NULL. We should not crash should this happen. Fixes: `93f93a4404` ("net: move skb_mark_napi_id() into core networking stack") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 16:43:14 -05:00
Simon Horman	b3d39a8805	ravb: use clock rate as basis for GTI.TIV The GTI.TIV may be set to 2GHz^2 / rate, where rate is that of the clock of the device. Rather than assuming a rate of 130MHz use the actual rate of the clock. The motivation for this is to use the correct rate on the r8a7795/Salvator-X which is advertised as 133MHz but may differ depending on the extal present on the Salvator-X. Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:52:37 -05:00
Ondrej Zary	1777ddb84a	dl2k: Implement suspend Add suspend/resume support to dl2k driver. This requires RX/TX rings to be reset so split out the required functionality from alloc_list() into new rio_reset_ring(). Tested on Asus NX1101 (IP1000A) and D-Link DGE-550T (DL-2000). Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:48:27 -05:00
Ondrej Zary	966e07f4bf	dl2k: Reorder and cleanup initialization Move HW init and stop into separate functions. Request IRQ only after the HW has been reset (so interrupts are disabled and no stale interrupts are pending). Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:48:27 -05:00
Ondrej Zary	39536ff81e	dl2k: Handle memory allocation errors in alloc_list If memory allocation fails in alloc_list(), free the already allocated memory and return -ENOMEM. In rio_open(), call alloc_list() first and abort if it fails. Move HW access (set RFDListPtr) out of alloc_list(). Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:48:27 -05:00
David S. Miller	6b99c6d558	Merge branch 'tipc-cleanups-improvements' Jon Maloy says: ==================== tipc: some cleanups and improvements This series mostly contains cleanups and cosmetic code changes. The only real functional change is in #4 and #5, where we change the locking structure for nodes and links in order to permit full concurrency between links working in parallel on different interfaces. Since the groundwork for this has been done in previous commit series, this change constitutes only the final, small step to achieve that goal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:11 -05:00
Jon Paul Maloy	1a90632da8	tipc: eliminate remnants of hungarian notation The number of variables with Hungarian notation (l_ptr, n_ptr etc.) has been significantly reduced over the last couple of years. We now root out the last traces of this practice. There are no functional changes in this commit. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	38206d5939	tipc: narrow down interface towards struct tipc_link We move the definition of struct tipc_link from link.h to link.c in order to minimize its exposure to the rest of the code. When needed, we define new functions to make it possible for external entities to access and set data in the link. Apart from the above, there are no functional changes. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	5be9c08671	tipc: narrow down exposure of struct tipc_node In our effort to have less code and include dependencies between entities such as node, link and bearer, we try to narrow down the exposed interface towards the node as much as possible. In this commit, we move the definition of struct tipc_node, along with many of its associated function declarations, from node.h to node.c. We also move some function definitions from link.c and name_distr.c to node.c, since they access fields in struct tipc_node that should not be externally visible. The moved functions are renamed according to new location, and made static whenever possible. There are no functional changes in this commit. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	5405ff6e15	tipc: convert node lock to rwlock According to the node FSM a node in state SELF_UP_PEER_UP cannot change state inside a lock context, except when a TUNNEL_PROTOCOL (SYNCH or FAILOVER) packet arrives. However, the node's individual links may still change state. Since each link now is protected by its own spinlock, we finally have the conditions in place to convert the node spinlock to an rwlock_t. If the node state and arriving packet type are right, we can let the link directly receive the packet under protection of its own spinlock and the node lock in read mode. In all other cases we use the node lock in write mode. This enables full concurrent execution between parallel links during steady-state traffic situations, i.e., 99+ % of the time. This commit implements this change. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	2312bf61ae	tipc: introduce per-link spinlock As a preparation to allow parallel links to work more independently from each other we introduce a per-link spinlock, to be stored in the struct node's link entry area. Since the node lock still is a regular spinlock there is no increase in parallelism at this stage. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	1d7e1c2595	tipc: reduce code dependency between binding table and node layer The file name_distr.c currently contains three functions, named_cluster_distribute(), tipc_publ_subscribe() and tipc_publ_unsubscribe() that all directly access fields in struct tipc_node. We want to eliminate such dependencies, so we move those functions to the file node.c and rename them to tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe() respectively. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	5c10e97940	tipc: small cleanup of function tipc_node_check_state() The function tipc_node_check_state() contains the core logic for handling link synchronization and failover. For this reason, it is important to keep it as comprehensible as possible. In this commit, we make three small cleanups. 1) If the node is in state SELF_DOWN_PEER_LEAVING and the received packet confirms that the peer has lost contact, there will be no further action in this function. To make this clearer, we return from the function directly after the state change. 2) Since commit `0f8b8e28fb` ("tipc: eliminate risk of stalled link synchronization") only the logically first TUNNEL_PROTO/SYNCH packet can alter the link state and set the synch point, independently of arrival order. Hence, there is no longer any need to adjust the synch value in case such packets arrive in disorder. We remove this adjustment. 3) It is the intention that any message arriving on any of the links may trigger a check for and possible termination of a node SYNCH state. A redundant and unnoticed check for tipc_link_is_synching() obviously defeats this purpose, with the effect that only packets arriving on the synching link may currently end the synch state. We remove this check. This change will further shorten the synchronization period between parallel links. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	c7cad0d6f7	tipc: move linearization of buffers to generic code In commit `5cbb28a4bf` ("tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function tipc_udp_recv(). The location of the change was selected in order to make the commit easily appliable to 'net' and 'stable'. We now move this linearization to where it should be done, in the functions tipc_named_rcv() and tipc_link_proto_rcv() respectively. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:09 -05:00
David S. Miller	12ded5cae6	Merge branch 'bnx2x-stats' Yuval Mintz says: ==================== bnx2x: Statistics patch series This series contains 2 small statistics-related patches, first adding a new SW statistics and the other exposing port stats for multi-function devices. Please consider applying this series to `net-next'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 12:14:53 -05:00
Yuval Mintz	3fb2d4926c	bnx2x: Show port statistics in Multi-function Today, port statistics are being presented when using `ethtool -S' only for single-function devices, but there are some port statistics which are crucial for analyzing bottle-necks. E.g., HW Rx discards due to lack of buffer space [when device isn't handling ingress traffic fast enough]. Judging the pros and cons, it was decided that in order to better support automatic dump-gathering tools, bnx2x should no longer hide those stats. This leaves only VFs lacking the port statistics. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 12:14:52 -05:00
Yuval Mintz	6a5311982e	bnx2x: Add new SW stat 'tx_exhaustion_events' Driver already has an internal counter for number of times a given queue had to be stopped due to Tx ring exhaustion. This adds the counter to the statistics presented by driver, e.g., by using `ethtool -S'. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 12:14:52 -05:00
David S. Miller	7521cd43ef	Merge branch 'ppp-kill-zombie-state' Guillaume Nault says: ==================== ppp: Remove PPPOX_ZOMBIE socket state Several issues have been found lately wrt. the PPPOX_ZOMBIE socket state. This state is now only set upon reception of a PADT to stop further transmissions. However this is redundant with the PADT workqueue mechanism introduced by `287f3a943f` ("pppoe: Use workqueue to die properly when a PADT is received"). We can thus simplify pppox socket state handling by getting rid of PPPOX_ZOMBIE entirely. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 11:31:27 -05:00
Guillaume Nault	a8acce6aa5	ppp: remove PPPOX_ZOMBIE socket state PPPOX_ZOMBIE is never set anymore. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 11:31:26 -05:00
Guillaume Nault	8734e485fe	ppp: don't set sk_state to PPPOX_ZOMBIE in pppoe_disc_rcv() Since `287f3a943f` ("pppoe: Use workqueue to die properly when a PADT is received"), pppoe_disc_rcv() disconnects the socket by scheduling pppoe_unbind_sock_work(). This is enough to stop socket transmission and makes the PPPOX_ZOMBIE state unnecessary. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 11:31:26 -05:00
David S. Miller	bdc17fad6f	Merge branch 'mlxsw-vlan' Jiri Pirko says: ==================== mlxsw: small driver update Couple of VLAN-related patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 11:06:03 -05:00
Ido Schimmel	b07a966c70	mlxsw: spectrum: Add error paths to __mlxsw_sp_port_vlans_add The operation of adding VLANs on a port via switchdev ops can fail and we need to be prepared for it. If we do not rollback hardware operations following a failure, hardware and software will remain in an inconsistent state. Solve that by adding suitable error paths to __mlxsw_sp_port_vlans_add. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 11:06:03 -05:00
Ido Schimmel	3b7ad5ece4	mlxsw: spectrum: Unify setting of HW VLAN filters When adding or deleting VLANs from a bridged port, HW VLAN filters must be set accordingly. Instead of having the same code in both add and delete functions, just wrap it in a function and call it with the appropriate parameters. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 11:06:03 -05:00
Ido Schimmel	06c071f68d	mlxsw: spectrum: Use correct PVID value when removing VLANs When removing a range of VLANs in which PVID is a member we should use the correct PVID value instead of some VLAN in the range. Also, change two print statements to use 'dev' instead of 'mlxsw_sp_port->dev', as it's already used in other print statements in the function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 11:06:02 -05:00
Daniel Borkmann	f99bf205da	bpf: add show_fdinfo handler for maps Add a handler for show_fdinfo() to be used by the anon-inodes backend for eBPF maps, and dump the map specification there. Not only useful for admins, but also it provides a minimal way to compare specs from ELF vs pinned object. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 11:04:15 -05:00
Jon Ringle	7b5dc0dd59	net: encx24j600: move rev announcement to probe function When encx24j600 is opened and closed many times due to userspace polling the interface, the log gets noisy with this message. Move it to the encx24j600_spi_probe function, where it belongs. Signed-off-by: Jon Ringle <jringle@gridpoint.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 10:45:20 -05:00
David S. Miller	85c72ba1ed	Merge branch 'net-generic-busy-polling' Eric Dumazet says: ==================== net: extend busy polling support This patch series extends busy polling range to tunnels devices, and adds busy polling generic support to all NAPI drivers. No need to provide ndo_busy_poll() method and extra synchronization between ndo_busy_poll() and normal napi->poll() method. This was proven very difficult and bug prone. mlx5 driver is changed to support busy polling using this new method, and a second mlx5 patch adds napi_complete_done() support and proper SNMP accounting. bnx2x and mlx4 drivers are converted to new infrastructure, reducing kernel bloat and improving performance. Latest patch, adding generic support, adds a new requirement : -free_netdev() and netif_napi_del() must be called from process context. Since this might not be the case in some drivers, we might have to either : fix the non conformant drivers (by disabling busy polling on them) or revert this last patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-18 16:17:43 -05:00
Eric Dumazet	93d05d4a32	net: provide generic busy polling to all NAPI drivers NAPI drivers no longer need to observe a particular protocol to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y) napi_hash_add() and napi_hash_del() are automatically called from core networking stack, respectively from netif_napi_add() and netif_napi_del() This patch depends on free_netdev() and netif_napi_del() being called from process context, which seems to be the norm. Drivers might still prefer to call napi_hash_del() on their own, since they might combine all the rcu grace periods into a single one, knowing their NAPI structures lifetime, while core networking stack has no idea of a possible combining. Once this patch proves to not bring serious regressions, we will cleanup drivers to either remove napi_hash_del() or provide appropriate rcu grace periods combining. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-18 16:17:42 -05:00
Eric Dumazet	34cbe27e81	net: napi_hash_del() returns a boolean status napi_hash_del() will soon be used from both drivers (if they want) or core networking stack. Callers are responsible for ensuring an RCU grace period is respected before freeing napi structure : napi_hash_del() can signal if this RCU grace period is needed or not. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-18 16:17:42 -05:00
Eric Dumazet	6180d9de61	net: move napi_hash[] into read mostly section We do not often add/delete a napi context. Moving napi_hash[] into read mostly section avoids potential false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-18 16:17:42 -05:00
Eric Dumazet	d64b5e85bf	net: add netif_tx_napi_add() netif_tx_napi_add() is a variant of netif_napi_add() It should be used by drivers that use a napi structure to exclusively poll TX. We do not want to add this kind of napi in napi_hash[] in following patches, adding generic busy polling to all NAPI drivers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-18 16:17:41 -05:00
Eric Dumazet	93f93a4404	net: move skb_mark_napi_id() into core networking stack We would like to automatically provide busy polling support to all NAPI drivers, without them having to implement anything. skb_mark_napi_id() can be called from napi_gro_receive() and napi_get_frags(). Few drivers are still calling skb_mark_napi_id() because they use netif_receive_skb(). They should eventually call napi_gro_receive() instead. I will leave this to drivers maintainers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-18 16:17:41 -05:00
Eric Dumazet	868fdb0606	mlx4: remove mlx4_en_low_latency_recv() Busy polling can now be handled in generic NAPI poll infrastructure. This removes complexity and fast path overhead : mlx4 used two spin_lock()/spin_unlock() pair per napi->poll() call in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi() Tested: Without busy polling : lpaa23:~# echo 0 >/proc/sys/net/core/busy_read lpaa24:~# echo 0 >/proc/sys/net/core/busy_read lpaa23:~# ./netperf -H lpaa24 -t TCP_RR MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 47330.78 With busy polling : lpaa23:~# echo 70 >/proc/sys/net/core/busy_read lpaa24:~# echo 70 >/proc/sys/net/core/busy_read lpaa23:~# ./netperf -H lpaa24 -t TCP_RR MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 97643.55 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-18 16:17:40 -05:00
Eric Dumazet	b59768c6b4	bnx2x: remove bnx2x_low_latency_recv() support Switch to native NAPI polling, as this reduces overhead and complexity. Normal path is faster, since one cmpxchg() is not anymore requested, and busy polling with the NAPI polling has same performance. Tested: lpk50:~# cat /proc/sys/net/core/busy_read 70 lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 40095.07 16384 87380 IpInReceives 401062 0.0 IpInDelivers 401062 0.0 IpOutRequests 401079 0.0 TcpActiveOpens 7 0.0 TcpPassiveOpens 3 0.0 TcpAttemptFails 3 0.0 TcpEstabResets 5 0.0 TcpInSegs 401036 0.0 TcpOutSegs 401052 0.0 TcpOutRsts 38 0.0 UdpInDatagrams 26 0.0 UdpOutDatagrams 27 0.0 Ip6OutNoRoutes 1 0.0 TcpExtDelayedACKs 1 0.0 TcpExtTCPPrequeued 98 0.0 TcpExtTCPDirectCopyFromPrequeue 98 0.0 TcpExtTCPHPHits 4 0.0 TcpExtTCPHPHitsToUser 98 0.0 TcpExtTCPPureAcks 5 0.0 TcpExtTCPHPAcks 101 0.0 TcpExtTCPAbortOnData 6 0.0 TcpExtBusyPollRxPackets 400832 0.0 TcpExtTCPOrigDataSent 400983 0.0 IpExtInOctets 21273867 0.0 IpExtOutOctets 21261254 0.0 IpExtInNoECTPkts 401064 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-18 16:17:40 -05:00

560697 Commits
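
Note on the rhashtable-test commits (95e435afef, 9e9089e5a2, d662e037fc): the sizing and retry behaviour they describe can be sketched in plain user-space C as below. This is only an illustration under stated assumptions, not the code in lib/test_rhashtable. The names roundup_pow2(), default_max_entries(), insert_with_retry(), insert_fn and busy_then_ok() are hypothetical and do not come from the kernel tree.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for rounding up to a power of two.
 * Assumes n is small enough that the result fits in an unsigned long.
 */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Default size limit derived from the number of objects to be inserted. */
static unsigned long default_max_entries(unsigned int threads,
					 unsigned int objs_per_thread)
{
	return roundup_pow2((unsigned long)threads * objs_per_thread);
}

/* Hypothetical insert callback: 0 on success, negative errno on failure. */
typedef int (*insert_fn)(void *ctx);

/*
 * Retry an insert while the error is transient (-EBUSY). -ENOMEM is
 * retried only if the caller opts in, mirroring the "do not retry by
 * default" behaviour described in the log. Returns the number of
 * retries on success or the negative errno on permanent failure.
 * A real test would also yield the CPU between attempts.
 */
static int insert_with_retry(insert_fn insert, void *ctx, int retry_enomem)
{
	int retries = 0;
	int err;

	for (;;) {
		err = insert(ctx);
		if (err == 0)
			return retries;
		if (err == -EBUSY || (err == -ENOMEM && retry_enomem)) {
			retries++;
			continue;
		}
		return err;
	}
}

/* Example callback: fails with -EBUSY until the counter in ctx hits zero. */
static int busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EBUSY;
	}
	return 0;
}
```

With the default configuration quoted in 95e435afef (10 threads * 50000 objects = 500000 objects), default_max_entries(10, 50000) rounds up to 524288, which is above the 64k limit the commit calls insufficient.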