linux

Author	SHA1	Message	Date
Andrew Ruder	751bb6fd80	dm9000: clean up reset code * Change a hard-coded 0x3 to NCR_RST \| NCR_MAC_LBK in dm9000_reset * Every single place where dm9000_init_dm9000 was ran, a dm9000_reset was called immediately before-hand. Bring dm9000_reset into dm9000_init_dm9000. * The following commit updated the dm9000_probe reset routine to use NCR_RST \| NCR_MAC_LBK: 6741f40 DM9000B: driver initialization upgrade and a later commit added a bug-fix to always reset the chip twice: `09ee9f8` dm9000: Implement full reset of DM9000 network device Unfortunately, since the changes in 6741f40 were made by replacing the dm9000_probe dm9000_reset with the adjusted iow(), the changes in `09ee9f8` were not incorporated into the dm9000_probe reset. Furthermore, it bypassed the requisite reset-delay causing some boards to get at least one "read wrong id ..." dev_err message during dm9000_probe. Signed-off-by: Andrew Ruder <andrew.ruder@elecsyscorp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:12:10 -07:00
Andrew Ruder	6b50d038e7	dm9000: acquire irq flags from device tree The DM9000 supports both active high interrupts and active low interrupts. This is configured via the attached EEPROM. In the device-tree case, make sure that the DM9000 driver passes the correct flags to request_irq. Signed-off-by: Andrew Ruder <andrew.ruder@elecsyscorp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:12:10 -07:00
Simon Horman	3b392ddba2	MPLS: Use mpls_features to activate software MPLS GSO segmentation If an MPLS packet requires segmentation then use mpls_features to determine if the software implementation should be used. As no driver advertises MPLS GSO segmentation this will always be the case. I had not noticed that this was necessary before as software MPLS GSO segmentation was already being used in my test environment. I believe that the reason for that is the skbs in question always had fragments and the driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the case for most drivers). Thus software segmentation was activated by skb_gso_ok(). This introduces the overhead of an extra call to skb_network_protocol() in the case where where CONFIG_NET_MPLS_GSO is set and skb->ip_summed == CHECKSUM_NONE. Thanks to Jesse Gross for prompting me to investigate this. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:05:09 -07:00
David S. Miller	d68de60f73	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-06-05 This series contains updates to i40e and i40evf. Jesse fixes an issue reported by Dave Jones where a couple of FD checks ended up using bitwise OR where it should have been bitwise AND. Neerav removes unused defines and macros for receive LRO. Fix the driver from allowing the user to set a larger MTU size that the hardware was being configured to support. Refactors send version which moves code in two places into a small helper function. Kamil modifies register diagnostics since register ranges can vary among the different NVMs to avoid false test results. So now we try to identify the full range and use it for a register test and if we fail to define the proper register range, we will only test the first register from that group. Then removes the check for large buffer since this was added in the case this structure changed in the future, since the AQ definition is now mature enough that this check is no longer necessary. Mitch fixes i40evf driver to allocate descriptors in groups of 32 since the hardware requires it. Also fixes a crash when the ring size changed because it would change the count before deallocating resources, causing the driver to either free nonexistent buffers or leak leftover buffers. Fixed the driver to notify the VF for all types of resets so the VF can attempt a graceful reinit. Shannon refactors stats collection to create a unifying stats update routine to call the various stat collection routines. Removes rx_errors and rx_missed stats since they were removed from the chip design. Added missing VSI statistics that the hardware offers but are not apart of the standard netdev stats. v2: dropped patch "i40e: Allow disabling of DCB via debugfs" from Neerav based on feedback from David Miller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 13:04:31 -07:00
Shannon Nelson	41a9e55c89	i40e: add missing VSI statistics Add a couple more statistics that the hardware offers but aren't part of the standard netdev stats. Change-ID: I201db2898f2c284aee3d9631470bc5edd349e9a5 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 02:58:13 -07:00
Shannon Nelson	03da6f6a4f	i40e/i40evf: remove rx_errors and rx_missed The rx_errors (GLV_REPC) and rx_missed (GLV_RMPC) were removed from the chip design. Change-ID: Ifdeb69c90feac64ec95c36d3d32c75e3a06de3b7 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 02:50:02 -07:00
Shannon Nelson	7812fddc9c	i40e: refactor stats collection Pull the PF stat collection out of the VSI collection routine, and add a unifying stats update routine to call the various stat collection routines. Change-ID: I224192455bb3a6e5dc0a426935e67dffc123e306 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 02:42:55 -07:00
Jesse Brandeburg	44033fac14	i40e: refactor send version This change moves some common code in two places into a small helper function, and corrects a bug in one of the two places in the process. Change-ID: If3bba7152b240f13a7881eb0e8a781655fa66ce7 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 02:34:59 -07:00
Kamil Krawczyk	4f4e17bd12	i40e/i40evf: VEB structure added, GTIME macro update Structure for VEB context added. Update macro for transition from ms to GTIME (us) time units. Change-ID: Ib3a19587b4cf355348655df8f60c6f37bb1497a3 Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 02:27:39 -07:00
Mitch Williams	263fc48f97	i40e: notify VF of all types of resets Currently, the PF driver only notifies the VFs for PF reset events. Instead, notify the VFs for all types of resets, so they can attempt a graceful reinit. Change-ID: I03eb7afde25727198ef620f8b4e78bb667a11370 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 02:20:19 -07:00
Mitch Williams	d732a18445	i40evf: fix crash when changing ring sizes i40evf_set_ringparam was broken in several ways. First, it only changed the size of the first ring, and second, changing the ring size would often result in a panic because it would change the count before deallocating resources, causing the driver to either free nonexistent buffers, or leak leftover buffers. Fix this by storing the descriptor count in the adapter structure, and updating the count for each ring each time we allocate them. This ensures that we always free the right size ring, and always end up with the requested count when the device is (re)opened. Change-ID: I298396cd3d452ba8509d9f2d33a93f25868a9a55 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 02:13:01 -07:00
Mitch Williams	337eb08e5a	i40evf: set descriptor multiple to 32 Hardware requires descriptors to be allocated in groups of 32. Change-ID: I752ccc96769d1bd8d3018c004b8aeff464045bf2 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 02:05:35 -07:00
Jesse Brandeburg	61a46a4c07	i40e: clamp jumbo frame size The driver was allowing the user to set larger size MTU than the hardware was being configured to support. The driver was already using VLAN_HLEN when setting the hardware max receivable frame size, so just add it to the netdev MTU set entry point as well. Change-ID: Ie20e2a35d04f8c411253e255bea79ca69aaeaea3 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 01:58:31 -07:00
Jesse Brandeburg	d0277054ac	i40e/i40evf: remove unused RX_LRO define Remove unused defines and macros for RX_LRO. Change-ID: I8ca6715edfa62b56837417a1c4ff68c2345dab6e Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 01:49:18 -07:00
Kamil Krawczyk	c5cc0cfd27	i40e: remove check for large buffer We introduced this check in case this structure changed in the future, the AQ definition is now mature enough that this check is no longer necessary. Change-ID: Ic66321d0a08557dc9d8cb84029185352cb534330 Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 01:35:07 -07:00
Kamil Krawczyk	22dd9ae8af	i40e: Rework register diagnostic Register range, being subject to register diagnostic, can vary among different NVMs. We will try to identify the full range and use it for a register test. This is needed to avoid false test results. If we fail to define the proper register range we will test only the first register from that group. Change-ID: Ieee7173c719733b61d3733177a94dc557eb7b3fd Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 01:27:46 -07:00
Jesse Brandeburg	d3a90b70d8	i40e: don't use OR to check a value A couple of FD checks ended up using bitwise OR to check a value, which ends up always being evaluated to true. This should fix the issue. Thanks to DaveJ for noticing and reporting the issue! CC: Dave Jones <davej@redhat.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-05 01:20:06 -07:00
WANG Cong	ebbe495f19	ipv4: use skb frags api in udp4_hwcsum() Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 00:51:47 -07:00
WANG Cong	4cb28970a2	net: use the new API kvfree() It is available since v3.15-rc5. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 00:49:51 -07:00
Manuel Schölling	9638f6713f	dns_resolver: Do not accept domain names longer than 255 chars According to RFC1035 "[...] the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less." Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 00:05:53 -07:00
David S. Miller	555878b9ee	Merge branch 'isdn-capi' Tilman Schmidt says: ==================== ISDN patches for net-next (v2) Here's v2 of the series of patches for the ISDN CAPI subsystem prepared by Paul Bolle and reviewed by yours truly. It reflects GregKH's review, resulting in a substantial simplification. Please merge via net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 23:13:49 -07:00
Paul Bolle	d1cadce15a	isdn/capi: fix (middleware) device nodes Since v2.4 the capi driver used the following device nodes if "middleware" support was enabled: /dev/capi20 /dev/capi/0 /dev/capi/1 [...] /dev/capi20 is a character device node. /dev/capi/0 (and up) are tty device nodes (with a different major). This device node (naming) scheme is not documented anywhere, as far as I know. It was originally provided by the capifs pseudo filesystem (before udev became available). It is required for example by the pppd capiplugin. It was supported until a few years ago. But a number of developments broke it: - v2.6.6 (May 2004) renamed /dev/capi20 to /dev/capi and removed the "/" from the name of capi's tty driver. The explanation of the patch that did this included two examples of udev rules "to restore the old namespace"; - either udev 154 (May 2010) or udev 179 (January 2012) stopped allowing to rename device nodes, and thus the ability to have /dev/capi20 appear instead of /dev/capi and /dev/capi/0 (and up) instead of /dev/capi0 (and up); - v3.0 (July 2011) also removed capifs. That disabled another method to create the /dev/capi/0 (and up) device nodes. So now users need to manually tweak their setup (eg, create /dev/capi/ and fill that with symlinks) to get things working. This is all rather hacky and only discoverable by searching the web. Fix all this by renaming /dev/capi back to /dev/capi20, and by setting the name of the "capi_nc" tty driver to "capi!" so the tty device nodes appear as /dev/capi/0 (and up). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Tilman Schmidt <tilman@imap.cc> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 23:13:41 -07:00
Paul Bolle	a79f5d2612	isdn/capi: Make verbose reporting depend on capidrv The Kconfig symbol ISDN_DRV_AVMB1_VERBOSE_REASON is only used for capi_info2str(). That function is only used in capidrv.c. So setting it without setting ISDN_CAPI_CAPIDRV is pointless. Make it depend on ISDN_CAPI_CAPIDRV, rename it to ISDN_CAPI_CAPIDRV_VERBOSE and put its entry after ISDN_CAPI_CAPIDRV's entry. Since this symbol seems to be primarily used for debugging, keep it off by default. By now the last users of capidrv hopefully know all they need to know about the reasons for disconnecting. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 23:13:41 -07:00
Paul Bolle	ca05e3a755	isdn/capi: move capi_info2str to capidrv.c capi_info2str() is apparently meant to be of general utility. It is actually only used in capidrv.c. So move it from capiutil.c to capidrv.c and (obviously) stop exporting it. And, since we're touching this, merge the two versions of this function. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 23:13:41 -07:00
David S. Miller	00d115fc75	Merge branch 'inet_csums' Tom Herbert says: ==================== net: Support checksum in UDP This patch series adds support for using checksums in UDP tunnels. With this it is possible that two or more checksums may be set within the same packet and we would like to do that efficiently. This series also creates some new helper functions to be used by various tunnel protocol implementations. v2: Fixed indentation in tcp_v6_send_check arguments. v3: Move udp_set_csum and udp6_set_csum to be not inlined Also have this functions call with a nocheck boolean argument instead of passing a sock structure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:52:31 -07:00
Tom Herbert	359a0ea987	vxlan: Add support for UDP checksums (v4 sending, v6 zero csums) Added VXLAN link configuration for sending UDP checksums, and allowing TX and RX of UDP6 checksums. Also, call common iptunnel_handle_offloads and added GSO support for checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:39 -07:00
Tom Herbert	4749c09c37	gre: Call gso_make_checksum Call gso_make_checksum. This should have the benefit of using a checksum that may have been previously computed for the packet. This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that offload GRE GSO with and without the GRE checksum offloaed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	0f4f4ffa7b	net: Add GSO support for UDP tunnels with checksum Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates that a device is capable of computing the UDP checksum in the encapsulating header of a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	e9c3a24b3a	tcp: Call gso_make_checksum Call common gso_make_checksum when calculating checksum for a TCP GSO segment. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	7e2b10c1e5	net: Support for multiple checksums with gso When creating a GSO packet segment we may need to set more than one checksum in the packet (for instance a TCP checksum and UDP checksum for VXLAN encapsulation). To be efficient, we want to do checksum calculation for any part of the packet at most once. This patch adds csum_start offset to skb_gso_cb. This tracks the starting offset for skb->csum which is initially set in skb_segment. When a protocol needs to compute a transport checksum it calls gso_make_checksum which computes the checksum value from the start of transport header to csum_start and then adds in skb->csum to get the full checksum. skb->csum and csum_start are then updated to reflect the checksum of the resultant packet starting from the transport header. This patch also adds a flag to skbuff, encap_hdr_csum, which is set in *gso_segment fucntions to indicate that a tunnel protocol needs checksum calculation Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	77157e1973	l2tp: call udp{6}_set_csum Call common functions to set checksum for UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
Tom Herbert	af5fcba7f3	udp: Generic functions to set checksum Added udp_set_csum and udp6_set_csum functions to set UDP checksums in packets. These are for simple UDP packets such as those that might be created in UDP tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 22:46:38 -07:00
David S. Miller	6579867c8b	Merge branch 'bonding-macvlan' Vlad Yasevich says: ==================== Fix support for macvlan devices on top bonding Currently, macvlan devices do not work well over bond interfaces. Everything works well, untill a failover is triggered in the bond device and then macvlan becomes unreachble untill arp entries are flushed. This series adds needed functionality to handle correct notifications and update switches with mac addresses assigned to macvlans. The first patch simply addes IFF_UNICAST_FLT flag to bonds since they already correctly manage the unicast filter list of the slaves, so we might as well prevent the bond from needlessly going into promiscuous mode. The second patch adds notifier handler to macvlan to trigger correct ARP notifications. The third patch adds handling for TLB and RLB modes that use special ETH_P_LOOPBACK type packets to teach switch about mac addresses. It also allow ARPs for the macvlan mac addresses to be handled by RLB mode. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 15:14:17 -07:00
Vlad Yasevich	14af9963ba	bonding: Support macvlans on top of tlb/rlb mode bonds To make TLB mode work, the patch allows learning packets to be sent using mac addresses assigned to macvlan devices, also taking into an account vlans that may be between the bond and macvlan device. To make RLB work, all we have to do is accept ARP packets for addresses added to the bond dev->uc list. Since RLB mode will take care to update the peers directly with correct mac addresses, learning packets for these addresses do not have be send to switch. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 15:13:54 -07:00
Vlad Yasevich	4c99125568	macvlan: Support bonding events Bonding and team drivers generate specific events during failover that trigger switch updates. When a macvlan device is configured on top of bonding, we want switches to learn about the macvlan devices as well. This patch adds a handler to macvlan driver to propagate these events to all macvlan devices. We let the generic inetdev event handler do the work. This allows macvlan to operated correctly over active-backup mode bond. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 15:13:54 -07:00
Vlad Yasevich	c565b488c6	bonding: Turn on IFF_UNICAST_FLT on bond devices Bonding devices manage the unicast filters of the underlying interfaces, but do not turn on IFF_UNICAST_FLT flag. Thus anytime a unicast address is added to the bond, the bond is places in promiscuous mode. Turn on IFF_UNICAST_FLT on the bond device so that the bond does not go into promiscuous mode needlesly. If an underlying device does not support unicast filtering, that device will automaticall enter promiscuous mode already. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 15:13:54 -07:00
Sasha Levin	f830b0223c	net: Revert "fib_trie: use seq_file_net rather than seq->private" This reverts commit `30f38d2fdd`. fib_triestat is surrounded by a big lie: while it claims that it's a seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't: static const struct file_operations fib_triestat_fops = { .owner = THIS_MODULE, .open = fib_triestat_seq_open, .read = seq_read, .llseek = seq_lseek, .release = single_release_net, }; Yes, fib_triestat is just a regular file. A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files you could do seq_file_net() to get the net ptr, doing so for a regular file would be wrong and would dereference an invalid pointer. The fib_triestat lie claimed a victim, and trying to show the file would be bad for the kernel. This patch just reverts the issue and fixes fib_triestat, which still needs a rewrite to either be a seq_file or stop claiming it is. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 15:11:41 -07:00
Antonio Ospite	cef33c815a	trivial: drivers/net/ethernet/nvidia/forcedeth.c: fix typo s/SUBSTRACT1/SUBTRACT1/ Signed-off-by: Antonio Ospite <ao2@ao2.it> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 15:07:42 -07:00
Xiubo Li	898157ed74	gianfar: Fix the section mismatch warnings. Building with CONFIG_DEBUG_SECTION_MISMATCH enabled, the following WARNING is occured: LD drivers/net/built-in.o WARNING: drivers/net/built-in.o(.text+0xcd4c): Section mismatch in reference from the function gfar_probe() to the function .init.text:gfar_init_addr_hash_table() The function gfar_probe() references the function __init gfar_init_addr_hash_table(). This is often because gfar_probe lacks a __init annotation or the annotation of gfar_init_addr_hash_table is wrong. Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:55:23 -07:00
David S. Miller	9ab89acc86	Merge branch 'xen-netback-netfront-multiqueue' Wei Liu says: ==================== This is rebased version of Andrew's V8 patch series. The original cover letter: -------------------- xen-net{back, front}: Multiple transmit and receive queues This patch series implements multiple transmit and receive queues (i.e. multiple shared rings) for the xen virtual network interfaces. The series is split up as follows: - Patch 1 brings the 'grant_copy_op' array back into struct xenvif, in preparation for multi-queue support. See the patch itself for more details. - Patches 2 and 4 factor out the queue-specific data for netback and netfront respectively, and modify the rest of the code to use these as appropriate. - Patches 3 and 5 introduce new XenStore keys to negotiate and use multiple shared rings and event channels, and code to connect these as appropriate. - Patch 6 documents the XenStore keys required for the new feature in include/xen/interface/io/netif.h All other transmit and receive processing remains unchanged, i.e. there is a kthread per queue and a NAPI context per queue. The performance of these patches has been analysed in detail, with results available at: http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing To summarise: * Using multiple queues allows a VM to transmit at line rate on a 10 Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s with a single queue. * For intra-host VM--VM traffic, eight queues provide 171% of the throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s. * There is a corresponding increase in total CPU usage, i.e. this is a scaling out over available resources, not an efficiency improvement. * Results depend on the availability of sufficient CPUs, as well as the distribution of interrupts and the distribution of TCP streams across the queues. Queue selection is currently achieved via an L4 hash on the packet (i.e. TCP src/dst port, IP src/dst address) and is not negotiated between the frontend and backend, since only one option exists. Future patches to support other frontends (particularly Windows) will need to add some capability to negotiate not only the hash algorithm selection, but also allow the frontend to specify some parameters to this. Note that queue selection is a decision by the transmitting system about which queue to use for a particular packet. In general, the algorithm may differ between the frontend and the backend with no adverse effects. Queue-specific XenStore entries for ring references and event channels are stored hierarchically, i.e. under .../queue-N/... where N varies from 0 to one less than the requested number of queues (inclusive). If only one queue is requested, it falls back to the flat structure where the ring references and event channels are written at the same level as other vif information. V8: - Squash the queue error handling code into patch 3. - Update the documentation (patch 6) according to comments on the equivalent patch to Xen. V7: - Rebase on latest net-next, which includes the netback grant mapping patch series from Zoltan Kiss - Reduce QUEUE_NAME_SIZE by 1 to avoid double-counting the trailing '\0' - Simplify the queue hashing by using (hash % num_queues) instead of multiply & shift. - Add ratelimited warning for invalid queue selection. - Fix error handling to correctly tear down already setup queues. - Use dev->real_num_tx_queues instead of separately maintaining a count of the number of queues. V6: - Use 'max_queues' as the module param. name for both netback and netfront. V5: - Fix bug in xenvif_free() that could lead to an attempt to transmit an skb after the queue structures had been freed. - Improve the XenStore protocol documentation in netif.h. - Fix IRQ_NAME_SIZE double-accounting for null terminator. - Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue). - Don't initialise a local variable that is set in both branches (xspath). V4: - Add MODULE_PARM_DESC() for the multi-queue parameters for netback and netfront modules. - Move del_timer_sync() in netfront to after unregister_netdev, which restores the order in which these functions were called before applying these patches. V3: - Further indentation and style fixups. V2: - Rebase onto net-next. - Change queue->number to queue->id. - Add atomic operations around the small number of stats variables that are not queue-specific or per-cpu. - Fixup formatting and style issues. - XenStore protocol changes documented in netif.h. - Default max. number of queues to num_online_cpus(). - Check requested number of queues does not exceed maximum. -------------------- I rebased this on top of net-next. No functional change is introduced. The patch that needed some extra care was "xen-netback: Factor queue-specific data into queue struct" because it clashed with a fix introduced in net. A simple test of creating guest, iperf, then shutting down guest worked as expected. The last patch fixes a minor problem that queue name is not initialised in xen-netfront, resulting in names like "-tx" "-rx" in /proc/interrupt. Changes since v9 (no functional change introduced): * include commit summary in the commit message of first patch * fold David Vrabel's Reviewed-by into last patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:48:30 -07:00
Wei Liu	8b715010aa	xen-netfront: initialise queue name in xennet_init_queue Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:48:17 -07:00
Andrew J. Bennieston	a2deb8b1e7	xen-net{back, front}: Document multi-queue feature in netif.h Document the multi-queue feature in terms of XenStore keys to be written by the backend and by the frontend. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:48:17 -07:00
Andrew J. Bennieston	50ee60611b	xen-netfront: Add support for multiple queues Build on the refactoring of the previous patch to implement multiple queues between xen-netfront and xen-netback. Check XenStore for multi-queue support, and set up the rings and event channels accordingly. Write ring references and event channels to XenStore in a queue hierarchy if appropriate, or flat when using only one queue. Update the xennet_select_queue() function to choose the queue on which to transmit a packet based on the skb hash result. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:48:17 -07:00
Andrew J. Bennieston	2688fcb794	xen-netfront: Factor queue-specific data into queue struct. In preparation for multi-queue support in xen-netfront, move the queue-specific data from struct netfront_info to struct netfront_queue, and update the rest of the code to use this. Also adds loops over queues where appropriate, even though only one is configured at this point, and uses alloc_etherdev_mq() and the corresponding multi-queue netif wake/start/stop functions in preparation for multiple active queues. Finally, implements a trivial queue selection function suitable for ndo_select_queue, which simply returns 0, selecting the first (and only) queue. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:48:17 -07:00
Andrew J. Bennieston	8d3d53b3e4	xen-netback: Add support for multiple queues Builds on the refactoring of the previous patch to implement multiple queues between xen-netfront and xen-netback. Writes the maximum supported number of queues into XenStore, and reads the values written by the frontend to determine how many queues to use. Ring references and event channels are read from XenStore on a per-queue basis and rings are connected accordingly. Also adds code to handle the cleanup of any already initialised queues if the initialisation of a subsequent queue fails. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:48:16 -07:00
Wei Liu	e9ce7cb6b1	xen-netback: Factor queue-specific data into queue struct In preparation for multi-queue support in xen-netback, move the queue-specific data from struct xenvif into struct xenvif_queue, and update the rest of the code to use this. Also adds loops over queues where appropriate, even though only one is configured at this point, and uses alloc_netdev_mq() and the corresponding multi-queue netif wake/start/stop functions in preparation for multiple active queues. Finally, implements a trivial queue selection function suitable for ndo_select_queue, which simply returns 0 for a single queue and uses skb_get_hash() to compute the queue index otherwise. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:48:16 -07:00
Andrew J. Bennieston	a55d9766ce	xen-netback: Move grant_copy_op array back into struct xenvif. This array was allocated separately in commit `ac3d5ac2` ("xen-netback: fix guest-receive-side array sizes") due to it being very large, and a struct xenvif is allocated as the netdev_priv part of a struct net_device, i.e. via kmalloc() but falling back to vmalloc() if the initial alloc. fails. In preparation for the multi-queue patches, where this array becomes part of struct xenvif_queue and is always allocated through vzalloc(), move this back into the struct xenvif. Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:48:16 -07:00
David S. Miller	9bcc14d239	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to e1000, igb and ixgbe. Emil provides his version 2 fix for the detection of SFP+ capable interfaces. In cases where the driver is loaded while there are no SFP+ modules in cage, the interface was not being detected as SFP capable. Resolve the issue by identifying interfaces with no PHY type set as SFP capable which allows the driver to detect the SFP module when the interface is brought up. In this version 2 of the patch, the 82599 specific check was removed since we only have 82598 devices that are SFP capable. Jacob removes the including of the export header in the ixgbe PTP core, since it is not needed. Renames igb_ptp_enable() to igb_ptp_feature_enable() to better reflect the actual functions purpose. Todd fixes the ethtool loopback test for i354 backplane devices since we do not know what PHY is to be used for the devices, use MAC loopback for ethtool tests. Todd also sets the packet buffer size register defaults for i210 devices. Yongjian Xu removes the check for skb->len being negative or zero since there is never a case where it would be zero or negative for e1000. Manuel Schölling updates e1000 to use the time_after() helper function. v2: Fix indentation on wrapped line in patch 3 of the series based on feedback from David Miller ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-04 14:40:17 -07:00
Manuel Schölling	1aa65f4d7f	e1000: Use time_after() for time comparison To be future-proof and for better readability the time comparisons are modified to use time_after() instead of plain, error-prone math. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-03 23:58:06 -07:00
Yongjian Xu	c46d150404	e1000: remove the check: skb->len<=0 There is no case skb->len would be 0 or 'negative'. Remove the check. Signed-off-by: Yongjian Xu <xuyongjiande@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-06-03 23:57:57 -07:00

1 2 3 4 5 ...

444218 Commits