linux

Author	SHA1	Message	Date
Jiri Pirko	77d47afbf3	neigh: use neigh_parms_net() to get struct neigh_parms->net pointer This fixes compile error when CONFIG_NET_NS is not set. Introduced by: commit `1d4c8c2984` "neigh: restore old behaviour of default parms values" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-10 17:58:44 -05:00
Paul Moore	10ae76faa9	cipso: cleanup cipso_v4_translate() when !CONFIG_NETLABEL Don't needlessly recompute 'opt[opt_iter + 1]' as we already have it stored in 'tag_len'. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-10 17:56:54 -05:00
Jiri Pirko	971a351ccb	ipv6 addrconf: revert /proc/net/if_inet6 ifa_flag format Turned out that applications like ifconfig do not handle the change. So revert ifa_flag format back to 2-letter hex value. Introduced by: commit `479840ffdb` "ipv6 addrconf: extend ifa_flags to u32" Reported-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Tested-by: FLorent Fourcot <florent.fourcot@enst-bretagne.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-10 17:47:18 -05:00
Guenter Roeck	e6e25bba9b	igb: Start temperature sensor attribute index with 1 Per hwmon ABI, temperature sensor attribute index starts with 1, not 0. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Jean Delvare <khali@linux-fr.org> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:35 -08:00
Guenter Roeck	e3670b8195	igb: Convert to use devm_hwmon_device_register_with_groups Simplify the code. Attach hwmon sysfs attributes to hwmon device instead of pci device. Avoid race conditions caused by attributes being created after registration and provide mandatory 'name' attribute by using new hwmon API. Other cleanup: Instead of allocating memory for hwmon attributes, move attributes and all other hwmon related data into struct hwmon_buff and allocate the entire structure using devm_kzalloc. Check return value from calls to igb_add_hwmon_attr() one by one instead of logically combining them all together. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:35 -08:00
Carolyn Wyborny	56cec24916	igb: Add new feature Media Auto Sense for 82580 devices only This patch adds support for the hardware feature Media Auto Sense. This feature requires a custom EEPROM image provided by our customer support team. The feature allows hardware designed with dual PHY's, fiber and copper to be used with either media without additional EEPROM changes. Fiber is preferred and driver will swap and configure for fiber media if sensed by the device at any time. Device will swap back to copper if it is the only media detected. Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:35 -08:00
Aaron Sierra	89dbefb213	igb: Support ports mapped in 64-bit PCI space This patch resolves an issue with 64-bit PCI addresses being truncated because the return values of pci_resource_start() and pci_resource_end() were being cast to unsigned long. Signed-off-by: Aaron Sierra <asierra@xes-inc.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:34 -08:00
Carolyn Wyborny	2bdfc4e271	igb: Add media switching feature for i354 PHY's This patch adds a new feature which is supported in some PHY's on some i354 devices. This feature is Auto Media Detect and allows which ever media is detected first by the PHY to be the media used and configured by the device. This is a media swapping feature that is wholly contained in the Marvell PHY. Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:34 -08:00
Don Skidmore	5cdab2f620	ixgbe: Focus config of head, tail ntc, and ntu all into a single function This patch makes it so that head, tail, next to clean, and next to use are all reset in a single function for the Tx or Rx path. Previously the code for this was spread out over several areas which could make it difficult to track what the values for these were. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:33 -08:00
Jacob Keller	c0832b2c10	ixgbevf: update Kconfig description This patch updates the ixgbevf Kconfig description, as the VF driver supports more than just the 82599 device. This patch renames the config menu item, as well as updates the help description to make it more obvious that the driver supports more than just a single device group. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:33 -08:00
Catherine Sullivan	893238ac13	i40e: Bump version number Version updated to 0.3.13-k Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:32 -08:00
Jesse Brandeburg	0319577f89	i40e: remove and fix confusing define name I40E_ITR_NONE was being used as an ITRN register index by accident because it was easily associated with the I40E_RX_ITR and friends defines. Change the name slightly in order to make it clear that I40E_ITR_NONE is really associated with the DYN_CTL register sets. Change-Id: I04702c027c7495b90a8bf2db85d3e085a2c7d02a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:31 -08:00
Shannon Nelson	1fa18370e4	i40e: complain about out-of-range descriptor request Instead of silently clamping the descriptor change request into the proper range, fail the request and complain in the log file. Change-Id: Id55ef59255d93c04bedffa8e25fe7ea796c90f32 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:30 -08:00
Kamil Krawczyk	639dc3777f	i40e: loopback info and set loopback fix Add information about current loopback mode to data returned from get_link_info function. Minor fix in set_loopback function and update in loopback types enum. Change-Id: I9d1c540a84ab18eef5ea6429be6331f33fc06aca Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:30 -08:00
Shannon Nelson	b03aaa9c73	i40e: restrict diag test messages Use the netif_info() macro to restrict messaging to when the HW bit is enabled in the msglvl netdev message mask. Change-Id: I83030d4402991cfb7da100da00f05ce502ada4ae Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-12-10 01:27:29 -08:00
Johan Hedberg	71fb419724	Bluetooth: Fix handling of L2CAP Command Reject over LE If we receive an L2CAP command reject message over LE we should take appropriate action on the corresponding channel. This is particularly important when trying to interact with a remote pre-4.1 system using LE CoC signaling messages. If we don't react to the command reject the corresponding socket would not be notified until a connection timeout occurs. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2013-12-10 01:15:44 -08:00
Jack Morgenstein	7c6d74d23a	mlx4_core: Roll back round robin bitmap allocation commit for CQs, SRQs, and MPTs Commit `f4ec9e9` "mlx4_core: Change bitmap allocator to work in round-robin fashion" introduced round-robin allocation (via bitmap) for all resources which allocate via a bitmap. Round robin allocation is desirable for mcgs, counters, pd's, UARs, and xrcds. These are simply numbers, with no involvement of ICM memory mapping. Round robin is required for QPs, since we had a problem with immediate reuse of a 24-bit QP number (commit `f4ec9e9`). However, for other resources which use the bitmap allocator and involve mapping ICM memory -- MPTs, CQs, SRQs -- round-robin is not desirable. What happens in these cases is the following: ICM memory is allocated and mapped in chunks of 256K. Since the resource allocation index goes up monotonically, the allocator will eventually require mapping a new chunk. Now, chunks are also unmapped when their reference count goes back to zero. Thus, if a single app is running and starts/exits frequently we will have the following situation: When the app starts, a new chunk must be allocated and mapped. When the app exits, the chunk reference count goes back to zero, and the chunk is unmapped and freed. Therefore, the app must pay the cost of allocation and mapping of ICM memory each time it runs (although the price is paid only when allocating the initial entry in the new chunk). For apps which allocate MPTs/SRQs/CQs and which operate as described above, this presented a performance problem. We therefore roll back the round-robin allocator modification for MPTs, CQs, SRQs. Reported-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 21:12:13 -05:00
Florent Fourcot	ac1eabcaec	ipv6: use ip6_flowinfo helper Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 21:03:50 -05:00
Florent Fourcot	3308de2b84	ipv6: add ip6_flowlabel helper And use it if possible. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 21:03:49 -05:00
Florent Fourcot	82e9f105a2	ipv6: remove rcv_tclass of ipv6_pinfo tclass information in now already stored in rcv_flowinfo We do not need to store the same information twice. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 21:03:49 -05:00
Florent Fourcot	37cfee909c	ipv6: move IPV6_TCLASS_MASK definition in ipv6.h Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 21:03:49 -05:00
Florent Fourcot	1397ed35f2	ipv6: add flowinfo for tcp6 pkt_options for all cases The current implementation of IPV6_FLOWINFO only gives a result if pktoptions is available (thanks to the ip6_datagram_recv_ctl function). It gives inconsistent results to user space, sometimes there is a result for getsockopt(IPV6_FLOWINFO), sometimes not. This patch add rcv_flowinfo to store it, and return it to the userspace in the same way than other pkt_options. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 21:03:49 -05:00
Rafał Miłecki	5824d2d16d	bgmac: connect to PHY and make use of PHY device We were already registering MDIO bus, but we were not connecting bgmac to the PHY. Add proper call and implement adjust link function to switch MAC into requested state. At the same time it's possible to drop our internal PHY management. This is a "standard" PHY, so the "Generic PHY" driver works perfectly fine with this. Don't duplicate the code. Finally make use of phy_ethtool_[gs]set functions instead implementing them from scratch. This change was successfully tested on BCM5357. I was able to autonegotiate 1000Mb/s full duplex, as well as force any of the 10/100/1000 half/full modes. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:59:25 -05:00
Joe Perches	2c722fe1c8	etherdevice: Optimize a few is_<foo>_ether_addr functions If CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set, several is_<foo>_ether_addr functions can be slightly improved by using u32 dereferences. I believe all current uses of is_zero_ether_addr and is_broadcast_ether_addr are u16 aligned, so always use u16 references to improve those functions performance. Document the u16 alignment requirements. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:58:11 -05:00
Joe Perches	d9cd4fe5ef	batadv: Slight optimization of batadv_compare_eth Use the newly added generic routine ether_addr_equal_unaligned to test if possibly unaligned to u16 Ethernet addresses are equal. This slightly improves comparison time for systems with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:58:11 -05:00
Joe Perches	73eaef87e9	etherdevice: Add ether_addr_equal_unaligned Add a generic routine to test if possibly unaligned to u16 Ethernet addresses are equal. If CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set, this uses the slightly faster generic routine ether_addr_equal, otherwise this uses memcmp. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:58:10 -05:00
David S. Miller	4e58879346	Merge branch 'neigh' Jiri Pirko says: ==================== neigh: respect default parms values This is a long standing regression. But since the patchset is bigger and the regression happened in 2007, I'm proposing this to net-next instead. Basically the problem is that if user wants to use /etc/sysctl.conf to specify default values of neigh related params, he is not able to do that. The reason is that the default values are copied to dev instance right after netdev is registered. And that is way to early. The original behaviour for ipv4 was that this happened after first address was assigned to device. For ipv6 this was apparently from the very beginning. So this patchset basically reverts the behaviour back to what it was in 2007 for ipv4 and changes the behaviour for ipv6 so they are both the same. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:56:27 -05:00
Jiri Pirko	bba24896f0	neigh: ipv6: respect default values set before an address is assigned to device Make the behaviour similar to ipv4. This will allow user to set sysctl default neigh param values and these values will be respected even by devices registered before (that ones what do not have address set yet). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:56:12 -05:00
Jiri Pirko	1d4c8c2984	neigh: restore old behaviour of default parms values Previously inet devices were only constructed when addresses are added. Therefore the default neigh parms values they get are the ones at the time of these operations. Now that we're creating inet devices earlier, this changes the behaviour of default neigh parms values in an incompatible way (see bug #8519). This patch creates a compromise by setting the default values at the same point as before but only for those that have not been explicitly set by the user since the inet device's creation. Introduced by: commit `8030f54499` Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Thu Feb 22 01:53:47 2007 +0900 [IPV4] devinet: Register inetdev earlier. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:56:12 -05:00
Jiri Pirko	73af614aed	neigh: use tbl->family to distinguish ipv4 from ipv6 Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:56:12 -05:00
Jiri Pirko	cb5b09c17f	neigh: wrap proc dointvec functions This will be needed later on to provide better management of default values. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:56:12 -05:00
Jiri Pirko	1f9248e560	neigh: convert parms to an array This patch converts the neigh param members to an array. This allows easier manipulation which will be needed later on to provide better management of default values. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:56:12 -05:00
David S. Miller	65be6291c8	Merge branch 'phy_reset' Florian Fainelli says: ==================== net: phy: consolidate PHY reset This patchset consolidates the PHY reset through the MII BMCR register by using a central place were this is done. This patchset resumes the work Kyle Moffett started here: https://lkml.org/lkml/2011/10/20/301 Note that at this point, drivers doing funky things after issuing a PHY reset using phy_init_hw() will still suffer from PHY state machine problems, this will be taken care of later on. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:39:05 -05:00
Florian Fainelli	0c9eb5b931	net: sh_eth: do not issue a wild PHY reset through BMCR The sh_eth driver issues an uncontrolled PHY reset through the MII register BMCR but fails to wait for the reset to complete, and will also implicitely wipe out all possible PHY fixups applied. Use phy_init_hw() which remedies both problems. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:59 -05:00
Florian Fainelli	01b0114e06	net: tc35815: use phy_init_hw for PHY reset Instead of open-coding the PHY reset through MII BMCR, use phy_init_hw() which does that for us and also makes sure that any PHY specific fixups are applied. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:59 -05:00
Florian Fainelli	78de53f05c	net: pxa168_eth: use phy_init_hw for PHY reset Instead of open-coding a PHY reset through the MII BMCR register, use phy_init_hw() which does this for us and ensures that PHY device fixups are also applied. We also remove a call to ethernet_phy_reset() which is now unncessary since phy_attach() calls phy_attach_direct() which in turns calls phy_init_hw(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:59 -05:00
Florian Fainelli	7cd1463664	net: mv643xx_eth: use phy_init_hw to reset PHY Instead of open-coding a PHY reset through the MII BMCR register, use phy_init_hw() which does that for us and will also make sure that PHY fixups are applied if required. We also remove a call to phy_reset() due to the following sequence of calls in the driver: phy_scan() -> phy_connect() -> phy_connect_direct() -> phy_attach_direct() -> phy_init_hw() and we only have a call to phy_init() after phy_scan(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:59 -05:00
Florian Fainelli	87aa9f9c61	net: phy: consolidate PHY reset in phy_init_hw() There are quite a lot of drivers touching a PHY device MII_BMCR register to reset the PHY without taking care of: 1) ensuring that BMCR_RESET is cleared after a given timeout 2) the PHY state machine resuming to the proper state and re-applying potentially changed settings such as auto-negotiation Introduce phy_poll_reset() which will take care of polling the MII_BMCR for the BMCR_RESET bit to be cleared after a given timeout or return a timeout error code. In order to make sure the PHY is in a correct state, phy_init_hw() first issues a software reset through MII_BMCR and then applies any fixups. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:59 -05:00
Florian Fainelli	06d87cec73	net: bfin_mac: do not reset PHY after phy_start() The PHY is already reset during driver probing, and this manual reset after calling phy_start() will wipe out board-specific PHY fixups and driver specific configuration initialization. Remove that explicit PHY reset. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:59 -05:00
Florian Fainelli	f0528ce7a4	net: greth: use phy_read_status() In case the greth driver is bound to anything but the Generic PHY driver or the PHY has a special read_status callback implemented, unexpected things will happen. Make sure we that we use phy_read_status() which does the proper abstraction of calling the driver specific read_status() callback for a given PHY. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:58 -05:00
Florian Fainelli	2613f95f61	net: phy: use phy_init_hw instead of open-coding it Use phy_init_hw() instead of open-coding it in phy_mii_ioctl(), this improves consistenty and makes sure that we will not duplicate the same routine somewhere else. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:58 -05:00
Florian Fainelli	114002bc1a	net: phy: report link partner features through ethtool The PHY library already reads the MII_STAT1000 and MII_LPA registers in genphy_read_status(), so extend it to also populate the PHY device link partner advertised features such that we can feed this back into ethtool when asked for it in phy_ethtool_gset(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:38:58 -05:00
Zhi Yong Wu	73713357ab	tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg() By checking related codes, it is impossible that ret > len or total_len, so we should remove some useless codes in both above functions. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:36:08 -05:00
Zhi Yong Wu	41e4af69a5	macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg() By checking related codes, it is impossible that ret > len or total_len, so we should remove some useless coeds in both above functions. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:35:42 -05:00
Paul Durrant	ca2f09f2b2	xen-netback: improve guest-receive-side flow control The way that flow control works without this patch is that, in start_xmit() the code uses xenvif_count_skb_slots() to predict how many slots xenvif_gop_skb() will consume and then adds this to a 'req_cons_peek' counter which it then uses to determine if the shared ring has that amount of space available by checking whether 'req_prod' has passed that value. If the ring doesn't have space the tx queue is stopped. xenvif_gop_skb() will then consume slots and update 'req_cons' and issue responses, updating 'rsp_prod' as it goes. The frontend will consume those responses and post new requests, by updating req_prod. So, req_prod chases req_cons which chases rsp_prod, and can never exceed that value. Thus if xenvif_count_skb_slots() ever returns a number of slots greater than xenvif_gop_skb() uses, req_cons_peek will get to a value that req_prod cannot possibly achieve (since it's limited by the 'real' req_cons) and, if this happens enough times, req_cons_peek gets more than a ring size ahead of req_cons and the tx queue then remains stopped forever waiting for an unachievable amount of space to become available in the ring. Having two routines trying to calculate the same value is always going to be fragile, so this patch does away with that. All we essentially need to do is make sure that we have 'enough stuff' on our internal queue without letting it build up uncontrollably. So start_xmit() makes a cheap optimistic check of how much space is needed for an skb and only turns the queue off if that is unachievable. net_rx_action() is the place where we could do with an accurate predicition but, since that has proven tricky to calculate, a cheap worse-case (but not too bad) estimate is all we really need since the only thing we must prevent is xenvif_gop_skb() consuming more slots than are available. Without this patch I can trivially stall netback permanently by just doing a large guest to guest file copy between two Windows Server 2008R2 VMs on a single host. Patch tested with frontends in: - Windows Server 2008R2 - CentOS 6.0 - Debian Squeeze - Debian Wheezy - SLES11 Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Annie Li <annie.li@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:33:12 -05:00
Erik Hugne	512137eeff	tipc: remove interface state mirroring in bearer struct 'tipc_bearer' is a generic representation of the underlying media type, and exists in a one-to-one relationship to each interface TIPC is using. The struct contains a 'blocked' flag that mirrors the operational and execution state of the represented interface, and is updated through notification calls from the latter. The users of tipc_bearer are checking this flag before each attempt to send a packet via the interface. This state mirroring serves no purpose in the current code base. TIPC links will not discover a media failure any faster through this mechanism, and in reality the flag only adds overhead at packet sending and reception. Furthermore, the fact that the flag needs to be protected by a spinlock aggregated into tipc_bearer has turned out to cause a serious and completely unnecessary deadlock problem. CPU0 CPU1 ---- ---- Time 0: bearer_disable() link_timeout() Time 1: spin_lock_bh(&b_ptr->lock) tipc_link_push_queue() Time 2: tipc_link_delete() tipc_bearer_blocked(b_ptr) Time 3: k_cancel_timer(&req->timer) spin_lock_bh(&b_ptr->lock) Time 4: del_timer_sync(&req->timer) I.e., del_timer_sync() on CPU0 never returns, because the timer handler on CPU1 is waiting for the bearer lock. We eliminate the 'blocked' flag from struct tipc_bearer, along with all tests on this flag. This not only resolves the deadlock, but also simplifies and speeds up the data path execution of TIPC. It also fits well into our ongoing effort to make the locking policy simpler and more manageable. An effect of this change is that we can get rid of functions such as tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer(). We replace the latter with a new function, tipc_reset_bearer(), which resets all links associated to the bearer immediately after an interface goes down. A user might notice one slight change in link behaviour after this change. When an interface goes down, (e.g. through a NETDEV_DOWN event) all attached links will be reset immediately, instead of leaving it to each link to detect the failure through a timer-driven mechanism. We consider this an improvement, and see no obvious risks with the new behavior. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:30:29 -05:00
wangweidong	b73e9e3cf0	x25: convert printks to pr_<level> use pr_<level> instead of printk(LEVEL) Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:24:18 -05:00
Daniel Borkmann	d346a3fae3	packet: introduce PACKET_QDISC_BYPASS socket option This patch introduces a PACKET_QDISC_BYPASS socket option, that allows for using a similar xmit() function as in pktgen instead of taking the dev_queue_xmit() path. This can be very useful when PF_PACKET applications are required to be used in a similar scenario as pktgen, but with full, flexible packet payload that needs to be provided, for example. On default, nothing changes in behaviour for normal PF_PACKET TX users, so everything stays as is for applications. New users, however, can now set PACKET_QDISC_BYPASS if needed to prevent own packets from i) reentering packet_rcv() and ii) to directly push the frame to the driver. In doing so we can increase pps (here 64 byte packets) for PF_PACKET a bit: # CPUs -- QDISC_BYPASS -- qdisc path -- qdisc path[] 1 CPU == 1,509,628 pps -- 1,208,708 -- 1,247,436 2 CPUs == 3,198,659 pps -- 2,536,012 -- 1,605,779 3 CPUs == 4,787,992 pps -- 3,788,740 -- 1,735,610 4 CPUs == 6,173,956 pps -- 4,907,799 -- 1,909,114 5 CPUs == 7,495,676 pps -- 5,956,499 -- 2,014,422 6 CPUs == 9,001,496 pps -- 7,145,064 -- 2,155,261 7 CPUs == 10,229,776 pps -- 8,190,596 -- 2,220,619 8 CPUs == 11,040,732 pps -- 9,188,544 -- 2,241,879 9 CPUs == 12,009,076 pps -- 10,275,936 -- 2,068,447 10 CPUs == 11,380,052 pps -- 11,265,337 -- 1,578,689 11 CPUs == 11,672,676 pps -- 11,845,344 -- 1,297,412 [...] 20 CPUs == 11,363,192 pps -- 11,014,933 -- 1,245,081 []: qdisc path with packet_rcv(), how probably most people seem to use it (hopefully not anymore if not needed) The test was done using a modified trafgen, sending a simple static 64 bytes packet, on all CPUs. The trick in the fast "qdisc path" case, is to avoid reentering packet_rcv() by setting the RAW socket protocol to zero, like: socket(PF_PACKET, SOCK_RAW, 0); Tradeoffs are documented as well in this patch, clearly, if queues are busy, we will drop more packets, tc disciplines are ignored, and these packets are not visible to taps anymore. For a pktgen like scenario, we argue that this is acceptable. The pointer to the xmit function has been placed in packet socket structure hole between cached_dev and prot_hook that is hot anyway as we're working on cached_dev in each send path. Done in joint work together with Jesper Dangaard Brouer. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:23:33 -05:00
Daniel Borkmann	4262e5ccbb	net: dev: move inline skb_needs_linearize helper to header As we need it elsewhere, move the inline helper function of skb_needs_linearize() over to skbuff.h include file. While at it, also convert the return to 'bool' instead of 'int' and add a proper kernel doc. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:23:33 -05:00
David S. Miller	34f9f43710	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge 'net' into 'net-next' to get the AF_PACKET bug fix that Daniel's direct transmit changes depend upon. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:20:14 -05:00

... 3 4 5 6 7 ...

413857 Commits