Commit Graph

375612 Commits

Author SHA1 Message Date
Peter Hüwe
31d60ebfbe net/ethernet/dec/tulip/xircom_cb: Use module_pci_driver to register driver
Removing some boilerplate by using module_pci_driver instead of calling
register and unregister in the otherwise empty init/exit functions.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22 14:35:03 -07:00
Peter Hüwe
52a58f9df1 net/ethernet/sis/sis190: Use module_pci_driver to register driver
Removing some boilerplate by using module_pci_driver instead of calling
register and unregister in the otherwise empty init/exit functions.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22 14:35:03 -07:00
Peter Hüwe
c996f4e8f0 net/ethernet/atheros/atlx/atl1: Use module_pci_driver to register driver
Removing some boilerplate by using module_pci_driver instead of calling
register and unregister in the otherwise empty init/exit functions.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22 14:35:03 -07:00
Peter Hüwe
687df5d489 net/ethernet/atheros/atl1e/atl1e_main: Use module_pci_driver to register driver
Removing some boilerplate by using module_pci_driver instead of calling
register and unregister in the otherwise empty init/exit functions.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22 14:35:03 -07:00
Peter Hüwe
2c21d6c988 net/ethernet/atheros/atl1c/atl1c_main: Use module_pci_driver to register driver
Removing some boilerplate by using module_pci_driver instead of calling
register and unregister in the otherwise empty init/exit functions.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22 14:35:03 -07:00
Peter Hüwe
61c951503b net/ethernet/silan/sc92031: Use module_pci_driver to register driver
Removing some boilerplate by using module_pci_driver instead of calling
register and unregister in the otherwise empty init/exit functions.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22 14:35:03 -07:00
David S. Miller
31400fe3ce Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
This series contains updates to e1000e, igb and ixgbe.

Bruce Allan provide 2 minor cleanups for e1000e to resolve whitespace
issues and build warnings about unused parameters.

Carolyn provides a couple of fixes for igb, one being a fix for a
possible panic when the interface is down and receive traffic
arrives.  The second fix resolves an issue on newer parts which have
multiple checksum fields and set_ethtool was only checking to update
the first checksum of the NVM image.

Akeem provides majority of the changes in this patch set.  Akeem
provides a fix for e1000e on an issue reported from the community to
resolve the issue of unlocking swflag_mutex for 82574 and 82583
devices even if the hardware semaphore was successfully acquired.
The other patches from Akeem are against igb, where he adds support
SFP module discovery, LED blink mechanism for devices using cathodes,
LED support for i210/i211 parts and cleanup of a i2c function which
was not being used.

Matthew provides an update for igb to support a more accurate check
for a PTP RX hang.

Amir provides a patch for ixgbe to set the software prio_tc values at
initialization to the hardware setting to remove the need to reset the
device at the first time we call ixgbe_dcbnl_ieee_setets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-22 13:56:56 -07:00
Amir Hanania
e8915bebb4 IXGBE: Set the SW prio_tc values at initialization to the HW setting.
Set the SW prio_tc values at initialization to the HW setting.
Setting the SW prio_tc default values to be the HW setting by reading the
rtrup2tc register. For any TC change we need to reset the device.
This will remove the need to reset the device at the first
time we call ixgbe_dcbnl_ieee_setets.

Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Tested-by: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 03:14:21 -07:00
Akeem G. Abodunrin
f7727c5309 igb: Removed unused i2c function
This patch removes unused i2c function definition.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 03:08:15 -07:00
Akeem G. Abodunrin
6f8b916065 igb: Implementation of i210/i211 LED support
This patch fixes LED issues with i210 and i211 devices, due to changes in the
device registers.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 03:01:58 -07:00
Carolyn Wyborny
41f149a285 igb: Fix possible panic caused by Rx traffic arrival while interface is down
This patch reorders disabling napi and irqs during igb_down.
This is done to avoid possible panic's found in other Intel drivers
when Rx traffic arrives while interface is going down.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:55:38 -07:00
Carolyn Wyborny
2a0a0f1ea2 igb: Fix set_ethtool function to call update nvm for entire image
This patch fixes a problem where we were only checking to update checksum
on first part of nvm image.  Newer parts have multiple checksum fields and
checksum function will accommodate that as long as we call it in the first
place for any changes made.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:49:34 -07:00
Akeem G. Abodunrin
373e6978f9 igb: SerDes flow control setting
This path allows users to get appropriate flow control setting on SerDes
devices, based on original implementation for Copper devices.
Also, since 100baseFX does not support setting flow control, so exclude
it from the setting mechanism.

Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:43:39 -07:00
Akeem G. Abodunrin
641ac5c0cd igb: Support for SFP modules discovery
This patch adds support for SFP modules media type discovery for
SGMII, which will enable driver to detect supported external PHYs,
including 100baseFXSFP module.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:37:36 -07:00
Matthew Vick
20a4841228 igb: Add update to last_rx_timestamp in Rx rings
In order to support a more accurate check for a PTP Rx hang where the
device can no longer timestamp received packets, we need to update, per
ring, when the last Rx timestamp was. Because of how the PTP Rx hang logic
works, the current logic is valid, but properly updating the ring variable
increases the accuracy of the check.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:31:43 -07:00
Akeem G. Abodunrin
cf7ed22171 igb: Changed LEDs blink mechanism to include designs using cathode
This patch addresses the changes needed to make LEDs work properly with
negative logic. This implementation uses LED Invert bit to reverse the
logic issue that occurred when LEDs are driven by cathode. Keep LEDs
blinking for SerDes devices. Also made changes to magic number and the
for loop to reduce number of shifts.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:25:25 -07:00
Akeem G. Abodunrin
6c1d8b96d0 e1000e: Release mutex lock only if it has been initially acquired
This patch fixes the issue of unlocking swflag_mutex for 82574 and 82583
devices regardless of if the hw semaphore has been successfully acquired via
e1000_get_hw_semaphore_82574(). With this patch, unlocking mutex now depends
on if the hw semaphore was successfully acquired before. And 82574/82583
devices are reset regardless of whether e1000_get_hw_semaphore_82574()
returns success or failure.

Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:19:03 -07:00
Bruce Allan
603cdca980 e1000e: prevent warning from -Wunused-parameter
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:13:07 -07:00
Bruce Allan
e80bd1d181 e1000e: cleanup whitespace
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-05-21 02:07:01 -07:00
Florian Fainelli
5ea94e7686 phy: add phy_mac_interrupt() to use with PHY_IGNORE_INTERRUPT
There is currently no way for an Ethernet MAC driver servicing PHY link
interrupts to notify this to the PHY state machine without defining its
own state machine. Since most drivers are not so special, introduce a
helper: phy_mac_interrupt() which can be called from a link up/down
interrupt routine to update the PHY state machine. To avoid code
duplication some refactoring has been done to expose the workqueue and
its corresponding callback internally.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:13:08 -07:00
Florian Fainelli
2c7b49212a phy: fix the use of PHY_IGNORE_INTERRUPT
When a PHY device is registered with the special IRQ value
PHY_IGNORE_INTERRUPT (-2) it will not properly be handled by the PHY
library:

- it continues to poll its register, while we do not want this
  because such PHY link events or register changes are serviced by an
  Ethernet MAC
- it will still try to configure PHY interrupts at the PHY level, such
  interrupts do not exist at the PHY but at the MAC level
- the state machine only handles PHY_POLL, but should also handle
  PHY_IGNORE_INTERRUPT similarly

This patch updates the PHY state machine and initialization paths to
account for the specific PHY_IGNORE_INTERRUPT. Based on an earlier patch
by Thomas Petazzoni, and reworked to add the missing bits. Add a helper
phy_interrupt_is_valid() which specifically tests for a PHY interrupt
not to be PHY_POLL or PHY_IGNORE_INTERRUPT and use it throughout the
code.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:13:08 -07:00
Rasesh Mody
45e9834143 bna: Driver and Firmware Updated
Driver and Firmware versions updated to 3.2.21.1.

Signed-off-by: Rasesh Mody <rmody@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:07:59 -07:00
Rasesh Mody
43c07adaeb bna: Enahncement to Identify Default IOC Function
User should not be allowed to delete base function of eth port. Add a new field
to the bfa ioc attributes structure to indicate if the given ioc is default
function on the port or not.

Signed-off-by: Rasesh Mody <rmody@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:07:59 -07:00
Rasesh Mody
f489a4ba76 bna: Fix Ucast Failure Handling
Failure of the UCAST set for base mac address fails when user configures a
duplicate mac address that matches that of another vNIC on the same port.
The bna does not handle the ucast failure and keeps this address in cache.
On disable of the vNIC, bna tries to delete the failed base mac address and the
fw asserts.

On failure of ucast address, mark ucast address set to false.

Signed-off-by: Rasesh Mody <rmody@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:07:59 -07:00
Rasesh Mody
4b2305826c bna: Clear Driver Config Flags When HW Resets
Driver configuration flags are retained across open/stop operations preventing
configurations to be set in next open/stop. Setting MTU on a 1020 causes
network to fail until a reboot is performed on the host.

Clear the flags when configuration resets in hardware.

Signed-off-by: Rasesh Mody <rmody@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:07:59 -07:00
Tomasz Figa
0b8bf1baab net: dm9000: Allow instantiation using device tree
This patch adds Device Tree support to dm9000 driver.

Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
Reviewed-by: Sylwester Nawrocki <sylvester.nawrocki@gmail.com>
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:03:51 -07:00
Daniel Borkmann
aafc787e41 arm: bpf_jit: can call module_free() from any context
Follow-up on module_free()/vfree() that takes care of the rest, so no
longer this workaround with work_struct needed.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:03:50 -07:00
Daniel Borkmann
ed900ffb73 ppc: bpf_jit: can call module_free() from any context
Followup patch on module_free()/vfree() that takes care of the rest, so
no longer this workaround with work_struct is needed.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matt Evans <matt@ozlabs.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:03:50 -07:00
Eric Dumazet
71cea17ed3 tcp: md5: remove spinlock usage in fast path
TCP md5 code uses per cpu variables but protects access to them with
a shared spinlock, which is a contention point.

[ tcp_md5sig_pool_lock is locked twice per incoming packet ]

Makes things much simpler, by allocating crypto structures once, first
time a socket needs md5 keys, and not deallocating them as they are
really small.

Next step would be to allow crypto allocations being done in a NUMA
aware way.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 14:00:42 -07:00
Daniel Borkmann
168fc21a97 net: ipv6: remove 'next' member from inet6_dev
The next pointer within the inet6_dev structure seems not to be used
anywhere. So just remove it. Tested with allmodconfig on x86_64.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:49:48 -07:00
Willem de Bruijn
99bbc70741 rps: selective flow shedding during softnet overflow
A cpu executing the network receive path sheds packets when its input
queue grows to netdev_max_backlog. A single high rate flow (such as a
spoofed source DoS) can exceed a single cpu processing rate and will
degrade throughput of other flows hashed onto the same cpu.

This patch adds a more fine grained hashtable. If the netdev backlog
is above a threshold, IRQ cpus track the ratio of total traffic of
each flow (using 4096 buckets, configurable). The ratio is measured
by counting the number of packets per flow over the last 256 packets
from the source cpu. Any flow that occupies a large fraction of this
(set at 50%) will see packet drop while above the threshold.

Tested:
Setup is a muli-threaded UDP echo server with network rx IRQ on cpu0,
kernel receive (RPS) on cpu0 and application threads on cpus 2--7
each handling 20k req/s. Throughput halves when hit with a 400 kpps
antagonist storm. With this patch applied, antagonist overload is
dropped and the server processes its complete load.

The patch is effective when kernel receive processing is the
bottleneck. The above RPS scenario is a extreme, but the same is
reached with RFS and sufficient kernel processing (iptables, packet
socket tap, ..).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:48:04 -07:00
Fabio Estevam
4a5bddf7ea fec: Let device core handle pinctrl
Since commit ab78029 (drivers/pinctrl: grab default handles from device core)
we can rely on device core for handling pinctrl, so remove
devm_pinctrl_get_select_default() from the driver.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:44:32 -07:00
Wei Liu
1ca2983aaa xen-netfront: avoid leaking resources when setup_netfront fails
We should correctly free related resources (grant ref, memory page, evtchn)
when setup_netfront fails.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:43:09 -07:00
Tony Prisk
6dffbe53fb net: velocity: Add platform device support to VIA velocity driver
Add support for the VIA Velocity network driver to be bound to a
OF created platform device.

Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:40:39 -07:00
Tony Prisk
e2c41f143f net: velocity: Convert to generic dma functions
Remove the pci_* dma functions and replace with the more generic
versions.

In preparation of adding platform support, a new struct device *dev
is added to struct velocity_info which can be used by both the pci
and platform code.

Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:40:39 -07:00
Tony Prisk
a9683c94fe net: velocity: Rename vptr->dev to vptr->netdev
Improve the clarity of the code in preparation for converting the
dma functions to generic versions, which require a struct device *.

This makes it possible to store a 'struct device *dev' in the
velocity_info structure.

Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:40:39 -07:00
Sergei Shtylyov
4fc1ad6f4b 3c59x: remove useless VORTEX_PCI() invocations
It's suboptimal to invoke quite complex VORTEX_PCI() macro every time we want
to get a 'struct pci_dev *' when we already have it in a variable...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 13:34:20 -07:00
Rolf Eike Beer
1e18583adc ThunderLAN: remove is_eisa flag
These 2 places are the only matches for is_eisa in the whole tree.

Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 00:20:14 -07:00
Eric Dumazet
8802f5790e net-bnx2x: dont reload on GRO change
bnx2x_set_features() forces a driver reload if GRO setting is changed.

A reload makes the ethernet port unresponsive for about 5 seconds.

This is not needed in the common case LRO is enabled, as LRO
(TPA_ENABLE_FLAG) has precedence over GRO (GRO_ENABLE_FLAG)

Tested:
 Verified that "ethtool -K eth0 gro {on|off}" doesn't blackout
 the NIC anymore

Google-Bug-Id: 8440442
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Kravkov <dmitry@broadcom.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 00:16:21 -07:00
David S. Miller
f6abf2b158 Merge branch 'tg3_eee'
Nithin Nayak Sujir says:

====================
This series adds support for modifying EEE settings via ethtool. Since this can
impact Link Flap Avoidance, the driver pulls the current hardware settings if
LFA is enabled. This is similar to how we do the link settings to avoid a flap.

v2: Fixes pointed out by Ben Hutchings.
 - Use MDIO_AN_EEE_LPABLE to set the lp_advertised field.
 - Check that tx_lpi_timer is within valid range.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 00:13:54 -07:00
Nithin Sujir
1cbf9eb85a tg3: Implement set/get_eee handlers
Reviewed-by: Ben Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 00:13:48 -07:00
Nithin Sujir
5b6c273ad6 tg3: Simplify tg3_phy_eee_config_ok() by reusing tg3_eee_pull_config()
eee_config_ok() was checking only for mismatch in advertised settings.
This patch expands the scope of eee_config_ok() to check for mismatch in
the other eee settings. On mismatch we will require a call to
tg3_setup_eee() to push the configured settings to the hardware.

Reviewed-by: Ben Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 00:13:47 -07:00
Nithin Sujir
400dfbaa8d tg3: Add tg3_eee_pull_config() function
Add tg3_eee_pull_config() to pull the settings from the hardware and
populate the eee structure.

If Link Flap Avoidance is enabled, we pull the eee settings from the hw
so as not to cause a phy reset on eee config mismatch later. This
requires moving down tg3_setup_eee() below the tg3_pull_config() to not
trample existing settings.

Reviewed-by: Ben Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 00:13:47 -07:00
Nithin Sujir
9e2ecbeb25 tg3: Add ethtool_eee struct and tg3_setup_eee()
Add an eee structure and update it with eee settings. This will be used
for set/get_eee operations. Add common function tg3_setup_eee() that
will be used in the subsequent patches.

Reviewed-by: Ben Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-20 00:13:47 -07:00
Eric Dumazet
164954454a filter: do not output bpf image address for security reason
Do not leak starting address of BPF JIT code for non root users,
as it might help intruders to perform an attack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19 23:56:41 -07:00
Eric Dumazet
314beb9bca x86: bpf_jit_comp: secure bpf jit against spraying attacks
hpa bringed into my attention some security related issues
with BPF JIT on x86.

This patch makes sure the bpf generated code is marked read only,
as other kernel text sections.

It also splits the unused space (we vmalloc() and only use a fraction of
the page) in two parts, so that the generated bpf code not starts at a
known offset in the page, but a pseudo random one.

Refs:
http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html

Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19 23:55:41 -07:00
Yuchung Cheng
3e59cb0ddf tcp: remove bad timeout logic in fast recovery
tcp_timeout_skb() was intended to trigger fast recovery on timeout,
unfortunately in reality it often causes spurious retransmission
storms during fast recovery. The particular sign is a fast retransmit
over the highest sacked sequence (SND.FACK).

Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion
to avoid spurious timeout: when SND.UNA advances the sender re-arms
RTO and extends the timeout by icsk_rto. The sender does not offset
the time elapsed since the packet at SND.UNA was sent.

But if the next (DUP)ACK arrives later than ~RTTVAR and triggers
tcp_fastretrans_alert(), then tcp_timeout_skb() will mark any packet
sent before the icsk_rto interval lost, including one that's above the
highest sacked sequence. Most likely a large part of scorebard will be
marked.

If most packets are not lost then the subsequent DUPACKs with new SACK
blocks will cause the sender to continue to retransmit packets beyond
SND.FACK spuriously. Even if only one packet is lost the sender may
falsely retransmit almost the entire window.

The situation becomes common in the world of bufferbloat: the RTT
continues to grow as the queue builds up but RTTVAR remains small and
close to the minimum 200ms. If a data packet is lost and the DUPACK
triggered by the next data packet is slightly delayed, then a spurious
retransmission storm forms.

As the original comment on tcp_timeout_skb() suggests: the usefulness
of this feature is questionable. It also wastes cycles walking the
sack scoreboard and is actually harmful because of false recovery.

It's time to remove this.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19 23:51:17 -07:00
Rami Rosen
3cc7587b30 Documentation/sysctl/net.txt: fix (attribute removal).
This patch removes mentioning the sysfsf net_device weight attribute
(class/net/<device>/weight)
in Documentation/sysctl/net.txt, since the net sysfs weight attribute
was removed by the following patch:

[NET]: Make NAPI polling independent of struct net_device objects
 bea3348eef

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19 15:13:10 -07:00
Nicolas Dichtel
caeaba7900 ipv6: add support of peer address
This patch adds the support of peer address for IPv6. For example, it is
possible to specify the remote end of a 6inY tunnel.
This was already possible in IPv4:
 ip addr add ip1 peer ip2 dev dev1

The peer address is specified with IFA_ADDRESS and the local address with
IFA_LOCAL (like explained in include/uapi/linux/if_addr.h).
Note that the API is not changed, because before this patch, it was not
possible to specify two different addresses in IFA_LOCAL and IFA_REMOTE.
There is a small change for the dump: if the peer is different from ::,
IFA_ADDRESS will contain the peer address instead of the local address.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-19 15:09:26 -07:00
Eric Dumazet
5199dfe531 sparc: bpf_jit_comp: can call module_free() from any context
module_free()/vfree() takes care of details, we no longer need a wrapper
and a work_struct.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-17 18:27:52 -07:00