Commit Graph

263340 Commits

Author SHA1 Message Date
Tim Chen
0856a30409 Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path
Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
allow credentials to work across pid namespaces for packets sent via
UNIX sockets.  However, the atomic reference counts on pid and
credentials caused plenty of cache bouncing when there are numerous
threads of the same pid sharing a UNIX socket.  This patch mitigates the
problem by eliminating extraneous reference counts on pid and
credentials on both send and receive path of UNIX sockets. I found a 2x
improvement in hackbench's threaded case.

On the receive path in unix_dgram_recvmsg, currently there is an
increment of reference count on pid and credentials in scm_set_cred.
Then there are two decrement of the reference counts.  Once in scm_recv
and once when skb_free_datagram call skb->destructor function
unix_destruct_scm.  One pair of increment and decrement of ref count on
pid and credentials can be eliminated from the receive path.  Until we
destroy the skb, we already set a reference when we created the skb on
the send side.

On the send path, there are two increments of ref count on pid and
credentials, once in scm_send and once in unix_scm_to_skb.  Then there
is a decrement of the reference counts in scm_destroy's call to
scm_destroy_cred at the end of unix_dgram_sendmsg functions.   One pair
of increment and decrement of the reference counts can be removed so we
only need to increment the ref counts once.

By incorporating these changes, for hackbench running on a 4 socket
NHM-EX machine with 40 cores, the execution of hackbench on
50 groups of 20 threads sped up by factor of 2.

Hackbench command used for testing:
./hackbench 50 thread 2000

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 19:41:13 -07:00
Michio Honda
6af29ccc22 sctp: Bundle HEAERTBEAT into ASCONF_ACK
With this patch a HEARTBEAT chunk is bundled into the ASCONF-ACK
for ADD IP ADDRESS, confirming the new destination as quickly as
possible.

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 19:41:13 -07:00
Michio Honda
f207c050fb sctp: HEARTBEAT negotiation after ASCONF
This patch fixes BUG that the ASCONF receiver transmits DATA chunks
to the newly added UNCONFIRMED destination.

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 19:41:09 -07:00
Nandita Dukkipati
a262f0cdf1 Proportional Rate Reduction for TCP.
This patch implements Proportional Rate Reduction (PRR) for TCP.
PRR is an algorithm that determines TCP's sending rate in fast
recovery. PRR avoids excessive window reductions and aims for
the actual congestion window size at the end of recovery to be as
close as possible to the window determined by the congestion control
algorithm. PRR also improves accuracy of the amount of data sent
during loss recovery.

The patch implements the recommended flavor of PRR called PRR-SSRB
(Proportional rate reduction with slow start reduction bound) and
replaces the existing rate halving algorithm. PRR improves upon the
existing Linux fast recovery under a number of conditions including:
  1) burst losses where the losses implicitly reduce the amount of
outstanding data (pipe) below the ssthresh value selected by the
congestion control algorithm and,
  2) losses near the end of short flows where application runs out of
data to send.

As an example, with the existing rate halving implementation a single
loss event can cause a connection carrying short Web transactions to
go into the slow start mode after the recovery. This is because during
recovery Linux pulls the congestion window down to packets_in_flight+1
on every ACK. A short Web response often runs out of new data to send
and its pipe reduces to zero by the end of recovery when all its packets
are drained from the network. Subsequent HTTP responses using the same
connection will have to slow start to raise cwnd to ssthresh. PRR on
the other hand aims for the cwnd to be as close as possible to ssthresh
by the end of recovery.

A description of PRR and a discussion of its performance can be found at
the following links:
- IETF Draft:
    http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01
- IETF Slides:
    http://www.ietf.org/proceedings/80/slides/tcpm-6.pdf
    http://tools.ietf.org/agenda/81/slides/tcpm-2.pdf
- Paper to appear in Internet Measurements Conference (IMC) 2011:
    Improving TCP Loss Recovery
    Nandita Dukkipati, Matt Mathis, Yuchung Cheng

Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 19:40:40 -07:00
chetan loke
f6fb8f100b af-packet: TPACKET_V3 flexible buffer implementation.
1) Blocks can be configured with non-static frame-size.
2) Read/poll is at a block-level(as opposed to packet-level).
3) Added poll timeout to avoid indefinite user-space wait on idle links.
4) Added user-configurable knobs:
   4.1) block::timeout.
   4.2) tpkt_hdr::sk_rxhash.

Changes:
C1) tpacket_rcv()
    C1.1) packet_current_frame() is replaced by packet_current_rx_frame()
          The bulk of the processing is then moved in the following chain:
          packet_current_rx_frame()
            __packet_lookup_frame_in_block
              fill_curr_block()
              or
                retire_current_block
                dispatch_next_block
              or
              return NULL(queue is plugged/paused)

Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 19:40:40 -07:00
chetan loke
0d4691ce11 af-packet: Added TPACKET_V3 headers.
Added TPACKET_V3 definitions.

Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 19:40:39 -07:00
Alexander Smirnov
44331fe2aa IEEE802.15.4: 6LoWPAN basic support
This patch provides base support for transmission of IPv6 packets as
well as the formation of IPv6 link-local addresses and statelessly
autoconfigured addresses on top of IEEE 802.15.4 networks.

For more information please look at the RFC4944 "Compression Format
for IPv6 Datagrams in Low Power and Losst Networks (6LoWPAN).

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 19:36:06 -07:00
Ian Campbell
804cf14ea5 net: xfrm: convert to SKB frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 17:52:12 -07:00
Ian Campbell
408dadf03f net: ipv6: convert to SKB frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 17:52:12 -07:00
Ian Campbell
aff65da0f1 net: ipv4: convert to SKB frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 17:52:11 -07:00
Ian Campbell
ea2ab69379 net: convert core to skb paged frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 17:52:11 -07:00
Sathya Perla
15133fbbb9 be2net: remove unused variable
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 16:19:29 -07:00
Sathya Perla
e2edb7d51f be2net: increase FW update completion timeout
Flashing some of the PHYs can take longer thus increasing the total flash
update time to a max of 40s.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 16:19:29 -07:00
Sathya Perla
09c1c68f22 be2net: fix erx->rx_drops_no_frags wrap around
The rx_drops_no_frags HW counter for RSS rings is 16bits in HW and can
wraparound often. Maintain a 32-bit accumulator in the driver to prevent
frequent wraparound.

Also, incorporated Eric's feedback to use ACCESS_ONCE() for the accumulator
write.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 16:19:29 -07:00
Sathya Perla
db3ea7819d be2net: get rid of memory mapped pci-cfg space address
Get rid of adapter->pcicfg and its use. Use pci_config_read/write_dword()
instead.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 16:19:29 -07:00
Sathya Perla
857c99059e be2net: Fix race in posting rx buffers.
There is a possibility of be_post_rx_frags() being called simultaneously from
both be_worker() (when rx_post_starved) and be_poll_rx() (when rxq->used is 0).
This can be avoided by posting rx buffers only when some completions have been
reaped.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 16:19:28 -07:00
Eric Dumazet
ec5efe7946 rps: support IPIP encapsulation
Skip IPIP header to get proper layer-4 information.

Like GRE tunnels, this only works if rxhash is not already provided by
the device itself (ethtool -K ethX rxhash off), to allow kernel compute
a software rxhash.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 16:13:55 -07:00
David S. Miller
0e43182c0c Merge branch 'batman-adv/next' of git://git.open-mesh.org/linux-merge 2011-08-24 10:33:00 -07:00
Ian Campbell
131ea6675c net: add APIs for manipulating skb page fragments.
The primary aim is to add skb_frag_(ref|unref) in order to remove the use of
bare get/put_page on SKB pages fragments and to isolate users from subsequent
changes to the skb_frag_t data structure.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-22 16:59:54 -07:00
David S. Miller
7ae9ed8d32 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2011-08-22 14:47:43 -07:00
Jiri Pirko
0dfe178239 net: vlan: goto another_round instead of calling __netif_receive_skb
Now, when vlan tag on untagged in non-accelerated path is stripped from
skb, headers are reset right away. Benefit from that and avoid calling
__netif_receive_skb recursivelly and just use another_round.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-22 12:43:22 -07:00
Jesper Juhl
c8d755b59a net/wan/hdlc_ppp: use break in switch
We'll either hit one of the case labels or the default in the switch
and in all cases do we then 'goto out' and we also have a 'goto out'
after the switch that is redundant. Change to just use break in the
case statements and leave the 'goto out' after the lop for everyone to
hit.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-22 11:30:38 -07:00
John W. Linville
b38d355eaa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/staging/ath6kl/miscdrv/ar3kps/ar3kpsparser.c
	drivers/staging/ath6kl/os/linux/ar6000_drv.c
2011-08-22 14:28:50 -04:00
Marek Lindner
a943cac144 batman-adv: merge update_transtable() into tt related code
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22 15:16:22 +02:00
Marek Lindner
267151cdfd batman-adv: reuse tt_len() to calculate tt buffer length
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
2011-08-22 15:16:22 +02:00
Antonio Quartulli
df6edb9e69 batman-adv: print client flags in the local/global transtables output
Since clients can have several flags on or off, this patches make them
appear in the local/global transtable output so that they can be checked
for debugging purposes.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22 15:16:22 +02:00
Antonio Quartulli
3d393e4732 batman-adv: implement AP-isolation on the sender side
If a node has to send a packet issued by a WIFI client to another WIFI client,
the packet is dropped.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22 15:16:21 +02:00
Antonio Quartulli
59b699cdee batman-adv: implement AP-isolation on the receiver side
When a node receives a unicast packet it checks if the source and the
destination client can communicate or not due to the AP isolation

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22 15:16:20 +02:00
Antonio Quartulli
bc2790808a batman-adv: detect clients connected through a 802.11 device
Clients connected through a 802.11 device are now marked with the
TT_CLIENT_WIFI flag. This flag is also advertised with the tt
announcement.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22 15:16:20 +02:00
Antonio Quartulli
015758d002 batman-adv: correct several typ0s in the comments
Several typos have been corrected and some sentences have been rephrased

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22 15:16:19 +02:00
Antonio Quartulli
1a1f37d925 batman-adv: hash_add() has to discriminate on the return value
hash_add() returns 0 on success while returns -1 either on error and on
entry already present. The caller could use such information to select
its behaviour. For this reason it is useful that hash_add() returns -1
in case on error and returns 1 in case of entry already present.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-08-22 15:16:19 +02:00
David S. Miller
ca1ba7caa6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next
Conflicts:
	drivers/net/ethernet/intel/e1000e/netdev.c
2011-08-20 17:25:36 -07:00
Changli Gao
6461be3a54 net: Preserve ooo_okay when copying skb header
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 14:20:49 -07:00
Vladimir Zapolskiy
2e025c71ce dm9000: define debug level as a module parameter
This change allows to get driver specific debug messages output
providing a module parameter. As far as the maximum level of verbosity
is too high, it is demoted by default.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 14:15:55 -07:00
David S. Miller
823dcd2506 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-08-20 10:39:12 -07:00
Matt Carlson
eaa36660de tg3: Update version to 3.120
This patch updates the tg3 version to 3.120.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 10:34:27 -07:00
Matt Carlson
941ec90f35 tg3: Add external loopback support to selftest
This patch adds external loopback support to tg3's ethtool selftest.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 10:34:27 -07:00
Matt Carlson
28a4595786 tg3: Restructure tg3_test_loopback
The tg3_test_loopback() function is starting to get more complicated as
more loopback tests are added.  This patch cleans up the code.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 10:34:27 -07:00
Matt Carlson
5e5a7f371f tg3: Pull phy int lpbk setup into separate func
This patch pulls out the internal phy loopback setup code into a
separate function.  This cleans up the loopback test code and makes it
available for NETIF_F_LOOPBACK support later.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 10:34:27 -07:00
Matt Carlson
6e01b20b21 tg3: Consilidate MAC loopback code
The driver puts the device into MAC loopback in two places in the
driver.  This patch consolidates the code into a single routine.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 10:34:27 -07:00
Matt Carlson
2215e24ceb tg3: Remove dead code
Now that CPMU devices don't do MAC loopback, all the CPMU power saving
mode adjustments are unneeded.  This patch removes the dead code.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 10:34:27 -07:00
Jitendra Kalsaria
b2d2ccffa4 qlge: Adding Maintainer.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20 10:33:34 -07:00
Changli Gao
4ca2462e94 net: add the comment for skb->l4_rxhash
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-19 07:26:44 -07:00
Alexander Duyck
66f32a8b97 ixgbe: Cleanup FCOE and VLAN handling in xmit_frame_ring
This change is meant to further cleanup the transmit path by streamlining
some of the VLAN and FCOE/DCB tasks in the transmit path.  In addition it
adds code for support software VLANs in the event that they are used in
conjunction with DCB and/or FCOE.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-08-19 06:02:40 -07:00
Alexander Duyck
971060b106 ixgbe: replace reference to CONFIG_FCOE with IXGBE_FCOE
CONFIG_FCOE is not the correct define to check since it is possible for it
to be CONFIG_FCOE_MODULE, as such the reference to it should be replaced
with IXGBE_FCOE.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-08-19 06:01:59 -07:00
Alexander Duyck
d3d0023979 ixgbe: Refactor transmit map and cleanup routines
This patch implements a partial refactor of the TX map/queue and cleanup
routines.  It merges the map and queue functionality and as a result
improves the transmit performance by avoiding unnecessary reads from memory.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-08-19 05:57:43 -07:00
Amir Hanania
0ebafd8665 ixgbe - DDP last user buffer - error to warn
Change the error message in the last DDP user buffer to warn_once

Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-08-19 04:43:00 -07:00
Daniel Baluta
98e77438ae ipv6: Fix ipv6_getsockopt for IPV6_2292PKTOPTIONS
IPV6_2292PKTOPTIONS is broken for 32-bit applications running
in COMPAT mode on 64-bit kernels.

The same problem was fixed for IPv4 with the patch:
ipv4: Fix ip_getsockopt for IP_PKTOPTIONS,
commit dd23198e58

Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-19 03:19:07 -07:00
Bruce Allan
c5778b43df e1000e: bump driver version number
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-08-19 01:17:31 -07:00
Bruce Allan
5f450212f2 e1000e: convert driver to use extended descriptors
Some features currently not supported by the driver (e.g. RSS) require the
use of extended descriptors, but the driver is setup to only use legacy
descriptors in all modes except for when jumbo frames are enabled on some
parts.  Convert the driver to always use extended descriptors in order to
enable the forthcoming support of these other features.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-08-19 01:17:31 -07:00