Commit Graph

27656 Commits

Author SHA1 Message Date
Patrick McHardy
124dff01af netfilter: don't reset nf_trace in nf_reset()
Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up the the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices on
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. Its not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only cases
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 15:38:10 -04:00
Pablo Neira Ayuso
12202fa757 netfilter: remove unneeded variable proc_net_netfilter
Now that this supports net namespace for nflog and nfqueue,
we can remove the global proc_net_netfilter which has no
clients anymore.

Based on patch from Gao feng.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:11 +02:00
Gao feng
e817961048 netfilter: nfnetlink_queue: add net namespace support for nfnetlink_queue
This patch makes /proc/net/netfilter/nfnetlink_queue pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:11 +02:00
Gao feng
5b023fc8d8 netfilter: enable per netns support for nf_loggers
After this patch, all nf_loggers support net namespace. Still
xt_LOG and ebt_log require syslog netns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:10 +02:00
Gao feng
9368a53c47 netfilter: nfnetlink_log: add net namespace support for nfnetlink_log
This patch makes /proc/net/netfilter/nfnetlink_log pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:01 +02:00
Gao feng
355430671a netfilter: ipt_ULOG: add net namespace support for ipt_ULOG
Add pernet support to ipt_ULOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
nflognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:03:43 +02:00
Gao feng
5b175b7ebd netfilter: ebt_ulog: add net namespace support for ebt_ulog
Add pernet support to ebt_ulog by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
ebtulognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:59:50 +02:00
Gao feng
69b34fb996 netfilter: xt_LOG: add net namespace support for xt_LOG
Add pernet support to xt_LOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:58:45 +02:00
Gao feng
7d2789246c netfilter: ebt_log: add net namespace support for ebt_log
Add pernet support to ebt_log by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:57:27 +02:00
Gao feng
30e0c6a6be netfilter: nf_log: prepare net namespace support for loggers
This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
   and nf_log_set. The new nf_log_register is used to globally
   register the nf_logger and nf_log_set is used for enabling
   pernet support from nf_loggers.

   Per netns is not yet complete after this patch, it comes in
   separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
   yet complete after this patch, it only allows to bind the
   nf_logger to the protocol family from init_net and it skips
   other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
   After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:12:54 +02:00
Gao feng
f3c1a44a22 netfilter: make /proc/net/netfilter pernet
This patch makes this proc dentry pernet. So far only init_net
had a /proc/net/netfilter directory.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 19:35:02 +02:00
Jiri Pirko
34e2ed34a0 net: ipv4: notify when address lifetime changes
if userspace changes lifetime of address, send netlink notification and
call notifier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:51:12 -04:00
Eric W. Biederman
0e82e7f6df af_unix: If we don't care about credentials coallesce all messages
It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coallescing unix stream messages when the application was
expecting us to.

The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.

The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coallescing and then the coallescing failed because
the second message had no credentials.

Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coallesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.

I have tested this and verified that the in the LSB test case mentioned
above that the messages do coallesce now, while the were failing to
coallesce without this change.

Reported-by: Karel Srot <ksrot@redhat.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:49:13 -04:00
Eric W. Biederman
25da0e3e9d Revert "af_unix: dont send SCM_CREDENTIAL when dest socket is NULL"
This reverts commit 14134f6584.

The problem that the above patch was meant to address is that af_unix
messages are not being coallesced because we are sending unnecesarry
credentials.  Not sending credentials in maybe_add_creds totally
breaks unconnected unix domain sockets that wish to send credentails
to other sockets.

In practice this break some versions of udev because they receive a
message and the sending uid is bogus so they drop the message.

Reported-by: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:49:03 -04:00
Vlad Yasevich
4543fbefe6 net: count hw_addr syncs so that unsync works properly.
A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:18:46 -04:00
David S. Miller
4f4ecd5f2a Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains netfilter updates for your net tree,
they are:

* Fix missing the skb->trace reset in nf_reset, noticed by Gao Feng
  while using the TRACE target with several net namespaces.

* Fix prefix translation in IPv6 NPT if non-multiple of 32 prefixes
  are used, from Matthias Schiffer.

* Fix invalid nfacct objects with empty name, they are now rejected
  with -EINVAL, spotted by Michael Zintakis, patch from myself.

* A couple of fixes for wrong return values in the error path of
  nfnetlink_queue and nf_conntrack, from Wei Yongjun.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:41:53 -04:00
David S. Miller
518314ffe4 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into wireless
John W. Linville says:

====================
Here are some more fixes intended for the 3.9 stream...

Regarding the mac80211 bits, Johannes says:

"I had changed the idle handling to simplify it, but broken the
sequencing of commands, at least for ath9k-htc, one patch restores the
sequence. The other patch fixes a crash Jouni found while stress-testing
the remain-on-channel code, when an item is deleted the work struct can
run twice and crash the second time."

As for the iwlwifi bits, Johannes says:

"The only fix here is to the passive-no-RX firmware regulatory
enforcement driver support code to not drop auth frames in quick
succession, leading to not being able to connect to APs on passive
channels in certain circumstances."

Don't forget the NFC bits, about which Samuel says:

"This time we have:

- A crash fix for when a DGRAM LLCP socket is listening while the NFC adapter
  is physically removed.
- A potential double skb free when the LLCP socket receive queue is full.
- A fix for properly handling multiple and consecutive LLCP connections, and
  not trash the socket ack log.
- A build failure for the MEI microread physical layer, now that the MEI bus
  APIs have been merged into char-misc-next."

On top of that, Stone Piao provides an mwifiex fix to avoid accessing
beyond the end of a buffer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:39:06 -04:00
Jesper Dangaard Brouer
19952cc4f8 net: frag queue per hash bucket locking
This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduce the
readers-writer lock to a rebuild lock.

This patch is part of "net: frag performance followup"
 http://thread.gmane.org/gmane.linux.network/263644
of which two patches have already been accepted:

Same test setup as previous:
 (http://thread.gmane.org/gmane.linux.network/257155)
 Two 10G interfaces, on seperate NUMA nodes, are under-test, and uses
 Ethernet flow-control.  A third interface is used for generating the
 DoS attack (with trafgen).

Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before it would only hit one RX queue, now its
sending packets causing multi-queue RX, due to "better" RX hashing.

Test types summary (netperf UDP_STREAM):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS
 Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
 Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS

When I rebased this-patch(03) (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in-between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) verifying-retest of commit 1b5ab0de corrospond to patch-02.
Test (C) is what I reported before for this-patch

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)).
Test (D) function as a new base-test.

Performance table summary (in Mbit/s):

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Test (E) and (F) is this-patch(03), with(V1) and without(V2) the _bh spinlocks.

I cannot explain the slow down for 20G64K (but its an artificial
"lab-test" so I'm not worried).  But the other results does show
improvements.  And test (E) "with _bh" version is slightly better.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>

----
V2:
- By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
  need the spinlock _bh versions, as Netfilter currently does a
  local_bh_disable() before entering inet_fragment.
- Fold-in desc from cover-mail
V3:
- Drop the chain_len counter per hash bucket.
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:37:05 -04:00
John W. Linville
9306a398e7 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-03 14:19:48 -04:00
John W. Linville
407ad2b7ef Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-04-03 13:50:34 -04:00
Matthias Schiffer
906b1c394d netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengths
The bitmask used for the prefix mangling was being calculated
incorrectly, leading to the wrong part of the address being replaced
when the prefix length wasn't a multiple of 32.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-03 12:24:56 +02:00
David S. Miller
d662483264 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull net into net-next to get the synchronize_net() bug fix in
bonding.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-03 01:31:54 -04:00
Jacob Keller
8facd5fb73 net: fix smatch warnings inside datagram_poll
Commit 7d4c04fc17 ("net: add option to enable
error queue packets waking select") has an issue due to operator precedence
causing the bit-wise OR to bind to the sock_flags call instead of the result of
the terniary conditional. This fixes the *_poll functions to work properly. The
old code results in "mask |= POLLPRI" instead of what was intended, which is to
only include POLLPRI when the socket option is enabled.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 16:59:16 -04:00
Reilly Grant
990454b5a4 VSOCK: Handle changes to the VMCI context ID.
The VMCI context ID of a virtual machine may change at any time. There
is a VMCI event which signals this but datagrams may be processed before
this is handled. It is therefore necessary to be flexible about the
destination context ID of any datagrams received. (It can be assumed to
be correct because it is provided by the hypervisor.) The context ID on
existing sockets should be updated to reflect how the hypervisor is
currently referring to the system.

Signed-off-by: Reilly Grant <grantr@vmware.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:39:17 -04:00
Balakumaran Kannan
25fb6ca4ed net IPv6 : Fix broken IPv6 routing table after loopback down-up
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.

IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces becomes unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.

This patch fixes this issue by reading all interface's IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.

==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:37:19 -04:00
Paul Gortmaker
5e404cd658 ipconfig: add informative timeout messages while waiting for carrier
Commit 3fb72f1e6e ("ipconfig wait
for carrier") added a "wait for carrier on at least one interface"
policy, with a worst case maximum wait of two minutes.

However, if you encounter this, you won't get any feedback from
the console as to the nature of what is going on.  You just see
the booting process hang for two minutes and then continue.

Here we add a message so the user knows what is going on, and
hence can take action to rectify the situation (e.g. fix network
cable or whatever.)  After the 1st 10s pause, output now begins
that looks like this:

	Waiting up to 110 more seconds for network.
	Waiting up to 100 more seconds for network.
	Waiting up to 90 more seconds for network.
	Waiting up to 80 more seconds for network.
	...

Since most systems will have no problem getting link/carrier in the
1st 10s, the only people who will see these messages are people with
genuine issues that need to be resolved.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:35:33 -04:00
Vasily Averin
f0f6ee1f70 cbq: incorrect processing of high limits
currently cbq works incorrectly for limits > 10% real link bandwidth,
and practically does not work for limits > 50% real link bandwidth.
Below are results of experiments taken on 1 Gbit link

 In shaper | Actual Result
-----------+---------------
  100M     | 108 Mbps
  200M     | 244 Mbps
  300M     | 412 Mbps
  500M     | 893 Mbps

This happen because of q->now changes incorrectly in cbq_dequeue():
when it is called before real end of packet transmitting,
L2T is greater than real time delay, q_now gets an extra boost
but never compensate it.

To fix this problem we prevent change of q->now until its synchronization
with real time.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:29:20 -04:00
Eric Dumazet
4a0b5ec12f bridge: remove a redundant synchronize_net()
commit 00cfec3748 (net: add a synchronize_net() in
netdev_rx_handler_unregister())
allows us to remove the synchronized_net() call from del_nbp()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 12:10:58 -04:00
holger@eitzenberger.org
5c33448c40 netfilter: xt_NFQUEUE: coalesce IPv4 and IPv6 hashing
Because rev1 and rev3 of the target share the same hashing
generalize it by introduing nfqueue_hash().

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:26:10 +02:00
holger@eitzenberger.org
8746ddcf12 netfilter: xt_NFQUEUE: introduce CPU fanout
Current NFQUEUE target uses a hash, computed over source and
destination address (and other parameters), for steering the packet
to the actual NFQUEUE. This, however forgets about the fact that the
packet eventually is handled by a particular CPU on user request.

If E. g.

  1) IRQ affinity is used to handle packets on a particular CPU already
     (both single-queue or multi-queue case)

and/or

  2) RPS is used to steer packets to a specific softirq

the target easily chooses an NFQUEUE which is not handled by a process
pinned to the same CPU.

The idea is therefore to use the CPU index for determining the
NFQUEUE handling the packet.

E. g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it
looks like this:

 +-----+  +-----+  +-----+  +-----+
 |NFQ#0|  |NFQ#1|  |NFQ#2|  |NFQ#3|
 +-----+  +-----+  +-----+  +-----+
    ^        ^        ^        ^
    |        |NFQUEUE |        |
    +        +        +        +
 +-----+  +-----+  +-----+  +-----+
 |rx-0 |  |rx-1 |  |rx-2 |  |rx-3 |
 +-----+  +-----+  +-----+  +-----+

The NFQUEUEs not necessarily have to start with number 0, setups with
less NFQUEUEs than packet-handling CPUs are not a problem as well.

This patch extends the NFQUEUE target to accept a new
NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the
CPU index for determining the NFQUEUE being used. I have to introduce
rev3 for this. The 'flags' are folded into _v2 'bypass'.

By changing the way which queue is assigned, I'm able to improve the
performance if the processes reading on the NFQUEUs are pinned
correctly.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:25:44 +02:00
Gao feng
f016588861 netfilter: use IS_ENABLE to replace if defined in TRACE target
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:03:45 +02:00
Julian Anastasov
ac69269a45 ipvs: do not disable bh for long time
We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.

Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:58 +02:00
Julian Anastasov
ceec4c3816 ipvs: convert services to rcu
This is the final step in RCU conversion.

Things that are removed:

- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
	RCU and work in parallel with configuration

Other changes:

- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
	the schedulers are prepared for such RCU readers
	even after done_service is called but they need
	to use synchronize_rcu because last ip_vs_scheduler_put
	can happen while RCU read-side critical sections
	use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
	If dests were present, the services are freed from
	the dest trash code

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:58 +02:00
Julian Anastasov
413c2d04e9 ipvs: convert dests to rcu
In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.

We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.

Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:57 +02:00
Julian Anastasov
ba3a3ce14e ipvs: convert sched_lock to spin lock
As all read_locks are gone spin lock is preferred.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:56 +02:00
Julian Anastasov
ed3ffc4e48 ipvs: do not expect result from done_service
This method releases the scheduler state,
it can not fail. Such change will help to properly
replace the scheduler in following patch.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:56 +02:00
Julian Anastasov
578bc3ef1e ipvs: reorganize dest trash
All dests will go to trash, no exceptions.
But we have to use new list node t_list for this, due
to RCU changes in following patches. Dests will wait there
initial grace period and later all conns and schedulers to
put their reference. The dests don't get reference for
staying in dest trash as before.

	As result, we do not load ip_vs_dest_put with
extra checks for last refcnt and the schedulers do not
need to play games with atomic_inc_not_zero while
selecting best destination.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:55 +02:00
Julian Anastasov
08cb2d032f ipvs: convert wrr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the weight for some
dest can be reduced during dest selection, change the
algorithm to check weights by using minimum weights in the
1 .. max_weight-(di-1) range, with the same step (di). By this
way we ensure that there will be always a weight >= 1 check
before claiming that all destinations are overloaded.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:54 +02:00
Julian Anastasov
b310faad3e ipvs: convert wlc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:54 +02:00
Julian Anastasov
1acb7f6761 ipvs: convert sh scheduler to rcu
Use the 3 new methods to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:53 +02:00
Julian Anastasov
9be52aba7a ipvs: convert sed scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:52 +02:00
Julian Anastasov
c0d0c0a1c0 ipvs: convert rr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the previous entry
could be unlinked, limit the list traversals to 2 when
lookup started from previous entry.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:52 +02:00
Julian Anastasov
f92ea8f096 ipvs: convert nq scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:51 +02:00
Julian Anastasov
4ebd288b69 ipvs: convert lc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:50 +02:00
Julian Anastasov
c5549571f9 ipvs: convert lblcr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. The set.lock is removed because now it is used in
rare cases, mostly under sched_lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:50 +02:00
Julian Anastasov
c2a4ffb70e ipvs: convert lblc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. Use a dead flag to prevent new entries to be created
while scheduler is reclaimed. Use hlist for the hash table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:49 +02:00
Julian Anastasov
8f3d0023b9 ipvs: convert dh scheduler to rcu
Use the new add_dest and del_dest methods
to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:49 +02:00
Julian Anastasov
fca9c20ae1 ipvs: add ip_vs_dest_hold and ip_vs_dest_put
ip_vs_dest_hold will be used under RCU lock
while ip_vs_dest_put can be called even after dest
is removed from service, as it happens for conns and
some schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:48 +02:00
Julian Anastasov
6b6df46663 ipvs: preparations for using rcu in schedulers
Allow schedulers to use rcu_dereference when
returning destination on lookup. The RCU read-side critical
section will allow ip_vs_bind_dest to get dest refcnt as
preparation for the step where destinations will be
deleted without an IP_VS_WAIT_WHILE guard that holds the
packet processing during update.

	Add new optional scheduler methods add_dest,
del_dest and upd_dest. For now the methods are called
together with update_service but update_service will be
removed in a following change.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:47 +02:00
Julian Anastasov
71dfa982f1 ipvs: change ip_vs_sched_lock to mutex
The global list with schedulers ip_vs_schedulers
is accessed only from user context - configuration and
scheduler module [un]registration. Use ip_vs_sched_mutex
instead.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:47 +02:00
Julian Anastasov
9a05475ceb ipvs: avoid kmem_cache_zalloc in ip_vs_conn_new
We have many fields to set and few to reset,
use kmem_cache_alloc instead to save some cycles.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:46 +02:00
Julian Anastasov
1845ed0bb2 ipvs: reorder keys in connection structure
__ip_vs_conn_in_get and ip_vs_conn_out_get are
hot places. Optimize them, so that ports are matched first.
By moving net and fwmark below, on 32-bit arch we can fit
caddr in 32-byte cache line and all addresses in 64-byte
cache line.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:45 +02:00
Julian Anastasov
088339a57d ipvs: convert connection locking
Convert __ip_vs_conntbl_lock_array as follows:

- readers that do not modify conn lists will use RCU lock
- updaters that modify lists will use spinlock_t

Now for conn lookups we will use RCU read-side
critical section. Without using __ip_vs_conn_get such
places have access to connection fields and can
dereference some pointers like pe and pe_data plus
the ability to update timer expiration. If full access
is required we contend for reference.

We add barrier in __ip_vs_conn_put, so that
other CPUs see the refcnt operation after other writes.

With the introduction of ip_vs_conn_unlink()
we try to reorganize ip_vs_conn_expire(), so that
unhashing of connections that should stay more time is
avoided, even if it is for very short time.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:45 +02:00
Julian Anastasov
60b6aa3b31 ipvs: convert locks used in persistence engines
Allow the readers to use RCU lock and for
PE module registrations use global mutex instead of
spinlock. All PE modules need to use synchronize_rcu
in their module exit handler.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:44 +02:00
Julian Anastasov
276472eae0 ipvs: remove rs_lock by using RCU
rs_lock was used to protect rs_table (hash table)
from updaters (under global mutex) and readers (packet handlers).
We can remove rs_lock by using RCU lock for readers. Reclaiming
dest only with kfree_rcu is enough because the readers access
only fields from the ip_vs_dest structure.

Use hlist for rs_table.

As we are now using hlist_del_rcu, introduce in_rs_table
flag as replacement for the list_empty checks which do not
work with RCU. It is needed because only NAT dests are in
the rs_table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:43 +02:00
Julian Anastasov
363c97d743 ipvs: convert app locks
We use locks like tcp_app_lock, udp_app_lock,
sctp_app_lock to protect access to the protocol hash tables
from readers in packet context while the application
instances (inc) are [un]registered under global mutex.

As the hash tables are mostly read when conns are
created and bound to app, use RCU for readers and reclaim
app instance after grace period.

Simplify ip_vs_app_inc_get because we use usecnt
only for statistics and rely on module refcounting.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:43 +02:00
Julian Anastasov
026ace060d ipvs: optimize dst usage for real server
Currently when forwarding requests to real servers
we use dst_lock and atomic operations when cloning the
dst_cache value. As the dst_cache value does not change
most of the time it is better to use RCU and to lock
dst_lock only when we need to replace the obsoleted dst.
For this to work we keep dst_cache in new structure protected
by RCU. For packets to remote real servers we will use noref
version of dst_cache, it will be valid while we are in RCU
read-side critical section because now dst_release for replaced
dsts will be invoked after the grace period. Packets to
local real servers that are passed to local stack with
NF_ACCEPT need a dst clone.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:42 +02:00
Julian Anastasov
4115ded131 ipvs: consolidate all dst checks on transmit in one place
Consolidate the PMTU checks, ICMP sending and
skb_dst modification in __ip_vs_get_out_rt and
__ip_vs_get_out_rt_v6. Now skb_dst is changed early
to simplify the transmitters.

Make sure update_pmtu is called only for local clients.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:41 +02:00
Julian Anastasov
f11cb2c2aa ipvs: do not use skb_share_check
We run in contexts like ip_rcv, ipv6_rcv, br_handle_frame,
do not expect shared skbs.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:41 +02:00
Julian Anastasov
183dce554a ipvs: no need to reroute anymore on DNAT over loopback
After commit 70e7341673 (ipv4: Show that ip_send_reply()
is purely unicast routine.) we do not need to reroute DNAT-ed
traffic over loopback because reply uses iph daddr and not
rt_spec_dst.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:40 +02:00
Julian Anastasov
d1deae4d3a ipvs: rename functions related to dst_cache reset
Move and give better names to two functions:

- ip_vs_dst_reset to __ip_vs_dst_cache_reset
- __ip_vs_dev_reset to ip_vs_forget_dev

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:39 +02:00
Julian Anastasov
b8abdf0984 ipvs: convert the IP_VS_XMIT macros to functions
It was a bad idea to hide return statements in macros.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:39 +02:00
Julian Anastasov
313eae637f ipvs: prefer NETDEV_DOWN event to free cached dsts
The real server becomes unreachable on down event,
no need to wait device unregistration. Should help in
releasing dsts early before dst->dev is replaced with lo.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:38 +02:00
Julian Anastasov
c90558dae5 ipvs: avoid routing by TOS for real server
Avoid replacing the cached route for real server
on every packet with different TOS. I doubt that routing
by TOS for real server is used at all, so we should be
better with such optimization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:37 +02:00
Julian Anastasov
932bc4d7a5 net: add skb_dst_set_noref_force
Rename skb_dst_set_noref to __skb_dst_set_noref and
add force flag as suggested by David Miller. The new wrapper
skb_dst_set_noref_force will force dst entries that are not
cached to be attached as skb dst without taking reference
as long as provided dst is reclaimed after RCU grace period.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:22:53 +02:00
John W. Linville
b0bb9b392d This is the 2nd batch of NFC fixes for 3.9. This time we have:
- A crash fix for when a DGRAM LLCP socket is listening while the NFC adapter
   is physically removed.
 - A potential double skb free when the LLCP socket receive queue is full.
 - A fix for properly handling multiple and consecutive LLCP connections, and
   not trash the socket ack log.
 - A build failure for the MEI microread physical layer, now that the MEI bus
   APIs have been merged into char-misc-next.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRWMDxAAoJEIqAPN1PVmxKnaIP/iWPgCdp7xZGtiT8R/bf0f2I
 hpWtCnYgwyvqEyUqq6+hJ+A4J6DJH/vXLqO0hJIZ6lPuATdxpba9u4mWaoIuU+V5
 4khspaWnTE7we72fVNhpT6Ms2ZZubtFynqbmPuxvu9vmk8JCMkJLfH8InQ4OGI0v
 mScDCByIq9xZiv+q/R2QO9vh6k3t5sJyAZONf7uuQhoJ+AdlrXNsvHPki+gfo+6T
 dLhgVGqytEDGKkh0AZLGc5Ss1MJFQsmTh76A4wstcJfLCisBtbhM0sZ8zWn6wmzQ
 KCq5YawbowuA12RYgb+cwxXV7H167lDYJxILZuK3WqL0nm0H3Z0jkakcV7F+IVNu
 iyh1hRqjdsgThl7aBF1VEER6Scum4WbEDXY5abCpX9l8Jbtk3+iQJWYpy8ldKEnL
 KqDiZnm6WC5CrcM1GQo4Uy46UPE0IlHPxupfQ3riuFjaFKyPd1x1ok1xUngdOXjn
 JYpxfMLVSsRBZ2y6tpDhm0r5oRHGmmJmKTzhMPKP+4+CVG8z0vwUQfydgaDlaFRa
 F3U7Av70vyCGt+Cr0tAXppvoXng1TSUKPHPTHjH+pEHjfVA9l1mAU5AYQ7JDB9aZ
 fccfkd5UG0f6ApFMhknI8kkISBN5KseNuuSAoQc4WV9jUVKk9mMifAz/Aj9MVYWx
 m6M8KzocEudWwM8y/BjE
 =ABey
 -----END PGP SIGNATURE-----

Merge tag 'nfc-fixes-3.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixes

Samuel Ortiz <sameo@linux.intel.com> says:

"This is the 2nd batch of NFC fixes for 3.9. This time we have:

- A crash fix for when a DGRAM LLCP socket is listening while the NFC adapter
  is physically removed.
- A potential double skb free when the LLCP socket receive queue is full.
- A fix for properly handling multiple and consecutive LLCP connections, and
  not trash the socket ack log.
- A build failure for the MEI microread physical layer, now that the MEI bus
  APIs have been merged into char-misc-next."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-04-01 15:14:22 -04:00
John W. Linville
95bc6b8461 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-01 15:09:28 -04:00
David S. Miller
a210576cf8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/mac80211/sta_info.c
	net/wireless/core.h

Two minor conflicts in wireless.  Overlapping additions of extern
declarations in net/wireless/core.h and a bug fix overlapping with
the addition of a boolean parameter to __ieee80211_key_free().

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-01 13:36:50 -04:00
Linus Torvalds
ff3421dee6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) sadb_msg prepared for IPSEC userspace forgets to initialize the
    satype field, fix from Nicolas Dichtel.

 2) Fix mac80211 synchronization during station removal, from Johannes
    Berg.

 3) Fix IPSEC sequence number notifications when they wrap, from Steffen
    Klassert.

 4) Fix cfg80211 wdev tracing crashes when add_virtual_intf() returns an
    error pointer, from Johannes Berg.

 5) In mac80211, don't call into the channel context code with the
    interface list mutex held.  From Johannes Berg.

 6) In mac80211, if we don't actually associate, do not restart the STA
    timer, otherwise we can crash.  From Ben Greear.

 7) Missing dma_mapping_error() check in e1000, ixgb, and e1000e.  From
    Christoph Paasch.

 8) Fix sja1000 driver defines to not conflict with SH port, from Marc
    Kleine-Budde.

 9) Don't call il4965_rs_use_green with a NULL station, from Colin Ian
    King.

10) Suspend/Resume in the FEC driver fail because the buffer descriptors
    are not initialized at all the moments in which they should.  Fix
    from Frank Li.

11) cpsw and davinci_emac drivers both use the wrong interface to
    restart a stopped TX queue.  Use netif_wake_queue not
    netif_start_queue, the latter is for initialization/bringup not
    active management of the queue.  From Mugunthan V N.

12) Fix regression in rate calculations done by
    psched_ratecfg_precompute(), missing u64 type promotion.  From
    Sergey Popovich.

13) Fix length overflow in tg3 VPD parsing, from Kees Cook.

14) AOE driver fails to allocate enough headroom, resulting in crashes.
    Fix from Eric Dumazet.

15) RX overflow happens too quickly in sky2 driver because pause packet
    thresholds are not programmed correctly.  From Mirko Lindner.

16) Bonding driver manages arp_interval and miimon settings incorrectly,
    disabling one unintentionally disables both.  Fix from Nikolay
    Aleksandrov.

17) smsc75xx drivers don't program the RX mac properly for jumbo frames.
    Fix from Steve Glendinning.

18) Fix off-by-one in Codel packet scheduler.  From Vijay Subramanian.

19) Fix packet corruption in atl1c by disabling MSI support, from Hannes
    Frederic Sowa.

20) netdev_rx_handler_unregister() needs a synchronize_net() to fix
    crashes in bonding driver unload stress tests.  From Eric Dumazet.

21) rxlen field of ks8851 RX packet descriptors not interpreted
    correctly (it is 12 bits not 16 bits, so needs to be masked after
    shifting the 32-bit value down 16 bits).  Fix from Max Nekludov.

22) Fix missed RX/TX enable in sh_eth driver due to mishandling of link
    change indications.  From Sergei Shtylyov.

23) Fix crashes during spurious ECI interrupts in sh_eth driver, also
    from Sergei Shtylyov.

24) dm9000 driver initialization is done wrong for revision B devices
    with DSP PHY, from Joseph CHANG.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  DM9000B: driver initialization upgrade
  sh_eth: make 'link' field of 'struct sh_eth_private' *int*
  sh_eth: workaround for spurious ECI interrupt
  sh_eth: fix handling of no LINK signal
  ks8851: Fix interpretation of rxlen field.
  net: add a synchronize_net() in netdev_rx_handler_unregister()
  MAINTAINERS: Update netxen_nic maintainers list
  atl1e: drop pci-msi support because of packet corruption
  net: fq_codel: Fix off-by-one error
  net: calxedaxgmac: Wake-on-LAN fixes
  net: calxedaxgmac: fix rx ring handling when OOM
  net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb'
  smsc75xx: fix jumbo frame support
  net: fix the use of this_cpu_ptr
  bonding: fix disabling of arp_interval and miimon
  ipv6: don't accept node local multicast traffic from the wire
  sky2: Threshold for Pause Packet is set wrong
  sky2: Receive Overflows not counted
  aoe: reserve enough headroom on skbs
  line up comment for ndo_bridge_getlink
  ...
2013-04-01 08:06:30 -07:00
Keller, Jacob E
7d4c04fc17 net: add option to enable error queue packets waking select
Currently, when a socket receives something on the error queue it only wakes up
the socket on select if it is in the "read" list, that is the socket has
something to read. It is useful also to wake the socket if it is in the error
list, which would enable software to wait on error queue packets without waking
up for regular data on the socket. The main use case is for receiving
timestamped transmit packets which return the timestamp to the socket via the
error queue. This enables an application to select on the socket for the error
queue only instead of for the regular traffic.

-v2-
* Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file
* Modified every socket poll function that checks error queue

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-31 19:44:20 -04:00
Artem Savkov
90e0970f87 cfg80211: sched_scan_mtx lock in cfg80211_conn_work()
Introduced in f9f475292d
("cfg80211: always check for scan end on P2P device")

cfg80211_conn_scan() which requires sched_scan_mtx to be held can be called
from cfg80211_conn_work(). Without this we are hitting multiple warnings like
the following:

WARNING: at net/wireless/sme.c:88 cfg80211_conn_scan+0x1dc/0x3a0 [cfg80211]()
Hardware name: 0578A21
Modules linked in: ...
Pid: 620, comm: kworker/3:1 Not tainted 3.9.0-rc4-next-20130328+ #326
Call Trace:
 [<c1036992>] warn_slowpath_common+0x72/0xa0
 [<c10369e2>] warn_slowpath_null+0x22/0x30
 [<faa4b0ec>] cfg80211_conn_scan+0x1dc/0x3a0 [cfg80211]
 [<faa4b344>] cfg80211_conn_do_work+0x94/0x380 [cfg80211]
 [<faa4c3b2>] cfg80211_conn_work+0xa2/0x130 [cfg80211]
 [<c1051858>] process_one_work+0x198/0x450

Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-30 22:33:19 +01:00
Antonio Quartulli
537aadc3a4 ip_gre: don't overwrite iflink during net_dev init
iflink is currently set to 0 in __gre_tunnel_init(). This
function is invoked in gre_tap_init() and
ipgre_tunnel_init() which are both used to initialise the
ndo_init field of the respective net_device_ops structs
(ipgre.. and gre_tap..) used by GRE interfaces.

However, in netdevice_register() iflink is first set to -1,
then ndo_init is invoked and then iflink is assigned to a
proper value if and only if it still was -1.

Assigning 0 to iflink in ndo_init is therefore first
preventing netdev_register() to correctly assign it a proper
value and then breaking iflink at all since 0 has not
correct meaning.

Fix this by removing the iflink assignment in
__gre_tunnel_init().

Introduced by c544193214
("GRE: Refactor GRE tunneling code.")

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-30 17:28:33 -04:00
John W. Linville
9a574cd67a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	net/mac80211/sta_info.c
	net/wireless/core.h
2013-03-29 16:41:36 -04:00
Eric Dumazet
00cfec3748 net: add a synchronize_net() in netdev_rx_handler_unregister()
commit 35d48903e9 (bonding: fix rx_handler locking) added a race
in bonding driver, reported by Steven Rostedt who did a very good
diagnosis :

<quoting Steven>

I'm currently debugging a crash in an old 3.0-rt kernel that one of our
customers is seeing. The bug happens with a stress test that loads and
unloads the bonding module in a loop (I don't know all the details as
I'm not the one that is directly interacting with the customer). But the
bug looks to be something that may still be present and possibly present
in mainline too. It will just be much harder to trigger it in mainline.

In -rt, interrupts are threads, and can schedule in and out just like
any other thread. Note, mainline now supports interrupt threads so this
may be easily reproducible in mainline as well. I don't have the ability
to tell the customer to try mainline or other kernels, so my hands are
somewhat tied to what I can do.

But according to a core dump, I tracked down that the eth irq thread
crashed in bond_handle_frame() here:

        slave = bond_slave_get_rcu(skb->dev);
        bond = slave->bond; <--- BUG

the slave returned was NULL and accessing slave->bond caused a NULL
pointer dereference.

Looking at the code that unregisters the handler:

void netdev_rx_handler_unregister(struct net_device *dev)
{

        ASSERT_RTNL();
        RCU_INIT_POINTER(dev->rx_handler, NULL);
        RCU_INIT_POINTER(dev->rx_handler_data, NULL);
}

Which is basically:
        dev->rx_handler = NULL;
        dev->rx_handler_data = NULL;

And looking at __netif_receive_skb() we have:

        rx_handler = rcu_dereference(skb->dev->rx_handler);
        if (rx_handler) {
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                        pt_prev = NULL;
                }
                switch (rx_handler(&skb)) {

My question to all of you is, what stops this interrupt from happening
while the bonding module is unloading?  What happens if the interrupt
triggers and we have this:

        CPU0                    CPU1
        ----                    ----
  rx_handler = skb->dev->rx_handler

                        netdev_rx_handler_unregister() {
                           dev->rx_handler = NULL;
                           dev->rx_handler_data = NULL;

  rx_handler()
   bond_handle_frame() {
    slave = skb->dev->rx_handler;
    bond = slave->bond; <-- NULL pointer dereference!!!

What protection am I missing in the bond release handler that would
prevent the above from happening?

</quoting Steven>

We can fix bug this in two ways. First is adding a test in
bond_handle_frame() and others to check if rx_handler_data is NULL.

A second way is adding a synchronize_net() in
netdev_rx_handler_unregister() to make sure that a rcu protected reader
has the guarantee to see a non NULL rx_handler_data.

The second way is better as it avoids an extra test in fast path.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:38:15 -04:00
Vijay Subramanian
cd68ddd4c2 net: fq_codel: Fix off-by-one error
Currently, we hold a max of sch->limit -1 number of packets instead of
sch->limit packets. Fix this off-by-one error.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:32:23 -04:00
John Fastabend
91f3e7b174 net: rtnetlink: fdb dflt dump must set idx used for cb->arg[0]
In rtnl_fdb_dump() when the fdb_dump ndo op is not populated
we never set the idx value so that cb->arg[0] is always 0.
Resulting in a endless loop of messages.

Introduced with this commit,

commit 090096bf3d
Author: Vlad Yasevich <vyasevic@redhat.com>
Date:   Wed Mar 6 15:39:42 2013 +0000

    net: generic fdb support for drivers without ndo_fdb_<op>

CC: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:26:27 -04:00
Shmulik Ladkani
a561cf7edf net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb'
'nf_reset' is called just prior calling 'netif_rx'.
No need to call it twice.

Reported-by: Igor Michailov <rgohita@gmail.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:25:28 -04:00
Pravin B Shelar
54a5d38289 ip_tunnel: Fix off-by-one error in forming dev name.
As Ben pointed out following patch fixes bug in checking device
name length limits while forming tunnel device name.

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:24:28 -04:00
Li RongQing
278150321a net: simplify the getting percpu of flow_cache
replace per_cpu with per_cpu_ptr to save conversion between address and pointer

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:14:46 -04:00
Li RongQing
50eab0503a net: fix the use of this_cpu_ptr
flush_tasklet is not percpu var, and percpu is percpu var, and
	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
is not equal to
	&this_cpu_ptr(info->cache->percpu)->flush_tasklet

1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:13:27 -04:00
Hannes Frederic Sowa
1c4a154e52 ipv6: don't accept node local multicast traffic from the wire
Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been
verified: http://www.rfc-editor.org/errata_search.php?eid=3480

We have to check for pkt_type and loopback flag because either the
packets are allowed to travel over the loopback interface (in which case
pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel
over a non-loopback interface back to us (in which case PACKET_TYPE is
PACKET_LOOPBACK and IFF_LOOPBACK flag is not set).

Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 14:57:33 -04:00
Hong zhi guo
10c9cbb10f netlink: fix the warning introduced by netlink API replacement
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 14:44:37 -04:00
Linus Torvalds
2c3de1c2d7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fixes from Eric W Biederman:
 "The bulk of the changes are fixing the worst consequences of the user
  namespace design oversight in not considering what happens when one
  namespace starts off as a clone of another namespace, as happens with
  the mount namespace.

  The rest of the changes are just plain bug fixes.

  Many thanks to Andy Lutomirski for pointing out many of these issues."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Restrict when proc and sysfs can be mounted
  ipc: Restrict mounting the mqueue filesystem
  vfs: Carefully propogate mounts across user namespaces
  vfs: Add a mount flag to lock read only bind mounts
  userns:  Don't allow creation if the user is chrooted
  yama:  Better permission check for ptraceme
  pid: Handle the exit of a multi-threaded init.
  scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
2013-03-28 13:43:46 -07:00
Hong zhi guo
c60ee67f45 bridge: remove unused variable ifm
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:41:19 -04:00
John W. Linville
630a216da6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-28 14:40:06 -04:00
David S. Miller
ea407f0b97 Included changes:
- A fix for the network coding component which has been added within the last
   pull request (so it is in linux-3.10). The problem has been spotted thanks to
   Fengguang Wu's automated daily checks on our tree.
 - Implementation of the RTNL API for virtual interface creation/deletion and slave
   manipulation
 - substitution of seq_printf with seq_puts when possible
 - minor cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJRUsb7AAoJEADl0hg6qKeOf/sQAJ4VRwpB5WLIjtLPQ1Dvd/qu
 thPAn4vpv02Is8H7NRULRN9aqUIzWrWtXkhBTAOh0EHq3mki6a+QT4TUQ4qswTPv
 BaVS70hoTtN1OdcntWTATCUuX2Y2HD/D3Cw2IZIG7X9oqTEpoTpnuQygMsG/c1+s
 MtNO57/WUKlwLv0pAu9IHh4qCj00eF5YHsPmsWzoWAvi/4aCgESQfXGy4Dm6HqLB
 E8XuVLOLtw7NSw0V7n//KtEE2llTWkfslm06dSsAmqEtzOfOCCuA4Z0asnMPJt7s
 4HknqNE0D0aR0sfS+W6G24gFQbIV8R4SYWnUfMFCvvT7Ds96RONdgkfWw39nM7XB
 y70kXUbfDBIeHiiV2uqbTZnbuAa61p8Y8jCJvWz0nujI1/QPnUA1bqEgC2AMTc1J
 I6rS8mhWC6Rg+ohkK7HSQfgiZ5xcWtjFuVuOhKr0cPwhaCI0yP5PCB/B3yxuJxwD
 egkL58fgPhgii3DOxInAqLS/aPT5vl49J91qC3+z2/xYS3zRxHsyIRkMwgfoI2Ab
 TofVmzKqDwENA+CnjCPQx2gn6w/QpK8g5xmW1GgbzuKddiCcQFKnezgpeSdnQrsM
 CP/8Tu6r3QSLlHebysMuhsCwyT1hv8Yt0PTDpBqtIXRpV5enTYa9xW3F+UTU65ZY
 p9fKQXJYDLLEhfcy/TJM
 =FoTZ
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- A fix for the network coding component which has been added within the last
  pull request (so it is in linux-3.10). The problem has been spotted thanks to
  Fengguang Wu's automated daily checks on our tree.
- Implementation of the RTNL API for virtual interface creation/deletion and slave
  manipulation
- substitution of seq_printf with seq_puts when possible
- minor cleanups

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:34:23 -04:00
Hong zhi guo
573ce260b3 net-next: replace obsolete NLMSG_* with type safe nlmsg_*
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:25:25 -04:00
Simon Horman
e5c5d22e8d net: add ETH_P_802_3_MIN
Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for
an 802.3 frame. Frames with a lower value in the ethernet type field
are Ethernet II.

Also update all the users of this value that David Miller and
I could find to use the new constant.

Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
should be >= not >.

As suggested by Jesse Gross.

Compile tested only.

Cc: David Miller <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Bart De Schuymer <bart.de.schuymer@pandora.be>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: linux-wireless@vger.kernel.org
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-media@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: dev@openvswitch.org
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 01:20:42 -04:00
David S. Miller
0fb031f036 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
1) Initialize the satype field in key_notify_policy_flush(),
   this was left uninitialized. From Nicolas Dichtel.

2) The sequence number difference for replay notifications
   was misscalculated on ESN sequence number wrap. We need
   a separate replay notify function for esn.

3) Fix an off by one in the esn replay notify function.
   From Mathias Krause.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 14:07:04 -04:00
Sergey Popovich
ea872d7712 sch: add missing u64 in psched_ratecfg_precompute()
It seems that commit

commit 292f1c7ff6
Author: Jiri Pirko <jiri@resnulli.us>
Date:   Tue Feb 12 00:12:03 2013 +0000

    sch: make htb_rate_cfg and functions around that generic

adds little regression.

Before:

# tc qdisc add dev eth0 root handle 1: htb default ffff
# tc class add dev eth0 classid 1:ffff htb rate 5Gbit
# tc -s class show dev eth0
class htb 1:ffff root prio 0 rate 5000Mbit ceil 5000Mbit burst 625b cburst
625b
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 31 ctokens: 31

After:

# tc qdisc add dev eth0 root handle 1: htb default ffff
# tc class add dev eth0 classid 1:ffff htb rate 5Gbit
# tc -s class show dev eth0
class htb 1:ffff root prio 0 rate 1544Mbit ceil 1544Mbit burst 625b cburst
625b
 Sent 5073 bytes 41 pkt (dropped 0, overlimits 0 requeues 0)
 rate 1976bit 2pps backlog 0b 0p requeues 0
 lended: 41 borrowed: 0 giants: 0
 tokens: 1802 ctokens: 1802

This probably due to lost u64 cast of rate parameter in
psched_ratecfg_precompute() (net/sched/sch_generic.c).

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 14:06:41 -04:00
Wei Yongjun
fcca143d69 rtnetlink: fix error return code in rtnl_link_fill()
Fix to return a negative error code from the error handling case
instead of 0(possible overwrite to 0 by ops->fill_xstats call),
as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 14:06:40 -04:00
David S. Miller
e2a553dbf1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/ipip.h

The changes made to ipip.h in 'net' were already included
in 'net-next' before that header was moved to another location.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 13:52:49 -04:00
Wei Yongjun
5389090b59 netfilter: nf_conntrack: fix error return code
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in function
nf_conntrack_standalone_init().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-27 18:31:17 +01:00
Jesper Dangaard Brouer
1b5ab0def4 net: use the frag lru_lock to protect netns_frags.nqueues update
Move the protection of netns_frags.nqueues updates under the LRU_lock,
instead of the write lock.  As they are located on the same cacheline,
and this is also needed when transitioning to use per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:33 -04:00
Jesper Dangaard Brouer
68399ac37e net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop
The LRU list is protected by its own lock, since commit 3ef0eb0db4
(net: frag, move LRU list maintenance outside of rwlock), and
no-longer by a read_lock.

This makes it possible, to remove the inet_frag_queue, which is about
to be "evicted", from the LRU list head.  This avoids the problem, of
several CPUs grabbing the same frag queue.

Note, cannot remove the inet_frag_lru_del() call in fq_unlink()
called by inet_frag_kill(), because inet_frag_kill() is also used in
other situations.  Thus, we use list_del_init() to allow this
double list_del to work.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:33 -04:00
Andy Shevchenko
5a048e3b59 net: core: let's use native isxdigit instead of custom
In kernel we have fast and pretty implementation of the isxdigit() function.
Let's use it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:32 -04:00
Jason Wang
40893fd0fd net: switch to use skb_probe_transport_header()
Switch to use the new help skb_probe_transport_header() to do the l4 header
probing for untrusted sources. For packets with partial csum, the header should
already been set by skb_partial_csum_set().

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:31 -04:00
Jason Wang
e5d5decaed net: core: let skb_partial_csum_set() set transport header
For untrusted packets with partial checksum, we need to set the transport header
for precise packet length estimation. We can just let skb_pratial_csum_set() to
do this to avoid extra call to skb_flow_dissect() and simplify the caller.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:31 -04:00
Antonio Quartulli
0c81465357 batman-adv: use seq_puts instead of seq_printf when the format is constant
As reported by checkpatch, seq_puts has to be preferred with
respect to seq_printf when the format is a constant string
(no va_args)

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-03-27 10:29:55 +01:00
Antonio Quartulli
9cb812c54e batman-adv: update Makefile copyright years
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:29:54 +01:00