struct ethtool_rxnfc was originally defined in 2.6.27 for the
ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data
fields. It was then extended in 2.6.30 to support various additional
commands. These commands should have been defined to use a new
structure, but it is too late to change that now.
Since user-space may still be using the old structure definition
for the ETHTOOL_{G,S}RXFH commands, and since they do not need the
additional fields, only copy the originally defined fields to and
from user-space.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
On a 32-bit machine, info.rule_cnt >= 0x40000000 leads to integer
overflow and the buffer may be smaller than needed. Since
ETHTOOL_GRXCLSRLALL is unprivileged, this can presumably be used for at
least denial of service.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Use "depends on" instead of "if" in Kconfig files.
Fixed CAIF debug flag, and removed unnecessary clean-* options.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
don't clone skb when skb isn't shared
When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need
to clone a new skb. As the skb will be freed after this function returns, we
can use it freely once we get a reference to it.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/net/sch_generic.h | 11 +++++++++--
net/sched/act_mirred.c | 6 +++---
2 files changed, 12 insertions(+), 5 deletions(-)
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC
allocations sometimes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use u64_stats_sync infrastructure to implement 64bit rx stats.
(tx stats are addressed later)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id())
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of an ambiguity in the for_each_sta_info macro, it can
currently only be used if the third parameter is set to 'sta'.
Fix this by renaming the parameter to '_sta'.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The LOG targets print the entire MAC header as one long string, which is not
readable very well:
IN=eth0 OUT= MAC=00:15:f2:24:91:f8:00:1b:24:dc:61:e6:08:00 ...
Add an option to decode known header formats (currently just ARPHRD_ETHER devices)
in their individual fields:
IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=0800 ...
IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=86dd ...
The option needs to be explicitly enabled by userspace to avoid breaking
existing parsers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Remove the comparison within the loop to print the macheader by prepending
the colon to all but the first printk.
Based on suggestion by Jan Engelhardt <jengelh@medozas.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Allows use of ECN when syncookies are in effect by encoding ecn_ok
into the syn-ack tcp timestamp.
While at it, remove a uneeded #ifdef CONFIG_SYN_COOKIES.
With CONFIG_SYN_COOKIES=nm want_cookie is ifdef'd to 0 and gcc
removes the "if (0)".
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
As pointed out by Fernando Gont there is no need to encode rcv_wscale
into the cookie.
We did not use the restored rcv_wscale anyway; it is recomputed
via tcp_select_initial_window().
Thus we can save 4 bits in the ts option space by removing rcv_wscale.
In case window scaling was not supported, we set the (invalid) wscale
value 0xf.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 9261e53701 (ipv6: making ip and icmp statistics per/namespace)
forgot to remove ipv6_statistics variable.
commit bc417d99bf (ipv6: remove stale MIB definitions) took care of
icmpv6_statistics & icmpv6msg_statistics
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.
Callers can use __alignof__(type) to provide correct
alignment.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is thanks to Andre Noll who reported the issue and helped testing.
The Syn-RTT sampled during the initial handshake currently only works for
the client sending the DCCP-Request. TFRC penalizes the absence of an RTT
sample with a very slow initial speed (1 packet per second), which delays
slow-start significantly, resulting in sluggish performance.
This patch mirrors the "Syn RTT" principle by adding a timestamp also onto
the DCCP-Response, producing an RTT sample when the (Data)Ack completing
the handshake arrives.
Also changed the documentation to 'TFRC' since Syn RTTs are also used by CCID-4.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes an unused 'sk' argument from several option-inserting functions.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Remove "pktgen: " from formats
Convert printks to pr_<level>
Added func_enter() for debugging
Moved version to end of string at module_init
Coalesced long formats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gcc is currenlty not in the ability to optimize the switch statement in
sk_run_filter() because of dense case labels. This patch replace the
OR'd labels with ordered sequenced case labels. The sk_chk_filter()
function is modified to patch/replace the original OPCODES in a
ordered but equivalent form. gcc is now in the ability to transform the
switch statement in sk_run_filter into a jump table of complexity O(1).
Until this patch gcc generates a sequence of conditional branches (O(n) of 567
byte .text segment size (arch x86_64):
7ff: 8b 06 mov (%rsi),%eax
801: 66 83 f8 35 cmp $0x35,%ax
805: 0f 84 d0 02 00 00 je adb <sk_run_filter+0x31d>
80b: 0f 87 07 01 00 00 ja 918 <sk_run_filter+0x15a>
811: 66 83 f8 15 cmp $0x15,%ax
815: 0f 84 c5 02 00 00 je ae0 <sk_run_filter+0x322>
81b: 77 73 ja 890 <sk_run_filter+0xd2>
81d: 66 83 f8 04 cmp $0x4,%ax
821: 0f 84 17 02 00 00 je a3e <sk_run_filter+0x280>
827: 77 29 ja 852 <sk_run_filter+0x94>
829: 66 83 f8 01 cmp $0x1,%ax
[...]
With the modification the compiler translate the switch statement into
the following jump table fragment:
7ff: 66 83 3e 2c cmpw $0x2c,(%rsi)
803: 0f 87 1f 02 00 00 ja a28 <sk_run_filter+0x26a>
809: 0f b7 06 movzwl (%rsi),%eax
80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
813: 44 89 e3 mov %r12d,%ebx
816: e9 43 03 00 00 jmpq b5e <sk_run_filter+0x3a0>
81b: 41 89 dc mov %ebx,%r12d
81e: e9 3b 03 00 00 jmpq b5e <sk_run_filter+0x3a0>
Furthermore, I reordered the instructions to reduce cache line misses by
order the most common instruction to the start.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The addition of TLLAO option created a kernel OOPS regression
for the case where neighbor advertisement is being sent via
proxy path. When using proxy, ipv6_get_ifaddr() returns NULL
causing the NULL dereference.
Change causing the bug was:
commit f7734fdf61
Author: Octavian Purdila <opurdila@ixiacom.com>
Date: Fri Oct 2 11:39:15 2009 +0000
make TLLAO option for NA packets configurable
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_NF_CT_ACCT has been deprecated for awhile and
was originally scheduled for removal by 2.6.29.
Removing support for this config option also stops
this deprecation warning message in the kernel log.
[ 61.669627] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 61.669850] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
[ 61.669852] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
[ 61.669853] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[Patrick: changed default value to 0]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Check at rule install time that CT accounting is enabled. Force it
to be enabled if not while also emitting a warning since this is not
the default state.
This is in preparation for deprecating CONFIG_NF_CT_ACCT upon which
CONFIG_NETFILTER_XT_MATCH_CONNBYTES depended being set.
Added 2 CT accounting support functions:
nf_ct_acct_enabled() - Get CT accounting state.
nf_ct_set_acct() - Enable/disable CT accountuing.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
commit ff6e2163f2 accidentally added a
regression on the bnep code. Fixing it.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
i've found that tcp_close() can be called for an already closed
socket, but still sends reset in this case (tcp_send_active_reset())
which seems to be incorrect. Moreover, a packet with reset is sent
with different source port as original port number has been already
cleared on socket. Besides that incrementing stat counter for
LINUX_MIB_TCPABORTONCLOSE also does not look correct in this case.
Initially this issue was found on 2.6.18-x RHEL5 kernel, but the same
seems to be true for the current mainstream kernel (checked on
2.6.35-rc3). Please, correct me if i missed something.
How that happens:
1) the server receives a packet for socket in TCP_CLOSE_WAIT state
that triggers a tcp_reset():
Call Trace:
<IRQ> [<ffffffff8025b9b9>] tcp_reset+0x12f/0x1e8
[<ffffffff80046125>] tcp_rcv_state_process+0x1c0/0xa08
[<ffffffff8003eb22>] tcp_v4_do_rcv+0x310/0x37a
[<ffffffff80028bea>] tcp_v4_rcv+0x74d/0xb43
[<ffffffff8024ef4c>] ip_local_deliver_finish+0x0/0x259
[<ffffffff80037131>] ip_local_deliver+0x200/0x2f4
[<ffffffff8003843c>] ip_rcv+0x64c/0x69f
[<ffffffff80021d89>] netif_receive_skb+0x4c4/0x4fa
[<ffffffff80032eca>] process_backlog+0x90/0xec
[<ffffffff8000cc50>] net_rx_action+0xbb/0x1f1
[<ffffffff80012d3a>] __do_softirq+0xf5/0x1ce
[<ffffffff8001147a>] handle_IRQ_event+0x56/0xb0
[<ffffffff8006334c>] call_softirq+0x1c/0x28
[<ffffffff80070476>] do_softirq+0x2c/0x85
[<ffffffff80070441>] do_IRQ+0x149/0x152
[<ffffffff80062665>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff80008a2e>] __handle_mm_fault+0x6cd/0x1303
[<ffffffff80008903>] __handle_mm_fault+0x5a2/0x1303
[<ffffffff80033a9d>] cache_free_debugcheck+0x21f/0x22e
[<ffffffff8006a263>] do_page_fault+0x49a/0x7dc
[<ffffffff80066487>] thread_return+0x89/0x174
[<ffffffff800c5aee>] audit_syscall_exit+0x341/0x35c
[<ffffffff80062e39>] error_exit+0x0/0x84
tcp_rcv_state_process()
... // (sk_state == TCP_CLOSE_WAIT here)
...
/* step 2: check RST bit */
if(th->rst) {
tcp_reset(sk);
goto discard;
}
...
---------------------------------
tcp_rcv_state_process
tcp_reset
tcp_done
tcp_set_state(sk, TCP_CLOSE);
inet_put_port
__inet_put_port
inet_sk(sk)->num = 0;
sk->sk_shutdown = SHUTDOWN_MASK;
2) After that the process (socket owner) tries to write something to
that socket and "inet_autobind" sets a _new_ (which differs from
the original!) port number for the socket:
Call Trace:
[<ffffffff80255a12>] inet_bind_hash+0x33/0x5f
[<ffffffff80257180>] inet_csk_get_port+0x216/0x268
[<ffffffff8026bcc9>] inet_autobind+0x22/0x8f
[<ffffffff80049140>] inet_sendmsg+0x27/0x57
[<ffffffff8003a9d9>] do_sock_write+0xae/0xea
[<ffffffff80226ac7>] sock_writev+0xdc/0xf6
[<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
[<ffffffff8001fb49>] __pollwait+0x0/0xdd
[<ffffffff8008d533>] default_wake_function+0x0/0xe
[<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
[<ffffffff800f0b49>] do_readv_writev+0x163/0x274
[<ffffffff80066538>] thread_return+0x13a/0x174
[<ffffffff800145d8>] tcp_poll+0x0/0x1c9
[<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
[<ffffffff800f0dd0>] sys_writev+0x49/0xe4
[<ffffffff800622dd>] tracesys+0xd5/0xe0
3) sendmsg fails at last with -EPIPE (=> 'write' returns -EPIPE in userspace):
F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3
Call Trace:
[<ffffffff80027557>] tcp_sendmsg+0xcb/0xe87
[<ffffffff80033300>] release_sock+0x10/0xae
[<ffffffff8016f20f>] vgacon_cursor+0x0/0x1a7
[<ffffffff8026bd32>] inet_autobind+0x8b/0x8f
[<ffffffff8003a9d9>] do_sock_write+0xae/0xea
[<ffffffff80226ac7>] sock_writev+0xdc/0xf6
[<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
[<ffffffff8001fb49>] __pollwait+0x0/0xdd
[<ffffffff8008d533>] default_wake_function+0x0/0xe
[<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
[<ffffffff800f0b49>] do_readv_writev+0x163/0x274
[<ffffffff80066538>] thread_return+0x13a/0x174
[<ffffffff800145d8>] tcp_poll+0x0/0x1c9
[<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
[<ffffffff800f0dd0>] sys_writev+0x49/0xe4
[<ffffffff800622dd>] tracesys+0xd5/0xe0
tcp_sendmsg()
...
/* Wait for a connection to finish. */
if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
int old_state = sk->sk_state;
if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) {
if (f_d && (err == -EPIPE)) {
printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, "
"sk_err=%d, sk_shutdown=%d\n",
sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state,
sk->sk_err, sk->sk_shutdown);
dump_stack();
}
goto out_err;
}
}
...
4) Then the process (socket owner) understands that it's time to close
that socket and does that (and thus triggers sending reset packet):
Call Trace:
...
[<ffffffff80032077>] dev_queue_xmit+0x343/0x3d6
[<ffffffff80034698>] ip_output+0x351/0x384
[<ffffffff80251ae9>] dst_output+0x0/0xe
[<ffffffff80036ec6>] ip_queue_xmit+0x567/0x5d2
[<ffffffff80095700>] vprintk+0x21/0x33
[<ffffffff800070f0>] check_poison_obj+0x2e/0x206
[<ffffffff80013587>] poison_obj+0x36/0x45
[<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
[<ffffffff80023481>] dbg_redzone1+0x1c/0x25
[<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
[<ffffffff8000ca94>] cache_alloc_debugcheck_after+0x189/0x1c8
[<ffffffff80023405>] tcp_transmit_skb+0x764/0x786
[<ffffffff8025df8a>] tcp_send_active_reset+0xf9/0x14d
[<ffffffff80258ff1>] tcp_close+0x39a/0x960
[<ffffffff8026be12>] inet_release+0x69/0x80
[<ffffffff80059b31>] sock_release+0x4f/0xcf
[<ffffffff80059d4c>] sock_close+0x2c/0x30
[<ffffffff800133c9>] __fput+0xac/0x197
[<ffffffff800252bc>] filp_close+0x59/0x61
[<ffffffff8001eff6>] sys_close+0x85/0xc7
[<ffffffff800622dd>] tracesys+0xd5/0xe0
So, in brief:
* a received packet for socket in TCP_CLOSE_WAIT state triggers
tcp_reset() which clears inet_sk(sk)->num and put socket into
TCP_CLOSE state
* an attempt to write to that socket forces inet_autobind() to get a
new port (but the write itself fails with -EPIPE)
* tcp_close() called for socket in TCP_CLOSE state sends an active
reset via socket with newly allocated port
This adds an additional check in tcp_close() for already closed
sockets. We do not want to send anything to closed sockets.
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove rtnl_unlock() which had no corresponding rtnl_lock().
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the bundle validation code to not assume having a valid policy.
When we have multiple transformations for a xfrm policy, the bundle
instance will be a chain of bundles with only the first one having
the policy reference. When policy_genid is bumped it will expire the
first bundle in the chain which is equivalent of expiring the whole
chain.
Reported-bisected-and-tested-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds transmit power setting type and transmit power level attributes
to NL80211_CMD_SET_WIPHY in order to facilitate adjusting of the transmit power
level of the device.
The added attributes allow selection of automatic, limited or fixed transmit
power level, with the level definable in signed mBm format.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In preparation for a TX power setting interface in the nl80211, change the
.set_tx_power function to use mBm units instead of dBm for greater accuracy and
smaller power levels.
Also, already in advance move the tx_power_setting enumeration to nl80211.
This change affects the .tx_set_power function prototype. As a result, the
corresponding changes are needed to modules using it. These are mac80211,
iwmc3200wifi and rndis_wlan.
Cc: Samuel Ortiz <samuel.ortiz@intel.com>
Cc: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Samuel Ortiz <samuel.ortiz@intel.com>
Acked-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
While mesh_rx_plink_frame holds sta->lock...
mesh_rx_plink_frame ->
mesh_plink_inc_estab_count ->
ieee80211_bss_info_change_notify
...but ieee80211_bss_info_change_notify is allowed to sleep. A driver
taking advantage of that allowance can cause a scheduling while
atomic bug. Similar paths exist for mesh_plink_dec_estab_count,
so work around those as well.
http://bugzilla.kernel.org/show_bug.cgi?id=16099
Also, correct a minor kerneldoc comment error (mismatched function names).
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: stable@kernel.org
net/mac80211/rc80211_minstrel_ht.c:440:46: warning: incorrect type in argument 2 (different signedness)
net/mac80211/rc80211_minstrel_ht.c:440:46: expected int *idx
net/mac80211/rc80211_minstrel_ht.c:440:46: got unsigned int *<noident>
net/mac80211/rc80211_minstrel_ht.c:446:46: warning: incorrect type in argument 2 (different signedness)
net/mac80211/rc80211_minstrel_ht.c:446:46: expected int *idx
net/mac80211/rc80211_minstrel_ht.c:446:46: got unsigned int *<noident>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
net/mac80211/rx.c:2059:39: warning: symbol 'mgmt' shadows an earlier one
net/mac80211/rx.c:1916:31: originally declared here
Signed-off-by: John W. Linville <linville@tuxdriver.com>
this patch is implementing IP_NODEFRAG option for IPv4 socket.
The reason is, there's no other way to send out the packet with user
customized header of the reassembly part.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use u64_stats_sync infrastructure to provide 64bit rx/tx
counters even on 32bit hosts.
It is safe to use a single u64_stats_sync for rx and tx,
because BH is disabled on both, and we use per_cpu data.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_needs_gso() is checked twice in the TX path once,
before submitting the skb to the qdisc and once after
it is dequeued from the qdisc just before calling
ndo_hard_start(). This opens a window for a user to
change the gso/tso or tx checksum settings that can
cause netif_needs_gso to be true in one check and false
in the other.
Specifically, changing TX checksum setting may cause
the warning in skb_gso_segment() to be triggered if
the checksum is calculated earlier.
This consolidates the netif_needs_gso() calls so that
the stack only checks if gso is needed in
dev_hard_start_xmit().
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Destination was spelled wrong in KConfig.
Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add header file to fix build error:
net/netfilter/xt_IDLETIMER.c:276: error: implicit declaration of function 'MKDEV'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Allow one-packet scheduling for UDP connections. When the fwmark-based or
normal virtual service is marked with '-o' or '--ops' options all
connections are created only to schedule one packet. Useful to schedule UDP
packets from same client port to different real servers. Recommended with
RR or WRR schedulers (the connections are not visible with ipvsadm -L).
Signed-off-by: Nick Chalk <nick@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
It has been reported that the new UFO software fallback path
fails under certain conditions with NFS. I tracked the problem
down to the generation of UFO packets that are smaller than the
MTU. The software fallback path simply discards these packets.
This patch fixes the problem by not generating such packets on
the UFO path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This mechanism introduced in this patch applies (at least) for hardware
designs using a single shared antenna for both WLAN and BT. In these designs,
the antenna must be toggled between WLAN and BT.
In those hardware, managing WLAN co-existence with Bluetooth requires WLAN
full power save whenever there is Bluetooth activity in order for WLAN to be
able to periodically relinquish the antenna to be used for BT. This is because
BT can only access the shared antenna when WLAN is idle or asleep.
Some hardware, for instance the wl1271, are able to indicate to the host
whenever there is BT traffic. In essence, the hardware will send an indication
to the host whenever there is, for example, SCO traffic or A2DP traffic, and
will send another indication when the traffic is over.
The hardware gets information of Bluetooth traffic via hardware co-existence
control lines - these lines are used to negotiate the shared antenna
ownership. The hardware will give the antenna to BT whenever WLAN is sleeping.
This patch adds the interface to mac80211 to facilitate temporarily disabling
of dynamic power save as per request of the WLAN driver. This interface will
immediately force WLAN to full powersave, hence allowing BT coexistence as
described above.
In these kind of shared antenna desings, when WLAN powersave is fully disabled,
Bluetooth will not work simultaneously with WLAN at all. This patch does not
address that problem. This interface will not change PSM state, so if PSM is
disabled it will remain so. Solving this problem requires knowledge about BT
state, and is best done in user-space.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix the following compile warning:
CC [M] net/mac80211/scan.o
net/mac80211/scan.c: In function 'ieee80211_request_internal_scan':
net/mac80211/scan.c:749:23: warning: comparison between 'enum nl80211_band' and 'enum ieee80211_band'
caused by the local variable band not being of the proper 'ieee80211_band' type.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Added new CAIF protocol type CAIFPROTO_DEBUG for accessing
CAIF debug on the ST Ericsson modems.
There are two debug servers on the modem, one for radio related
debug (CAIF_RADIO_DEBUG_SERVICE) and the other for
communication/application related debug (CAIF_COM_DEBUG_SERVICE).
The debug connection can contain trace debug printouts or
interactive debug used for debugging and test.
Debug connections can be of type STREAM or SEQPACKET.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously CAIF supported maximum transfer size of ~4050.
The transfer size is now calculated dynamically based on the
link layers mtu size.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
CAIF Remote File Manager may send or receive more than 4050 bytes.
Due to this The CAIF RFM service have to support segmentation.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Flow control is not used by all CAIF services.
The usage of flow control is now part of the gerneal
initialization function for CAIF Services.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, detection in hwsim and ath9k can
detect that two sw scans are in flight at the
same time, which isn't really true. It is
caused by a race condition, because the scan
complete callback is called too late, after
the lock has been dropped, so that a new scan
can be started before it is called.
It is also called too early semantically, as
it is currently called _after_ the return to
the operating channel -- it should be before
so that drivers know this is the operating
channel again.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
regulatory_init is only called by cfg80211_init which is in .init.text,
too.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211_exit is only used as module_exit function, so it can go to
.exit.text saving a few bytes when CONFIG_CFG80211=y.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It is common in end-node, non STP bridges to set forwarding
delay to zero; which causes the forwarding database cleanup
to run every clock tick. Change to run only as soon as needed
or at next ageing timer interval which ever is sooner.
Use round_jiffies_up macro rather than attempting round up
by changing value.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2.6.34 introduced 'conntrack zones' to deal with cases where packets
from multiple identical networks are handled by conntrack/NAT. Packets
are looped through veth devices, during which they are NATed to private
addresses, after which they can continue normally through the stack
and possibly have NAT rules applied a second time.
This works well, but is needlessly complicated for cases where only
a single SNAT/DNAT mapping needs to be applied to these packets. In that
case, all that needs to be done is to assign each network to a seperate
zone and perform NAT as usual. However this doesn't work for packets
destined for the machine performing NAT itself since its corrently not
possible to configure SNAT mappings for the LOCAL_IN chain.
This patch adds a new INPUT chain to the NAT table and changes the
targets performing SNAT to be usable in that chain.
Example usage with two identical networks (192.168.0.0/24) on eth0/eth1:
iptables -t raw -A PREROUTING -i eth0 -j CT --zone 1
iptables -t raw -A PREROUTING -i eth0 -j MARK --set-mark 1
iptables -t raw -A PREROUTING -i eth1 -j CT --zone 2
iptabels -t raw -A PREROUTING -i eth1 -j MARK --set-mark 2
iptables -t nat -A INPUT -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A POSTROUTING -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A INPUT -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t nat -A POSTROUTING -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t raw -A PREROUTING -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A OUTPUT -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A PREROUTING -d 10.0.1.0/24 -j CT --zone 2
iptables -t raw -A OUTPUT -d 10.0.1.0/24 -j CT --zone 2
iptables -t nat -A PREROUTING -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A PREROUTING -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
Signed-off-by: Patrick McHardy <kaber@trash.net>
Remove the restriction that only allows connecting to a unix domain
socket identified by unix path that is in the same network namespace.
Crossing network namespaces is always tricky and we did not support
this at first, because of a strict policy of don't mix the namespaces.
Later after Pavel proposed this we did not support this because no one
had performed the audit to make certain using unix domain sockets
across namespaces is safe.
What fundamentally makes connecting to af_unix sockets in other
namespaces is safe is that you have to have the proper permissions on
the unix domain socket inode that lives in the filesystem. If you
want strict isolation you just don't create inodes where unfriendlys
can get at them, or with permissions that allow unfriendlys to open
them. All nicely handled for us by the mount namespace and other
standard file system facilities.
I looked through unix domain sockets and they are a very controlled
environment so none of the work that goes on in dev_forward_skb to
make crossing namespaces safe appears needed, we are not loosing
controll of the skb and so do not need to set up the skb to look like
it is comming in fresh from the outside world. Further the fields in
struct unix_skb_parms should not have any problems crossing network
namespaces.
Now that we handle SCM_CREDENTIALS in a way that gives useable values
across namespaces. There does not appear to be any operational
problems with encouraging the use of unix domain sockets across
containers either.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In unix_skb_parms store pointers to struct pid and struct cred instead
of raw uid, gid, and pid values, then translate the credentials on
reception into values that are meaningful in the receiving processes
namespaces.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start capturing not only the userspace pid, uid and gid values of the
sending process but also the struct pid and struct cred of the sending
process as well.
This is in preparation for properly supporting SCM_CREDENTIALS for
sockets that have different uid and/or pid namespaces at the different
ends.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
scm_send occasionally allocates state in the scm_cookie, so I have
modified netlink_sendmsg to guarantee that when scm_send succeeds
scm_destory will be called to free that state.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use struct pid and struct cred to store the peer credentials on struct
sock. This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.
This removes nasty surprises when using SO_PEERCRED on socket
connetions where the processes on either side are in different pid and
user namespaces.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To keep the coming code clear and to allow both the sock
code and the scm code to share the logic introduce a
fuction to translate from struct cred to struct ucred.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
https://bugzilla.kernel.org/show_bug.cgi?id=16183
The sch_teql module, which can be used to load balance over a set of
underlying interfaces, stopped working after 2.6.30 and has been
broken in all kernels since then for any underlying interface which
requires the addition of link level headers.
The problem is that the transmit routine relies on being able to
access the destination address in the skb in order to do address
resolution once it has decided which underlying interface it is going
to transmit through.
In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by
default for all interfaces, which causes the destination address to be
released before the transmit routine for the interface is called.
The solution is to clear that flag for teql interfaces.
Signed-off-by: Tom Hughes <tom@compton.nu>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discard the ACK if we find options that do not match current sysctl
settings.
Previously it was possible to create a connection with sack, wscale,
etc. enabled even if the feature was disabled via sysctl.
Also remove an unneeded call to tcp_sack_reset() in
cookie_check_timestamp: Both call sites (cookie_v4_check,
cookie_v6_check) zero "struct tcp_options_received", hand it to
tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
and then call cookie_check_timestamp().
Even if num_sacks/dsacks were changed, the structure is allocated on
the stack and after cookie_check_timestamp returns only a few selected
members are copied to the inet_request_sock.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
regression introduced by b8d92c9c14
In function ‘ieee80211_work_rx_queued_mgmt’:
warning: ‘rma’ may be used uninitialized in this function
this re-adds default value WORK_ACT_NONE back to rma
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Addition of rcu_head to struct inet_peer added 16bytes on 64bit arches.
Thats a bit unfortunate, since old size was exactly 64 bytes.
This can be solved, using an union between this rcu_head an four fields,
that are normally used only when a refcount is taken on inet_peer.
rcu_head is used only when refcnt=-1, right before structure freeing.
Add a inet_peer_refcheck() function to check this assertion for a while.
We can bring back SLAB_HWCACHE_ALIGN qualifier in kmem cache creation.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commit aa1039e73c (inetpeer: RCU conversion)
Unused inet_peer entries have a null refcnt.
Using atomic_inc_not_zero() in rcu lookups is not going to work for
them, and slow path is taken.
Fix this using -1 marker instead of 0 for deleted entries.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The version of br_netpoll_send_skb used when netpoll is off is
missing a const thus causing a warning.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge multicast patches introduced an OOM crash in the forward
path, when deliver_clone fails to clone the skb.
Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Third param (work) is unused, remove it.
Remove __inline__ and inline qualifiers.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of doing one atomic operation per frag, we can factorize them.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When syncookies are in effect, req->iif is left uninitialized.
In case of e.g. link-local addresses the route lookup then fails
and no syn-ack is sent.
Rearrange things so ->iif is also initialized in the syncookie case.
want_cookie can only be true when the isn was zero, thus move the want_cookie
check into the "!isn" branch.
Cc: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
inetpeer currently uses an AVL tree protected by an rwlock.
It's possible to make most lookups use RCU
1) Add a struct rcu_head to struct inet_peer
2) add a lookup_rcu_bh() helper to perform lockless and opportunistic
lookup. This is a normal function, not a macro like lookup().
3) Add a limit to number of links followed by lookup_rcu_bh(). This is
needed in case we fall in a loop.
4) add an smp_wmb() in link_to_pool() right before node insert.
5) make unlink_from_pool() use atomic_cmpxchg() to make sure it can take
last reference to an inet_peer, since lockless readers could increase
refcount, even while we hold peers.lock.
6) Delay struct inet_peer freeing after rcu grace period so that
lookup_rcu_bh() cannot crash.
7) inet_getpeer() first attempts lockless lookup.
Note this lookup can fail even if target is in AVL tree, but a
concurrent writer can let tree in a non correct form.
If this attemps fails, lock is taken a regular lookup is performed
again.
8) convert peers.lock from rwlock to a spinlock
9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 ->
128 bytes)
In a future patch, this is probably possible to revert this part, if rcu
field is put in an union to share space with rid, ip_id_count, tcp_ts &
tcp_ts_stamp. These fields being manipulated only with refcnt > 0.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When management frame protection (IEEE 802.11w) is used, we must use a
separate counter for tracking received CCMP packet number for the
management frames. The previously used NUM_RX_DATA_QUEUESth queue was
shared with data frames when QoS was not used and that can cause
problems in detecting replays incorrectly for robust management frames.
Add a new counter just for robust management frames to avoid this issue.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When management frame protection (IEEE 802.11w) is used,
Deauthentication frame needs to be protected when the pairwise key is
configured. mac80211 was removing the station entry (and its keys)
before actually sending out the Deauthentication frame. Fix this by
reordering the code to send the frame before the station entry gets
removed. This matches an earlier change that handled the Disassociation
frame processing, but missed Deauthentication frames.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The ps-qos latency handling is broken. It uses predetermined latency values
to select specific dynamic PS timeouts. With common AP configurations, these
values overlap with beacon interval and are therefore essentially useless
(for network latencies less than the beacon interval, PSM is disabled.)
This patch remedies the problem by replacing the predetermined network latency
values with one high value (1900ms) which is used to go trigger full psm. For
backwards compatibility, the value 2000ms is still mapped to a dynamic ps
timeout of 100ms.
Currently also the mac80211 internal value for storing user space configured
dynamic PSM values is incorrectly in the driver visible ieee80211_conf struct.
Move it to the ieee80211_local struct.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add possibility to register rx_handler data pointer along with a rx_handler.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are multiple problems with the newly added netpoll support:
1) Use-after-free on each netpoll packet.
2) Invoking unsafe code on netpoll/IRQ path.
3) Breaks when netpoll is enabled on the underlying device.
This patch fixes all of these problems. In particular, we now
allocate proper netpoll structures for each underlying device.
We only allow netpoll to be enabled on the bridge when all the
devices underneath it support netpoll. Once it is enabled, we
do not allow non-netpoll devices to join the bridge (until netpoll
is disabled again).
This allows us to do away with the npinfo juggling that caused
problem number 1.
Incidentally this patch fixes number 2 by bypassing unsafe code
such as multicast snooping and netfilter.
Reported-by: Qianfeng Zhang <frzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the functions __netpoll_setup/__netpoll_cleanup
which is designed to be called recursively through ndo_netpoll_seutp.
They must be called with RTNL held, and the caller must initialise
np->dev and ensure that it has a valid reference count.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ndo_netpoll_setup as the initialisation primitive
to complement ndo_netpoll_cleanup.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it stands, netpoll_setup and netpoll_cleanup have no locking
protection whatsoever. So chaos ensures if two entities try to
perform them on the same device.
This patch adds RTNL to the equation. The code has been rearranged so
that bits that do not need RTNL protection are now moved to the top of
netpoll_setup.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of RCU in netpoll is incorrect in a number of places:
1) The initial setting is lacking a write barrier.
2) The synchronize_rcu is in the wrong place.
3) Read barriers are missing.
4) Some places are even missing rcu_read_lock.
5) npinfo is zeroed after freeing.
This patch fixes those issues. As most users are in BH context,
this also converts the RCU usage to the BH variant.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that netpoll always zaps npinfo we no longer need to do it
in bridge.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have to NULL npinfo regardless of whether there is a
ndo_netpoll_cleanup, it makes sense to do this unconditionally
in netpoll_cleanup rather than having every driver do it by
themselves.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.
Timers are identified by labels and are created when a rule is set with a new
label. The rules also take a timeout value (in seconds) as an option. If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules get a hit.
One entry for each timer is created in sysfs. This attribute contains the
timer remaining for the timer to expire. The attributes are located under
the xt_idletimer class:
/sys/class/xt_idletimer/timers/<label>
When the timer expires, the target module sends a sysfs notification to the
userspace, which can then decide what to do (eg. disconnect to save power).
Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
- clusterip_lock becomes a spinlock
- lockless lookups
- kfree() deferred after RCU grace period
- rcu_barrier_bh() inserted in clusterip_tg_exit()
v2)
- As Patrick pointed out, we use atomic_inc_not_zero() in
clusterip_config_find_get().
- list_add_rcu() and list_del_rcu() variants are used.
- atomic_dec_and_lock() used in clusterip_config_entry_put()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Try to reduce cache line contentions in peer management, to reduce IP
defragmentation overhead.
- peer_fake_node is marked 'const' to make sure its not modified.
(tested with CONFIG_DEBUG_RODATA=y)
- Group variables in two structures to reduce number of dirtied cache
lines. One named "peers" for avl tree root, its number of entries, and
associated lock. (candidate for RCU conversion)
- A second one named "unused_peers" for unused list and its lock
- Add a !list_empty() test in unlink_from_unused() to avoid taking lock
when entry is not unused.
- Use atomic_dec_and_lock() in inet_putpeer() to avoid taking lock in
some cases.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU to avoid atomic ops on idev refcnt in ipv6_get_mtu()
and ip6_dst_hoplimit()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use __in6_dev_get() instead of in6_dev_get()/in6_dev_put()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a circular locking dependency when configuring the
hardware ARP filters on association, occurring when flushing the mac80211
workqueue. This is what happens:
[ 92.026800] =======================================================
[ 92.030507] [ INFO: possible circular locking dependency detected ]
[ 92.030507] 2.6.34-04781-g2b2c009 #85
[ 92.030507] -------------------------------------------------------
[ 92.030507] modprobe/5225 is trying to acquire lock:
[ 92.030507] ((wiphy_name(local->hw.wiphy))){+.+.+.}, at: [<ffffffff8105b5c0>] flush_workq
ueue+0x0/0xb0
[ 92.030507]
[ 92.030507] but task is already holding lock:
[ 92.030507] (rtnl_mutex){+.+.+.}, at: [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20
[ 92.030507]
[ 92.030507] which lock already depends on the new lock.
[ 92.030507]
[ 92.030507]
[ 92.030507] the existing dependency chain (in reverse order) is:
[ 92.030507]
[ 92.030507] -> #2 (rtnl_mutex){+.+.+.}:
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff81341754>] mutex_lock_nested+0x44/0x300
[ 92.030507] [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20
[ 92.030507] [<ffffffffa022d47c>] ieee80211_assoc_done+0x6c/0xe0 [mac80211]
[ 92.030507] [<ffffffffa022f2ad>] ieee80211_work_work+0x31d/0x1280 [mac80211]
[ 92.030507] -> #1 ((&local->work_work)){+.+.+.}:
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff8105a51a>] worker_thread+0x22a/0x370
[ 92.030507] [<ffffffff8105ecc6>] kthread+0x96/0xb0
[ 92.030507] [<ffffffff81003a94>] kernel_thread_helper+0x4/0x10
[ 92.030507]
[ 92.030507] -> #0 ((wiphy_name(local->hw.wiphy))){+.+.+.}:
[ 92.030507] [<ffffffff81075fdc>] __lock_acquire+0x1c0c/0x1d50
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff8105b60e>] flush_workqueue+0x4e/0xb0
[ 92.030507] [<ffffffffa023ff7b>] ieee80211_stop_device+0x2b/0xb0 [mac80211]
[ 92.030507] [<ffffffffa0231635>] ieee80211_stop+0x3e5/0x680 [mac80211]
The locking in this case is quite complex. Fix the problem by rewriting the
way the hardware ARP filter list is handled - i.e. make a copy of the address
list to the bss_conf struct, and provide that list to the hardware driver
when needed.
The current patch will enable filtering also in promiscuous mode. This may need
to be changed in the future.
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Remove BSS from cfg80211 BSS list if we are only member in IBSS when
leaving it.
Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add changed basic rates flag to bss_changed while joinig ibss network.
This patch is split from the patch containing support for setting basic
rates when creating ibss network. Original patch was posted by Johannes
Berg on the linux-wireless posting list.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>