Commit Graph

25154 Commits

Author SHA1 Message Date
Johannes Berg
00b14825ee Merge remote-tracking branch 'wireless-next/master' into mac80211-next 2012-09-06 17:05:28 +02:00
Johannes Berg
882a7c69d3 mac80211: disconnect if channel switch fails
Disconnect from the AP if channel switching in the
driver failed or if the new channel is unavailable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-06 17:05:24 +02:00
Johannes Berg
30dd3edf97 mac80211: don't hang on to sched_scan_ies
There's no need to keep a copy of the scheduled
scan IEs after the driver has been told, if it
requires a copy it must make one. Therefore, we
can move sched_scan_ies into the function.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-06 15:56:35 +02:00
Johannes Berg
944b9e375d Merge remote-tracking branch 'mac80211/master' into mac80211-next
Pull in mac80211.git to let the next patch apply
without conflicts, also resolving a hwsim conflict.

Conflicts:
	drivers/net/wireless/mac80211_hwsim.c

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-06 15:56:02 +02:00
Eric Dumazet
0626af3139 netfilter: take care of timewait sockets
Sami Farin reported crashes in xt_LOG because it assumes skb->sk is a
full blown socket.

Since (41063e9 ipv4: Early TCP socket demux), we can have skb->sk
pointing to a timewait socket.

Same fix is needed in nfnetlink_log.

Diagnosed-by: Florian Westphal <fw@strlen.de>
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-06 14:28:18 +02:00
Wei Yongjun
a4ed53466a mac80211: use list_move instead of list_del/list_add
Using list_move() instead of list_del() + list_add().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-06 11:55:47 +02:00
Mikulas Patocka
ed6fe9d614 Fix order of arguments to compat_put_time[spec|val]
Commit 644595f896 ("compat: Handle COMPAT_USE_64BIT_TIME in
net/socket.c") introduced a bug where the helper functions to take
either a 64-bit or compat time[spec|val] got the arguments in the wrong
order, passing the kernel stack pointer off as a user pointer (and vice
versa).

Because of the user address range check, that in turn then causes an
EFAULT due to the user pointer range checking failing for the kernel
address.  Incorrectly resuling in a failed system call for 32-bit
processes with a 64-bit kernel.

On odder architectures like HP-PA (with separate user/kernel address
spaces), it can be used read kernel memory.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-05 18:34:13 -07:00
Nicolas Dichtel
ef2c7d7b59 ipv6: fix handling of blackhole and prohibit routes
When adding a blackhole or a prohibit route, they were handling like classic
routes. Moreover, it was only possible to add this kind of routes by specifying
an interface.

Bug already reported here:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498

Before the patch:
  $ ip route add blackhole 2001::1/128
  RTNETLINK answers: No such device
  $ ip route add blackhole 2001::1/128 dev eth0
  $ ip -6 route | grep 2001
  2001::1 dev eth0  metric 1024

After:
  $ ip route add blackhole 2001::1/128
  $ ip -6 route | grep 2001
  blackhole 2001::1 dev lo  metric 1024  error -22

v2: wrong patch
v3: add a field fc_type in struct fib6_config to store RTN_* type

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05 17:49:28 -04:00
Eric Dumazet
23d3b8bfb8 net: qdisc busylock needs lockdep annotations
It seems we need to provide ability for stacked devices
to use specific lock_class_key for sch->busylock

We could instead default l2tpeth tx_queue_len to 0 (no qdisc), but
a user might use a qdisc anyway.

(So same fixes are probably needed on non LLTX stacked drivers)

Noticed while stressing L2TPV3 setup :

======================================================
 [ INFO: possible circular locking dependency detected ]
 3.6.0-rc3+ #788 Not tainted
 -------------------------------------------------------
 netperf/4660 is trying to acquire lock:
  (l2tpsock){+.-...}, at: [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]

 but task is already holding lock:
  (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(&sch->busylock)->rlock){+.-...}:
        [<ffffffff810a5df0>] lock_acquire+0x90/0x200
        [<ffffffff817499fc>] _raw_spin_lock_irqsave+0x4c/0x60
        [<ffffffff81074872>] __wake_up+0x32/0x70
        [<ffffffff8136d39e>] tty_wakeup+0x3e/0x80
        [<ffffffff81378fb3>] pty_write+0x73/0x80
        [<ffffffff8136cb4c>] tty_put_char+0x3c/0x40
        [<ffffffff813722b2>] process_echoes+0x142/0x330
        [<ffffffff813742ab>] n_tty_receive_buf+0x8fb/0x1230
        [<ffffffff813777b2>] flush_to_ldisc+0x142/0x1c0
        [<ffffffff81062818>] process_one_work+0x198/0x760
        [<ffffffff81063236>] worker_thread+0x186/0x4b0
        [<ffffffff810694d3>] kthread+0x93/0xa0
        [<ffffffff81753e24>] kernel_thread_helper+0x4/0x10

 -> #0 (l2tpsock){+.-...}:
        [<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10
        [<ffffffff810a5df0>] lock_acquire+0x90/0x200
        [<ffffffff817498c1>] _raw_spin_lock+0x41/0x50
        [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
        [<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
        [<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70
        [<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290
        [<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00
        [<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890
        [<ffffffff815db019>] ip_output+0x59/0xf0
        [<ffffffff815da36d>] ip_local_out+0x2d/0xa0
        [<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680
        [<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60
        [<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30
        [<ffffffff815f5300>] tcp_push_one+0x30/0x40
        [<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040
        [<ffffffff81614495>] inet_sendmsg+0x125/0x230
        [<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0
        [<ffffffff81579ece>] sys_sendto+0xfe/0x130
        [<ffffffff81752c92>] system_call_fastpath+0x16/0x1b
  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&(&sch->busylock)->rlock);
                                lock(l2tpsock);
                                lock(&(&sch->busylock)->rlock);
   lock(l2tpsock);

  *** DEADLOCK ***

 5 locks held by netperf/4660:
  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e581c>] tcp_sendmsg+0x2c/0x1040
  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff815da3e0>] ip_queue_xmit+0x0/0x680
  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9ac5>] ip_finish_output+0x135/0x890
  #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81595820>] dev_queue_xmit+0x0/0xe00
  #4:  (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00

 stack backtrace:
 Pid: 4660, comm: netperf Not tainted 3.6.0-rc3+ #788
 Call Trace:
  [<ffffffff8173dbf8>] print_circular_bug+0x1fb/0x20c
  [<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10
  [<ffffffff810a334b>] ? check_usage+0x9b/0x4d0
  [<ffffffff810a3f44>] ? __lock_acquire+0x2e4/0x1b10
  [<ffffffff810a5df0>] lock_acquire+0x90/0x200
  [<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
  [<ffffffff817498c1>] _raw_spin_lock+0x41/0x50
  [<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
  [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
  [<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
  [<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70
  [<ffffffff81594e0e>] ? dev_hard_start_xmit+0x5e/0xa70
  [<ffffffff81595961>] ? dev_queue_xmit+0x141/0xe00
  [<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290
  [<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00
  [<ffffffff81595820>] ? dev_hard_start_xmit+0xa70/0xa70
  [<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890
  [<ffffffff815d9ac5>] ? ip_finish_output+0x135/0x890
  [<ffffffff815db019>] ip_output+0x59/0xf0
  [<ffffffff815da36d>] ip_local_out+0x2d/0xa0
  [<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680
  [<ffffffff815da3e0>] ? ip_local_out+0xa0/0xa0
  [<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60
  [<ffffffff815fa25e>] ? tcp_md5_do_lookup+0x18e/0x1a0
  [<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30
  [<ffffffff815f5300>] tcp_push_one+0x30/0x40
  [<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040
  [<ffffffff81614495>] inet_sendmsg+0x125/0x230
  [<ffffffff81614370>] ? inet_create+0x6b0/0x6b0
  [<ffffffff8157e6e2>] ? sock_update_classid+0xc2/0x3b0
  [<ffffffff8157e750>] ? sock_update_classid+0x130/0x3b0
  [<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0
  [<ffffffff81162579>] ? fget_light+0x3f9/0x4f0
  [<ffffffff81579ece>] sys_sendto+0xfe/0x130
  [<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff8174a0b0>] ? _raw_spin_unlock_irq+0x30/0x50
  [<ffffffff810757e3>] ? finish_task_switch+0x83/0xf0
  [<ffffffff810757a6>] ? finish_task_switch+0x46/0xf0
  [<ffffffff81752cb7>] ? sysret_check+0x1b/0x56
  [<ffffffff81752c92>] system_call_fastpath+0x16/0x1b

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05 17:49:27 -04:00
Stephen Rothwell
2c96932229 netfilter: ipv6: using csum_ipv6_magic requires net/ip6_checksum.h
Fixes this build error:

net/ipv6/netfilter/nf_nat_l3proto_ipv6.c: In function 'nf_nat_ipv6_csum_recalc':
net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:144:4: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05 17:46:06 -04:00
Nikolay Aleksandrov
c6c13965f4 net: add unknown state to sysfs NIC duplex export
Currently when the NIC duplex state is DUPLEX_UNKNOWN it is exported as
full through sysfs, this patch adds support for DUPLEX_UNKNOWN. It is
handled the same way as in ethtool.

 Signed-off-by: Nikolay Aleksandrov <naleksan@redhat.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05 17:40:07 -04:00
Julian Anastasov
d013ef2aba tcp: fix possible socket refcount problem for ipv6
commit 144d56e910
("tcp: fix possible socket refcount problem") is missing
the IPv6 part. As tcp_release_cb is shared by both protocols
we should hold sock reference for the TCP_MTU_REDUCED_DEFERRED
bit.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05 17:16:25 -04:00
Huang Shijie
f21ec3d2d4 serial: add a new helper function
In most of the time, the driver needs to check if the cts flow control
is enabled. But now, the driver checks the ASYNC_CTS_FLOW flag manually,
which is not a grace way. So add a new wraper function to make the code
tidy and clean.

Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-05 13:28:35 -07:00
Julian Anastasov
d23ff70164 tcp: add generic netlink support for tcp_metrics
Add support for genl "tcp_metrics". No locking
is changed, only that now we can unlink and delete
entries after grace period. We implement get/del for
single entry and dump to support show/flush filtering
in user space. Del without address attribute causes
flush for all addresses, sadly under genl_mutex.

v2:
- remove rcu_assign_pointer as suggested by Eric Dumazet,
it is not needed because there are no other writes under lock
- move the flushing code in tcp_metrics_flush_all

v3:
- remove synchronize_rcu on flush as suggested by Eric Dumazet

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05 15:15:02 -04:00
John W. Linville
785a7de9ee Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-09-05 14:48:15 -04:00
John W. Linville
518eefe1b6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-09-05 14:46:30 -04:00
Pablo Neira Ayuso
00545bec94 netfilter: fix crash during boot if NAT has been compiled built-in
(c7232c9 netfilter: add protocol independent NAT core) introduced a
problem that leads to crashing during boot due to NULL pointer
dereference. It seems that xt_nat calls xt_register_target() before
xt_init():

net/netfilter/x_tables.c:static struct xt_af *xt; is NULL and we crash on
xt_register_target(struct xt_target *target)
{
        u_int8_t af = target->family;
        int ret;

        ret = mutex_lock_interruptible(&xt[af].mutex);
...

Fix this by changing the linking order, to make sure that x_tables
comes before xt_nat.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-05 18:35:51 +02:00
Hila Gonen
768be59f30 cfg80211: fix indentation
checkpatch pointed out an issue, fix it.

Signed-off-by: Hila Gonen <hila.gonen@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05 16:54:05 +02:00
Arend van Spriel
e5f5b2fb07 wext: include wireless event id when it has a size problem
The wext code checks is the event data is within size limits.
When this check fails a message is logged with violating size.
This patch adds the event id to put us on the right track for
resolving that violation.

Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05 16:12:44 +02:00
LEO Airwarosu Yoichi Shinoda
7ce8c7a343 mac80211: Various small fixes for cfg.c: mpath_set_pinfo()
Various small fixes for net/mac80211/cfg.c:mpath_set_pinfo():
Initialize *pinfo before filling members in, handle MESH_PATH_RESOLVED
correctly, and remove bogus assignment; result in correct display
of FLAGS values and meaningful EXPTIME for expired paths in iw utility.

Signed-off-by: Yoichi Shinoda <shinoda@jaist.ac.jp>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05 16:06:05 +02:00
Johannes Berg
00ea6deb0c mac80211: don't use kerneldoc for ieee80211_add_rx_radiotap_header
Doing so creates warnings, but the function is internal and
not part of the 802.11 docbooks, so it from kerneldoc.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05 15:55:27 +02:00
Wei Yongjun
00a9ac4c01 cfg80211: use list_move_tail instead of list_del/list_add_tail
Using list_move_tail() instead of list_del() + list_add_tail().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05 15:39:37 +02:00
Eric Dumazet
c0cc88a762 l2tp: fix a typo in l2tp_eth_dev_recv()
While investigating l2tp bug, I hit a bug in eth_type_trans(),
because not enough bytes were pulled in skb head.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 15:54:55 -04:00
Masatake YAMATO
600e177920 net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries
lsof reports some of socket descriptors as "can't identify protocol" like:

    [yamato@localhost]/tmp% sudo lsof | grep dbus | grep iden
    dbus-daem   652          dbus    6u     sock ... 17812 can't identify protocol
    dbus-daem   652          dbus   34u     sock ... 24689 can't identify protocol
    dbus-daem   652          dbus   42u     sock ... 24739 can't identify protocol
    dbus-daem   652          dbus   48u     sock ... 22329 can't identify protocol
    ...

lsof cannot resolve the protocol used in a socket because procfs
doesn't provide the map between inode number on sockfs and protocol
type of the socket.

For improving the situation this patch adds an extended attribute named
'system.sockprotoname' in which the protocol name for
/proc/PID/fd/SOCKET is stored. So lsof can know the protocol for a
given /proc/PID/fd/SOCKET with getxattr system call.

A few weeks ago I submitted a patch for the same purpose. The patch
was introduced /proc/net/sockfs which enumerates inodes and protocols
of all sockets alive on a system. However, it was rejected because (1)
a global lock was needed, and (2) the layout of struct socket was
changed with the patch.

This patch doesn't use any global lock; and doesn't change the layout
of any structs.

In this patch, a protocol name is stored to dentry->d_name of sockfs
when new socket is associated with a file descriptor. Before this
patch dentry->d_name was not used; it was just filled with empty
string. lsof may use an extended attribute named
'system.sockprotoname' to retrieve the value of dentry->d_name.

It is nice if we can see the protocol name with ls -l
/proc/PID/fd. However, "socket:[#INODE]", the name format returned
from sockfs_dname() was already defined. To keep the compatibility
between kernel and user land, the extended attribute is used to
prepare the value of dentry->d_name.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 15:52:13 -04:00
David S. Miller
cefd81cfec Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch 2012-09-04 15:22:28 -04:00
David S. Miller
9d148e39d1 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch 2012-09-04 15:17:52 -04:00
David S. Miller
798b2cbf92 net: Add INET dependency on aes crypto for the sake of TCP fastopen.
Stephen Rothwell says:

====================
After merging the final tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:

net/built-in.o: In function `tcp_fastopen_ctx_free':
tcp_fastopen.c:(.text+0x5cc5c): undefined reference to `crypto_destroy_tfm'
net/built-in.o: In function `tcp_fastopen_reset_cipher':
(.text+0x5cccc): undefined reference to `crypto_alloc_base'
net/built-in.o: In function `tcp_fastopen_reset_cipher':
(.text+0x5cd6c): undefined reference to `crypto_destroy_tfm'

Presumably caused by commit 1046716368 ("tcp: TCP Fast Open Server -
header & support functions") from the net-next tree.  I assume that some
dependency on the CRYPTO infrastructure is missing.

I have reverted commit 1bed966cc3 ("Merge branch
'tcp_fastopen_server'") for today.
====================

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 14:20:14 -04:00
Wei Yongjun
54a2792423 sctp: use list_move_tail instead of list_del/list_add_tail
Using list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 14:16:13 -04:00
Steffen Klassert
3b59df46a4 xfrm: Workaround incompatibility of ESN and async crypto
ESN for esp is defined in RFC 4303. This RFC assumes that the
sequence number counters are always up to date. However,
this is not true if an async crypto algorithm is employed.

If the sequence number counters are not up to date on sequence
number check, we may incorrectly update the upper 32 bit of
the sequence number. This leads to a DOS.

We workaround this by comparing the upper sequence number,
(used for authentication) with the upper sequence number
computed after the async processing. We drop the packet
if these numbers are different.

To do this, we introduce a recheck function that does this
check in the ESN case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 14:09:45 -04:00
Eric Dumazet
37159ef2c1 l2tp: fix a lockdep splat
Fixes following lockdep splat :

[ 1614.734896] =============================================
[ 1614.734898] [ INFO: possible recursive locking detected ]
[ 1614.734901] 3.6.0-rc3+ #782 Not tainted
[ 1614.734903] ---------------------------------------------
[ 1614.734905] swapper/11/0 is trying to acquire lock:
[ 1614.734907]  (slock-AF_INET){+.-...}, at: [<ffffffffa0209d72>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.734920]
[ 1614.734920] but task is already holding lock:
[ 1614.734922]  (slock-AF_INET){+.-...}, at: [<ffffffff815fce23>] tcp_v4_err+0x163/0x6b0
[ 1614.734932]
[ 1614.734932] other info that might help us debug this:
[ 1614.734935]  Possible unsafe locking scenario:
[ 1614.734935]
[ 1614.734937]        CPU0
[ 1614.734938]        ----
[ 1614.734940]   lock(slock-AF_INET);
[ 1614.734943]   lock(slock-AF_INET);
[ 1614.734946]
[ 1614.734946]  *** DEADLOCK ***
[ 1614.734946]
[ 1614.734949]  May be due to missing lock nesting notation
[ 1614.734949]
[ 1614.734952] 7 locks held by swapper/11/0:
[ 1614.734954]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff81592801>] __netif_receive_skb+0x251/0xd00
[ 1614.734964]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff815d319c>] ip_local_deliver_finish+0x4c/0x4e0
[ 1614.734972]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff8160d116>] icmp_socket_deliver+0x46/0x230
[ 1614.734982]  #3:  (slock-AF_INET){+.-...}, at: [<ffffffff815fce23>] tcp_v4_err+0x163/0x6b0
[ 1614.734989]  #4:  (rcu_read_lock){.+.+..}, at: [<ffffffff815da240>] ip_queue_xmit+0x0/0x680
[ 1614.734997]  #5:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9925>] ip_finish_output+0x135/0x890
[ 1614.735004]  #6:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81595680>] dev_queue_xmit+0x0/0xe00
[ 1614.735012]
[ 1614.735012] stack backtrace:
[ 1614.735016] Pid: 0, comm: swapper/11 Not tainted 3.6.0-rc3+ #782
[ 1614.735018] Call Trace:
[ 1614.735020]  <IRQ>  [<ffffffff810a50ac>] __lock_acquire+0x144c/0x1b10
[ 1614.735033]  [<ffffffff810a334b>] ? check_usage+0x9b/0x4d0
[ 1614.735037]  [<ffffffff810a6762>] ? mark_held_locks+0x82/0x130
[ 1614.735042]  [<ffffffff810a5df0>] lock_acquire+0x90/0x200
[ 1614.735047]  [<ffffffffa0209d72>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735051]  [<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10
[ 1614.735060]  [<ffffffff81749b31>] _raw_spin_lock+0x41/0x50
[ 1614.735065]  [<ffffffffa0209d72>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735069]  [<ffffffffa0209d72>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735075]  [<ffffffffa014f7f2>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
[ 1614.735079]  [<ffffffff81595112>] dev_hard_start_xmit+0x502/0xa70
[ 1614.735083]  [<ffffffff81594c6e>] ? dev_hard_start_xmit+0x5e/0xa70
[ 1614.735087]  [<ffffffff815957c1>] ? dev_queue_xmit+0x141/0xe00
[ 1614.735093]  [<ffffffff815b622e>] sch_direct_xmit+0xfe/0x290
[ 1614.735098]  [<ffffffff81595865>] dev_queue_xmit+0x1e5/0xe00
[ 1614.735102]  [<ffffffff81595680>] ? dev_hard_start_xmit+0xa70/0xa70
[ 1614.735106]  [<ffffffff815b4daa>] ? eth_header+0x3a/0xf0
[ 1614.735111]  [<ffffffff8161d33e>] ? fib_get_table+0x2e/0x280
[ 1614.735117]  [<ffffffff8160a7e2>] arp_xmit+0x22/0x60
[ 1614.735121]  [<ffffffff8160a863>] arp_send+0x43/0x50
[ 1614.735125]  [<ffffffff8160b82f>] arp_solicit+0x18f/0x450
[ 1614.735132]  [<ffffffff8159d9da>] neigh_probe+0x4a/0x70
[ 1614.735137]  [<ffffffff815a191a>] __neigh_event_send+0xea/0x300
[ 1614.735141]  [<ffffffff815a1c93>] neigh_resolve_output+0x163/0x260
[ 1614.735146]  [<ffffffff815d9cf5>] ip_finish_output+0x505/0x890
[ 1614.735150]  [<ffffffff815d9925>] ? ip_finish_output+0x135/0x890
[ 1614.735154]  [<ffffffff815dae79>] ip_output+0x59/0xf0
[ 1614.735158]  [<ffffffff815da1cd>] ip_local_out+0x2d/0xa0
[ 1614.735162]  [<ffffffff815da403>] ip_queue_xmit+0x1c3/0x680
[ 1614.735165]  [<ffffffff815da240>] ? ip_local_out+0xa0/0xa0
[ 1614.735172]  [<ffffffff815f4402>] tcp_transmit_skb+0x402/0xa60
[ 1614.735177]  [<ffffffff815f5a11>] tcp_retransmit_skb+0x1a1/0x620
[ 1614.735181]  [<ffffffff815f7e93>] tcp_retransmit_timer+0x393/0x960
[ 1614.735185]  [<ffffffff815fce23>] ? tcp_v4_err+0x163/0x6b0
[ 1614.735189]  [<ffffffff815fd317>] tcp_v4_err+0x657/0x6b0
[ 1614.735194]  [<ffffffff8160d116>] ? icmp_socket_deliver+0x46/0x230
[ 1614.735199]  [<ffffffff8160d19e>] icmp_socket_deliver+0xce/0x230
[ 1614.735203]  [<ffffffff8160d116>] ? icmp_socket_deliver+0x46/0x230
[ 1614.735208]  [<ffffffff8160d464>] icmp_unreach+0xe4/0x2c0
[ 1614.735213]  [<ffffffff8160e520>] icmp_rcv+0x350/0x4a0
[ 1614.735217]  [<ffffffff815d3285>] ip_local_deliver_finish+0x135/0x4e0
[ 1614.735221]  [<ffffffff815d319c>] ? ip_local_deliver_finish+0x4c/0x4e0
[ 1614.735225]  [<ffffffff815d3ffa>] ip_local_deliver+0x4a/0x90
[ 1614.735229]  [<ffffffff815d37b7>] ip_rcv_finish+0x187/0x730
[ 1614.735233]  [<ffffffff815d425d>] ip_rcv+0x21d/0x300
[ 1614.735237]  [<ffffffff81592a1b>] __netif_receive_skb+0x46b/0xd00
[ 1614.735241]  [<ffffffff81592801>] ? __netif_receive_skb+0x251/0xd00
[ 1614.735245]  [<ffffffff81593368>] process_backlog+0xb8/0x180
[ 1614.735249]  [<ffffffff81593cf9>] net_rx_action+0x159/0x330
[ 1614.735257]  [<ffffffff810491f0>] __do_softirq+0xd0/0x3e0
[ 1614.735264]  [<ffffffff8109ed24>] ? tick_program_event+0x24/0x30
[ 1614.735270]  [<ffffffff8175419c>] call_softirq+0x1c/0x30
[ 1614.735278]  [<ffffffff8100425d>] do_softirq+0x8d/0xc0
[ 1614.735282]  [<ffffffff8104983e>] irq_exit+0xae/0xe0
[ 1614.735287]  [<ffffffff8175494e>] smp_apic_timer_interrupt+0x6e/0x99
[ 1614.735291]  [<ffffffff81753a1c>] apic_timer_interrupt+0x6c/0x80
[ 1614.735293]  <EOI>  [<ffffffff810a14ad>] ? trace_hardirqs_off+0xd/0x10
[ 1614.735306]  [<ffffffff81336f85>] ? intel_idle+0xf5/0x150
[ 1614.735310]  [<ffffffff81336f7e>] ? intel_idle+0xee/0x150
[ 1614.735317]  [<ffffffff814e6ea9>] cpuidle_enter+0x19/0x20
[ 1614.735321]  [<ffffffff814e7538>] cpuidle_idle_call+0xa8/0x630
[ 1614.735327]  [<ffffffff8100c1ba>] cpu_idle+0x8a/0xe0
[ 1614.735333]  [<ffffffff8173762e>] start_secondary+0x220/0x222

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 14:07:50 -04:00
Alan Cox
6cf5c95117 netrom: copy_datagram_iovec can fail
Check for an error from this and if so bail properly.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 12:57:35 -04:00
Wei Yongjun
b4e4f47e94 nl80211: fix possible memory leak nl80211_connect()
connkeys is malloced in nl80211_parse_connkeys() and should
be freed in the error handling case, otherwise it will cause
memory leak.

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-04 18:06:00 +02:00
Eliad Peller
3d2abdfdf1 mac80211: clear bssid on auth/assoc failure
ifmgd->bssid wasn't cleared properly in some
auth/assoc failure cases, causing mac80211 and
the low-level driver to go out of sync.

Clear ifmgd->bssid on failure, and notify the driver.

Cc: stable@kernel.org # 3.4+
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-04 17:14:19 +02:00
Ilan Peer
0ef24e528f mac80211: Do not check for valid hw_queues for P2P_DEVICE
A P2P Device interface does not have a netdev, and is not
expected to be used for transmitting data, so there is no
need to assign hw queues for it.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-04 14:16:41 +02:00
Pravin B Shelar
15eac2a742 openvswitch: Increase maximum number of datapath ports.
Use hash table to store ports of datapath. Allow 64K ports per switch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-09-03 19:20:49 -07:00
Jesse Gross
c303aa94cd openvswitch: Fix FLOW_BUFSIZE definition.
The vlan encapsulation fields in the maximum flow defintion were
never updated when the representation changed before upstreaming.
In theory this could cause a kernel panic when a maximum length
flow is used.  In practice this has never happened (to my knowledge)
because skb allocations are padded out to a cache line so you would
need the right combination of flow and packet being sent to userspace.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-09-03 19:06:27 -07:00
David S. Miller
1e9f0207d3 Merge branch 'master' of git://1984.lsi.us.es/nf-next 2012-09-03 20:26:45 -04:00
Eric Dumazet
b379135c40 fq_codel: dont reinit flow state
When fq_codel builds a new flow, it should not reset codel state.

Codel algo needs to get previous values (lastcount, drop_next) to get
proper behavior.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 14:36:50 -04:00
Yuchung Cheng
684bad1107 tcp: use PRR to reduce cwin in CWR state
Use proportional rate reduction (PRR) algorithm to reduce cwnd in CWR state,
in addition to Recovery state. Retire the current rate-halving in CWR.
When losses are detected via ACKs in CWR state, the sender enters Recovery
state but the cwnd reduction continues and does not restart.

Rename and refactor cwnd reduction functions since both CWR and Recovery
use the same algorithm:
tcp_init_cwnd_reduction() is new and initiates reduction state variables.
tcp_cwnd_reduction() is previously tcp_update_cwnd_in_recovery().
tcp_ends_cwnd_reduction() is previously  tcp_complete_cwr().

The rate halving functions and logic such as tcp_cwnd_down(), tcp_min_cwnd(),
and the cwnd moderation inside tcp_enter_cwr() are removed. The unused
parameter, flag, in tcp_cwnd_reduction() is also removed.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 14:34:02 -04:00
Yuchung Cheng
fb4d3d1df3 tcp: move tcp_update_cwnd_in_recovery
To prepare replacing rate halving with PRR algorithm in CWR state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 14:34:02 -04:00
Yuchung Cheng
09484d1f6e tcp: move tcp_enter_cwr()
To prepare replacing rate halving with PRR algorithm in CWR state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 14:34:02 -04:00
Thomas Graf
4c3a5bdae2 sctp: Don't charge for data in sndbuf again when transmitting packet
SCTP charges wmem_alloc via sctp_set_owner_w() in sctp_sendmsg() and via
skb_set_owner_w() in sctp_packet_transmit(). If a sender runs out of
sndbuf it will sleep in sctp_wait_for_sndbuf() and expects to be waken up
by __sctp_write_space().

Buffer space charged via sctp_set_owner_w() is released in sctp_wfree()
which calls __sctp_write_space() directly.

Buffer space charged via skb_set_owner_w() is released via sock_wfree()
which calls sk->sk_write_space() _if_ SOCK_USE_WRITE_QUEUE is not set.
sctp_endpoint_init() sets SOCK_USE_WRITE_QUEUE on all sockets.

Therefore if sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is
interrupted by a signal.

This could be fixed by clearing the SOCK_USE_WRITE_QUEUE flag but ...

Charging for the data twice does not make sense in the first place, it
leads to overcharging sndbuf by a factor 2. Therefore this patch only
charges a single byte in wmem_alloc when transmitting an SCTP packet to
ensure that the socket stays alive until the packet has been released.

This means that control chunks are no longer accounted for in wmem_alloc
which I believe is not a problem as skb->truesize will typically lead
to overcharging anyway and thus compensates for any control overhead.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 13:24:13 -04:00
Eric Dumazet
e812347ccf net: sock_edemux() should take care of timewait sockets
sock_edemux() can handle either a regular socket or a timewait socket

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 13:22:43 -04:00
Pablo Neira Ayuso
ace1fe1231 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling
of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
2012-09-03 15:34:51 +02:00
Jan Beulich
ce9f3f31ef netfilter: properly annotate ipv4_netfilter_{init,fini}()
Despite being just a few bytes of code, they should still have proper
annotations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:56:04 +02:00
Michael Wang
1c15b67709 netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_queue()
Since 'list_for_each_continue_rcu' has already been replaced by
'list_for_each_entry_continue_rcu', pass 'list_head' to nf_queue() as a
parameter can not benefit us any more.

This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of
nf_queue() and __nf_queue() to save code.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:52:54 +02:00
Michael Wang
2a6decfd8a netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_iterate()
Since 'list_for_each_continue_rcu' has already been replaced by
'list_for_each_entry_continue_rcu', pass 'list_head' to nf_iterate() as a
parameter can not benefit us any more.

This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of
nf_iterate() to save code.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:52:44 +02:00
Cong Wang
965505015b netfilter: remove xt_NOTRACK
It was scheduled to be removed for a long time.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netfilter@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:36:40 +02:00
Pablo Neira Ayuso
84b5ee939e netfilter: nf_conntrack: add nf_ct_timeout_lookup
This patch adds the new nf_ct_timeout_lookup function to encapsulate
the timeout policy attachment that is called in the nf_conntrack_in
path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:33:03 +02:00
Pablo Neira Ayuso
236df00561 netfilter: xt_CT: refactorize xt_ct_tg_check
This patch adds xt_ct_set_helper and xt_ct_set_timeout to reduce
the size of xt_ct_tg_check.

This aims to improve code mantainability by splitting xt_ct_tg_check
in smaller chunks.

Suggested by Eric Dumazet.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:32:48 +02:00
Pablo Neira Ayuso
6703aa74ad netfilter: xt_socket: fix compilation warnings with gcc 4.7
This patch fixes compilation warnings in xt_socket with gcc-4.7.

In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:265:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:265:9: note: ‘dport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:264:27: note: ‘saddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:264:19: note: ‘daddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_match.isra.4’:
include/net/netfilter/nf_tproxy_core.h:75:2: warning: ‘protocol’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:113:5: note: ‘protocol’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:45: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:112:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:106:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:112:9: note: ‘dport’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:15: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:111:16: note: ‘saddr’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:15: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:111:9: note: ‘daddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:268:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:268:9: note: ‘dport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:267:27: note: ‘saddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:267:19: note: ‘daddr’ was declared here

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:31:39 +02:00
Joe Stringer
39855b5ba9 openvswitch: Fix typo
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-09-02 12:18:25 -07:00
Linus Torvalds
0b1a34c992 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) NLA_PUT* --> nla_put_* conversion got one case wrong in
    nfnetlink_log, fix from Patrick McHardy.

 2) Missed error return check in ipw2100 driver, from Julia Lawall.

 3) PMTU updates in ipv4 were setting the expiry time incorrectly, fix
    from Eric Dumazet.

 4) SFC driver erroneously reversed src and dst when reporting filters
    via ethtool.

 5) Memory leak in CAN protocol and wrong setting of IRQF_SHARED in
    sja1000 can platform driver, from Alexey Khoroshilov and Sven
    Schmitt.

 6) Fix multicast traffic scaling regression in ipv4_dst_destroy, only
    take the lock when we really need to.  From Eric Dumazet.

 7) Fix non-root process spoofing in netlink, from Pablo Neira Ayuso.

 8) CWND reduction in TCP is done incorrectly during non-SACK recovery,
    fix from Yuchung Cheng.

 9) Revert netpoll change, and fix what was actually a driver specific
    problem.  From Amerigo Wang.  This should cure bootup hangs with
    netconsole some people reported.

10) Fix xen-netfront invoking __skb_fill_page_desc() with a NULL page
    pointer.  From Ian Campbell.

11) SIP NAT fix for expectiontation creation, from Pablo Neira Ayuso.

12) __ip_rt_update_pmtu() needs RCU locking, from Eric Dumazet.

13) Fix usbnet deadlock on resume, can't use GFP_KERNEL in this
    situation.  From Oliver Neukum.

14) The davinci ethernet driver triggers an OOPS on removal because it
    frees an MDIO object before unregistering it.  Fix from Bin Liu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  net: qmi_wwan: add several new Gobi devices
  fddi: 64 bit bug in smt_add_para()
  net: ethernet: fix kernel OOPS when remove davinci_mdio module
  net/xfrm/xfrm_state.c: fix error return code
  net: ipv6: fix error return code
  net: qmi_wwan: new device: Foxconn/Novatel E396
  usbnet: fix deadlock in resume
  cs89x0 : packet reception not working
  netfilter: nf_conntrack: fix racy timer handling with reliable events
  bnx2x: Correct the ndo_poll_controller call
  bnx2x: Move netif_napi_add to the open call
  ipv4: must use rcu protection while calling fib_lookup
  bnx2x: fix 57840_MF pci id
  net: ipv4: ipmr_expire_timer causes crash when removing net namespace
  e1000e: DoS while TSO enabled caused by link partner with small MSS
  l2tp: avoid to use synchronize_rcu in tunnel free function
  gianfar: fix default tx vlan offload feature flag
  netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
  xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX
  netfilter: nfnetlink_log: fix error return code in init path
  ...
2012-09-02 11:28:00 -07:00
Alan Ott
a2dc375e12 6lowpan: handle NETDEV_UNREGISTER event
Before, it was impossible to remove a wpan device which had lowpan
attached to it.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@tempietto.lan>
2012-09-01 22:48:02 -04:00
Alan Ott
a437d2744b 6lowpan: Make a copy of skb's delivered to 6lowpan
Since lowpan_process_data() modifies the skb (by calling skb_pull()), we
need our own copy so that it doesn't affect the data received by other
protcols (in this case, af_ieee802154).

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@tempietto.lan>
2012-09-01 22:48:01 -04:00
Anatol Pomozov
4907cb7b19 treewide: fix comment/printk/variable typos
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-01 10:33:05 -07:00
Jerry Chu
168a8f5805 tcp: TCP Fast Open Server - main code path
This patch adds the main processing path to complete the TFO server
patches.

A TFO request (i.e., SYN+data packet with a TFO cookie option) first
gets processed in tcp_v4_conn_request(). If it passes the various TFO
checks by tcp_fastopen_check(), a child socket will be created right
away to be accepted by applications, rather than waiting for the 3WHS
to finish.

In additon to the use of TFO cookie, a simple max_qlen based scheme
is put in place to fend off spoofed TFO attack.

When a valid ACK comes back to tcp_rcv_state_process(), it will cause
the state of the child socket to switch from either TCP_SYN_RECV to
TCP_ESTABLISHED, or TCP_FIN_WAIT1 to TCP_FIN_WAIT2. At this time
retransmission will resume for any unack'ed (data, FIN,...) segments.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 20:02:19 -04:00
Jerry Chu
8336886f78 tcp: TCP Fast Open Server - support TFO listeners
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() to use to check a TFO socket as well
as request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 20:02:19 -04:00
Jerry Chu
1046716368 tcp: TCP Fast Open Server - header & support functions
This patch adds all the necessary data structure and support
functions to implement TFO server side. It also documents a number
of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
extension MIBs.

In addition, it includes the following:

1. a new TCP_FASTOPEN socket option an application must call to
supply a max backlog allowed in order to enable TFO on its listener.

2. A number of key data structures:
"fastopen_rsk" in tcp_sock - for a big socket to access its
request_sock for retransmission and ack processing purpose. It is
non-NULL iff 3WHS not completed.

"fastopenq" in request_sock_queue - points to a per Fast Open
listener data structure "fastopen_queue" to keep track of qlen (# of
outstanding Fast Open requests) and max_qlen, among other things.

"listener" in tcp_request_sock - to point to the original listener
for book-keeping purpose, i.e., to maintain qlen against max_qlen
as part of defense against IP spoofing attack.

3. various data structure and functions, many in tcp_fastopen.c, to
support server side Fast Open cookie operations, including
/proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 20:02:18 -04:00
Sorin Dumitru
eb7e057596 ipv6: remove some deadcode
__ipv6_regen_rndid no longer returns anything other than 0
so there's no point in verifying what it returns

Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 16:33:32 -04:00
Julia Lawall
599901c3e4 net/xfrm/xfrm_state.c: fix error return code
Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 16:27:48 -04:00
Julia Lawall
48f125ce1c net: ipv6: fix error return code
Initialize return variable before exiting on an error path.

The initial initialization of the return variable is also dropped, because
that value is never used.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 16:27:48 -04:00
Rami Rosen
d1a53dfd11 net: fix documentation of skb_needs_linearize().
skb_needs_linearize() does not check highmem DMA as it does not call
illegal_highdma() anymore, so there is no need to mention highmem DMA here.

(Indeed, ~NETIF_F_SG flag, which is checked in skb_needs_linearize(), can
be set when illegal_highdma() returns true, and we are assured that
illegal_highdma() is invoked prior to skb_needs_linearize() as
skb_needs_linearize() is a static method called only once.
But ~NETIF_F_SG can be set not only there in this same invocation path.
It can also be set when can_checksum_protocol() returns false).

see commit 02932ce9e2,
Convert skb_need_linearize() to use precomputed features.
Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 16:24:02 -04:00
Alexander Duyck
98d75c3724 ipv4: Minor logic clean-up in ipv4_mtu
In ipv4_mtu there is some logic where we are testing for a non-zero value
and a timer expiration, then setting the value to zero, and then testing if
the value is zero we set it to a value based on the dst.  Instead of
bothering with the extra steps it is easier to just cleanup the logic so
that we set it to the dst based value if it is zero or if the timer has
expired.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
2012-08-31 16:22:50 -04:00
Wanlong Gao
4a2c240691 net🏧fix up ENOIOCTLCMD error handling
At commit 07d106d0, Linus pointed out that ENOIOCTLCMD should be
translated as ENOTTY to user mode.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 16:14:33 -04:00
Wei Yongjun
80f0fd8a7f openvswitch: using kfree_rcu() to simplify the code
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 15:55:38 -04:00
Xi Wang
fc61b928dc af_unix: fix shutdown parameter checking
Return -EINVAL rather than 0 given an invalid "mode" parameter.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 15:55:37 -04:00
Xi Wang
46b66d7077 decnet: fix shutdown parameter checking
The allowed value of "how" is SHUT_RD/SHUT_WR/SHUT_RDWR (0/1/2),
rather than SHUTDOWN_MASK (3).

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 15:55:37 -04:00
David S. Miller
c32f38619a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 15:14:18 -04:00
David S. Miller
0dcd5052c8 Merge branch 'master' of git://1984.lsi.us.es/nf 2012-08-31 13:06:37 -04:00
Pablo Neira Ayuso
5b423f6a40 netfilter: nf_conntrack: fix racy timer handling with reliable events
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.

This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.

Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-31 15:50:28 +02:00
Eric Dumazet
c5ae7d4192 ipv4: must use rcu protection while calling fib_lookup
Following lockdep splat was reported by Pavel Roskin :

[ 1570.586223] ===============================
[ 1570.586225] [ INFO: suspicious RCU usage. ]
[ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
[ 1570.586229] -------------------------------
[ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
[ 1570.586233]
[ 1570.586233] other info that might help us debug this:
[ 1570.586233]
[ 1570.586236]
[ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
[ 1570.586238] 2 locks held by Chrome_IOThread/4467:
[ 1570.586240]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff814f2c0c>] release_sock+0x2c/0xa0
[ 1570.586253]  #1:  (fnhe_lock){+.-...}, at: [<ffffffff815302fc>] update_or_create_fnhe+0x2c/0x270
[ 1570.586260]
[ 1570.586260] stack backtrace:
[ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
[ 1570.586265] Call Trace:
[ 1570.586271]  [<ffffffff810976ed>] lockdep_rcu_suspicious+0xfd/0x130
[ 1570.586275]  [<ffffffff8153042c>] update_or_create_fnhe+0x15c/0x270
[ 1570.586278]  [<ffffffff815305b3>] __ip_rt_update_pmtu+0x73/0xb0
[ 1570.586282]  [<ffffffff81530619>] ip_rt_update_pmtu+0x29/0x90
[ 1570.586285]  [<ffffffff815411dc>] inet_csk_update_pmtu+0x2c/0x80
[ 1570.586290]  [<ffffffff81558d1e>] tcp_v4_mtu_reduced+0x2e/0xc0
[ 1570.586293]  [<ffffffff81553bc4>] tcp_release_cb+0xa4/0xb0
[ 1570.586296]  [<ffffffff814f2c35>] release_sock+0x55/0xa0
[ 1570.586300]  [<ffffffff815442ef>] tcp_sendmsg+0x4af/0xf50
[ 1570.586305]  [<ffffffff8156fc60>] inet_sendmsg+0x120/0x230
[ 1570.586308]  [<ffffffff8156fb40>] ? inet_sk_rebuild_header+0x40/0x40
[ 1570.586312]  [<ffffffff814f4bdd>] ? sock_update_classid+0xbd/0x3b0
[ 1570.586315]  [<ffffffff814f4c50>] ? sock_update_classid+0x130/0x3b0
[ 1570.586320]  [<ffffffff814ec435>] do_sock_write+0xc5/0xe0
[ 1570.586323]  [<ffffffff814ec4a3>] sock_aio_write+0x53/0x80
[ 1570.586328]  [<ffffffff8114bc83>] do_sync_write+0xa3/0xe0
[ 1570.586332]  [<ffffffff8114c5a5>] vfs_write+0x165/0x180
[ 1570.586335]  [<ffffffff8114c805>] sys_write+0x45/0x90
[ 1570.586340]  [<ffffffff815d2722>] system_call_fastpath+0x16/0x1b

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:33:08 -04:00
Francesco Ruggeri
acbb219d5f net: ipv4: ipmr_expire_timer causes crash when removing net namespace
When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronizations when needed, and they both only modify mrt.

Tested in Linux 3.4.8.

Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 12:51:32 -04:00
Eric Dumazet
ee13040901 netpoll: provide an IP ident in UDP frames
Let's fill IP header ident field with a meaningful value,
it might help some setups.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 12:32:14 -04:00
xeb@mail.ru
99469c32f7 l2tp: avoid to use synchronize_rcu in tunnel free function
Avoid to use synchronize_rcu in l2tp_tunnel_free because context may be
atomic.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 12:31:03 -04:00
Pablo Neira Ayuso
3f509c689a netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
We're hitting bug while trying to reinsert an already existing
expectation:

kernel BUG at kernel/timer.c:895!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:
 <IRQ>
 [<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
 [<ffffffff812d423a>] ? in4_pton+0x72/0x131
 [<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
 [<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
 [<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
 [<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
 [<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]

We have to remove the RTP expectation if the RTCP expectation hits EBUSY
since we keep trying with other ports until we succeed.

Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 18:27:14 +02:00
Gao feng
6549dd43c0 net: dev: fix the incorrect hold of net namespace's lo device
When moving a net device from one net namespace to another
net namespace,dev_change_net_namespace calls NETDEV_DOWN
event,so the original net namespace's dst entries which
beloned to this net device will be put into dst_garbage
list.

then dev_change_net_namespace will set this net device's
net to the new net namespace.

If we unregister this net device's driver, this will trigger
the NETDEV_UNREGISTER_FINAL event, dst_ifdown will be called,
and get this net device's dst entries from dst_garbage list,
put these entries' dev to the new net namespace's lo device.

It's not what we want,actually we need these dst entries hold
the original net namespace's lo device,this incorrect device
holding will trigger emg message like below.
unregister_netdevice: waiting for lo to become free. Usage count = 1

so we should call NETDEV_UNREGISTER_FINAL event in
dev_change_net_namespace too,in order to make sure dst entries
already in the dst_garbage list, we need rcu_barrier before we
call NETDEV_UNREGISTER_FINAL event.

With help form Eric Dumazet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 12:21:16 -04:00
Julia Lawall
6fc09f10f1 netfilter: nfnetlink_log: fix error return code in init path
Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 03:29:58 +02:00
Julia Lawall
ef6acf68c2 netfilter: ctnetlink: fix error return code in init path
Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 03:28:22 +02:00
Julia Lawall
0a54e939d8 ipvs: fix error return code
Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 03:27:19 +02:00
Patrick McHardy
8a91bb0c30 netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:25 +02:00
Pablo Neira Ayuso
320ff567f2 netfilter: nf_nat: support IPv6 in TFTP NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:24 +02:00
Pablo Neira Ayuso
5901b6be88 netfilter: nf_nat: support IPv6 in IRC NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:23 +02:00
Patrick McHardy
9a66482106 netfilter: nf_nat: support IPv6 in SIP NAT helper
Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:22 +02:00
Patrick McHardy
ee6eb96673 netfilter: nf_nat: support IPv6 in amanda NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:21 +02:00
Patrick McHardy
d33cbeeb1a netfilter: nf_nat: support IPv6 in FTP NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:20 +02:00
Patrick McHardy
ed72d9e294 netfilter: ip6tables: add NETMAP target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:19 +02:00
Patrick McHardy
115e23ac78 netfilter: ip6tables: add REDIRECT target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:19 +02:00
Patrick McHardy
b3f644fc82 netfilter: ip6tables: add MASQUERADE target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:18 +02:00
Patrick McHardy
58a317f106 netfilter: ipv6: add IPv6 NAT support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:17 +02:00
Patrick McHardy
2cf545e835 net: core: add function for incremental IPv6 pseudo header checksum updates
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
2012-08-30 03:00:16 +02:00
Patrick McHardy
0ad352cb43 netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change
Expand the skb headroom if the oif changed due to rerouting similar to
how IPv4 packets are handled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:15 +02:00
Patrick McHardy
c7232c9979 netfilter: add protocol independent NAT core
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:14 +02:00
Patrick McHardy
051966c0c6 netfilter: nf_nat: add protoff argument to packet mangling functions
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:13 +02:00
Patrick McHardy
811927ccfe netfilter: nf_conntrack: restrict NAT helper invocation to IPv4
The NAT helpers currently only handle IPv4 packets correctly. Restrict
invocation of the helpers to IPv4 in preparation of IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:12 +02:00
Patrick McHardy
2b60af0178 netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments
ICMPv6 error messages are tracked by extracting the conntrack tuple of
the inner packet and looking up the corresponding conntrack entry. Tuple
extraction uses the ->get_l4proto() callback, which in case of fragments
returns NEXTHDR_FRAGMENT instead of the upper protocol, even for the
first fragment when the entire next header is present, resulting in a
failure to find the correct connection tracking entry.

This patch changes ipv6_get_l4proto() to use ipv6_skip_exthdr() instead
of nf_ct_ipv6_skip_exthdr() in order to skip fragment headers when the
fragment offset is zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:11 +02:00
Patrick McHardy
4cdd34084d netfilter: nf_conntrack_ipv6: improve fragmentation handling
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
  assigned, the reassembled packet is passed through the stack instead
  of the original fragments.

- During defragmentation, the largest received fragment size is stored.
  On output, the packet is refragmented if required. If the largest
  received fragment size exceeds the outgoing MTU, a "packet too big"
  message is generated, thus behaving as if the original fragments
  were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
  connections using a helper, so it is switched to use ipv6_skip_exthdr()
  instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
  reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:10 +02:00
Jesper Dangaard Brouer
590e3f79a2 ipvs: IPv6 MTU checking cleanup and bugfix
Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
a common helper function __mtu_check_toobig_v6().

The MTU check for tunnel mode can also use this helper as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to
skb->len.  And the 'mtu' variable have been adjusted before
calling helper.

Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
were missing a check for skb_is_gso().

This bug e.g. caused issues for KVM IPVS setups, where different
Segmentation Offloading techniques are utilized, between guests,
via the virtio driver.  This resulted in very bad performance,
due to the ICMPv6 "too big" messages didn't affect the sender.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 02:55:39 +02:00
Amerigo Wang
072a9c4860 netpoll: revert 6bdb7fe310 and fix be_poll() instead
Against -net.

In the patch "netpoll: re-enable irq in poll_napi()", I tried to
fix the following warning:

[100718.051041] ------------[ cut here ]------------
[100718.051048] WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7d/0xb0()
(Not tainted)
[100718.051049] Hardware name: ProLiant BL460c G7
...
[100718.051068] Call Trace:
[100718.051073]  [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
[100718.051075]  [<ffffffff8106b79a>] ? warn_slowpath_null+0x1a/0x20
[100718.051077]  [<ffffffff810747ed>] ? local_bh_enable_ip+0x7d/0xb0
[100718.051080]  [<ffffffff8150041b>] ? _spin_unlock_bh+0x1b/0x20
[100718.051085]  [<ffffffffa00ee974>] ? be_process_mcc+0x74/0x230 [be2net]
[100718.051088]  [<ffffffffa00ea68c>] ? be_poll_tx_mcc+0x16c/0x290 [be2net]
[100718.051090]  [<ffffffff8144fe76>] ? netpoll_poll_dev+0xd6/0x490
[100718.051095]  [<ffffffffa01d24a5>] ? bond_poll_controller+0x75/0x80 [bonding]
[100718.051097]  [<ffffffff8144fde5>] ? netpoll_poll_dev+0x45/0x490
[100718.051100]  [<ffffffff81161b19>] ? ksize+0x19/0x80
[100718.051102]  [<ffffffff81450437>] ? netpoll_send_skb_on_dev+0x157/0x240

by reenabling IRQ before calling ->poll, but it seems more
problems are introduced after that patch:

http://ozlabs.org/~akpm/stuff/IMG_20120824_122054.jpg
http://marc.info/?l=linux-netdev&m=134563282530588&w=2

So it is safe to fix be2net driver code directly.

This patch reverts the offending commit and fixes be_poll() by
avoid disabling BH there, this is okay because be_poll()
can be called either by poll_napi() which already disables
IRQ, or by net_rx_action() which already disables BH.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-29 15:03:23 -04:00
Vinicius Costa Gomes
d8343f1257 Bluetooth: Fix sending a HCI Authorization Request over LE links
In the case that the link is already in the connected state and a
Pairing request arrives from the mgmt interface, hci_conn_security()
would be called but it was not considering LE links.

Reported-by: João Paulo Rechi Vita <jprvita@openbossa.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-27 08:11:51 -07:00
Vinicius Costa Gomes
cc110922da Bluetooth: Change signature of smp_conn_security()
To make it clear that it may be called from contexts that may not have
any knowledge of L2CAP, we change the connection parameter, to receive
a hci_conn.

This also makes it clear that it is checking the security of the link.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-27 08:07:18 -07:00
Greg Kroah-Hartman
e372dc6c62 Merge 3.6-rc3 into tty-next
This picks up all of the different fixes in Linus's tree that we also need here.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-27 07:13:33 -07:00
Patrick McHardy
5f2d04f1f9 ipv4: fix path MTU discovery with connection tracking
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.

This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
2012-08-26 19:13:55 +02:00
Linus Torvalds
8497ae61d0 Merge branch 'for-3.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J. Bruce Fields:
 "Particular thanks to Michael Tokarev, Malahal Naineni, and Jamie
  Heilman for their testing and debugging help."

* 'for-3.6' of git://linux-nfs.org/~bfields/linux:
  svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
  svcrpc: sends on closed socket should stop immediately
  svcrpc: fix BUG() in svc_tcp_clear_pages
  nfsd4: fix security flavor of NFSv4.0 callback
2012-08-25 11:43:41 -07:00
David S. Miller
e6acb38480 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 18:54:37 -04:00
David S. Miller
85c21049fc Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This is a batch of updates intended for 3.7.  The bulk of it is
mac80211 changes, including some mesh work from Thomas Pederson and
some multi-channel work from Johannes.  A variety of driver updates
and other bits are scattered in there as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 15:18:07 -04:00
David S. Miller
d05cebb915 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
This batch of fixes is intended for 3.6...

Johannes Berg gives us a pair of iwlwifi fixes.  One corrects some
improperly defined ifdefs that lead to crashes and BUG_ONs.  The other
prevents attempts to read SRAM for devices that aren't actually started.

Julia Lawall provides an ipw2100 fix to properly set the return code
from a function call before testing it! :-)

Thomas Huehn corrects the improper use of a constant related to a power
setting in ath5k.

Thomas Pedersen offers a mac80211 fix to properly handle destination
addresses of unicast frames passing though a mesh gate.

Vladimir Zapolskiy provides a brcmsmac fix to properly mark the
interface state when the device goes down.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 15:15:10 -04:00
Yuchung Cheng
7c4a56fec3 tcp: fix cwnd reduction for non-sack recovery
The cwnd reduction in fast recovery is based on the number of packets
newly delivered per ACK. For non-sack connections every DUPACK
signifies a packet has been delivered, but the sender mistakenly
skips counting them for cwnd reduction.

The fix is to compute newly_acked_sacked after DUPACKs are accounted
in sacked_out for non-sack connections.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:48:58 -04:00
Jiri Pirko
9b361c13ce vlan: add helper which can be called to see if device is used by vlan
also, remove unused vlan_info definition from header

CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:46:39 -04:00
Pablo Neira Ayuso
20e1db19db netlink: fix possible spoofing from non-root processes
Non-root user-space processes can send Netlink messages to other
processes that are well-known for being subscribed to Netlink
asynchronous notifications. This allows ilegitimate non-root
process to send forged messages to Netlink subscribers.

The userspace process usually verifies the legitimate origin in
two ways:

a) Socket credentials. If UID != 0, then the message comes from
   some ilegitimate process and the message needs to be dropped.

b) Netlink portID. In general, portID == 0 means that the origin
   of the messages comes from the kernel. Thus, discarding any
   message not coming from the kernel.

However, ctnetlink sets the portID in event messages that has
been triggered by some user-space process, eg. conntrack utility.
So other processes subscribed to ctnetlink events, eg. conntrackd,
know that the event was triggered by some user-space action.

Neither of the two ways to discard ilegitimate messages coming
from non-root processes can help for ctnetlink.

This patch adds capability validation in case that dst_pid is set
in netlink_sendmsg(). This approach is aggressive since existing
applications using any Netlink bus to deliver messages between
two user-space processes will break. Note that the exception is
NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
userspace communication.

Still, if anyone wants that his Netlink bus allows netlink-to-netlink
userspace, then they can set NL_NONROOT_SEND. However, by default,
I don't think it makes sense to allow to use NETLINK_ROUTE to
communicate two processes that are sending no matter what information
that is not related to link/neighbouring/routing. They should be using
NETLINK_USERSOCK instead for that.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:36:09 -04:00
Ben Hutchings
8f4cccbbd9 net: Set device operstate at registration time
The operstate of a device is initially IF_OPER_UNKNOWN and is updated
asynchronously by linkwatch after each change of carrier state
reported by the driver.  The default carrier state of a net device is
on, and this will never be changed on drivers that do not support
carrier detection, thus the operstate remains IF_OPER_UNKNOWN.

For devices that do support carrier detection, the driver must set the
carrier state to off initially, then poll the hardware state when the
device is opened.  However, we must not activate linkwatch for a
unregistered device, and commit b473001 ('net: Do not fire linkwatch
events until the device is registered.') ensured that we don't.  But
this means that the operstate for many devices that support carrier
detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.

The same issue exists with the dormant state.

The proper initialisation sequence, avoiding a race with opening of
the device, is:

        rtnl_lock();
        rc = register_netdevice(dev);
        if (rc)
                goto out_unlock;
        netif_carrier_off(dev); /* or netif_dormant_on(dev) */
        rtnl_unlock();

but it seems silly that this should have to be repeated in so many
drivers.  Further, the operstate seen immediately after opening the
device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
linkwatch.

Commit 22604c8 ('net: Fix for initial link state in 2.6.28') attempted
to fix this by setting the operstate synchronously, but it was
reverted as it could lead to deadlock.

This initialises the operstate synchronously at registration time
only.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 12:46:13 -04:00
Neil Horman
3afa6d00fb cls_cgroup: Allow classifier cgroups to have their classid reset to 0
The network classifier cgroup initalizes each cgroups instance classid value to
0.  However, the sock_update_classid function only updates classid's in sockets
if the tasks cgroup classid is not zero, and if it differs from the current
classid.  The later check is to prevent cache line dirtying, but the former is
detrimental, as it prevents resetting a classid for a cgroup to 0.  While this
is not a common action, it has administrative usefulness (if the admin wants to
disable classification of a certain group temporarily for instance).

Easy fix, just remove the zero check.  Tested successfully by myself

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 12:41:17 -04:00
John W. Linville
f20b6213f1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-08-24 12:25:30 -04:00
Eric Dumazet
78df76a065 ipv4: take rt_uncached_lock only if needed
Multicast traffic allocates dst with DST_NOCACHE, but dst is
not inserted into rt_uncached_list.

This slowdown multicast workloads on SMP because rt_uncached_lock is
contended.

Change the test before taking the lock to actually check the dst
was inserted into rt_uncached_list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 11:47:48 -04:00
David S. Miller
e6e94e392f Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:

====================
Included changes:
- a set of codestyle rearrangements/fixes
- new feature to early detect new joining (mesh-unaware) clients
- a minor fix for the gw-feature
- substitution of shift operations with the BIT() macro
- reorganization of the main batman-adv structure (struct batadv_priv)
- some more (very) minor cleanups and fixes
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 11:30:50 -04:00
John W. Linville
e72615f6ab Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-08-24 11:16:58 -04:00
Fengguang Wu
a0dfb2634e af_packet: match_fanout_group() can be static
cc: Eric Leblond <eric@regit.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:27:12 -07:00
Eric Dumazet
748e2d9396 net: reinstate rtnl in call_netdevice_notifiers()
Eric Biederman pointed out that not holding RTNL while calling
call_netdevice_notifiers() was racy.

This patch is a direct transcription his feedback
against commit 0115e8e30d (net: remove delay at device dismantle)

Thanks Eric !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:24:42 -07:00
John W. Linville
c6b6eedc29 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-08-23 09:51:15 -04:00
John W. Linville
a4881cc45a Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-08-23 09:49:42 -04:00
Sven Eckelmann
fa4f0afcf4 batman-adv: Start new development cycle
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:23 +02:00
Antonio Quartulli
371351731e batman-adv: change interface_rx to get orig node
In order to understand  where a broadcast packet is coming from and use
this information to detect not yet announced clients, this patch modifies the
interface_rx() function by passing a new argument: the orig node
corresponding to the node that originated the received packet (if known).
This new argument if not NULL for broadcast packets only (other packets does not
have source field).

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:22 +02:00
Antonio Quartulli
30cfd02b60 batman-adv: detect not yet announced clients
With the current TT mechanism a new client joining the network is not
immediately able to communicate with other hosts because its MAC address has not
been announced yet. This situation holds until the first OGM containing its
joining event will be spread over the mesh network.

This behaviour can be acceptable in networks where the originator interval is a
small value (e.g. 1sec) but if that value is set to an higher time (e.g. 5secs)
the client could suffer from several malfunctions like DHCP client timeouts,
etc.

This patch adds an early detection mechanism that makes nodes in the network
able to recognise "not yet announced clients" by means of the broadcast packets
they emitted on connection (e.g. ARP or DHCP request). The added client will
then be confirmed upon receiving the OGM claiming it or purged if such OGM
is not received within a fixed amount of time.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:22 +02:00
Sven Eckelmann
c67893d17a batman-adv: Reduce accumulated length of simple statements
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:21 +02:00
Sven Eckelmann
bbb1f90efb batman-adv: Don't break statements after assignment operator
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:20 +02:00
Sven Eckelmann
8de47de575 batman-adv: Use BIT(x) macro to calculate bit positions
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:19 +02:00
Martin Hundebøll
74ee3634dc batman-adv: Drop tt queries with foreign dest
When enabling promiscuous mode, tt queries for other hosts might be
received. Before this patch, "foreign" tt queries were processed like
any other query and thus forwarded to its destination again and thereby
causing a loop.

This patch adds a check to drop foreign tt queries.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:19 +02:00
Martin Hundebøll
ff51fd70ad batman-adv: Move batadv_check_unicast_packet()
batadv_check_unicast_packet() is needed in batadv_recv_tt_query(), so
move the former to before the latter.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:18 +02:00
Sven Eckelmann
807736f6e0 batman-adv: Split batadv_priv in sub-structures for features
The structure batadv_priv grows everytime a new feature is introduced. It gets
hard to find the parts of the struct that belongs to a specific feature. This
becomes even harder by the fact that not every feature uses a prefix in the
member name.

The variables for bridge loop avoidence, gateway handling, translation table
and visualization server are moved into separate structs that are included in
the bat_priv main struct.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:13 +02:00
Simon Wunderlich
624463079e batman-adv: check batadv_orig_hash_add_if() return code
If this call fails, some of the orig_nodes spaces may have been
resized for the increased number of interface, and some may not.
If we would just continue with the larger number of interfaces,
this would lead to access to not allocated memory later.

We better check the return code, and don't add the interface if
no memory is available. OTOH, keeping some of the orig_nodes
with too much memory allocated should hurt no one (except for
a few too many bytes allocated).

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:46 +02:00
Antonio Quartulli
a51fb9b2ac batman-adv: fix typos in comments
the word millisecond is misspelled in several comments. This patch fixes it.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:45 +02:00
Antonio Quartulli
d657e621a0 batman-adv: add reference counting for type batadv_tt_orig_list_entry
The batadv_tt_orig_list_entry structure didn't have any refcounting mechanism so
far. This patch introduces it and makes the structure being usable in much more
complex context.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:44 +02:00
Jonathan Corbet
3a7f291bf6 batman-adv: remove a misleading comment
As much as I'm happy to see LWN links sprinkled through the kernel by the
dozen, this one in particular reflects a very old state of reality; the
associated comment is now incorrect.  So just delete it.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:44 +02:00
Marek Lindner
1c9b0550f4 batman-adv: convert remaining packet counters to per_cpu_ptr() infrastructure
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:43 +02:00
Simon Wunderlich
3eb8773e3a batman-adv: rename bridge loop avoidance claim types
for consistency reasons within the code and with the documentation,
we should always call it "claim" and "unclaim".

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:42 +02:00
Simon Wunderlich
99e966fc96 batman-adv: correct comments in bridge loop avoidance
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:41 +02:00
Simon Wunderlich
536a23f119 batman-adv: Add the backbone gateway list to debugfs
This is especially useful if there are no claims yet, but we still want
to know which gateways are using bridge loop avoidance in the network.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:41 +02:00
Antonio Quartulli
c70437289c batman-adv: move function arguments on one line
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:40 +02:00
Pavel Emelyanov
0fa7fa98db packet: Protect packet sk list with mutex (v2)
Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
danborkmann@iogearbox.net
9e67030af3 af_packet: use define instead of constant
Instead of using a hard-coded value for the status variable, it would make
the code more readable to use its destined define from linux/if_packet.h.

Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
Ying Xue
bfdc587c5a rds: Don't disable BH on BH context
Since we have already in BH context when *_write_space(),
*_data_ready() as well as *_state_change() are called, it's
unnecessary to disable BH.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:52:04 -07:00
Eric Dumazet
b87fb39e39 ipv6: gre: fix ip6gre_err()
ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8,
not 40 as expected), and should take into account 'offset' parameter.

Also uses pskb_may_pull() to cope with some fragged skbs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:48:32 -07:00
Eric Dumazet
ef8531b64c xfrm: fix RCU bugs
This patch reverts commit 56892261ed (xfrm: Use rcu_dereference_bh to
deference pointer protected by rcu_read_lock_bh), and fixes bugs
introduced in commit 418a99ac6a ( Replace rwlock on xfrm_policy_afinfo
with rcu )

1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH

2) We must defer some writes after the synchronize_rcu() call or a reader
 can crash dereferencing NULL pointer.

3) Now we use the xfrm_policy_afinfo_lock spinlock only from process
context, we no longer need to block BH in xfrm_policy_register_afinfo()
and xfrm_policy_unregister_afinfo()

4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in
xfrm_policy_unregister_afinfo()

5) Remove a forward inline declaration (xfrm_policy_put_afinfo()),
  and also move xfrm_policy_get_afinfo() declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@windriver.com>
Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:39:46 -07:00
Eric Dumazet
0115e8e30d net: remove delay at device dismantle
I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.

These call_rcu() were posted by rt_free(struct rtable *rt) calls.

We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.

As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.

To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.

Use NETDEV_UNREGISTER_FINAL with care !

Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.

With help from Gao feng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 21:50:36 -07:00
Eric Dumazet
9b04f35005 ipv4: properly update pmtu
Sylvain Munault reported following info :

 - TCP connection get "stuck" with data in send queue when doing
   "large" transfers ( like typing 'ps ax' on a ssh connection )
 - Only happens on path where the PMTU is lower than the MTU of
   the interface
 - Is not present right after boot, it only appears 10-20min after
   boot or so. (and that's inside the _same_ TCP connection, it works
   fine at first and then in the same ssh session, it'll get stuck)
 - Definitely seems related to fragments somehow since I see a router
   sending ICMP message saying fragmentation is needed.
 - Exact same setup works fine with kernel 3.5.1

Problem happens when the 10 minutes (ip_rt_mtu_expires) expiration
period is over.

ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
but dst_set_expires() does nothing because dst.expires is already set.

It seems we want to set the expires field to a new value, regardless
of prior one.

With help from Julian Anastasov.

Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Julian Anastasov <ja@ssi.bg>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 19:14:30 -07:00
David S. Miller
bf277b0cce Merge git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:

* Remove unnecessary RTNL locking now that we have support
  for namespace in nf_conntrack, from Patrick McHardy.

* Cleanup to eliminate unnecessary goto in the initialization
  path of several Netfilter tables, from Jean Sacren.

* Another cleanup from Wu Fengguang, this time to PTR_RET instead
  of if IS_ERR then return PTR_ERR.

* Use list_for_each_entry_continue_rcu in nf_iterate, from
  Michael Wang.

* Add pmtu_disc sysctl option to disable PMTU in their tunneling
  transmitter, from Julian Anastasov.

* Generalize application protocol registration in IPVS and modify
  IPVS FTP helper to use it, from Julian Anastasov.

* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
  helper for NAT support, from Julian Anastasov.

* Add logic to update PMTU for IPIP packets in IPVS, again
  from Julian Anastasov.

* A couple of sparse warning fixes for IPVS and Netfilter from
  Claudiu Ghioc and Patrick McHardy respectively.

Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 18:48:52 -07:00
Pravin B Shelar
46df7b8145 openvswitch: Add support for network namespaces.
Following patch adds support for network namespace to openvswitch.
Since it must release devices when namespaces are destroyed, a
side effect of this patch is that the module no longer keeps a
refcount but instead cleans up any state when it is unloaded.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-08-22 14:48:55 -07:00
David S. Miller
1304a7343b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
Jean Sacren
90efbed18a netfilter: remove unnecessary goto statement for error recovery
Usually it's a good practice to use goto statement for error recovery
when initializing the module. This approach could be an overkill if:

 1) there is only one fail case;
 2) success and failure use the same return statement.

For a cleaner approach, remove the unnecessary goto statement and
directly implement error recovery.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-22 19:17:38 +02:00
Michael Wang
6705e86724 netfilter: replace list_for_each_continue_rcu with new interface
This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to allow removing
list_for_each_continue_rcu().

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-22 19:17:20 +02:00
Linus Torvalds
f753c4ec15 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "Jim's fix closes a narrow race introduced with the msgr changes.  One
  fix resolves problems with debugfs initialization that Yan found when
  multiple client instances are created (e.g., two clusters mounted, or
  rbd + cephfs), another one fixes problems with mounting a nonexistent
  server subdirectory, and the last one fixes a divide by zero error
  from unsanitized ioctl input that Dan Carpenter found."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: avoid divide by zero in __validate_layout()
  libceph: avoid truncation due to racing banners
  ceph: tolerate (and warn on) extraneous dentry from mds
  libceph: delay debugfs initialization until we learn global_id
2012-08-22 09:58:05 -07:00
Sujith Manoharan
aba4e6fff8 mac80211: Fix AP mode regression
Commit mac80211: avoid using synchronize_rcu in ieee80211_set_probe_resp
changed the return value when the probe response template is not present.
Revert to the earlier value of 1 - this fixes AP mode for drivers like
ath9k.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-22 10:55:04 +02:00
Thomas Pedersen
27f011243a mac80211: fix DS to MBSS address translation
The destination address of unicast frames forwarded through a mesh gate
was being replaced with the broadcast address. Instead leave the
original destination address as the mesh DA. If the nexthop address is
not in the mpath table it will be resolved. If that fails, the frame
will be forwarded to known mesh gates.

Reported-by: Cedric Voncken <cedric.voncken@acksys.fr>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-22 09:45:05 +02:00
Jim Schutt
6d4221b537 libceph: avoid truncation due to racing banners
Because the Ceph client messenger uses a non-blocking connect, it is
possible for the sending of the client banner to race with the
arrival of the banner sent by the peer.

When ceph_sock_state_change() notices the connect has completed, it
schedules work to process the socket via con_work().  During this
time the peer is writing its banner, and arrival of the peer banner
races with con_work().

If con_work() calls try_read() before the peer banner arrives, there
is nothing for it to do, after which con_work() calls try_write() to
send the client's banner.  In this case Ceph's protocol negotiation
can complete succesfully.

The server-side messenger immediately sends its banner and addresses
after accepting a connect request, *before* actually attempting to
read or verify the banner from the client.  As a result, it is
possible for the banner from the server to arrive before con_work()
calls try_read().  If that happens, try_read() will read the banner
and prepare protocol negotiation info via prepare_write_connect().
prepare_write_connect() calls con_out_kvec_reset(), which discards
the as-yet-unsent client banner.  Next, con_work() calls
try_write(), which sends the protocol negotiation info rather than
the banner that the peer is expecting.

The result is that the peer sees an invalid banner, and the client
reports "negotiation failed".

Fix this by moving con_out_kvec_reset() out of
prepare_write_connect() to its callers at all locations except the
one where the banner might still need to be sent.

[elder@inktak.com: added note about server-side behavior]

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-21 15:55:27 -07:00
Eric Dumazet
e0e3cea46d af_netlink: force credentials passing [CVE-2012-3520]
Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug.  The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).

This bug was introduced in commit 16e5726269
(af_unix: dont send SCM_CREDENTIALS by default)

This patch forces passing credentials for netlink, as
before the regression.

Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.

With help from Florian Weimer & Petr Matousek

This issue is designated as CVE-2012-3520

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:53:01 -07:00
Eric Dumazet
a9915a1b52 ipv4: fix ip header ident selection in __ip_make_skb()
Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
memory in __ip_select_ident().

It turns out that __ip_make_skb() called ip_select_ident() before
properly initializing iph->daddr.

This is a bug uncovered by commit 1d861aa4b3 (inet: Minimize use of
cached route inetpeer.)

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:51:06 -07:00
Christoph Paasch
1a7b27c97c ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
Since 0e73441992 ("ipv4: Use inet_csk_route_child_sock() in DCCP and
TCP."), inet_csk_route_child_sock() is called instead of
inet_csk_route_req().

However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
Thus, inside inet_csk_route_child_sock() opt is always NULL and the
SRR-options are not respected anymore.
Packets sent by the server won't have the correct destination-IP.

This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
inside inet_csk_route_child_sock().

Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:49:11 -07:00
Eric Dumazet
144d56e910 tcp: fix possible socket refcount problem
Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
added bug leading to following trace :

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
[ 2866.132281]  #1:  ((&task->u.tk_work)){+.+.+.}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
[ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
[ 2866.132281]  #3:  (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff81078017>] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281]  <IRQ>  [<ffffffff810bc527>] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281]  [<ffffffff818a0839>] ? __sk_free+0xfd/0x114
[ 2866.132281]  [<ffffffff811549fa>] kmem_cache_free+0x6b/0x13a
[ 2866.132281]  [<ffffffff818a0839>] __sk_free+0xfd/0x114
[ 2866.132281]  [<ffffffff818a08c0>] sk_free+0x1c/0x1e
[ 2866.132281]  [<ffffffff81911e1c>] tcp_write_timer+0x51/0x56
[ 2866.132281]  [<ffffffff81078082>] run_timer_softirq+0x218/0x35f
[ 2866.132281]  [<ffffffff81078017>] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281]  [<ffffffff810f5831>] ? rb_commit+0x58/0x85
[ 2866.132281]  [<ffffffff81911dcb>] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281]  [<ffffffff81070bd6>] __do_softirq+0xcb/0x1f9
[ 2866.132281]  [<ffffffff81a0a00c>] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281]  [<ffffffff81a1227c>] call_softirq+0x1c/0x30
[ 2866.132281]  [<ffffffff81039f38>] do_softirq+0x4a/0xa6
[ 2866.132281]  [<ffffffff81070f2b>] irq_exit+0x51/0xad
[ 2866.132281]  [<ffffffff81a129cd>] do_IRQ+0x9d/0xb4
[ 2866.132281]  [<ffffffff81a0a3ef>] common_interrupt+0x6f/0x6f
[ 2866.132281]  <EOI>  [<ffffffff8109d006>] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281]  [<ffffffff81a0a172>] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281]  [<ffffffff81078692>] mod_timer+0x178/0x1a9
[ 2866.132281]  [<ffffffff818a00aa>] sk_reset_timer+0x19/0x26
[ 2866.132281]  [<ffffffff8190b2cc>] tcp_rearm_rto+0x99/0xa4
[ 2866.132281]  [<ffffffff8190dfba>] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281]  [<ffffffff8190f7ea>] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281]  [<ffffffff818a565d>] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281]  [<ffffffff8190f952>] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281]  [<ffffffff81904122>] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281]  [<ffffffff819229c2>] inet_sendmsg+0xaa/0xd5
[ 2866.132281]  [<ffffffff81922918>] ? inet_autobind+0x5f/0x5f
[ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [<ffffffff8189adab>] sock_sendmsg+0xa3/0xc4
[ 2866.132281]  [<ffffffff810f5de6>] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281]  [<ffffffff8103e6a9>] ? native_sched_clock+0x29/0x6f
[ 2866.132281]  [<ffffffff8103e6f8>] ? sched_clock+0x9/0xd
[ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [<ffffffff8189ae03>] kernel_sendmsg+0x37/0x43
[ 2866.132281]  [<ffffffff8199ce49>] xs_send_kvec+0x77/0x80
[ 2866.132281]  [<ffffffff8199cec1>] xs_sendpages+0x6f/0x1a0
[ 2866.132281]  [<ffffffff8107826d>] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281]  [<ffffffff8199d0d2>] xs_tcp_send_request+0x55/0xf1
[ 2866.132281]  [<ffffffff8199bb90>] xprt_transmit+0x89/0x1db
[ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
[ 2866.132281]  [<ffffffff81999d92>] call_transmit+0x1c5/0x20e
[ 2866.132281]  [<ffffffff819a0d55>] __rpc_execute+0x6f/0x225
[ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
[ 2866.132281]  [<ffffffff819a0f33>] rpc_async_schedule+0x28/0x34
[ 2866.132281]  [<ffffffff810835d6>] process_one_work+0x24d/0x47f
[ 2866.132281]  [<ffffffff81083567>] ? process_one_work+0x1de/0x47f
[ 2866.132281]  [<ffffffff819a0f0b>] ? __rpc_execute+0x225/0x225
[ 2866.132281]  [<ffffffff81083a6d>] worker_thread+0x236/0x317
[ 2866.132281]  [<ffffffff81083837>] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281]  [<ffffffff8108b7b8>] kthread+0x9a/0xa2
[ 2866.132281]  [<ffffffff81a12184>] kernel_thread_helper+0x4/0x10
[ 2866.132281]  [<ffffffff81a0a4b0>] ? retint_restore_args+0x13/0x13
[ 2866.132281]  [<ffffffff8108b71e>] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281]  [<ffffffff81a12180>] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that timer set in sk_reset_timer() can run
before we actually do the sock_hold(). socket refcount reaches zero and
we free the socket too soon.

timer handler is not allowed to reduce socket refcnt if socket is owned
by the user, or we need to change sk_reset_timer() implementation.

We should take a reference on the socket in case TCP_DELACK_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit are set in tsq_flags

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:42:23 -07:00
Tejun Heo
e7c2f96744 workqueue: use mod_delayed_work() instead of __cancel + queue
Now that mod_delayed_work() is safe to call from IRQ handlers,
__cancel_delayed_work() followed by queue_delayed_work() can be
replaced with mod_delayed_work().

Most conversions are straight-forward except for the following.

* net/core/link_watch.c: linkwatch_schedule_work() was doing a quite
  elaborate dancing around its delayed_work.  Collapse it such that
  linkwatch_work is queued for immediate execution if LW_URGENT and
  existing timer is kept otherwise.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
2012-08-21 13:18:24 -07:00
Tejun Heo
203b42f731 workqueue: make deferrable delayed_work initializer names consistent
Initalizers for deferrable delayed_work are confused.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-21 13:18:23 -07:00
John W. Linville
01e17dacd4 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/mac80211_hwsim.c
2012-08-21 16:00:21 -04:00
AceLan Kao
06d7de831d Revert "rfkill: remove dead code"
This reverts commit 2e48928d8a.

Those functions are needed and should not be removed, or
there is no way to set the rfkill led trigger name.

Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-21 20:50:25 +02:00
Vinicius Costa Gomes
f91c8468df Bluetooth: Fix establishing ESCO links
Commit 4cd2d98340 "Bluetooth: Simplify
the connection type handling" broke the creation of ESCO links.

This patch adds a type parameter to hci_connect_sco() so it creates
the connection of the right kind.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:42 -03:00
Syam Sidhardhan
144ad33020 Bluetooth: Use kref for l2cap channel reference counting
This patch changes the struct l2cap_chan and associated code to use
kref api for object refcounting and freeing.

Suggested-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:41 -03:00
Syam Sidhardhan
ab19516a50 Bluetooth: debug: Correct the PSM printing
Earlier we were printing chan->psm before assigning any value.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Acked-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:41 -03:00
Mikel Astiz
f0d6a0ea33 Bluetooth: mgmt: Add device disconnect reason
MGMT_EV_DEVICE_DISCONNECTED will now expose the disconnection reason to
userland, distinguishing four possible values:

	0x00	Reason not known or unspecified
	0x01	Connection timeout
	0x02	Connection terminated by local host
	0x03	Connection terminated by remote host

Note that the local/remote distinction just determines which side
terminated the low-level connection, regardless of the disconnection of
the higher-level profiles.

This can sometimes be misleading and thus must be used with care. For
example, some hardware combinations would report a locally initiated
disconnection even if the user turned Bluetooth off in the remote side.

Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:40 -03:00
Mikel Astiz
fa1bd91809 Bluetooth: Fix minor coding style in hci_event.c
Replace the status checks with the short form of the boolean expression.

Signed-off-by: Mikel Astiz <mikel.astiz@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:40 -03:00
Andrei Emeltchenko
31f470738b Bluetooth: trivial: Use preferred method for NULL check
Use standard bluetooth way to check NULL pointer !var instead of
var == NULL.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-21 14:54:39 -03:00
J. Bruce Fields
d10f27a750 svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available.  If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.

Cc: stable@vger.kernel.org
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:39:19 -04:00
J. Bruce Fields
f06f00a24d svcrpc: sends on closed socket should stop immediately
svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately.  Meanwhile other
threads could send further replies before the socket is really shut
down.  This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

	kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Cc: stable@vger.kernel.org
Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:59 -04:00
J. Bruce Fields
be1e44441a svcrpc: fix BUG() in svc_tcp_clear_pages
Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).

svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY.  However,
that means the inconsistency must be repaired before dropping XPT_BUSY.

Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.

Symptoms were a BUG() in svc_tcp_clear_pages.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:44 -04:00
Tejun Heo
43829731dd workqueue: deprecate flush[_delayed]_work_sync()
flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
and convert all users to flush[_delayed]_work().

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant and the regular flushes guarantee that the work item is
not pending or running on any CPU on return, so there's no reason to
use the sync flushes at all and they're going away.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mattia Dongili <malattia@linux.it>
Cc: Kent Yoder <key@linux.vnet.ibm.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Bryan Wu <bryan.wu@canonical.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Avi Kivity <avi@redhat.com>
2012-08-20 14:51:24 -07:00
Sage Weil
d1c338a509 libceph: delay debugfs initialization until we learn global_id
The debugfs directory includes the cluster fsid and our unique global_id.
We need to delay the initialization of the debug entry until we have
learned both the fsid and our global_id from the monitor or else the
second client can't create its debugfs entry and will fail (and multiple
client instances aren't properly reflected in debugfs).

Reported by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-20 10:03:15 -07:00
Johannes Berg
dcf33963c4 mac80211: clean up ieee80211_subif_start_xmit
There's no need to carry around a return value
that is always NETDEV_TX_OK anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:46 +02:00
Johannes Berg
fe94fe05e9 mac80211: pass channel to ieee80211_send_probe_req
In multi-channel scenarios, the channel that we will
transmit a probe request on isn't always the current
channel (which will be NULL anyway) but will instead
be the channel that the AP is on. Pass the channel
to the ieee80211_send_probe_req() function so it can
be used in the different scenarios. The scan code
continues to pass the current channel, of course.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:45 +02:00
Johannes Berg
c0af07340a mac80211: convert ops checks to WARN_ON
There's no need to BUG_ON when a driver registers
invalid operations, warn and return an error.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:44 +02:00
Johannes Berg
9b86487043 mac80211: check operating channel in scan
The optimisation of scanning only on the current
channel should check the operating channel. Also
modify it to compare channel pointer rather than
the frequency.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:43 +02:00
Johannes Berg
4797c7ba93 mac80211: use RX status band instead of current band
Even for single-channel devices it is possible that we
switch the channel temporarily (e.g. for scanning) but
while doing so process a received frame that was still
received on the old channel, so checking the current
band is racy. Use the band from status instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:43 +02:00
Johannes Berg
3d01be72e6 mac80211: don't assume channel is set in tracing
With the move to multi-channel and away from
drv_config(), hw.conf.channel will not always
be set, only for devices using the current API
instead of the new channel context APIs. Check
the channel is set before adding its frequency
to the trace data.

Also break some overly long lines in the code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:42 +02:00
Johannes Berg
f9e6e95b63 mac80211: use oper_channel in rate init
Using hw.conf.channel is wrong as it could be the
temporary channel if the station is added from the
workqueue while the device is already on another
channel. Use oper_channel instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:41 +02:00
Johannes Berg
9e99a127b5 mac80211: remove freq/chantype from debugfs
You can now get these values through iw, and
they conflict with multi-channel work.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:40 +02:00
Johannes Berg
e31583cdf0 mac80211: remove almost unused local variable
In ieee80211_beacon_get_tim() we can use the
txrc.sband instead of a separate local variable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:39 +02:00
Johannes Berg
466f310d10 mac80211: mesh: don't use global channel type
Using local->_oper_channel_type in the mesh code is
completely wrong as this value is the combination
of the various interface channel types and can be
a different value from the mesh interface in case
there are multiple virtual interfaces.

Use sdata->vif.bss_conf.channel_type instead as it
tracks the per-vif channel type.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:38 +02:00
Johannes Berg
d348f69f59 mac80211: simplify buffers in aes_128_cmac_vector
There's no need to use a single scratch buffer and
calculate offsets into it, just use two separate
buffers for the separate variables.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:03:18 +02:00
Johannes Berg
6d71117a27 mac80211: add IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
Some devices like the current iwlwifi implementation
require that the P2P interface address match the P2P
Device address (only one P2P interface is supported.)
Add the HW flag IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
that allows drivers to request that P2P Interfaces
added while a P2P Device is active get the same MAC
address by default.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:23 +02:00
Johannes Berg
f142c6b906 mac80211: support P2P Device abstraction
After cfg80211 got a P2P Device abstraction, add
support to mac80211. Whether it really is supported
or not will depend on whether or not the driver has
support for it, but mac80211 needs to change to be
able to support drivers that need a P2P Device.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:22 +02:00
Johannes Berg
98104fdeda cfg80211: add P2P Device abstraction
In order to support using a different MAC address
for the P2P Device address we must first have a
P2P Device abstraction that can be assigned a MAC
address.

This abstraction will also be useful to support
offloading P2P operations to the device, e.g.
periodic listen for discoverability.

Currently, the driver is responsible for assigning
a MAC address to the P2P Device, but this could be
changed by allowing a MAC address to be given to
the NEW_INTERFACE command.

As it has no associated netdev, a P2P Device can
only be identified by its wdev identifier but the
previous patches allowed using the wdev identifier
in various APIs, e.g. remain-on-channel.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:21 +02:00
Johannes Berg
cc74c0c7d6 mac80211: make ieee80211_beacon_connection_loss_work static
There's no need to declare the function in the
header file since it's only used in a single
place, so make it static.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:50 +02:00
Johannes Berg
5bc1420b11 mac80211: check size of channel switch IE when parsing
The channel switch IE has a fixed size, so we can
discard it in parsing if it's not the right size
and use the right struct pointer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:50 +02:00
Johannes Berg
3049000b97 mac80211: fix CSA handling timer
The time until the channel switch is in TU,
not in milliseconds, so use TU_TO_EXP_TIME()
to correctly program the timer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:49 +02:00
Johannes Berg
57eebdf3c2 mac80211: clean up CSA handling code
Clean up the CSA handling code by moving some
of it out of the if and using a C99 initializer
for the struct passed to the driver method.

While at it, also add a comment that we should
wait for a beacon after switching the channel.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:48 +02:00
Johannes Berg
90bcf867ce mac80211: remove unneeded 'bssid' variable
There's no need to copy the BSSID just to print
it, remove the unnecessary variable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:47 +02:00
Johannes Berg
4c29867790 mac80211: support A-MPDU status reporting
Support getting A-MPDU status information from the
drivers and reporting it to userspace via radiotap
in the standard fields.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:53:22 +02:00
Johannes Berg
48613ece3d wireless: add radiotap A-MPDU status field
Define the A-MPDU status field in radiotap, also
update the radiotap parser for it and the MCS field
that was apparently missed last time.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:53:09 +02:00
Antonio Quartulli
e687f61eed mac80211: add supported rates change notification in IBSS
In IBSS it is possible that the supported rates set for a station changes over
time (e.g. it gets first initialised as an empty set because of no available
information about rates and updated later). In this case the driver has to be
notified about the change in order to update its internal table accordingly (if
needed).

This behaviour is needed by all those drivers that handle rc internally but
leave stations management to mac80211

Reported-by: Gui Iribarren <gui@altermundi.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
[Johannes - add docs, validate IBSS mode only, fix compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:31:43 +02:00
Thomas Pedersen
4bd4c2dd8e mac80211: clean up mpath_move_to_queue()
Use skb_queue_walk_safe instead, and fix a few issues:

	- didn't free old skbs on moving
	- didn't react to failed skb alloc
	- needlessly held a local pointer to the destination frame queue
	- didn't check destination queue length before adding skb

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:25:05 +02:00
Thomas Pedersen
b22bd5221c mac80211: use skb_queue_walk() in mesh_path_assign_nexthop
Since all we really want is just to iterate over all skbs, do just that
and avoid (de)queueing to a clusmy tmpq.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:25:05 +02:00
Eyal Shapira
aa7a00809c mac80211: avoid using synchronize_rcu in ieee80211_set_probe_resp
This could take a while (100ms+) and may delay sending assoc resp
in AP mode with WPS or P2P GO (as setting the probe resp takes place
there). We've encountered situations where the delay was big enough
to cause connection problems with devices like Galaxy Nexus.
Switch to using call_rcu with a free handler.

[Arik - rework to use plain buffer and instead of skb]

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:20:56 +02:00
Patrick McHardy
2834a6386b netfilter: nf_conntrack: remove unnecessary RTNL locking
Locking the rtnl was added to nf_conntrack_l{3,4}_proto_unregister()
for walking the network namespace list. This is not done anymore since
we have proper namespace support in the protocols now, so we don't
need to take the RTNL anymore.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:46:29 +02:00
Patrick McHardy
fe31d1a860 netfilter: sparse endian fixes
Fix a couple of endian annotation in net/netfilter:

net/netfilter/nfnetlink_acct.c:82:30: warning: cast to restricted __be64
net/netfilter/nfnetlink_acct.c:86:30: warning: cast to restricted __be64
net/netfilter/nfnetlink_cthelper.c:77:28: warning: cast to restricted __be16
net/netfilter/xt_NFQUEUE.c:46:16: warning: restricted __be32 degrades to integer
net/netfilter/xt_NFQUEUE.c:60:34: warning: restricted __be32 degrades to integer
net/netfilter/xt_NFQUEUE.c:68:34: warning: restricted __be32 degrades to integer
net/netfilter/xt_osf.c:272:55: warning: cast to restricted __be16

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:45:57 +02:00
Patrick McHardy
2dba62c30e netfilter: nfnetlink_log: fix NLA_PUT macro removal bug
Commit 1db20a52 (nfnetlink_log: Stop using NLA_PUT*().) incorrectly
converted a NLA_PUT_BE16 macro to nla_put_be32() in nfnetlink_log:

-               NLA_PUT_BE16(inst->skb, NFULA_HWTYPE, htons(skb->dev->type));
+               if (nla_put_be32(inst->skb, NFULA_HWTYPE, htons(skb->dev->type))

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:40:23 +02:00
Neal Cardwell
fae6ef87fa net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
This commit removes the sk_rx_dst_set calls from
tcp_create_openreq_child(), because at that point the icsk_af_ops
field of ipv6_mapped TCP sockets has not been set to its proper final
value.

Instead, to make sure we get the right sk_rx_dst_set variant
appropriate for the address family of the new connection, we have
tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
shortly after the call to tcp_create_openreq_child() returns.

This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
with the new approach.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Artem Savkov <artem.savkov@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 03:03:33 -07:00
Randy Dunlap
3de7a37b02 net/core/dev.c: fix kernel-doc warning
Fix kernel-doc warning:

Warning(net/core/dev.c:5745): No description found for parameter 'dev'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc:	"David S. Miller" <davem@davemloft.net>
Cc:	netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 03:00:55 -07:00
Patrick McHardy
9d7b0fc1ef net: ipv6: fix oops in inet_putpeer()
Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
initialized, causing a false positive result from inetpeer_ptr_is_peer(),
which in turn causes a NULL pointer dereference in inet_putpeer().

Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
EIP is at inet_putpeer+0xe/0x16
EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
 DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
 f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
 f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
 [<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
 [<c038d5f7>] dst_destroy+0x1d/0xa4
 [<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
 [<c0396870>] flow_cache_gc_task+0x85/0x9f
 [<c0142d2b>] process_one_work+0x122/0x441
 [<c043feb5>] ? apic_timer_interrupt+0x31/0x38
 [<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
 [<c0143e2d>] worker_thread+0x113/0x3cc

Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
properly initialize the dst's peer pointer.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:56:56 -07:00
Jesper Juhl
d92c7f8aab caif: Do not dereference NULL in chnl_recv_cb()
In net/caif/chnl_net.c::chnl_recv_cb() we call skb_header_pointer()
which may return NULL, but we do not check for a NULL pointer before
dereferencing it.
This patch adds such a NULL check and properly free's allocated memory
and return an error (-EINVAL) on failure - much better than crashing..

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:47:49 -07:00
David S. Miller
6c71bec66a Merge git://1984.lsi.us.es/nf
Pable Neira Ayuso says:

====================
The following five patches contain fixes for 3.6-rc, they are:

* Two fixes for message parsing in the SIP conntrack helper, from
  Patrick McHardy.

* One fix for the SIP helper introduced in the user-space cthelper
  infrastructure, from Patrick McHardy.

* fix missing appropriate locking while modifying one conntrack entry
  from the nfqueue integration code, from myself.

* fix possible access to uninitiliazed timer in the nf_conntrack
  expectation infrastructure, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:44:29 -07:00
Eric Leblond
c0de08d042 af_packet: don't emit packet on orig fanout group
If a packet is emitted on one socket in one group of fanout sockets,
it is transmitted again. It is thus read again on one of the sockets
of the fanout group. This result in a loop for software which
generate packets when receiving one.
This retransmission is not the intended behavior: a fanout group
must behave like a single socket. The packet should not be
transmitted on a socket if it originates from a socket belonging
to the same fanout group.

This patch fixes the issue by changing the transmission check to
take fanout group info account.

Reported-by: Aleksandr Kotov <a1k@mail.ru>
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:37:29 -07:00
Ying Xue
e6a04b1d3f tipc: eliminate configuration for maximum number of name publications
Gets rid of the need for users to specify the maximum number of
name publications supported by TIPC. TIPC now automatically provides
support for the maximum number of name publications to 65535.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
34f256cc79 tipc: eliminate configuration for maximum number of name subscriptions
Gets rid of the need for users to specify the maximum number of
name subscriptions supported by TIPC. TIPC now automatically provides
support for the maximum number of name subscriptions to 65535.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
61cdd4d80b tipc: add __read_mostly annotations to several global variables
Added to the following:

 - tipc_random
 - tipc_own_addr
 - tipc_max_ports
 - tipc_net_id
 - tipc_remote_management
 - handler_enabled

The above global variables are read often, but written rarely. Use
__read_mostly to prevent them being on the same cacheline as another
variable which is written to often, which would cause cacheline
bouncing.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
f046e7d9be tipc: convert tipc_nametbl_size type from variable to macro
There is nothing changing this variable dynamically, so change
it to a macro to make that more obvious when reading the code.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
379c0456af tipc: change tipc_net_start routine return value type
Since now tipc_net_start() always returns a success code - 0, its
return value type should be changed from integer to void, which can
avoid unnecessary check for its return value.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Ying Xue
381294331e tipc: manually inline single use media_name_valid routine
After eliminating the mechanism which checks whether all letters
in media name string are within a given character set, the
media_name_valid routine becomes trivial.  It is also only
used once, so it is unnecessary to keep it as a separate function.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Ying Xue
fc0739385b tipc: remove pointless name sanity check and tipc_alphabet array
There is no real reason to check whether all letters in the given
media name and network interface name are within the character set
defined in tipc_alphabet array. Even if we eliminate the checking,
the rest of checking conditions in tipc_enable_bearer() can ensure
we do not enable an invalid or illegal bearer.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Ying Xue
4225a398c1 tipc: fix lockdep warning during bearer initialization
When the lockdep validator is enabled, it will report the below
warning when we enable a TIPC bearer:

[ INFO: possible irq lock inversion dependency detected ]
---------------------------------------------------------
Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(ptype_lock);
                                local_irq_disable();
                                lock(tipc_net_lock);
                                lock(ptype_lock);
   <Interrupt>
   lock(tipc_net_lock);

  *** DEADLOCK ***

the shortest dependencies between 2nd lock and 1st lock:
  -> (ptype_lock){+.+...} ops: 10 {
[...]
SOFTIRQ-ON-W at:
                      [<c1089418>] __lock_acquire+0x528/0x13e0
                      [<c108a360>] lock_acquire+0x90/0x100
                      [<c1553c38>] _raw_spin_lock+0x38/0x50
                      [<c14651ca>] dev_add_pack+0x3a/0x60
                      [<c182da75>] arp_init+0x1a/0x48
                      [<c182dce5>] inet_init+0x181/0x27e
                      [<c1001114>] do_one_initcall+0x34/0x170
                      [<c17f7329>] kernel_init+0x110/0x1b2
                      [<c155b6a2>] kernel_thread_helper+0x6/0x10
[...]
   ... key      at: [<c17e4b10>] ptype_lock+0x10/0x20
   ... acquired at:
    [<c108a360>] lock_acquire+0x90/0x100
    [<c1553c38>] _raw_spin_lock+0x38/0x50
    [<c14651ca>] dev_add_pack+0x3a/0x60
    [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
    [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
    [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
    [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
    [<c148e802>] genl_rcv_msg+0x232/0x280
    [<c148d3f6>] netlink_rcv_skb+0x86/0xb0
    [<c148e5bc>] genl_rcv+0x1c/0x30
    [<c148d144>] netlink_unicast+0x174/0x1f0
    [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
    [<c1456bc1>] sock_aio_write+0x161/0x170
    [<c1135a7c>] do_sync_write+0xac/0xf0
    [<c11360f6>] vfs_write+0x156/0x170
    [<c11361e2>] sys_write+0x42/0x70
    [<c155b0df>] sysenter_do_call+0x12/0x38
[...]
}
  -> (tipc_net_lock){+..-..} ops: 4 {
[...]
    IN-SOFTIRQ-R at:
                     [<c108953a>] __lock_acquire+0x64a/0x13e0
                     [<c108a360>] lock_acquire+0x90/0x100
                     [<c15541cd>] _raw_read_lock_bh+0x3d/0x50
                     [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
                     [<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
                     [<c146a5fa>] __netif_receive_skb+0x22a/0x590
                     [<c146ab0b>] netif_receive_skb+0x2b/0xf0
                     [<c13c43d2>] pcnet32_poll+0x292/0x780
                     [<c146b00a>] net_rx_action+0xfa/0x1e0
                     [<c103a4be>] __do_softirq+0xae/0x1e0
[...]
}

>From the log, we can see three different call chains between
CPU0 and CPU1:

Time 0 on CPU0:

  kernel_init()->inet_init()->dev_add_pack()

At time 0, the ptype_lock is held by CPU0 in dev_add_pack();

Time 1 on CPU1:

  tipc_enable_bearer()->enable_bearer()->dev_add_pack()

At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
wants to take ptype_lock to register TIPC protocol handler into the
networking stack.  But the ptype_lock has been taken by dev_add_pack()
on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
busy looping.

Time 2 on CPU0:

  netif_receive_skb()->recv_msg()->tipc_recv_msg()

At time 2, an incoming TIPC packet arrives at CPU0, hence
tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
to hold tipc_net_lock.  At the moment, below scenario happens:

On CPU0, below is our sequence of taking locks:

  lock(ptype_lock)->lock(tipc_net_lock)

On CPU1, our sequence of taking locks looks like:

  lock(tipc_net_lock)->lock(ptype_lock)

Obviously deadlock may happen in this case.

But please note the deadlock possibly doesn't occur at all when the
first TIPC bearer is enabled.  Before enable_bearer() -- running on
CPU1 does not hold ptype_lock, so the TIPC receive handler (i.e.
recv_msg()) is not registered successfully via dev_add_pack(), so
the tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
message comes to CPU0. But when the second TIPC bearer is
registered, the deadlock can perhaps really happen.

To fix it, we will push the work of registering TIPC protocol
handler into workqueue context. After the change, both paths taking
ptype_lock are always in process contexts, thus, the deadlock should
never occur.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Ying Xue
fa7f86f1bb tipc: optimize the initialization of network device notifier
Ethernet media initialization is only done when TIPC is started or
switched to network mode. So the initialization of the network device
notifier structure can be moved out of this function and done
statically instead.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Pavel Emelyanov
fff3321d75 packet: Report fanout status via diag engine
Reported value is the same reported by the FANOUT getsockoption, but
unlike it, the absent fanout setup results in absent nlattr, rather
than in nlattr with zero value. This is done so, since zero fanout
report may mean both -- no fanout, and fanout with both id and type zero.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:23:14 -07:00
Pavel Emelyanov
16f01365fa packet: Report rings cfg via diag engine
One extension bit may result in two nlattrs -- one per ring type.
If some ring type is not configured, then the respective nlatts
will be empty.

The structure reported contains the data, that is given to the
corresponding ring setup socket option.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:23:14 -07:00
Dan Carpenter
5ef5d6c569 gre: information leak in ip6_tnl_ioctl()
There is a one byte hole between p->hop_limit and p->flowinfo where
stack memory is leaked to the user.  This was introduced in c12b395a46
"gre: Support GRE over IPv6".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-20 02:21:30 -07:00
Fan Du
56892261ed xfrm: Use rcu_dereference_bh to deference pointer protected by rcu_read_lock_bh
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 19:39:22 -07:00
Dan Carpenter
898132ae76 ipv6: move dereference after check in fl_free()
There is a dereference before checking for NULL bug here.  Generally
free() functions should accept NULL pointers.  For example, fl_create()
can pass a NULL pointer to fl_free() on the error path.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-16 16:04:42 -07:00
John Fastabend
476ad154f3 net: netprio: fix cgrp create and write priomap race
A race exists where creating cgroups and also updating the priomap
may result in losing a priomap update. This is because priomap
writers are not protected by rtnl_lock.

Move priority writer into rtnl_lock()/rtnl_unlock().

CC: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 14:56:11 -07:00
John Fastabend
48a87cc26c net: netprio: fd passed in SCM_RIGHTS datagram not set correctly
A socket fd passed in a SCM_RIGHTS datagram was not getting
updated with the new tasks cgrp prioidx. This leaves IO on
the socket tagged with the old tasks priority.

To fix this add a check in the scm recvmsg path to update the
sock cgrp prioidx with the new tasks value.

Thanks to Al Viro for catching this.

CC: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 14:56:11 -07:00
John Fastabend
f796c20cf6 net: netprio: fix files lock and remove useless d_path bits
Add lock to prevent a race with a file closing and also remove
useless and ugly sscanf code. The extra code was never needed
and the case it supposedly protected against is in fact handled
correctly by sock_from_file as pointed out by Al Viro.

CC: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 14:56:11 -07:00
Jason Wang
16c0b164bd act_mirred: do not drop packets when fails to mirror it
We drop packet unconditionally when we fail to mirror it. This is not intended
in some cases. Consdier for kvm guest, we may mirror the traffic of the bridge
to a tap device used by a VM. When kernel fails to mirror the packet in
conditions such as when qemu crashes or stop polling the tap, it's hard for the
management software to detect such condition and clean the the mirroring
before. This would lead all packets to the bridge to be dropped and break the
netowrk of other virtual machines.

To solve the issue, the patch does not drop packets when kernel fails to mirror
it, and only drop the redirected packets.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 14:54:44 -07:00
Dan Carpenter
02644a1745 sctp: fix bogus if statement in sctp_auth_recv_cid()
There is an extra semi-colon here, so we always return 0 instead of
calling __sctp_auth_cid().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 13:36:29 -07:00
Ulrich Weber
6932f119bd sctp: fix compile issue with disabled CONFIG_NET_NS
struct seq_net_private has no struct net
if CONFIG_NET_NS is not enabled

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 13:36:29 -07:00
Pablo Neira Ayuso
2614f86490 netfilter: nf_ct_expect: fix possible access to uninitialized timer
In __nf_ct_expect_check, the function refresh_timer returns 1
if a matching expectation is found and its timer is successfully
refreshed. This results in nf_ct_expect_related returning 0.
Note that at this point:

- the passed expectation is not inserted in the expectation table
  and its timer was not initialized, since we have refreshed one
  matching/existing expectation.

- nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
  timer is in some undefined state just after the allocation,
  until it is appropriately initialized.

This can be a problem for the SIP helper during the expectation
addition:

 ...
 if (nf_ct_expect_related(rtp_exp) == 0) {
         if (nf_ct_expect_related(rtcp_exp) != 0)
                 nf_ct_unexpect_related(rtp_exp);
 ...

Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
case that is detailed above. Then, if nf_ct_unexpect_related(rtcp_exp)
returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:

 spin_lock_bh(&nf_conntrack_lock);
 if (del_timer(&exp->timeout)) {
         nf_ct_unlink_expect(exp);
         nf_ct_expect_put(exp);
 }
 spin_unlock_bh(&nf_conntrack_lock);

Note that del_timer always returns false if the timer has been
initialized.  However, the timer was not initialized since setup_timer
was not called, therefore, the expectation timer remains in some
undefined state. If I'm not missing anything, this may lead to the
removal an unexistent expectation.

To fix this, the optimization that allows refreshing an expectation
is removed. Now nf_conntrack_expect_related looks more consistent
to me since it always add the expectation in case that it returns
success.

Thanks to Patrick McHardy for participating in the discussion of
this patch.

I think this may be the source of the problem described by:
http://marc.info/?l=netfilter-devel&m=134073514719421&w=2

Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-16 11:49:53 +02:00
Mathias Krause
43da5f2e0d net: fix info leak in compat dev_ifconf()
The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. Though, it fails to initialize the padding bytes inserted
for alignment and that for leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
2d8a041b7b ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
7b07f8eb75 dccp: fix info leak via getsockopt(DCCP_SOCKOPT_CCID_TX_INFO)
The CCID3 code fails to initialize the trailing padding bytes of struct
tfrc_tx_info added for alignment on 64 bit architectures. It that for
potentially leaks four bytes kernel stack via the getsockopt() syscall.
Add an explicit memset(0) before filling the structure to avoid the
info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
276bdb82de dccp: check ccid before dereferencing
ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with
a NULL ccid pointer leading to a NULL pointer dereference. This could
lead to a privilege escalation if the attacker is able to map page 0 and
prepare it with a fake ccid_ops pointer.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
3592aaeb80 llc: fix info leak via getsockname()
The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes kernel stack via the getsockname() syscall.

Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed by uaddr but memcpy() a local structure at the end of
the function that is properly initialized.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
04d4fbca10 l2tp: fix info leak via getsockname()
The L2TP code for IPv6 fails to initialize the l2tp_unused member of
struct sockaddr_l2tpip6 and that for leaks two bytes kernel stack via
the getsockname() syscall. Initialize l2tp_unused with 0 to avoid the
info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
792039c73c Bluetooth: L2CAP - Fix info leak via getsockname()
The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It that for leaks
two bytes kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
9344a97296 Bluetooth: RFCOMM - Fix info leak via getsockname()
The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It that for leaks one byte kernel stack
via the getsockname() syscall. Add an explicit memset(0) before filling
the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
f9432c5ec8 Bluetooth: RFCOMM - Fix info leak in ioctl(RFCOMMGETDEVLIST)
The RFCOMM code fails to initialize the two padding bytes of struct
rfcomm_dev_list_req inserted for alignment before copying it to
userland. Additionally there are two padding bytes in each instance of
struct rfcomm_dev_info. The ioctl() that for disclosures two bytes plus
dev_num times two bytes uninitialized kernel heap memory.

Allocate the memory using kzalloc() to fix this issue.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
9ad2de43f1 Bluetooth: RFCOMM - Fix info leak in getsockopt(BT_SECURITY)
The RFCOMM code fails to initialize the key_size member of struct
bt_security before copying it to userland -- that for leaking one
byte kernel stack. Initialize key_size with 0 to avoid the info
leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
3f68ba07b1 Bluetooth: HCI - Fix info leak via getsockname()
The HCI code fails to initialize the hci_channel member of struct
sockaddr_hci and that for leaks two bytes kernel stack via the
getsockname() syscall. Initialize hci_channel with 0 to avoid the
info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
e15ca9a0ef Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER)
The HCI code fails to initialize the two padding bytes of struct
hci_ufilter before copying it to userland -- that for leaking two
bytes kernel stack. Add an explicit memset(0) before filling the
structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
3c0c5cfdcd atm: fix info leak via getsockname()
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
e862f1a9b7 atm: fix info leak in getsockopt(SO_ATMPVC)
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
David S. Miller
2ea214929d Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This is a batch of updates intended for 3.7.  The ath9k, mwifiex,
and b43 drivers get the bulk of the commits this time, with a handful
of other driver bits thrown-in.  It is mostly just minor fixes and
cleanups, etc.

Also included is a Bluetooth pull, with a lot of refactoring.
Gustavo says:

	"These are the changes I queued for 3.7. There are a many
	small fixes/improvements by Andre Guedes. A l2cap channel
	refcounting refactor by Jaganath. Bluetooth sockets now
	appears in /proc/net, by Masatake Yamato and Sachin Kamat
	changes ours drivers to use devm_kzalloc()."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:26:05 -07:00
Razvan Ghitulete
1b3a6926fc net: remove wrong initialization for snd_wl1
The field tp->snd_wl1 is twice initialized, the second time
seems to be wrong as it may overwrite any update in tcp_ack.

Signed-off-by: Razvan Ghitulete <rghitulete@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:23:46 -07:00
Fan Du
65e0736bc2 xfrm: remove redundant parameter "int dir" in struct xfrm_mgr.acquire
Sematically speaking, xfrm_mgr.acquire is called when kernel intends to ask
user space IKE daemon to negotiate SAs with peers. IOW the direction will
*always* be XFRM_POLICY_OUT, so remove int dir for clarity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:13:30 -07:00
Stephen Hemminger
c03307eab6 bridge: fix rcu dereference outside of rcu_read_lock
Alternative solution for problem found by Linux Driver Verification
project (linuxtesting.org).

As it noted in the comment before the br_handle_frame_finish
function, this function should be called under rcu_read_lock.

The problem callgraph:
br_dev_xmit -> br_nf_pre_routing_finish_bridge_slow ->
 -> br_handle_frame_finish -> br_port_get_rcu -> rcu_dereference

And in this case there is no read-lock section.

Reported-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:09:41 -07:00
John W. Linville
16698918cd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-08-15 14:29:37 -04:00
Eric W. Biederman
e1fc3b14f9 sctp: Make sysctl tunables per net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:32:16 -07:00
Eric W. Biederman
f53b5b097e sctp: Push struct net down into sctp_verify_ext_param
Add struct net as a parameter to sctp_verify_param so it can be passed
to sctp_verify_ext_param where struct net will be needed when the sctp
tunables become per net tunables.

Add struct net as a parameter to sctp_verify_init so struct net can be
passed to sctp_verify_param.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
24cb81a6a9 sctp: Push struct net down into all of the state machine functions
There are a handle of state machine functions primarily those dealing
with processing INIT packets where there is neither a valid endpoint nor
a valid assoication from which to derive a struct net.  Therefore add
struct net * to the parameter list of sctp_state_fn_t and update all of
the state machine functions.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
e7ff4a7037 sctp: Push struct net down into sctp_in_scope
struct net will be needed shortly when the tunables are made per network
namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
89bf3450cb sctp: Push struct net down into sctp_transport_init
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
55e26eb95a sctp: Push struct net down to sctp_chunk_event_lookup
This trickles up through sctp_sm_lookup_event up to sctp_do_sm
and up further into sctp_primitiv_NAME before the code reaches
places where struct net can be reliably found.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
ebb7e95d93 sctp: Add infrastructure for per net sysctls
Start with an empty sctp_net_table that will be populated as the various
tunable sysctls are made per net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
b01a24078f sctp: Make the mib per network namespace
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:36 -07:00
Eric W. Biederman
bb2db45b54 sctp: Enable sctp in all network namespaces
- Fix the sctp_af operations to work in all namespaces
- Enable sctp socket creation in all network namespaces.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:29:59 -07:00
Eric W. Biederman
13d782f6b4 sctp: Make the proc files per network namespace.
- Convert all of the files under /proc/net/sctp to be per
  network namespace.

- Don't print anything for /proc/net/sctp/snmp except in
  the initial network namespaces as the snmp counters still
  have to be converted to be per network namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:29:53 -07:00
Eric W. Biederman
632c928a6a sctp: Move the percpu sockets counter out of sctp_proc_init
The percpu sctp socket counter has nothing at all to do with the sctp
proc files, and having it in the wrong initialization is confusing,
and makes network namespace support a pain.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:17:26 -07:00
Eric W. Biederman
2ce9550350 sctp: Make the ctl_sock per network namespace
- Kill sctp_get_ctl_sock, it is useless now.
- Pass struct net where needed so net->sctp.ctl_sock is accessible.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:17:26 -07:00
Eric W. Biederman
4db67e8086 sctp: Make the address lists per network namespace
- Move the address lists into struct net
- Add per network namespace initialization and cleanup
- Pass around struct net so it is everywhere I need it.
- Rename all of the global variable references into references
  to the variables moved into struct net

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:12:17 -07:00
Eric W. Biederman
4110cc255d sctp: Make the association hashtable handle multiple network namespaces
- Use struct net in the hash calculation
- Use sock_net(association.base.sk) in the association lookups.
- On receive calculate the network namespace from skb->dev.
- Pass struct net from receive down to the functions that actually
  do the association lookup.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
4cdadcbcb6 sctp: Make the endpoint hashtable handle multiple network namespaces
- Use struct net in the hash calculation
- Use sock_net(endpoint.base.sk) in the endpoint lookups.
- On receive calculate the network namespace from skb->dev.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
f1f4376307 sctp: Make the port hash table use struct net in it's key.
- Add struct net into the port hash table hash calculation
- Add struct net inot the struct sctp_bind_bucket so there
  is a memory of which network namespace a port is allocated in.
  No need for a ref count because sctp_bind_bucket only exists
  when there are sockets in the hash table and sockets can not
  change their network namspace, and sockets already ref count
  their network namespace.
- Add struct net into the key comparison when we are testing
  to see if we have found the port hash table entry we are
  looking for.

With these changes lookups in the port hash table becomes
safe to use in multiple network namespaces.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
26711a791e userns: xt_owner: Add basic user namespace support.
- Only allow adding matches from the initial user namespace
- Add the appropriate conversion functions to handle matches
  against sockets in other user namespaces.

Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:30 -07:00
Eric W. Biederman
da7428080a userns xt_recent: Specify the owner/group of ip_list_perms in the initial user namespace
xt_recent creates a bunch of proc files and initializes their uid
and gids to the values of ip_list_uid and ip_list_gid.  When
initialize those proc files convert those values to kuids so they
can continue to reside on the /proc inode.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jan Engelhardt <jengelh@medozas.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:29 -07:00
Eric W. Biederman
8c6e2a941a userns: Convert xt_LOG to print socket kuids and kgids as uids and gids
xt_LOG always writes messages via sb_add via printk.  Therefore when
xt_LOG logs the uid and gid of a socket a packet came from the
values should be converted to be in the initial user namespace.

Thus making xt_LOG as user namespace safe as possible.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:29 -07:00
Eric W. Biederman
a6c6796c71 userns: Convert cls_flow to work with user namespaces enabled
The flow classifier can use uids and gids of the sockets that
are transmitting packets and do insert those uids and gids
into the packet classification calcuation.  I don't fully
understand the details but it appears that we can depend
on specific uids and gids when making traffic classification
decisions.

To work with user namespaces enabled map from kuids and kgids
into uids and gids in the initial user namespace giving raw
integer values the code can play with and depend on.

To avoid issues of userspace depending on uids and gids in
packet classifiers installed from other user namespaces
and getting confused deny all packet classifiers that
use uids or gids that are not comming from a netlink socket
in the initial user namespace.

Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Changli Gao <xiaosuo@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:28 -07:00
Eric W. Biederman
af4c6641f5 net sched: Pass the skb into change so it can access NETLINK_CB
cls_flow.c plays with uids and gids.  Unless I misread that
code it is possible for classifiers to depend on the specific uid and
gid values.  Therefore I need to know the user namespace of the
netlink socket that is installing the packet classifiers.  Pass
in the rtnetlink skb so I can access the NETLINK_CB of the passed
packet.  In particular I want access to sk_user_ns(NETLINK_CB(in_skb).ssk).

Pass in not the user namespace but the incomming rtnetlink skb into
the the classifier change routines as that is generally the more useful
parameter.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:28 -07:00
Eric W. Biederman
9eea9515cb userns: nfnetlink_log: Report socket uids in the log sockets user namespace
At logging instance creation capture the peer netlink socket's user
namespace. Use the captured peer user namespace when reporting socket
uids to the peer.

The peer socket's user namespace is guaranateed to be valid until the user
closes the netlink socket.  nfnetlink_log removes instances during the final
close of a socket.  __build_packet_message does not get called after an
instance is destroyed.   Therefore it is safe to let the peer netlink socket
take care of the user namespace reference counting for us.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:27 -07:00
Eric W. Biederman
d06ca95643 userns: Teach inet_diag to work with user namespaces
Compute the user namespace of the socket that we are replying to
and translate the kuids of reported sockets into that user namespace.

Cc: Andrew Vagin <avagin@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:20 -07:00
Eric W. Biederman
3fbc290540 netlink: Make the sending netlink socket availabe in NETLINK_CB
The sending socket of an skb is already available by it's port id
in the NETLINK_CB.  If you want to know more like to examine the
credentials on the sending socket you have to look up the sending
socket by it's port id and all of the needed functions and data
structures are static inside of af_netlink.c.  So do the simple
thing and pass the sending socket to the receivers in the NETLINK_CB.

I intend to use this to get the user namespace of the sending socket
in inet_diag so that I can report uids in the context of the process
who opened the socket, the same way I report uids in the contect
of the process who opens files.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:49 -07:00
Eric W. Biederman
d13fda8564 userns: Convert net/ax25 to use kuid_t where appropriate
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:42 -07:00
Eric W. Biederman
4f82f45730 net ip6 flowlabel: Make owner a union of struct pid * and kuid_t
Correct a long standing omission and use struct pid in the owner
field of struct ip6_flowlabel when the share type is IPV6_FL_S_PROCESS.
This guarantees we don't have issues when pid wraparound occurs.

Use a kuid_t in the owner field of struct ip6_flowlabel when the
share type is IPV6_FL_S_USER to add user namespace support.

In /proc/net/ip6_flowlabel capture the current pid namespace when
opening the file and release the pid namespace when the file is
closed ensuring we print the pid owner value that is meaning to
the reader of the file.  Similarly use from_kuid_munged to print
uid values that are meaningful to the reader of the file.

This requires exporting pid_nr_ns so that ipv6 can continue to built
as a module.  Yoiks what silliness

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:25 -07:00
Eric W. Biederman
7064d16e16 userns: Use kgids for sysctl_ping_group_range
- Store sysctl_ping_group_range as a paire of kgid_t values
  instead of a pair of gid_t values.
- Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range
- For invalid cases reset to the default disabled state.

With the kgid_t conversion made part of the original value sanitation
from userspace understand how the code will react becomes clearer
and it becomes possible to set the sysctl ping group range from
something other than the initial user namespace.

Cc: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:10 -07:00
Eric W. Biederman
a7cb5a49bf userns: Print out socket uids in a user namespace aware fashion.
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:48:06 -07:00
Eric W. Biederman
976d020150 userns: Convert sock_i_uid to return a kuid_t
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:47:34 -07:00
Eric W. Biederman
d04a48b06d userns: Convert __dev_set_promiscuity to use kuids in audit logs
Cc: Klaus Heinrich Kiwi <klausk@br.ibm.com>
Cc: Eric Paris <eparis@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-08-14 21:42:40 -07:00
Eric W. Biederman
b2e4f544fd userns: Convert net/core/scm.c to use kuids and kgids
With the existence of kuid_t and kgid_t we can take this further
and remove the usage of struct cred altogether, ensuring we
don't get cache line misses from reference counts.   For now
however start simply and do a straight forward conversion
I can be certain is correct.

In cred_to_ucred use from_kuid_munged and from_kgid_munged
as these values are going directly to userspace and we want to use
the userspace safe values not -1 when reporting a value that does not
map.  The earlier conversion that used from_kuid was buggy in that
respect.  Oops.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:41:58 -07:00
Andrei Emeltchenko
09d5d4aa64 Bluetooth: trivial: Shorten variable scope
Make code more clear by moving sk and bt vars to the place where
they are actually used.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 01:14:53 -03:00
Andre Guedes
61a0cfb008 Bluetooth: Fix use-after-free bug in SMP
If SMP fails, we should always cancel security_timer delayed work.
Otherwise, security_timer function may run after l2cap_conn object
has been freed.

This patch fixes the following warning reported by ODEBUG:

WARNING: at lib/debugobjects.c:261 debug_print_object+0x7c/0x8d()
Hardware name: Bochs
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x27
Modules linked in: btusb bluetooth
Pid: 440, comm: kworker/u:2 Not tainted 3.5.0-rc1+ #4
Call Trace:
 [<ffffffff81174600>] ? free_obj_work+0x4a/0x7f
 [<ffffffff81023eb8>] warn_slowpath_common+0x7e/0x97
 [<ffffffff81023f65>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff811746b1>] debug_print_object+0x7c/0x8d
 [<ffffffff810394f0>] ? __queue_work+0x241/0x241
 [<ffffffff81174fdd>] debug_check_no_obj_freed+0x92/0x159
 [<ffffffff810ac08e>] slab_free_hook+0x6f/0x77
 [<ffffffffa0019145>] ? l2cap_conn_del+0x148/0x157 [bluetooth]
 [<ffffffff810ae408>] kfree+0x59/0xac
 [<ffffffffa0019145>] l2cap_conn_del+0x148/0x157 [bluetooth]
 [<ffffffffa001b9a2>] l2cap_recv_frame+0xa77/0xfa4 [bluetooth]
 [<ffffffff810592f9>] ? trace_hardirqs_on_caller+0x112/0x1ad
 [<ffffffffa001c86c>] l2cap_recv_acldata+0xe2/0x264 [bluetooth]
 [<ffffffffa0002b2f>] hci_rx_work+0x235/0x33c [bluetooth]
 [<ffffffff81038dc3>] ? process_one_work+0x126/0x2fe
 [<ffffffff81038e22>] process_one_work+0x185/0x2fe
 [<ffffffff81038dc3>] ? process_one_work+0x126/0x2fe
 [<ffffffff81059f2e>] ? lock_acquired+0x1b5/0x1cf
 [<ffffffffa00028fa>] ? le_scan_work+0x11d/0x11d [bluetooth]
 [<ffffffff81036fb6>] ? spin_lock_irq+0x9/0xb
 [<ffffffff81039209>] worker_thread+0xcf/0x175
 [<ffffffff8103913a>] ? rescuer_thread+0x175/0x175
 [<ffffffff8103cfe0>] kthread+0x95/0x9d
 [<ffffffff812c5054>] kernel_threadi_helper+0x4/0x10
 [<ffffffff812c36b0>] ? retint_restore_args+0x13/0x13
 [<ffffffff8103cf4b>] ? flush_kthread_worker+0xdb/0xdb
 [<ffffffff812c5050>] ? gs_change+0x13/0x13

This bug can be reproduced using hctool lecc or l2test tools and
bluetoothd not running.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 01:06:23 -03:00
Vinicius Costa Gomes
e6dd548b9a Bluetooth: Add type information to the hci_connect() debug statement
Now that we have a "connect" function for each link type, we should be
able to indentify which function is going to be called.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:21 -03:00
Vinicius Costa Gomes
4cd2d98340 Bluetooth: Simplify a the connection type handling
Now that we have separate ways of doing connections for each link type,
we can do better than an "if" statement to handle each link type.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:19 -03:00
Vinicius Costa Gomes
b7d839bfff Bluetooth: Refactor SCO connection into its own function
We can do the same that we did for the other link types, for SCO
connections. The only thing that's worth noting is that as SCO
links need an ACL link, this functions uses the function that adds
an ACL link.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:17 -03:00
Vinicius Costa Gomes
db4742756a Bluetooth: Refactor ACL connection into its own function
The hci_connect() function was starting to get too complicated to be
quickly understood. We can separate the creation of a new ACL
connection into its own function.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:16 -03:00
Vinicius Costa Gomes
d04aef4ccc Bluetooth: Refactor LE connection into its own function
The code that handles LE connection is already quite separated from
the rest of the connection procedure, so we can easily put it into
its own.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:14 -03:00
Vinicius Costa Gomes
1aef866968 Bluetooth: Rename LE and ACL connection functions
These names were causing much confusion, so we rename these functions
that send HCI commands to be more similar in naming to the actual HCI
commands that will be sent.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:12 -03:00
Vinicius Costa Gomes
57f5d0d1d9 Bluetooth: Remove some functions from being exported
Some connection related functions are only used inside hci_conn.c
so no need to have them exported.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 00:53:11 -03:00
David S. Miller
7bab3ae760 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
Alexey Khoroshilov provides a potential memory leak in rndis_wlan.

Bob Copeland gives us an ath5k fix for a lockdep problem.

Dan Carpenter fixes a signedness mismatch in at76c50x.

Felix Fietkau corrects a regression caused by an earlier commit that can
lead to an IRQ storm.

Lorenzo Bianconi offers a fix for a bad variable initialization in ath9k
that can cause it to improperly mark decrypted frames.

Rajkumar Manoharan fixes ath9k to prevent the btcoex time from running
when the hardware is asleep.

The remainder are Bluetooth fixes, about which Gustavo says:

	"Here goes some fixes for 3.6-rc1, there are a few fix to
	thte inquiry code by Ram Malovany, support for 2 new devices,
	and few others fixes for NULL dereference, possible deadlock
	and a memory leak."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 17:03:22 -07:00
Ben Hutchings
4acd4945cd ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lock
Cong Wang reports that lockdep detected suspicious RCU usage while
enabling IPV6 forwarding:

 [ 1123.310275] ===============================
 [ 1123.442202] [ INFO: suspicious RCU usage. ]
 [ 1123.558207] 3.6.0-rc1+ #109 Not tainted
 [ 1123.665204] -------------------------------
 [ 1123.768254] include/linux/rcupdate.h:430 Illegal context switch in RCU read-side critical section!
 [ 1123.992320]
 [ 1123.992320] other info that might help us debug this:
 [ 1123.992320]
 [ 1124.307382]
 [ 1124.307382] rcu_scheduler_active = 1, debug_locks = 0
 [ 1124.522220] 2 locks held by sysctl/5710:
 [ 1124.648364]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81768498>] rtnl_trylock+0x15/0x17
 [ 1124.882211]  #1:  (rcu_read_lock){.+.+.+}, at: [<ffffffff81871df8>] rcu_lock_acquire+0x0/0x29
 [ 1125.085209]
 [ 1125.085209] stack backtrace:
 [ 1125.332213] Pid: 5710, comm: sysctl Not tainted 3.6.0-rc1+ #109
 [ 1125.441291] Call Trace:
 [ 1125.545281]  [<ffffffff8109d915>] lockdep_rcu_suspicious+0x109/0x112
 [ 1125.667212]  [<ffffffff8107c240>] rcu_preempt_sleep_check+0x45/0x47
 [ 1125.781838]  [<ffffffff8107c260>] __might_sleep+0x1e/0x19b
[...]
 [ 1127.445223]  [<ffffffff81757ac5>] call_netdevice_notifiers+0x4a/0x4f
[...]
 [ 1127.772188]  [<ffffffff8175e125>] dev_disable_lro+0x32/0x6b
 [ 1127.885174]  [<ffffffff81872d26>] dev_forward_change+0x30/0xcb
 [ 1128.013214]  [<ffffffff818738c4>] addrconf_forward_change+0x85/0xc5
[...]

addrconf_forward_change() uses RCU iteration over the netdev list,
which is unnecessary since it already holds the RTNL lock.  We also
cannot reasonably require netdevice notifier functions not to sleep.

Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 17:02:12 -07:00
Pavel Emelyanov
eea68e2f1a packet: Report socket mclist info via diag module
The info is reported as an array of packet_diag_mclist structures. Each
includes not only the directly configured values (index, type, etc), but
also the "count".

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
8a360be0c5 packet: Report more packet sk info via diag module
This reports in one rtattr message all the other scalar values, that can be
set on a packet socket with setsockopt.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
96ec632714 packet: Diag core and basic socket info dumping
The diag module can be built independently from the af_packet.ko one,
just like it's done in unix sockets.

The core dumping message carries the info available at socket creation
time, i.e. family, type and protocol (in the same byte order as shown in
the proc file).

The socket inode number and cookie is reserved for future per-socket info
retrieving. The per-protocol filtering is also reserved for future by
requiring the sdiag_protocol to be zero.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
2787b04b6c packet: Introduce net/packet/internal.h header
The diag module will need to access some private packet_sock data, so
move it to a header in advance. This file will be shared between the
af_packet.c and the diag.c

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Ben Hutchings
aadf31de16 llc: Fix races between llc2 handler use and (un)registration
When registering the handlers, any state they rely on must be
completely initialised first.  When unregistering, we must wait until
they are definitely no longer running.  llc_rcv() must also avoid
reading the handler pointers again after checking for NULL.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:52:02 -07:00
Ben Hutchings
f4f8720feb llc2: Call llc_station_exit() on llc2_init() failure path
Otherwise the station packet handler will remain registered even though
the module is unloaded.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:51:40 -07:00
Ben Hutchings
6024935f5f llc2: Fix silent failure of llc_station_init()
llc_station_init() creates and processes an event skb with no effect
other than to change the state from DOWN to UP.  Allocation failure is
reported, but then ignored by its caller, llc2_init().  Remove this
possibility by simply initialising the state as UP.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:51:18 -07:00
Igor Maravic
ad5b310228 net: ipv4: fib_trie: Don't unnecessarily search for already found fib leaf
We've already found leaf, don't search for it again. Same is for fib leaf info.

Signed-off-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 15:02:20 -07:00
Priyanka Jain
418a99ac6a Replace rwlock on xfrm_policy_afinfo with rcu
xfrm_policy_afinfo is read mosly data structure.
Write on xfrm_policy_afinfo is done only at the
time of configuration.
So rwlocks can be safely replaced with RCU.

RCUs usage optimizes the performance.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:57:37 -07:00
Igor Maravic
4855d6f311 net: ipv6: proc: Fix error handling
Fix error handling in case making of dir dev_snmp6 failes

Signed-off-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:45:07 -07:00
Yan, Zheng
7bd86cc282 ipv4: Cache local output routes
Commit caacf05e5a causes big drop of UDP loop back performance.
The cause of the regression is that we do not cache the local output
routes. Each time we send a datagram from unconnected UDP socket,
the kernel allocates a dst_entry and adds it to the rt_uncached_list.
It creates lock contention on the rt_uncached_lock.

Reported-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:45:07 -07:00