This patch fixes UDP socket refcnt bugs in the pppol2tp driver.
A bug can cause a kernel stack trace when a tunnel socket is closed.
A way to reproduce the issue is to prepare the UDP socket for L2TP (by
opening a tunnel pppol2tp socket) and then close it before any L2TP
sessions are added to it. The sequence is:

Create UDP socket
Create tunnel pppol2tp socket to prepare UDP socket for L2TP
  pppol2tp_connect: session_id=0, peer_session_id=0
L2TP SCCRP control frame received (tunnel_id==0)
  pppol2tp_recv_core: sock_hold()
  pppol2tp_recv_core: sock_put
L2TP ZLB control frame received (tunnel_id=nnn)
  pppol2tp_recv_core: sock_hold()
  pppol2tp_recv_core: sock_put
Close tunnel management socket
  pppol2tp_release: session_id=0, peer_session_id=0
Close UDP socket
  udp_lib_close: BUG
The addition of sock_hold() in pppol2tp_connect() solves the problem.
For data frames, two sock_put() calls were added to plug a refcnt leak
per received data frame. The ref that is grabbed at the top of
pppol2tp_recv_core() must always be released, but this wasn't done for
accepted data frames or data frames discarded because of bad UDP
checksums. This leak meant that any UDP socket that had passed L2TP
data traffic (i.e. L2TP data frames, not just L2TP control frames)
using pppol2tp would not be released by the kernel.
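The data-frame rule above (every reference taken at the top of the receive path is dropped on every exit) can be sketched in userspace as follows. This is a toy model, not the driver code: struct toy_sock, enum toy_frame and toy_recv_core() are hypothetical names, and the refcount is a plain int rather than the kernel's atomic counter.

```c
/* Toy stand-in for struct sock: only the reference count is modelled. */
struct toy_sock {
    int refcnt;
};

static void toy_sock_hold(struct toy_sock *sk) { sk->refcnt++; }
static void toy_sock_put(struct toy_sock *sk)  { sk->refcnt--; }

/* Hypothetical frame classes; the real driver inspects the L2TP header. */
enum toy_frame {
    TOY_CONTROL,        /* handed back to UDP for the userspace daemon */
    TOY_DATA_OK,        /* accepted data frame */
    TOY_DATA_BAD_CSUM   /* data frame discarded on checksum failure */
};

/*
 * Returns 1 if the frame should be passed back to UDP, 0 if it was
 * consumed here. Each exit path releases the reference taken on entry.
 */
static int toy_recv_core(struct toy_sock *sk, enum toy_frame frame)
{
    toy_sock_hold(sk);              /* ref grabbed at the top */

    if (frame == TOY_DATA_BAD_CSUM) {
        toy_sock_put(sk);           /* discard path: must not leak */
        return 0;
    }

    if (frame == TOY_DATA_OK) {
        /* ... payload would be delivered to PPP here ... */
        toy_sock_put(sk);           /* accept path: must not leak */
        return 0;
    }

    toy_sock_put(sk);               /* control frame path */
    return 1;
}
```

After any mix of frames the count returns to its starting value, which is the property the leak violated.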
WARNING: at include/net/sock.h:435 udp_lib_unhash+0x117/0x120()
Pid: 1086, comm: openl2tpd Not tainted 2.6.33-rc1 #8
Call Trace:
[<c119e9b7>] ? udp_lib_unhash+0x117/0x120
[<c101b871>] ? warn_slowpath_common+0x71/0xd0
[<c119e9b7>] ? udp_lib_unhash+0x117/0x120
[<c101b8e3>] ? warn_slowpath_null+0x13/0x20
[<c119e9b7>] ? udp_lib_unhash+0x117/0x120
[<c11598a7>] ? sk_common_release+0x17/0x90
[<c11a5e33>] ? inet_release+0x33/0x60
[<c11577b0>] ? sock_release+0x10/0x60
[<c115780f>] ? sock_close+0xf/0x30
[<c106e542>] ? __fput+0x52/0x150
[<c106b68e>] ? filp_close+0x3e/0x70
[<c101d2e2>] ? put_files_struct+0x62/0xb0
[<c101eaf7>] ? do_exit+0x5e7/0x650
[<c1081623>] ? mntput_no_expire+0x13/0x70
[<c106b68e>] ? filp_close+0x3e/0x70
[<c101eb8a>] ? do_group_exit+0x2a/0x70
[<c101ebe1>] ? sys_exit_group+0x11/0x20
[<c10029b0>] ? sysenter_do_call+0x12/0x26
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take advantage of the new pernet automatic storage management,
and stop using compatibility network namespace functions.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this file, function names are otherwise used as pointers without &.
A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r@
identifier f;
@@
f(...) { ... }
@@
identifier r.f;
@@
- &f
+ f
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.
The goal is to transfer fields used at lookup time into the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by the rx path).
This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the place, in
each and every implementation.
Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
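As a rough illustration of why the type matters, the sketch below compares the same upper-bound check with a signed and an unsigned length. TOY_OPT_MAX and both function names are hypothetical and are not taken from any real setsockopt() implementation.

```c
/* Hypothetical upper bound for an option value. */
#define TOY_OPT_MAX 64

/* Signed length: a negative value slips past an upper-bound-only check. */
static int toy_len_ok_signed(int optlen)
{
    return optlen <= TOY_OPT_MAX;
}

/*
 * Unsigned length: the same bit pattern is now a very large value,
 * so the identical comparison rejects it without an extra test.
 */
static int toy_len_ok_unsigned(unsigned int optlen)
{
    return optlen <= TOY_OPT_MAX;
}
```

With the signed prototype every implementation needs its own "optlen < 0" test; with the unsigned one the bound check alone is enough.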
Signed-off-by: David S. Miller <davem@davemloft.net>
Failure to call unregister_pernet_gen_device() can exhaust memory
if module is loaded/unloaded many times.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
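A minimal userspace sketch of the accessor pattern, assuming toy types: struct toy_dst and struct toy_skb are stand-ins invented for illustration, and the refcount is a plain int. Only the shape of the three accessors mirrors the prototypes listed above.

```c
#include <stddef.h>

/* Toy stand-ins; the real structures carry far more state. */
struct toy_dst {
    int refcnt;
};

struct toy_skb {
    struct toy_dst *dst_slot;   /* callers use the accessors, not this field */
};

static void toy_dst_release(struct toy_dst *dst)
{
    if (dst)
        dst->refcnt--;
}

static struct toy_dst *toy_skb_dst(const struct toy_skb *skb)
{
    return skb->dst_slot;
}

static void toy_skb_dst_set(struct toy_skb *skb, struct toy_dst *dst)
{
    skb->dst_slot = dst;
}

/* Replaces the open-coded "release, then set to NULL" pair. */
static void toy_skb_dst_drop(struct toy_skb *skb)
{
    toy_dst_release(skb->dst_slot);
    skb->dst_slot = NULL;
}
```

Hiding the field behind accessors is what allows its storage to change later without touching every caller.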
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix these sparse warnings:
drivers/net/ppp_generic.c:919:43: warning: incorrect type in argument 1 (different signedness)
drivers/net/pppoe.c:1195:43: warning: incorrect type in argument 1 (different signedness)
drivers/net/pppol2tp.c:2666:43: warning: incorrect type in argument 1 (different signedness)
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPP is modular code so no initdata on netns hooks.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PDE_NET usage in driver code is a sign and, indeed, switching
to seq_open_net/seq_release_net saves code and fixes bogus things, like
user-triggerable BUG_ON(!net) after maybe_get_net, and NULLifying ->private.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In PPPo[E|L2TP] we can explicitly specify which net namespace
we're going to use for channels, so do that.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Each tunnel and its lock now live inside their own namespace.
- The pppox code allows per-namespace sockets to be created for
  both the PX_PROTO_OE and PX_PROTO_OL2TP protocols. Since
  pppox_create now supports net namespaces, any new PPPo... protocols
  (if there ever are any) should support net namespaces too; otherwise
  an explicit check for &init_net would be needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pppol2tp driver has had broken UDP checksum code for a long
time. This patch fixes it. If UDP checksums are enabled in the
tunnel's UDP socket, the L2TP driver now properly validates the
checksum on receive and fills in the checksum on transmit. If the
network device has hardware checksum support and it is enabled, the
hardware is used instead of generating/checking the checksum in software.
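For readers unfamiliar with the software path, here is a sketch of the 16-bit ones' complement Internet checksum (RFC 1071) that UDP uses. It is not the driver's code: toy_inet_csum() is a hypothetical name, and a real UDP checksum also covers a pseudo-header, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Ones' complement sum over big-endian 16-bit words, folded to 16 bits
 * and inverted. A trailing odd byte is treated as the high byte of a
 * final word padded with zero.
 */
static uint16_t toy_inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)(data[0] << 8);

    /* Fold carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

On transmit the result is stored in the header; on receive, running the same sum over data that already includes a correct checksum gives 0. For the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7 the function returns 0x220d.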
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pppol2tp_sock_to_session() does a sock_hold() if the session to release
is not NULL.
Signed-off-by: Frédéric Moulins <frederic.moulins@alsatis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This minor cleanup simplifies later changes which will convert
struct sk_buff and friends over to using struct list_head.
Signed-off-by: David S. Miller <davem@davemloft.net>
The drivers below do not use LINUX_VERSION_CODE nor KERNEL_VERSION.
drivers/net/acenic.c
drivers/net/bnx2x_link.c
drivers/net/bnx2x_main.c
drivers/net/cpmac.c
drivers/net/gianfar_sysfs.c
drivers/net/ipg.h
drivers/net/ppp_mppe.c
drivers/net/pppol2tp.c
drivers/net/r6040.c
drivers/net/sh_eth.c
drivers/net/sky2.c
drivers/net/tehuti.h
drivers/net/typhoon.c
This patch removes the said #include <linux/version.h>.
Signed-off-by: Huang Weiyi <hwy@cn.fujitsu.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch fixes a potential memory corruption in
pppol2tp_recvmsg(). If skb->len is bigger than the caller's buffer
length, memcpy_toiovec() will go into uninitialized data on the kernel
heap, interpret it as an iovec and start modifying memory.
The fix is to change the memcpy_toiovec() call to
skb_copy_datagram_iovec() so that paged packets (rare for PPPOL2TP)
are handled properly. Also check that the caller's buffer is big
enough for the data and set the MSG_TRUNC flag if it is not so.
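The length check can be sketched in userspace like this. toy_recv_copy() and TOY_MSG_TRUNC are hypothetical names used only for illustration; the real handler copies into an iovec with skb_copy_datagram_iovec() rather than into a flat buffer.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative flag value; not the kernel's MSG_TRUNC definition. */
#define TOY_MSG_TRUNC 0x20

/*
 * Copy at most buf_len bytes of the packet into the caller's buffer.
 * If the packet is longer, copy what fits and report truncation
 * through *flags instead of writing past the buffer.
 */
static size_t toy_recv_copy(const unsigned char *pkt, size_t pkt_len,
                            unsigned char *buf, size_t buf_len,
                            int *flags)
{
    size_t copied = pkt_len;

    if (copied > buf_len) {
        copied = buf_len;
        *flags |= TOY_MSG_TRUNC;
    }
    memcpy(buf, pkt, copied);
    return copied;
}
```

The key point is that the copy length is clamped before the copy happens, so an oversized packet can no longer drive the copy beyond what the caller supplied.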
Reported-by: Ilja <ilja@netric.org>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some problems have been experienced in the field which cause an oops
in the pppol2tp driver if L2TP tunnels fail while passing data.
The pppol2tp driver uses private data that is referenced via the
sk->sk_user_data of its UDP and PPPoL2TP sockets. This patch makes
sure that the driver uses sock_hold() when it holds a reference to the
sk pointer. This affects its sendmsg(), recvmsg(), getname(),
[gs]etsockopt() and ioctl() handlers.
Tested by ISP where problem was seen. System has been up 10 days with
no oops since running this patch. Without the patch, an oops would
occur every 1-2 days.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an L2TP daemon closes a tunnel socket while packets are queued in
the tunnel's reorder queue, a kernel warning is logged because the
socket is closed while skbs are still referencing it. The fix is to
purge the queue in the socket's release handler.
WARNING: at include/net/sock.h:351 udp_lib_unhash+0x41/0x68()
Pid: 12998, comm: openl2tpd Not tainted 2.6.25 #8
[<c0423c58>] warn_on_slowpath+0x41/0x51
[<c05d33a7>] udp_lib_unhash+0x41/0x68
[<c059424d>] sk_common_release+0x23/0x90
[<c05d16be>] udp_lib_close+0x8/0xa
[<c05d8684>] inet_release+0x42/0x48
[<c0592599>] sock_release+0x14/0x60
[<c059299f>] sock_close+0x29/0x30
[<c046ef52>] __fput+0xad/0x15b
[<c046f1d9>] fput+0x17/0x19
[<c046c8c4>] filp_close+0x50/0x5a
[<c046da06>] sys_close+0x69/0x9f
[<c04048ce>] syscall_call+0x7/0xb
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A user reported seeing occasional bugs such as the following when
using the L2TP driver.
SKB BUG: Invalid truesize (272) len=72, sizeof(sk_buff)=208
When L2TP adds its header in the transmit path, it might need to
increase the headroom of the skb. In some cases, the increased
headroom trips a kernel bug when the skb is freed because the skb has
grown beyond its truesize value. The fix is to increase the truesize
by the amount of headroom added, after orphaning the skb.
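The arithmetic behind the reported message can be sketched as below. The 208-byte structure size and the 272/72 figures come from the report above; the split into a 64-byte payload plus 8 added bytes, and all toy_ names, are assumptions made for illustration. The sketch also treats the added headroom and the pushed header as the same size, which the real code does not require.

```c
#include <stdbool.h>

/* sizeof(sk_buff) as printed in the report above. */
#define TOY_SKB_STRUCT_SIZE 208u

struct toy_skb {
    unsigned int truesize;  /* memory accounted to this buffer */
    unsigned int len;       /* bytes of packet data */
};

/* The sanity check that fires: truesize must cover struct plus data. */
static bool toy_truesize_ok(const struct toy_skb *skb)
{
    return skb->truesize >= TOY_SKB_STRUCT_SIZE + skb->len;
}

/*
 * Grow the buffer by 'extra' bytes of headroom and push a header of
 * the same size. With fix_truesize set, the accounting grows too.
 */
static void toy_add_header(struct toy_skb *skb, unsigned int extra,
                           bool fix_truesize)
{
    skb->len += extra;
    if (fix_truesize)
        skb->truesize += extra;
}
```

Starting from truesize=272 and len=64, adding 8 bytes without the fix gives len=72 against truesize=272, and 208 + 72 = 280 exceeds 272, matching the message. With the fix, truesize becomes 280 and the check passes.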
While here, fix a misleading comment.
Thanks to Iouri Kharon <bc-info@styx.cabel.net> for the initial
report and testing the fix.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If session is NULL, it is not possible to access its name field. So I
have split apart the printing of the error message to drop the
printing of the name field in this case.
The macro PRINTK actually only evaluates its arguments starting with
the third one if the bitwise conjunction of the first two is non-zero.
Normally, this conjunction would only be non-zero if debugging mode
were turned on, but when session is NULL, the first argument in both
the old and new code is -1, and thus the bitwise conjunction is true.
Perhaps a different strategy is desired, such as using tunnel->debug,
which session->debug is initialized to, but tunnel can also be NULL,
so this does not completely solve the problem.
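The macro behaviour described above can be reproduced with a small userspace sketch. TOY_PRINTK, TOY_MSG_DATA, struct toy_session and the toy_emitted counter are hypothetical; only the "mask & type gates the remaining arguments" shape follows the description.

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_MSG_DATA 0x4

static int toy_emitted;     /* counts messages that were actually printed */

/* Arguments after the first two are only evaluated if mask & type != 0. */
#define TOY_PRINTK(mask, type, ...)             \
    do {                                        \
        if ((mask) & (type)) {                  \
            toy_emitted++;                      \
            printf(__VA_ARGS__);                \
        }                                       \
    } while (0)

struct toy_session {
    int debug;
    const char *name;
};

static void toy_report(const struct toy_session *session)
{
    if (session == NULL) {
        /* Mask of -1 always matches, so nothing here may touch session. */
        TOY_PRINTK(-1, TOY_MSG_DATA, "no session context\n");
        return;
    }
    TOY_PRINTK(session->debug, TOY_MSG_DATA, "%s: event\n", session->name);
}
```

Because -1 has every bit set, the NULL branch always prints, which is why the name field had to be dropped from that message instead of relying on the mask to suppress it.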
This problem was found using the following semantic match
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E, E1;
identifier f;
statement S1,S2,S3;
@@
* if (E == NULL)
{
... when != if (E == NULL) S1 else S2
when != E = E1
* E->f
... when any
return ...;
}
else S3
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
are set up before gluing the PDE to the main tree.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When walking a session's packet reorder queue, use
skb_queue_walk_safe() since the list could be modified inside the
loop.
Rearrange the unlinking of skbs from the reorder queue such that it is
done while the queue lock is held in pppol2tp_recv_dequeue() when
walking the skb list.
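Why a "safe" walk is needed when entries are unlinked inside the loop can be shown with a toy circular list. The names below (struct toy_node, toy_queue_walk_safe, toy_dequeue_upto) are hypothetical and only imitate the shape of the kernel helper; locking is omitted entirely.

```c
#include <stddef.h>

/* Circular doubly linked list with a sentinel head, like an skb queue. */
struct toy_node {
    struct toy_node *next, *prev;
    int seq;
};

static void toy_queue_init(struct toy_node *head)
{
    head->next = head->prev = head;
    head->seq = -1;
}

static void toy_queue_tail(struct toy_node *head, struct toy_node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void toy_unlink(struct toy_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;   /* a plain walk would now lose its place */
}

/* Remember the successor before the body runs, so n may be unlinked. */
#define toy_queue_walk_safe(head, n, tmp)               \
    for ((n) = (head)->next, (tmp) = (n)->next;         \
         (n) != (head);                                 \
         (n) = (tmp), (tmp) = (n)->next)

/* Unlink every entry whose sequence number is <= max_seq. */
static int toy_dequeue_upto(struct toy_node *head, int max_seq)
{
    struct toy_node *n, *tmp;
    int removed = 0;

    toy_queue_walk_safe(head, n, tmp) {
        if (n->seq <= max_seq) {
            toy_unlink(n);
            removed++;
        }
    }
    return removed;
}
```

In the driver the equivalent walk and unlink happen with the queue lock held, which this sketch does not model.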
A version of this patch was suggested by Jarek Poplawski.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix locking issues in the pppol2tp driver which can cause a kernel
crash on SMP boxes. There were two problems:

1. The driver was violating read_lock() and write_lock() scheduling
   rules because it wasn't using softirq-safe locks in softirq
   contexts. So we now consistently use the _bh variants of the lock
   functions.

2. The driver was calling sk_dst_get() in pppol2tp_xmit() which was
   taking sk_dst_lock in softirq context. We now call __sk_dst_get().
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every skb removed from session->reorder_q needs sock_put().
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/pppol2tp.c: In function `pppol2tp_seq_tunnel_show':
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 4)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 5)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 6)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 7)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 8)
drivers/net/pppol2tp.c:2295: warning: long long unsigned int format, __u64 arg (arg 9)
drivers/net/pppol2tp.c: In function `pppol2tp_seq_session_show':
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 5)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 6)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 7)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 8)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 9)
drivers/net/pppol2tp.c:2328: warning: long long unsigned int format, __u64 arg (arg 10)
Not all platforms implement u64 as unsigned long long, e.g. powerpc.
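A portable way to silence this class of warning is to cast to the type the format string names. The sketch below uses userspace uint64_t and snprintf(); toy_format_stats() is a hypothetical helper, not the seq_file code in the driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * uint64_t may be "unsigned long" or "unsigned long long" depending on
 * the platform, so cast explicitly to match the %llu conversion.
 */
static int toy_format_stats(char *buf, size_t len,
                            uint64_t tx_packets, uint64_t rx_packets)
{
    return snprintf(buf, len, "%llu/%llu",
                    (unsigned long long)tx_packets,
                    (unsigned long long)rx_packets);
}
```

The cast costs nothing at run time on platforms where the types already match and keeps the format string identical everywhere.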
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_PROC_FS is not set and CONFIG_PPPOL2TP is set,
we have the following warning in build:
drivers/net/pppol2tp.c: In function 'pppol2tp_init':
drivers/net/pppol2tp.c:2472: warning: label
'out_unregister_pppox_proto' defined but not used
This patch fixes this warning by adding the appropriate #ifdef.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes made on 18-sep to fix skb handling in the pppol2tp driver
broke the transmit and receive paths. Users are only running into this
now because distros are now using 2.6.23 and I must have messed up
when I tested the change.
For receive, we now do our own calculation of how much to pull from
the skb (variable length L2TP header) rather than using
skb_transport_offset(). Also, if the skb isn't a data packet, it must
be passed back to UDP with skb->data pointing to the UDP header.
For transmit, make sure skb->sk is set up because ip_queue_xmit()
needs it.
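The receive-side calculation of a variable-length L2TP header can be sketched as follows. The flag bit values and field order follow the L2TPv2 header layout in RFC 2661; the function and macro names are hypothetical, and this is not the driver's own parsing code.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits in the first 16-bit word of an L2TPv2 header (RFC 2661). */
#define TOY_L2TP_FLAG_L 0x4000u /* Length field present */
#define TOY_L2TP_FLAG_S 0x0800u /* Ns and Nr fields present */
#define TOY_L2TP_FLAG_O 0x0200u /* Offset Size field present */

static uint16_t toy_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the number of bytes to pull before the payload starts,
 * or 0 if the header does not fit in the 'avail' bytes supplied.
 */
static size_t toy_l2tp_hdr_len(const uint8_t *pkt, size_t avail)
{
    size_t len = 2;             /* flags/version word */
    uint16_t flags;

    if (avail < len)
        return 0;
    flags = toy_be16(pkt);

    if (flags & TOY_L2TP_FLAG_L)
        len += 2;               /* optional Length */
    len += 4;                   /* Tunnel ID + Session ID */
    if (flags & TOY_L2TP_FLAG_S)
        len += 4;               /* optional Ns + Nr */
    if (flags & TOY_L2TP_FLAG_O) {
        if (avail < len + 2)
            return 0;
        /* Offset Size field plus the padding it announces. */
        len += 2 + toy_be16(pkt + len);
    }

    return len <= avail ? len : 0;
}
```

A header with no optional fields is 6 bytes; with both Length and Ns/Nr it is 12. The offset value comes from the peer, so it is bounds-checked against the available data before being trusted.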
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.
Besides, fix the checkpatch.pl warnings about using the
assignments inside if-s.
This patch is rather big, and it is a part of the previous one.
I split it to make the patches more readable. I hope
this particular split helped.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch passes in the namespace a new socket should be created in
and has the socket code do the appropriate reference counting. By
virtue of this all socket create methods are touched. In addition
the socket create methods are modified so that they will fail if
you attempt to create a socket in a non-default network namespace.
Failing if we attempt to create a socket outside of the default
network namespace ensures that as we incrementally make the network stack
network namespace aware we will not export functionality that someone
has not audited and made certain is network namespace safe.
This allows us to partially enable network namespaces before all of the
exotic protocols are supported.
Any protocol layers I have missed will fail to compile because I now
pass an extra parameter into the socket creation code.
[ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to handle multiple network namespaces.
Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes pppol2tp_xmit call skb_cow_head so that we don't modify
cloned skb data. It also gets rid of skb2, since we only need to preserve the
original skb for congestion notification, which is only applicable for
ppp_async and ppp_sync.
The other semantic change made here is the removal of socket accounting
for data transmitted out of pppol2tp_xmit. The original code leaked any
existing socket skb accounting. We could fix this by dropping the
original skb owner. However, this is undesirable as the packet has not
physically left the host yet.
In fact, all other tunnels in the kernel do not account skb's passing
through to their own socket. In particular, ESP over UDP does not do
so and it is the closest tunnel type to PPPoL2TP. So this patch simply
removes the socket accounting in pppol2tp_xmit. The accounting still
applies to control packets of course.
I've also added a reminder that the outgoing checksum here doesn't work.
I suppose existing deployments don't actually enable checksums.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function pppol2tp_recv_core doesn't handle non-linear packets properly.
It also fails to check the remote offset field.
This patch fixes these problems. It also removes an unnecessary check on
the UDP header which has already been performed by the UDP layer.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the addition of UDP-Lite we need to refine the socket check so
that only genuine UDP sockets are allowed through.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Address of auto variable is not a userland pointer. A good thing, too,
since if pppol2tp_tunnel_getsockopt() would _really_ get a userland pointer
as argument, it would be an instant roothole...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reset netfilter data and IP CB, fix dst_entry leak.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't use skb->len after passing it to ip_queue_xmit.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver handles only L2TP data frames; control frames are handled
by a userspace application. It implements L2TP using the PPPoX socket
family. There is a PPPoX socket for each L2TP session in an L2TP
tunnel. PPP data within each session is passed through the kernel's
PPP subsystem via this driver. Kernel parameters of each socket can be
read or modified using ioctl() or [gs]etsockopt() calls.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>