When unloading a protocol module nf_ct_iterate_cleanup() is used to
remove all conntracks using the protocol from the bysource hash and
clean their NAT sections. Since the conntrack isn't actually killed,
the NAT callback is invoked twice, once for each direction, which
causes an oops when trying to delete it from the bysource hash for
the second time.
The same oops can also happen when removing both an L3 and L4 protocol
since the cleanup function doesn't check whether the conntrack has
already been cleaned up.
Pid: 4052, comm: modprobe Not tainted 3.6.0-rc3-test-nat-unload-fix+ #32 Red Hat KVM
RIP: 0010:[<ffffffffa002c303>] [<ffffffffa002c303>] nf_nat_proto_clean+0x73/0xd0 [nf_nat]
RSP: 0018:ffff88007808fe18 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800728550c0 RCX: ffff8800756288b0
RDX: dead000000200200 RSI: ffff88007808fe88 RDI: ffffffffa002f208
RBP: ffff88007808fe28 R08: ffff88007808e000 R09: 0000000000000000
R10: dead000000200200 R11: dead000000100100 R12: ffffffff81c6dc00
R13: ffff8800787582b8 R14: ffff880078758278 R15: ffff88007808fe88
FS: 00007f515985d700(0000) GS:ffff88007cd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f515986a000 CR3: 000000007867a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 4052, threadinfo ffff88007808e000, task ffff8800756288b0)
Stack:
ffff88007808fe68 ffffffffa002c290 ffff88007808fe78 ffffffff815614e3
ffffffff00000000 00000aeb00000246 ffff88007808fe68 ffffffff81c6dc00
ffff88007808fe88 ffffffffa00358a0 0000000000000000 000000000040f5b0
Call Trace:
[<ffffffffa002c290>] ? nf_nat_net_exit+0x50/0x50 [nf_nat]
[<ffffffff815614e3>] nf_ct_iterate_cleanup+0xc3/0x170
[<ffffffffa002c55a>] nf_nat_l3proto_unregister+0x8a/0x100 [nf_nat]
[<ffffffff812a0303>] ? compat_prepare_timeout+0x13/0xb0
[<ffffffffa0035848>] nf_nat_l3proto_ipv4_exit+0x10/0x23 [nf_nat_ipv4]
...
To fix this,
- check whether the conntrack has already been cleaned up in
nf_nat_proto_clean
- change nf_ct_iterate_cleanup() to only invoke the callback function
once for each conntrack (IP_CT_DIR_ORIGINAL).
The second change doesn't affect other callers since when conntracks are
actually killed, both directions are removed from the hash immediately
and the callback is already only invoked once. If it is not killed, the
second callback invocation will always return the same decision not to
kill it.
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
net/netfilter/nfnetlink_log.c
net/netfilter/xt_LOG.c
Rather easy conflict resolution, the 'net' tree had bug fixes to make
sure we checked if a socket is a time-wait one or not and elide the
logging code if so.
Whereas on the 'net-next' side we are calculating the UID and GID from
the creds using different interfaces due to the user namespace changes
from Eric Biederman.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains four Netfilter updates, mostly targeting
to fix issues added with IPv6 NAT, and one little IPVS update for net-next:
* Remove unneeded conditional free of skb in nfnetlink_queue, from
Wei Yongjun.
* One semantic path from coccinelle detected the use of list_del +
INIT_LIST_HEAD, instead of list_del_init, again from Wei Yongjun.
* Fix out-of-bound memory access in the NAT address selection, from
Florian Westphal. This was introduced with the IPv6 NAT patches.
* Two fixes for crashes that were introduced in the recently merged
IPv6 NAT support, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
(c7232c9 netfilter: add protocol independent NAT core) added
incorrect locking for the module auto-load case in ctnetlink_parse_nat.
That function is always called from ctnetlink_create_conntrack which
requires no locking.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
auto75914331@hushmail.com reports that iptables does not correctly
output the KERN_<level>.
$IPTABLES -A RULE_0_in -j LOG --log-level notice --log-prefix "DENY in: "
result with linux 3.6-rc5
Sep 12 06:37:29 xxxxx kernel: <5>DENY in: IN=eth0 OUT= MAC=.......
result with linux 3.5.3 and older:
Sep 9 10:43:01 xxxxx kernel: DENY in: IN=eth0 OUT= MAC......
commit 04d2c8c83d
("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern")
updated the syslog header style but did not update netfilter uses.
Do so.
Use KERN_SOH and string concatenation instead of "%c" KERN_SOH_ASCII
as suggested by Eric Dumazet.
Signed-off-by: Joe Perches <joe@perches.com>
cc: auto75914331@hushmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.
I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.
I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using list_del_init() instead of list_del() + INIT_LIST_HEAD().
spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We spare nothing by not validating the sequence number of dataless
ACK packets and enabling it makes harder off-path attacks.
See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Clients should not send such packets. By accepting them, we open
up a hole by wich ephemeral ports can be discovered in an off-path
attack.
See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
include/linux/jhash.h:138:16: warning: array subscript is above array bounds
[jhash2() expects the number of u32 in the key]
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).
Suggested by David S. Miller.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sami Farin reported crashes in xt_LOG because it assumes skb->sk is a
full blown socket.
Since (41063e9 ipv4: Early TCP socket demux), we can have skb->sk
pointing to a timewait socket.
Same fix is needed in nfnetlink_log.
Diagnosed-by: Florian Westphal <fw@strlen.de>
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(c7232c9 netfilter: add protocol independent NAT core) introduced a
problem that leads to crashing during boot due to NULL pointer
dereference. It seems that xt_nat calls xt_register_target() before
xt_init():
net/netfilter/x_tables.c:static struct xt_af *xt; is NULL and we crash on
xt_register_target(struct xt_target *target)
{
u_int8_t af = target->family;
int ret;
ret = mutex_lock_interruptible(&xt[af].mutex);
...
Fix this by changing the linking order, to make sure that x_tables
comes before xt_nat.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since 'list_for_each_continue_rcu' has already been replaced by
'list_for_each_entry_continue_rcu', pass 'list_head' to nf_queue() as a
parameter can not benefit us any more.
This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of
nf_queue() and __nf_queue() to save code.
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since 'list_for_each_continue_rcu' has already been replaced by
'list_for_each_entry_continue_rcu', pass 'list_head' to nf_iterate() as a
parameter can not benefit us any more.
This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of
nf_iterate() to save code.
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It was scheduled to be removed for a long time.
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netfilter@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the new nf_ct_timeout_lookup function to encapsulate
the timeout policy attachment that is called in the nf_conntrack_in
path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds xt_ct_set_helper and xt_ct_set_timeout to reduce
the size of xt_ct_tg_check.
This aims to improve code mantainability by splitting xt_ct_tg_check
in smaller chunks.
Suggested by Eric Dumazet.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch fixes compilation warnings in xt_socket with gcc-4.7.
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:265:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:265:9: note: ‘dport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:264:27: note: ‘saddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:264:19: note: ‘daddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_match.isra.4’:
include/net/netfilter/nf_tproxy_core.h:75:2: warning: ‘protocol’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:113:5: note: ‘protocol’ was declared here
In file included from include/net/tcp.h:37:0,
from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:45: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:112:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:106:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:112:9: note: ‘dport’ was declared here
In file included from include/net/tcp.h:37:0,
from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:15: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:111:16: note: ‘saddr’ was declared here
In file included from include/net/tcp.h:37:0,
from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:15: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:111:9: note: ‘daddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:268:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:268:9: note: ‘dport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:267:27: note: ‘saddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:267:19: note: ‘daddr’ was declared here
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.
This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.
Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Initialize return variable before exiting on an error path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Initialize return variable before exiting on an error path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Initialize return variable before exiting on an error path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The NAT helpers currently only handle IPv4 packets correctly. Restrict
invocation of the helpers to IPv4 in preparation of IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.
Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.
This patch improves the situation in the following way:
- If a reassembled packet belongs to a connection that has a helper
assigned, the reassembled packet is passed through the stack instead
of the original fragments.
- During defragmentation, the largest received fragment size is stored.
On output, the packet is refragmented if required. If the largest
received fragment size exceeds the outgoing MTU, a "packet too big"
message is generated, thus behaving as if the original fragments
were passed through the stack from an outside point of view.
- The ipv6_helper() hook function can't receive fragments anymore for
connections using a helper, so it is switched to use ipv6_skip_exthdr()
instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
reassembled packets are passed to connection tracking helpers.
The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.
This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.
IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
a common helper function __mtu_check_toobig_v6().
The MTU check for tunnel mode can also use this helper as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to
skb->len. And the 'mtu' variable have been adjusted before
calling helper.
Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
were missing a check for skb_is_gso().
This bug e.g. caused issues for KVM IPVS setups, where different
Segmentation Offloading techniques are utilized, between guests,
via the virtio driver. This resulted in very bad performance,
due to the ICMPv6 "too big" messages didn't affect the sender.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:
* Remove unnecessary RTNL locking now that we have support
for namespace in nf_conntrack, from Patrick McHardy.
* Cleanup to eliminate unnecessary goto in the initialization
path of several Netfilter tables, from Jean Sacren.
* Another cleanup from Wu Fengguang, this time to PTR_RET instead
of if IS_ERR then return PTR_ERR.
* Use list_for_each_entry_continue_rcu in nf_iterate, from
Michael Wang.
* Add pmtu_disc sysctl option to disable PMTU in their tunneling
transmitter, from Julian Anastasov.
* Generalize application protocol registration in IPVS and modify
IPVS FTP helper to use it, from Julian Anastasov.
* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
helper for NAT support, from Julian Anastasov.
* Add logic to update PMTU for IPIP packets in IPVS, again
from Julian Anastasov.
* A couple of sparse warning fixes for IPVS and Netfilter from
Claudiu Ghioc and Patrick McHardy respectively.
Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to allow removing
list_for_each_continue_rcu().
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Locking the rtnl was added to nf_conntrack_l{3,4}_proto_unregister()
for walking the network namespace list. This is not done anymore since
we have proper namespace support in the protocols now, so we don't
need to take the RTNL anymore.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 1db20a52 (nfnetlink_log: Stop using NLA_PUT*().) incorrectly
converted a NLA_PUT_BE16 macro to nla_put_be32() in nfnetlink_log:
- NLA_PUT_BE16(inst->skb, NFULA_HWTYPE, htons(skb->dev->type));
+ if (nla_put_be32(inst->skb, NFULA_HWTYPE, htons(skb->dev->type))
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pable Neira Ayuso says:
====================
The following five patches contain fixes for 3.6-rc, they are:
* Two fixes for message parsing in the SIP conntrack helper, from
Patrick McHardy.
* One fix for the SIP helper introduced in the user-space cthelper
infrastructure, from Patrick McHardy.
* fix missing appropriate locking while modifying one conntrack entry
from the nfqueue integration code, from myself.
* fix possible access to uninitiliazed timer in the nf_conntrack
expectation infrastructure, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In __nf_ct_expect_check, the function refresh_timer returns 1
if a matching expectation is found and its timer is successfully
refreshed. This results in nf_ct_expect_related returning 0.
Note that at this point:
- the passed expectation is not inserted in the expectation table
and its timer was not initialized, since we have refreshed one
matching/existing expectation.
- nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
timer is in some undefined state just after the allocation,
until it is appropriately initialized.
This can be a problem for the SIP helper during the expectation
addition:
...
if (nf_ct_expect_related(rtp_exp) == 0) {
if (nf_ct_expect_related(rtcp_exp) != 0)
nf_ct_unexpect_related(rtp_exp);
...
Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
case that is detailed above. Then, if nf_ct_unexpect_related(rtcp_exp)
returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:
spin_lock_bh(&nf_conntrack_lock);
if (del_timer(&exp->timeout)) {
nf_ct_unlink_expect(exp);
nf_ct_expect_put(exp);
}
spin_unlock_bh(&nf_conntrack_lock);
Note that del_timer always returns false if the timer has been
initialized. However, the timer was not initialized since setup_timer
was not called, therefore, the expectation timer remains in some
undefined state. If I'm not missing anything, this may lead to the
removal an unexistent expectation.
To fix this, the optimization that allows refreshing an expectation
is removed. Now nf_conntrack_expect_related looks more consistent
to me since it always add the expectation in case that it returns
success.
Thanks to Patrick McHardy for participating in the discussion of
this patch.
I think this may be the source of the problem described by:
http://marc.info/?l=netfilter-devel&m=134073514719421&w=2
Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Only allow adding matches from the initial user namespace
- Add the appropriate conversion functions to handle matches
against sockets in other user namespaces.
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
xt_recent creates a bunch of proc files and initializes their uid
and gids to the values of ip_list_uid and ip_list_gid. When
initialize those proc files convert those values to kuids so they
can continue to reside on the /proc inode.
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jan Engelhardt <jengelh@medozas.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
xt_LOG always writes messages via sb_add via printk. Therefore when
xt_LOG logs the uid and gid of a socket a packet came from the
values should be converted to be in the initial user namespace.
Thus making xt_LOG as user namespace safe as possible.
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
At logging instance creation capture the peer netlink socket's user
namespace. Use the captured peer user namespace when reporting socket
uids to the peer.
The peer socket's user namespace is guaranateed to be valid until the user
closes the netlink socket. nfnetlink_log removes instances during the final
close of a socket. __build_packet_message does not get called after an
instance is destroyed. Therefore it is safe to let the peer netlink socket
take care of the user namespace reference counting for us.
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>