Commit Graph

2436 Commits

Author SHA1 Message Date
Patrick McHardy
1d49144c0a netfilter: nf_tables: add "inet" table for IPv4/IPv6
This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintainance in dual-stack setups.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:25 +01:00
Patrick McHardy
115a60b173 netfilter: nf_tables: add support for multi family tables
Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2014-01-07 23:55:46 +01:00
Patrick McHardy
c9484874e7 netfilter: nf_tables: add hook ops to struct nft_pktinfo
Multi-family tables need the AF from the hook ops. Add a pointer to the
hook ops and replace usage of the hooknum member in struct nft_pktinfo.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
Patrick McHardy
3b088c4bc0 netfilter: nf_tables: make chain types override the default AF functions
Currently the AF-specific hook functions override the chain-type specific
hook functions. That doesn't make too much sense since the chain types
are a special case of the AF-specific hooks.

Make the AF-specific hook functions the default and make the optional
chain type hooks override them.

As a side effect, the necessary code restructuring reduces the code size,
f.i. in case of nf_tables_ipv4.o:

  nf_tables_ipv4_init_net   |  -24
  nft_do_chain_ipv4         | -113
 2 functions changed, 137 bytes removed, diff: -137

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
Pablo Neira Ayuso
688d18636f netfilter: nft_reject: fix compilation warning if NF_TABLES_IPV6 is disabled
net/netfilter/nft_reject.c: In function 'nft_reject_eval':
net/netfilter/nft_reject.c:37:14: warning: unused variable 'net' [-Wunused-variable]

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
David S. Miller
39b6b2992f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
[GIT net-next] Open vSwitch

Open vSwitch changes for net-next/3.14. Highlights are:
 * Performance improvements in the mechanism to get packets to userspace
   using memory mapped netlink and skb zero copy where appropriate.
 * Per-cpu flow stats in situations where flows are likely to be shared
   across CPUs. Standard flow stats are used in other situations to save
   memory and allocation time.
 * A handful of code cleanups and rationalization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 19:48:38 -05:00
Thomas Graf
af2806f8f9 net: Export skb_zerocopy() to zerocopy from one skb to another
Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:42 -08:00
David S. Miller
56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
David S. Miller
83111e7fe8 netfilter: Fix build failure in nfnetlink_queue_core.c.
net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1

Just a missing include of net/tcp_states.h

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 13:36:06 -05:00
David S. Miller
9aa28f2b71 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: <pablo@netfilter.org>

====================
nftables updates for net-next

The following patchset contains nftables updates for your net-next tree,
they are:

* Add set operation to the meta expression by means of the select_ops()
  infrastructure, this allows us to set the packet mark among other things.
  From Arturo Borrero Gonzalez.

* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
  Borkmann.

* Add new queue expression to nf_tables. These comes with two previous patches
  to prepare this new feature, one to add mask in nf_tables_core to
  evaluate the queue verdict appropriately and another to refactor common
  code with xt_NFQUEUE, from Eric Leblond.

* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
  Eric Leblond.

* Add the reject expression to nf_tables, this adds the missing TCP RST
  support. It comes with an initial patch to refactor common code with
  xt_NFQUEUE, again from Eric Leblond.

* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
  Nazarewicz.

* Remove the nft_meta_target code, now that Arturo added the set operation
  to the meta expression, from me.

* Add help information for nf_tables to Kconfig, also from me.

* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
  available to other nf_tables objects, requested by Arturo, from me.

* Expose the table usage counter, so we can know how many chains are using
  this table without dumping the list of chains, from Tomasz Bursztyka.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 13:29:30 -05:00
David S. Miller
855404efae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:18:50 -05:00
Pablo Neira Ayuso
c9c8e48597 netfilter: nf_tables: dump sets in all existing families
This patch allows you to dump all sets available in all of
the registered families. This allows you to use NFPROTO_UNSPEC
to dump all existing sets, similarly to other existing table,
chain and rule operations.

This patch is based on original patch from Arturo Borrero
González.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-04 00:23:11 +01:00
Daniel Borkmann
82a37132f3 netfilter: x_tables: lightweight process control group matching
It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.

Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.

As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.

In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.

One possible, minimal usage example (many other iptables options
can be applied obviously):

 1) Configuring cgroups if not already done, e.g.:

  mkdir /sys/fs/cgroup/net_cls
  mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
  mkdir /sys/fs/cgroup/net_cls/0
  echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
  (resp. a real flow handle id for tc)

 2) Configuring netfilter (iptables-nftables), e.g.:

  iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

 3) Running applications, e.g.:

  ping 208.67.222.222  <pid:1799>
  echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
  [...]
  ping 208.67.220.220  <pid:1804>
  ping: sendmsg: Operation not permitted
  [...]
  echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
  [...]

Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.

  [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:44 +01:00
Eric Leblond
14abfa161d netfilter: xt_CT: fix error value in xt_ct_tg_check()
If setting event mask fails then we were returning 0 for success.
This patch updates return code to -EINVAL in case of problem.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:39 +01:00
stephen hemminger
dcd93ed4cd netfilter: nf_conntrack: remove dead code
The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:37 +01:00
stephen hemminger
02eca9d2cc netfilter: ipset: remove unused code
Function never used in current upstream code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:35 +01:00
Daniel Borkmann
34ce324019 netfilter: nf_nat: add full port randomization support
We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.

SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.

So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.

More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:

  The best defense against the poisoning attacks is to properly
  deploy and validate DNSSEC; DNSSEC provides security not only
  against off-path attacker but even against MitM attacker. We hope
  that our results will help motivate administrators to adopt DNSSEC.
  However, full DNSSEC deployment make take significant time, and
  until that happens, we recommend short-term, non-cryptographic
  defenses. We recommend to support full port randomisation,
  according to practices recommended in [2], and to avoid
  per-destination sequential port allocation, which we show may be
  vulnerable to derandomisation attacks.

Joint work between Hannes Frederic Sowa and Daniel Borkmann.

 [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
 [2] http://arxiv.org/pdf/1205.5190v1.pdf

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:26 +01:00
Michal Nazarewicz
720e0dfa3a netfilter: nf_tables: remove unused variable in nf_tables_dump_set()
The nfmsg variable is not used (except in sizeof operator which does
not care about its value) between the first and second time it is
assigned the value.  Furthermore, nlmsg_data has no side effects, so
the assignment can be safely removed.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 22:51:14 +01:00
Daniel Borkmann
1466291790 netfilter: nf_tables: fix type in parsing in nf_tables_set_alloc_name()
In nf_tables_set_alloc_name(), we are trying to find a new, unused
name for our new set and interate through the list of present sets.
As far as I can see, we're using format string %d to parse already
present names in order to mark their presence in a bitmap, so that
we can later on find the first 0 in that map to assign the new set
name to. We should rather use a temporary variable of type int to
store the result of sscanf() to, and for making sanity checks on.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 22:50:59 +01:00
Pablo Neira Ayuso
d497c63527 netfilter: add help information to new nf_tables Kconfig options
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-01 18:37:10 +01:00
Eric Leblond
bee11dc78f netfilter: nft_reject: support for IPv6 and TCP reset
This patch moves nft_reject_ipv4 to nft_reject and adds support
for IPv6 protocol. This patch uses functions included in nf_reject.h
to implement reject by TCP reset.

The code has to be build as a module if NF_TABLES_IPV6 is also a
module to avoid compilation error due to usage of IPv6 functions.
This has been done in Kconfig by using the construct:

 depends on NF_TABLES_IPV6 || !NF_TABLES_IPV6

This seems a bit weird in terms of syntax but works perfectly.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-30 18:15:38 +01:00
Eric Leblond
5f291c2869 netfilter: select NFNETLINK when enabling NF_TABLES
In Kconfig, nf_tables depends on NFNETLINK so building nf_tables as
a module or inside kernel depends on the state of NFNETLINK inside
the kernel config. If someone wants to build nf_tables inside the
kernel, it is necessary to also build NFNETLINK inside the kernel.
But NFNETLINK can not be set in the menu so it is necessary to
toggle other nfnetlink subsystems such as logging and nfacct to see
the nf_tables switch.

This patch changes the dependency from 'depend' to 'select' inside
Kconfig to allow to set the build of nftables as modules or inside
kernel independently.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-29 12:08:29 +01:00
Pablo Neira Ayuso
2ee0d3c80f netfilter: nf_tables: fix wrong datatype in nft_validate_data_load()
This patch fixes dictionary mappings, eg.

 add rule ip filter input meta dnat set tcp dport map { 22 => 1.1.1.1, 23 => 2.2.2.2 }

The kernel was returning -EINVAL in nft_validate_data_load() since
the type of the set element data that is passed was the real userspace
datatype instead of NFT_DATA_VALUE.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-28 22:32:28 +01:00
Pablo Neira Ayuso
994737513e netfilter: nf_tables: remove nft_meta_target
In e035b77 ("netfilter: nf_tables: nft_meta module get/set ops"),
we got the meta target merged into the existing meta expression.
So let's get rid of this dead code now that we fully support that
feature.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-28 14:06:25 +01:00
Arturo Borrero Gonzalez
e035b77ac7 netfilter: nf_tables: nft_meta module get/set ops
This patch adds kernel support for the meta expression in get/set
flavour. The set operation indicates that a given packet has to be
set with a property, currently one of mark, priority, nftrace.
The get op is what was currently working: evaluate the given
packet property.

In the nftrace case, the value is always 1. Such behaviour is copied
from net/netfilter/xt_TRACE.c

The NFTA_META_DREG and NFTA_META_SREG attributes are mutually
exclusives.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-28 14:02:12 +01:00
Pablo Neira Ayuso
d201297561 netfilter: nf_tables: fix oops when updating table with user chains
This patch fixes a crash while trying to deactivate a table that
contains user chains. You can reproduce it via:

% nft add table table1
% nft add chain table1 chain1
% nft-table-upd ip table1 dormant

[  253.021026] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[  253.021114] IP: [<ffffffff8134cebd>] nf_register_hook+0x35/0x6f
[  253.021167] PGD 30fa5067 PUD 30fa2067 PMD 0
[  253.021208] Oops: 0000 [#1] SMP
[...]
[  253.023305] Call Trace:
[  253.023331]  [<ffffffffa0885020>] nf_tables_newtable+0x11c/0x258 [nf_tables]
[  253.023385]  [<ffffffffa0878592>] nfnetlink_rcv_msg+0x1f4/0x226 [nfnetlink]
[  253.023438]  [<ffffffffa0878418>] ? nfnetlink_rcv_msg+0x7a/0x226 [nfnetlink]
[  253.023491]  [<ffffffffa087839e>] ? nfnetlink_bind+0x45/0x45 [nfnetlink]
[  253.023542]  [<ffffffff8134b47e>] netlink_rcv_skb+0x3c/0x88
[  253.023586]  [<ffffffffa0878973>] nfnetlink_rcv+0x3af/0x3e4 [nfnetlink]
[  253.023638]  [<ffffffff813fb0d4>] ? _raw_read_unlock+0x22/0x34
[  253.023683]  [<ffffffff8134af17>] netlink_unicast+0xe2/0x161
[  253.023727]  [<ffffffff8134b29a>] netlink_sendmsg+0x304/0x332
[  253.023773]  [<ffffffff8130d250>] __sock_sendmsg_nosec+0x25/0x27
[  253.023820]  [<ffffffff8130fb93>] sock_sendmsg+0x5a/0x7b
[  253.023861]  [<ffffffff8130d5d5>] ? copy_from_user+0x2a/0x2c
[  253.023905]  [<ffffffff8131066f>] ? move_addr_to_kernel+0x35/0x60
[  253.023952]  [<ffffffff813107b3>] SYSC_sendto+0x119/0x15c
[  253.023995]  [<ffffffff81401107>] ? sysret_check+0x1b/0x56
[  253.024039]  [<ffffffff8108dc30>] ? trace_hardirqs_on_caller+0x140/0x1db
[  253.024090]  [<ffffffff8120164e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  253.024141]  [<ffffffff81310caf>] SyS_sendto+0x9/0xb
[  253.026219]  [<ffffffff814010e2>] system_call_fastpath+0x16/0x1b

Reported-by: Alex Wei <alex.kern.mentor@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-28 12:18:16 +01:00
Pablo Neira Ayuso
e38195bf32 netfilter: nf_tables: fix dumping with large number of sets
If not table name is specified, the dumping of the existing sets
may be incomplete with a sufficiently large number of sets and
tables. This patch fixes missing reset of the cursors after
finding the location of the last object that has been included
in the previous multi-part message.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-28 12:14:42 +01:00
Jesper Dangaard Brouer
b25adce160 ipvs: correct usage/allocation of seqadj ext in ipvs
The IPVS FTP helper ip_vs_ftp could trigger an OOPS in nf_ct_seqadj_set,
after commit 41d73ec053 (netfilter: nf_conntrack: make sequence number
adjustments usuable without NAT).

This is because, the seqadj ext is now allocated dynamically, and the
IPVS code didn't handle this situation.  Fix this in the IPVS nfct
code by invoking the alloc function nfct_seqadj_ext_add().

Fixes: 41d73ec053 (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT)
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-12-27 12:30:02 +09:00
Jesper Dangaard Brouer
db12cf2743 netfilter: WARN about wrong usage of sequence number adjustments
Since commit 41d73ec053 (netfilter: nf_conntrack: make sequence
number adjustments usuable without NAT), the sequence number extension
is dynamically allocated.

Instead of dying, give a WARN splash, in case of wrong usage of the
seqadj code, e.g. when forgetting to allocate via nfct_seqadj_ext_add().

Wrong usage have been seen in the IPVS code path.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-12-27 12:29:54 +09:00
Geert Uytterhoeven
9dcbe1b87c ipvs: Remove unused variable ret from sync_thread_master()
net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master':
net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable]

Commit 35a2af94c7 ("sched/wait: Make the
__wait_event*() interface more friendly") changed how the interruption
state is returned. However, sync_thread_master() ignores this state,
now causing a compile warning.

According to Julian Anastasov <ja@ssi.bg>, this behavior is OK:

    "Yes, your patch looks ok to me. In the past we used ssleep() but IPVS
     users were confused why IPVS threads increase the load average. So, we
     switched to _interruptible calls and later the socket polling was
     added."

Document this, as requested by Peter Zijlstra, to avoid precious developers
disappearing in this pitfall in the future.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-12-27 12:19:32 +09:00
fan.du
6a649f3398 netfilter: add IPv4/6 IPComp extension match support
With this plugin, user could specify IPComp tagged with certain
CPI that host not interested will be DROPped or any other action.

For example:
iptables  -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP
ip6tables -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP

Then input IPComp packet with CPI equates 0x87 will not reach
upper layer anymore.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-24 12:37:58 +01:00
Valentina Giusti
08c0cad69f netfilter: nfnetlink_queue: enable UID/GID socket info retrieval
Thanks to commits 41063e9 (ipv4: Early TCP socket demux) and 421b388
(udp: ipv4: Add udp early demux) it is now possible to parse UID and
GID socket info also for incoming TCP and UDP connections. Having
this info available, it is convenient to let NFQUEUE parse it in
order to improve and refine the traffic analysis in userspace.

Signed-off-by: Valentina Giusti <valentina.giusti@bmw-carit.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-21 11:57:54 +01:00
Helmut Schaa
443d20fd18 netfilter: nf_ct_timestamp: Fix BUG_ON after netns deletion
When having nf_conntrack_timestamp enabled deleting a netns
can lead to the following BUG being triggered:

[63836.660000] Kernel bug detected[#1]:
[63836.660000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.18 #14
[63836.660000] task: 802d9420 ti: 802d2000 task.ti: 802d2000
[63836.660000] $ 0   : 00000000 00000000 00000000 00000000
[63836.660000] $ 4   : 00000001 00000004 00000020 00000020
[63836.660000] $ 8   : 00000000 80064910 00000000 00000000
[63836.660000] $12   : 0bff0002 00000001 00000000 0a0a0abe
[63836.660000] $16   : 802e70a0 85f29d80 00000000 00000004
[63836.660000] $20   : 85fb62a0 00000002 802d3bc0 85fb62a0
[63836.660000] $24   : 00000000 87138110
[63836.660000] $28   : 802d2000 802d3b40 00000014 871327cc
[63836.660000] Hi    : 000005ff
[63836.660000] Lo    : f2edd000
[63836.660000] epc   : 87138794 __nf_ct_ext_add_length+0xe8/0x1ec [nf_conntrack]
[63836.660000]     Not tainted
[63836.660000] ra    : 871327cc nf_conntrack_in+0x31c/0x7b8 [nf_conntrack]
[63836.660000] Status: 1100d403 KERNEL EXL IE
[63836.660000] Cause : 00800034
[63836.660000] PrId  : 0001974c (MIPS 74Kc)
[63836.660000] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_quota xt_policy xt_pkttype xt_owner xt_nat xt_multiport xt_mark xh
[63836.660000] Process swapper (pid: 0, threadinfo=802d2000, task=802d9420, tls=00000000)
[63836.660000] Stack : 802e70a0 871323d4 00000005 87080234 802e70a0 86d2a840 00000000 00000000
[63836.660000] Call Trace:
[63836.660000] [<87138794>] __nf_ct_ext_add_length+0xe8/0x1ec [nf_conntrack]
[63836.660000] [<871327cc>] nf_conntrack_in+0x31c/0x7b8 [nf_conntrack]
[63836.660000] [<801ff63c>] nf_iterate+0x90/0xec
[63836.660000] [<801ff730>] nf_hook_slow+0x98/0x164
[63836.660000] [<80205968>] ip_rcv+0x3e8/0x40c
[63836.660000] [<801d9754>] __netif_receive_skb_core+0x624/0x6a4
[63836.660000] [<801da124>] process_backlog+0xa4/0x16c
[63836.660000] [<801d9bb4>] net_rx_action+0x10c/0x1e0
[63836.660000] [<8007c5a4>] __do_softirq+0xd0/0x1bc
[63836.660000] [<8007c730>] do_softirq+0x48/0x68
[63836.660000] [<8007c964>] irq_exit+0x54/0x70
[63836.660000] [<80060830>] ret_from_irq+0x0/0x4
[63836.660000] [<8006a9f8>] r4k_wait_irqoff+0x18/0x1c
[63836.660000] [<8009cfb8>] cpu_startup_entry+0xa4/0x104
[63836.660000] [<802eb918>] start_kernel+0x394/0x3ac
[63836.660000]
[63836.660000]
Code: 00821021  8c420000  2c440001 <00040336> 90440011  92350010  90560010  2485ffff  02a5a821
[63837.040000] ---[ end trace ebf660c3ce3b55e7 ]---
[63837.050000] Kernel panic - not syncing: Fatal exception in interrupt
[63837.050000] Rebooting in 3 seconds..

Fix this by not unregistering the conntrack extension in the per-netns
cleanup code.

This bug was introduced in (73f4001 netfilter: nf_ct_tstamp: move
initialization out of pernet_operations).

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-20 14:58:29 +01:00
Daniel Borkmann
540436c80e netfilter: nft_exthdr: call ipv6_find_hdr() with explicitly initialized offset
In nft's nft_exthdr_eval() routine we process IPv6 extension header
through invoking ipv6_find_hdr(), but we call it with an uninitialized
offset variable that contains some stack value. In ipv6_find_hdr()
we then test if the value of offset != 0 and call skb_header_pointer()
on that offset in order to map struct ipv6hdr into it. Fix it up by
initializing offset to 0 as it was probably intended to be.

Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-20 11:25:10 +01:00
Florian Westphal
534473c608 netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark
Useful to only set a particular range of the conntrack mark while
leaving exisiting parts of the value alone, e.g. when setting
conntrack marks via NFQUEUE.

Follows same scheme as MARK/CONNMARK targets, i.e. the mask defines
those bits that should be altered.  No mask is equal to '~0', ie.
the old value is replaced by new one.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-20 10:21:42 +01:00
Florian Westphal
a42b99a6e3 netfilter: avoid get_random_bytes calls
All these users need an initial seed value for jhash, prandom is
perfectly fine.  This avoids draining the entropy pool where
its not strictly required.

nfnetlink_log did not use the random value at all.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-20 10:21:40 +01:00
David S. Miller
143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
Gao feng
45c2aff645 netfilter: nfnetlink_log: unset nf_loggers for netns when unloading module
Steven Rostedt and Arnaldo Carvalho de Melo reported a panic
when access the files /proc/sys/net/netfilter/nf_log/*.

This problem will occur when we do:

 echo nfnetlink_log > /proc/sys/net/netfilter/nf_log/any_file
 rmmod nfnetlink_log

and then access the files.

Since the nf_loggers of netns hasn't been unset, it will point
to the memory that has been freed.

This bug is introduced by commit 9368a53c ("netfilter: nfnetlink_log:
add net namespace support for nfnetlink_log").

[17261.822047] BUG: unable to handle kernel paging request at ffffffffa0d49090
[17261.822056] IP: [<ffffffff8157aba0>] nf_log_proc_dostring+0xf0/0x1d0
[...]
[17261.822226] Call Trace:
[17261.822235]  [<ffffffff81297b98>] ? security_capable+0x18/0x20
[17261.822240]  [<ffffffff8106fa09>] ? ns_capable+0x29/0x50
[17261.822247]  [<ffffffff8163d25f>] ? net_ctl_permissions+0x1f/0x90
[17261.822254]  [<ffffffff81216613>] proc_sys_call_handler+0xb3/0xc0
[17261.822258]  [<ffffffff81216651>] proc_sys_read+0x11/0x20
[17261.822265]  [<ffffffff811a80de>] vfs_read+0x9e/0x170
[17261.822270]  [<ffffffff811a8c09>] SyS_read+0x49/0xa0
[17261.822276]  [<ffffffff810e6496>] ? __audit_syscall_exit+0x1f6/0x2a0
[17261.822283]  [<ffffffff81656e99>] system_call_fastpath+0x16/0x1b
[17261.822285] Code: cc 81 4d 63 e4 4c 89 45 88 48 89 4d 90 e8 19 03 0d 00 4b 8b 84 e5 28 08 00 00 48 8b 4d 90 4c 8b 45 88 48 85 c0 0f 84 a8 00 00 00 <48> 8b 40 10 48 89 43 08 48 89 df 4c 89 f2 31 f6 e8 4b 35 af ff
[17261.822329] RIP  [<ffffffff8157aba0>] nf_log_proc_dostring+0xf0/0x1d0
[17261.822334]  RSP <ffff880274d3fe28>
[17261.822336] CR2: ffffffffa0d49090
[17261.822340] ---[ end trace a14ce54c0897a90d ]---

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-17 23:19:03 +01:00
Tomasz Bursztyka
d8bcc768c8 netfilter: nf_tables: Expose the table usage counter via netlink
Userspace can therefore know whether a table is in use or not, and
by how many chains. Suggested by Pablo Neira Ayuso.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-17 14:28:05 +01:00
Eric Leblond
0aff078d58 netfilter: nft: add queue module
This patch adds a new nft module named "nft_queue" which provides
a new nftables expression that allows you to enqueue packets to
userspace via the nfnetlink_queue subsystem. It provides the same
level of functionality as NFQUEUE and it shares some code with it.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-07 23:20:46 +01:00
Eric Leblond
97a2d41c47 netfilter: xt_NFQUEUE: separate reusable code
This patch prepares the addition of nft_queue module by moving
reusable code into a header file.

This patch also converts NFQUEUE to use prandom_u32 to initialize
the random jhash seed as suggested by Florian Westphal.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-07 23:20:45 +01:00
Eric Leblond
e569bdab35 netfilter: nf_tables: fix issue with verdict support
The test on verdict was simply done on the value of the verdict
which is not correct as far as queue is concern. In fact, the test
of verdict test must be done with respect to the verdict mask for
verdicts which are not internal to nftables.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-07 23:20:44 +01:00
Pablo Neira Ayuso
cf9dc09d09 netfilter: nf_tables: fix missing rules flushing per table
This patch allows you to atomically remove all rules stored in
a table via the NFT_MSG_DELRULE command. You only need to indicate
the specific table and no chain to flush all rules stored in that
table.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-07 22:55:48 +01:00
Sergey Popovich
b4ef4ce093 netfilter: xt_hashlimit: fix proc entry leak in netns destroy path
In (32263dd1b netfilter: xt_hashlimit: fix namespace destroy path)
the hashlimit_net_exit() function is always called right before
hashlimit_mt_destroy() to release netns data. If you use xt_hashlimit
with IPv4 and IPv6 together, this produces the following splat via
netconsole in the netns destroy path:

 Pid: 9499, comm: kworker/u:0 Tainted: G        WC O 3.2.0-5-netctl-amd64-core2
 Call Trace:
  [<ffffffff8104708d>] ? warn_slowpath_common+0x78/0x8c
  [<ffffffff81047139>] ? warn_slowpath_fmt+0x45/0x4a
  [<ffffffff81144a99>] ? remove_proc_entry+0xd8/0x22e
  [<ffffffff810ebbaa>] ? kfree+0x5b/0x6c
  [<ffffffffa043c501>] ? hashlimit_net_exit+0x45/0x8d [xt_hashlimit]
  [<ffffffff8128ab30>] ? ops_exit_list+0x1c/0x44
  [<ffffffff8128b28e>] ? cleanup_net+0xf1/0x180
  [<ffffffff810369fc>] ? should_resched+0x5/0x23
  [<ffffffff8105b8f9>] ? process_one_work+0x161/0x269
  [<ffffffff8105aea5>] ? cwq_activate_delayed_work+0x3c/0x48
  [<ffffffff8105c8c2>] ? worker_thread+0xc2/0x145
  [<ffffffff8105c800>] ? manage_workers.isra.25+0x15b/0x15b
  [<ffffffff8105fa01>] ? kthread+0x76/0x7e
  [<ffffffff813581f4>] ? kernel_thread_helper+0x4/0x10
  [<ffffffff8105f98b>] ? kthread_worker_fn+0x139/0x139
  [<ffffffff813581f0>] ? gs_change+0x13/0x13
 ---[ end trace d8c3cc0ad163ef79 ]---
 ------------[ cut here ]------------
 WARNING: at /usr/src/linux-3.2.52/debian/build/source_netctl/fs/proc/generic.c:849
 remove_proc_entry+0x217/0x22e()
 Hardware name:
 remove_proc_entry: removing non-empty directory 'net/ip6t_hashlimit', leaking at least 'IN-REJECT'

This is due to lack of removal net/ip6t_hashlimit/* entries in
hashlimit_proc_net_exit(), since only IPv4 entries are deleted. Fix
it by always removing the IPv4 and IPv6 entries and their parent
directories in the netns destroy path.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-07 22:46:51 +01:00
Jeff Kirsher
e664eabd18 netfilter: Fix FSF address in file headers
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

CC: netfilter@vger.kernel.org
CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:37:57 -05:00
Dave Jones
b49faea765 netfilter: ipset: fix incorret comparison in hash_netnet4_data_equal()
Both sides of the comparison are the same, looks like a cut-and-paste error.

Spotted by Coverity.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-25 22:42:18 +01:00
David S. Miller
cd2cc01b67 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains fixes for your net tree, they are:

* Remove extra quote from connlimit configuration in Kconfig, from
  Randy Dunlap.

* Fix missing mss option in syn packets sent to the backend in our
  new synproxy target, from Martin Topholm.

* Use window scale announced by client when sending the forged
  syn to the backend, from Martin Topholm.

* Fix IPv6 address comparison in ebtables, from Luís Fernando
  Cornachioni Estrozi.

* Fix wrong endianess in sequence adjustment which breaks helpers
  in NAT configurations, from Phil Oester.

* Fix the error path handling of nft_compat, from me.

* Make sure the global conntrack counter is decremented after the
  object has been released, also from me.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21 12:44:15 -05:00
Linus Torvalds
1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Johannes Berg
c53ed74236 genetlink: only pass array to genl_register_family_with_ops()
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Pablo Neira Ayuso
0c3c6c00c6 netfilter: nf_conntrack: decrement global counter after object release
nf_conntrack_free() decrements our counter (net->ct.count)
before releasing the conntrack object. That counter is used in the
nf_conntrack_cleanup_net_list path to check if it's time to
kmem_cache_destroy our cache of conntrack objects. I think we have
a race there that should be easier to trigger (although still hard)
with CONFIG_DEBUG_OBJECTS_FREE as object releases become slowier
according to the following splat:

[ 1136.321305] WARNING: CPU: 2 PID: 2483 at lib/debugobjects.c:260
debug_print_object+0x83/0xa0()
[ 1136.321311] ODEBUG: free active (active state 0) object type:
timer_list hint: delayed_work_timer_fn+0x0/0x20
...
[ 1136.321390] Call Trace:
[ 1136.321398]  [<ffffffff8160d4a2>] dump_stack+0x45/0x56
[ 1136.321405]  [<ffffffff810514e8>] warn_slowpath_common+0x78/0xa0
[ 1136.321410]  [<ffffffff81051557>] warn_slowpath_fmt+0x47/0x50
[ 1136.321414]  [<ffffffff812f8883>] debug_print_object+0x83/0xa0
[ 1136.321420]  [<ffffffff8106aa90>] ? execute_in_process_context+0x90/0x90
[ 1136.321424]  [<ffffffff812f99fb>] debug_check_no_obj_freed+0x20b/0x250
[ 1136.321429]  [<ffffffff8112e7f2>] ? kmem_cache_destroy+0x92/0x100
[ 1136.321433]  [<ffffffff8115d945>] kmem_cache_free+0x125/0x210
[ 1136.321436]  [<ffffffff8112e7f2>] kmem_cache_destroy+0x92/0x100
[ 1136.321443]  [<ffffffffa046b806>] nf_conntrack_cleanup_net_list+0x126/0x160 [nf_conntrack]
[ 1136.321449]  [<ffffffffa046c43d>] nf_conntrack_pernet_exit+0x6d/0x80 [nf_conntrack]
[ 1136.321453]  [<ffffffff81511cc3>] ops_exit_list.isra.3+0x53/0x60
[ 1136.321457]  [<ffffffff815124f0>] cleanup_net+0x100/0x1b0
[ 1136.321460]  [<ffffffff8106b31e>] process_one_work+0x18e/0x430
[ 1136.321463]  [<ffffffff8106bf49>] worker_thread+0x119/0x390
[ 1136.321467]  [<ffffffff8106be30>] ? manage_workers.isra.23+0x2a0/0x2a0
[ 1136.321470]  [<ffffffff8107210b>] kthread+0xbb/0xc0
[ 1136.321472]  [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110
[ 1136.321477]  [<ffffffff8161b8fc>] ret_from_fork+0x7c/0xb0
[ 1136.321479]  [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110
[ 1136.321481] ---[ end trace 25f53c192da70825 ]---

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-18 14:07:19 +01:00