Adding support for SNMP broadcast connection tracking. The SNMP
broadcast requests are now paired with the SNMP responses.
Thus allowing using SNMP broadcasts with firewall enabled.
Please refer to the following conversation:
http://marc.info/?l=netfilter-devel&m=125992205006600&w=2
Patrick McHardy wrote:
> > The best solution would be to add generic broadcast tracking, the
> > use of expectations for this is a bit of abuse.
> > The second best choice I guess would be to move the help() function
> > to a shared module and generalize it so it can be used for both.
This patch implements the "second best choice".
Since the netbios-ns conntrack module uses the same helper
functionality as the snmp, only one helper function is added
for both snmp and netbios-ns modules into the new object -
nf_conntrack_broadcast.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When a packet is meant to be handled by another node of the cluster,
silently drop it instead of flooding kernel log.
Note : INVALID packets are also dropped without notice.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
If an skb is to be NF_QUEUE'd, but no program has opened the queue, the
packet is dropped.
This adds a v2 target revision of xt_NFQUEUE that allows packets to
continue through the ruleset instead.
Because the actual queueing happens outside of the target context, the
'bypass' flag has to be communicated back to the netfilter core.
Unfortunately the only choice to do this without adding a new function
argument is to use the target function return value (i.e. the verdict).
In the NF_QUEUE case, the upper 16bit already contain the queue number
to use. The previous patch reduced NF_VERDICT_MASK to 0xff, i.e.
we now have extra room for a new flag.
If a hook issued a NF_QUEUE verdict, then the netfilter core will
continue packet processing if the queueing hook
returns -ESRCH (== "this queue does not exist") and the new
NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value.
Note: If the queue exists, but userspace does not consume packets fast
enough, the skb will still be dropped.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
NF_VERDICT_MASK is currently 0xffff. This is because the upper
16 bits are used to store errno (for NF_DROP) or the queue number
(NF_QUEUE verdict).
As there are up to 0xffff different queues available, there is no more
room to store additional flags.
At the moment there are only 6 different verdicts, i.e. we can reduce
NF_VERDICT_MASK to 0xff to allow storing additional flags in the 0xff00 space.
NF_VERDICT_BITS would then be reduced to 8, but because the value is
exported to userspace, this might cause breakage; e.g.:
e.g. 'queuenr = (1 << NF_VERDICT_BITS) | NF_QUEUE' would now break.
Thus, remove NF_VERDICT_BITS usage in the kernel and move the old value
to the 'userspace compat' section.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move free responsibility from nf_queue to caller.
This enables more flexible error handling; we can now accept the skb
instead of freeing it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
instead of returning -1 on error, return an error number to allow the
caller to handle some errors differently.
ECANCELED is used to indicate that the hook is going away and should be
ignored.
A followup patch will introduce more 'ignore this hook' conditions,
(depending on queue settings) and will move kfree_skb responsibility
to the caller.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
NFLOG already does the same thing for NETFILTER_NETLINK_LOG.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
As this ct won't be seen by the others, we don't need to set the
IPS_CONFIRMED_BIT in atomic way.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
My previous patch (netfilter: nf_nat: don't use atomic bit operation)
made a mistake when converting atomic_set to a normal bit 'or'.
IPS_*_BIT should be replaced with IPS_*.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When xt_AUDIT is built as a module, modpost reports a problem.
MODPOST 322 modules
ERROR: "audit_enabled" [net/netfilter/x_tables.ko] undefined!
WARNING: modpost: Found 1 section mismatch(es).
Cc: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
The setsockopt() syscall to replace tables is already recorded
in the audit logs. This patch stores additional information
such as table name and netfilter protocol.
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds a new netfilter target which creates audit records
for packets traversing a certain chain.
It can be used to record packets which are rejected administraively
as follows:
-N AUDIT_DROP
-A AUDIT_DROP -j AUDIT --type DROP
-A AUDIT_DROP -j DROP
a rule which would typically drop or reject a packet would then
invoke the new chain to record packets before dropping them.
-j AUDIT_DROP
The module is protocol independant and works for iptables, ip6tables
and ebtables.
The following information is logged:
- netfilter hook
- packet length
- incomming/outgoing interface
- MAC src/dst/proto for ethernet packets
- src/dst/protocol address for IPv4/IPv6
- src/dst port for TCP/UDP/UDPLITE
- icmp type/code
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Use is_vmalloc_addr() in nf_ct_free_hashtable() and get rid of
the vmalloc flags to indicate that a hash table has been allocated
using vmalloc().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE,
which itself depends on NET_SCHED; this dependency is missing from netfilter.
Since matching on realms is also useful without having NET_SCHED enabled and
the option really only controls whether the tclassid member is included in
route and dst entries, rename the config option to IP_ROUTE_CLASSID and move
it outside of traffic scheduling context to get rid of the NET_SCHED dependeny.
Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
To avoid adding a new match revision icmp type/code are stored
in the sport/dport area.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Holger Eitzenberger <holger@eitzenberger.org>
Reviewed-by: Bart De Schuymer<bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
One iptables invocation with 135000 rules takes 35 seconds of cpu time
on a recent server, using a 32bit distro and a 64bit kernel.
We eventually trigger NMI/RCU watchdog.
INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)
COMPAT mode has quadratic behavior and consume 16 bytes of memory per
rule.
Switch the xt_compat algos to use an array instead of list, and use a
binary search to locate an offset in the sorted array.
This halves memory need (8 bytes per rule), and removes quadratic
behavior [ O(N*N) -> O(N*log2(N)) ]
Time of iptables goes from 35 s to 150 ms.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new revision 3 that contains port ranges for all of origsrc,
origdst, replsrc and repldst. The high ports are appended to the
original v2 data structure to allow sharing most of the code with
v1 and v2. Use of the revision specific port matching function is
made dependant on par->match->revision.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since a string is stored, and not something like a MAC address that
would rely on (un)signedness, drop the qualifier.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
all init_net removed, (except for some alloc related
that needs to be there)
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
init_net removed in __ip_vs_addr_is_local_v6, and got net as param.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Last two global vars to be moved,
ip_vs_ftpsvc_counter and ip_vs_nullsvc_counter.
[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
trash list per namspace,
and reordering of some params in dst struct.
[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
cancel_rearming_delayed_work(). Found during
merge conflict resoliution ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch makes defense work timer per name-space,
A net ptr had to be added to the ipvs struct,
since it's needed by defense_work_handler.
[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
cancel_rearming_delayed_work(). Found during
merge conflict resoliution ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Moving global vars to ipvs struct, except for svc table lock.
Next patch for ctl will be drop-rate handling.
*v3
__ip_vs_mutex remains global
ip_vs_conntrack_enabled(struct netns_ipvs *ipvs)
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Connection hash table is now name space aware.
i.e. net ptr >> 8 is xor:ed to the hash,
and this is the first param to be compared.
The net struct is 0xa40 in size ( a little bit smaller for 32 bit arch:s)
and cache-line aligned, so a ptr >> 5 might be a more clever solution ?
All lookups where net is compared uses net_eq() which returns 1 when netns
is disabled, and the compiler seems to do something clever in that case.
ip_vs_conn_fill_param() have *net as first param now.
Three new inlines added to keep conn struct smaller
when names space is disabled.
- ip_vs_conn_net()
- ip_vs_conn_net_set()
- ip_vs_conn_net_eq()
*v3
moved net compare to the end in "fast path"
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The statistic counter locks for every packet are now removed,
and that statistic is now per CPU, i.e. no locks needed.
However summing is made in ip_vs_est into ip_vs_stats struct
which is moved to ipvs struc.
procfs, ip_vs_stats now have a "per cpu" count and a grand total.
A new function seq_file_single_net() in ip_vs.h created for handling of
single_open_net() since it does not place net ptr in a struct, like others.
/var/lib/lxc # cat /proc/net/ip_vs_stats_percpu
Total Incoming Outgoing Incoming Outgoing
CPU Conns Packets Packets Bytes Bytes
0 0 3 1 9D 34
1 0 1 2 49 70
2 0 1 2 34 76
3 1 2 2 70 74
~ 1 7 7 18A 18E
Conns/s Pkts/s Pkts/s Bytes/s Bytes/s
0 0 0 0 0
*v3
ip_vs_stats reamains as before, instead ip_vs_stats_percpu is added.
u64 seq lock added
*v4
Bug correction inbytes and outbytes as own vars..
per_cpu counter for all stats now as suggested by Julian.
[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
All global variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)
in sync_buf create + 4 replaced by sizeof(struct..)
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)
*v3
timer per ns instead of a common timer in estimator.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)
in ip_vs_protocol param struct net *net added to:
- register_app()
- unregister_app()
This affected almost all proto_xxx.c files
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
appcnt and timeout_table moved from struct ip_vs_protocol to
ip_vs proto_data.
struct net *net added as first param to
- register_app()
- unregister_app()
- app_conn_bind()
- ip_vs_conn_new()
[horms@verge.net.au: removed cosmetic-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
ip_vs_protocol *pp is replaced by ip_vs_proto_data *pd in
function call in ip_vs_protocol struct i.e. :,
- timeout_change()
- state_transition()
ip_vs_protocol_timeout_change() got ipvs as param, due to above
and a upcoming patch - defence work
Most of this changes are triggered by Julians comment:
"tcp_timeout_change should work with the new struct ip_vs_proto_data
so that tcp_state_table will go to pd->state_table
and set_tcp_state will get pd instead of pp"
*v3
Mostly comments from Julian
The pp -> pd conversion should start from functions like
ip_vs_out() that use pp = ip_vs_proto_get(iph.protocol),
now they should use ip_vs_proto_data_get(net, iph.protocol).
conn_in_get() and conn_out_get() unused param *pp, removed.
*v4
ip_vs_protocol_timeout_change() walk the proto_data path.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that common for all protos.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use ip_vs_proto_data
*v3
Removed unuset function set_state_timeout()
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use ip_vs_proto_data
*v3
Removed unused function set_state_timeout()
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use all
ip_vs_proto_data
*v3
Removed unused function as sugested by Simon
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add support for protocol data per name-space.
in struct ip_vs_protocol, appcnt will be removed when all protos
are modified for network name-space.
This patch causes warnings of unused functions, they will be used
when next patch will be applied.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
var sysctl_ip_vs_lblc_expiration moved to ipvs struct as
sysctl_lblc_expiration
procfs updated to handle this.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
var sysctl_ip_vs_lblcr_expiration moved to ipvs struct as
sysctl_lblcr_expiration
procfs updated to handle this.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Services hash tables got netns ptr a hash arg,
While Real Servers (rs) has been moved to ipvs struct.
Two new inline functions added to get net ptr from skb.
Since ip_vs is called from different contexts there is two
places to dig for the net ptr skb->dev or skb->sk
this is handled in skb_net() and skb_sknet()
Global functions, ip_vs_service_get() ip_vs_lookup_real_service()
etc have got struct net *net as first param.
If possible get net ptr skb etc,
- if not &init_net is used at this early stage of patching.
ip_vs_ctl.c procfs not ready for netns yet.
*v3
Comments by Julian
- __ip_vs_service_find and __ip_vs_svc_fwm_find are fast path,
net_eq(svc->net, net) so the check is at the end now.
- net = skb_net(skb) in ip_vs_out moved after check for skb_dst.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Preparation for network name-space init, in this stage
some empty functions exists.
In most files there is a check if it is root ns i.e. init_net
if (!net_eq(net, &init_net))
return ...
this will be removed by the last patch, when enabling name-space.
*v3
ip_vs_conn.c merge error corrected.
net_ipvs #ifdef removed as sugested by Jan Engelhardt
[ horms@verge.net.au: Removed whitespace-change-only hunks ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
hfsplus: %L-to-%ll, macro correction, and remove unneeded braces
hfsplus: spaces/indentation clean-up
hfsplus: C99 comments clean-up
hfsplus: over 80 character lines clean-up
hfsplus: fix an artifact in ioctl flag checking
hfsplus: flush disk caches in sync and fsync
hfsplus: optimize fsync
hfsplus: split up inode flags
hfsplus: write up fsync for directories
hfsplus: simplify fsync
hfsplus: avoid useless work in hfsplus_sync_fs
hfsplus: make sure sync writes out all metadata
hfsplus: use raw bio access for partition tables
hfsplus: use raw bio access for the volume headers
hfsplus: always use hfsplus_sync_fs to write the volume header
hfsplus: silence a few debug printks
hfsplus: fix option parsing during remount
Fix up conflicts due to VFS changes in fs/hfsplus/{hfsplus_fs.h,unicode.c}
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6: (77 commits)
spi/omap: Fix DMA API usage in OMAP MCSPI driver
spi/imx: correct the test on platform_get_irq() return value
spi/topcliff: Typo fix threhold to threshold
spi/dw_spi Typo change diable to disable.
spi/fsl_espi: change the read behaviour of the SPIRF
spi/mpc52xx-psc-spi: move probe/remove to proper sections
spi/dw_spi: add DMA support
spi/dw_spi: change to EXPORT_SYMBOL_GPL for exported APIs
spi/dw_spi: Fix too short timeout in spi polling loop
spi/pl022: convert running variable
spi/pl022: convert busy flag to a bool
spi/pl022: pass the returned sglen to the DMA engine
spi/pl022: map the buffers on the DMA engine
spi/topcliff_pch: Fix data transfer issue
spi/imx: remove autodetection
spi/pxa2xx: pass of_node to spi device and set a parent device
spi/pxa2xx: Modify RX-Tresh instead of busy-loop for the remaining RX bytes.
spi/pxa2xx: Add chipselect support for Sodaville
spi/pxa2xx: Consider CE4100's FIFO depth
spi/pxa2xx: Add CE4100 support
...
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
gameport: use this_cpu_read instead of lookup
x86: udelay: Use this_cpu_read to avoid address calculation
x86: Use this_cpu_inc_return for nmi counter
x86: Replace uses of current_cpu_data with this_cpu ops
x86: Use this_cpu_ops to optimize code
vmstat: User per cpu atomics to avoid interrupt disable / enable
irq_work: Use per cpu atomics instead of regular atomics
cpuops: Use cmpxchg for xchg to avoid lock semantics
x86: this_cpu_cmpxchg and this_cpu_xchg operations
percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
percpu,x86: relocate this_cpu_add_return() and friends
connector: Use this_cpu operations
xen: Use this_cpu_inc_return
taskstats: Use this_cpu_ops
random: Use this_cpu_inc_return
fs: Use this_cpu_inc_return in buffer.c
highmem: Use this_cpu_xx_return() operations
vmstat: Use this_cpu_inc_return for vm statistics
x86: Support for this_cpu_add, sub, dec, inc_return
percpu: Generic support for this_cpu_add, sub, dec, inc_return
...
Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.