Commit Graph

814 Commits

Author SHA1 Message Date
stephen hemminger
77f9859837 bridge: fix ordering of NEWLINK and NEWNEIGH events
When port is added to a bridge, the old code would send the new neighbor
netlink message before the subsequent new link message. This bug makes
it difficult to use the monitoring API in an application.

This code changes the ordering to add the forwarding entry
after the port is setup. One of the error checks (for invalid address)
is moved earlier in the process to avoid having to do unwind.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03 12:17:33 -04:00
David S. Miller
8decf86879 Merge branch 'master' of github.com:davem330/net
Conflicts:
	MAINTAINERS
	drivers/net/Kconfig
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
	drivers/net/ethernet/broadcom/tg3.c
	drivers/net/wireless/iwlwifi/iwl-pci.c
	drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
	drivers/net/wireless/rt2x00/rt2800usb.c
	drivers/net/wireless/wl12xx/main.c
2011-09-22 03:23:13 -04:00
Jiri Pirko
4bc71cb983 net: consolidate and fix ethtool_ops->get_settings calling
This patch does several things:
- introduces __ethtool_get_settings which is called from ethtool code and
  from drivers as well. Put ASSERT_RTNL there.
- dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
- changes calling in drivers so rtnl locking is respected. In
  iboe_get_rate was previously ->get_settings() called unlocked. This
  fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same
  problem. Also fixed by calling __dev_get_by_index() instead of
  dev_get_by_index() and holding rtnl_lock for both calls.
- introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
  so bnx2fc_if_create() and fcoe_if_create() are called locked as they
  are from other places.
- use __ethtool_get_settings() in bonding code

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

v2->v3:
	-removed dev_ethtool_get_settings()
	-added ASSERT_RTNL into __ethtool_get_settings()
	-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
	 around it and __ethtool_get_settings() call
v1->v2:
        add missing export_symbol
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits]
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15 17:32:26 -04:00
Jiri Pirko
fa3df928e0 br: remove redundant check and init
Since these checks and initialization are done in
dev_ethtool_get_settings called later on, remove this redundancy.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15 15:36:34 -04:00
David S. Miller
7858241655 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-08-30 17:43:56 -04:00
Eric Dumazet
22df13319d bridge: fix a possible use after free
br_multicast_ipv6_rcv() can call pskb_trim_rcsum() and therefore skb
head can be reallocated.

Cache icmp6_type field instead of dereferencing twice the struct
icmp6hdr pointer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 17:49:24 -07:00
Yan, Zheng
4b275d7efa bridge: Pseudo-header required for the checksum of ICMPv6
Checksum of ICMPv6 is not properly computed because the pseudo header is not used.
Thus, the MLD packet gets dropped by the bridge.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reported-by: Ang Way Chuang <wcang@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24 17:49:00 -07:00
Eric Dumazet
11f3a6bdc2 bridge: fix a possible net_device leak
Jan Beulich reported a possible net_device leak in bridge code after
commit bb900b27a2 (bridge: allow creating bridge devices with netlink)

Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-22 16:49:56 -07:00
David S. Miller
823dcd2506 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-08-20 10:39:12 -07:00
Jiri Pirko
afc4b13df1 net: remove use of ndo_set_multicast_list in drivers
replace it by ndo_set_rx_mode

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:22:03 -07:00
Julia Lawall
5189054dd7 net/bridge/netfilter/ebtables.c: use available error handling code
Free the locally allocated table and newinfo as done in adjacent error
handling code.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-11 05:52:57 -07:00
Andrei Warkentin
9be6dd6510 Bridge: Always send NETDEV_CHANGEADDR up on br MAC change.
This ensures the neighbor entries associated with the bridge
dev are flushed, also invalidating the associated cached L2 headers.

This means we br_add_if/br_del_if ports to implement hand-over and
not wind up with bridge packets going out with stale MAC.

This means we can also change MAC of port device and also not wind
up with bridge packets going out with stale MAC.

This builds on Stephen Hemminger's patch, also handling the br_del_if
case and the port MAC change case.

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-09 21:44:44 -07:00
Stephen Hemminger
a9b3cd7f32 rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
When assigning a NULL value to an RCU protected pointer, no barrier
is needed. The rcu_assign_pointer, used to handle that but will soon
change to not handle the special case.

Convert all rcu_assign_pointer of NULL value.

//smpl
@@ expression P; @@

- rcu_assign_pointer(P, NULL)
+ RCU_INIT_POINTER(P, NULL)

// </smpl>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-02 04:29:23 -07:00
Bart De Schuymer
9823d9ff48 netfilter: ebtables: fix ebtables build dependency
The configuration of ebtables shouldn't depend on
CONFIG_BRIDGE_NETFILTER, only on CONFIG_NETFILTER.

Reported-by: Sbastien Laveze <slaveze@gmail.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-07-29 16:40:30 +02:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Linus Torvalds
d3ec4844d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  fs: Merge split strings
  treewide: fix potentially dangerous trailing ';' in #defined values/expressions
  uwb: Fix misspelling of neighbourhood in comment
  net, netfilter: Remove redundant goto in ebt_ulog_packet
  trivial: don't touch files that are removed in the staging tree
  lib/vsprintf: replace link to Draft by final RFC number
  doc: Kconfig: `to be' -> `be'
  doc: Kconfig: Typo: square -> squared
  doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
  drivers/net: static should be at beginning of declaration
  drivers/media: static should be at beginning of declaration
  drivers/i2c: static should be at beginning of declaration
  XTENSA: static should be at beginning of declaration
  SH: static should be at beginning of declaration
  MIPS: static should be at beginning of declaration
  ARM: static should be at beginning of declaration
  rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
  Update my e-mail address
  PCIe ASPM: forcedly -> forcibly
  gma500: push through device driver tree
  ...

Fix up trivial conflicts:
 - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
 - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
 - drivers/net/r8169.c (just context changes)
2011-07-25 13:56:39 -07:00
stephen hemminger
160d73b845 bridge: minor cleanups
Some minor cleanups that won't impact code:
  1. Remove inline from non-critical functions; compiler will most
     likely inline them anyway.
  2. Make function args const where possible.
  3. Whitespace cleanup

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-22 17:01:13 -07:00
stephen hemminger
4ecb961c8b bridge: add notification over netlink when STP changes state
When STP changes state of interface need to send a new link
message to reflect that change.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-22 17:01:12 -07:00
stephen hemminger
56139fc5bd bridge: notifier called with the wrong device
If a new device is added to a bridge, the ethernet address of the
bridge network device may change. When the address changes, the
appropriate callback is called, but with the wrong device argument.
The address of the bridge device (ie br0) changes not the address
of the device being passed to add_if (ie eth0).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-22 17:01:12 -07:00
stephen hemminger
0652cac22c bridge: ignore bogus STP config packets
If the message_age is already greater than the max_age, then the
BPDU is bogus. Linux won't generate BPDU, but conformance tester
or buggy implementation might.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-22 17:01:12 -07:00
stephen hemminger
0c03150e7e bridge: send proper message_age in config BPDU
A bridge topology with three systems:

      +------+  +------+
      | A(2) |--| B(1) |
      +------+  +------+
           \    /
          +------+
          | C(3) |
          +------+

What is supposed to happen:
 * bridge with the lowest ID is elected root (for example: B)
 * C detects that A->C is higher cost path and puts in blocking state

What happens. Bridge with lowest id (B) is elected correctly as
root and things start out fine initially. But then config BPDU
doesn't get transmitted from A -> C. Because of that
the link from A-C is transistioned to the forwarding state.

The root cause of this is that the configuration messages
is generated with bogus message age, and dropped before
sending.

In the standardmessage_age is supposed to be:
  the time since the generation of the Configuration BPDU by
  the Root that instigated the generation of this Configuration BPDU.

Reimplement this by recording the timestamp (age + jiffies) when
recording config information. The old code incorrectly used the time
elapsed on the ageing timer which was incorrect.

See also:
  https://bugzilla.vyatta.com/show_bug.cgi?id=7164

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-22 17:01:12 -07:00
Jesper Juhl
f0da7ee410 net, netfilter: Remove redundant goto in ebt_ulog_packet
In net/bridge/netfilter/ebt_ulog.c:ebt_ulog_packet() the 'goto unlock'
before the 'alloc_failure' label is completely redundant. This patch
removes it.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-07-21 14:02:17 +02:00
David S. Miller
d3aaeb38c4 net: Add ->neigh_lookup() operation to dst_ops
In the future dst entries will be neigh-less.  In that environment we
need to have an easy transition point for current users of
dst->neighbour outside of the packet output fast path.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-18 00:40:17 -07:00
David S. Miller
69cce1d140 net: Abstract dst->neighbour accesses behind helpers.
dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:35 -07:00
David S. Miller
8f40b161de neigh: Pass neighbour entry to output ops.
This will get us closer to being able to do "neigh stuff"
completely independent of the underlying dst_entry for
protocols (ipv4/ipv6) that wish to do so.

We will also be able to make dst entries neigh-less.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:17 -07:00
David S. Miller
f6b72b6217 net: Embed hh_cache inside of struct neighbour.
Now that there is a one-to-one correspondance between neighbour
and hh_cache entries, we no longer need:

1) dynamic allocation
2) attachment to dst->hh
3) refcounting

Initialization of the hh_cache entry is indicated by hh_len
being non-zero, and such initialization is always done with
the neighbour's lock held as a writer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14 07:53:20 -07:00
David S. Miller
e12fe68ce3 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-07-05 23:23:37 -07:00
Herbert Xu
44661462ee bridge: Always flood broadcast packets
As is_multicast_ether_addr returns true on broadcast packets as
well, we need to explicitly exclude broadcast packets so that
they're always flooded.  This wasn't an issue before as broadcast
packets were considered to be an unregistered multicast group,
which were always flooded.  However, as we now only flood such
packets to router ports, this is no longer acceptable.

Reported-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-05 18:39:39 -07:00
Herbert Xu
bd4265fe36 bridge: Only flood unregistered groups to routers
The bridge currently floods packets to groups that we have never
seen before to all ports.  This is not required by RFC4541 and
in fact it is not desirable in environment where traffic to
unregistered group is always present.

This patch changes the behaviour so that we only send traffic
to unregistered groups to ports marked as routers.

The user can always force flooding behaviour to any given port
by marking it as a router.

Note that this change does not apply to traffic to 224.0.0.X
as traffic to those groups must always be flooded to all ports.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-24 17:52:51 -07:00
David S. Miller
9f6ec8d697 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
	drivers/net/wireless/rtlwifi/pci.c
	net/netfilter/ipvs/ip_vs_core.c
2011-06-20 22:29:08 -07:00
WANG Cong
cefa9993f1 netpoll: copy dev name of slaves to struct netpoll
Otherwise we will not see the name of the slave dev in error
message:

[  388.469446] (null):  doesn't support polling, aborting.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-19 16:13:01 -07:00
Fernando Luis Vázquez Cao
fc2af6c73f IGMP snooping: set mrouters_only flag for IPv6 traffic properly
Upon reception of a MGM report packet the kernel sets the mrouters_only flag
in a skb that is a clone of the original skb, which means that the bridge
loses track of MGM packets (cb buffers are tied to a specific skb and not
shared) and it ends up forwading join requests to the bridge interface.

This can cause unexpected membership timeouts and intermitent/permanent loss
of connectivity as described in RFC 4541 [2.1.1. IGMP Forwarding Rules]:

    A snooping switch should forward IGMP Membership Reports only to
    those ports where multicast routers are attached.
    [...]
    Sending membership reports to other hosts can result, for IGMPv1
    and IGMPv2, in unintentionally preventing a host from joining a
    specific multicast group.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>
2011-06-16 23:14:13 -04:00
Fernando Luis Vázquez Cao
62b2bcb49c IGMP snooping: set mrouters_only flag for IPv4 traffic properly
Upon reception of a IGMP/IGMPv2 membership report the kernel sets the
mrouters_only flag in a skb that may be a clone of the original skb, which
means that sometimes the bridge loses track of membership report packets (cb
buffers are tied to a specific skb and not shared) and it ends up forwading
join requests to the bridge interface.

This can cause unexpected membership timeouts and intermitent/permanent loss
of connectivity as described in RFC 4541 [2.1.1. IGMP Forwarding Rules]:

    A snooping switch should forward IGMP Membership Reports only to
    those ports where multicast routers are attached.
    [...]
    Sending membership reports to other hosts can result, for IGMPv1
    and IGMPv2, in unintentionally preventing a host from joining a
    specific multicast group.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Tested-by: Hayato Kakuta <kakuta.hayato@oss.ntt.co.jp>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>
2011-06-16 23:14:12 -04:00
Greg Rose
c7ac8679be rtnetlink: Compute and store minimum ifinfo dump size
The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-06-09 20:38:07 -07:00
Alexander Holler
6407d74c51 bridge: provide a cow_metrics method for fake_ops
Like in commit 0972ddb237 (provide cow_metrics() methods to blackhole
dst_ops), we must provide a cow_metrics for bridges fake_dst_ops as
well.

This fixes a regression coming from commits 62fa8a846d (net: Implement
read-only protection and COW'ing of metrics.) and 33eb9873a2 (bridge:
initialize fake_rtable metrics)

ip link set mybridge mtu 1234
-->
[  136.546243] Pid: 8415, comm: ip Tainted: P 
2.6.39.1-00006-g40545b7 #103 ASUSTeK Computer Inc.         V1Sn 
        /V1Sn
[  136.546256] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
[  136.546268] EIP is at 0x0
[  136.546273] EAX: f14a389c EBX: 000005d4 ECX: f80d32c0 EDX: f80d1da1
[  136.546279] ESI: f14a3000 EDI: f255bf10 EBP: f15c3b54 ESP: f15c3b48
[  136.546285]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  136.546293] Process ip (pid: 8415, ti=f15c2000 task=f4741f80 
task.ti=f15c2000)
[  136.546297] Stack:
[  136.546301]  f80c658f f14a3000 ffffffed f15c3b64 c12cb9c8 f80d1b80 
ffffffa1 f15c3bbc
[  136.546315]  c12da347 c12d9c7d 00000000 f7670b00 00000000 f80d1b80 
ffffffa6 f15c3be4
[  136.546329]  00000004 f14a3000 f255bf20 00000008 f15c3bbc c11d6cae 
00000000 00000000
[  136.546343] Call Trace:
[  136.546359]  [<f80c658f>] ? br_change_mtu+0x5f/0x80 [bridge]
[  136.546372]  [<c12cb9c8>] dev_set_mtu+0x38/0x80
[  136.546381]  [<c12da347>] do_setlink+0x1a7/0x860
[  136.546390]  [<c12d9c7d>] ? rtnl_fill_ifinfo+0x9bd/0xc70
[  136.546400]  [<c11d6cae>] ? nla_parse+0x6e/0xb0
[  136.546409]  [<c12db931>] rtnl_newlink+0x361/0x510
[  136.546420]  [<c1023240>] ? vmalloc_sync_all+0x100/0x100
[  136.546429]  [<c1362762>] ? error_code+0x5a/0x60
[  136.546438]  [<c12db5d0>] ? rtnl_configure_link+0x80/0x80
[  136.546446]  [<c12db27a>] rtnetlink_rcv_msg+0xfa/0x210
[  136.546454]  [<c12db180>] ? __rtnl_unlock+0x20/0x20
[  136.546463]  [<c12ee0fe>] netlink_rcv_skb+0x8e/0xb0
[  136.546471]  [<c12daf1c>] rtnetlink_rcv+0x1c/0x30
[  136.546479]  [<c12edafa>] netlink_unicast+0x23a/0x280
[  136.546487]  [<c12ede6b>] netlink_sendmsg+0x26b/0x2f0
[  136.546497]  [<c12bb828>] sock_sendmsg+0xc8/0x100
[  136.546508]  [<c10adf61>] ? __alloc_pages_nodemask+0xe1/0x750
[  136.546517]  [<c11d0602>] ? _copy_from_user+0x42/0x60
[  136.546525]  [<c12c5e4c>] ? verify_iovec+0x4c/0xc0
[  136.546534]  [<c12bd805>] sys_sendmsg+0x1c5/0x200
[  136.546542]  [<c10c2150>] ? __do_fault+0x310/0x410
[  136.546549]  [<c10c2c46>] ? do_wp_page+0x1d6/0x6b0
[  136.546557]  [<c10c47d1>] ? handle_pte_fault+0xe1/0x720
[  136.546565]  [<c12bd1af>] ? sys_getsockname+0x7f/0x90
[  136.546574]  [<c10c4ec1>] ? handle_mm_fault+0xb1/0x180
[  136.546582]  [<c1023240>] ? vmalloc_sync_all+0x100/0x100
[  136.546589]  [<c10233b3>] ? do_page_fault+0x173/0x3d0
[  136.546596]  [<c12bd87b>] ? sys_recvmsg+0x3b/0x60
[  136.546605]  [<c12bdd83>] sys_socketcall+0x293/0x2d0
[  136.546614]  [<c13629d0>] sysenter_do_call+0x12/0x26
[  136.546619] Code:  Bad EIP value.
[  136.546627] EIP: [<00000000>] 0x0 SS:ESP 0068:f15c3b48
[  136.546645] CR2: 0000000000000000
[  136.546652] ---[ end trace 6909b560e78934fa ]---

Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-07 00:51:35 -07:00
David S. Miller
58bf2dbccc Merge branch 'pablo/nf-2.6-updates' of git://1984.lsi.us.es/net-2.6 2011-05-27 13:04:40 -04:00
David Miller
97242c85a2 netfilter: Fix several warnings in compat_mtw_from_user().
Kill set but not used 'entry_offset'.

Add a default case to the switch statement so the compiler
can see that we always initialize off and size_kern before
using them.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-26 19:09:07 +02:00
Eric Dumazet
33eb9873a2 bridge: initialize fake_rtable metrics
bridge netfilter code uses a fake_rtable, and we must init its _metric
field or risk NULL dereference later.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=35672

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-24 13:32:18 -04:00
Eric Dumazet
6df427fe8c net: remove synchronize_net() from netdev_set_master()
In the old days, we used to access dev->master in __netif_receive_skb()
in a rcu_read_lock section.

So one synchronize_net() call was needed in netdev_set_master() to make
sure another cpu could not use old master while/after we release it.

We now use netdev_rx_handler infrastructure and added one
synchronize_net() call in bond_release()/bond_release_all()

Remove the obsolete synchronize_net() from netdev_set_master() and add
one in bridge del_nbp() after its netdev_rx_handler_unregister() call.

This makes enslave -d a bit faster.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jiri Pirko <jpirko@redhat.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-22 21:01:20 -04:00
Amerigo Wang
bb8ed6302b bridge: call NETDEV_JOIN notifiers when add a slave
In the previous patch I added NETDEV_JOIN, now
we can notify netconsole when adding a device to a bridge too.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-22 21:01:19 -04:00
David S. Miller
9cbc94eabb Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/vmxnet3/vmxnet3_ethtool.c
	net/core/dev.c
2011-05-17 17:33:11 -04:00
Stephen Hemminger
cb68552858 bridge: fix forwarding of IPv6
The commit 6b1e960fdb
    bridge: Reset IPCB when entering IP stack on NF_FORWARD
broke forwarding of IPV6 packets in bridge because it would
call bp_parse_ip_options with an IPV6 packet.

Reported-by: Noah Meyerhans <noahm@debian.org>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 16:03:24 -04:00
David S. Miller
3c709f8fb4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6
Conflicts:
	drivers/net/benet/be_main.c
2011-05-11 14:26:58 -04:00
Florian Westphal
103a9778e0 netfilter: ebtables: only call xt_compat_add_offset once per rule
The optimizations in commit 255d0dc340
(netfilter: x_table: speedup compat operations) assume that
xt_compat_add_offset is called once per rule.

ebtables however called it for each match/target found in a rule.

The match/watcher/target parser already returns the needed delta, so it
is sufficient to move the xt_compat_add_offset call to a more reasonable
location.

While at it, also get rid of the unused COMPAT iterator macros.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-05-10 09:52:17 +02:00
Eric Dumazet
5a6351eecf netfilter: fix ebtables compat support
commit 255d0dc340 (netfilter: x_table: speedup compat operations)
made ebtables not working anymore.

1) xt_compat_calc_jump() is not an exact match lookup
2) compat_table_info() has a typo in xt_compat_init_offsets() call
3) compat_do_replace() misses a xt_compat_init_offsets() call

Reported-by: dann frazier <dannf@dannf.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-05-10 09:48:59 +02:00
Eric Dumazet
e67f88dd12 net: dont hold rtnl mutex during netlink dump callbacks
Four years ago, Patrick made a change to hold rtnl mutex during netlink
dump callbacks.

I believe it was a wrong move. This slows down concurrent dumps, making
good old /proc/net/ files faster than rtnetlink in some situations.

This occurred to me because one "ip link show dev ..." was _very_ slow
on a workload adding/removing network devices in background.

All dump callbacks are able to use RCU locking now, so this patch does
roughly a revert of commits :

1c2d670f36 : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks
6313c1e099 : [RTNETLINK]: Remove unnecessary locking in dump callbacks

This let writers fight for rtnl mutex and readers going full speed.

It also takes care of phonet : phonet_route_get() is now called from rcu
read section. I renamed it to phonet_route_get_rcu()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-02 15:26:28 -07:00
David Decotigny
25db033881 ethtool: Use full 32 bit speed range in ethtool's set_settings
This makes sure the ethtool's set_settings() callback of network
drivers don't ignore the 16 most significant bits when ethtool calls
their set_settings().

All drivers compiled with make allyesconfig on x86_64 have been
updated.

Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-29 14:03:00 -07:00
Michał Mirosław
c4d27ef957 bridge: convert br_features_recompute() to ndo_fix_features
Note: netdev_update_features() needs only rtnl_lock as br->port_list
is only changed while holding it.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28 13:33:08 -07:00
David S. Miller
2bd93d7af1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Resolved logic conflicts causing a build failure due to
drivers/net/r8169.c changes using a patch from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-26 12:16:46 -07:00
Eric Dumazet
b71d1d426d inet: constify ip headers and in6_addr
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22 11:04:14 -07:00
David S. Miller
f01cb5fbea Revert "bridge: Forward reserved group addresses if !STP"
This reverts commit 1e253c3b8a.

It breaks 802.3ad bonding inside of a bridge.

The commit was meant to support transport bridging, and specifically
virtual machines bridged to an ethernet interface connected to a
switch port wiht 802.1x enabled.

But this isn't the way to do it, it breaks too many other things.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-21 21:17:25 -07:00
David S. Miller
e1943424e4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x_ethtool.c
2011-04-19 00:21:33 -07:00
Stephen Hemminger
28674b97cf bridge: fix accidental creation of sysfs directory
Commit bb900b27a2 ("bridge: allow
creating bridge devices with netlink") introduced a bug in net-next
because of a typo in notifier. Every device would have the sysfs
bridge directory (and files).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-17 17:52:51 -07:00
Eric Dumazet
f8e9881c2a bridge: reset IPCB in br_parse_ip_options
Commit 462fb2af97 (bridge : Sanitize skb before it enters the IP
stack), missed one IPCB init before calling ip_options_compile()

Thanks to Scot Doyle for his tests and bug reports.

Reported-by: Scot Doyle <lkml@scotdoyle.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Acked-by: Bandan Das <bandan.das@stratus.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Jan Lübbe <jluebbe@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 13:39:14 -07:00
David S. Miller
1c01a80cfe Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/smsc911x.c
2011-04-11 13:44:25 -07:00
Linus Torvalds
42933bac11 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-07 11:14:49 -07:00
stephen hemminger
14f98f258f bridge: range check STP parameters
Apply restrictions on STP parameters based 802.1D 1998 standard.
   * Fixes missing locking in set path cost ioctl
   * Uses common code for both ioctl and sysfs

This is based on an earlier patch Sasikanth V but with overhaul.

Note:
1. It does NOT enforce the restriction on the relationship max_age and
   forward delay or hello time because in existing implementation these are
   set as independant operations.

2. If STP is disabled, there is no restriction on forward delay

3. No restriction on holding time because users use Linux code to act
   as hub or be sticky.

4. Although standard allow 0-255, Linux only allows 0-63 for port priority
   because more bits are reserved for port number.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 17:22:29 -07:00
stephen hemminger
bb900b27a2 bridge: allow creating bridge devices with netlink
Add netlink device ops to allow creating bridge device via netlink.
This works in a manner similar to vlan, macvlan and bonding.

Example:
  # ip link add link dev br0 type bridge
  # ip link del dev br0

The change required rearranging initializtion code to deal with
being called by create link. Most of the initialization happens
in br_dev_setup, but allocation of stats is done in ndo_init callback
to deal with allocation failure. Sysfs setup has to wait until
after the network device kobject is registered.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 17:22:28 -07:00
stephen hemminger
36fd2b63e3 bridge: allow creating/deleting fdb entries via netlink
Use RTM_NEWNEIGH and RTM_DELNEIGH to allow updating of entries
in bridge forwarding table. This allows manipulating static entries
which is not possible with existing tools.

Example (using bridge extensions to iproute2)
   # br fdb add 00:02:03:04:05:06 dev eth0

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 17:22:28 -07:00
stephen hemminger
b078f0df67 bridge: add netlink notification on forward entry changes
This allows applications to query and monitor bridge forwarding
table in the same method used for neighbor table. The forward table
entries are returned in same structure format as used by the ioctl.
If more information is desired in future, the netlink method is
extensible.

Example (using bridge extensions to iproute2)
  # br monitor

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 17:22:27 -07:00
stephen hemminger
664de48bb6 bridge: split rcu and no-rcu cases of fdb lookup
In some cases, look up of forward database entry is done with RCU;
and for others no RCU is needed because of locking. Split the two
cases into two differnt loops (and take off inline).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 17:22:26 -07:00
stephen hemminger
7cd8861ab0 bridge: track last used time in forwarding table
Adds tracking the last used time in forwarding table.
Rename ageing_timer to updated to better describe it.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 17:22:26 -07:00
stephen hemminger
03e9b64b89 bridge: change arguments to fdb_create
Later patch provides ability to create non-local static entry.
To make this easier move the updating of the flag values to
after the code that creates entry.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 17:22:25 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Lüssing
ff9a57a62a bridge: mcast snooping, fix length check of snooped MLDv1/2
"len = ntohs(ip6h->payload_len)" does not include the length of the ipv6
header itself, which the rest of this function assumes, though.

This leads to a length check less restrictive as it should be in the
following line for one thing. For another, it very likely leads to an
integer underrun when substracting the offset and therefore to a very
high new value of 'len' due to its unsignedness. This will ultimately
lead to the pskb_trim_rcsum() practically never being called, even in
the cases where it should.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30 02:28:20 -07:00
Balaji G
1459a3cc51 bridge: Fix compilation warning in function br_stp_recalculate_bridge_id()
net/bridge/br_stp_if.c: In function ‘br_stp_recalculate_bridge_id’:
net/bridge/br_stp_if.c:216:3: warning: ‘return’ with no value, in function returning non-void

Signed-off-by: G.Balaji <balajig81@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-29 23:37:23 -07:00
stephen hemminger
edf947f100 bridge: notify applications if address of bridge device changes
The mac address of the bridge device may be changed when a new interface
is added to the bridge. If this happens, then the bridge needs to call
the network notifiers to tickle any other systems that care. Since bridge
can be a module, this also means exporting the notifier function.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 23:35:02 -07:00
Linus Lüssing
a7bff75b08 bridge: Fix possibly wrong MLD queries' ethernet source address
The ipv6_dev_get_saddr() is currently called with an uninitialized
destination address. Although in tests it usually seemed to nevertheless
always fetch the right source address, there seems to be a possible race
condition.

Therefore this commit changes this, first setting the destination
address and only after that fetching the source address.

Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:26:42 -07:00
Herbert Xu
6b1e960fdb bridge: Reset IPCB when entering IP stack on NF_FORWARD
Whenever we enter the IP stack proper from bridge netfilter we
need to ensure that the skb is in a form the IP stack expects
it to be in.

The entry point on NF_FORWARD did not meet the requirements of
the IP stack, therefore leading to potential crashes/panics.

This patch fixes the problem.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-18 15:13:12 -07:00
Jiri Pirko
8a4eb5734e net: introduce rx_handler results and logic around that
This patch allows rx_handlers to better signalize what to do next to
it's caller. That makes skb->deliver_no_wcard no longer needed.

kernel-doc for rx_handler_result is taken from Nicolas' patch.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-16 12:53:54 -07:00
David S. Miller
c337ffb68e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-03-15 15:15:17 -07:00
stephen hemminger
a461c0297f bridge: skip forwarding delay if not using STP
If Spanning Tree Protocol is not enabled, there is no good reason for
the bridge code to wait for the forwarding delay period before enabling
the link. The purpose of the forwarding delay is to allow STP to
learn about other bridges before nominating itself.

The only possible impact is that when starting up a new port
the bridge may flood a packet now, where previously it might have
seen traffic from the other host and preseeded the forwarding table.

Includes change for local variable br already available in that func.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:06:49 -07:00
stephen hemminger
1faa4356a3 bridge: control carrier based on ports online
This makes the bridge device behave like a physical device.
In earlier releases the bridge always asserted carrier. This
changes the behavior so that bridge device carrier is on only
if one or more ports are in the forwarding state. This
should help IPv6 autoconfiguration, DHCP, and routing daemons.

I did brief testing with Network and Virt manager and they
seem fine, but since this changes behavior of bridge, it should
wait until net-next (2.6.39).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Tested-By: Adam Majer <adamm@zombino.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 14:29:02 -07:00
David S. Miller
78fbfd8a65 ipv4: Create and use route lookup helpers.
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:42 -08:00
David S. Miller
33175d84ee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x_cmn.c
2011-03-10 14:26:00 -08:00
Randy Dunlap
dcbcdf22f5 net: bridge builtin vs. ipv6 modular
When configs BRIDGE=y and IPV6=m, this build error occurs:

br_multicast.c:(.text+0xa3341): undefined reference to `ipv6_dev_get_saddr'

BRIDGE_IGMP_SNOOPING is boolean; if it were tristate, then adding
	depends on IPV6 || IPV6=n
to BRIDGE_IGMP_SNOOPING would be a good fix.  As it is currently,
making BRIDGE depend on the IPV6 config works.

Reported-by: Patrick Schaaf <netdev@bof.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-10 13:45:57 -08:00
David S. Miller
0a0e9ae1bd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x.h
2011-03-03 21:27:42 -08:00
David S. Miller
b23dd4fe42 ipv4: Make output route lookup return rtable directly.
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 14:31:35 -08:00
David S. Miller
3872b28408 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-03-02 11:30:24 -08:00
Linus Lüssing
fe29ec41aa bridge: Use IPv6 link-local address for multicast listener queries
Currently the bridge multicast snooping feature periodically issues
IPv6 general multicast listener queries to sense the absence of a
listener.

For this, it uses :: as its source address - however RFC 2710 requires:
"To be valid, the Query message MUST come from a link-local IPv6 Source
Address". Current Linux kernel versions seem to follow this requirement
and ignore our bogus MLD queries.

With this commit a link local address from the bridge interface is being
used to issue the MLD query, resulting in other Linux devices which are
multicast listeners in the network to respond with a MLD response (which
was not the case before).

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:07:29 -08:00
Linus Lüssing
36cff5a10c bridge: Fix MLD queries' ethernet source address
Map the IPv6 header's destination multicast address to an ethernet
source address instead of the MLD queries multicast address.

For instance for a general MLD query (multicast address in the MLD query
set to ::), this would wrongly be mapped to 33:33:00:00:00:00, although
an MLD queries destination MAC should always be 33:33:00:00:00:01 which
matches the IPv6 header's multicast destination ff02::1.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:07:28 -08:00
Linus Lüssing
e4de9f9e83 bridge: Allow mcast snooping for transient link local addresses too
Currently the multicast bridge snooping support is not active for
link local multicast. I assume this has been done to leave
important multicast data untouched, like IPv6 Neighborhood Discovery.

In larger, bridged, local networks it could however be desirable to
optimize for instance local multicast audio/video streaming too.

With the transient flag in IPv6 multicast addresses we have an easy
way to optimize such multimedia traffic without tempering with the
high priority multicast data from well-known addresses.

This patch alters the multicast bridge snooping for IPv6, to take
effect for transient multicast addresses instead of non-link-local
addresses.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:07:28 -08:00
Linus Lüssing
d41db9f3f7 bridge: Add missing ntohs()s for MLDv2 report parsing
The nsrcs number is 2 Byte wide, therefore we need to call ntohs()
before using it.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:07:27 -08:00
Linus Lüssing
649e984d00 bridge: Fix IPv6 multicast snooping by correcting offset in MLDv2 report
We actually want a pointer to the grec_nsrcr and not the following
field. Otherwise we can get very high values for *nsrcs as the first two
bytes of the IPv6 multicast address are being used instead, leading to
a failing pskb_may_pull() which results in MLDv2 reports not being
parsed.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:07:26 -08:00
Linus Lüssing
9cc6e0c4c4 bridge: Fix IPv6 multicast snooping by storing correct protocol type
The protocol type for IPv6 entries in the hash table for multicast
bridge snooping is falsely set to ETH_P_IP, marking it as an IPv4
address, instead of setting it to ETH_P_IPV6, which results in negative
look-ups in the hash table later.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:07:26 -08:00
David S. Miller
da935c66ba Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/e1000e/netdev.c
	net/xfrm/xfrm_policy.c
2011-02-19 19:17:35 -08:00
Vasiliy Kulikov
d846f71195 bridge: netfilter: fix information leak
Struct tmp is copied from userspace.  It is not checked whether the "name"
field is NULL terminated.  This may lead to buffer overflow and passing
contents of kernel stack as a module name to try_then_request_module() and,
consequently, to modprobe commandline.  It would be seen by all userspace
processes.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-14 16:49:23 +01:00
Jiri Pirko
afc6151a78 bridge: implement [add/del]_slave ops
add possibility to addif/delif via rtnetlink

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-13 16:58:40 -08:00
Herbert Xu
8a870178c0 bridge: Replace mp->mglist hlist with a bool
As it turns out we never need to walk through the list of multicast
groups subscribed by the bridge interface itself (the only time we'd
want to do that is when we shut down the bridge, in which case we
simply walk through all multicast groups), we don't really need to
keep an hlist for mp->mglist.

This means that we can replace it with just a single bit to indicate
whether the bridge interface is subscribed to a group.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-12 01:05:42 -08:00
Herbert Xu
24f9cdcbd7 bridge: Fix timer typo that may render snooping less effective
In a couple of spots where we are supposed to modify the port
group timer (p->timer) we instead modify the bridge interface
group timer (mp->timer).

The effect of this is mostly harmless.  However, it can cause
port subscriptions to be longer than they should be, thus making
snooping less effective.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-11 21:59:37 -08:00
Herbert Xu
6b0d6a9b42 bridge: Fix mglist corruption that leads to memory corruption
The list mp->mglist is used to indicate whether a multicast group
is active on the bridge interface itself as opposed to one of the
constituent interfaces in the bridge.

Unfortunately the operation that adds the mp->mglist node to the
list neglected to check whether it has already been added.  This
leads to list corruption in the form of nodes pointing to itself.

Normally this would be quite obvious as it would cause an infinite
loop when walking the list.  However, as this list is never actually
walked (which means that we don't really need it, I'll get rid of
it in a subsequent patch), this instead is hidden until we perform
a delete operation on the affected nodes.

As the same node may now be pointed to by more than one node, the
delete operations can then cause modification of freed memory.

This was observed in practice to cause corruption in 512-byte slabs,
most commonly leading to crashes in jbd2.

Thanks to Josef Bacik for pointing me in the right direction.

Reported-by: Ian Page Hands <ihands@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-11 21:59:37 -08:00
David S. Miller
bd4a6974cc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-02-04 14:28:58 -08:00
Pavel Emelyanov
1158f762e5 bridge: Don't put partly initialized fdb into hash
The fdb_create() puts a new fdb into hash with only addr set. This is
not good, since there are callers, that search the hash w/o the lock
and access all the other its fields.

Applies to current netdev tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-04 13:02:36 -08:00
Michał Mirosław
acd1130e87 net: reduce and unify printk level in netdev_fix_features()
Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.

This converts the function to use netdev_info() instead of plain printk().

As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:45:15 -08:00
Michał Mirosław
04ed3e741d net: change netdev->features to u32
Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
  struct netdev, as per Eric Dumazet's suggestion -DaveM ]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:32:47 -08:00
Florian Westphal
6faee60a4e netfilter: ebt_ip6: allow matching on ipv6-icmp types/codes
To avoid adding a new match revision icmp type/code are stored
in the sport/dport area.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Holger Eitzenberger <holger@eitzenberger.org>
Reviewed-by: Bart De Schuymer<bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 12:05:12 +01:00
Eric Dumazet
255d0dc340 netfilter: x_table: speedup compat operations
One iptables invocation with 135000 rules takes 35 seconds of cpu time
on a recent server, using a 32bit distro and a 64bit kernel.

We eventually trigger NMI/RCU watchdog.

INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)

COMPAT mode has quadratic behavior and consume 16 bytes of memory per
rule.

Switch the xt_compat algos to use an array instead of list, and use a
binary search to locate an offset in the sorted array.

This halves memory need (8 bytes per rule), and removes quadratic
behavior [ O(N*N) -> O(N*log2(N)) ]

Time of iptables goes from 35 s to 150 ms.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 12:05:12 +01:00
Changli Gao
f88de8de5a net: bridge: check the length of skb after nf_bridge_maybe_copy_header()
Since nf_bridge_maybe_copy_header() may change the length of skb,
we should check the length of skb after it to handle the ppoe skbs.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 11:33:05 -08:00
David S. Miller
dbbe68bb12 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-04 11:57:25 -08:00
Tomas Winkler
1a9180a20f net/bridge: fix trivial sparse errors
net/bridge//br_stp_if.c:148:66: warning: conversion of
net/bridge//br_stp_if.c:148:66:     int to
net/bridge//br_stp_if.c:148:66:     int enum umh_wait

net/bridge//netfilter/ebtables.c:1150:30: warning: Using plain integer as NULL pointer

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03 13:29:18 -08:00
Florian Westphal
e6f26129eb bridge: stp: ensure mac header is set
commit bf9ae5386b
(llc: use dev_hard_header) removed the
skb_reset_mac_header call from llc_mac_hdr_init.

This seems fine itself, but br_send_bpdu() invokes ebtables LOCAL_OUT.

We oops in ebt_basic_match() because it assumes eth_hdr(skb) returns
a meaningful result.

Cc: acme@ghostprotocols.net
References: https://bugzilla.kernel.org/show_bug.cgi?id=24532
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03 12:09:33 -08:00
Tomas Winkler
9d89081d69 bridge: fix br_multicast_ipv6_rcv for paged skbs
use pskb_may_pull to access ipv6 header correctly for paged skbs
It was omitted in the bridge code leading to crash in blind
__skb_pull

since the skb is cloned undonditionally we also simplify the
the exit path

this fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=25202

Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: authenticated
Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: associated (aid 2)
Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 RADIUS: starting accounting session 4D0608A3-00000005
Dec 15 14:36:41 User-PC kernel: [175576.120287] ------------[ cut here ]------------
Dec 15 14:36:41 User-PC kernel: [175576.120452] kernel BUG at include/linux/skbuff.h:1178!
Dec 15 14:36:41 User-PC kernel: [175576.120609] invalid opcode: 0000 [#1] SMP
Dec 15 14:36:41 User-PC kernel: [175576.120749] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
Dec 15 14:36:41 User-PC kernel: [175576.121035] Modules linked in: approvals binfmt_misc bridge stp llc parport_pc ppdev arc4 iwlagn snd_hda_codec_realtek iwlcore i915 snd_hda_intel mac80211 joydev snd_hda_codec snd_hwdep snd_pcm snd_seq_midi drm_kms_helper snd_rawmidi drm snd_seq_midi_event snd_seq snd_timer snd_seq_device cfg80211 eeepc_wmi usbhid psmouse intel_agp i2c_algo_bit intel_gtt uvcvideo agpgart videodev sparse_keymap snd shpchp v4l1_compat lp hid video serio_raw soundcore output snd_page_alloc ahci libahci atl1c
Dec 15 14:36:41 User-PC kernel: [175576.122712]
Dec 15 14:36:41 User-PC kernel: [175576.122769] Pid: 0, comm: kworker/0:0 Tainted: G        W   2.6.37-rc5-wl+ #3 1015PE/1016P
Dec 15 14:36:41 User-PC kernel: [175576.123012] EIP: 0060:[<f83edd65>] EFLAGS: 00010283 CPU: 1
Dec 15 14:36:41 User-PC kernel: [175576.123193] EIP is at br_multicast_rcv+0xc95/0xe1c [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.123362] EAX: 0000001c EBX: f5626318 ECX: 00000000 EDX: 00000000
Dec 15 14:36:41 User-PC kernel: [175576.123550] ESI: ec512262 EDI: f5626180 EBP: f60b5ca0 ESP: f60b5bd8
Dec 15 14:36:41 User-PC kernel: [175576.123737]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 15 14:36:41 User-PC kernel: [175576.123902] Process kworker/0:0 (pid: 0, ti=f60b4000 task=f60a8000 task.ti=f60b0000)
Dec 15 14:36:41 User-PC kernel: [175576.124137] Stack:
Dec 15 14:36:41 User-PC kernel: [175576.124181]  ec556500 f6d06800 f60b5be8 c01087d8 ec512262 00000030 00000024 f5626180
Dec 15 14:36:41 User-PC kernel: [175576.124181]  f572c200 ef463440 f5626300 3affffff f6d06dd0 e60766a4 000000c4 f6d06860
Dec 15 14:36:41 User-PC kernel: [175576.124181]  ffffffff ec55652c 00000001 f6d06844 f60b5c64 c0138264 c016e451 c013e47d
Dec 15 14:36:41 User-PC kernel: [175576.124181] Call Trace:
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01087d8>] ? sched_clock+0x8/0x10
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0138264>] ? enqueue_entity+0x174/0x440
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c016e451>] ? sched_clock_cpu+0x131/0x190
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c013e47d>] ? select_task_rq_fair+0x2ad/0x730
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0524fc1>] ? nf_iterate+0x71/0x90
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4914>] ? br_handle_frame_finish+0x184/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e46e9>] ? br_handle_frame+0x189/0x230 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4560>] ? br_handle_frame+0x0/0x230 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04ff026>] ? __netif_receive_skb+0x1b6/0x5b0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04f7a30>] ? skb_copy_bits+0x110/0x210
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0503a7f>] ? netif_receive_skb+0x6f/0x80
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cb74c>] ? ieee80211_deliver_skb+0x8c/0x1a0 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cc836>] ? ieee80211_rx_handlers+0xeb6/0x1aa0 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04ff1f0>] ? __netif_receive_skb+0x380/0x5b0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c016e242>] ? sched_clock_local+0xb2/0x190
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c012b688>] ? default_spin_lock_flags+0x8/0x10
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cd621>] ? ieee80211_prepare_and_rx_handle+0x201/0xa90 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82ce154>] ? ieee80211_rx+0x2a4/0x830 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f815a8d6>] ? iwl_update_stats+0xa6/0x2a0 [iwlcore]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8499212>] ? iwlagn_rx_reply_rx+0x292/0x3b0 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8483697>] ? iwl_rx_handle+0xe7/0x350 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8486ab7>] ? iwl_irq_tasklet+0xf7/0x5c0 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01aece1>] ? __rcu_process_callbacks+0x201/0x2d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150d05>] ? tasklet_action+0xc5/0x100
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150a07>] ? __do_softirq+0x97/0x1d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d910c>] ? nmi_stack_correct+0x2f/0x34
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150970>] ? __do_softirq+0x0/0x1d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  <IRQ>
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01508f5>] ? irq_exit+0x65/0x70
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05df062>] ? do_IRQ+0x52/0xc0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01036b0>] ? common_interrupt+0x30/0x38
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c03a1fc2>] ? intel_idle+0xc2/0x160
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04daebb>] ? cpuidle_idle_call+0x6b/0x100
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0101dea>] ? cpu_idle+0x8a/0xf0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d2702>] ? start_secondary+0x1e8/0x1ee

Cc: David Miller <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03 11:26:34 -08:00
David S. Miller
b4aa9e05a6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x.h
	drivers/net/wireless/iwlwifi/iwl-1000.c
	drivers/net/wireless/iwlwifi/iwl-6000.c
	drivers/net/wireless/iwlwifi/iwl-core.h
	drivers/vhost/vhost.c
2010-12-17 12:27:22 -08:00
David Stevens
76d661586c bridge: fix IPv6 queries for bridge multicast snooping
This patch fixes a missing ntohs() for bridge IPv6 multicast snooping.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 14:41:23 -08:00
Herbert Xu
deef4b522b bridge: Use consistent NF_DROP returns in nf_pre_routing
The nf_pre_routing functions in bridging have collected two
distinct ways of returning NF_DROP over the years, inline and
via goto.  There is no reason for preferring either one.

So this patch arbitrarily picks the inline variant and converts
the all the gotos.

Also removes a redundant comment.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 16:04:53 -08:00
Tobias Klauser
4c0833bcd4 bridge: Fix return values of br_multicast_add_group/br_multicast_new_group
If br_multicast_new_group returns NULL, we would return 0 (no error) to
the caller of br_multicast_add_group, which is not what we want. Instead
br_multicast_new_group should return ERR_PTR(-ENOMEM) in this case.
Also propagate the error number returned by br_mdb_rehash properly.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 13:00:39 -08:00
David S. Miller
defb3519a6 net: Abstract away all dst_entry metrics accesses.
Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.

This will allow us to:

1) More easily change how the metrics are stored.

2) Implement COW for metrics.

In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing.  We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-12-09 10:46:36 -08:00
Changli Gao
5811662b15 net: use the macros defined for the members of flowi
Use the macros defined for the members of flowi to clean the code up.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:27:45 -08:00
Eric Dumazet
ec1e5610c0 bridge: add RCU annotations to bridge port lookup
br_port_get() renamed to br_port_get_rtnl() to make clear RTNL is held.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:18 -08:00
stephen hemminger
b5ed54e94d bridge: fix RCU races with bridge port
The macro br_port_exists() is not enough protection when only
RCU is being used. There is a tiny race where other CPU has cleared port
handler hook, but is bridge port flag might still be set.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:17 -08:00
Eric Dumazet
a386f99025 bridge: add proper RCU annotation to should_route_hook
Add br_should_route_hook_t typedef, this is the only way we can
get a clean RCU implementation for function pointer.

Move route_hook to location where it is used.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:16 -08:00
Eric Dumazet
e805168800 bridge: add RCU annotation to bridge multicast table
Add modern __rcu annotatations to bridge multicast table.
Use newer hlist macros to avoid direct access to hlist internals.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:16 -08:00
Benjamin Poirier
1e253c3b8a bridge: Forward reserved group addresses if !STP
Make all frames sent to reserved group MAC addresses (01:80:c2:00:00:00 to
01:80:c2:00:00:0f) be forwarded if STP is disabled. This enables
forwarding EAPOL frames, among other things.

Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 04:25:48 -07:00
stephen hemminger
d028023281 bridge: make br_parse_ip_options static
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 03:09:42 -07:00
Jesse Gross
361ff8a6cf bridge: Add support for TX vlan offload.
If some of the underlying devices support it, enable vlan offload on
transmit for bridge devices.  This allows senders to take advantage of the
hardware support, similar to other forms of acceleration.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 01:26:54 -07:00
Jesse Gross
b738127dfb vlan: Rename VLAN_GROUP_ARRAY_LEN to VLAN_N_VID.
VLAN_GROUP_ARRAY_LEN is simply the number of possible vlan VIDs.
Since vlan groups will soon be more of an implementation detail
for vlan devices, rename the constant to be descriptive of its
actual purpose.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 01:26:50 -07:00
Jesse Gross
13937911f9 ebtables: Allow filtering of hardware accelerated vlan frames.
An upcoming commit will allow packets with hardware vlan acceleration
information to be passed though more parts of the network stack, including
packets trunked through the bridge.  This adds support for matching and
filtering those packets through ebtables.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 01:26:49 -07:00
Eric Dumazet
fc66f95c68 net dst: use a percpu_counter to track entries
struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.

Switch to a percpu_counter, to reduce number of time we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.

Stress test :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)

Before:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

After:

real	0m45.570s
user	0m15.525s
sys	9m56.669s

With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)

real	0m41.841s
user	0m15.261s
sys	8m45.949s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 13:06:53 -07:00
Bandan Das
462fb2af97 bridge : Sanitize skb before it enters the IP stack
Related dicussion here : http://lkml.org/lkml/2010/9/3/16

Introduce a function br_parse_ip_options that will audit the
skb and possibly refill IP options before a packet enters the
IP stack. If no options are present, the function will zero out
the skb cb area so that it is not misinterpreted as options by some
unsuspecting IP layer routine. If packet consistency fails, drop it.

Signed-off-by: Bandan Das <bandan.das@stratus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-19 12:42:34 -07:00
David S. Miller
e548833df8 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/mac80211/main.c
2010-09-09 22:27:33 -07:00
David S. Miller
87f94b4e91 bridge: Clear INET control block of SKBs passed into ip_fragment().
In a similar vain to commit 17762060c2
("bridge: Clear IPCB before possible entry into IP stack")

Any time we call into the IP stack we have to make sure the state
there is as expected by the ipv4 code.

With help from Eric Dumazet and Herbert Xu.

Reported-by: Bandan Das <bandan.das@stratus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 19:17:34 -07:00
stephen hemminger
aa7c6e5fa0 bridge: avoid ethtool on non running interface
If bridge port is offline, don't call ethtool to query speed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-25 16:36:51 -07:00
Stephen Hemminger
944c794d64 bridge: fix locking comment
The carrier check is not called from work queue in current code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-25 16:36:50 -07:00
Changli Gao
4c3a76abd3 bridge: netfilter: fix a memory leak
nf_bridge_alloc() always reset the skb->nf_bridge, so we should always
put the old one.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-23 20:14:36 -07:00
Simon Horman
c2368e795c bridge: is PACKET_LOOPBACK unlikely()?
While looking at using netdev_rx_handler_register for openvswitch Jesse
Gross suggested that an unlikely() might be worthwhile in that code.
I'm interested to see if its appropriate for the bridge code.

Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-22 21:09:04 -07:00
David S. Miller
00dad5e479 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/e1000e/hw.h
	net/bridge/br_device.c
	net/bridge/br_input.c
2010-08-02 22:22:46 -07:00
Herbert Xu
3a7fda06ba bridge: Allow multicast snooping to be disabled before ifup
Currently you cannot disable multicast snooping while a device is
down.  There is no good reason for this restriction and this patch
removes it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-30 23:38:58 -07:00
Herbert Xu
6d1d1d398c bridge: Fix skb leak when multicast parsing fails on TX
On the bridge TX path we're leaking an skb when br_multicast_rcv
returns an error.

Reported-by: David Lamparter <equinox@diac24.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-30 23:36:57 -07:00
stephen hemminger
eeaf61d889 bridge: add rcu_read_lock on transmit
Long ago, when bridge was converted to RCU, rcu lock was equivalent
to having preempt disabled. RCU has changed a lot since then and
bridge code was still assuming the since transmit was called with
bottom half disabled, it was RCU safe.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-28 10:50:55 -07:00
Herbert Xu
573201f36f bridge: Partially disable netpoll support
The new netpoll code in bridging contains use-after-free bugs
that are non-trivial to fix.

This patch fixes this by removing the code that uses skbs after
they're freed.

As a consequence, this means that we can no longer call bridge
from the netpoll path, so this patch also removes the controller
function in order to disable netpoll.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-19 23:28:25 -07:00
Kulikov Vasiliy
bb7a0bd600 net: bridge: fix sign bug
ipv6_skip_exthdr() can return error code that is below zero.
'offset' is unsigned, so it makes no sense.
ipv6_skip_exthdr() returns 'int' so we can painlessly change type of
offset to int.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-15 20:27:58 -07:00
David S. Miller
597e608a84 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-07-07 15:59:38 -07:00
Eric Dumazet
28172739f0 net: fix 64 bit counters on 32 bit arches
There is a small possibility that a reader gets incorrect values on 32
bit arches. SNMP applications could catch incorrect counters when a
32bit high part is changed by another stats consumer/provider.

One way to solve this is to add a rtnl_link_stats64 param to all
ndo_get_stats64() methods, and also add such a parameter to
dev_get_stats().

Rule is that we are not allowed to use dev->stats64 as a temporary
storage for 64bit stats, but a caller provided area (usually on stack)

Old drivers (only providing get_stats() method) need no changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-07 14:58:56 -07:00
Herbert Xu
17762060c2 bridge: Clear IPCB before possible entry into IP stack
The bridge protocol lives dangerously by having incestuous relations
with the IP stack.  In this instance an abomination has been created
where a bogus IPCB area from a bridged packet leads to a crash in
the IP stack because it's interpreted as IP options.

This patch papers over the problem by clearing the IPCB area in that
particular spot.  To fix this properly we'd also need to parse any
IP options if present but I'm way too lazy for that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-07 14:43:33 -07:00
Herbert Xu
7f285fa78d bridge br_multicast: BUG: unable to handle kernel NULL pointer dereference
On Tue, Jul 06, 2010 at 08:48:35AM +0800, Herbert Xu wrote:
>
> bridge: Restore NULL check in br_mdb_ip_get

Resend with proper attribution.

bridge: Restore NULL check in br_mdb_ip_get

Somewhere along the line the NULL check in br_mdb_ip_get went
AWOL, causing crashes when we receive an IGMP packet with no
multicast table allocated.

This patch restores it and ensures all br_mdb_*_get functions
use it.

Reported-by: Frank Arnold <frank.arnold@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-05 20:08:06 -07:00
David S. Miller
e490c1defe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-07-02 22:42:06 -07:00
Patrick McHardy
4df53d8bab bridge: add per bridge device controls for invoking iptables
Support more fine grained control of bridge netfilter iptables invocation
by adding seperate brnf_call_*tables parameters for each device using the
sysfs interface. Packets are passed to layer 3 netfilter when either the
global parameter or the per bridge parameter is enabled.

Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-02 09:32:57 +02:00
David S. Miller
8244132ea8 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/ip_output.c
2010-06-23 18:26:27 -07:00
Eric Dumazet
406818ff34 bridge: 64bit rx/tx counters
Use u64_stats_sync infrastructure to provide 64bit rx/tx
counters even on 32bit hosts.

It is safe to use a single u64_stats_sync for rx and tx,
because BH is disabled on both, and we use per_cpu data.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-23 13:00:48 -07:00
stephen hemminger
25442e06d2 bridge: fdb cleanup runs too often
It is common in end-node, non STP bridges to set forwarding
delay to zero; which causes the forwarding database cleanup
to run every clock tick. Change to run only as soon as needed
or at next ageing timer interval which ever is sooner.

Use round_jiffies_up macro rather than attempting round up
by changing value.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-17 13:49:14 -07:00
Herbert Xu
9f70b0fcec bridge: Add const to dummy br_netpoll_send_skb
The version of br_netpoll_send_skb used when netpoll is off is
missing a const thus causing a warning.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:43:48 -07:00
Herbert Xu
fed396a585 bridge: Fix OOM crash in deliver_clone
The bridge multicast patches introduced an OOM crash in the forward
path, when deliver_clone fails to clone the skb.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:43:07 -07:00
David S. Miller
16fb62b6b4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-06-15 13:49:24 -07:00
Jiri Pirko
f350a0a873 bridge: use rx_handler_data pointer to store net_bridge_port pointer
Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:48:58 -07:00
Jiri Pirko
93e2c32b5c net: add rx_handler data pointer
Add possibility to register rx_handler data pointer along with a rx_handler.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:47:11 -07:00
Herbert Xu
91d2c34a4e bridge: Fix netpoll support
There are multiple problems with the newly added netpoll support:

1) Use-after-free on each netpoll packet.
2) Invoking unsafe code on netpoll/IRQ path.
3) Breaks when netpoll is enabled on the underlying device.

This patch fixes all of these problems.  In particular, we now
allocate proper netpoll structures for each underlying device.

We only allow netpoll to be enabled on the bridge when all the
devices underneath it support netpoll.  Once it is enabled, we
do not allow non-netpoll devices to join the bridge (until netpoll
is disabled again).

This allows us to do away with the npinfo juggling that caused
problem number 1.

Incidentally this patch fixes number 2 by bypassing unsafe code
such as multicast snooping and netfilter.

Reported-by: Qianfeng Zhang <frzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:00:40 -07:00
Herbert Xu
36655042f9 bridge: Remove redundant npinfo NULL setting
Now that netpoll always zaps npinfo we no longer need to do it
in bridge.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:38 -07:00
Patrick McHardy
f9181f4ffc Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	include/net/netfilter/xt_rateest.h
	net/bridge/br_netfilter.c
	net/netfilter/nf_conntrack_core.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-15 17:31:06 +02:00
Changli Gao
d8d1f30b95 net-next: remove useless union keyword
remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 23:31:35 -07:00
Jiri Pirko
ab95bfe01f net: replace hooks in __netif_receive_skb V5
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 07:11:15 -07:00
Eric Dumazet
a8b563894d netfilter: br_netfilter: use skb_set_noref()
Avoid dirtying bridge_parent_rtable refcount, using new dst noref
infrastructure.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-01 11:48:31 +02:00
Chris Wright
2c3c8bea60 sysfs: add struct file* to bin_attr callbacks
This allows bin_attr->read,write,mmap callbacks to check file specific data
(such as inode owner) as part of any privilege validation.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:31 -07:00
Randy Dunlap
b3bcb72edb bridge: fix build for CONFIG_SYSFS disabled
Fix build when CONFIG_SYSFS is not enabled:
net/bridge/br_if.c:136: error: 'struct net_bridge_port' has no member named 'sysfs_name'

Note: dev->name == sysfs_name except when change name is in
progress, and we are protected from that by RTNL mutex.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 12:26:27 -07:00
Simon Arlott
e0f43752a9 bridge: update sysfs link names if port device names have changed
Links for each port are created in sysfs using the device
name, but this could be changed after being added to the
bridge.

As well as being unable to remove interfaces after this
occurs (because userspace tools don't recognise the new
name, and the kernel won't recognise the old name), adding
another interface with the old name to the bridge will
cause an error trying to create the sysfs link.

This fixes the problem by listening for NETDEV_CHANGENAME
notifications and renaming the link.

https://bugzilla.kernel.org/show_bug.cgi?id=12743

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:15 -07:00
stephen hemminger
28a16c9796 bridge: change console message interface
Use one set of macro's for all bridge messages.

Note: can't use netdev_XXX macro's because bridge is purely
virtual and has no device parent.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:02 -07:00
stephen hemminger
cfb478da70 bridge: netpoll cleanup
Move code around so that the ifdef for NETPOLL_CONTROLLER don't have to
show up in main code path. The control functions should be in helpers
that are only compiled if needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:01 -07:00
Bart De Schuymer
e94c67436e netfilter: bridge-netfilter: fix crash in br_nf_forward_finish()
[ 4593.956206] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 4593.956219] IP: [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
[ 4593.956232] PGD 195ece067 PUD 1ba005067 PMD 0
[ 4593.956241] Oops: 0000 [#1] SMP
[ 4593.956248] last sysfs file:
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:08/ATK0110:00/hwmon/hwmon0/temp2_label
[ 4593.956253] CPU 3
...
[ 4593.956380] Pid: 29512, comm: kvm Not tainted 2.6.34-rc7-net #195 P6T DELUXE/System Product Name
[ 4593.956384] RIP: 0010:[<ffffffffa03357a4>]  [<ffffffffa03357a4>] br_nf_forward_finish+0x154/0x170 [bridge]
[ 4593.956395] RSP: 0018:ffff880001e63b78  EFLAGS: 00010246
[ 4593.956399] RAX: 0000000000000608 RBX: ffff880057181700 RCX: ffff8801b813d000
[ 4593.956402] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880057181700
[ 4593.956406] RBP: ffff880001e63ba8 R08: ffff8801b9d97000 R09: ffffffffa0335650
[ 4593.956410] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b813d000
[ 4593.956413] R13: ffffffff81ab3940 R14: ffff880057181700 R15: 0000000000000002
[ 4593.956418] FS:  00007fc40d380710(0000) GS:ffff880001e60000(0000) knlGS:0000000000000000
[ 4593.956422] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 4593.956426] CR2: 0000000000000018 CR3: 00000001ba1d7000 CR4: 00000000000026e0
[ 4593.956429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4593.956433] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4593.956437] Process kvm (pid: 29512, threadinfo ffff8801ba566000, task ffff8801b8003870)
[ 4593.956441] Stack:
[ 4593.956443]  0000000100000020 ffff880001e63ba0 ffff880001e63ba0 ffff880057181700
[ 4593.956451] <0> ffffffffa0335650 ffffffff81ab3940 ffff880001e63bd8 ffffffffa03350e6
[ 4593.956462] <0> ffff880001e63c40 000000000000024d ffff880057181700 0000000080000000
[ 4593.956474] Call Trace:
[ 4593.956478]  <IRQ>
[ 4593.956488]  [<ffffffffa0335650>] ? br_nf_forward_finish+0x0/0x170 [bridge]
[ 4593.956496]  [<ffffffffa03350e6>] NF_HOOK_THRESH+0x56/0x60 [bridge]
[ 4593.956504]  [<ffffffffa0335282>] br_nf_forward_arp+0x112/0x120 [bridge]
[ 4593.956511]  [<ffffffff813f7184>] nf_iterate+0x64/0xa0
[ 4593.956519]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
[ 4593.956524]  [<ffffffff813f722c>] nf_hook_slow+0x6c/0x100
[ 4593.956531]  [<ffffffffa032f920>] ? br_forward_finish+0x0/0x60 [bridge]
[ 4593.956538]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]
[ 4593.956545]  [<ffffffffa032f86d>] __br_forward+0x6d/0xc0 [bridge]
[ 4593.956550]  [<ffffffff813c5d8e>] ? skb_clone+0x3e/0x70
[ 4593.956557]  [<ffffffffa032f462>] deliver_clone+0x32/0x60 [bridge]
[ 4593.956564]  [<ffffffffa032f6b6>] br_flood+0xa6/0xe0 [bridge]
[ 4593.956571]  [<ffffffffa032f800>] ? __br_forward+0x0/0xc0 [bridge]

Don't call nf_bridge_update_protocol() for ARP traffic as skb->nf_bridge isn't
used in the ARP case.

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-13 14:55:34 +02:00
Patrick McHardy
cba7a98a47 Merge branch 'master' of git://dev.medozas.de/linux 2010-05-11 18:59:21 +02:00
Jan Engelhardt
b4ba26119b netfilter: xtables: change hotdrop pointer to direct modification
Since xt_action_param is writable, let's use it. The pointer to
'bool hotdrop' always worried (8 bytes (64-bit) to write 1 byte!).
Surprisingly results in a reduction in size:

   text    data     bss filename
5457066  692730  357892 vmlinux.o-prev
5456554  692730  357892 vmlinux.o

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:35:27 +02:00
Jan Engelhardt
62fc805108 netfilter: xtables: deconstify struct xt_action_param for matches
In future, layer-3 matches will be an xt module of their own, and
need to set the fragoff and thoff fields. Adding more pointers would
needlessy increase memory requirements (esp. so for 64-bit, where
pointers are wider).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:33:37 +02:00
Jan Engelhardt
4b560b447d netfilter: xtables: substitute temporary defines by final name
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:31:17 +02:00
Jan Engelhardt
de74c16996 netfilter: xtables: combine struct xt_match_param and xt_target_param
The structures carried - besides match/target - almost the same data.
It is possible to combine them, as extensions are evaluated serially,
and so, the callers end up a little smaller.

  text  data  bss  filename
-15318   740  104  net/ipv4/netfilter/ip_tables.o
+15286   740  104  net/ipv4/netfilter/ip_tables.o
-15333   540  152  net/ipv6/netfilter/ip6_tables.o
+15269   540  152  net/ipv6/netfilter/ip6_tables.o

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-05-11 18:23:43 +02:00
Patrick McHardy
1e4b105712 Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	net/bridge/br_device.c
	net/bridge/br_forward.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-10 18:39:28 +02:00
WANG Cong
c06ee961d3 bridge: make bridge support netpoll
Based on the previous patch, make bridge support netpoll by:

1) implement the 2 methods to support netpoll for bridge;

2) modify netpoll during forwarding packets via bridge;

3) disable netpoll support of bridge when a netpoll-unabled device
   is added to bridge;

4) enable netpoll support when all underlying devices support netpoll.

Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-06 00:48:24 -07:00
stephen hemminger
afe0159d93 bridge: multicast_flood cleanup
Move some declarations around to make it clearer which variables
are being used inside loop.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 18:13:55 -07:00
stephen hemminger
83f6a740b4 bridge: multicast port group RCU fix
The recently introduced bridge mulitcast port group list was only
partially using RCU correctly. It was missing rcu_dereference()
and missing the necessary barrier on deletion.

The code should have used one of the standard list methods (list or hlist)
instead of open coding a RCU based link list.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 18:13:54 -07:00
stephen hemminger
168d40ee3d bridge: multicast flood
Fix unsafe usage of RCU. Would never work on Alpha SMP because
of lack of rcu_dereference()

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 18:13:54 -07:00
stephen hemminger
7e80c12448 bridge: simplify multicast_add_router
By coding slightly differently, there are only two cases
to deal with: add at head and add after previous entry.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 18:13:53 -07:00
David S. Miller
709b9326ef Revert "bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router()"
This reverts commit ff65e8275f.

As explained by Stephen Hemminger, the traversal doesn't require
RCU handling as we hold a lock.

The list addition et al. calls, on the other hand, do.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 16:49:58 -07:00
David S. Miller
ff65e8275f bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router()
Noticed by Michał Mirosław.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 16:26:49 -07:00
stephen hemminger
dcdca2c49b bridge: multicast router list manipulation
I prefer that the hlist be only accessed through the hlist macro
objects. Explicit twiddling of links (especially with RCU) exposes
the code to future bugs.

Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 12:53:24 -07:00
stephen hemminger
7180f7751d bridge: use is_multicast_ether_addr
Use existing inline function.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 12:53:24 -07:00
David S. Miller
d4c4f07df1 bridge: Fix build of ipv6 multicast code.
Based upon a report from Stephen Rothwell:

--------------------
net/bridge/br_multicast.c: In function 'br_ip6_multicast_alloc_query':
net/bridge/br_multicast.c:469: error: implicit declaration of function 'csum_ipv6_magic'

Introduced by commit 08b202b672 ("bridge
br_multicast: IPv6 MLD support") from the net tree.

csum_ipv6_magic is declared in net/ip6_checksum.h ...
--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 10:16:54 -07:00
YOSHIFUJI Hideaki / 吉藤英明
1fafc7a935 bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only.
Even with commit 32dec5dd02 ("bridge
br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only
without IGMP snooping."), BR_INPUT_SKB_CB(skb)->mrouters_only is
not appropriately initialized if IGMP/MLD snooping support is
compiled and disabled, so we can see garbage.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26 11:26:58 -07:00
YOSHIFUJI Hideaki
08b202b672 bridge br_multicast: IPv6 MLD support.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-23 13:35:56 +09:00
YOSHIFUJI Hideaki
8ef2a9a598 bridge br_multicast: Make functions less ipv4 dependent.
Introduce struct br_ip{} to store ip address and protocol
and make functions more generic so that we can support
both IPv4 and IPv6 with less pain.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-23 13:35:55 +09:00
David S. Miller
87eb367003 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-6000.c
	net/core/dev.c
2010-04-21 01:14:25 -07:00
Eric Dumazet
0eae88f31c net: Fix various endianness glitches
Sparse can help us find endianness bugs, but we need to make some
cleanups to be able to more easily spot real bugs.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 19:06:52 -07:00
Eric Dumazet
8eabf95cb1 bridge: add a missing ntohs()
grec_nsrcs is in network order, we should convert to host horder in
br_multicast_igmp3_report()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 18:51:57 -07:00
Bart De Schuymer
6c79bf0f24 netfilter: bridge-netfilter: fix refragmenting IP traffic encapsulated in PPPoE traffic
The MTU for IP traffic encapsulated inside PPPoE traffic is smaller
than the MTU of the Ethernet device (1500). Connection tracking
gathers all IP packets and sometimes will refragment them in
ip_fragment(). We then need to subtract the length of the
encapsulating header from the mtu used in ip_fragment(). The check in
br_nf_dev_queue_xmit() which determines if ip_fragment() has to be
called is also updated for the PPPoE-encapsulated packets.
nf_bridge_copy_header() is also updated to make sure the PPPoE data
length field has the correct value.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-20 16:22:01 +02:00
Patrick McHardy
6291055465 Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	net/ipv6/netfilter/ip6t_REJECT.c
	net/netfilter/xt_limit.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-20 16:02:01 +02:00
Bart De Schuymer
e179e6322a netfilter: bridge-netfilter: Fix MAC header handling with IP DNAT
- fix IP DNAT on vlan- or pppoe-encapsulated traffic: The functions
neigh_hh_output() or dst->neighbour->output() overwrite the complete
Ethernet header, although we only need the destination MAC address.
For encapsulated packets, they ended up overwriting the encapsulating
header. The new code copies the Ethernet source MAC address and
protocol number before calling dst->neighbour->output(). The Ethernet
source MAC and protocol number are copied back in place in
br_nf_pre_routing_finish_bridge_slow(). This also makes the IP DNAT
more transparent because in the old scheme the source MAC of the
bridge was copied into the source address in the Ethernet header. We
also let skb->protocol equal ETH_P_IP resp. ETH_P_IPV6 during the
execution of the PF_INET resp. PF_INET6 hooks.

- Speed up IP DNAT by calling neigh_hh_bridge() instead of
neigh_hh_output(): if dst->hh is available, we already know the MAC
address so we can just copy it.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15 12:26:39 +02:00
Bart De Schuymer
ea2d9b41bd netfilter: bridge-netfilter: simplify IP DNAT
Remove br_netfilter.c::br_nf_local_out(). The function
br_nf_local_out() was needed because the PF_BRIDGE::LOCAL_OUT hook
could be called when IP DNAT happens on to-be-bridged traffic. The
new scheme eliminates this mess.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15 12:14:51 +02:00
Bart De Schuymer
e26c28e8bf netfilter: bridge-netfilter: update a comment in br_forward.c about ip_fragment()
ip_refrag isn't used anymore in the bridge-netfilter code

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-13 11:41:39 +02:00
Bart De Schuymer
8237908e14 netfilter: bridge-netfilter: cleanup br_netfilter.c
bridge-netfilter: cleanup br_netfilter.c

- remove some of the graffiti at the head of br_netfilter.c
- remove __br_dnat_complain()
- remove KERN_INFO messages when CONFIG_NETFILTER_DEBUG is defined

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-13 11:40:41 +02:00
David S. Miller
871039f02f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/stmmac/stmmac_main.c
	drivers/net/wireless/wl12xx/wl1271_cmd.c
	drivers/net/wireless/wl12xx/wl1271_main.c
	drivers/net/wireless/wl12xx/wl1271_spi.c
	net/core/ethtool.c
	net/mac80211/scan.c
2010-04-11 14:53:53 -07:00
David S. Miller
4a1032faac Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-04-11 02:44:30 -07:00
Herbert Xu
fd218cf955 bridge: Fix IGMP3 report parsing
The IGMP3 report parsing is looking at the wrong address for
group records.  This patch fixes it.

Reported-by: Banyeer <banyeer@yahoo.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:20:47 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Jan Engelhardt
d6b00a5345 netfilter: xtables: change targets to return error code
Part of the transition of done by this semantic patch:
// <smpl>
@ rule1 @
struct xt_target ops;
identifier check;
@@
 ops.checkentry = check;

@@
identifier rule1.check;
@@
 check(...) { <...
-return true;
+return 0;
 ...> }

@@
identifier rule1.check;
@@
 check(...) { <...
-return false;
+return -EINVAL;
 ...> }
// </smpl>

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-25 16:55:49 +01:00
Jan Engelhardt
bd414ee605 netfilter: xtables: change matches to return error code
The following semantic patch does part of the transformation:
// <smpl>
@ rule1 @
struct xt_match ops;
identifier check;
@@
 ops.checkentry = check;

@@
identifier rule1.check;
@@
 check(...) { <...
-return true;
+return 0;
 ...> }

@@
identifier rule1.check;
@@
 check(...) { <...
-return false;
+return -EINVAL;
 ...> }
// </smpl>

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-25 16:55:24 +01:00
Jan Engelhardt
135367b8f6 netfilter: xtables: change xt_target.checkentry return type
Restore function signatures from bool to int so that we can report
memory allocation failures or similar using -ENOMEM rather than
always having to pass -EINVAL back.

// <smpl>
@@
type bool;
identifier check, par;
@@
-bool check
+int check
 (struct xt_tgchk_param *par) { ... }
// </smpl>

Minus the change it does to xt_ct_find_proto.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-25 16:04:33 +01:00
Jan Engelhardt
b0f38452ff netfilter: xtables: change xt_match.checkentry return type
Restore function signatures from bool to int so that we can report
memory allocation failures or similar using -ENOMEM rather than
always having to pass -EINVAL back.

This semantic patch may not be too precise (checking for functions
that use xt_mtchk_param rather than functions referenced by
xt_match.checkentry), but reviewed, it produced the intended result.

// <smpl>
@@
type bool;
identifier check, par;
@@
-bool check
+int check
 (struct xt_mtchk_param *par) { ... }
// </smpl>

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-25 16:03:13 +01:00
Jan Engelhardt
713aefa3fb netfilter: bridge: use NFPROTO values for NF_HOOK invocation
The first argument to NF_HOOK* is an nfproto since quite some time.
Commit v2.6.27-2457-gfdc9314 was the first to practically start using
the new names. Do that now for the remaining NF_HOOK calls.

The semantic patch used was:
// <smpl>
@@
@@
(NF_HOOK
|NF_HOOK_THRESH
)(
-PF_BRIDGE,
+NFPROTO_BRIDGE,
 ...)

@@
@@
 NF_HOOK(
-PF_INET6,
+NFPROTO_IPV6,
 ...)

@@
@@
 NF_HOOK(
-PF_INET,
+NFPROTO_IPV4,
 ...)
// </smpl>

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-25 16:00:29 +01:00
Jan Engelhardt
fd0ec0e621 netfilter: xtables: consolidate code into xt_request_find_match
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-25 15:02:19 +01:00
Jan Engelhardt
d2a7b6bad2 netfilter: xtables: make use of xt_request_find_target
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-25 15:02:19 +01:00
Jan Engelhardt
ff67e4e42b netfilter: xt extensions: use pr_<level> (2)
Supplement to 1159683ef4.

Downgrade the log level to INFO for most checkentry messages as they
are, IMO, just an extra information to the -EINVAL code that is
returned as part of a parameter "constraint violation". Leave errors
to real errors, such as being unable to create a LED trigger.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-25 15:00:04 +01:00
Dan Carpenter
7668448ea9 bridge: cleanup: remove unused assignment
We never actually use iph again so this assignment can be removed.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 21:21:58 -07:00
Jiri Pirko
1c01fe14a8 net: forbid underlaying devices to change its type
It's not desired for underlaying devices to change type. At the time,
there is for example possible to have bond with changed type from
Ethernet to Infiniband as a port of a bridge. This patch fixes this.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 20:00:02 -07:00
Jan Engelhardt
85bc3f3814 netfilter: xtables: do not print any messages on ENOMEM
ENOMEM is a very obvious error code (cf. EINVAL), so I think we do not
really need a warning message. Not to mention that if the allocation
fails, the user is most likely going to get a stack trace from slab
already.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-03-18 14:20:07 +01:00