Commit Graph

138 Commits

Author SHA1 Message Date
David S. Miller
b9e4085768 netdev: Do not use TX lock to protect address lists.
Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:15:08 -07:00
David S. Miller
e308a5d806 netdev: Add netdev->addr_list_lock protection.
Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:13:44 -07:00
Wang Chen
7e1a1ac1fb bonding: Check return of dev_set_promiscuity/allmulti
dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

In bond_alb and bond_main, we check all positive increment for promiscuity
and allmulti to get error return.
But there are still two problems left.
1. Some code path has no mechanism to signal errors upstream.
2. If there are multi slaves, it's hard to tell which slaves increment
   promisc/allmulti successfully and which failed.
So I left these problems to be FIXME.
Fortunately, the overflow is very rare case.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 20:51:36 -07:00
David S. Miller
c773e847ea netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue.
Accesses are mostly structured such that when there are multiple TX
queues the code transformations will be a little bit simpler.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:13:53 -07:00
Jay Vosburgh
b8a9787edd bonding: Allow setting max_bonds to zero
Permit bonding to function rationally if max_bonds is set to
zero.  This will load the module, but create no master devices (which can
be created via sysfs).

	Requires some change to bond_create_sysfs; currently, the
netdev sysfs directory is determined from the first bonding device created,
but this is no longer possible.  Instead, an interface from net/core is
created to create and destroy files in net_class.

	Based on a patch submitted by Phil Oester <kernel@linuxaces.com>.
Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to
update the documentation.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-18 00:00:04 -04:00
Jay Vosburgh
b59f9f74c4 bonding: Rework / fix multiple gratuitous ARP support
Support for sending multiple gratuitous ARPs during failovers
was added by commit:

commit 7893b2491a
Author: Moni Shoua <monis@voltaire.com>
Date:   Sat May 17 21:10:12 2008 -0700

    bonding: Send more than one gratuitous ARP when slave takes over

	This change modifies that support to remove duplicated code,
add support for ARP monitor (the original only supported miimon), clear
the grat ARP counter in bond_close (lest a later "ifconfig up" immediately
start spewing ARPs), and add documentation for the module parameter.

	Also updated driver version to 3.3.0.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-18 00:00:04 -04:00
Or Gerlitz
01f3109de4 bonding: deliver netdev event for fail-over under the active-backup mode
under active-backup mode and when there's actual new_active slave,
have bond_change_active_slave() call the networking core to deliver
NETDEV_BONDING_FAILOVER event such that the fail-over can be notable
by code outside of the bonding driver such as the RDMA stack and
monitoring tools.

As the correct context of locking appropriate for notifier calls is RTNL
and nothing else, bond->curr_slave_lock and bond->lock are unlocked and
later locked again. This is ensured by the rest of the code to be safe
under backup-mode AND when new_active is not NULL.

Jay Vosburgh modified the original patch for formatting and fixed a
compiler error.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-17 23:59:42 -04:00
Or Gerlitz
709f8a45e8 bonding: bond_change_active_slave() cleanup under active-backup
simplified the code of bond_change_active_slave() such that under
active-backup mode there's one "if (new_active)" test and the rest
of the code only does extra checks on top of it. This removed an
unneeded "if (bond->send_grat_arp > 0)" check and avoid calling
bond_send_gratuitous_arp when there's no active slave.

Jay Vosburgh made minor coding style changes to the orignal patch.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-17 23:59:41 -04:00
Jay Vosburgh
3915c1e863 bonding: Add "follow" option to fail_over_mac
Add a "follow" selection for fail_over_mac.  This option
causes the MAC address to move from slave to slave as the active
slave changes.  This is in addition to the existing fail_over_mac option
that causes the bond's MAC address to change during failover.

	This new option is useful for devices that cannot tolerate
multiple ports using the same MAC address simultaneously, either
because it confuses them or incurs a performance penalty (as is the
case with some LPAR-aware multiport devices).  Because the MAC of the
bond itself does not change, the "follow" option is slightly more
reliable during failover and doesn't change the MAC of the bond during
operation.

	This patch requires a previous ARP monitor change to properly
handle RTNL during failovers.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:29 -04:00
Jay Vosburgh
b2220cad58 bonding: refactor ARP active-backup monitor
Refactor ARP monitor for active-backup mode.  The motivation for
this is to take care of locking issues in a clear manner (particularly to
correctly handle RTNL vs. the bonding locks).  Currently, the a-b ARP
monitor does not hold RTNL at all, but future changes will require RTNL
during ARP monitor failovers.

	Rather than using conditional locking, this patch instead breaks
up the ARP monitor into three discrete steps: inspection, commit changes,
and probe.  The inspection phase marks slaves that require link state
changes.  The commit phase is only called if inspection detects that
changes are needed, and is called with RTNL.  Lastly, the probe phase
issues the ARP probes that the inspection phase uses to determine link
state.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:28 -04:00
Moni Shoua
7893b2491a bonding: Send more than one gratuitous ARP when slave takes over
With IPoIB, reception of gratuitous ARP by neighboring hosts
is essential for a successful change of slaves in case of failure.
Otherwise, they won't learn about the HW address change and need
to wait a long time until the neighboring system gives up and sends
an ARP request to learn the new HW address.  This patch decreases
the chance for a lost of a gratuitous ARP packet by sending it more
than once. The number retries is configurable and can be set with a
module param.

Signed-off-by: Moni Shoua <monis@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:26 -04:00
Pavel Emelyanov
8047637c70 bonding: Remove unneeded list_empty checks.
Some places iterate over the checked list right after the check
itself, so even if the list is empty, the list_for_each_xxx
iterator will make everything right by himself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:25 -04:00
Pavel Emelyanov
0883beca7f bonding: Relax unneeded _safe lists iterations.
Many places either do not modify the list under the list_for_each_xxx,
or break out of the loop as soon as the first element is removed.

Thus, this _safe iteration just occupies some unneeded .text space
and requires an additional variable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:22 -04:00
Pavel Emelyanov
0dd646fe05 bonding: Remove redundant argument from bond_create.
While we're fixing the bond_create, I hope it's OK to polish it
a bit after the fixes.

The third argument is NULL at the first caller and is ignored by
the second one, so remove it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:21 -04:00
Jay Vosburgh
4b8a9239ee bonding: remove test for IP in ARP monitor
Remove bond_has_ip and all references to it.  With this change,
the ARP monitor will always send ARP probes if the master is up and has
at least one slave.  If the bond has an IP address, it is used in the
ARP probe; if not, the probes are sent with all zeros in the sender's
IP address (which is consistent with an RFC 2131 4.4.1 duplicate address
probe).

	This is useful for cases when bonding itself is hidden underneath
a layer of virtual devices, e.g., with Xen.

	Change suggested by Tsutomu Fujii <t-fujii@nb.jp.nec.com>, who
included a one-line patch that only affected active-backup mode.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:18 -04:00
Jay Vosburgh
5ce0da8f03 bonding: Use msecs_to_jiffies, eliminate panic
Convert bonding to use msecs_to_jiffies instead of doing the
math.  For the ARP monitor, there was an underflow problem that could
result in an infinite loop.  The miimon already had that worked around,
but this is cleaner.

	Originally by Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Jay Vosburgh corrected a math error in the original; Nicolas' original
commit message is:

When setting arp_interval parameter to a very low value, delta_in_ticks
for next arp might become 0, causing an infinite loop.

See http://bugzilla.kernel.org/show_bug.cgi?id=10680

Same problem for miimon parameter already fixed, but fix might be
enhanced, by using msecs_to_jiffies() function.

Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:17 -04:00
Jay Vosburgh
569f0c4d90 bonding: fix enslavement error unwinds
As part of:

commit c2edacf80e
Author: Jay Vosburgh <fubar@us.ibm.com>
Date:   Mon Jul 9 10:42:47 2007 -0700

    bonding / ipv6: no addrconf for slaves separately from master

two steps were rearranged in the enslavement process: netdev_set_master
is now before the call to dev_open to open the slave.

	This patch updates the error cases and unwind process at the
end of bond_enslave to match the new order.  Without this patch, it is
possible for the enslavement to fail, but leave the slave with IFF_SLAVE
set in its flags.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-06 12:01:32 -04:00
Pavel Emelyanov
ae68c39819 bonding: Deadlock between bonding_store_bonds and bond_destroy_sysfs.
The sysfs layer has an internal protection, that ensures, that
all the process sitting inside ->sore/->show callback exits
before the appropriate entry is unregistered (the calltraces
are rather big, but I can provide them if required).

On the other hand, bonding takes rtnl_lock in
a) the bonding_store_bonds, i.e. in ->store callback,
b) module exit before calling the sysfs unregister routines.

Thus, the classical AB-BA deadlock may occur. To reproduce run
# while :; do modprobe bonding; rmmod bonding; done
and
# while :; do echo '+bond%d' > /sys/class/net/bonding_masters ; done
in parallel.

The fix is to move the bond_destroy_sysfs out of the rtnl_lock,
but _before_ bond_free_all to make sure no bonding devices exist
after module unload.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-06 12:01:30 -04:00
Pavel Emelyanov
822973ba79 bonding: Do not call free_netdev for already registered device.
If the call to bond_create_sysfs_entry in bond_create fails, the
proper rollback is to call unregister_netdevice, not free_netdev.
Otherwise - kernel BUG at net/core/dev.c:4057!

Checked with artificial failures injected into bond_create_sysfs_entry.

Pavel's original patch modified by Jay Vosburgh to move code around
for clarity (remove goto-hopping within the unwind block).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-06 12:01:27 -04:00
Denis V. Lunev
a95609cb02 netdev: use non-racy method for proc entries creation
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:22 -07:00
Joe Perches
dc13b38599 drivers/net/bonding/bond_main.c - remove unnecessary #define
bond_main.c already #includes <linux/seq_file.h>

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-04-16 20:09:35 -04:00
David S. Miller
8e8e43843b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/usb/rndis_host.c
	drivers/net/wireless/b43/dma.c
	net/ipv6/ndisc.c
2008-03-27 18:48:56 -07:00
Libor Pechacek
92b41daa45 bonding: Fix sysfs attribute handling
For bonding interfaces any attempt to read the sysfs directory contents after
module removal results in an oops.  The fix is to release sysfs attributes
for the interfaces upon module unload.

Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-03-25 23:15:48 -04:00
Jay Vosburgh
966bc6f434 bonding: fix two compiler warnings
Fix two compiler warnings that are new with recent versions of gcc
(apparently 4.2 and up).  One is fixed by refactoring; this change was
supplied by Stephen Hemminger.  The other was fixed by labelling the
variable as uninitialized_var() after confirming via inspection that it
cannot actually be used uninitialized.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-03-25 23:15:40 -04:00
YOSHIFUJI Hideaki
c346dca108 [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:53 +09:00
Pavel Emelyanov
988b705077 [ARP]: Introduce the arp_hdr_len helper.
There are some place, that calculate the ARP header length. These
calculations are correct, but 
 a) some operate with "magic" constants,
 b) enlarge the code length (sometimes at the cost of coding style),
 c) are not informative from the first glance.

The proposal is to introduce a helper, that includes all the good
sides of these calculations.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 12:20:57 -08:00
Denis V. Lunev
6133fb1aa1 [NETNS]: Disable inetaddr notifiers in namespaces other than initial.
ip_fib_init is kept enabled. It is already namespace-aware.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:46:17 -08:00
Jay Vosburgh
21c9d8d73d bonding: do not acquire rtnl in ARP monitor
The ARP monitor functions currently acquire RTNL when performing
failover operations, but do so incorrectly (out of order).  This causes
various warnings from might_sleep.

	The ARP monitor isn't supported for any of the bonding modes
that actually require RTNL, so it is safe to not hold RTNL when
failing over in the ARP monitor.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:33 -08:00
Andy Gospodarek
2439f9ebd4 bonding: fix race that causes invalid statistics
I've seen reports of invalid stats in /proc/net/dev for bonding
interfaces, and found it's a pretty easy problem to reproduce.  Since
the current code zeros the bonding stats when a read is requested and a
pointer to that data is returned to the caller we cannot guarantee that
the caller has completely accessed the data before a successive call to
request the stats zeroes the stats again.

This patch creates a new stack variable to keep track of the updated
stats and copies the data from that variable into the bonding stats
structure.  This ensures that the value for any of the bonding stats
should not incorrectly return zero for any of the bonding statistics.
This does use more stack space and require an extra memcpy, but it seems
like a fair trade-off for consistently correct bonding statistics.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Chris Snook <csnook@redhat.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:13 -08:00
Jay Vosburgh
4fe4763cd8 bonding: fix NULL pointer deref in startup processing
Fix the "are we creating a duplicate" check to not compare
the name if the name is NULL (meaning that the system should select
a name).  Bug reported by Benny Amorsen <benny+usenet@amorsen.dk>.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:12 -08:00
Jay Vosburgh
80ee5ad231 bonding: fix set_multicast_list locking
This patch eliminates a problem (reported by lockdep) in the
bond_set_multicast_list function.  It first reduces the locking on
bond->lock to a simple read_lock, and second, adds netif_tx locking
around the bonding mc_list manipulations that occur outside of the
set_multicast_list function.

	The original problem was related to IPv6 addrconf activity.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:12 -08:00
Jay Vosburgh
a42e534f1b bonding: fix parameter parsing
My last fix (commit ece95f7fef)
didn't handle one case correctly.  This resolves that, and it will now
correctly parse parameters with arbitrary white space, and either text
names or mode values.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:11 -08:00
Denis V. Lunev
f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Jay Vosburgh
5655662dab bonding: Don't hold lock when calling rtnl_unlock
Change bond_mii_monitor to not hold any locks when calling rtnl_unlock,
as rtnl_unlock can sleep (when acquring another mutex in netdev_run_todo).

Bug reported by Makito SHIOKAWA <mshiokawa@miraclelinux.com>, who
included a different patch.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-18 14:38:39 -05:00
Jay Vosburgh
027ea0416c bonding: fix lock ordering for rtnl and bonding_rwsem
Fix the handling of rtnl and the bonding_rwsem to always be acquired
in a consistent order (rtnl, then bonding_rwsem).

The existing code sometimes acquired them in this order, and sometimes
in the opposite order, which opens a window for deadlock between ifenslave
and sysfs.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-18 14:38:39 -05:00
Jay Vosburgh
ece95f7fef bonding: Fix up parameter parsing
A recent change to add an additional hash policy modified
bond_parse_parm, but it now does not correctly match parameters passed in
via sysfs.

	Rewrote bond_parse_parm to handle (a) parameter matches that
are substrings of one another and (b) user input with whitespace (e.g.,
sysfs input often has a trailing newline).

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-18 14:38:38 -05:00
Jay Vosburgh
3b96c858fc bonding: release slaves when master removed via sysfs
Add a call to bond_release_all in the bonding netdev event
handler for the master.  This releases the slaves for the case of, e.g.,
"echo -bond0 > /sys/class/net/bonding_masters", which otherwise will spin
forever waiting for references to be released.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-18 14:38:38 -05:00
Jay Vosburgh
2543331d36 bonding: fix locking during alb failover and slave removal
alb_fasten_mac_swap (actually rlb_teach_disabled_mac_on_primary)
requries RTNL and no other locks.  This could cause dev_set_promiscuity
and/or dev_set_mac_address to be called with improper locking.

	Changed callers to hold only RTNL during calls to alb_fasten_mac_swap
or functions calling it.  Updated header comments in affected functions to
reflect proper reality of locking requirements.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-18 14:38:38 -05:00
Jay Vosburgh
fdaea7a93d bonding: Fix race at module unload
Fixes a race condition in module unload.  Without this change,
workqueue events may fire while bonding data structures are partially
freed but before bond_close() is invoked by unregister_netdevice().

	Update version to 3.2.3.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-07 15:00:34 -05:00
Jay Vosburgh
6f6652be18 bonding: Add new layer2+3 hash for xor/802.3ad modes
Add new hash for balance-xor and 802.3ad modes.  Originally
 submitted by "Glenn Griffin" <ggriffin.kernel@gmail.com>; modified by
 Jay Vosburgh to move setting of hash policy out of line, tweak the
 documentation update and add version update to 3.2.2.

	Glenn's original comment follows:

Included is a patch for a new xmit_hash_policy for the bonding driver
that selects slaves based on MAC and IP information.  This is a middle
ground between what currently exists in the layer2 only policy and the
layer3+4 policy.  This policy strives to be fully 802.3ad compliant by
transmitting every packet of any particular flow over the same link.
As documented the layer3+4 policy is not fully compliant for extreme
cases such as ip fragmentation, so this policy is a nice compromise
for environments that require full compliance but desire more than the
layer2 only policy.

Signed-off-by: "Glenn Griffin" <ggriffin.kernel@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-07 15:00:32 -05:00
David Sterba
b63bb739a1 bonding: Fix time comparison
From: David Sterba <dsterba@suse.cz>

Use macros for comparing jiffies. Jiffies' wrap caused missed events and hangs.
Module reinsert was needed to make bonding work again.

Signed-off-by: David Sterba <dsterba@suse.cz>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-07 15:00:30 -05:00
Jay Vosburgh
8cbdeec637 [BONDING]: Fix resource use after free
Fix bond_destroy and bond_free_all to not reference the struct
net_device after calling unregister_netdevice.

Bug and offending change reported by Moni Shoua <monis@voltaire.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-13 21:16:29 -08:00
Jay Vosburgh
3a1521b7e5 bonding: don't validate address at device open
The standard validate_addr handler refuses to accept the all zeroes address
as valid.  However, it's common historical practice for the bonding
master to be configured up prior to having any slaves, at which time the
master will have a MAC address of all zeroes.

Resolved by setting the dev->validate_addr to NULL.  The master still can't
end up with an invalid address, as the set_mac_address function tests
for validity.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-10 04:25:14 -05:00
Adrian Bunk
a40745f5ef bonding/bond_main.c: fix cut'n'paste error
This patch fixes a cut'n'paste error in
commit 1b76b31693.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-25 03:31:14 -04:00
Adrian Bunk
c50b85d0fb make bonding/bond_main.c:bond_deinit() static
bond_deinit() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-25 03:31:14 -04:00
Jay Vosburgh
6603a6f25e bonding: Convert more locks to _bh, acquire rtnl, for new locking
Convert more lock acquisitions to _bh flavor to avoid deadlock
with workqueue activity and add acquisition of RTNL in appropriate places.
Affects ALB mode, as well as core bonding functions and sysfs.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-23 20:32:00 -04:00
Jay Vosburgh
059fe7a578 bonding: Convert locks to _bh, rework alb locking for new locking
Convert locking-related activity to new & improved system.
Convert some lock acquisitions to _bh and rework parts of ALB mode, both
to avoid deadlocks with workqueue activity.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-23 20:32:00 -04:00
Jay Vosburgh
0b0eef6641 bonding: Convert miimon to new locking
Convert mii (link state) monitor to acquire correct locks for
failover events.  In particular, failovers generally require RTNL at a low
level (when manipulating device MAC addresses, for example) and no other
locks.  The high level monitor is responsible for acquiring a known set
of locks, RTNL, the bond->lock for read and the slave_lock for write, and
the low level failover processing can then release appropriate locks as
needed.  This patch provides the high level portion.

	As it is undesirable to acquire RTNL for every monitor pass (which
may occur as often as every 10 ms), the miimon has been converted to
do conditional locking.  A first pass inspects all slaves to determine
if any action is required, and if so, a second pass (after acquring RTNL)
is done to perform any actions (doing a complete rescan, as the situation
may have changed when all locks were released).

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-23 20:32:00 -04:00
Jay Vosburgh
cf5f904493 bonding: Convert balance-rr transmit to new locking
Change locking in balance-rr transmit processing to use a free
running counter to determine which slave to transmit on.  Instead, a
free-running counter is maintained, and modulo arithmetic used to select
a slave for transmit.

	This removes lock operations from the TX path, and eliminates
a deadlock introduced by the conversion to work queues.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-23 20:32:00 -04:00
Jay Vosburgh
1b76b31693 Convert bonding timers to workqueues
Convert bonding timers to workqueues.  This converts the various
monitor functions to run in periodic work queues instead of timers.  This
patch introduces the framework and convers the calls, but does not resolve
various locking issues, and does not stand alone.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-23 20:32:00 -04:00