When the register_netdevice() call fails, the newly allocated device is
not freed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to maintain stats in the bonding structure.
Use the instance of net_device_stats in netdevice.
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The convention for API functions in kernel is to return errno value;
bond_open would return -1 if alb setup failed. The only reason that
could happen is if kmalloc() failed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__net_init/__net_exit are apparently not going away, so use them
to full extent.
In some cases __net_init was removed, because it was called from
__net_exit code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows a bond device to specify an arp_ip_target as a host that is
not on the same vlan as the base bond device and still use arp
validation. A configuration like this, now works:
BONDING_OPTS="mode=active-backup arp_interval=1000 arp_ip_target=10.0.100.1 arp_validate=3"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
inet6 fe80::213:21ff:febe:33e9/64 scope link
valid_lft forever preferred_lft forever
9: bond0.100@bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff
inet 10.0.100.2/24 brd 10.0.100.255 scope global bond0.100
inet6 fe80::213:21ff:febe:33e9/64 scope link
valid_lft forever preferred_lft forever
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.0.100.1
Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:40:05:30:ff:30
Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:13:21:be:33:e9
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Remove DRV_NAME from pr_<level>s
Consolidate long format strings
Remove some extra tab indents
Remove some unnecessary ()s from pr_<level>s arguments
Align pr_<level> arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only files where David Miller is the primary git-signer.
wireless, wimax, ixgbe, etc are not modified.
Compile tested x86 allyesconfig only
Not all files compiled (not x86 compatible)
Added a few > 80 column lines, which I ignored.
Existing checkpatch complaints ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take advantage of the new pernet automatic storage management,
and stop using compatibility network namespace functions.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Don't call rtnl_link_unregister if rtnl_link_register fails
- Set .priv_size so we aren't stomping on uninitialized memory
when we use netdev_priv, on bond devices created with
ip link add type bond.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a basic set of rtnl link ops and takes advantage of
the fact that rtnl_link_unregister kills all of the surviving
devices to all us to kill bond_free_all. A module alias
is added so ip link add can pull in the bonding module.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manually inline the code from bond_deinit to bond_uninit. bond_uninit
is the only caller and it is short.
Move the call of bond_release_all from the netdev notifier into
bond_uninit. The call site is effectively the same and performing
the call explicitly allows all the paths for destroying a
bonding device to behave the same way.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop calling dev_get_by_name to see if the bond device already
exists. register_netdevice already does that.
Stop calling bond_deinit if register_netdevice fails as bond_uninit
is guaranteed to be called if bond_init succeeds.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch delegates the work of creating the sysfs groups
to the netdev layer and ultimately to the device layer. This
closes races between uevents.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mii monitor mode, bond_check_dev_link() calls the the ioctl
handler of slave devices. It stores the ndo_do_ioctl function
pointer to a static (!) ioctl variable and later uses it to call the
handler with the IOCTL macro.
If another thread executes bond_check_dev_link() at the same time
(even with a different bond, which none of the locks prevent), a
race condition occurs. If the two racing slaves have different
drivers, this may result in one driver's ioctl handler being
called with a pointer to a net_device controlled with a different
driver, resulting in unpredictable breakage.
Unless I am overlooking something, the "static" must be a
copy'n'paste error (?).
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the bonding device is no longer used in determining the device to
which to send packets, it can be dropped from the argument list of the various
xmit_hash_policy calls.
Signed-off-by: Jasper Spaans <spaans@fox-it.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify bonding hash transmit policies to use the psource MAC address of
the packet instead of the MAC address configured for the bonding device.
The old sitation conflicts with the documentation.
Signed-off-by: Jasper Spaans <spaans@fox-it.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function bond_create_proc_entry is currently of type int.
Two versions of this function exist:
The one in the ifdef CONFIG_PROC_FS branch always return 0.
The one in the else branch (which is empty) return nothing.
When CONFIG_PROC_FS is undef, this cause the following warning:
drivers/net/bonding/bond_main.c: In function `bond_create_proc_entry':
drivers/net/bonding/bond_main.c:3393: warning: control reaches end of
non-void function
No caller of this function use the returned value.
So change the returned type from int to void and remove the
useless return 0; .
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable old_active is first set to bond->curr_active_slave.
Then, it is unconditionally set to new_active, without being used in between.
The first assignment, having no side effect, is useless.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When parsing module parameters, bond_check_params() erroneously use
'xor_mode' as the name of a module parameter in an error message.
The right name for this parameter is 'xmit_hash_policy'.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases there is not desirable to switch back to primary interface when
it's link recovers and rather stay with currently active one. We need to avoid
packetloss as much as we can in some cases. This is solved by introducing
primary_reselect option. Note that enslaved primary slave is set as current
active no matter what.
Patch modified by Jay Vosburgh as follows: fixed bug in action
after change of option setting via sysfs, revised the documentation
update, and bumped the bonding version number.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I was implementing primary_passive option (formely named primary_lazy) I've
run into troubles with ab_arp. This is the only mode which is not using
bond_select_active_slave() function to select active slave and instead it
selects it itself. This seems to be not the right behaviour and it would be
better to do it in bond_select_active_slave() for all cases. This patch makes
this happen. Please review.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes commit e36b9d16c6. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:
*. It assumes tha the device is UP when calling dev_close(), or otherwise
dev_close() has no affect. It is worth to mention that initscripts (Redhat)
and sysconfig (Suse) doesn't act the same in this matter.
*. dev_close() has other side affects, like deleting entries from the routing
table, which might be unnecessary.
The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are all drivers that don't touch real hardware.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: Have bond_check_dev_link examine netif_running
Some network devices do not call netif_carrier_off when they
are set administratively down. Have the bonding link check function
also inspect the netif_running state. Ignore netif_running if the
bond_check_dev_link function is called with "reporting" set, as in that
case it's inspecting the capabilities of the non-netif_carrier device
driver.
Signed-off-by: Petri Gynther <pgynther@google.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
max_bonds is of type int and cannot be greater than INT_MAX.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding can use compare_ether_addr() in bond_release.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propogate the vlan_features of the slave devices to the bonding
master device, using the same logic as for regular features.
Tested by Or Gerlitz <ogerlitz@voltaire.com>, who also removed
the debug logic from the original test patch.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I did not introduce new lines over 80 chars. I even eliminated some of
them.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding device forbids slave device of different types under the same
master.
However, it is possible for a bonding master to change type during its
lifetime. This can be either from ARPHRD_ETHER to ARPHRD_INFINIBAND
or the other way arround. The change of type requires device level
multicast address cleanup because device level multicast addresses
depend on the device type.
The patch adds a call to dev_close() before the bonding master changes
type and dev_open() just after that.
In the example below I enslaved an IPoIB device (ib0) under
bond0. Since each bonding master starts as device of type ARPHRD_ETHER
by default, a change of type occurs when ib0 is enslaved.
This is how /proc/net/dev_mcast looks like without the patch
5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05
5 bond0 1 0 01005e000116
5 bond0 1 0 01005e7ffffd
5 bond0 1 0 01005e000001
5 bond0 1 0 333300000001
6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05
6 ib0 1 0 333300000001
6 ib0 1 0 01005e000001
6 ib0 1 0 01005e7ffffd
6 ib0 1 0 01005e000116
6 ib0 1 0 00ffffffff12401bffff00000000000000000001
6 ib0 1 0 00ffffffff12601bffff00000000000000000001
and this is how it looks like after the patch.
5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05
5 bond0 1 0 00ffffffff12601bffff00000000000000000001
5 bond0 1 0 00ffffffff12401bffff0000000000000ffffffd
5 bond0 1 0 00ffffffff12401bffff00000000000000000116
5 bond0 1 0 00ffffffff12401bffff00000000000000000001
6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05
6 ib0 1 0 00ffffffff12401bffff00000000000000000116
6 ib0 1 0 00ffffffff12401bffff0000000000000ffffffd
6 ib0 2 0 00ffffffff12401bffff00000000000000000001
6 ib0 2 0 00ffffffff12601bffff00000000000000000001
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the remaining occurences of raw return values to their
symbolic counterparts in ndo_start_xmit() functions that were missed by the
previous automatic conversion.
Additionally code that assumed the symbolic value of NETDEV_TX_OK to be zero
is changed to explicitly use NETDEV_TX_OK.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need to rework how bonding devices are initialized to make it more
amenable to creating bonding devices via netlink.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding device acts unlike all other Linux network device functions
in that it ignores case of device names. The developer must have come
from windows!
Cleanup the management of names and use standard routines where possible.
Flag places where bonding device still doesn't work right with network
namespaces.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resolve some of the complaints from checkpatch, and remove "magic emacs format"
comments, and useless MODULE_SUPPORTED_DEVICE(). But should not
change actual code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not safe to use a network device destructor that is a function in
the module, since it can be called after module is unloaded if sysfs
handle is open.
When eventually using netlink, the device cleanup code needs to be done
via uninit function.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole read/write semaphore locking can be removed. It doesn't add any
protection that isn't already done by using the RTNL mutex properly.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid a unnecessary carrier state transistion that happens when device
is registered.
Lockdep works better if initialization is done before registration as well.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_create() is always called with same parameters so move the argument
down.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).
CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.
It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.
David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().
List of devices that must clear this flag is :
- loopback device, because it calls netif_rx() and quoting Patrick :
"ip_route_input() doesn't accept loopback addresses, so loopback packets
already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function
- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct net_device trans_start field is a hot spot on SMP and high performance
devices, particularly multi queues ones, because every transmitter dirties
it. Is main use is tx watchdog and bonding alive checks.
But as most devices dont use NETIF_F_LLTX, we have to lock
a netdev_queue before calling their ndo_start_xmit(). So it makes
sense to move trans_start from net_device to netdev_queue. Its update
will occur on a already present (and in exclusive state) cache line, for
free.
We can do this transition smoothly. An old driver continue to
update dev->trans_start, while an updated one updates txq->trans_start.
Further patches could also put tx_bytes/tx_packets counters in
netdev_queue to avoid dirtying dev->stats (vlan device comes to mind)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If module initialisation failed (e.g. because the bonding sysfs entry
cannot be created), kernel panics:
IP: [<ffffffff8024910a>] destroy_workqueue+0x2d/0x146
Call Trace:
[<ffffffff808268c4>] bond_destructor+0x28/0x78
[<ffffffff80b64471>] netdev_run_todo+0x231/0x25a
[<ffffffff80b6dbcd>] rtnl_unlock+0x9/0xb
[<ffffffff81567907>] bonding_init+0x83e/0x84a
Remove the calls to bond_work_cancel_all() and destroy_workqueue();
both are also called/scheduled via bond_free_all().
bond_destroy_sysfs is unecessary because the sysfs entry has
not been created in the error case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove CONFIG_PROC_FS ifdefs from the code by adding void functions.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bonding/bond_main.c | 30 ++++++++++++++++++++----------
1 files changed, 20 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>