The iseries_veth driver uses the generic TX timeout watchdog, however a better
solution is in the works, so remove this code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver can attach to multiple vlans, which correspond to
multiple net devices. However there is only 1 connection between each LPAR,
so the connection structure may be shared by multiple net devices.
This makes module removal messy, because we can't deallocate the connections
until we know there are no net devices still using them. The solution is to
use ref counts on the connections, so we can delete them (actually stop) as
soon as the ref count hits zero.
This patch fixes (part of) a bug we were seeing with IPv6 sending probes to
a dead LPAR, which would then hang us forever due to leftover skbs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch makes veth_init_connection() and veth_destroy_connection()
symmetrical in that they allocate/deallocate the same data.
Currently if there's an error while initialising connections (ie. ENOMEM)
we call veth_module_cleanup(), however this will oops because we call
driver_unregister() before we've called driver_register(). I've never seen
this actually happen though.
So instead we explicitly call veth_destroy_connection() for each connection,
any that have been set up will be deallocated.
We also fix a potential leak if vio_register_driver() fails.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver unconditionally calls dma_unmap_single() even
when the corresponding dma_map_single() may have failed.
Rework the code a bit to keep the return value from dma_unmap_single()
around, and then check if it's a dma_mapping_error() before we do
the dma_unmap_single().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver uses atomic ops to manipulate the in_use field of
one of its per-connection structures. However all references to the
flag occur while the connection's lock is held, so the atomic ops aren't
necessary.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver keeps a stack of messages for each connection
and a lock to protect the stack. However there is also a per-connection lock
which makes the message stack lock redundant.
Remove the message stack lock and document the fact that callers of the
stack-manipulation functions must hold the connection's lock.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Due to a logic bug, once promiscuous mode is enabled in the iseries_veth
driver it is never disabled.
The driver keeps two flags, promiscuous and all_mcast which have exactly the
same effect. This is because we only ever receive packets destined for us,
or multicast packets. So consolidate them into one promiscuous flag for
simplicity.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver contains a state machine which is used to manage
how connections are setup and neogotiated between LPARs.
If one side of a connection resets for some reason, the two LPARs can get
stuck in a race to re-setup the connection. This can lead to the connection
being declared dead by one or both ends. In practice the connection is
declared dead by one or both ends approximately 8/10 times a connection is
reset, although it is rare for connections to be reset.
(an example here: http://michael.ellerman.id.au/files/misc/veth-trace.html)
The core of the problem is that the end that resets the connection doesn't
wait for the other end to become aware of the reset. So the resetting end
starts setting the connection back up, and then receives a reset from the
other end (which is the response to the initial reset). And so on.
We're severely limited in what we can do to fix this. The protocol between
LPARs is essentially fixed, as we have to interoperate with both OS/400
and old Linux drivers. Which also means we need a fix that only changes the
code on one end.
The only fix I've found given that, is to just blindly sleep for a bit when
resetting the connection, in the hope that the other end will get itself
sorted. Needless to say I'd love it if someone has a better idea.
This does work, I've so far been unable to get it to break, whereas without
the fix a reset of one end will lead to a dead connection ~8/10 times.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The iseries_veth driver has a timer which we use to send acks. When the
connection is reset or stopped we need to delete the timer.
Currently we only call del_timer() when resetting a connection, which means
the timer might run again while the connection is being re-setup. As it turns
out that's ok, because the flags the timer consults have been reset.
It's cleaner though to call del_timer_sync() once we've dropped the lock,
although the timer may still run between us dropping the lock and calling
del_timer_sync(), but as above that's ok.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Currently the iseries_veth driver prints the file name and line number in its
error messages. This isn't very useful for most users, so just print
"iseries_veth: message" instead.
- convert uses of veth_printk() to veth_debug()/veth_error()/veth_info()
- make terminology consistent, ie. always refer to LPAR not lpar
- be consistent about printing return codes as %d not %x
- make format strings fit in 80 columns
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Patch from Sascha Hauer
This patch adds support for setting and getting RTS / CTS via
set_mtctrl / get_mctrl functions.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The start_tx and stop_tx methods were passed a flag to indicate
whether the start/stop was from the tty start/stop callbacks, and
some drivers used this flag to decide whether to ask the UART to
immediately stop transmission (where the UART supports such a
feature.)
There are other cases when we wish this to occur - when CTS is
lowered, or if we change from soft to hard flow control and CTS
is inactive. In these cases, this flag was false, and we would
allow the transmitter to drain before stopping.
There is really only one case where we want to let the transmitter
drain before disabling, and that's when we run out of characters
to send.
Hence, re-jig the start_tx and stop_tx methods to eliminate this
flag, and introduce new functions for the special "disable and
allow transmitter to drain" case.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The idea behind a RAID class is to provide a uniform interface to all
RAID subsystems (both hardware and software) in the kernel.
To do that, I've made this class a transport class that's entirely
subsystem independent (although the matching routines have to match per
subsystem, as you'll see looking at the code). I put it in the scsi
subdirectory purely because I needed somewhere to play with it, but it's
not a scsi specific module.
I used a fusion raid card as the test bed for this; with that kind of
card, this is the type of class output you get:
jejb@titanic> ls -l /sys/class/raid_devices/20\:0\:0\:0/
total 0
lrwxrwxrwx 1 root root 0 Aug 16 17:21 component-0 -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:1:0/20:1:0:0/
lrwxrwxrwx 1 root root 0 Aug 16 17:21 component-1 -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:1:1/20:1:1:0/
lrwxrwxrwx 1 root root 0 Aug 16 17:21 device -> ../../../devices/pci0000:80/0000:80:04.0/host20/target20:0:0/20:0:0:0/
-r--r--r-- 1 root root 16384 Aug 16 17:21 level
-r--r--r-- 1 root root 16384 Aug 16 17:21 resync
-r--r--r-- 1 root root 16384 Aug 16 17:21 state
So it's really simple: for a SCSI device representing a hardware raid,
it shows the raid level, the array state, the resync % complete (if the
state is resyncing) and the underlying components of the RAID (these are
exposed in fusion on the virtual channel 1).
As you can see, this type of information can be exported by almost
anything, including software raid.
The more difficult trick, of course, is going to be getting it to
perform configuration type actions with writable attributes.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Since the attribute container deletes from a klist while it's walking
it, it is vulnerable to the problem (and fix) here:
http://marc.theaimsgroup.com/?l=linux-scsi&m=112485448830217
The attached fixes this (but won't compile without the above).
It also fixes the logical reversal in the traversal loop which meant
that we were never actually traversing the loop to hit this bug in the
first place.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
One of the changes in the attribute_container code in the scsi-misc tree
was to add a lock to protect the list of devices per container. This,
unfortunately, leads to potential scheduling while atomic problems if
there's a sleep in the function called by a trigger.
The correct solution is to use the kernel klist infrastructure instead
which allows lockless traversal of a list.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
I had some time to think about PCI assign issues in 2.6.13-rc series.
The major problem here is that we call pci_assign_unassigned_resources()
way too early - at subsys_initcall level. Therefore we give no chances
to ACPI and PnP routines (called at fs_initcall level) to reserve their
respective resources properly, as the comments in drivers/pnp/system.c
and drivers/acpi/motherboard.c suggest:
/**
* Reserve motherboard resources after PCI claim BARs,
* but before PCI assign resources for uninitialized PCI devices
*/
So I moved the pci_assign_unassigned_resources() call to
pcibios_assign_resources() (fs_initcall), which should hopefully fix a
lot of problems and make PCIBIOS_MIN_IO tweaks unnecessary.
Other changes:
- remove resource assignment code from pcibios_assign_resources(), since
it duplicates pci_assign_unassigned_resources() functionality and
actually does nothing in 2.6.13;
- modify ROM assignment code as per Ben's suggestion: try to use firmware
settings by default (if PCI_ASSIGN_ROMS is not set);
- set CARDBUS_IO_SIZE back to 4K as it's a wonderful stress test for
various setups.
Confirmed by Tero Roponen <teanropo@cc.jyu.fi> (who had problems with
the 4kB CardBus IO size previously).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't bother calling a hook, to call our own module, to call a helper
than simply calls ionumap().
If you unroll all that convolution, you get a simple kfree()+iounmap()
pair of calls.
ATAPI is getting close to being ready. To increase exposure, we enable
the code in the upstream kernel, but default it to off (present
behavior). Users must pass atapi_enabled=1 as a module option (if
module) or on the kernel command line (if built in) to turn on
discovery of their ATAPI devices.
Damir Perisa <damir.perisa@solnet.ch> reports:
drivers/net/s2io.h:765: error: invalid lvalue in assignment
drivers/net/s2io.h:766: error: invalid lvalue in assignment
That's a gcc4 error. I don't see why the casts are there anyway..
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Implemented a full bytewise compare to determine if a table load
request is attempting to load a duplicate table. The compare is
performed if the table signatures and table lengths match. This
will allow different tables with the same OEM Table ID and
revision to be loaded.
Although the BIOS is technically violating the ACPI spec when
this happens -- it does happen -- so Linux must handle it.
Signed-off-by: Robert Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Create vio_bus_ops so that we just pass a structure to vio_bus_init
instead of three separate function pointers.
Rearrange vio.h to avoid forward references. vio.h only needs
struct device_node from prom.h so remove the include and just
declare it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Update version and add 4 minor fixes, the last 2 were suggested by
Jeff Garzik:
1. check for a valid ethernet address before setting it
2. zero out bp->regview if init_one encounters an error and unmaps
the IO address. This prevents remove_one from unmapping again.
3. use netif_rx_schedule() instead of hand coding the same.
4. use IRQ_HANDLED and IRQ_NONE.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change all locks from spin_lock_irqsave() to spin_lock_bh(). All
places that require spinlocks are in BH context.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove atomic operations in the fast tx path. Expensive atomic
operations were used to keep track of the number of available tx
descriptors. The new code uses the difference between the consumer
and producer index to determine the number of free tx descriptors.
As suggested by Jeff Garzik, the name of the inline function is
changed to all lower case.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This speeds up link-up time on 5706 SerDes if the link partner does
not autoneg, a rather common scenario in blade servers. Some blade
servers use IPMI for keyboard input and it's important to minimize
link disruptions.
The speedup is achieved by shortening the timer to (HZ / 3) during
the transient period right after initiating a SerDes autoneg. If
autoneg does not complete, parallel detect can be done sooner. After
the transient period is over, the timer goes back to its normal HZ
interval.
As suggested by Jeff Garzik, the timer initialization is moved to
bnx2_init_board() from bnx2_open().
An eeprom bit is also added to allow default forced SerDes speed for
even faster link-up time.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an rtnl deadlock problem when flush_scheduled_work() is
called from bnx2_close(). In rare cases, linkwatch_event() may be on
the workqueue from a previous close of a different device and it will
try to get the rtnl lock which is already held by dev_close().
The fix is to set a flag if we are in the reset task which is run
from the workqueue. bnx2_close() will loop until the flag is cleared.
As suggested by Jeff Garzik, the loop is changed to call msleep(1)
instead of yield() in the original patch.
flush_scheduled_work() is also moved to bnx2_remove_one() before the
netdev is freed.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>