Fix resource leak in
drivers/scsi/aic7xxx/aic7xxx_osm_pci.c::ahc_linux_pci_dev_probe()
Found by the coverity checker (#668)
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Adds AHCI support for the VIA VT8251.
Includes a workaround for a hardware bug which requires a Command List
Override before softreset.
Signed-off-by: Bastiaan Jacques <b.jacques@planet.nl>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix ahc_pci_write_config's (wrong order of arguments).
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
drivers/scsi/megaraid.c: In function `mega_internal_command':
drivers/scsi/megaraid.c:4474: warning: unused variable `flags'
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If qla2x00_probe_one() fails before calling request_irq() but gets to
qla2x00_free_device() then it will mistakenly try to free an irq it didn't
request. It's chosing to free based on ha->pdev->irq which is always set.
host->irq is set after request_irq() succeeds so let's use that to decide
to free or not.
This was observed and tested when a silly set of circumstances lead to
firmware loading failing on a 2100.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This fixes coverity bug id #480. Since id_array is declared as
id_array[MAX_SLOTS], the check for i>MAX_SLOTS is obviously false.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Contains the following misc fixes:
- Fix build warnings
- Race condition in lpfc_workq_post_event() could corrupt phba->work_list.
- nlp_sid was not being initialized properly
- Fix some RSCN handling during the re-discovery after Link Up event.
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Additional fixes to LOGO, PLOGI, and RSCN processing
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix cleanup code in the lpfc_pci_probe_one() error code path.
This changes the original patch by:
- hardsetting the return value from lpfc_pci_probe_one() to
-ENODEV (negative value) if we fail attach
- removes the checks from lpfc_pci_remove_one() validating the
host and phba pointers as it's no longer needed.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fixed FC protocol violation in handling of PRLO.
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use asynchronous ABTS completion to speed up abort completions
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix Discovery processing for NPorts that hit nodev_tmo during discovery
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch fixes a NULL pointer dereference spotted by the Coverity
checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Any end device that can't support any of the scanning protocols
shouldn't be scanned, so set its id to -1 to prevent
scsi_scan_target() being called for it.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Equivalent of the same patch for the 3w-xxxx driver.
Signed-off-by: Adam Radford <linuxraid@amcc.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_kill_request() completes requests via normal SCSI completion path
which decrements busy counts; however, requests which get passed to
scsi_kill_request() aren't holding busy counts and scsi_kill_request()
don't increment them before invoking completion path resulting in
incorrect busy counts. Bump up busy counts before invoking completion
path.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
As previously reported via Michael Reed, the FC transport took a hit
in 2.6.15 (perhaps a little earlier) when we solved a recursion error.
There are 2 deadlocks occurring:
- With scan and the delete items sharing the same workq, flushing the
workq for the delete code was getting it stalled behind a very long
running scan code path.
- There's a deadlock where scsi_remove_target() has to sit behind
scsi_scan_target() due to contention over the scan_lock().
This patch resolves the 1st deadlock and significantly reduces the
odds of the second. So far, we have only replicated the 2nd deadlock
on a highly-parallel SMP system. More on the 2nd deadlock in a following
email.
This patch reworks the transport to:
- Only use the scsi host workq for scanning
- Use 2 other workq's internally. One for deletions, the other for
scheduled deletions. Originally, we tried this with a single workq,
but the occassional flushes of the scheduled queues was hitting the
second deadlock with a slightly higher frequency. In the future, we'll
look at the LLDD's and the transport to see if we can get rid of this
extra overhead.
- When moving to the other workq's we tightened up some object states
and some lock handling.
- Properly syncs adds/deletes
- minor code cleanups
- directly reference fc_host_attrs, rather than through attribute
macros
- flush the right workq on delayed work cancel failures.
Large kudos to Michael Reed who has been working this issue for the last
month.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When a target is added aic79xx tries to be overly clever: it changes
the command on the fly to TEST UNIT READY and tries to requeue the
original command. Sadly this breaks SCSI compability and of course
the midlayer is getting a bit confused by it.
So we're just removing that bit of code and let the midlayer deal with
it. It's clever enough by now. And the driver code is getting simpler.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
It's no longer needed after the convrsion to use the linux srp.h file.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
As James B. correctly noted, ahd_reset_channel() in
ahd_linux_bus_reset() should be protected by ahd_lock(). However, the
main reason for not doing so was a deadlock with the interesting
polling mechanism to detect the end a bus reset.
This patch replaces the polling mechanism with a saner signalling via
flags; it also gives us the benefit of detecting any multiple calls to
ahd_reset_channel().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Original From: Ingo Flaschberger <if@xip.at>
To support the RA4100 array from Compaq.
This patch now correctly handles SCSI_UNKNOWN types with regard to
BLIST_REPORTLUNS2 (allow it) and cdb[1] LUN inclusion (don't).
It also allows a BLIST_MAX_512 flag to restrict the maximum transfer
length to 512 blocks (apparently this is an RA4100 problem).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When spinlock debugging is turned on, a struct completion grows beyond the
size allowed for the scsi_pointer. So move the struct completion back onto
the stack. The additional memory barriers are to keep us from completing
a random piece of kernel stack if the command happens to complete after
the error handling has finished.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Encapsulate some more of the device reset processing in
preparation for SATA support.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove some unused printk macros, make some more robust, and
convert some to use standard printk macros when possible.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Simplify the dumping of the command status area by
removing some device specific information that has proven
to not be worthwhile.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fixup a check used by the ipr driver to determine if a given
device is a SCSI disk. Due to the addition of support for
attaching SATA devices, this check needs to be more robust.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Instead of NULLing the resource entry pointer when a disk
goes away to prevent any new commands being sent to it,
set the adapter resource handle to an invalid value so
new ops getting sent to it will fail with a selection timeout
response. This patch is needed for future SATA patches.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
when the sg driver is unable to setup direct IO, free that scatter
gather list prior to falling back to indirect IO
Further to this thread started by Bryan Holty:
http://marc.theaimsgroup.com/?l=linux-scsi&m=114306885116728&w=2
Here is the reworked patch again. This time it has been
tested with a program provided by Bryan.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch enables clustering and sets max_sectors to 0xffff to enable
reading and writing of large blocks with tapes (and large transfers with
sg). This change is needed after the sg and st drivers started using
chained bios through scsi_request_async() in 2.6.16.
Signed-off-by: Kai Makisara <kai.makisara@kolumbus.fi>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use wait_for_completion_timeout() instead of using a timer (as
Christoph Hellwig did for aic7xxx).
That lets me eliminate the sym_eh_wait structure; the struct completion,
the old_done pointer and the to_do flag can be folded into the sym_ucmd
(which overrides the scsi_pointer in scsi_cmnd).
The sym_eh_done() function becomes much simpler as the timeout handling
is done in sym_eh_handler() directly.
The host_lock can be unlocked earlier, and I cache the host in
a local variable to make accesses to it quicker.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The PDC code can set the bus mode, but we were ignoring that setting.
Also move the code that determines bus mode into its own function.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Now sym2 is using spi_print_msg, we don't need to have our own messages
for IGNORE WIDE RESIDUE and MODIFY DATA POINTER, so provide the option
of passing NULL for the label.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Undef SYM_OPT_HANDLE_DEVICE_QUEUEING.
Call sym_put_start_queue instead of sym_start_next_ccbs.
Turn asserts into checks that we can send the command to the adapter,
and return busy from queuecommand if we can't.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Patch below is one out of a large series to mark kernel data const when
possible, goal is to use .rodata and avoid false sharing
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- to_do was never set to SYM_EH_DO_COMPLETE, so remove that code
- move the spinlocks inside the common error handler code path
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We had our own code (pci_get_base_address()) to get the bus address of
a BAR. We can get this using pcibios_resource_to_bus() instead.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Most of the Kconfig options for switching between IO Port and MMIO
operations use the opposite sense from sym2. Really, this option
should be set at a chipset level rather than per-driver.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix puts so that release functions will be called.
Signed-off-by: Mike Anderson <andmike@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
Fix module param
Update driver version.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
On 64 bit machines, when a 32 bit application tries to acquire the AIF,
they will always get and EFAULT error response from the driver.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
Add max_channel and max_id sysfs parameters.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
Since the helper thread for the driver can be killed unceremoniously by
an application, we detect the loss of the helper and restart it.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
Remove superfluous code, optimize code, harden code, cast code, correct
some text, use msleep instead of schedule_timeout_interruptible. No
bugs.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
If there are no aacraid controllers, we do not create the raid
controller chrdev, thus when the driver is unloaded it performs a
superfluous deregistration.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
The max_channel field is set one too large.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
Some of the error return paths during initialization resulted in a zero
report to caller
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
Plug and play actions resulting from event sequences shall time out if
they take longer than 30 seconds to complete.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
The loss of the ownership flags, despite their flaws, in the scsi
command were sorely missed and are reinstated more accurately in the
aacraid driver to track commands and permit us to properly handle error
recovery actions.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn
Clean up the remaining scsi id access methods, drop ID_LUN_TO_CONTAINER
macro.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There is a window where we can be re-enabling an adapter, but
still allow SCSI commands to be sent to the target. This fix
sets our window (request_limit) to -1 as soon as we know the
adapter is being reenabled, and closes a very teeny tiny
window where we could set the window back to 1 before we
grab a lock.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Doug found a bug where if scsi_execute_async fails, we are leaking
sg resources. scsi_do_req never failed so we did not have to handle
that case before.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We currently have two implementations of this obsolete ioctl, one in
the block layer and one in the scsi code. Both of them have drawbacks.
This patch kills the scsi layer version after updating the block version
with the missing bits:
- argument checking
- use scatterlist I/O
- set number of retries based on the submitted command
This is the last user of non-S/G I/O except for the gdth driver, so
getting this in ASAP and through the scsi tree would be nie to kill
the non-S/G I/O path. Jens, what do you think about adding a check
for non-S/G I/O in the midlayer?
Thanks to Or Gerlitz for testing this patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently it crashes when trying to dump the registers. This is an obvious
one-liner fix I suppose.
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Data structures residing on memory and fetched by the controller
should have LE ordering. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
There's no need to perform hardreset during driver initialization.
It's already done during host reset and even if the controller is in
some wacky state, we now have proper hardreset to back us up.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Reimplement hardreset according to the datasheet. The old hardreset
didn't reset controller status and the controller might not be ready
after reset. Also, as SStatus is a bit flakey after hardreset,
sata_std_hardrset() didn't use to wait long enough before proceeding.
Note that as we're not depending on SStatus, DET==1 condition cannot
be used to wait for link, so use shorter timeout for no device case.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Nothing, not the datasheet nor the errats, says this delay is
necessary and with the previous PORT_CS_INIT change, we know the
controller is in good state. Kill 10ms sleep.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make sure the controller has no pending commands and ready for command
before issuing SRST.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Implement sil24_init_port which performs port initialization via
PORT_CS_INIT. To be used later by softreset and EH.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
SiI3124 might lose completion interrupt if completion interrupt occurs
shortly after SLOT_STAT register is read for the previous completion
interrupt if it is operating in PCI-X mode.
This currently doesn't trigger as libata never queues more than one
command, but it will with NCQ changes. This patch implements the
workaround - turning on WoC and explicitly clearing interrupt.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
All sil24 controllers share the same host flags except for NPORTS.
Consolidate them into SIL24_COMMON_FLAGS.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Move ata_set_mode() failure handling outside of ap->ops->set_mode if
clause such that it can handle ap->ops->set_mode failures after it's
updated.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Devices which consumed all their changes used to be disabled every
iteration. This causes unnecessary noise in the console output.
Disable once and leave alone.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
As waiting for some register bits to change seems to be a common
operation shared by some controllers, implement helper function
ata_wait_register(). This function also takes care of register write
flushing.
Note that the condition is inverted, the wait is over when the masked
value does NOT match @val. As we're waiting for bits to change, this
test is more powerful and allows the function to be used in more
places.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
sil24_softreset calculated timeout by adding ATA_TMOUT_BOOT * HZ to
jiffies; however, as ATA_TMOUT_BOOT is already in jiffies, multiplying
by HZ makes the value way off. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
@verbose was added to ata_reset_fn_t because AHCI complained during
probing if no device was attached to the port. However, muting
failure message isn't the correct approach. Reset methods are
responsible for detecting no device condition and finishing
successfully. Now that AHCI softreset is fixed, kill @verbose.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make setting CBL type responsibility of probeinit. This allows using
only ap->cbl == ATA_CBL_SATA test in all other parts. Without this,
ata_down_sata_spd_limit() doesn't work during probe reset.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
We must disable local IRQs while holding KM_IRQ0 or KM_IRQ1. Otherwise, an
IRQ handler could use those kmap slots while this code is using them,
resulting in memory corruption.
Thanks to Nick Orlov <bugfixer@list.ru> for reporting.
Cc: <linuxraid@amcc.com>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
All softreset methods are responsible for detecting device presence
and succeed softreset in such cases. AHCI didn't use to check for
device presence before proceeding with softreset and this caused
unnecessary reset retrials during probing. This patch adds presence
detection to AHCI softreset.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert the ATAPI_ENABLE_DMADIR compile time option needed
by some SATA-PATA bridge to runtime module parameter.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
irq-pio minor fix:
- remove the redundant hsm_task_state = HSM_ST_IDLE
- add devno to printk() as done in upstream
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (48 commits)
Documentation: fix minor kernel-doc warnings
BUG_ON() Conversion in drivers/net/
BUG_ON() Conversion in drivers/s390/net/lcs.c
BUG_ON() Conversion in mm/slab.c
BUG_ON() Conversion in mm/highmem.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/ptrace.c
BUG_ON() Conversion in ipc/shm.c
BUG_ON() Conversion in fs/freevxfs/
BUG_ON() Conversion in fs/udf/
BUG_ON() Conversion in fs/sysv/
BUG_ON() Conversion in fs/inode.c
BUG_ON() Conversion in fs/fcntl.c
BUG_ON() Conversion in fs/dquot.c
BUG_ON() Conversion in md/raid10.c
BUG_ON() Conversion in md/raid6main.c
BUG_ON() Conversion in md/raid5.c
Fix minor documentation typo
BFP->BPF in Documentation/networking/tuntap.txt
...
Make libata-core routines which will be used by EH implementation
extern.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
A lot of EH codes are about to be added to libata. Separate out
libata-eh.c. ata_scsi_timed_out(), ata_scsi_error(),
ata_qc_timeout(), ata_eng_timeout(), ata_eh_qc_complete() and
ata_eh_qc_retry() are moved. No code is changed by this patch.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
qcs might get retried because of unrelated failure. e.g. NCQ command
failure causes the whole command set to be aborted. Decrement
scmd->retries for such retrials to avoid unnecessarily failing
commands. Note that scmd->retries will be incremented the first time.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
TF register might not be directly accessible depending on errors.
e.g. TF of failed NCQ command is in log page 10h. Make reading TF
responsibility of error handlers. For the current EH, simply push TF
reading into qc completion functions as they are practically part of
EH. New EH will fill qc->tf with status registers before complting
qcs.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Current sense generation code does not generate sense error if status
register value doesn't indicate error condition. However, LLDD's may
indicate errors which 't show up in status register. Completing such
qc's without generating sense results in successful completion of
failed commands.
Invoke ata_to_sense_error() regardless of status register if
qc->err_mask is not zero such that ata_to_sense_error() generates
default sense error.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The current code passes pointer to ap around and repeatedly performs
ata_qc_from_tag() to access the ongoing qc. This is unnatural and
makes EH synchronization cumbersome. Make PIO codes deal with qc
instead of ap.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add a new qc flag ATA_QCFLAG_IO. This flag gets set for normal IO
commands originating from SCSI midlayer. This information will be
used by EH to determine transfer speed reconfiguration.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_dev_set_mode() is now responsible for managing ATA_DFLAG_PIO.
Clear it before setting it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_dev_configure() should not clear dynamic device flags determined
elsewhere. Lower eight bits are reserved for feature flags, define
ATA_DFLAG_CFG_MASK and clear only those bits before configuring
device. Without this patch, ATA_DFLAG_PIO gets turned off during
revalidation making PIO mode unuseable.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Rename ATA_FLAG_PORT_DISABLED to ATA_FLAG_DISABLED for consistency.
(ATA_FLAG_* are always about ports).
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make sure ata_dev_revalidate() complains on failures and kill
revalidation failure message printed from ata_dev_set_mode().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_bus_probe() now marks failed devices properly and leaves
meaningful transfer mode masks. This patch makes ata_dev_xfermask()
consider disable devices when determining PIO mode to avoid violating
device selection timing.
While at it, move port-wide resttriction out of device iteration loop
and try to make the function look a bit prettier.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Improve ata_bus_probe() such that configuration failures are handled
better. Each device is given ATA_PROBE_MAX_TRIES chances, but any
non-transient error (revalidation failure with -ENODEV, configuration
failure with -EINVAL...) disables the device directly. Any IO error
results in SATA PHY speed down and ata_set_mode() failure lowers
transfer mode. The last try always puts a device into PIO-0.
After each failure, the whole port is reset to make sure that the
controller and all the devices are in a known and stable state. The
reset also applies SATA SPD configuration if necessary.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Implement ata_down_xfermask_limit(). This function manipulates
@dev->pio/mwdma/udma_mask such that the next lower transfer mode is
selected. This will be used to improve ata_bus_probe() failure
handling and later by EH.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Some devices react badly if resets are performed back-to-back. Give
devices some time to breath and tell user that we're taking a nap.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make ata_drive_probe_reset() use SATA SPD configuration. Hardreset
will be force if speed renegotiation is necessary. Also, if a
hardreset fails, PHY speed is stepped down and hardreset is retried
until the lowest speed is reached.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ap->sata_spd_limit contrains SATA PHY speed of the port. It is
initialized to the configured value prior to probing thus preserving
BIOS configured value. hardreset is responsible for applying SPD
limit and sata_std_hardreset() is updated to do that. SATA SPD limit
will be used to enhance failure handling during probing and later by
EH.
This patch also normalizes some comments around affected code.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Don't overwrite SPD setting during hard reset. This change has the
(intended) side effect of honoring the BIOS configuration.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When ata_set_mode() fails on a device, make ata_set_mode() return
error code and pointer to the device instead of disabling it directly.
This gives more control to higher level driving logic.
This patch does not change the end result (configured transfer mode)
although it may make libata repeat mode configuration to the peer of a
failing device. Later ata_bus_probe() rewrite will make full use of
this change.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Handle DRQ=1 ERR=1 situation. Revised according to what IDE try_to_flush_leftover_data() does.
Changes:
- For ATA PIO writes and ATAPI devices, just stop the HSM and let EH handle it.
- For ATA PIO reads, read only one block of junk data and then let EH handle it.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Merge ata_host_set_pio() and ata_host_set_dma() into ata_set_mode()
and use function-level *dev to iterate over devices. This eases
soon-to-follow ata_set_mode() interface change.
While at it, kill an unnecessary comment and normalize others.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make ata_set_mode() return without doing anything if there is no
device on the port. This is in preparation for ata_bus_probe()
changes.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch renames ata_dev_present() to ata_dev_enabled() and adds
ata_dev_disabled(). This is to discern the state where a device is
present but disabled from not-present state. This disctinction is
necessary when configuring transfer mode because device selection
timing must not be violated even if a device fails to configure.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make do_probe_reset() generic by pushing classification check into
ata_drive_probe_reset() and rename it to ata_do_reset(). This will be
used by EH reset.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Separate out ata_spd_string() from sata_print_link_status(). This
will be used by SATA spd configuration routines.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_bus_probe() uses unsigned int rc to receive negative errno and
returns the converted unsigned int value. Convert temporary variables
to int and make ata_bus_probe() return negative errno on failure.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make ata_set_mode() return correct error value when ata_dev_set_mode()
fails.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Also cleans up some nearby whitespace problems.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The boot cmdline is parsed in parse_early_param() and
parse_args(,unknown_bootoption).
And __setup() is used in obsolete_checksetup().
start_kernel()
-> parse_args()
-> unknown_bootoption()
-> obsolete_checksetup()
If __setup()'s callback (->setup_func()) returns 1 in
obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
handled.
If ->setup_func() returns 0, obsolete_checksetup() tries other
->setup_func(). If all ->setup_func() that matched a parameter returns 0,
a parameter is seted to argv_init[].
Then, when runing /sbin/init or init=app, argv_init[] is passed to the app.
If the app doesn't ignore those arguments, it will warning and exit.
This patch fixes a wrong usage of it, however fixes obvious one only.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of the two status values struct pcmcia_device->p_state and state,
use descriptive bitfields. Most value-checking in drivers was invalid, as
the core now only calls the ->remove() (a.k.a. detach) function in case the
attachement _and_ configuration was successful.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Most of the driver initialization isn't done in the .probe function, but in
the internal _config() functions. Make them return a value, so that .probe
can properly report whether the probing of the device succeeded or not.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
dev_link_t * and client_handle_t both mean struct pcmcai_device * by now.
Therefore, remove all such indirections.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Embed dev_link_t into struct pcmcia_device(), as they basically address the
same entity. The actual contents of dev_link_t will be cleaned up step by step.
This patch includes a bugfix from and signed-off-by Andrew Morton.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
As we do not allow setting Vcc in the pcmcia core, and Vpp1 and
Vpp2 can only be set to the same value, a lot of code can be
streamlined.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
In all but one case, the suspend and resume functions of PCMCIA drivers
contain mostly of calls to pcmcia_release_configuration() and
pcmcia_request_configuration(). Therefore, move this code out of the
drivers and into the core.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Convert the remaining drivers which use pcmcia_release_io or
pcmcia_release_irq, and remove the EXPORT of these symbols.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
pcmcia_disable_device(struct pcmcia_device *p_dev) performs the necessary
cleanups upon device or driver removal: it calls the appropriate
pcmcia_release_* functions, and can replace (most) of the current drivers'
_release() functions.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
ata_xfer_tbl is terminated by entry with -1 as ->shift. However,
->shift was unsigned int making the termination condition bogus. This
patch converts ->shift and ->bits to int.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
There is no reason for the issuer to diddle with a failed qc as the
issuer has complete control over when a qc gets freed (usually in
->complete_fn). Make ata_qc_issue() responsible for completing qcs
which failed to issue.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
On sg_err failure path, ata_qc_issue() doesn't mark the qc active
before returning. This triggers WARN_ON() in __ata_qc_complete() when
the qc gets completed. This patch moves ap->active_tag and
QCFLAG_ACTIVE setting to the top of the function.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
MAP tables of ich6 and ich6m are wrong. Depending on port usage,
ata_piix may fail to initialize attached devices.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
print out information for ATAPI devices with CDB interrupts
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: (24 commits)
[PARISC] Fix double free when removing HIL drivers
[PARISC] Add atomic_sub_and_test
[PARISC] Enabled some NLS modules in a500, b180 and c3000 defconfigs
[PARISC] Kill duplicated EXPORT_SYMBOL warnings
[PARISC] Move ioremap EXPORT_SYMBOL from parisc_ksyms.c
[PARISC] Make local_t use atomic_long_t
[PARISC] Update defconfigs
[PARISC] Add PREEMPT support
[PARISC] More useful readwrite lock helpers
[PARISC] Convert HIL drivers to use input_allocate_device
[PARISC] Fixup CONFIG_EISA a bit
[PARISC] getsockopt should be ENTRY_COMP
[PARISC] Remove obsolete CONFIG_DEBUG_IOREMAP
[PARISC] Temporary FIXME for ioremapping EISA regions
[PARISC] Enable ioremap functionality unconditionally
[PARISC] Fix stifb with IOREMAP and a 64-bit kernel
[PARISC] Add CONFIG_HPPA_IOREMAP to conditionally enable ioremap
[PARISC] Add STRICT_MM_TYPECHECKS
[PARISC] Fix IOREMAP with a 64-bit kernel
[PARISC] Add parisc implementation of flush_kernel_dcache_page()
...
Addresses in F-space must be accessed uncached on most parisc machines.
Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
(1) A DMA transfer size of 0x10000 was not being written
as 0x0000 in the PRDs. Fixed.
(1) The DEV_IRQ interrupt cause bit happens spuriously
during EDMA operation, and was not being ignored by the driver.
This led to various "drive busy" errors being reported,
with associated unpredictable behaviour. Fixed.
(2) If a SATA or PCI interrupt was received with no outstanding
command, the interrupt handler still attempted to invoke
ata_qc_complete(), triggering assert()/BUG_ON() behaviour
elsewhere in libata. Fixed.
The driver still has issues with confusion after error-recovery,
but should now be reliable in the absence of drive errors.
I will be looking more into the error-handling bugs next.
Signed-Off-By: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_dev_init_params() fixes:
- Get the "heads" and "sectors" parameters from caller instead of implicitly from dev->id[].
- Return AC_ERR_INVALID instead of 0 if an invalid parameter is found
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Alan Cox <alan@redhat.com>
Last of the set, just clean up some oddments. Assuming the whole set is
now ok then the remaining differences are the setup of PIO_0 at reset
and the ->data_xfer method.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add a field to the host_set called 'flags' (was host_set_flags changed
to suit Jeff)
Add a simplex_claimed field so we can remember who owns the DMA channel
Add a ->mode_filter() hook to allow drivers to filter modes
Add docs for mode_filter and set_mode
Filter according to simplex state
Filter cable in core
This provides the needed framework to support all the mode rules found
in the PATA world. The simplex filter deals with 'to spec' simplex DMA
systems found in older chips. The cable filter avoids duplicating the
same rules in each chip driver with PATA. Finally the mode filter is
neccessary because drive/chip combinations have errata that forbid
certain modes with some drives or types of ATA object.
Drive speed setup remains per channel for now and the filters now use
the framework Tejun put into place which cleans them up a lot from the
older libata-pata patches.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
I think this is still needed with the new probe code (which btw seems to
be missing docs in upstream ?).
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Some hardware doesn't want the usual mode setup logic running. This
allows the hardware driver to replace it for special cases in the least
invasive way possible.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This is the minimal patch set to enable the current code to be used with
a controller following SFF (ie any PATA and early SATA controllers)
safely without crashes if there is no BMDMA area or if BMDMA is not
assigned by the BIOS for some reason.
Simplex status is recorded but not acted upon in this change, this isn't
a problem with the current drivers as none of them are for simplex
hardware. A following diff will deal with that.
The flags in the probe structure remain ->host_set_flags although Jeff
asked me to rename them, simply because the rename would break the usual
Linux rules that old code should break when there are changes. not
compile and run and then blow up/eat your computer/etc. Renaming this
later is a trivial exercise once a better name is chosen.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Some CD-ROM drives are slow to clear DRQ, after the last data block
is read by PIO. Use ata_wait_idle() after reading the last data block.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Current irq-pio checks ERR bit and stops on ERR before it does anything else.
This behavior doesn't look right.
The DRQ bit should take higher precedence than the ERR bit.
Changes:
- Let the HSM do the data transfer whenever the
device asks for DRQ bit, even if the ERR bit is set.
- For DRQ=1 ERR=1, don't trust the data
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make the the in_wq check easier to read as an inline function.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_check_atapi_dma() fix for LLDDs with the ATA_FLAG_PIO_POLLING flag.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Cleanup the following unused functions:
- ata_pio_poll()
- ata_pio_complete()
- ata_pio_first_block()
- ata_pio_block()
- ata_pio_error()
ap->pio_task_timeout and other enums.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert ata_pio_task() to use the new ata_hsm_move().
Changes:
- refactor ata_pio_task() to poll device status register and
- call the new ata_hsm_move() when device indicates it is not BSY.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Let ata_hsm_move() work with both irq-pio and polling pio codepath.
Changes:
- add a new parameter "in_wq" for polling pio
- add return value "poll_next" to tell polling pio task whether the HSM is finished
- merge code from ata_pio_first_block() to the HSM_ST_FIRST state.
- call ata_poll_qc_complete() if called from the workqueue
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Minor fix for ata_hsm_move() to work with ata_host_intr().
Changes:
- WARN_ON() and comment fix
- Make the HSM_ST_LAST device status checking more rigid.
- Treat unknown HSM state as BUG().
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Move out the irq-pio HSM code from ata_host_intr() to the new ata_hsm_move() function verbatim.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
atapi_packet_task() was replaced by ata_pio_task().
Remove the unused atapi_packet_task().
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix merge problem with upstream.
Changes:
1. add missing dev->cdb_len = 16 for ATA devices
2. use ata_pio_task instead of atapi_packet_task
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all occurences of 0xff.. in calls to function pci_set_dma_mask()
and pci_set_consistant_dma_mask() with the corresponding DMA_xBIT_MASK from
linux/dma-mapping.h.
Signed-off-by: Matthias Gehre <M.Gehre@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
They deal with wrapping correctly and are nicer to read.
Signed-off-by: Marcelo Feitoza Parisi <marcelo@feitoza.com.br>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism. With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.
We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants. This commit also
changes various drivers to use the new macro instead of looking at
_machine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes several mempool users, all of which are basically just
wrappers around kmalloc(), to use the common mempool_kmalloc/kfree, rather
than their own wrapper function, removing a bunch of duplicated code.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/scsi/sd.c: In function `sd_store_cache_type':
drivers/scsi/sd.c:193: warning: comparison of distinct pointer types lacks a cast
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
MODULE_PARM was actually breaking: recent gcc version optimize them out as
unused. It's time to replace the last users, which are generally in the
most unloved drivers anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>