This fixes the following oops:
http://marc.info/?l=linux-kernel&m=123316111415677&w=2
You can reproduce this bug by interrupting a program before a sg
response completes. This leads to the special sg state (the orphan
state), then sg calls blk_put_request in interrupt (rq->end_io).
The above bug report shows the recursive lock problem because sg calls
blk_put_request in interrupt. We could call __blk_put_request here
instead however we also need to handle blk_rq_unmap_user here, which
can't be called in interrupt too.
In the orphan state, we don't need to care about the data transfer
(the program revoked the command) so adding 'just free the resource'
mode to blk_rq_unmap_user is a possible option.
I prefer to avoid complicating the blk mapping API when possible. I
change the orphan state to call sg_finish_rem_req via
execute_in_process_context. We hold sg_fd->kref so sg_fd doesn't go
away until keventd_wq finishes our work. copy_from_user/to_user fails
so blk_rq_unmap_user just frees the resource without the data
transfer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
sg_io_owned needs to be set before the command is sent to the midlevel;
otherwise, a quickly-completing command may cause a different CPU
to see "srp->done == 1 && !srp->sg_io_owned", which would lead to
incorrect behavior.
Check srp->done and set srp->orphan while holding rq_list_lock to
prevent races with sg_rq_end_io().
There is no need to check sfp->closed from read/write/ioctl/poll/etc.
since the kernel guarantees that this won't happen.
The usefulness of sg_srp_done() was questionable before; now it is
definitely not needed.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
sg has the following problems related to device removal:
* opening a sg fd races with removing a device
* closing a sg fd races with removing a device
* /proc/scsi/sg/* access races with removing a device
* command completion races with removing a device
* command completion races with closing a sg fd
* can rmmod sg with active commands
These problems can cause kernel oopses, memory-use-after-free, or
double-free errors. This patch fixes these problems by using krefs
to manage the lifetime of sg_device and sg_fd.
Each command submitted to the midlevel holds a reference to sg_fd
until the completion callback. This ensures that sg_fd doesn't go
away if the fd is closed with commands still outstanding.
sg_fd gets the reference of sg_device (with scsi_device) and also
makes sure that the sg module doesn't go away.
/proc/scsi/sg/* functions don't play nicely with krefs because they
give information about sg_fds which have been closed but not yet
freed due to still having outstanding commands and sg_devices which
have been removed but not yet freed due to still being referenced
by one or more sg_fds. To deal with this safely without removing
functionality, /proc functions now access sg_device and sg_fd while
holding a lock instead of using kref_get()/kref_put().
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Hi,
we have run into an issue with blktrace being started for sg devices.
Please apply.
Thanks,
Martin
From: Martin Peschke <mpeschke@linux.vnet.ibm.com>
The device number denoting a generic SCSI devices (sg) in a blktrace
trace is broken; major and minor are always 0. It looks like
sdp->device->sdev_gendev.devt is not initialized properly.
The fix below uses other data to make up a valid device number,
similar to the way an sg device number is generated for sysfs output.
Reported-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The commit 818827669d (block: make
blk_rq_map_user take a NULL user-space buffer) extended
blk_rq_map_user to accept a NULL user-space buffer with a READ
command. It was necessary to convert sg to use the block layer mapping
API.
This patch extends blk_rq_map_user again for a WRITE command. It is
necessary to convert st and osst drivers to use the block layer
apping API.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This fixes bio_copy_user_iov to properly handle the partial mappings
with struct rq_map_data (which only sg uses for now but st and osst
will shortly). It adds the offset member to struct rq_map_data and
changes blk_rq_map_user to update it so that bio_copy_user_iov can add
an appropriate page frame via bio_add_pc_page().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that device_create() has been audited, rename things back to the
original call to be sane.
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
blk_rq_unmap_user in sg_finish_rem_req can take care of all the cases.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
sg_read_xfer was used to copy data to user space for READ
commands. blk_rq_unmap_user does the job so sg_read_xfer does nothing
useful.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
sg_write_xfer was used to copy data from user space for WRITE
commands. blk_rq_map_user_iov and blk_rq_map_user do the job so
sg_write_xfer does nothing useful.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Calling blk_rq_map_user() at a single place is better than at
different two places. It makes the code more understandable.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
__sg_start_req() was used temporarily to call blk_get_request() during
converting sg to use the block layer.
Now sg always calls blk_get_request() so we can move blk_get_request()
to sg_start_req(). We don't need __sg_start_req anymore.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's not used for anything useful after the block layer conversion.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
sg had lots of the own functions for the direct IO but now sg uses the
block layer functions for it. There are only five lines for the direct
IO. SG_ALLOW_DIO_CODE define was used to compile out the direct IO
code but we don't need the define. If someone wants to remove the
direct IO code, he can do easily without the define.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
old sg_rq_end_io() was used to wrap sg_cmd_done during converting sg
to use the block layer (in order to cover the difference
scsi_execute_async and blk_execute_rq_nowait). Now we don't need it so
let's remove it.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
With the older SG interface, we don't know a user-space address to
trasfer data when executing a SCSI command. So we can't pass a
user-space address to blk_rq_map_user.
This patch fixes sg to pass a NULL user-space address to
blk_rq_map_user so that it just sets up a request and bios with page
frames propely without data transfer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts the indirect IO path (including mmap IO and old
struct sg_header) to use the block layer functions (blk_get_request,
blk_execute_rq_nowait, blk_rq_map_user, etc) instead of
scsi_execute_async().
[Jens: fixed compile error with SCSI logging enabled]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts the direct IO path (SG_FLAG_DIRECT_IO) to use the
block layer functions (blk_get_request, blk_execute_rq_nowait,
blk_rq_map_user, etc) instead of scsi_execute_async().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts the non data path to use the block layer functions
(blk_get_request, blk_execute_rq_nowait, etc) instead of uses
scsi_execute_async().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
sg allowed any command for TYPE_SCANNER. The cmd_filter patchset
doesn't. We can't change sg's permission since it might break the
existing software.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
cmd_filter works only for the block layer SG_IO with SCSI block
devices. It breaks scsi/sg.c, bsg, and the block layer SG_IO with SCSI
character devices (such as st). We hit a kernel crash with them.
The problem is that cmd_filter code accesses to gendisk (having struct
blk_scsi_cmd_filter) via inode->i_bdev->bd_disk. It works for only
SCSI block device files. With character device files, inode->i_bdev
leads you to struct cdev. inode->i_bdev->bd_disk->blk_scsi_cmd_filter
isn't safe.
SCSI ULDs don't expose gendisk; they keep it private. bsg needs to be
independent on any protocols. We shouldn't change ULDs to expose their
gendisk.
This patch moves struct blk_scsi_cmd_filter from gendisk to
request_queue, a common object, which eveyone can access to.
The user interface doesn't change; users can change the filters via
/sys/block/. gendisk has a pointer to request_queue so the cmd_filter
code accesses to struct blk_scsi_cmd_filter.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).
This also facilitates lockdeping of page lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
[SCSI] scsi_dh: fix kconfig related build errors
[SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
[SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
[SCSI] make struct scsi_{host,target}_type static
[SCSI] fix locking in host use of blk_plug_device()
[SCSI] zfcp: Cleanup external header file
[SCSI] zfcp: Cleanup code in zfcp_erp.c
[SCSI] zfcp: zfcp_fsf cleanup.
[SCSI] zfcp: consolidate sysfs things into one file.
[SCSI] zfcp: Cleanup of code in zfcp_aux.c
[SCSI] zfcp: Cleanup of code in zfcp_scsi.c
[SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
[SCSI] zfcp: Small QDIO cleanups
[SCSI] zfcp: Adapter reopen for large number of unsolicited status
[SCSI] zfcp: Fix error checking for ELS ADISC requests
[SCSI] zfcp: wait until adapter is finished with ERP during auto-port
[SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
[SCSI] sg: Add target reset support
[SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
[SCSI] sd: Move scsi_disk() accessor function to sd.h
...
Adds support for target reset to SG_SCSI_RESET.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch adds the commands that the former sg filter allowed for read
access to the cmdfilter to keep userspace apps that rely on them working.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch exports the per-gendisk command filter to user space through
sysfs, so it can be changed by the system administrator.
All users of the old cmd filter have been converted to use the new one.
Original patch from Peter Jones.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There is a race from when a device is created with device_create() and
then the drvdata is set with a call to dev_set_drvdata() in which a
sysfs file could be open, yet the drvdata will be NULL, causing all
sorts of bad things to happen.
This patch fixes the problem by using the new function,
device_create_drvdata(). It fixes the problem in all of the scsi
drivers that need it.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to
main tree.
Add correct ->owner to proc_fops to fix reading/module unloading race.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's big, but there doesn't seem to be a way to split it up smaller...
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the SCSI layer uses the request queues from the block layer, blktrace can
also be used to trace the requests to all SCSI devices (like SCSI tape drives),
not only disks. The only missing part is the ioctl interface to start and stop
tracing.
This patch adds the SETUP, START, STOP and TEARDOWN ioctls from blktrace to the
sg device files. With this change, blktrace can be used for SCSI devices like
for disks, e.g.: blktrace -d /dev/sg1 -o - | blkparse -i -
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The patch "[SCSI] sg: use idr to replace static arrays" in 2.6.24-rc1
causes a bogus line to appear in /proc/scsi/sg/devices containing
"-1 -1 -1 -1 -1 -1 -1 -1 -1" when there are no SCSI devices in the
system. In 2.6.23, /proc/scsi/sg/devices is empty when there are no
SCSI devices in the system. A similar problem exists with
/proc/scsi/sg/device_strs. The following patch restores the behavior
of 2.6.23.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If cdev_add fails in sg_add, sg_remove crashes since class_data is
bogus.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When I use cdparanoia my logs get spammed a lot by
printk: 464 messages suppressed.
sg_write: data in/out 30576/30576 bytes for SCSI command 0xbe--guessing data in;
program cdparanoia not setting count and/or reply_len properly
printk: 1078 messages suppressed.
and many more of those. With this patch the message is only printed once
for a command in a row.
v1->v2: Prevent rate limit messages too (pointed out by jejb)
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
After turning on DEBUG_SG I hit a fail:
kernel BUG at include/linux/scatterlist.h:50!
sg_build_indirect
sg_build_reserve
sg_open
chrdev_open
__dentry_open
do_filp_open
do_sys_open
We should initialise the sg list when we allocate it in sg_build_sgat.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.
Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
sg uses a scheme to reallocate a single contiguous array of all its
pointers for lookup and management. This didn't matter too much when sg
could only attach 256 nodes, but now the maximum has been bumped up to
32k we're starting to push the limits of the maximum allocatable
contiguous memory. The solution to this is to eliminate the static
array and do everything via idr, which this patch does.
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
unsigned short is too small for sizeof(struct scatterlist) *
min(q->max_hw_segments, q->max_phys_segments).
This fixes memory leak with 4096 segments since 16 (likely sg size
with x86) * 4096 sets sglist_len to zero.
This might not happen without sg chaining support.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
coverity spotted this (cid #758). All callers dereference sfp, so we dont
need this check. In addition to this, we dereference it earlier in the
function.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (as857) modifies the SG_GET_RESERVED_SIZE and
SG_SET_RESERVED_SIZE ioctls in the sg driver, capping the values at
the device's request_queue's max_sectors value. This will permit
cdrecord to obtain a legal value for the maximum transfer length,
fixing Bugzilla #7026.
The patch also caps the initial reserved_size value. There's no
reason to have a reserved buffer larger than max_sectors, since it
would be impossible to use the extra space.
The corresponding ioctls in the block layer are modified similarly,
and the initial value for the reserved_size is set as large as
possible. This will effectively make it default to max_sectors.
Note that the actual value is meaningless anyway, since block devices
don't have a reserved buffer.
Finally, the BLKSECTGET ioctl is added to sg, so that there will be a
uniform way for users to determine the actual max_sectors value for
any raw SCSI transport.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For certain LLDs the sg driver can cause on oops
when the transfer length is large and not a
multiple of PAGE_SIZE.
ChangeLog:
- correct the length of the last scatter gather
list element.
- fix some printk()s that have the wrong function
name.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This sg driver patch addresses the problem with larger
page sizes reported by Brian King in this post:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115867718623631&w=2
Some other related matters are also addressed. Some of these
prevent oopses when the SG_SCATTER_SZ or scatter_elem_sz are
set to inappropriate values.
The scatter_elem_sz has been tested up to 4 MB which should
make the largest data transfer with one SCSI command, 32 MB
less one block, achievable with a relatively small number
of elements in the scatter gather list.
ChangeLog:
- add scatter_elem_sz boot time parameter and sysfs module
parameter that is initialized to SG_SCATTER_SZ
- the driver will then adjust scatter_elem_sz to be the
max(given(scatter_elem_sz), PAGE_SIZE)
It will also round it up, if necessary, to be a power
of two
- clean up sg.h header, correct bad urls and some statements
that are no longer valid
- make the def_reserved_size sysfs module attribute writable
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There's a problem where sg is executing a ->nopage operation on a
compound page, it actually calls get_page() on the first page in the
compound rather than the page which is being mapped. The fix is to
select the correct page by indexing into the compound.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Conflicts:
drivers/scsi/nsp32.c
drivers/scsi/pcmcia/nsp_cs.c
Removal of randomness flag conflicts with SA_ -> IRQF_ global
replacement.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
I got a NULL derefrence in cdev_del+1 when called from sg_remove. By looking at
the code of sg_add, sg_alloc and sg_remove (all in drivers/scsi/sg.c) I found
out that sg_add is calling sg_alloc but if it fails afterwards it does not
deallocate the space that was allocated in sg_alloc and the redundant entry has
NULL in cdev. When sg_remove is being called, it tries to perform cdev_del to
this NULL cdev and fails.
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove
duplicates of the macro.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
when the sg driver is unable to setup direct IO, free that scatter
gather list prior to falling back to indirect IO
Further to this thread started by Bryan Holty:
http://marc.theaimsgroup.com/?l=linux-scsi&m=114306885116728&w=2
Here is the reworked patch again. This time it has been
tested with a program provided by Bryan.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Doug found a bug where if scsi_execute_async fails, we are leaking
sg resources. scsi_do_req never failed so we did not have to handle
that case before.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We currently have two implementations of this obsolete ioctl, one in
the block layer and one in the scsi code. Both of them have drawbacks.
This patch kills the scsi layer version after updating the block version
with the missing bits:
- argument checking
- use scatterlist I/O
- set number of retries based on the submitted command
This is the last user of non-S/G I/O except for the gdth driver, so
getting this in ASAP and through the scsi tree would be nie to kill
the non-S/G I/O path. Jens, what do you think about adding a check
for non-S/G I/O in the midlayer?
Thanks to Or Gerlitz for testing this patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sg increments the refcount of constituent pages in its higher order memory
allocations when they are about to be mapped by userspace. This is done so
the subsequent get_page/put_page when doing the mapping and unmapping does not
free the page.
Move over to the preferred way, that is, using compound pages instead. This
fixes a whole class of possible obscure bugs where a get_user_pages on a
constituent page may outlast the user mappings or even the driver.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As devfs has been disabled from the kernel tree for a number of months
now (5 to be exact), here's a patch against 2.6.16-rc1-git1 that removes
support for it from the SCSI subsystem.
The patch also removes the scsi_disk devfs_name field as it's no longer
needed.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Change the core SCSI code to use kzalloc rather than kmalloc+memset
where possible.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove a hack in the sg driver that alters the total buffer
length for SG_IO commands to ensure buffers are not odd byte
lengths. This breaks on the ipr driver since it requires the
request_bufflen to equal the length specified in the cdb.
The block layer SG_IO code does not appear to have this hack.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When the scsi_execute_async interface was added it ended up reducing
the flexibility of userspace to send arbitrary scsi commands through
sg using SG_IO. The SG_IO interface allows userspace to specify the
CDB length. This is now ignored in scsi_execute_async and it is
guessed using the COMMAND_SIZE macro, which is not always correct,
particularly for vendor specific commands. This patch adds a cmd_len
parameter to the scsi_execute_async interface to allow the caller
to specify the length of the CDB.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sg_page_malloc should clear the data buffer, not that extent of mem_map.
This fixes Jesper's sg_page_free "Bad page states"
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert sg to always send scatterlists, and kill scsi_request usage.
TODO:
- move DIO code to common place or make block layers usable for ULDs.
- move buffer allocation code to common place for all ULDs to use. And
make buffer allocation code obey all queue limits so we can find
out about problems before calling scsi_execute_async. Currently, sg.c
could allocate a buffer that is too large, and send the request
to scsi_execute_async. scsi_execute_async will then check it against
all the queue limits and return a failure in this case. It would nicer
to know about the queue limit violation right away.
- move indirect (copy_to/from_user) paths commone place or make block
layers usable for ULDs.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sg's st_map_user_pages is modelled on an earlier version of st's
sgl_map_user_pages, and has the same bug: if get_user_pages got some but
not all of the pages, then those got were released, but the positive res
code returned implied that they were still to be freed.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2.6.15-rc1 made sg's st_unmap_user_pages and st's sgl_unmap_user_pages
BUG on a PageReserved page. But that's wrong: they could be unmapping
the ZERO_PAGE, which is marked PG_reserved; and perhaps others (while
get_user_pages is still permitted on VM_PFNMAP areas - that may change).
More change is needed here: sg claims to dirty even pages written from,
and st claims not to dirty even pages read into; and SetPageDirty is not
adequate for this nowadays. Fixes to those follow in a later patch: for
the moment just fix the 2.6.15 regression.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch removes almost all inclusions of linux/version.h. The 3
#defines are unused in most of the touched files.
A few drivers use the simple KERNEL_VERSION(a,b,c) macro, which is
unfortunatly in linux/version.h.
There are also lots of #ifdef for long obsolete kernels, this was not
touched. In a few places, the linux/version.h include was move to where
the LINUX_VERSION_CODE was used.
quilt vi `find * -type f -name "*.[ch]"|xargs grep -El '(UTS_RELEASE|LINUX_VERSION_CODE|KERNEL_VERSION|linux/version.h)'|grep -Ev '(/(boot|coda|drm)/|~$)'`
search pattern:
/UTS_RELEASE\|LINUX_VERSION_CODE\|KERNEL_VERSION\|linux\/\(utsname\|version\).h
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the drivers/scsi/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in drivers/scsi/.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Kai Makisara <kai.makisara@kolumbus.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PageReserved() calls from core code by tightening VM_RESERVED
handling in mm/ to cover PageReserved functionality.
PageReserved special casing is removed from get_page and put_page.
All setting and clearing of PageReserved is retained, and it is now flagged
in the page_alloc checks to help ensure we don't introduce any refcount
based freeing of Reserved pages.
MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
deprecated. We never completely handled it correctly anyway, and is be
reintroduced in future if required (Hugh has a proof of concept).
Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
be trivially removed.
Last real user of PageReserved is swsusp, which uses PageReserved to
determine whether a struct page points to valid memory or not. This still
needs to be addressed (a generic page_is_ram() should work).
A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
thus mapcounted and count towards shared rss). These writes to the struct
page could cause excessive cacheline bouncing on big systems. There are a
number of ways this could be addressed if it is an issue.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Refcount bug fix for filemap_xip.c
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch uses sg_set_buf/sg_init_one in some places where it was
duplicated.
Signed-off-by: David Hardeman <david@2gen.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This should eliminate (at least in the mid layer) to make numeric
assumptions about any of the enumeration variables. As a side effect,
it will also make all the messages consistent and line us up nicely for
the error logging strategy (if it ever shows itself again).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create(). This patch
fixes up all in-kernel users of the function.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Driver core: pass interface to class intreface methods
Pass interface as argument to add() and remove() class interface
methods. This way a subsystem can implement generic add/remove
handlers and then call interface-specific ones.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A bunch of create_proc_dir_entry() calls creating directories had crept
in since the last sweep; converted to proc_mkdir().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We fix the oops by enforcing the host state model. There have also
been two extra states added: SHOST_CANCEL_RECOVERY and
SHOST_DEL_RECOVERY so we can take the model through host removal while
the recovery thread is active.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Further to the problem discussed in this post:
http://marc.theaimsgroup.com/?l=linux-scsi&m=112540053711489&w=2
It seems that the sg driver does not need to set the VM_IO flag
on pages that it memory maps to the user space since they are
not from the IO space. Ahmed Teirelbar <ahmed.teirelbar@adic.com>
wants the facility and has tested this patch as I have without
adverse effects.
The oops protection is still important. Some users really did
try and use dio transfers from the sg driver to memory mapped
IO space (on a video capture card if my memory serves) during the
lk 2.4 series. I'm not sure how successful it was but that will
now be politely refused in lk 2.6.13+ .
Changelog:
- set the page flags for sg's reserved buffer mmap-ed
to the user space to VM_RESERVED (rather than
VM_RESERVED | VM_IO )
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch adopts the same solution as proposed by Kai M. in
a post titled: "[PATCH] SCSI tape signed/unsigned fix".
The fix is in a function that the sg driver borrowed from
the st driver so its maintenance is a little easier if
the functions remain the same after the fix.
- change nr_pages type from unsigned to signed so errors
from get_user_pages() call are properly handled
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
I know that scsi procfs is legacy code but this is a fix for a memory leak.
While reading through sg.c I realized that the implementation of
/proc/scsi/sg/devices with seq_file is leaking memory due to freeing the
pointer returned by the next() iterator method. Since next() might return
NULL or an error this is wrong. This patch fixes it through using the
seq_files private field for holding the reference to the iterator object.
Here is a small bash script to trigger the leak. Use slabtop to watch
the size-32 usage grow and grow.
#!/bin/sh
while true; do
cat /proc/scsi/sg/devices > /dev/null
done
Signed-off-by: Jan Blunck <j.blunck@tu-harburg.de>
Acked-by: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Migrate the current SCSI host state model to a model like SCSI
device is using.
Signed-off-by: Mike Anderson <andmike@us.ibm.com>
Rejections fixed up and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A problem exists todayin the sg driver that if an SG_IO request is
outstanding to a device when it is removed from the system. The
system may oops if that command completes later in time.
1. sg_remove gets called
2. sg_remove calls sg_finish_req_req on all pending requests
This removes the Sg_request's from the headrp list in the Sg_fd
3. The sleeping SG_IO ioctl is woken. It does nothing and returns.
4. The caller closes the fd, which invokes sg_release
5. sg_release calls sg_remove_sfp. It finds no outstanding commands
since the headrp list is empty, so it calls __sg_remove_sfp,
which frees the sfp.
6. Now when sg_cmd_done gets called, sg uses upper_private_data in
the Scsi_Request, which should point to the srp, which has been
freed, so it points to freed memory.
7. sg then dereferences the srp pointer to get the sfp, and we oops.
The fix is to NULL out the upper_private_data field in this path,
which sg_cmd_done already checks for, which will prevent the oops
from occurring.
cpu 0x1: Vector: 300 (Data Access) at [c00000000fff7aa0]
pc: d0000000002bbea8: .sg_cmd_done+0x70/0x394 [sg]
lr: d000000000073304: .scsi_finish_command+0x10c/0x130 [scsi_mod]
sp: c00000000fff7d20
msr: 8000000000009032
dar: 2f70726f63202f78
dsisr: 40000000
current = 0xc0000000024589b0
paca = 0xc0000000003da800
pid = 7, comm = events/1
[c00000000fff7dc0] d000000000073304 .scsi_finish_command+0x10c/0x130 [scsi_mod]
[c00000000fff7e50] d00000000007317c .scsi_softirq+0x140/0x168 [scsi_mod]
[c00000000fff7ef0] c0000000000634dc .__do_softirq+0xa0/0x17c
[c00000000fff7f90] c000000000018430 .call_do_softirq+0x14/0x24
[c00000000ed472e0] c0000000000142e0 .do_softirq+0x74/0x9c
[c00000000ed47370] c000000000013c9c .do_IRQ+0xe8/0x100
[c00000000ed473f0] c00000000000ae34 HardwareInterrupt_entry+0x8/0x54
c00000000003df28 .smp_call_function+0
x100/0x1d0
[c00000000ed47780] c0000000000ba99c .invalidate_bh_lrus+0x30/0x70
[c00000000ed47810] c0000000000b91a0 .invalidate_bdev+0x18/0x3c
[c00000000ed478a0] c0000000000da7b8 .__invalidate_device+0x70/0x94
[c00000000ed47930] c0000000001d40bc .invalidate_partition+0x4c/0x7c
[c00000000ed479c0] c00000000010a944 .del_gendisk+0x48/0x15c
[c00000000ed47a50] d00000000003d55c .sd_remove+0x34/0xe4 [sd_mod]
[c00000000ed47ae0] c0000000001c5d30 .device_release_driver+0x90/0xb4
[c00000000ed47b70] c0000000001c6130 .bus_remove_device+0xb0/0x12c
[c00000000ed47c00] c0000000001c4378 .device_del+0x120/0x198
[c00000000ed47ca0] d00000000007dcdc .scsi_remove_device+0xb4/0x194 [scsi_mod]
[c00000000ed47d30] d0000000000a5864 .ipr_worker_thread+0x1d4/0x27c [ipr]
[c00000000ed47dd0] c0000000000734c4 .worker_thread+0x238/0x2f4
[c00000000ed47ee0] c0000000000796c0 .kthread+0xcc/0x11c
[c00000000ed47f90] c000000000018ad0 .kernel_thread+0x4c/0x6c
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
these have been wrappers for the generic dma direction bits since 2.5.x.
This patch converts the few remaining drivers and removes the macros.
Arjan noticed there's some hunk in here that shouldn't. Updated patch
below:
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We have the scsi_print_* functions in the proper namespace for a long
time now and there weren't a lot users left.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The attachment combines the most recent patch from
Yum Rayan <yum.rayan@gmail.com> (to reduce sg stack
usage), Adrian Bunk <bunk@stusta.de> (to fix check
after use) and me (fix elapsed time calculation
(duration) on ia64 machines).
I have modified the patch from Yum Rayan so kmalloc()
in sg_read() is only called for the (rare) code paths
that need them.
Changelog:
- reduce stack usage in sg_ioctl() and sg_read()
- fix check after use in sg_mmap()
- hold duration internally in milliseconds and
check current time later than held time
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!