Commit Graph

239 Commits

Author SHA1 Message Date
Jens Axboe
206dc69b31 [BLOCK] cfq-iosched: seek and async performance fixes
Detect whether a given process is seeky and if so disable (mostly) the
idle window if it is. We still allow just a little idle time, just enough
to allow that process to submit a new request. That is needed to maintain
fairness across priority groups.

In some cases, we could setup several async queues. This is not optimal
from a performance POV, since we want all async io in one queue to perform
good sorting on it. It also impacted sync queues, as async io got too much
slice time.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 13:03:44 +02:00
Jens Axboe
7143dd4b01 [PATCH] ll_rw_blk: fix 80-col offender in put_io_context()
This makes akpm more happy.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 09:00:28 +02:00
Andreas Mohr
e8a99053ea [PATCH] cfq-iosched: small cfq_choose_req() optimization
this is a small optimization to cfq_choose_req() in the CFQ I/O scheduler
(this function is a semi-often invoked candidate in an oprofile log):
by using a bit mask variable, we can use a simple switch() to check
the various cases instead of having to query two variables for each check.
Benefit: 251 vs. 285 bytes footprint of cfq_choose_req().
Also, common case 0 (no request wrapping) is now checked first in code.

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 08:59:49 +02:00
Jens Axboe
e2d74ac066 [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
On setups with many disks, we spend a considerable amount of time
looking up the process-disk mapping on each queue of io. Testing with
a NULL based block driver, this costs 40-50% reduction in throughput
for 1000 disks.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 08:59:01 +02:00
Linus Torvalds
4fa639123d Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] Don't make debugfs depend on DEBUG_KERNEL
  [PATCH] Fix blktrace compile with sysfs not defined
  [PATCH] unused label in drivers/block/cciss.
  [BLOCK] increase size of disk stat counters
  [PATCH] blk_execute_rq_nowait-speedup
  [PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure
  [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
  [PATCH] kzalloc() conversion in drivers/block
  [PATCH] update max_sectors documentation
2006-03-27 08:46:49 -08:00
NeilBrown
89e5c8b5b8 [PATCH] md: Make sure QUEUE_FLAG_CLUSTER is set properly for md.
This flag should be set for a virtual device iff it is set for all underlying
devices.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:00 -08:00
Jens Axboe
09540e691d [PATCH] Fix blktrace compile with sysfs not defined
debugfs depends on sysfs, so make blktrace kconfig option depend
on that.

Reported by Adrian Bunk.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:03 +02:00
Ben Woodard
837c787877 [BLOCK] increase size of disk stat counters
The kernel's representation of the disk statistics uses the type unsigned
which is 32b on both 32b and 64b platforms.  Unfortunately, most system
tools that work with these numbers that are exported in /proc/diskstats
including iostat read these numbers into unsigned longs.  This works fine
on 32b platforms and when the number of IO transactions are small on 64b
platforms.  However, when the numbers wrap on 64b platforms & you read the
numbers into unsigned longs, and compare the numbers to previous readings,
then you get an unsigned representation of a negative number.  This looks
like a very large 64b number & gives you bizarre readouts in iostat:

ilc4: Device:    rrqm/s wrqm/s r/s    w/s  rsec/s  wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz   await  svctm  %util
ilc4: sda        5.50   0.00   143.96 0.00 307496983987862656.00 0.00 153748491993931328.00     0.00 2136028725038430.00     7.94   55.12    5.59  80.42

Though fixing iostat in user space is possible, and a quick survey
indicates that several other similar tools also use unsigned longs when
processing /proc/diskstats.  Therefore, it seems like a better approach
would be to extend the length of the disk_stats structure on 64b
architectures to 64b.  The following patch does that.  It should not affect
the operation on 32b platforms.

Signed-off-by: Ben Woodard <woodard@redhat.com>
Cc: Rick Lindsley <ricklind@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Andrew Morton
4c5d0bbde9 [PATCH] blk_execute_rq_nowait-speedup
Both elv_add_request() and generic_unplug_device() grab the queue lock
and disable interrupts, do that locally and use the __ variants.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Jens Axboe
f68110fc28 [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Takashi Sato
a0f62ac636 [PATCH] 2TB files: add blkcnt_t
Add blkcnt_t as the type of inode.i_blocks.  This enables you to make the size
of blkcnt_t either 4 bytes or 8 bytes on 32 bits architecture with CONFIG_LSF.

- CONFIG_LSF
  Add new configuration parameter.
- blkcnt_t
  On h8300, i386, mips, powerpc, s390 and sh that define sector_t,
  blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is
  defined as unsigned long.
  On other architectures, it is defined as unsigned long.
- inode.i_blocks
  Change the type from sector_t to blkcnt_t.

Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Matthew Dobson
93d2341c75 [PATCH] mempool: use mempool_create_slab_pool()
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Eric Sesterhenn
ce52449742 BUG_ON() Conversion in block/elevator.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:43:26 +01:00
Jens Axboe
2056a782f8 [PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-23 20:00:26 +01:00
Arjan van de Ven
c039e3134a [PATCH] sem2mutex: blockdev #2
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:11 -08:00
Jes Sorensen
58383af629 [PATCH] kobj_map semaphore to mutex conversion
Convert the kobj_map code to use a mutex instead of a semaphore.  It
converts the single two users as well, genhd.c and char_dev.c.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:58 -08:00
Al Viro
e572ec7e4e [PATCH] fix rmmod problems with elevator attributes, clean them up 2006-03-18 22:27:18 -05:00
Al Viro
3d1ab40f4c [PATCH] elevator_t lifetime rules and sysfs fixes 2006-03-18 18:35:43 -05:00
Al Viro
1cc9be68eb [PATCH] noise removal: cfq-iosched.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:35:08 -05:00
Al Viro
a90d742e4c [PATCH] don't bother with refcounting for cfq_data
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:35:05 -05:00
Al Viro
483f4afc42 [PATCH] fix sysfs interaction and lifetime rules handling for queues 2006-03-18 18:34:37 -05:00
Al Viro
6f325a1344 [PATCH] fix cfq_get_queue()/ioprio_set(2) races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:17 -05:00
Al Viro
334e94de9b [PATCH] deal with rmmod/put_io_context() races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:15 -05:00
Al Viro
e17a9489b4 [PATCH] stop elv_unregister() from rogering other iosched's data, fix locking
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:12 -05:00
Al Viro
25975f863b [PATCH] stop cfq from pinning queue down
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:09 -05:00
Al Viro
d9ff418793 [PATCH] make cfq_exit_queue() prune the cfq_io_context for that queue
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:07 -05:00
Al Viro
a6a0763a60 [PATCH] fix the exclusion for ioprio_set()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:04 -05:00
Al Viro
12a0573215 [PATCH] keep sync and async cfq_queue separate
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:02 -05:00
Al Viro
478a82b0ed [PATCH] switch to use of ->key to get cfq_data by cfq_io_context
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:59 -05:00
Al Viro
7670876d2d [PATCH] stop leaking cfq_data in cfq_set_request()
We don't need to pin ->key down; ->cfqq->cfqd will do that for us.
Incidentally, that stops the leak we had - that reference was never
dropped.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:56 -05:00
Al Viro
b0a6916bcc [PATCH] fix cfq hash lookups
If somebody does a hash lookup for cfq_queue while ioprio of an async queue
is elevated, they shouldn't end up stuck with lowered ioprio when we go back.
Fix is to use ->org_ioprio{,class} in hash lookups.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:54 -05:00
Al Viro
c981ff9f89 [PATCH] fix locking in queue_requests_store()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:51 -05:00
Al Viro
8669aafdb5 [PATCH] fix double-free in blk_init_queue_node()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:49 -05:00
Andi Kleen
5ee1af9f51 [PATCH] block: disable block layer bouncing for most memory on 64bit systems
The low level PCI DMA mapping functions should handle it in most cases.

This should fix problems with depleting the DMA zone early. The old
code used precious GFP_DMA memory in many cases where it was not needed.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 18:10:31 -08:00
Jens Axboe
7b14e3b52f [PATCH] cfq-iosched: slice expiry fixups
During testing of SLES10, we encountered a hang in the CFQ io scheduler.
Turns out the deferred slice expiry logic is buggy, so remove that for
now.  We could be left with an idle queue that would never wake up.  So
kill that logic, always expire immediately.  Also fix a potential timer
race condition.

Patch looks bigger than it is, because it moves a function.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-28 00:38:02 -08:00
Linus Torvalds
b7ed1de0ae Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block 2006-02-08 07:58:18 -08:00
Tejun Heo
30e9656cc3 [PATCH] block: implement elv_insert and use it (fix ordcolor flipping bug)
q->ordcolor must only be flipped on initial queueing of a hardbarrier
request.

Constructing ordered sequence and requeueing used to pass through
__elv_add_request() which flips q->ordcolor when it sees a barrier
request.

This patch separates out elv_insert() from __elv_add_request() and uses
elv_insert() when constructing ordered sequence and requeueing.
elv_insert() inserts the given request at the specified position and
does nothing else.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-08 07:52:58 -08:00
Jens Axboe
01840f9c9d [PATCH] blk: Fix SG_IO ioctl failure retry looping
When issuing an SG_IO ioctl through sd that resulted in an unrecoverable
error, a nearly infinite retry loop was discovered. This is due to the
fact that the block layer SG_IO code is not setting up rq->retries. This
patch also fixes up the sg_scsi_ioctl path.

Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-02-08 10:07:13 +01:00
Tejun Heo
238e7db935 [PATCH] block: request_queue->ordcolor must not be flipped on SOFTBARRIER
q->ordcolor must not be flipped on SOFTBARRIER.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:51 -08:00
Jens Axboe
9a7a67af8b [PATCH] fix ordering on requeued request drainage
Previously, if a fs request which was being drained failed and got
requeued, blk_do_ordered() didn't allow it to be reissued, which causes
queue stall.  This patch makes blk_do_ordered() use the sequence of each
request to determine whether a request can be issued or not.  This fixes
the bug and simplifies code.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:51 -08:00
Eric Dumazet
88a2a4ac6b [PATCH] percpu data: only iterate over possible CPUs
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.

As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().

(The above only applies to users of asm-generic/percpu.h.  powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:51 -08:00
Jun'ichi "Nick" Nomura
3eaf840e0b [PATCH] device-mapper disk statistics: timing
Record I/O timing statistics

The start time is added to struct dm_io, an existing structure allocated
privately internally within dm and attached to each incoming bio.

We export disk_round_stats() from block/ll_rw_blk.c instead of creating a
private clone.

Signed-off-by: Jun'ichi "Nick" Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:11 -08:00
Jens Axboe
fddfdeafa8 [BLOCK] A few kerneldoc fixups
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-31 15:24:34 +01:00
Tetsuo Takata
60481b12b8 [BLOCK] ll_rw_blk: fix setting of ->ordered on init
This makes XFS barrier mounts succeed on my SCSI system.

Signed-off-by: Tetsuo Takata <takatatt@intellilink.co.jp>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-24 10:34:36 +01:00
Nate Diller
248d5ca5ed [BLOCK] elevator: allow default scheduler to potentially be modular
Jens has decided that allowing the default scheduler to be a module is
a bug, and should not be allowed under kconfig.  However, I find that
scenario useful for debugging, and wish for the kernel to be able to
handle this situation without OOPSing, if I enable such an option in
the .config directly.  This patch dynamically checks for the presence
of the compiled-in default, and falls back to no-op, emitting a
suitable error message, when the default is not available

Tested for a range of boot options on 2.6.16-rc1-mm2.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-24 10:09:14 +01:00
Nate Diller
5f00397644 [BLOCK] elevator: default choice selection
My previous default iosched patch did a poor job dealing with the
'elevator=' boot-time option.  The old behavior falls back to the
compiled-in default if the requested one is not registered at boot
time.  This patch dynamically evaluates which default
to use, and emits a suitable error message when the requested scheduler
is not available.  It also does the 'as' -> 'anticipatory' conversion
before elevator registration, which along with a modified registration
function, allows it to correctly indicate which default scheduler is
in use.

Tested for a range of boot options on 2.6.16-rc1-mm2.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-24 10:07:58 +01:00
Jens Axboe
53e86061b5 [BLOCK] ll_rw_blk: use preempt-disabling disk_stat_add() in completion
It can legally be called with interrupts/preemption enabled.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-24 10:06:19 +01:00
Jens Axboe
2cb2e147a6 [BLOCK] ll_rw_blk: make max_sectors and max_hw_sectors unsigned ints
IDE lba48 can support full 64k request size, which overflows the
max_hw_sectors variable.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-24 10:06:19 +01:00
Jens Axboe
b7bfcf7cbd [BLOCK] elevator: if specified scheduler is not found, fall back to default
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-16 09:48:58 +01:00
Chuck Ebbert
752a3b7963 [BLOCK] elevator: Make elevator=as work again for anticipatory
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-16 09:47:37 +01:00
Neil Horman
7170be5f58 [PATCH] convert /proc/devices to use seq_file interface
A Christoph suggested that the /proc/devices file be converted to use the
seq_file interface.  This patch does that.

I've obxerved one or two installation that had sufficiently large sans that
they overran the 4k limit on /proc/devices.

Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 18:25:19 -08:00
Tejun Heo
1bc691d357 [PATCH] fix queue stalling while barrier sequencing
If ordered tag isn't supported, request ordering for barrier
sequencing is performed by queue draining, which basically hangs the
request queue until elv_completed_request() reports completion of all
previous fs requests.

The condition check in elv_completed_request() was only performed for
fs requests.  If a special request is queued between the last
to-be-drained request and the barrier sequence, draining is never
completed and the queue is stalled forever.

This patch moves the end-of-draining condition check such that it's
performed for all requests.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:05:39 -08:00
Randy.Dunlap
c59ede7b78 [PATCH] move capable() to capability.h
- Move capable() from sched.h to capability.h;

- Use <linux/capability.h> where capable() is used
	(in include/, block/, ipc/, kernel/, a few drivers/,
	mm/, security/, & sound/;
	many more drivers/ to go)

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:13 -08:00
Linus Torvalds
e2688f00dc Merge branch 'blk-softirq' of git://brick.kernel.dk/data/git/linux-2.6-block
Manual merge for trivial #include changes
2006-01-09 09:26:40 -08:00
Jens Axboe
ff856bad67 [BLOCK] ll_rw_blk: Enable out-of-order request completions through softirq
Request completion can be a quite heavy process, since it needs to
iterate through the entire request and complete the bio's it holds.
This patch adds blk_complete_request() which moves this processing
into a dedicated block softirq.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09 16:02:34 +01:00
Jens Axboe
356cebea11 [BLOCK] Kill blk_attempt_remerge()
It's a broken interface, it's done way too late. And apparently it triggers
slab problems in recent kernels as well (most likely after the generic dispatch
code was merged). So kill it, ide-cd is the only user of it.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09 15:30:20 +01:00
Jens Axboe
5a57be8d10 [BLOCK] scsi_ioctl: file can be NULL from ioctl_by_bdev()
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09 14:52:21 +01:00
Coywolf Qi Hunt
769db45b73 make elv_try_merge() static, kill the dead declaration of
elv_try_last_merge().

Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09 14:44:15 +01:00
Nicolas Kaiser
1abee6d2d1 [BLOCK][TRIVIAL] ll_rw_blk: header included twice
linux/blkdev.h included twice

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09 14:44:15 +01:00
Christoph Hellwig
a885c8c431 [PATCH] Add block_device_operations.getgeo block device method
HDIO_GETGEO is implemented in most block drivers, and all of them have to
duplicate the code to copy the structure to userspace, as well as getting
the start sector.  This patch moves that to common code [1] and adds a
->getgeo method to fill out the raw kernel hd_geometry structure.  For many
drivers this means ->ioctl can go away now.

[1] the s390 block drivers are odd in this respect.  xpram sets ->start
    to 4 always which seems more than odd, and the dasd driver shifts
    the start offset around, probably because of it's non-standard
    sector size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@suse.de>
Cc: <mike.miller@hp.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo Giarrusso <blaisorblade@yahoo.it>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:54 -08:00
Linus Torvalds
d99cf9d679 Merge branch 'post-2.6.15' of git://brick.kernel.dk/data/git/linux-2.6-block
Manual fixup for merge with Jens' "Suspend support for libata", commit
ID 9b84754866.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 09:01:25 -08:00
Martin Schwidefsky
347a8dc3b8 [PATCH] s390: cleanup Kconfig
Sanitize some s390 Kconfig options.  We have ARCH_S390, ARCH_S390X,
ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT.  Replace these 6 options by
S390, 64BIT and COMPAT.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:53 -08:00
Tejun Heo
797e7dbbee [BLOCK] reimplement handling of barrier request
Reimplement handling of barrier requests.

* Flexible handling to deal with various capabilities of
  target devices.
* Retry support for falling back.
* Tagged queues which don't support ordered tag can do ordered.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:51:03 +01:00
Tejun Heo
52d9e67536 [BLOCK] ll_rw_blk: separate out bio init part from __make_request
Separate out bio initialization part from __make_request.  It
will be used by the following blk_ordered_reimpl.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:49:58 +01:00
Tejun Heo
8ffdc6550c [BLOCK] add @uptodate to end_that_request_last() and @error to rq_end_io_fn()
add @uptodate argument to end_that_request_last() and @error
to rq_end_io_fn().  there's no generic way to pass error code
to request completion function, making generic error handling
of non-fs request difficult (rq->errors is driver-specific and
each driver uses it differently).  this patch adds @uptodate
to end_that_request_last() and @error to rq_end_io_fn().

for fs requests, this doesn't really matter, so just using the
same uptodate argument used in the last call to
end_that_request_first() should suffice.  imho, this can also
help the generic command-carrying request jens is working on.

Signed-off-by: tejun heo <htejun@gmail.com>
Signed-Off-By: Jens Axboe <axboe@suse.de>
2006-01-06 09:49:03 +01:00
Arjan van de Ven
64100099ed [BLOCK] mark some block/ variables cons
the patch below marks various read-only variables in block/* as const,
so that gcc can optimize the use of them; eg gcc will replace the use by
the value directly now and will even remove the memory usage of these.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:46:02 +01:00
Jens Axboe
88ee5ef157 [BLOCK] ll_rw_blk: fastpath get_request()
Originally from: Nick Piggin <nickpiggin@yahoo.com.au>

Move current_io_context out of the get_request fastpth.  Also try to
streamline a few other things in this area.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:39:04 +01:00
Tejun Heo
ef9be1d336 [BLOCK] as-iosched: update alias handling
Unlike other ioscheds, as-iosched handles alias by chaing them using
rq->queuelist.  As aliased requests are very rare in the first place,
this complicates merge/dispatch handling without meaningful
performance improvement.  This patch updates as-iosched to dump
aliased requests into dispatch queue as other ioscheds do.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:39:03 +01:00
Linus Torvalds
db9edfd7e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
Trivial manual merge fixup for usb_find_interface clashes.
2006-01-04 18:44:12 -08:00
Linus Torvalds
f61ea1b0c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2006-01-04 16:30:12 -08:00
Kay Sievers
312c004d36 [PATCH] driver core: replace "hotplug" by "uevent"
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:08 -08:00
Ben Collins
f98d2dfd02 [PATCH] block: Cleanup CDROMEJECT ioctl
This is just a basic cleanup. No change in functionality.

Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-19 16:47:50 -08:00
Mike Christie
defd94b754 [SCSI] seperate max_sectors from max_hw_sectors
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if a LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to a artificial low
value to overcome memory and feedback issues.

Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-15 15:11:40 -08:00
Mike Christie
6e39b69e7e [SCSI] export blk layer functions needed for blk_execute_rq_nowait
To send async requests we need these two functions exported.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-14 19:00:50 -08:00
Jens Axboe
8ad9ebb391 [PATCH] as-iosched: remove state assertion in as_add_request()
Kill the arq->state poison statement in as_add_request(), it can trigger
for perfectly valid code that just reuses a request after io completion
instead of freeing it and allocating a new one. We probably should
introduce a blk_init_request() to start from scratch, but for now just
kill it as we will be removing the as specific poisoning soon.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-21 11:04:52 -08:00
Coywolf Qi Hunt
eb97b73d75 [BLOCK] new block/ directory comment tidy
Some leftover comments referring to drivers/block that are now block/.
They don't add any information we don't already have, so kill them.

Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-18 21:59:31 +01:00
Tejun Heo
3beb207712 [BLOCK] elevator: elv_latter/former_request update
With generic dispatch queue update, implicit former/latter request
handling using rq->queuelist.prev/next doesn't work as expected
anymore.  Also, the only iosched dependent on this feature was
noop-iosched and it has been reimplemented to have its own
latter/former methods.  This patch removes implicit former/latter
handling.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:57:05 +01:00
Tejun Heo
5a7c47eefb [BLOCK] noop-iosched: reimplementation of request dispatching
The original implementation directly used dispatch queue.  As new
generic dispatch queue imposes stricter rules over ioscheds and
dispatch queue usage, this direct use becomes somewhat problematic.
This patch reimplements noop-iosched such that it complies to generic
iosched model better.  Request merging with q->last_merge and
rq->queuelist.prev/next work again now.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de
2005-11-12 10:56:52 +01:00
Tejun Heo
b740d98f56 [BLOCK] cfq-iosched: fix slice_left calculation
When cfq slice expires, remainder of slice is calculated and stored in
cfqq->slice_left.  Current code calculates the opposite of remainder -
how many jiffies the cfqq has used past slice end.  This patch fixes
the bug.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:56:36 +01:00
Tejun Heo
be56123568 [BLOCK] fix string handling in elv_iosched_store
elv_iosched_store doesn't terminate string passed from userspace if
it's too long.  Also, if the written length is zero (probably not
possible), it accesses elevator_name[-1].  This patch fixes both bugs.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:56:21 +01:00
Tejun Heo
15853af9f0 [BLOCK] Implement elv_drain_elevator for improved switch error detection
This patch adds request_queue->nr_sorted which keeps the number of
requests in the iosched and implement elv_drain_elevator which
performs forced dispatching.  elv_drain_elevator checks whether
iosched actually dispatches all requests it has and prints error
message if it doesn't.  As buggy forced dispatching can result in
wrong barrier operations, I think this extra check is worthwhile.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:56:06 +01:00
Tejun Heo
1b5ed5e1f1 [BLOCK] cfq-iosched: cfq forced dispatching fix
cfq forced dispatching might not return all requests on the queue.
This bug can hang elevator switchinig and corrupt request ordering
during flush sequence.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:55:51 +01:00
Tejun Heo
407df2aa29 [BLOCK] elevator: run queue in elevator_switch
elevator_dispatch needs to run queue after forced dispatching;
otherwise, the queue might stall.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:55:37 +01:00
Jens Axboe
47a004103d [BLOCK] Document the READ/WRITE splitup of the disk stats
Use the symbolic name where appropriate and add a comment to the
disk_stats structure.

Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:55:21 +01:00
Zachary Amsden
cff3ba2204 [BLOCK] elevator init fixes #2
In addition to the first patch, which is probably goodness, I found the
cause of my panic - applying this patch fixes it and now I am booting.
If the chosen_elevator[] is not found, fall back to noop.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:55:05 +01:00
Zachary Amsden
b8ea2cb512 [BLOCK] elevator init fixes
I got a panic in the elevator code, backtrace :

Unable to handle kernel NULL pointer dereference at virtual address 00000060
..
EIP is at elevator_put+0x0/0x30 (null elevator_type passed)
..
elevator_init+0x38
blk_init_queu_node+0xc9
floppy_init+0xdb
do_initcalls+0x23
init+0x10a
init+0x0

Clearly if the kmalloc here fails, e->elevator_type is not yet set; this
appears to be the correct fix, but I think I probably hit the second case
due to a race condition.  Someone more familiar with the elevator code
should look at this more closely until I can determine if I can reproduce.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-12 10:54:48 +01:00
Linus Torvalds
333c47c847 Merge branch 'block-dir' of git://brick.kernel.dk/data/git/linux-2.6-block 2005-11-07 08:32:39 -08:00
Jens Axboe
c6ea2ba7b8 [BLOCK] iosched: fix setting of default io scheduler
With the recent reorg of the io scheduler selection, it unfortunately
became possible to select an io scheduler to be the default even if it
wasn't builtin. Fix this by requiring the default scheduler to be
builtin.

Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-04 08:44:58 +01:00
Jens Axboe
3a65dfe8c0 [BLOCK] Move all core block layer code to new block/ directory
drivers/block/ is right now a mix of core and driver parts. Lets move
the core parts to a new top level directory. Al will move the fs/
related block parts to block/ next.

Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-04 08:43:35 +01:00