linux

Author	SHA1	Message	Date
Ingo Molnar	91d75e209b	Merge branch 'x86/core' into core/percpu	2009-03-04 02:29:19 +01:00
Jens Axboe	1e42807918	block: reduce stack footprint of blk_recount_segments() blk_recalc_rq_segments() requires a request structure passed in, which we don't have from blk_recount_segments(). So the latter allocates one on the stack, using > 400 bytes of stack for that. This can cause us to spill over one page of stack from ext4 at least: 0) 4560 400 blk_recount_segments+0x43/0x62 1) 4160 32 bio_phys_segments+0x1c/0x24 2) 4128 32 blk_rq_bio_prep+0x2a/0xf9 3) 4096 32 init_request_from_bio+0xf9/0xfe 4) 4064 112 __make_request+0x33c/0x3f6 5) 3952 144 generic_make_request+0x2d1/0x321 6) 3808 64 submit_bio+0xb9/0xc3 7) 3744 48 submit_bh+0xea/0x10e 8) 3696 368 ext4_mb_init_cache+0x257/0xa6a [ext4] 9) 3328 288 ext4_mb_regular_allocator+0x421/0xcd9 [ext4] 10) 3040 160 ext4_mb_new_blocks+0x211/0x4b4 [ext4] 11) 2880 336 ext4_ext_get_blocks+0xb61/0xd45 [ext4] 12) 2544 96 ext4_get_blocks_wrap+0xf2/0x200 [ext4] 13) 2448 80 ext4_da_get_block_write+0x6e/0x16b [ext4] 14) 2368 352 mpage_da_map_blocks+0x7e/0x4b3 [ext4] 15) 2016 352 ext4_da_writepages+0x2ce/0x43c [ext4] 16) 1664 32 do_writepages+0x2d/0x3c 17) 1632 144 __writeback_single_inode+0x162/0x2cd 18) 1488 96 generic_sync_sb_inodes+0x1e3/0x32b 19) 1392 16 sync_sb_inodes+0xe/0x10 20) 1376 48 writeback_inodes+0x69/0xb3 21) 1328 208 balance_dirty_pages_ratelimited_nr+0x187/0x2f9 22) 1120 224 generic_file_buffered_write+0x1d4/0x2c4 23) 896 176 __generic_file_aio_write_nolock+0x35f/0x393 24) 720 80 generic_file_aio_write+0x6c/0xc8 25) 640 80 ext4_file_write+0xa9/0x137 [ext4] 26) 560 320 do_sync_write+0xf0/0x137 27) 240 48 vfs_write+0xb3/0x13c 28) 192 64 sys_write+0x4c/0x74 29) 128 128 system_call_fastpath+0x16/0x1b Split the segment counting out into a __blk_recalc_rq_segments() helper to avoid allocating an onstack request just for checking the physical segment count. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-26 10:45:48 +01:00
Márton Németh	9e8c0bccdc	block: add documentation for register_blkdev() Add documentation for register_blkdev() function and for the parameters. Signed-off-by: Márton Németh <nm127@freemail.hu> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-26 10:45:48 +01:00
Ingo Molnar	0edcf8d692	Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu Conflicts: arch/x86/include/asm/pgtable.h	2009-02-24 21:52:45 +01:00
Rusty Russell	313e458f81	alloc_percpu: add align argument to __alloc_percpu. This prepares for a real __alloc_percpu, by adding an alignment argument. Only one place uses __alloc_percpu directly, and that's for a string. tj: af_inet also uses __alloc_percpu(), update it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk>	2009-02-20 16:29:08 +09:00
Hannes Reinecke	be987fdb55	block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list blk_abort_queue() iterates the timeout list and aborts each request on the list, but if the driver error handling readds a request to the timeout list during this processing, we could be looping forever. Fix this by splicing current entries to a local list and run over that list instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:34:16 +01:00
Neil Brown	41b8c853a4	block: fix booting from partitioned md array Hi Tejun, it looks like your commit: block: don't depend on consecutive minor space `f331c0296f` broke a particular case for booting from partitioned md/raid devices. That is the second time this has been broken recently. The previous time was fixed by block: do_mounts - accept root=<non-existant partition> `30f2f0eb4b` Because the data isn't available when an md device is first created (we add disks and set it up after creation), the initial partition scan finds nothing. It is not until the device is opened that another partition scan happens and finds something. So at the point where the kernel parameter "root=/dev/md_d0p1" is being parsed, md_d0 exists, but md_d0p1 does not. However if we let blk_lookup_devt return the correct device number even though the device doesn't exist, then the attempt to mount it will successfully find the partition. I have tried in the past to find a way to get the partition table to be read as soon as the array is assembled but that proved impossible (at the time). I don't remember the details, and could possibly revisit it. However it would be really nice if blk_lookup_devt could be adjusted to again accept non existant partitions. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:33:59 +01:00
Jens Axboe	93dbb39350	block: fix bad definition of BIO_RW_SYNC We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before `213d9417fe`. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:32:00 +01:00
Boaz Harrosh	c1c201200a	bsg: Fix sense buffer bug in SG_IO When submitting requests via SG_IO, which does a sync io, a bsg_command is not allocated. So an in-Kernel sense_buffer was not set. However when calling blk_execute_rq() with no sense buffer one is provided from the stack. Now bsg at blk_complete_sgv4_hdr_rq() would check if rq->sense_len and a sense was requested by sg_io_v4 the rq->sense was copy_user() back, but by now it is already mangled stack memory. I have fixed that by forcing a sense_buffer when calling bsg_map_hdr(). The bsg_command->sense is provided in the write/read path like before, and on-the-stack buffer is provided when doing SG_IO. I have also fixed a dprintk message to print rq->errors in hex because of the scsi bit-field use of this member. For other block devices it does not matter anyway. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:32:00 +01:00
Jens Axboe	fb8ec18c31	block: fix oops in blk_queue_io_stat() Some initial probe requests don't have disk->queue mapped yet, so we can't rely on a non-NULL queue in blk_queue_io_stat(). Wrap it in blk_do_io_stat(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-02 08:42:32 +01:00
Divyesh Shah	3a9a3f6cc5	cfq-iosched: Allow RT requests to pre-empt ongoing BE timeslice This patch adds the ability to pre-empt an ongoing BE timeslice when a RT request is waiting for the current timeslice to complete. This reduces the wait time to disk for RT requests from an upper bound of 4 (current value of cfq_quantum) to 1 disk request. Applied Jens' suggeested changes to avoid the rb lookup and use !cfq_class_rt() and retested. Latency(secs) for the RT task when doing sequential reads from 10G file. \| only RT \| RT + BE \| RT + BE + this patch small (512 byte) reads \| 143 \| 163 \| 145 large (1Mb) reads \| 142 \| 158 \| 146 Signed-off-by: Divyesh Shah <dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:47:33 +01:00
Jens Axboe	bc58ba9468	block: add sysfs file for controlling io stats accounting This allows us to turn off disk stat accounting completely, for the cases where the 0.5-1% reduction in system time is important. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:38 +01:00
Jens Axboe	cec0707e40	block: silently error an unsupported barrier bio This fixes a "regression" from 2.6.28, where the barrier probes that file systems may do would trigger additional end request warnings in dmesg. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:37 +01:00
Theodore Ts'o	dbdac9b71d	block: Fix documentation for blkdev_issue_flush() Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:37 +01:00
Jens Axboe	213d9417fe	block: seperate bio/request unplug and sync bits Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:37 +01:00
Bartlomiej Zolnierkiewicz	1308835fff	block: export SSD/non-rotational queue flag through sysfs For some devices (i.e. CFA ATA) we can't reliably detect whether the device is of rotational or non-rotational type so we need to leave the final decision about this setting to the user-space. As a bonus do a minor CodingStyle fixup in queue_nomerges_store(). Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:37 +01:00
Jens Axboe	f48fc4d32e	block: get rid of the manual directory counting in blktrace It can result in a stuck blktrace system, if --kill is used. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:36 +01:00
Martin K. Petersen	322316385d	block: Allow empty integrity profile Allow a block device to allocate and register an integrity profile without providing a template. This allows DM to preallocate a profile to avoid deadlocks during table reconfiguration. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-01-30 12:34:36 +01:00
Linus Torvalds	2150edc6c5	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits) jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs ext4: Remove "extents" mount option block: Add Kconfig help which notes that ext4 needs CONFIG_LBD ext4: Make printk's consistently prefixed with "EXT4-fs: " ext4: Add sanity checks for the superblock before mounting the filesystem ext4: Add mount option to set kjournald's I/O priority jbd2: Submit writes to the journal using WRITE_SYNC jbd2: Add pid and journal device name to the "kjournald2 starting" message ext4: Add markers for better debuggability ext4: Remove code to create the journal inode ext4: provide function to release metadata pages under memory pressure ext3: provide function to release metadata pages under memory pressure add releasepage hooks to block devices which can be used by file systems ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc ext4: Init the complete page while building buddy cache ext4: Don't allow new groups to be added during block allocation ext4: mark the blocks/inode bitmap beyond end of group as used ext4: Use new buffer_head flag to check uninit group bitmaps initialization ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() ext4: code cleanup ...	2009-01-08 17:14:59 -08:00
Linus Torvalds	cd764695b6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (45 commits) [SCSI] qla2xxx: Update version number to 8.03.00-k1. [SCSI] qla2xxx: Add ISP81XX support. [SCSI] qla2xxx: Use proper request/response queues with MQ instantiations. [SCSI] qla2xxx: Correct MQ-chain information retrieval during a firmware dump. [SCSI] qla2xxx: Collapse EFT/FCE copy procedures during a firmware dump. [SCSI] qla2xxx: Don't pollute kernel logs with ZIO/RIO status messages. [SCSI] qla2xxx: Don't fallback to interrupt-polling during re-initialization with MSI-X enabled. [SCSI] qla2xxx: Remove support for reading/writing HW-event-log. [SCSI] cxgb3i: add missing include [SCSI] scsi_lib: fix DID_RESET status problems [SCSI] fc transport: restore missing dev_loss_tmo callback to LLDD [SCSI] aha152x_cs: Fix regression that keeps driver from using shared interrupts [SCSI] sd: Correctly handle 6-byte commands with DIX [SCSI] sd: DIF: Fix tagging on platforms with signed char [SCSI] sd: DIF: Show app tag on error [SCSI] Fix error handling for DIF/DIX [SCSI] scsi_lib: don't decrement busy counters when inserting commands [SCSI] libsas: fix test for negative unsigned and typos [SCSI] a2091, gvp11: kill warn_unused_result warnings [SCSI] fusion: Move a dereference below a NULL test ... Fixed up trivial conflict due to moving the async part of sd_probe around in the async probes vs using dev_set_name() in naming.	2009-01-08 16:27:31 -08:00
Theodore Ts'o	4d783b093c	block: Add Kconfig help which notes that ext4 needs CONFIG_LBD Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jens Axboe <jens.axboe@oracle.com>	2009-01-06 15:16:33 -05:00
Kay Sievers	3ada8b7e98	block: struct device - replace bus_id with dev_name(), dev_set_name() Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-01-06 10:44:43 -08:00
FUJITA Tomonori	97ae77a1cd	[SCSI] block: make blk_rq_map_user take a NULL user-space buffer for WRITE The commit `818827669d` (block: make blk_rq_map_user take a NULL user-space buffer) extended blk_rq_map_user to accept a NULL user-space buffer with a READ command. It was necessary to convert sg to use the block layer mapping API. This patch extends blk_rq_map_user again for a WRITE command. It is necessary to convert st and osst drivers to use the block layer apping API. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-02 11:10:35 -06:00
FUJITA Tomonori	56c451f4b5	[SCSI] block: fix the partial mappings with struct rq_map_data This fixes bio_copy_user_iov to properly handle the partial mappings with struct rq_map_data (which only sg uses for now but st and osst will shortly). It adds the offset member to struct rq_map_data and changes blk_rq_map_user to update it so that bio_copy_user_iov can add an appropriate page frame via bio_add_pc_page(). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-02 11:10:08 -06:00
Rusty Russell	2ca1a61583	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: arch/x86/kernel/io_apic.c	2008-12-31 23:05:57 +10:30
Rusty Russell	33edcf133b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-12-30 08:02:35 +10:30
Jens Axboe	62c1fe9d9f	cfq-iosched: fix race between exiting queue and exiting task Original patch from Nikanth Karthikesan <knikanth@suse.de> When a queue exits the queue lock is taken and cfq_exit_queue() would free all the cic's associated with the queue. But when a task exits, cfq_exit_io_context() gets cic one by one and then locks the associated queue to call __cfq_exit_single_io_context. It looks like between getting a cic from the ioc and locking the queue, the queue might have exited on another cpu. Fix this by rechecking the cfq_io_context queue key inside the queue lock again, and not calling into __cfq_exit_single_io_context() if somebody beat us to it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:52 +01:00
Jens Axboe	b3a6ffe16b	Get rid of CONFIG_LSF We have two seperate config entries for large devices/files. One is CONFIG_LBD that guards just the devices, the other is CONFIG_LSF that handles large files. This doesn't make a lot of sense, you typically want both or none. So get rid of CONFIG_LSF and change CONFIG_LBD wording to indicate that it covers both. Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
Roel Kluin	3c18ce71af	block: make blk_softirq_init() static Sparse asked whether these could be static. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
FUJITA Tomonori	18af8b2ca3	block: use min_not_zero in blk_queue_stack_limits zero is invalid for max_phys_segments, max_hw_segments, and max_segment_size. It's better to use use min_not_zero instead of min. min() works though (because the commit `0e435ac` makes sure that these values are set to the default values, non zero, if a queue is initialized properly). With this patch, blk_queue_stack_limits does the almost same thing that dm's combine_restrictions_low() does. I think that it's easy to remove dm's combine_restrictions_low. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
Jens Axboe	a6f23657d3	block: add one-hit cache for disk partition lookup disk_map_sector_rcu() returns a partition from a sector offset, which we use for IO statistics on a per-partition basis. The lookup itself is an O(N) list lookup, where N is the number of partitions. This actually hurts performance quite a bit, even on the lower end partitions. On higher numbered partitions, it can get pretty bad. Solve this by adding a one-hit cache for partition lookup. This makes the lookup O(1) for the case where we do most IO to one partition. Even for mixed partition workloads, amortized cost is pretty close to O(1) since the natural IO batching makes the one-hit cache last for lots of IOs. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
Jens Axboe	30e0dc28bf	cfq-iosched: remove limit of dispatch depth of max 4 times quantum This basically limits the hardware queue depth to 4*quantum at any point in time, which is 16 with the default settings. As CFQ uses other means to shrink the hardware queue when necessary in the first place, there's really no need for this extra heuristic. Additionally, it ends up hurting performance in some cases. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
Jens Axboe	b374d18a4b	block: get rid of elevator_t typedef Just use struct elevator_queue everywhere instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:50 +01:00
Jens Axboe	a31a97381c	block: don't use plugging on SSD devices We just want to hand the first bits of IO to the device as fast as possible. Gains a few percent on the IOPS rate. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	a185eb4bc8	block: fix empty barrier on write-through w/ ordered tag Empty barrier on write-through (or no cache) w/ ordered tag has no command to execute and without any command to execute ordered tag is never issued to the device and the ordering is never achieved. Force draining for such cases. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	58eea927d2	block: simplify empty barrier implementation Empty barrier required special handling in __elv_next_request() to complete it without letting the low level driver see it. With previous changes, barrier code is now flexible enough to skip the BAR step using the same barrier sequence selection mechanism. Drop the special handling and mask off q->ordered from start_ordered(). Remove blk_empty_barrier() test which now has no user. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	8f11b3e99a	block: make barrier completion more robust Barrier completion had the following assumptions. * start_ordered() couldn't finish the whole sequence properly. If all actions are to be skipped, q->ordseq is set correctly but the actual completion was never triggered thus hanging the barrier request. * Drain completion in elv_complete_request() assumed that there's always at least one request in the queue when drain completes. Both assumptions are true but these assumptions need to be removed to improve empty barrier implementation. This patch makes the following changes. * Make start_ordered() use blk_ordered_complete_seq() to mark skipped steps complete and notify __elv_next_request() that it should fetch the next request if the whole barrier has completed inside start_ordered(). * Make drain completion path in elv_complete_request() check whether the queue is empty. Empty queue also indicates drain completion. * While at it, convert 0/1 return from blk_do_ordered() to false/true. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	f671620e7d	block: make every barrier action optional In all barrier sequences, the barrier write itself was always assumed to be issued and thus didn't have corresponding control flag. This patch adds QUEUE_ORDERED_DO_BAR and unify action mask handling in start_ordered() such that any barrier action can be skipped. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:45 +01:00
Tejun Heo	a7384677b2	block: remove duplicate or unused barrier/discard error paths * Because barrier mode can be changed dynamically, whether barrier is supported or not can be determined only when actually issuing the barrier and there is no point in checking it earlier. Drop barrier support check in generic_make_request() and __make_request(), and update comment around the support check in blk_do_ordered(). * There is no reason to check discard support in both generic_make_request() and __make_request(). Drop the check in __make_request(). While at it, move error action block to the end of the function and add unlikely() to q existence test. * Barrier request, be it empty or not, is never passed to low level driver and thus it's meaningless to try to copy back req->sector to bio->bi_sector on error. In addition, the notion of failed sector doesn't make any sense for empty barrier to begin with. Drop the code block from __end_that_request_first(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Tejun Heo	313e42999d	block: reorganize QUEUE_ORDERED_* constants Separate out ordering type (drain,) and action masks (preflush, postflush, fua) from visible ordering mode selectors (QUEUE_ORDERED_). Ordering types are now named QUEUE_ORDERED_BY_ while action masks are named QUEUE_ORDERED_DO_*. This change is necessary to add QUEUE_ORDERED_DO_BAR and make it optional to improve empty barrier implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Cheng Renquan	64d01dc9e1	block: use cancel_work_sync() instead of kblockd_flush_work() After many improvements on kblockd_flush_work, it is now identical to cancel_work_sync, so a direct call to cancel_work_sync is suggested. The only difference is that cancel_work_sync is a GPL symbol, so no non-GPL modules anymore. Signed-off-by: Cheng Renquan <crquan@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Keith Mannthey	08bafc0341	block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set Allow the scsi request REQ_QUIET flag to be propagated to the buffer file system layer. The basic ideas is to pass the flag from the scsi request to the bio (block IO) and then to the buffer layer. The buffer layer can then suppress needless printks. This patch declutters the kernel log by removed the 40-50 (per lun) buffer io error messages seen during a boot in my multipath setup . It is a good chance any real errors will be missed in the "noise" it the logs without this patch. During boot I see blocks of messages like " __ratelimit: 211 callbacks suppressed Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242847 Buffer I/O error on device sdm, logical block 1 Buffer I/O error on device sdm, logical block 5242878 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242879 Buffer I/O error on device sdm, logical block 5242872 " in my logs. My disk environment is multipath fiber channel using the SCSI_DH_RDAC code and multipathd. This topology includes an "active" and "ghost" path for each lun. IO's to the "ghost" path will never complete and the SCSI layer, via the scsi device handler rdac code, quick returns the IOs to theses paths and sets the REQ_QUIET scsi flag to suppress the scsi layer messages. I am wanting to extend the QUIET behavior to include the buffer file system layer to deal with these errors as well. I have been running this patch for a while now on several boxes without issue. A few runs of bonnie++ show no noticeable difference in performance in my setup. Thanks for John Stultz for the quiet_error finalization. Submitted-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:44 +01:00
Wu Fengguang	7c239517d9	block: don't take lock on changing ra_pages There's no need to take queue_lock or kernel_lock when modifying bdi->ra_pages. So remove them. Also remove out of date comment for queue_max_sectors_store(). Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:43 +01:00
Qinghuang Feng	c6a06f707c	block/blk-tag.c: cleanup kernel-doc There is no argument named @tags in blk_init_tags, remove its' comment. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:43 +01:00
Milton Miller	2b91bafcc0	scsi-ioctl: use clock_t <> jiffies Convert the timeout ioctl scalling to use the clock_t functions which are much more accurate with some USER_HZ vs HZ combinations. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:42 +01:00
Jens Axboe	70ed28b92a	block: leave the request timeout timer running even on an empty list For sync IO, we'll often do them serialized. This means we'll be touching the queue timer for every IO, as opposed to only occasionally like we do for queued IO. Instead of deleting the timer when the last request is removed, just let continue running. If a new request comes up soon we then don't have to readd the timer again. If no new requests arrive, the timer will expire without side effect later. This improves high iops sync IO by ~1%. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:42 +01:00
Jens Axboe	65d3618ccf	block: add comment in blk_rq_timed_out() about why next can not be 0 Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:42 +01:00
malahal@us.ibm.com	565e411d76	block: optimizations in blk_rq_timed_out_timer() Now the rq->deadline can't be zero if the request is in the timeout_list, so there is no need to have next_set. There is no need to access a request's deadline field if blk_rq_timed_out is called on it. Signed-off-by: Malahal Naineni <malahal@us.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:42 +01:00
Rusty Russell	be4d638c15	cpumask: Replace cpu_coregroup_map with cpu_coregroup_mask cpu_coregroup_map returned a cpumask_t: it's going away. (Note, the sched part of this patch won't apply meaningfully to the sched tree, but I'm posting it to show the goal). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com>	2008-12-26 22:23:43 +10:30
Ingo Molnar	30cd324e97	Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' into tracing/core Conflicts: include/linux/ftrace.h	2008-12-19 09:42:40 +01:00

1 2 3 4 5 ...

789 Commits