linux

Author	SHA1	Message	Date
Tejun Heo	9c4376de98	dm: use non reentrant workqueues if equivalent kmirrord_wq, kcopyd_work and md->wq are created per dm instance and serve only a single work item from the dm instance, so non-reentrant workqueues would provide the same ordering guarantees as ordered ones while allowing CPU affinity and use of the workqueues for other purposes. Switch them to non-reentrant workqueues. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-01-13 19:59:58 +00:00
Tejun Heo	4d4d66ab53	dm: convert workqueues to alloc_ordered Convert all create[_singlethread]_work() users to the new alloc[_ordered]_workqueue(). This conversion is mechanical and doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-01-13 19:59:57 +00:00
Mikulas Patocka	8d35d3e37e	dm kcopyd: delay unplugging Make kcopyd merge more I/O requests by using device unplugging. Without this patch, each I/O request is dispatched separately to the device. If the device supports tagged queuing, there are many small requests sent to the device. To improve performance, this patch will batch as many requests as possible, allowing the queue to merge consecutive requests, and send them to the device at once. In my tests (15k SCSI disk), this patch improves sequential write throughput: Sequential write throughput (chunksize of 4k, 32k, 512k) unpatched: 15.2, 18.5, 17.5 MB/s patched: 14.4, 22.6, 23.0 MB/s In most common uses (snapshot or two-way mirror), kcopyd is only used for two devices, one for reading and the other for writing, thus this optimization is implemented only for two devices. The optimization may be extended to n-way mirrors with some code complexity increase. We keep track of two block devices to unplug (one for read and the other for write) and unplug them when exiting "do_work" thread. If there are more devices used (in theory it could happen, in practice it is rare), we unplug immediately. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-01-13 19:59:50 +00:00
Mikulas Patocka	d9bf0b508d	dm io: remove BIO_RW_SYNCIO flag from kcopyd Remove the REQ_SYNC flag to improve write throughput when writing to the origin with a snapshot on the same device (using the CFQ I/O scheduler). Sequential write throughput (chunksize of 4k, 32k, 512k) unpatched: 8.5, 8.6, 9.3 MB/s patched: 15.2, 18.5, 17.5 MB/s Snapshot exception reallocations are triggered by writes that are usually async, so mark the associated dm_io_request as async as well. This helps when using the CFQ I/O scheduler because it has separate queues for sync and async I/O. Async is optimized for throughput; sync for latency. With this change we're consciously favoring throughput over latency. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-01-13 19:59:47 +00:00
Christoph Hellwig	7b6d91daee	block: unify flags for struct bio and struct request Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:20:39 +02:00
Mikulas Patocka	9ca170a3c0	dm kcopyd: accept zero size jobs dm-kcopyd: accept zero-size jobs This patch changes dm-kcopyd so that it accepts zero-size jobs and completes them immediatelly via its completion thread. It is needed for multisnapshots snapshot resizing. When we are writing to a chunk beyond origin end, no copying is done. To simplify the code, we submit an empty request to kcopyd and let kcopyd complete it. If we didn't submit a request to kcopyd and called the completion routine immediatelly, it would violate the principle that completion is called only from one thread and it would need additional locking. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-12-10 23:52:13 +00:00
Mikulas Patocka	340cd44451	dm kcopyd: fix callback race If the thread calling dm_kcopyd_copy is delayed due to scheduling inside split_job/segment_complete and the subjobs complete before the loop in split_job completes, the kcopyd callback could be invoked from the thread that called dm_kcopyd_copy instead of the kcopyd workqueue. dm_kcopyd_copy -> split_job -> segment_complete -> job->fn() Snapshots depend on the fact that callbacks are called from the singlethreaded kcopyd workqueue and expect that there is no racing between individual callbacks. The racing between callbacks can lead to corruption of exception store and it can also mean that exception store callbacks are called twice for the same exception - a likely reason for crashes reported inside pending_complete() / remove_exception(). This patch fixes two problems: 1. job->fn being called from the thread that submitted the job (see above). - Fix: hand over the completion callback to the kcopyd thread. 2. job->fn(read_err, write_err, job->context); in segment_complete reports the error of the last subjob, not the union of all errors. - Fix: pass job->write_err to the callback to report all error bits (it is done already in run_complete_job) Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-04-09 00:27:17 +01:00
Mikulas Patocka	73830857bc	dm kcopyd: prepare for callback race fix Use a variable in segment_complete() to point to the dm_kcopyd_client struct and only release job->pages in run_complete_job() if any are defined. These changes are needed by the next patch. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-04-09 00:27:16 +01:00
Jens Axboe	93dbb39350	block: fix bad definition of BIO_RW_SYNC We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before `213d9417fe`. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:32:00 +01:00
Mikulas Patocka	586e80e6ee	dm: remove dm header from targets Change #include "dm.h" to #include <linux/device-mapper.h> in all targets. Targets should not need direct access to internal DM structures. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-10-21 17:44:59 +01:00
Kazuo Ito	b673c3a819	dm kcopyd: avoid queue shuffle Write throughput to LVM snapshot origin volume is an order of magnitude slower than those to LV without snapshots or snapshot target volumes, especially in the case of sequential writes with O_SYNC on. The following patch originally written by Kevin Jamieson and Jan Blunck and slightly modified for the current RCs by myself tries to improve the performance by modifying the behaviour of kcopyd, so that it pushes back an I/O job to the head of the job queue instead of the tail as process_jobs() currently does when it has to wait for free pages. This way, write requests aren't shuffled to cause extra seeks. I tested the patch against 2.6.27-rc5 and got the following results. The test is a dd command writing to snapshot origin followed by fsync to the file just created/updated. A couple of filesystem benchmarks gave me similar results in case of sequential writes, while random writes didn't suffer much. dd if=/dev/zero of=<somewhere on snapshot origin> bs=4096 count=... [conv=notrunc when updating] 1) linux 2.6.27-rc5 without the patch, write to snapshot origin, average throughput (MB/s) 10M 100M 1000M create,dd 511.46 610.72 11.81 create,dd+fsync 7.10 6.77 8.13 update,dd 431.63 917.41 12.75 update,dd+fsync 7.79 7.43 8.12 compared with write throughput to LV without any snapshots, all dd+fsync and 1000 MiB writes perform very poorly. 10M 100M 1000M create,dd 555.03 608.98 123.29 create,dd+fsync 114.27 72.78 76.65 update,dd 152.34 1267.27 124.04 update,dd+fsync 130.56 77.81 77.84 2) linux 2.6.27-rc5 with the patch, write to snapshot origin, average throughput (MB/s) 10M 100M 1000M create,dd 537.06 589.44 46.21 create,dd+fsync 31.63 29.19 29.23 update,dd 487.59 897.65 37.76 update,dd+fsync 34.12 30.07 26.85 Although still not on par with plain LV performance - cannot be avoided because it's copy on write anyway - this simple patch successfully improves throughtput of dd+fsync while not affecting the rest. Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Kazuo Ito <ito.kazuo@oss.ntt.co.jp> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org	2008-10-21 17:44:50 +01:00
Mikulas Patocka	7ff14a3615	dm: unplug queues in threads Remove an avoidable 3ms delay on some dm-raid1 and kcopyd I/O. It is specified that any submitted bio without BIO_RW_SYNC flag may plug the queue (i.e. block the requests from being dispatched to the physical device). The queue is unplugged when the caller calls blk_unplug() function. Usually, the sequence is that someone calls submit_bh to submit IO on a buffer. The IO plugs the queue and waits (to be possibly joined with other adjacent bios). Then, when the caller calls wait_on_buffer(), it unplugs the queue and submits the IOs to the disk. This was happenning: When doing O_SYNC writes, function fsync_buffers_list() submits a list of bios to dm_raid1, the bios are added to dm_raid1 write queue and kmirrord is woken up. fsync_buffers_list() calls wait_on_buffer(). That unplugs the queue, but there are no bios on the device queue as they are still in the dm_raid1 queue. wait_on_buffer() starts waiting until the IO is finished. kmirrord is scheduled, kmirrord takes bios and submits them to the devices. The submitted bio plugs the harddisk queue but there is no one to unplug it. (The process that called wait_on_buffer() is already sleeping.) So there is a 3ms timeout, after which the queues on the harddisks are unplugged and requests are processed. This 3ms timeout meant that in certain workloads (e.g. O_SYNC, 8kb writes), dm-raid1 is 10 times slower than md raid1. Every time we submit something asynchronously via dm_io, we must unplug the queue actually to send the request to the device. This patch adds an unplug call to kmirrord - while processing requests, it keeps the queue plugged (so that adjacent bios can be merged); when it finishes processing all the bios, it unplugs the queue to submit the bios. It also fixes kcopyd which has the same potential problem. All kcopyd requests are submitted with BIO_RW_SYNC. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-25 13:26:57 +01:00
Alasdair G Kergon	a765e20eeb	dm: move include files Publish the dm-io, dm-log and dm-kcopyd headers in include/linux. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-04-25 13:26:55 +01:00
Alasdair G Kergon	2d1e580afe	dm kcopyd: rename Rename kcopyd.[ch] to dm-kcopyd.[ch]. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-04-25 13:26:54 +01:00

14 Commits