linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-08 05:01:48 +00:00

Author	SHA1	Message	Date
Artem Bityutskiy	0d5c3eba2e	vfs: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from vfs comments. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:37 +04:00
Linus Torvalds	0d167518e0	Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block Merge block/IO core bits from Jens Axboe: "This is a bit bigger on the core side than usual, but that is purely because we decided to hold off on parts of Tejun's submission on 3.4 to give it a bit more time to simmer. As a consequence, it's seen a long cycle in for-next. It contains: - Bug fix from Dan, wrong locking type. - Relax splice gifting restriction from Eric. - A ton of updates from Tejun, primarily for blkcg. This improves the code a lot, making the API nicer and cleaner, and also includes fixes for how we handle and tie policies and re-activate on switches. The changes also include generic bug fixes. - A simple fix from Vivek, along with a fix for doing proper delayed allocation of the blkcg stats." Fix up annoying conflict just due to different merge resolution in Documentation/feature-removal-schedule.txt * 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits) blkcg: tg_stats_alloc_lock is an irq lock vmsplice: relax alignement requirements for SPLICE_F_GIFT blkcg: use radix tree to index blkgs from blkcg blkcg: fix blkcg->css ref leak in __blkg_lookup_create() block: fix elvpriv allocation failure handling block: collapse blk_alloc_request() into get_request() blkcg: collapse blkcg_policy_ops into blkcg_policy blkcg: embed struct blkg_policy_data in policy specific data blkcg: mass rename of blkcg API blkcg: style cleanups for blk-cgroup.h blkcg: remove blkio_group->path[] blkcg: blkg_rwstat_read() was missing inline blkcg: shoot down blkgs if all policies are deactivated blkcg: drop stuff unused after per-queue policy activation update blkcg: implement per-queue policy activation blkcg: add request_queue->root_blkg blkcg: make request_queue bypassing on allocation blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing blkcg: make blkg_conf_prep() take @pol and return with queue lock held blkcg: remove static policy ID enums ...	2012-05-30 08:52:42 -07:00
Bernd Schubert	f908ee9463	bio allocation failure due to bio_get_nr_vecs() The number of bio_get_nr_vecs() is passed down via bio_alloc() to bvec_alloc_bs(), which fails the bio allocation if nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an unexpected bio allocation failure. Limiting to queue_max_segments() is not sufficient, as max_segments also might be very large. bvec_alloc_bs(gfp_mask, nr_iovecs, ) => NULL when nr_iovecs > BIO_MAX_PAGES bio_alloc_bioset(gfp_mask, nr_iovecs, ...) bio_alloc(GFP_NOIO, nvecs) xfs_alloc_ioend_bio() Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-11 16:45:12 +02:00
Tejun Heo	959d851caa	Merge branch 'for-3.5' of ../cgroup into block/for-3.5/core-merged cgroup/for-3.5 contains the following changes which blk-cgroup needs to proceed with the on-going cleanup. * Dynamic addition and removal of cftypes to make config/stat file handling modular for policies. * cgroup removal update to not wait for css references to drain to fix blkcg removal hang caused by cfq caching cfqgs. Pull in cgroup/for-3.5 into block/for-3.5/core. This causes the following conflicts in block/blk-cgroup.c. * `761b3ef50e` "cgroup: remove cgroup_subsys argument from callbacks" conflicts with blkiocg_pre_destroy() addition and blkiocg_attach() removal. Resolved by removing @subsys from all subsys methods. * `676f7c8f84` "cgroup: relocate cftype and cgroup_subsys definitions in controllers" conflicts with ->pre_destroy() and ->attach() updates and removal of modular config. Resolved by dropping forward declarations of the methods and applying updates to the relocated blkio_subsys. * `4baf6e3325` "cgroup: convert all non-memcg controllers to the new cftype interface" builds upon the previous item. Resolved by adding ->base_cftypes to the relocated blkio_subsys. Signed-off-by: Tejun Heo <tj@kernel.org>	2012-04-01 12:55:00 -07:00
Tejun Heo	852c788f83	block: implement bio_associate_current() IO scheduling and cgroup are tied to the issuing task via io_context and cgroup of %current. Unfortunately, there are cases where IOs need to be routed via a different task which makes scheduling and cgroup limit enforcement applied completely incorrectly. For example, all bios delayed by blk-throttle end up being issued by a delayed work item and get assigned the io_context of the worker task which happens to serve the work item and dumped to the default block cgroup. This is double confusing as bios which aren't delayed end up in the correct cgroup and makes using blk-throttle and cfq propio together impossible. Any code which punts IO issuing to another task is affected which is getting more and more common (e.g. btrfs). As both io_context and cgroup are firmly tied to task including userland visible APIs to manipulate them, it makes a lot of sense to match up tasks to bios. This patch implements bio_associate_current() which associates the specified bio with %current. The bio will record the associated ioc and blkcg at that point and block layer will use the recorded ones regardless of which task actually ends up issuing the bio. bio release puts the associated ioc and blkcg. It grabs and remembers ioc and blkcg instead of the task itself because task may already be dead by the time the bio is issued making ioc and blkcg inaccessible and those are all block layer cares about. elevator_set_req_fn() is updated such that the bio elvdata is being allocated for is available to the elevator. This doesn't update block cgroup policies yet. Further patches will implement the support. -v2: #ifdef CONFIG_BLK_CGROUP added around bio->bi_ioc dereference in rq_ioc() to fix build breakage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Kent Overstreet <koverstreet@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-06 21:27:24 +01:00
Paul Gortmaker	630d9c4727	fs: reduce the use of module.h wherever possible For files only using THIS_MODULE and/or EXPORT_SYMBOL, map them onto including export.h -- or if the file isn't even using those, then just delete the include. Fix up any implicit include dependencies that were being masked by module.h along the way. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-02-28 19:31:58 -05:00
Kent Overstreet	5abebfdd02	bio: don't overflow in bio_get_nr_vecs() There were two places bio_get_nr_vecs() could overflow: First, it did a left shift to convert from sectors to bytes immediately before dividing by PAGE_SIZE. If PAGE_SIZE ever was less than 512 a great many things would break, so dividing by PAGE_SIZE >> 9 is safe and will generate smaller code too. The nastier overflow was in the DIV_ROUND_UP() (that's what the code was effectively doing, anyways). If n + d overflowed, the whole thing would return 0 which breaks things rather effectively. bio_get_nr_vecs() doesn't claim to give an exact value anyways, so the DIV_ROUND_UP() is silly; we could do a straight divide except if a device's queue_max_sectors was less than PAGE_SIZE we'd return 0. So we just add 1; this should always be safe - things will break badly if bio_get_nr_vecs() returns > BIO_MAX_PAGES (bio_alloc() will suddenly start failing) but it's queue_max_segments that must guard against this, if queue_max_sectors is preventing this from happen things are going to explode on architectures with different PAGE_SIZE. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-02-08 22:07:18 +01:00
Dan Carpenter	121f099412	bio: change some signed vars to unsigned This is just a cleanup patch to silence a static checker warning. The problem is that we cap "nr_iovecs" so it can't be larger than "UIO_MAXIOV" but we don't check for negative values. It turns out this is prevented at other layers, but logically it doesn't make sense to have negative nr_iovecs so making it unsigned is nicer. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:50 +01:00
Tao Ma	9562ad9ab3	block: Remove the control of complete cpu from bio. bio originally has the functionality to set the complete cpu, but it is broken. Chirstoph said that "This code is unused, and from the all the discussions lately pretty obviously broken. The only thing keeping it serves is creating more confusion and possibly more bugs." And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine with leaving cpu control to the request based drivers, they are the only ones that can toggle the setting anyway". So this patch tries to remove all the work of controling complete cpu from a bio. Cc: Shaohua Li <shaohua.li@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-10-24 16:11:30 +02:00
Andreas Gruenbacher	c642808454	vfs: Improve the bio_add_page() and bio_add_pc_page() descriptions The descriptions of bio_add_page() and bio_add_pc_page() are slightly inconsistent; improve them. Signed-off-by: Andreas Gruenbacher <agruen@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-05-27 09:43:00 -04:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Linus Torvalds	6c51038900	Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}	2011-03-24 10:16:26 -07:00
Mandeep Singh Baines	80cdc6dae7	fs: use appropriate printk priority levels printk()s without a priority level default to KERN_WARNING. To reduce noise at KERN_WARNING, this patch set the priority level appriopriately for unleveled printks()s. This should be useful to folks that look at dmesg warnings closely. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:10 -07:00
Martin K. Petersen	a91a2785b2	block: Require subsystems to explicitly allocate bio_set integrity mempool MD and DM create a new bio_set for every metadevice. Each bio_set has an integrity mempool attached regardless of whether the metadevice is capable of passing integrity metadata. This is a waste of memory. Instead we defer the allocation decision to MD and DM since we know at metadevice creation time whether integrity passthrough is needed or not. Automatic integrity mempool allocation can then be removed from bioset_create() and we make an explicit integrity allocation for the fs_bio_set. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Acked-by: Mike Snitzer <snizer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-17 11:11:05 +01:00
Martin K. Petersen	df67714028	block: biovec_slab vs. CONFIG_BLK_DEV_INTEGRITY The block integrity subsystem no longer uses the bio_vec slabs so this code can safely be compiled in. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-08 08:28:01 +01:00
Jens Axboe	cb4644cac4	bio: take care not overflow page count when mapping/copying user data If the iovec is being set up in a way that causes uaddr + PAGE_SIZE to overflow, we could end up attempting to map a huge number of pages. Check for this invalid input type. Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-10 14:40:43 +01:00
Jens Axboe	f3f63c1c28	block: limit vec count in bio_kmalloc() and bio_alloc_map_data() Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-10 14:40:42 +01:00
Christoph Hellwig	7b6d91daee	block: unify flags for struct bio and struct request Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:20:39 +02:00
Jens Axboe	b4b7a4ef09	Merge branch 'master' into for-linus Conflicts: block/Kconfig Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-03-19 08:05:10 +01:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
Dmitry Monakhov	8bf8c376ab	blkdev: fix merge_bvec_fn return value checks v2 merge_bvec_fn() returns bvec->bv_len on success. So we have to check against this value. But in case of fs_optimization merge we compare with wrong value. This patch must be included in b428cd6da7e6559aca69aa2e3a526037d3f20403 But accidentally i've forgot to add this in the initial patch. To make things straight let's replace all such checks. In fact this makes code easy to understand. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-03-08 09:10:38 +01:00
Jens Axboe	9599945bac	Revert "blkdev: fix merge_bvec_fn return value checks" This reverts commit `9f7cdbc33f`. It's causing oopses om dm setups, so revert it until we investigate. Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-03-02 19:17:34 +01:00
Dmitry Monakhov	9f7cdbc33f	blkdev: fix merge_bvec_fn return value checks merge_bvec_fn() returns bvec->bv_len on success. So we have to check against this value. But in case of fs_optimization merge we compare with wrong value. This patch must be included in b428cd6da7e6559aca69aa2e3a526037d3f20403 But accidentally i've forgot to add this in the initial patch. To make things straight let's replace all such checks. In fact this makes code easy to understand. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-02-28 19:47:18 +01:00
Martin K. Petersen	8a78362c4e	block: Consolidate phys_segment and hw_segment limits Except for SCSI no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-02-26 13:58:08 +01:00
Jaak Ristioja	db18efac0b	bio: Fix outdated comment about bio_alloc_bioset() In commit `451a9ebf65` bio_alloc_bioset() was refactored not to take NULL as a valid argument for bs. This patch changes the comment for that function accordingly. Currently, passing NULL as argument to parameter bs would result in a NULL pointer dereference. Signed-off-by: Jaak Ristioja <ristioja@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-05 12:22:36 +01:00
Dmitry Monakhov	1d6165851c	block: fix bio_add_page for non trivial merge_bvec_fn case We have to properly decrease bi_size in order to merge_bvec_fn return right result. Otherwise this result in false merge rejects for two absolutely valid bio_vecs. This may cause significant performance penalty for example fs_block_size == 1k and block device is raid0 with small chunk_size = 8k. Then it is impossible to merge 7-th fs-block in to bio which already has 6 fs-blocks. Cc: <stable@kernel.org> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-01-28 15:08:29 +01:00
Thiago Farina	f06f135d86	fs/bio.c: fix shadows sparse warning fs/bio.c:81:33: warning: symbol 'bslab' shadows an earlier one fs/bio.c:74:25: originally declared here Signed-off-by: Thiago Farina <tfransosi@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-01-19 14:07:09 +01:00
Linus Torvalds	4ef58d4e2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...	2009-12-09 19:43:33 -08:00
André Goddard Rosa	af901ca181	tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:55 +01:00
Ilya Loginov	2d4dc890b5	block: add helpers to run flush_dcache_page() against a bio and a request's pages Mtdblock driver doesn't call flush_dcache_page for pages in request. So, this causes problems on architectures where the icache doesn't fill from the dcache or with dcache aliases. The patch fixes this. The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid pointless empty cache-thrashing loops on architectures for which flush_dcache_page() is a no-op. Every architecture was provided with this flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is equal 1 or do nothing otherwise. See "fix mtd_blkdevs problem with caches on some architectures" discussion on LKML for more information. Signed-off-by: Ilya Loginov <isloginov@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Peter Horton <phorton@bitbox.co.uk> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-11-26 09:16:19 +01:00
Alberto Bertogli	5f04eeb8a7	Fix bio_alloc() and bio_kmalloc() documentation Commit `451a9ebf` accidentally broke bio_alloc() and bio_kmalloc() comments by (almost) swapping them. This patch fixes that, by placing the comments in the right place. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-11-02 11:41:13 +01:00
Alberto Bertogli	ad0bf11070	bio_put(): add bio_clone() to the list of functions in the comment In bio_put()'s comment, add bio_clone() to the list of functions that can give you a bio reference. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-11-02 11:39:22 +01:00
H Hartley Sweeten	a112a71d45	fs/bio.c: move EXPORT* macros to line after function As mentioned in Documentation/CodingStyle, move EXPORT* macro's to the line immediately after the closing function brace line. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-01 21:15:46 +02:00
FUJITA Tomonori	ecb554a846	block: fix sg SG_DXFER_TO_FROM_DEV regression I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use the block layer mapping API (2.6.28). Douglas Gilbert explained SG_DXFER_TO_FROM_DEV: http://www.spinics.net/lists/linux-scsi/msg37135.html = The semantics of SG_DXFER_TO_FROM_DEV were: - copy user space buffer to kernel (LLD) buffer - do SCSI command which is assumed to be of the DATA_IN (data from device) variety. This would overwrite some or all of the kernel buffer - copy kernel (LLD) buffer back to the user space. The idea was to detect short reads by filling the original user space buffer with some marker bytes ("0xec" it would seem in this report). The "resid" value is a better way of detecting short reads but that was only added this century and requires co-operation from the LLD. = This patch changes the block layer mapping API to support this semantics. This simply adds another field to struct rq_map_data and enables __bio_copy_iov() to copy data from user space even with READ requests. It's better to add the flags field and kills null_mapped and the new from_user fields in struct rq_map_data but that approach makes it difficult to send this patch to stable trees because st and osst drivers use struct rq_map_data (they were converted to use the block layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer mapping API. zhou sf reported this regiression and tested this patch: http://www.spinics.net/lists/linux-scsi/msg37128.html http://www.spinics.net/lists/linux-scsi/msg37168.html Reported-by: zhou sf <sxzzsf@gmail.com> Tested-by: zhou sf <sxzzsf@gmail.com> Cc: stable@kernel.org Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Martin K. Petersen	7878cba9f0	block: Create bip slabs with embedded integrity vectors This patch restores stacking ability to the block layer integrity infrastructure by creating a set of dedicated bip slabs. Each bip slab has an embedded bio_vec array at the end. This cuts down on memory allocations and also simplifies the code compared to the original bvec version. Only the largest bip slab is backed by a mempool. The pool is contained in the bio_set so stacking drivers can ensure forward progress. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.(none)>	2009-07-01 10:56:25 +02:00
Li Zefan	e212d6f250	block: remove some includings of blktrace_api.h When porting blktrace to tracepoints, we changed to trace/block.h for trace prober declarations. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-16 11:19:36 +02:00
Nikanth Karthikesan	76d93ff344	trivial: fix typo in bio_alloc kernel doc Fix typo in bio_alloc kernel doc. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:47 +02:00
Linus Torvalds	c9059598ea	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c	2009-06-11 11:10:35 -07:00
Michal Simek	0e0c62123b	fs/bio.c: add missing __user annotation As reported by sparse: fs/bio.c:720:13: warning: incorrect type in assignment (different address spaces) fs/bio.c:720:13: expected char iov_addr fs/bio.c:720:13: got void [noderef] <asn:1> fs/bio.c:724:36: warning: incorrect type in argument 2 (different address spaces) fs/bio.c:724:36: expected void const [noderef] <asn:1>from fs/bio.c:724:36: got char iov_addr Signed-off-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-10 23:07:15 +02:00
Li Zefan	55782138e4	tracing/events: convert block trace points to TRACE_EVENT() TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions ... Cons: - no dev_t info for the output of plug, unplug_timer and unplug_io events. no dev_t info for getrq and sleeprq events if bio == NULL. no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL. This is mainly because we can't get the deivce from a request queue. But this may change in the future. - A packet command is converted to a string in TP_assign, not TP_print. While blktrace do the convertion just before output. Since pc requests should be rather rare, this is not a big issue. - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT has a unique format, which means we have some unused data in a trace entry. The overhead is minimized by using __dynamic_array() instead of __array(). I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing: dd dd + ioctl blktrace dd + TRACE_EVENT (splice) 1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s 2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s 3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s So the overhead of tracing is very small, and no regression when using those trace events vs blktrace. And the binary output of TRACE_EVENT is much smaller than blktrace: # ls -l -h -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out Following are some comparisons between TRACE_EVENT and blktrace: plug: kjournald-480 [000] 303.084981: block_plug: [kjournald] kjournald-480 [000] 303.084981: 8,0 P N [kjournald] unplug_io: kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1 kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1 remap: kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384 kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384 bio_backmerge: kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald] kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald] getrq: kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald] kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald] bash-2066 [001] 1072.953770: 8,0 G N [bash] bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash] rq_complete: konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0] konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0] ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0] ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0] rq_insert: kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald] kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald] Changelog from v2 -> v3: - use the newly introduced __dynamic_array(). Changelog from v1 -> v2: - use __string() instead of __array() to minimize the memory required to store hex dump of rq->cmd(). - support large pc requests. - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT. - some cleanups. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-09 12:34:23 -04:00
Martin K. Petersen	ae03bf639a	block: Use accessor functions for queue limits Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Jens Axboe	e4b636366c	Merge branch 'master' into for-2.6.31 Conflicts: drivers/block/hd.c drivers/block/mg_disk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 20:25:34 +02:00
Tejun Heo	4fc981ef9e	bio: always copy back data for copied kernel requests When a read bio_copy_kern() request fails, the content of the bounce buffer is not copied back. However, as request failure doesn't necessarily mean complete failure, the buffer state can be useful. This behavior is also inconsistent with the user map counterpart and causes the subtle difference between bounced and unbounced IO causes confusion. This patch makes bio_copy_kern_endio() ignore @err and always copy back data on request completion. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-19 11:36:08 +02:00
FUJITA Tomonori	69838727bc	bio: fix memcpy corruption in bio_copy_user_iov() st driver uses blk_rq_map_user() in order to just build a request out of page frames. In this case, map_data->offset is a non zero value and iov[0].iov_base is NULL. We need to increase nr_pages for that. Cc: stable@kernel.org Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-28 20:24:29 +02:00
Tejun Heo	a9e9dc24bb	bio: use bio_kmalloc() in copy/map functions Impact: remove possible deadlock condition There is no reason to use mempool backed allocation for map functions. Also, because kern mapping is used inside LLDs (e.g. for EH), using mempool backed allocation can lead to deadlock under extreme conditions (mempool already consumed by the time a request reached EH and requests are blocked on EH). Switch copy/map functions to bio_kmalloc(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-22 08:35:10 +02:00
Tejun Heo	451a9ebf65	bio: fix bio_kmalloc() Impact: fix bio_kmalloc() and its destruction path bio_kmalloc() was broken in two ways. * bvec_alloc_bs() first allocates bvec using kmalloc() and then ignores it and allocates again like non-kmalloc bvecs. * bio_kmalloc_destructor() didn't check for and free bio integrity data. This patch fixes the above problems. kmalloc patch is separated out from bio_alloc_bioset() and allocates the requested number of bvecs as inline bvecs. * bio_alloc_bioset() no longer takes NULL @bs. None other than bio_kmalloc() used it and outside users can't know how it was allocated anyway. * Define and use BIO_POOL_NONE so that pool index check in bvec_free_bs() triggers if inline or kmalloc allocated bvec gets there. * Relocate destructors on top of each allocation function so that how they're used is more clear. Jens Axboe suggested allocating bvecs inline. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-22 08:35:10 +02:00
Jens Axboe	86c824b943	bio: add documentation to bio_alloc() Explain that with __GFP_WAIT set it will not fail, and that the caller must never allocate more than 1 bio at the time. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 12:10:12 +02:00
Alberto Bertogli	c7eee1b836	trivial: Fix typo in bio_split()'s documentation Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-03-30 15:22:02 +02:00
Martin K. Petersen	6d2a78e783	block: add private bio_set for bio integrity allocations The integrity bio allocation needs its own bio_set to avoid violating the mempool allocation rules and risking deadlocks. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-24 12:35:17 +01:00
Jens Axboe	a7fcd37cdc	block: don't create bio_vec slabs of less than the inline number If we don't have CONFIG_BLK_DEV_INTEGRITY set, then we don't have any external dependencies on the bio_vec slabs. So don't create the ones that we will inline anyway. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-24 12:35:16 +01:00
Ingo Molnar	34053979fb	block: cleanup bio_alloc_bioset() this warning (which got fixed by commit `b2bf968`): fs/bio.c: In function ‘bio_alloc_bioset’: fs/bio.c:305: warning: ‘p’ may be used uninitialized in this function Triggered because the code flow in bio_alloc_bioset() is correct but a bit complex for the compiler to see through. Streamline it a bit - this also makes the code a tiny bit more compact: text data bss dec hex filename 7540 256 40 7836 1e9c bio.o.before 7539 256 40 7835 1e9b bio.o.after Also remove an older compiler-warnings annotation from this function, it's not needed. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-24 12:35:16 +01:00
Li Zefan	059ea3318c	block: fix memory leak in bio_clone() If bio_integrity_clone() fails, bio_clone() returns NULL without freeing the newly allocated bio. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-14 21:06:52 +01:00
un'ichi Nomura	87092698c6	block: Add gfp_mask parameter to bio_integrity_clone() Stricter gfp_mask might be required for clone allocation. For example, request-based dm may clone bio in interrupt context so it has to use GFP_ATOMIC. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-03-14 21:06:51 +01:00
Jens Axboe	b2bf96833c	block: fix bogus gcc warning for uninitialized var usage Newer gcc throw this warning: fs/bio.c: In function ?bio_alloc_bioset?: fs/bio.c:305: warning: ?p? may be used uninitialized in this function since it cannot figure out that 'p' is only ever used if 'bs' is non-NULL. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-26 10:45:48 +01:00
Subhash Peddamallu	a60e78e57a	fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free When freeing from bio pool use right ptr to account for bs->front_pad, instead of bio ptr, Signed-off-by: Subhash Peddamallu <subhash.peddamallu@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:32:01 +01:00
FUJITA Tomonori	97ae77a1cd	[SCSI] block: make blk_rq_map_user take a NULL user-space buffer for WRITE The commit `818827669d` (block: make blk_rq_map_user take a NULL user-space buffer) extended blk_rq_map_user to accept a NULL user-space buffer with a READ command. It was necessary to convert sg to use the block layer mapping API. This patch extends blk_rq_map_user again for a WRITE command. It is necessary to convert st and osst drivers to use the block layer apping API. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-02 11:10:35 -06:00
FUJITA Tomonori	56c451f4b5	[SCSI] block: fix the partial mappings with struct rq_map_data This fixes bio_copy_user_iov to properly handle the partial mappings with struct rq_map_data (which only sg uses for now but st and osst will shortly). It adds the offset member to struct rq_map_data and changes blk_rq_map_user to update it so that bio_copy_user_iov can add an appropriate page frame via bio_add_pc_page(). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-02 11:10:08 -06:00
FUJITA Tomonori	e623ddb4e9	[SCSI] block: fix bio_add_page misuse with rq_map_data This fixes bio_add_page misuse in bio_copy_user_iov with rq_map_data, which only sg uses now. rq_map_data carries page frames for bio_add_pc_page. bio_copy_user_iov uses bio_add_pc_page with a larger size than PAGE_SIZE. It's clearly wrong. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-02 11:09:41 -06:00
Jens Axboe	d3f761104b	bio: get rid of bio_vec clearing We don't need to clear the memory used for adding bio_vec entries, since nobody should be looking at members unitialized. Any valid use should be below bio->bi_vcnt, and that members up until that count must be valid since they were added through bio_add_page(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:53 +01:00
Jens Axboe	392ddc3298	bio: add support for inlining a number of bio_vecs inside the bio When we go and allocate a bio for IO, we actually do two allocations. One for the bio itself, and one for the bi_io_vec that holds the actual pages we are interested in. This feature inlines a definable amount of io vecs inside the bio itself, so we eliminate the bio_vec array allocation for IO's up to a certain size. It defaults to 4 vecs, which is typically 16k of IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:50 +01:00
Jens Axboe	bb799ca020	bio: allow individual slabs in the bio_set Instead of having a global bio slab cache, add a reference to one in each bio_set that is created. This allows for personalized slabs in each bio_set, so that they can have bios of different sizes. This means we can personalize the bios we return. File systems may want to embed the bio inside another structure, to avoid allocation more items (and stuffing them in ->bi_private) after the get a bio. Or we may want to embed a number of bio_vecs directly at the end of a bio, to avoid doing two allocations to return a bio. This is now possible. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:23 +01:00
Jens Axboe	1b43449869	bio: move the slab pointer inside the bio_set In preparation for adding differently sized bios. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:46 +01:00
Jens Axboe	7ff9345ffa	bio: only mempool back the largest bio_vec slab cache We only very rarely need the mempool backing, so it makes sense to get rid of all but one of the mempool in a bio_set. So keep the largest bio_vec count mempool so we can always honor the largest allocation, and "upgrade" callers that fail. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:28:46 +01:00
Ingo Molnar	0bfc24559d	blktrace: port to tracepoints, update Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE() sites. Spread them out to the usage sites, as suggested by Mathieu Desnoyers. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>	2008-11-26 13:04:35 +01:00
Arnaldo Carvalho de Melo	5f3ea37c77	blktrace: port to tracepoints This was a forward port of work done by Mathieu Desnoyers, I changed it to encode the 'what' parameter on the tracepoint name, so that one can register interest in specific events and not on classes of events to then check the 'what' parameter. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-26 12:13:34 +01:00
Denis ChengRq	6feef531f5	block: mark bio_split_pool static Since all bio_split calls refer the same single bio_split_pool, the bio_split function can use bio_split_pool directly instead of the mempool_t parameter; then the mempool_t parameter can be removed from bio_split param list, and bio_split_pool is only referred in fs/bio.c file, can be marked static. Signed-off-by: Denis ChengRq <crquan@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:57:05 +02:00
Martin K. Petersen	ad3316bf4e	block: Find bio sector offset given idx and offset Helper function to find the sector offset in a bio given bvec index and page offset. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:22 +02:00
Jens Axboe	0a0d96b03a	block: add bio_kmalloc() Not all callers need (or want!) the mempool backing guarentee, it essentially means that you can only use bio_alloc() for short allocations and not for preallocating some bio's at setup or init time. So add bio_kmalloc() which does the same thing as bio_alloc(), except it just uses kmalloc() as the backing instead of the bio mempools. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:17 +02:00
FUJITA Tomonori	818827669d	block: make blk_rq_map_user take a NULL user-space buffer This patch changes blk_rq_map_user to accept a NULL user-space buffer with a READ command if rq_map_data is not NULL. Thus a caller can pass page frames to lk_rq_map_user to just set up a request and bios with page frames propely. bio_uncopy_user (called via blk_rq_unmap_user) doesn't copy data to user space with such request. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:11 +02:00
FUJITA Tomonori	4d8ab62e08	bio: convert bio_copy_kern to use bio_copy_user bio_copy_kern and bio_copy_user are very similar. This converts bio_copy_kern to use bio_copy_user. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:10 +02:00
FUJITA Tomonori	152e283fdf	block: introduce struct rq_map_data to use reserved pages This patch introduces struct rq_map_data to enable bio_copy_use_iov() use reserved pages. Currently, bio_copy_user_iov allocates bounce pages but drivers/scsi/sg.c wants to allocate pages by itself and use them. struct rq_map_data can be used to pass allocated pages to bio_copy_user_iov. The current users of bio_copy_user_iov simply passes NULL (they don't want to use pre-allocated pages). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Douglas Gilbert <dougg@torque.net> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:10 +02:00
FUJITA Tomonori	a3bce90edd	block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov Currently, blk_rq_map_user and blk_rq_map_user_iov always do GFP_KERNEL allocation. This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov so sg can use it (sg always does GFP_ATOMIC allocation). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Douglas Gilbert <dougg@torque.net> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:10 +02:00
Jens Axboe	c7c22e4d5c	block: add support for IO CPU affinity This patch adds support for controlling the IO completion CPU of either all requests on a queue, or on a per-request basis. We export a sysfs variable (rq_affinity) which, if set, migrates completions of requests to the CPU that originally submitted it. A bio helper (bio_set_completion_cpu()) is also added, so that queuers can ask for completion on that specific CPU. In testing, this has been show to cut the system time by as much as 20-40% on synthetic workloads where CPU affinity is desired. This requires a little help from the architecture, so it'll only work as designed for archs that are using the new generic smp helper infrastructure. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:09 +02:00
Mikulas Patocka	5df97b91b5	drop vmerge accounting Remove hw_segments field from struct bio and struct request. Without virtual merge accounting they have no purpose. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:03 +02:00
Mikulas Patocka	b8b3e16cfe	block: drop virtual merging accounting Remove virtual merge accounting. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:03 +02:00
FUJITA Tomonori	aefcc28a3a	bio: fix __bio_copy_iov() handling of bio->bv_len The commit `c5dec1c303` introduced __bio_copy_iov() to add bounce support to blk_rq_map_user_iov. __bio_copy_iov() uses bio->bv_len to copy data for READ commands after the completion but it doesn't work with a request that partially completed. SCSI always completes a PC request as a whole but seems some don't. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-08-27 09:50:19 +02:00
FUJITA Tomonori	76029ff37f	bio: fix bio_copy_kern() handling of bio->bv_len The commit `68154e90c9` introduced bio_copy_kern() to add bounce support to blk_rq_map_kern. bio_copy_kern() uses bio->bv_len to copy data for READ commands after the completion but it doesn't work with a request that partially completed. SCSI always completes a PC request as a whole but seems some don't. This patch fixes bio_copy_kern to handle the above case. As bio_copy_user does, bio_copy_kern uses struct bio_map_data to store struct bio_vec. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reported-by: Nix <nix@esperi.org.uk> Tested-by: Nix <nix@esperi.org.uk> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-08-27 09:50:19 +02:00
Denis ChengRq	1ac0ae062c	bio: make use of bvec_nr_vecs Since introduced in `7ba1ba12ee`, it should be made use of. Signed-off-by: Denis ChengRq <crquan@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-08-06 12:30:04 +02:00
Nick Piggin	f5dd33c494	dio: use get_user_pages_fast Use get_user_pages_fast in the common/generic block and fs direct IO paths. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:06 -07:00
Alasdair G Kergon	cc371e66e3	Add bvec_merge_data to handle stacked devices and ->merge_bvec() When devices are stacked, one device's merge_bvec_fn may need to perform the mapping and then call one or more functions for its underlying devices. The following bio fields are used: bio->bi_sector bio->bi_bdev bio->bi_size bio->bi_rw using bio_data_dir() This patch creates a new struct bvec_merge_data holding a copy of those fields to avoid having to change them directly in the struct bio when going down the stack only to have to change them back again on the way back up. (And then when the bio gets mapped for real, the whole exercise gets repeated, but that's a problem for another day...) Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:15 +02:00
Martin K. Petersen	7ba1ba12ee	block: Block layer data integrity support Some block devices support verifying the integrity of requests by way of checksums or other protection information that is submitted along with the I/O. This patch implements support for generating and verifying integrity metadata, as well as correctly merging, splitting and cloning bios and requests that have this extra information attached. See Documentation/block/data-integrity.txt for more information. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:13 +02:00
Martin K. Petersen	51d654e1d8	block: Globalize bio_set and bio_vec_slab Move struct bio_set and biovec_slab definitions to bio.h so they can be used outside of bio.c. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-03 13:21:13 +02:00
Randy Dunlap	ffee0259c9	docbook: fix bio missing parameter Fix fs/bio.c kernel-doc parameter warning: Warning(linux-2.6.25-git14//fs/bio.c:972): No description found for parameter 'reading' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-05-07 18:35:03 +02:00
Jens Axboe	eeae1d48c0	block: use unitialized_var() in bio_alloc_bioset() Better than setting idx to some random value and it silences the same bogus gcc warning. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-05-07 13:26:27 +02:00
FUJITA Tomonori	68154e90c9	block: add dma alignment and padding support to blk_rq_map_kern This patch adds bio_copy_kern similar to bio_copy_user. blk_rq_map_kern uses bio_copy_kern instead of bio_map_kern if necessary. bio_copy_kern uses temporary pages and the bi_end_io callback frees these pages. bio_copy_kern saves the original kernel buffer at bio->bi_private it doesn't use something like struct bio_map_data to store the information about the caller. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-29 09:50:34 +02:00
FUJITA Tomonori	c5dec1c303	block: convert bio_copy_user to bio_copy_user_iov This patch enables bio_copy_user to take struct sg_iovec (renamed bio_copy_user_iov). bio_copy_user uses bio_copy_user_iov internally as bio_map_user uses bio_map_user_iov. The major changes are: - adds sg_iovec array to struct bio_map_data - adds __bio_copy_iov that copy data between bio and sg_iovec. bio_copy_user_iov and bio_uncopy_user use it. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <htejun@gmail.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-21 09:50:08 +02:00
Jens Axboe	40044ce0bf	Revert "unexport bio_{,un}map_user" Outside users like asmlib uses the mapping functions. API wise, the export is definitely sane. It's a better idea to keep this export than to require external users to open-code this piece of code instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-17 21:14:40 +01:00
Adrian Bunk	86b6c7a7f7	fs/block_dev.c: remove #if 0'ed code Commit `b2e895dbd8` #if 0'ed this code stating: <-- snip --> [PATCH] revert blockdev direct io back to 2.6.19 version Andrew Vasquez is reporting as-iosched oopses and a 65% throughput slowdown due to the recent special-casing of direct-io against blockdevs. We don't know why either of these things are occurring. The patch minimally reverts us back to the 2.6.19 code for a 2.6.20 release. <-- snip --> It has since been dead code, and unless someone wants to revive it now it's time to remove it. This patch also makes bio_release_pages() static again and removes the ki_bio_count member from struct kiocb, reverting changes that had been done for this dead code. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>	2008-02-19 10:04:00 +01:00
Jens Axboe	5d84070ee0	__bio_clone: don't calculate hw/phys segment counts If the users sets a new ->bi_bdev on the bio after __bio_clone() has returned it, the "segment counts valid" flag still remains even though it may be different with the new target. So don't calculate segment counts in __bio_clone(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:04:46 +01:00
Jens Axboe	992c5ddaf1	bio: make freeing of ->bi_io_vec conditional in bio_free() The empty barrier patches do not carry data, so they have no iovec attached. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:03:52 +02:00
Jens Axboe	2b94de552e	bio: use memset() in bio_init() Use memset() to clear the bio, instead of doing each field manually. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-16 11:03:51 +02:00
NeilBrown	6712ecf8f6	Drop 'size' argument from bio_endio and bi_end_io As bi_end_io is only called once when the reqeust is complete, the 'size' argument is now redundant. Remove it. Now there is no need for bio_endio to subtract the size completed from bi_size. So don't do that either. While we are at it, change bi_end_io to return void. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
NeilBrown	5bb23a688b	Don't decrement bi_size in bio_endio The only caller of bio_endio that does not pass the full bi_size is end_that_request_first. Also, no ->bi_end_io method is really interested in bi_size being decremented. So move the decrement and related code into ll_rw_blk and merge it with order_bio_endio to form req_bio_endio which does endio functionality specific to request completion. As some ->bi_end_io methods do check bi_size of 0, we set it thus for now, but that will go in the next patch. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./block/ll_rw_blk.c \| 42 +++++++++++++++++++++++++++--------------- ./fs/bio.c \| 23 +++++++++++------------ 2 files changed, 38 insertions(+), 27 deletions(-) diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
NeilBrown	9cc54d40b8	Only call bi_end_io once for any bio Currently bi_end_io can be called multiple times as sub-requests complete. However no ->bi_end_io function wants to know about that. So only call when the bio is complete. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./fs/bio.c \| 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff .prev/fs/bio.c ./fs/bio.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
Jens Axboe	165125e1e4	[BLOCK] Get rid of request_queue_t typedef Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweet of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-24 09:28:11 +02:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Adrian Bunk	72d3a38ee0	unexport bio_{,un}map_user bio_{,un}map_user no longer have any modular users. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:03:34 +02:00
Christoph Lameter	0a31bd5f2b	KMEM_CACHE(): simplify slab cache creation This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Jens Axboe	5972511b77	[BLOCK] Don't pin lots of memory in mempools Currently we scale the mempool sizes depending on memory installed in the machine, except for the bio pool itself which sits at a fixed 256 entry pre-allocation. There's really no point in "optimizing" this OOM path, we just need enough preallocated to make progress. A single unit is enough, lets scale it down to 2 just to be on the safe side. This patch saves ~150kb of pinned kernel memory on a 32-bit box. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:08:17 +02:00

1 2 3 4

187 Commits