linux/include
Neil Brown d89d87965d
When stacked block devices are in use (e.g. md or dm), the recursive calls
to generic_make_request can use up a lot of stack space, and we would rather
they didn't.

As generic_make_request is a void function, and as it is generally not
expected that it will have any effect immediately, it is safe to delay any
call to generic_make_request until there is sufficient stack space
available.

As ->bi_next is reserved for the driver to use, it can have no valid value
when generic_make_request is called, and as __make_request implicitly
assumes it will be NULL (ELEVATOR_BACK_MERGE fork of switch) we can be
certain that all callers set it to NULL.  We can therefore safely use
bi_next to link pending requests together, providing we clear it before
making the real call.

So, we choose to allow each thread to be active in only one
generic_make_request at a time.  If a subsequent (recursive) call is made,
the bio is linked into a per-thread list, and is handled when the active
call completes.

As the list of pending bios is per-thread, there are no locking issues to
worry about.
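
The scheme described above can be modelled in user space.  What follows is
only a minimal sketch, not the code from the patch: struct bio is reduced to
a few illustrative fields, and the names submit_sketch, handle, child and
run_demo are hypothetical.  The per-thread list is modelled with C11
_Thread_local variables, where the kernel would keep such state per task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for the kernel's struct bio. */
struct bio {
    struct bio *bi_next;   /* NULL on submission; reused to chain pending bios */
    struct bio *child[2];  /* bios this one submits to lower devices, or NULL */
    int id;
};

/* Per-thread state, so no locking is needed. */
static _Thread_local int active;                /* non-zero while a call is running */
static _Thread_local struct bio *pending_head;  /* first deferred bio */
static _Thread_local struct bio **pending_tail; /* where the next one is linked */

/* Bookkeeping for the demonstration only. */
static _Thread_local int depth, max_depth, handled[8], n_handled;

static void submit_sketch(struct bio *bio);

/* Stand-in for a stacked driver's make_request_fn. */
static void handle(struct bio *bio)
{
    assert(bio->bi_next == NULL);  /* link must be cleared before the real call */
    if (++depth > max_depth)
        max_depth = depth;
    handled[n_handled++] = bio->id;
    for (int i = 0; i < 2; i++)
        if (bio->child[i])
            submit_sketch(bio->child[i]);  /* the "recursive" submission */
    depth--;
}

/* Stand-in for generic_make_request with deferred recursion. */
static void submit_sketch(struct bio *bio)
{
    if (active) {
        /* Already inside a call on this thread: queue the bio and return. */
        bio->bi_next = NULL;
        *pending_tail = bio;
        pending_tail = &bio->bi_next;
        return;
    }
    active = 1;
    pending_head = NULL;
    pending_tail = &pending_head;
    while (bio) {
        handle(bio);               /* may append to the pending list */
        bio = pending_head;        /* take the oldest deferred bio, if any */
        if (bio) {
            pending_head = bio->bi_next;
            if (pending_head == NULL)
                pending_tail = &pending_head;
            bio->bi_next = NULL;   /* clear the link before handling it */
        }
    }
    active = 0;
}

/* Builds a small stack (0 -> 1,2 and 1 -> 3) and returns the deepest nesting. */
static int run_demo(void)
{
    static struct bio b3 = { NULL, { NULL, NULL }, 3 };
    static struct bio b2 = { NULL, { NULL, NULL }, 2 };
    static struct bio b1 = { NULL, { &b3, NULL }, 1 };
    static struct bio b0 = { NULL, { &b1, &b2 }, 0 };

    n_handled = 0;
    max_depth = 0;
    submit_sketch(&b0);
    return max_depth;
}
```

In this model handle() is never entered more than one level deep, however
many devices are stacked, and deferred bios are handled in the order in which
they were submitted.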

I say above that it is "safe to delay any call...".  There are, however,
some behaviours of a make_request_fn which would make it unsafe.  These
include any behaviour that assumes anything will have changed after a
recursive call to generic_make_request.

These could include:
 - waiting for that call to finish and call its bi_end_io function.
   md used to do this sometimes (marking the superblock dirty before
   completing a write) but doesn't any more.
 - inspecting the bio for fields that generic_make_request might
   change, such as bi_sector or bi_bdev.  It is hard to see a good
   reason for this, and I don't think anyone actually does it.
 - inspecting the queue to see if, e.g., it is 'full' yet.  Again, I
   think this is very unlikely to be useful, or to be done.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <dm-devel@redhat.com>

Alasdair G Kergon <agk@redhat.com> said:

 I can see nothing wrong with this in principle.

 For device-mapper at the moment though it's essential that, while the bio
 mappings may now get delayed, they still get processed in exactly
 the same order as they were passed to generic_make_request().

 My main concern is whether the timing changes implicit in this patch
 will make the rare data-corrupting races in the existing snapshot code
 more likely. (I'm working on a fix for these races, but the unfinished
 patch is already several hundred lines long.)

 It would be helpful if some people on this mailing list would test
 this patch in various scenarios and report back.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-05-11 13:28:37 +02:00
..
acpi ACPICA: Lindent 2007-05-09 23:34:35 -04:00
asm-alpha rename thread_info to stack 2007-05-09 12:30:56 -07:00
asm-arm Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-05-09 13:05:57 -07:00
asm-arm26 Fix "deprecated" typoes. 2007-05-09 07:18:01 +02:00
asm-avr32 Merge branch 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32 2007-05-09 12:50:25 -07:00
asm-blackfin rename thread_info to stack 2007-05-09 12:30:56 -07:00
asm-cris move die notifier handling to common code 2007-05-08 11:15:04 -07:00
asm-frv FRV: Replace pgd management via slabs through quicklists 2007-05-09 12:30:46 -07:00
asm-generic Fix misspellings collected by members of KJ list. 2007-05-09 07:14:03 +02:00
asm-h8300 Remove tas() 2007-05-08 11:15:20 -07:00
asm-i386 Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization" 2007-05-10 09:26:53 -07:00
asm-ia64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2007-05-09 13:38:45 -07:00
asm-m32r Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2007-05-09 12:54:17 -07:00
asm-m68k Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2007-05-09 12:54:17 -07:00
asm-m68knommu Remove tas() 2007-05-08 11:15:20 -07:00
asm-mips Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2007-05-09 12:54:17 -07:00
asm-parisc wrap access to thread_info 2007-05-09 12:30:56 -07:00
asm-powerpc [POWERPC] Fix warning in hpte_decode(), and generalize it 2007-05-10 21:28:13 +10:00
asm-ppc Merge branch 'linux-2.6' 2007-05-10 21:08:37 +10:00
asm-s390 [S390] Kconfig: use common Kconfig files for s390. 2007-05-10 15:46:08 +02:00
asm-sh Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6 2007-05-09 13:08:20 -07:00
asm-sh64 Remove tas() 2007-05-08 11:15:20 -07:00
asm-sparc Remove hardcoding of hard_smp_processor_id on UP systems 2007-05-09 12:30:48 -07:00
asm-sparc64 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2007-05-10 13:32:05 -07:00
asm-um Remove hardcoding of hard_smp_processor_id on UP systems 2007-05-09 12:30:48 -07:00
asm-v850 Remove tas() 2007-05-08 11:15:20 -07:00
asm-x86_64 rename thread_info to stack 2007-05-09 12:30:56 -07:00
asm-xtensa fix file specification in comments 2007-05-09 08:58:16 +02:00
crypto [CRYPTO] cryptd: Add software async crypto daemon 2007-05-02 14:38:32 +10:00
keys
linux When stacked block devices are in-use (e.g. md or dm), the recursive calls 2007-05-11 13:28:37 +02:00
math-emu Delete unused header file math-emu/extended.h 2007-05-08 11:15:05 -07:00
media i2c: Cleanup the includes of <linux/i2c.h> 2007-05-01 23:26:29 +02:00
mtd UBI: Unsorted Block Images 2007-04-27 14:23:33 +03:00
net Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream 2007-05-09 18:54:49 -04:00
pcmcia add new_id to PCMCIA drivers 2007-05-07 12:12:50 -07:00
rdma IB: Put rlimit accounting struct in struct ib_umem 2007-05-08 18:00:37 -07:00
rxrpc [AF_RXRPC]: Delete the old RxRPC code. 2007-04-26 15:55:48 -07:00
scsi [SCSI] sas_scsi_host: Convert to use the kthread API 2007-05-06 09:33:17 -05:00
sound
video atyfb: halve XCLK with Mobility and 32bit memory 2007-05-08 11:15:32 -07:00
Kbuild