linux/include
Nick Piggin 08291429cf mm: fix pagecache write deadlocks
Modify the core write() code so that it won't take a pagefault while holding a
lock on the pagecache page. There are a number of different deadlocks possible
if we try to do such a thing:

1.  generic_buffered_write
2.   lock_page
3.    prepare_write
4.     unlock_page+vmtruncate
5.     copy_from_user
6.      mmap_sem(r)
7.       handle_mm_fault
8.        lock_page (filemap_nopage)
9.    commit_write
10.  unlock_page

a. sys_munmap / sys_mlock / others
b.  mmap_sem(w)
c.   make_pages_present
d.    get_user_pages
e.     handle_mm_fault
f.      lock_page (filemap_nopage)

2,8	- recursive deadlock if page is same
2,8;2,8	- ABBA deadlock if page is different
2,6;b,f	- ABBA deadlock if page is same
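
The three cases above can be checked with a small userspace model. This is
only a sketch: the toy_* names and lock identifiers are hypothetical, not
kernel symbols, and the model ignores the read/write modes of mmap_sem. Each
path is an ordered list of locks it takes; a repeated lock in one path models
the recursive case, and an inverted pair across two paths models ABBA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock identifiers for this toy model; not kernel symbols. */
enum toy_lock { TOY_PAGE_A, TOY_PAGE_B, TOY_MMAP_SEM };

/* True if one path takes the same non-recursive lock twice. */
bool toy_recursive_deadlock(const enum toy_lock *seq, size_t n)
{
	assert(seq != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (seq[i] == seq[j])
				return true;
	return false;
}

/* True if path a takes X then Y while path b takes Y then X (X != Y). */
bool toy_abba_deadlock(const enum toy_lock *a, size_t na,
		       const enum toy_lock *b, size_t nb)
{
	assert((a != NULL || na == 0) && (b != NULL || nb == 0));
	for (size_t i = 0; i < na; i++)
		for (size_t j = i + 1; j < na; j++)
			for (size_t k = 0; k < nb; k++)
				for (size_t l = k + 1; l < nb; l++)
					if (a[i] != a[j] &&
					    a[i] == b[l] && a[j] == b[k])
						return true;
	return false;
}
```

In this model, the write path { page, mmap_sem } against the munmap path
{ mmap_sem, page } is reported as ABBA, which corresponds to case 2,6;b,f.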

The solution is as follows:
1.  If we find the destination page is uptodate, continue as normal, but use
    atomic usercopies which do not take pagefaults and do not zero the uncopied
    tail of the destination. The destination is already uptodate, so we can
    commit_write the full length even if there was a partial copy: it does not
    matter that the tail was not modified, because if it is dirtied and written
    back to disk it will not cause any problems (uptodate *means* that the
    destination page is as new or newer than the copy on disk).

1a. The above requires that fault_in_pages_readable correctly returns access
    information, because atomic usercopies cannot distinguish a non-present
    page in a readable mapping from the lack of a readable mapping.
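
The uptodate case can be illustrated with a userspace model. This is a sketch
under assumptions, not the kernel code: struct toy_user_buf and the toy_*
functions are hypothetical stand-ins, where "resident" bytes play the role of
user pages that are present. The atomic copy never faults, copies only what is
resident, and leaves the rest of the destination untouched rather than zeroing
it. Faulting in happens before the (imagined) page lock is taken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a user buffer; bytes [0, resident) are present. */
struct toy_user_buf {
	const char *data;
	size_t len;
	size_t resident;
};

/* Model of faulting in: make up to 'chunk' bytes from 'off' present. */
void toy_fault_in(struct toy_user_buf *u, size_t off, size_t chunk)
{
	size_t want = off + chunk;

	if (want > u->len)
		want = u->len;
	if (u->resident < want)
		u->resident = want;
}

/* Model of an atomic usercopy: copies resident bytes only, returns the
 * number copied, and does not zero the uncopied tail of dst. */
size_t toy_copy_atomic(char *dst, const struct toy_user_buf *u,
		       size_t off, size_t n)
{
	size_t avail = u->resident > off ? u->resident - off : 0;

	if (n > avail)
		n = avail;
	memcpy(dst, u->data + off, n);
	return n;
}

/* Write loop: fault in without the lock, then copy atomically "under" it. */
size_t toy_write(char *page, struct toy_user_buf *u, size_t chunk)
{
	size_t done = 0;

	assert(chunk > 0);	/* a zero chunk would never make progress */
	while (done < u->len) {
		toy_fault_in(u, done, chunk);
		/* the page lock would be taken here */
		done += toy_copy_atomic(page + done, u, done, u->len - done);
		/* ... and dropped here */
	}
	return done;
}
```

A short copy simply means the loop retries from the first byte that was not
copied; the old contents beyond that point stay as they were.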

2.  If we find the destination page is not uptodate, unlock it (this could be
    made slightly more optimal), then allocate a temporary page to copy the
    source data into. Relock the destination page and continue with the copy.
    However, instead of a usercopy (which might take a fault), copy the data
    from the pinned temporary page via the kernel address space.
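
The bounce through a temporary page can be sketched in userspace as follows.
The names here (struct toy_page, toy_write_via_bounce) are hypothetical and
the "lock" is just a flag; the point is only the ordering: the copy that could
fault runs while the destination is unlocked, and the copy done under the lock
reads from memory that cannot fault. It also shows where the extra copy cost
comes from, since the data is copied twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pagecache page. */
struct toy_page {
	bool locked;
	char data[64];
};

/* Returns 0 on success, -1 on a bad range or allocation failure. */
int toy_write_via_bounce(struct toy_page *pg, size_t off,
			 const char *user_src, size_t len)
{
	char *tmp;

	if (off > sizeof(pg->data) || len > sizeof(pg->data) - off)
		return -1;

	tmp = malloc(len ? len : 1);
	if (!tmp)
		return -1;

	/* First copy: destination is unlocked, so a fault here is harmless. */
	assert(!pg->locked);
	memcpy(tmp, user_src, len);

	/* Second copy: under the lock, from memory that cannot fault. */
	pg->locked = true;
	memcpy(pg->data + off, tmp, len);
	pg->locked = false;

	free(tmp);
	return 0;
}
```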

(also, rename maxlen to seglen, because it was confusing)

This increases the CPU/memory copy cost by almost 50% on the affected
workloads. That will be solved by introducing a new set of pagecache write
aops in a subsequent patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:54 -07:00
acpi ACPI: CONFIG_ACPI_SLEEP=n power off regression in 2.6.23-rc8 (NOT in rc7) 2007-09-25 17:58:52 -04:00
asm-alpha
asm-arm Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-10-15 16:08:50 -07:00
asm-avr32 x86: optimize page faults like all other achitectures and kill notifier cruft 2007-10-16 09:42:50 -07:00
asm-blackfin Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2007-10-15 13:41:39 -07:00
asm-cris
asm-frv frv: missing casts in cmpxchg() 2007-10-14 12:41:51 -07:00
asm-generic Generic Virtual Memmap support for SPARSEMEM 2007-10-16 09:42:51 -07:00
asm-h8300 Binfmt_flat: Add minimum support for the Blackfin relocations 2007-10-03 23:41:43 +08:00
asm-ia64 IA64: SPARSEMEM_VMEMMAP 16K page size support 2007-10-16 09:42:51 -07:00
asm-m32r Binfmt_flat: Add minimum support for the Blackfin relocations 2007-10-03 23:41:43 +08:00
asm-m68k m68k: Export cachectl.h 2007-10-13 09:41:03 -07:00
asm-m68knommu Binfmt_flat: Add minimum support for the Blackfin relocations 2007-10-03 23:41:43 +08:00
asm-mips move a few definitions to au1000_xxs1500.c 2007-10-16 09:42:50 -07:00
asm-parisc
asm-powerpc ppc64: SPARSEMEM_VMEMMAP support 2007-10-16 09:42:51 -07:00
asm-ppc [POWERPC] Prevent direct inclusion of <asm/rwsem.h>. 2007-09-22 14:49:21 +10:00
asm-s390 x86: optimize page faults like all other achitectures and kill notifier cruft 2007-10-16 09:42:50 -07:00
asm-sh x86: optimize page faults like all other achitectures and kill notifier cruft 2007-10-16 09:42:50 -07:00
asm-sh64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh64-2.6 2007-10-13 09:50:26 -07:00
asm-sparc [SPARC32]: Add irqflags.h to sparc32 and use it from generic code. 2007-10-13 21:53:11 -07:00
asm-sparc64 SPARC64: SPARSEMEM_VMEMMAP support 2007-10-16 09:42:51 -07:00
asm-um
asm-v850 Binfmt_flat: Add minimum support for the Blackfin relocations 2007-10-03 23:41:43 +08:00
asm-x86 x86_64: SPARSEMEM_VMEMMAP 2M page size support 2007-10-16 09:42:51 -07:00
asm-xtensa
crypto [CRYPTO] sha: Add header file for SHA definitions 2007-10-10 16:55:50 -07:00
keys
linux mm: fix pagecache write deadlocks 2007-10-16 09:42:54 -07:00
math-emu
media v4l: copy_to_user() is not a good method name 2007-10-13 09:58:59 -07:00
mtd
net [IPV6]: Replace sk_buff ** with sk_buff * in input handlers 2007-10-15 12:50:28 -07:00
pcmcia pcmcia: use DMA_MASK_NONE for the default for all pcmcia devices 2007-10-16 09:42:50 -07:00
rdma IB/cm: Modify interface to send MRAs in response to duplicate messages 2007-10-09 19:59:17 -07:00
rxrpc
scsi Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2007-10-15 08:19:33 -07:00
sound
video
xen
Kbuild