linux/fs
Linus Torvalds 5f22ca9b13 vfat: fix 'sync' mount deadlock due to BKL->lock_super conversion
There was another FAT BKL conversion deadlock reported by Bart
Trojanowski due to the BKL being used as a recursive lock by FAT, which
was missed because it only triggers with 'sync' (or 'dirsync') mounts.

The recursion worked for the BKL, but after the conversion to lock_super
(which uses a mutex), it just deadlocks.
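The difference between a recursive lock and a plain mutex can be illustrated with a minimal userspace sketch. This is an analogy using POSIX pthread mutexes, not kernel code, and `lock_twice()` is a hypothetical helper. A recursive mutex stands in for the BKL and lets the same thread take it twice. An error-checking mutex stands in for the mutex behind lock_super(): on a self-relock it returns EDEADLK where the kernel would simply hang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Userspace analogy, not kernel code: take the same lock twice from one
 * thread, the way FAT nested the BKL (and later lock_super()).
 * 'type' selects the pthread mutex kind.  Returns 0 if the nested
 * acquire succeeded, otherwise the error code it returned.
 */
static int lock_twice(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc, nested;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, type);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);        /* outer acquire */
    assert(rc == 0);
    (void)rc;
    nested = pthread_mutex_lock(&m);    /* inner acquire by the same thread */
    if (nested == 0)
        pthread_mutex_unlock(&m);       /* recursive locks need one unlock per acquire */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return nested;
}
```

With PTHREAD_MUTEX_RECURSIVE, lock_twice() returns 0, which is the BKL-like behaviour. With PTHREAD_MUTEX_ERRORCHECK, it returns EDEADLK. A PTHREAD_MUTEX_NORMAL mutex would block forever on the second acquire, so the sketch uses the error-checking type to make the deadlock observable.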

Thanks to Bart for debugging this and testing the fix.  The lock
debugging information from the original report:

  =============================================
  [ INFO: possible recursive locking detected ]
  2.6.27-rc3-bisect-00448-ga7f5aaf #16
  ---------------------------------------------
  mv/4020 is trying to acquire lock:
   (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  but task is already holding lock:
   (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  other info that might help us debug this:
  3 locks held by mv/4020:
   #0:  (&sb->s_type->i_mutex_key#9/1){--..}, at: [<c01b2336>] do_unlinkat+0x66/0x140
   #1:  (&sb->s_type->i_mutex_key#9){--..}, at: [<c01b0954>] vfs_unlink+0x84/0x110
   #2:  (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  stack backtrace:
  Pid: 4020, comm: mv Not tainted 2.6.27-rc3-bisect-00448-ga7f5aaf #16
   [<c014e694>] validate_chain+0x984/0xea0
   [<c0108d70>] ? native_sched_clock+0x0/0xf0
   [<c014ee9c>] __lock_acquire+0x2ec/0x9b0
   [<c014f5cf>] lock_acquire+0x6f/0x90
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c044e5fd>] mutex_lock_nested+0xad/0x300
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c01a90fe>] lock_super+0x1e/0x20
   [<f8b3a700>] fat_write_inode+0x60/0x2b0 [fat]
   [<c0450878>] ? _spin_unlock_irqrestore+0x48/0x80
   [<f8b3a953>] ? fat_sync_inode+0x3/0x20 [fat]
   [<f8b3a962>] fat_sync_inode+0x12/0x20 [fat]
   [<f8b37c7e>] fat_remove_entries+0xbe/0x120 [fat]
   [<f8b422ef>] vfat_unlink+0x5f/0x90 [vfat]
   [<f8b42290>] ? vfat_unlink+0x0/0x90 [vfat]
   [<c01b0968>] vfs_unlink+0x98/0x110
   [<c01b2400>] do_unlinkat+0x130/0x140
   [<c016a8f5>] ? audit_syscall_entry+0x105/0x150
   [<c01b253b>] sys_unlinkat+0x3b/0x40
   [<c01040d3>] sysenter_do_call+0x12/0x3f
   =======================

where the deadlock is due to the nesting of lock_super from vfat_unlink
to fat_write_inode:

 - do_unlinkat
   - vfs_unlink
     - vfat_unlink
       * lock_super
       - fat_remove_entries
         - fat_sync_inode
           - fat_write_inode
             * lock_super

and the fix is to simply remove the use of lock_super() in fat_write_inode.
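The pre-fix call chain can be modelled in userspace with a hypothetical sketch. The `model_*` functions are illustrative stand-ins named after the kernel functions, and `s_lock` is an error-checking pthread mutex standing in for the superblock lock. The `sync_mount` flag is a simplification of the 'sync'/'dirsync' condition under which fat_remove_entries syncs the inode.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for sb->s_lock.  An error-checking mutex returns
 * EDEADLK on self-relock instead of hanging like the kernel mutex. */
static pthread_mutex_t s_lock;

static void init_s_lock(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&s_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Pre-fix fat_write_inode(): takes the superblock lock itself. */
static int model_fat_write_inode(void)
{
    int rc = pthread_mutex_lock(&s_lock);

    if (rc != 0)
        return rc;                 /* the kernel would hang here instead */
    /* ... write the inode back to its directory entry ... */
    pthread_mutex_unlock(&s_lock);
    return 0;
}

/* fat_remove_entries() -> fat_sync_inode() only on sync/dirsync mounts. */
static int model_fat_remove_entries(int sync_mount)
{
    return sync_mount ? model_fat_write_inode() : 0;
}

/* vfat_unlink(): holds the superblock lock across the removal. */
static int model_vfat_unlink(int sync_mount)
{
    int rc = pthread_mutex_lock(&s_lock);   /* outer lock_super() */

    assert(rc == 0);
    (void)rc;
    rc = model_fat_remove_entries(sync_mount);
    pthread_mutex_unlock(&s_lock);
    return rc;
}
```

After init_s_lock(), model_vfat_unlink(0) completes normally, while model_vfat_unlink(1) reports the nested acquire as EDEADLK. This matches the report that only 'sync' and 'dirsync' mounts were affected.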

The lock_super() there had been just an automatic conversion of the
kernel lock to the superblock lock, but no locking was actually needed
there, since the code in fat_write_inode already protected all relevant
accesses with a spinlock (sbi->inode_hash_lock to be exact).  The only
code inside the BKL (and thus the superblock lock) consisted of accesses to
local variables or calls to functions that have long been SMP-safe (i.e.
sb_bread, mark_buffer_dirty and brelse).
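The shape of the fix can be shown with a simplified userspace sketch. All names here are hypothetical stand-ins: a pthread spinlock plays the role of sbi->inode_hash_lock, and `disk_entry` stands in for the buffer work done by sb_bread/mark_buffer_dirty/brelse. The model fat_write_inode no longer touches the superblock lock. It only guards the shared field with the spinlock, so calling it while the caller still holds the outer lock cannot self-deadlock. The real function does more than this, so treat the sketch as an illustration of the locking structure only.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t s_lock = PTHREAD_MUTEX_INITIALIZER; /* superblock lock stand-in */
static pthread_spinlock_t inode_hash_lock;                  /* sbi->inode_hash_lock stand-in */
static long inode_i_pos;  /* shared field protected by the spinlock */
static long disk_entry;   /* pretend on-disk directory entry */

static void init_locks(void)
{
    int rc = pthread_spin_init(&inode_hash_lock, PTHREAD_PROCESS_PRIVATE);

    assert(rc == 0);
    (void)rc;
}

static void set_i_pos(long pos)
{
    pthread_spin_lock(&inode_hash_lock);
    inode_i_pos = pos;
    pthread_spin_unlock(&inode_hash_lock);
}

/* Post-fix fat_write_inode() model: no superblock lock at all.  Shared
 * state is read under the spinlock; the rest works on locals. */
static long model_fat_write_inode(void)
{
    long pos;

    pthread_spin_lock(&inode_hash_lock);
    pos = inode_i_pos;
    pthread_spin_unlock(&inode_hash_lock);

    disk_entry = pos;  /* simplified stand-in for the buffer update */
    return pos;
}

static long model_vfat_unlink(void)
{
    long written;

    pthread_mutex_lock(&s_lock);        /* outer lock_super() is still taken */
    written = model_fat_write_inode();  /* nested call no longer relocks s_lock */
    pthread_mutex_unlock(&s_lock);
    return written;
}
```

Here `s_lock` is a default mutex. If the model write_inode still took it, model_vfat_unlink() would hang rather than return.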

Bart reports:
 "Looks good.  I ran 10 parallel processes creating 1M files truncating
  them, writing to them again and then deleting them.  This patch fixes
  the issue I ran into.

  Signed-off-by: Bart Trojanowski <bart@jukie.net>"

Reported-and-tested-by: Bart Trojanowski <bart@jukie.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 08:31:19 -07:00
9p 9p: fix O_APPEND in legacy mode 2008-07-03 09:59:03 -05:00
adfs SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
affs [PATCH] f_count may wrap around 2008-07-26 20:53:40 -04:00
afs mm: rename page trylock 2008-08-04 21:31:34 -07:00
autofs
autofs4 autofs4: remove unused ioctls 2008-07-24 10:47:33 -07:00
befs SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
bfs SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
cifs [CIFS] mount of IPC$ breaks with iget patch 2008-08-14 03:55:14 +00:00
coda [PATCH] sanitize __user_walk_fd() et.al. 2008-07-26 20:53:34 -04:00
configfs [PATCH] configfs: Pin configfs subsystems separately from new config_items. 2008-07-31 16:21:13 -07:00
cramfs fs: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:16:44 -04:00
debugfs debugfs: Implement debugfs_remove_recursive() 2008-07-21 21:54:59 -07:00
devpts [PATCH] devpts: switch to IDA 2008-08-01 11:25:29 -04:00
dlm dlm: rename structs 2008-08-13 12:47:36 -05:00
ecryptfs eCryptfs: use page_alloc not kmalloc to get a page of memory 2008-07-28 16:30:21 -07:00
efs SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
exportfs fs: replace remaining __FUNCTION__ occurrences 2008-04-30 08:29:54 -07:00
ext2 vfs: pagecache usage optimization for pagesize!=blocksize 2008-07-28 16:30:21 -07:00
ext3 [PATCH] fix races and leaks in vfs_quota_on() users 2008-08-01 11:25:25 -04:00
ext4 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2008-08-03 10:50:44 -07:00
fat vfat: fix 'sync' mount deadlock due to BKL->lock_super conversion 2008-08-20 08:31:19 -07:00
freevxfs fs/freevxfs/: proper externs 2008-04-29 08:06:00 -07:00
fuse [PATCH] fix MAY_CHDIR/MAY_ACCESS/LOOKUP_ACCESS mess 2008-07-26 20:53:21 -04:00
gfs2 [PATCH] don't pass nameidata to gfs2_lookupi() 2008-07-26 20:53:36 -04:00
hfs [PATCH] f_count may wrap around 2008-07-26 20:53:40 -04:00
hfsplus [PATCH] f_count may wrap around 2008-07-26 20:53:40 -04:00
hostfs [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
hpfs [patch 05/14] hpfs: dont call permission() 2008-07-26 20:53:13 -04:00
hppfs [patch] hppfs: remove hppfs_permission 2008-07-26 20:53:07 -04:00
hugetlbfs SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
isofs SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
jbd Merge branch 'core/locking' into core/urgent 2008-08-12 00:11:49 +02:00
jbd2 Merge branch 'core/locking' into core/urgent 2008-08-12 00:11:49 +02:00
jffs2 [JFFS2] Fix allocation of summary buffer 2008-08-01 10:07:51 +01:00
jfs [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
lockd Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux 2008-08-12 16:39:22 -07:00
minix SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
msdos fatfs: add UTC timestamp option 2008-07-25 10:53:34 -07:00
ncpfs [PATCH] don't pass nameidata to __ncp_lookup_validate() 2008-07-26 20:53:37 -04:00
nfs Revert "UFS: add const to parser token table" 2008-08-04 16:50:38 -07:00
nfs_common
nfsd Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux 2008-08-12 16:39:22 -07:00
nls
ntfs fs: rename buffer trylock 2008-08-04 21:56:09 -07:00
ocfs2 [PATCH] ocfs2: Release mutex in error handling code 2008-07-31 16:21:14 -07:00
omfs omfs: fix oops when file metadata is corrupted 2008-08-15 08:35:44 -07:00
openpromfs SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
partitions fs/partitions/efi: convert to pr_debug 2008-07-25 10:53:44 -07:00
proc proc: fix warnings 2008-08-05 14:33:50 -07:00
qnx4 SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
ramfs ramfs: enable splice write 2008-07-04 09:52:14 +02:00
reiserfs reiserfs: removed duplicated #include 2008-08-12 16:07:30 -07:00
romfs romfs_readpage: don't report errors for pages beyond i_size 2008-07-30 14:30:34 -07:00
smbfs [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
sysfs Use WARN() in fs/sysfs 2008-07-26 12:00:07 -07:00
sysv SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
ubifs UBIFS: xattr bugfixes 2008-08-14 12:46:20 +03:00
udf SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
ufs Revert "UFS: add const to parser token table" 2008-08-04 16:50:38 -07:00
vfat fatfs: add UTC timestamp option 2008-07-25 10:53:34 -07:00
xfs CRED: Introduce credential access wrappers 2008-08-14 09:35:23 +10:00
aio.c [PATCH] f_count may wrap around 2008-07-26 20:53:40 -04:00
anon_inodes.c flag parameters: NONBLOCK in anon_inode_getfd 2008-07-24 10:47:28 -07:00
attr.c [patch 4/4] vfs: immutable inode checking cleanup 2008-07-26 20:53:28 -04:00
bad_inode.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
binfmt_aout.c tracehook: exec 2008-07-26 12:00:08 -07:00
binfmt_elf_fdpic.c binfmt_elf_fdpic: Magical stack pointer index, for NEW_AUX_ENT compat. 2008-07-28 18:10:28 +09:00
binfmt_elf.c tracehook: exec 2008-07-26 12:00:08 -07:00
binfmt_em86.c binfmt_misc.c: avoid potential kernel stack overflow 2008-04-29 08:06:04 -07:00
binfmt_flat.c tracehook: exec 2008-07-26 12:00:08 -07:00
binfmt_misc.c binfmt_misc: use simple_read_from_buffer() 2008-07-24 10:47:27 -07:00
binfmt_script.c binfmt_misc.c: avoid potential kernel stack overflow 2008-04-29 08:06:04 -07:00
binfmt_som.c tracehook: exec 2008-07-26 12:00:08 -07:00
bio-integrity.c bio-integrity: remove EXPORT_SYMBOL for bio_integrity_init_slab() 2008-07-28 16:30:21 -07:00
bio.c bio: make use of bvec_nr_vecs 2008-08-06 12:30:04 +02:00
block_dev.c [PATCH] switch mtd and dm-table to lookup_bdev() 2008-08-01 11:25:31 -04:00
buffer.c fs: rename buffer trylock 2008-08-04 21:56:09 -07:00
char_dev.c Remove the lock_kernel() call from chrdev_open() 2008-06-20 14:05:53 -06:00
compat_binfmt_elf.c
compat_ioctl.c remove unused #include <linux/dirent.h>'s 2008-07-25 10:53:34 -07:00
compat.c [PATCH] sanitize __user_walk_fd() et.al. 2008-07-26 20:53:34 -04:00
dcache.c dcache: Add case-insensitive support d_ci_add() routine 2008-07-28 16:58:39 +10:00
dcookies.c
direct-io.c dio: use get_user_pages_fast 2008-07-26 12:00:06 -07:00
dnotify.c [PATCH] split linux/file.h 2008-05-01 13:08:16 -04:00
dquot.c [PATCH] fix races and leaks in vfs_quota_on() users 2008-08-01 11:25:25 -04:00
drop_caches.c vfs: skip inodes without pages to free in drop_pagecache_sb() 2008-04-29 08:06:05 -07:00
eventfd.c flag parameters: check magic constants 2008-07-24 10:47:29 -07:00
eventpoll.c fs/eventpoll.c: fix sys_epoll_create1() comment 2008-08-12 16:07:30 -07:00
exec.c exec: include pagemap.h again to fix build 2008-07-28 16:30:20 -07:00
fcntl.c [PATCH] clean dup2() up a bit 2008-08-01 11:25:24 -04:00
fifo.c [PATCH] reuse xxx_fifo_fops for xxx_pipe_fops 2008-07-26 20:53:06 -04:00
file_table.c [PATCH] f_count may wrap around 2008-07-26 20:53:40 -04:00
file.c [PATCH] merge locate_fd() and get_unused_fd() 2008-08-01 11:25:23 -04:00
filesystems.c
fs-writeback.c VFS: export sync_sb_inodes 2008-07-14 19:10:52 +03:00
generic_acl.c
inode.c fs/inode.c: properly init address_space->writeback_index 2008-08-15 08:35:44 -07:00
inotify_user.c [PATCH] sanitize __user_walk_fd() et.al. 2008-07-26 20:53:34 -04:00
inotify.c
internal.h [PATCH] move a bunch of declarations to fs/internal.h 2008-04-21 23:11:01 -04:00
ioctl.c make vfs_ioctl() static 2008-04-29 08:06:00 -07:00
ioprio.c
Kconfig omfs: update kbuild to include OMFS 2008-07-26 12:00:05 -07:00
Kconfig.binfmt sh: Initial ELF FDPIC support. 2008-07-28 18:10:28 +09:00
libfs.c VFS: increase pseudo-filesystem block size to PAGE_SIZE 2008-07-30 09:41:44 -07:00
locks.c SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
Makefile omfs: update kbuild to include OMFS 2008-07-26 12:00:05 -07:00
mbcache.c vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs 2008-04-15 19:35:41 -07:00
mpage.c vfs: add hooks for ext4's delayed allocation support 2008-07-11 19:27:31 -04:00
namei.c [patch 3/4] vfs: remove unused nameidata argument of may_create() 2008-08-01 11:25:30 -04:00
namespace.c [PATCH] pass struct path * to do_add_mount() 2008-08-01 11:25:32 -04:00
nfsctl.c
no-block.c
open.c [PATCH] merge locate_fd() and get_unused_fd() 2008-08-01 11:25:23 -04:00
pipe.c [PATCH] reuse xxx_fifo_fops for xxx_pipe_fops 2008-07-26 20:53:06 -04:00
pnode.c [patch 7/7] vfs: mountinfo: show dominating group id 2008-04-23 00:05:09 -04:00
pnode.h [patch 7/7] vfs: mountinfo: show dominating group id 2008-04-23 00:05:09 -04:00
posix_acl.c
quota_v1.c quota: move function-macros from quota.h to quotaops.h 2008-07-25 10:53:35 -07:00
quota_v2.c quota: move function-macros from quota.h to quotaops.h 2008-07-25 10:53:35 -07:00
quota.c quota: cleanup loop in sync_dquots() 2008-07-25 10:53:35 -07:00
read_write.c Remove BKL from remote_llseek v2 2008-07-02 15:06:27 -06:00
read_write.h
readdir.c
select.c Fix performance regression on lmbench select benchmark 2008-06-22 12:23:15 -07:00
seq_file.c seq_file: add seq_cpumask(), seq_nodemask() 2008-08-12 16:07:30 -07:00
signalfd.c flag parameters: check magic constants 2008-07-24 10:47:29 -07:00
splice.c mm: rename page trylock 2008-08-04 21:31:34 -07:00
stack.c
stat.c [PATCH] sanitize __user_walk_fd() et.al. 2008-07-26 20:53:34 -04:00
super.c fix soft lock up at NFS mount via per-SB LRU-list of unused dentries 2008-07-24 10:47:15 -07:00
sync.c SYNC_FILE_RANGE_WRITE may and will block. Document that. 2008-07-24 10:47:17 -07:00
timerfd.c flag parameters: check magic constants 2008-07-24 10:47:29 -07:00
utimes.c [PATCH] sanitize __user_walk_fd() et.al. 2008-07-26 20:53:34 -04:00
xattr_acl.c
xattr.c [PATCH] sanitize __user_walk_fd() et.al. 2008-07-26 20:53:34 -04:00