mirror of
https://github.com/torvalds/linux.git
synced 2024-11-24 05:02:12 +00:00
7cb74be6fd
Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().
This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state". Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:
$ df -i / /home /mnt
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 551665 2725135 17% /
/dev/mapper/fedora-home 52879360 716221 52163139 2% /home
/dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt
After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.
There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years. [1]
The unpatched code [2] has always been wrong since it entered the kernel
tree. The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.
The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents. When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging. There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.
References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")
Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761
[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:36 2004 -0800
[PATCH] HFS rewrite
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd
commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:48 2004 -0800
[PATCH] HFS+ support
[3]:
http://sourceforge.net/projects/linux-hfsplus/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/
http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5
Date: Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.
[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d
commit
|
||
---|---|---|
.. | ||
9p | ||
adfs | ||
affs | ||
afs | ||
autofs4 | ||
befs | ||
bfs | ||
btrfs | ||
cachefiles | ||
ceph | ||
cifs | ||
coda | ||
configfs | ||
cramfs | ||
debugfs | ||
devpts | ||
dlm | ||
ecryptfs | ||
efivarfs | ||
efs | ||
exofs | ||
exportfs | ||
ext2 | ||
ext4 | ||
f2fs | ||
fat | ||
freevxfs | ||
fscache | ||
fuse | ||
gfs2 | ||
hfs | ||
hfsplus | ||
hostfs | ||
hpfs | ||
hugetlbfs | ||
isofs | ||
jbd2 | ||
jffs2 | ||
jfs | ||
kernfs | ||
lockd | ||
logfs | ||
minix | ||
ncpfs | ||
nfs | ||
nfs_common | ||
nfsd | ||
nilfs2 | ||
nls | ||
notify | ||
ntfs | ||
ocfs2 | ||
omfs | ||
openpromfs | ||
overlayfs | ||
proc | ||
pstore | ||
qnx4 | ||
qnx6 | ||
quota | ||
ramfs | ||
reiserfs | ||
romfs | ||
squashfs | ||
sysfs | ||
sysv | ||
tracefs | ||
ubifs | ||
udf | ||
ufs | ||
xfs | ||
aio.c | ||
anon_inodes.c | ||
attr.c | ||
bad_inode.c | ||
binfmt_aout.c | ||
binfmt_elf_fdpic.c | ||
binfmt_elf.c | ||
binfmt_em86.c | ||
binfmt_flat.c | ||
binfmt_misc.c | ||
binfmt_script.c | ||
block_dev.c | ||
buffer.c | ||
char_dev.c | ||
compat_binfmt_elf.c | ||
compat_ioctl.c | ||
compat.c | ||
coredump.c | ||
dax.c | ||
dcache.c | ||
dcookies.c | ||
direct-io.c | ||
drop_caches.c | ||
eventfd.c | ||
eventpoll.c | ||
exec.c | ||
fcntl.c | ||
fhandle.c | ||
file_table.c | ||
file.c | ||
filesystems.c | ||
fs_pin.c | ||
fs_struct.c | ||
fs-writeback.c | ||
inode.c | ||
internal.h | ||
ioctl.c | ||
Kconfig | ||
Kconfig.binfmt | ||
libfs.c | ||
locks.c | ||
Makefile | ||
mbcache.c | ||
mount.h | ||
mpage.c | ||
namei.c | ||
namespace.c | ||
no-block.c | ||
nsfs.c | ||
open.c | ||
pipe.c | ||
pnode.c | ||
pnode.h | ||
posix_acl.c | ||
proc_namespace.c | ||
read_write.c | ||
readdir.c | ||
select.c | ||
seq_file.c | ||
signalfd.c | ||
splice.c | ||
stack.c | ||
stat.c | ||
statfs.c | ||
super.c | ||
sync.c | ||
timerfd.c | ||
userfaultfd.c | ||
utimes.c | ||
xattr.c |