linux/fs
David Howells c4d6d8dbf3 CacheFiles: Fix the marking of cached pages
Under some circumstances CacheFiles defers the marking of pages with PG_fscache
so that it can take advantage of pagevecs to reduce the number of calls to
fscache_mark_pages_cached() and the netfs's hook to keep track of this.

There are, however, two problems with this:

 (1) It can lead to the PG_fscache mark being applied _after_ the page is set
     PG_uptodate and unlocked (by the call to fscache_end_io()).

 (2) CacheFiles's ref on the page is dropped immediately following
     fscache_end_io() - and so may not still be held when the mark is applied.
     This can lead to the page being passed back to the allocator before the
     mark is applied.

Fix this by, where appropriate, marking the page before calling
fscache_end_io() and releasing the page.  This means that we can't take
advantage of pagevecs and have to make a separate call for each page to the
marking routines.

The symptoms of this are Bad Page state errors cropping up under memory
pressure, for example:

BUG: Bad page state in process tar  pfn:002da
page:ffffea0000009fb0 count:0 mapcount:0 mapping:          (null) index:0x1447
page flags: 0x1000(private_2)
Pid: 4574, comm: tar Tainted: G        W   3.1.0-rc4-fsdevel+ #1064
Call Trace:
 [<ffffffff8109583c>] ? dump_page+0xb9/0xbe
 [<ffffffff81095916>] bad_page+0xd5/0xea
 [<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a
 [<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662
 [<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267
 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
 [<ffffffff81098d7b>] ra_submit+0x1c/0x20
 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
 [<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a
 [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
 [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
 [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
 [<ffffffff810c22c4>] do_sync_read+0xba/0xfa
 [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a
 [<ffffffff810c2a79>] sys_read+0x45/0x6c
 [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.

Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was
set appropriately showed that sometimes it wasn't.  This led to the discovery
that sometimes the page has apparently been reclaimed by the time the marker
got to see it.

Reported-by: M. Stevens <m@tippett.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-12-20 21:54:30 +00:00
..
9p The following changes since commit 4cbe5a555f: 2012-10-12 09:59:23 +09:00
adfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
afs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
autofs4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-17 15:44:47 -08:00
befs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
bfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2012-12-18 09:42:05 -08:00
cachefiles CacheFiles: Fix the marking of cached pages 2012-12-20 21:54:30 +00:00
ceph ceph: fix dentry reference leak in ceph_encode_fh() 2012-12-18 15:02:11 -08:00
cifs lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
coda fs: push rcu_barrier() from deactivate_locked_super() to filesystems 2012-10-02 21:35:55 -04:00
configfs lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
cramfs userns: Convert cramfs to use kuid/kgid where appropriate 2012-09-21 03:13:08 -07:00
debugfs fs/debugsfs: remove unnecessary inode->i_private initialization 2012-11-15 17:46:42 -08:00
devpts TTY: devpts, document devpts inode operations 2012-10-22 16:50:13 -07:00
dlm dlm: fix lvb invalidation conditions 2012-11-16 11:20:42 -06:00
ecryptfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
efs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
exofs exofs: don't leak io_state and pages on read error 2012-12-14 12:17:32 +02:00
exportfs fs, exportfs: add exportfs_encode_inode_fh() helper 2012-12-17 17:15:27 -08:00
ext2 ext2: fix return values on parse_options() failure 2012-10-09 23:23:53 +02:00
ext3 lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
ext4 lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
fat fs/fat: strip "cp" prefix from codepage in display 2012-12-17 17:15:22 -08:00
freevxfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
fscache CacheFiles: Fix the marking of cached pages 2012-12-20 21:54:30 +00:00
fuse Merge branch 'akpm' (Andrew's patch-bomb) 2012-12-17 20:58:12 -08:00
gfs2 lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
hfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
hostfs Merge branch 'for-linus-37rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2012-10-10 11:15:20 +09:00
hpfs hpfs: drop lock/unlock super 2012-10-09 23:33:38 -04:00
hppfs pidns: Use task_active_pid_ns where appropriate 2012-11-19 05:59:09 -08:00
hugetlbfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-12-13 12:00:02 -08:00
isofs tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking 2012-10-09 23:33:55 -04:00
jbd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-12-13 12:00:02 -08:00
jbd2 There are two major features for this merge window. The first is 2012-12-16 17:33:01 -08:00
jffs2 jffs2: hold erase_completion_lock on exit 2012-11-18 11:59:01 +02:00
jfs jfs: Fix FITRIM argument handling 2012-10-17 09:18:38 -05:00
lockd lockd: Remove BUG_ON()s from fs/lockd/clntproc.c 2012-11-04 14:43:40 -05:00
logfs Fix misspellings of "whether" in comments. 2012-11-19 14:31:35 +01:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
ncpfs propagate name change to comments in kernel source 2012-12-06 10:39:54 +01:00
nfs NFS client updates for Linux 3.8 2012-12-18 09:36:34 -08:00
nfs_common
nfsd UAPI Disintegration 2012-10-09 2012-10-09 18:35:22 -04:00
nilfs2 mm: redefine address_space.assoc_mapping 2012-12-11 17:22:26 -08:00
nls nls: fix (and rename) mac NLS table files and config options 2012-06-01 19:51:22 -07:00
notify fs, fanotify: add @mflags field to fanotify output 2012-12-17 17:15:28 -08:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
ocfs2 lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
omfs omfs: convert to use beXX_add_cpu() 2012-10-06 03:05:31 +09:00
openpromfs fs: push rcu_barrier() from deactivate_locked_super() to filesystems 2012-10-02 21:35:55 -04:00
proc Merge branch 'akpm' (Andrew's patch-bomb) 2012-12-17 20:58:12 -08:00
pstore lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
qnx4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
qnx6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
quota quota: Use the pre-processor to compile out quotactl_cmd_write when !CONFIG_BLOCK 2012-12-13 16:33:24 +01:00
ramfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
reiserfs reiserfs: Move quota calls out of write lock 2012-11-19 21:34:33 +01:00
romfs fs: push rcu_barrier() from deactivate_locked_super() to filesystems 2012-10-02 21:35:55 -04:00
squashfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
sysfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-17 15:44:47 -08:00
sysv sysv: drop lock/unlock super 2012-10-09 23:33:39 -04:00
ubifs ubifs: use prandom_bytes 2012-12-17 17:15:26 -08:00
udf udf: remove un-needed variable from inode_getblk 2012-12-13 16:33:23 +01:00
ufs ufs: drop lock/unlock super 2012-10-09 23:33:39 -04:00
xfs xfs: fix sparse reported log CRC endian issue 2012-12-03 12:10:59 -06:00
aio.c aio: now fput() is OK from interrupt context; get rid of manual delayed __fput() 2012-07-22 23:57:59 +04:00
anon_inodes.c
attr.c userns: Allow chown and setgid preservation 2012-11-20 04:17:24 -08:00
bad_inode.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
binfmt_aout.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
binfmt_elf_fdpic.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
binfmt_elf.c binfmt_elf: fix corner case kfree of uninitialized data 2012-12-17 17:15:19 -08:00
binfmt_em86.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_flat.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
binfmt_misc.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_script.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_som.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
bio-integrity.c block: Ues bi_pool for bio_integrity_alloc() 2012-09-09 10:35:38 +02:00
bio.c vfs: fix: don't increase bio_slab_max if krealloc() fails 2012-10-22 22:00:26 +02:00
block_dev.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
buffer.c fs/buffer.c: remove redundant initialization in alloc_page_buffers() 2012-12-12 17:38:35 -08:00
char_dev.c char_dev: pin parent kobject 2012-10-22 08:50:37 +03:00
compat_binfmt_elf.c coredump: extend core dump note section to contain file names of mapped files 2012-10-06 03:05:17 +09:00
compat_ioctl.c Merge 3.7-rc3 into tty-next 2012-10-29 09:00:57 -07:00
compat.c vfs: define struct filename and have getname() return it 2012-10-12 20:14:55 -04:00
coredump.c do_coredump(): get rid of pt_regs argument 2012-11-29 00:01:25 -05:00
coredump.h coredump: update coredump-related headers 2012-10-06 03:05:15 +09:00
dcache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
dcookies.c
direct-io.c direct-io: don't read inode->i_blkbits multiple times 2012-11-29 12:38:44 -08:00
drop_caches.c
eventfd.c fs, eventfd: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
eventpoll.c fs, epoll: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
exec.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-12-17 20:58:12 -08:00
fcntl.c Fix F_DUPFD_CLOEXEC breakage 2012-10-09 15:52:31 +09:00
fhandle.c Fix misspellings of "whether" in comments. 2012-11-19 14:31:35 +01:00
fifo.c fifo: Do not restart open() if it already found a partner 2012-07-16 08:33:14 -07:00
file_table.c lglock: add DEFINE_STATIC_LGLOCK() 2012-10-10 01:15:44 -04:00
file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-12-12 12:22:13 -08:00
filesystems.c vfs: define struct filename and have getname() return it 2012-10-12 20:14:55 -04:00
fs_struct.c kill daemonize() 2012-11-28 21:49:02 -05:00
fs-writeback.c writeback: fix a typo in comment 2012-12-12 17:38:34 -08:00
generic_acl.c userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr 2012-09-18 01:01:35 -07:00
inode.c mm: redefine address_space.assoc_mapping 2012-12-11 17:22:26 -08:00
internal.h writeback: put unused inodes to LRU after writeback completion 2012-11-26 17:41:24 -08:00
ioctl.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
ioprio.c
Kconfig ext4: Remove CONFIG_EXT4_FS_XATTR 2012-12-10 16:30:43 -05:00
Kconfig.binfmt coredump: make core dump functionality optional 2012-10-06 03:05:15 +09:00
libfs.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
locks.c UAPI Disintegration 2012-10-09 2012-10-09 18:35:22 -04:00
Makefile coredump: make core dump functionality optional 2012-10-06 03:05:15 +09:00
mbcache.c
mount.h proc: Usable inode numbers for the namespace file descriptors. 2012-11-20 04:19:49 -08:00
mpage.c
namei.c lookup_one_len: don't accept . and .. 2012-11-29 22:17:21 -05:00
namespace.c userns: Require CAP_SYS_ADMIN for most uses of setns. 2012-12-14 16:12:03 -08:00
no-block.c
open.c vfs: Allow chroot if you have CAP_SYS_CHROOT in your user namespace 2012-11-19 05:59:17 -08:00
pipe.c pipe(2) - race-free error recovery 2012-09-26 21:08:52 -04:00
pnode.c VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors 2012-07-14 16:37:27 +04:00
pnode.h vfs: Only support slave subtrees across different user namespaces 2012-11-19 05:59:20 -08:00
posix_acl.c userns: Convert vfs posix_acl support to use kuids and kgids 2012-09-18 01:01:35 -07:00
proc_namespace.c get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
read_write.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
read_write.h compat: fs: Generic compat_sys_sendfile implementation 2012-10-02 21:35:55 -04:00
readdir.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
select.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
seq_file.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
signalfd.c fs, epoll: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
splice.c writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr() 2012-12-11 17:22:21 -08:00
stack.c
stat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
statfs.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
super.c vfs: drop lock/unlock super 2012-10-09 23:33:39 -04:00
sync.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
timerfd.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
utimes.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
xattr_acl.c userns: Fix posix_acl_file_xattr_userns gid conversion 2012-10-12 13:16:48 -07:00
xattr.c fs, xattr: fix bug when removing a name not in xattr list 2012-10-18 12:35:58 -07:00