android_kernel_samsung_univ.../fs
David Howells c4d6d8dbf3 CacheFiles: Fix the marking of cached pages
Under some circumstances CacheFiles defers the marking of pages with PG_fscache
so that it can take advantage of pagevecs to reduce the number of calls to
fscache_mark_pages_cached() and the netfs's hook to keep track of this.

There are, however, two problems with this:

 (1) It can lead to the PG_fscache mark being applied _after_ the page is set
     PG_uptodate and unlocked (by the call to fscache_end_io()).

 (2) CacheFiles's ref on the page is dropped immediately following
     fscache_end_io() - and so may not still be held when the mark is applied.
     This can lead to the page being passed back to the allocator before the
     mark is applied.

Fix this by, where appropriate, marking the page before calling
fscache_end_io() and releasing the page.  This means that we can't take
advantage of pagevecs and have to make a separate call for each page to the
marking routines.

The symptoms of this are Bad Page state errors cropping up under memory
pressure, for example:

BUG: Bad page state in process tar  pfn:002da
page:ffffea0000009fb0 count:0 mapcount:0 mapping:          (null) index:0x1447
page flags: 0x1000(private_2)
Pid: 4574, comm: tar Tainted: G        W   3.1.0-rc4-fsdevel+ #1064
Call Trace:
 [<ffffffff8109583c>] ? dump_page+0xb9/0xbe
 [<ffffffff81095916>] bad_page+0xd5/0xea
 [<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a
 [<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662
 [<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267
 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
 [<ffffffff81098d7b>] ra_submit+0x1c/0x20
 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
 [<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a
 [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
 [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
 [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
 [<ffffffff810c22c4>] do_sync_read+0xba/0xfa
 [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a
 [<ffffffff810c2a79>] sys_read+0x45/0x6c
 [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.

Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was
set appropriately showed that sometimes it wasn't.  This led to the discovery
that sometimes the page has apparently been reclaimed by the time the marker
got to see it.

Reported-by: M. Stevens <m@tippett.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-12-20 21:54:30 +00:00
..
9p
adfs
affs
afs
autofs4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-17 15:44:47 -08:00
befs
bfs
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2012-12-18 09:42:05 -08:00
cachefiles CacheFiles: Fix the marking of cached pages 2012-12-20 21:54:30 +00:00
ceph ceph: fix dentry reference leak in ceph_encode_fh() 2012-12-18 15:02:11 -08:00
cifs lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
coda
configfs lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
cramfs
debugfs fs/debugsfs: remove unnecessary inode->i_private initialization 2012-11-15 17:46:42 -08:00
devpts TTY: devpts, document devpts inode operations 2012-10-22 16:50:13 -07:00
dlm dlm: fix lvb invalidation conditions 2012-11-16 11:20:42 -06:00
ecryptfs
efs
exofs exofs: don't leak io_state and pages on read error 2012-12-14 12:17:32 +02:00
exportfs fs, exportfs: add exportfs_encode_inode_fh() helper 2012-12-17 17:15:27 -08:00
ext2
ext3 lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
ext4 lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
fat fs/fat: strip "cp" prefix from codepage in display 2012-12-17 17:15:22 -08:00
freevxfs
fscache CacheFiles: Fix the marking of cached pages 2012-12-20 21:54:30 +00:00
fuse Merge branch 'akpm' (Andrew's patch-bomb) 2012-12-17 20:58:12 -08:00
gfs2 lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
hfs
hfsplus
hostfs
hpfs
hppfs pidns: Use task_active_pid_ns where appropriate 2012-11-19 05:59:09 -08:00
hugetlbfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-12-13 12:00:02 -08:00
isofs
jbd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-12-13 12:00:02 -08:00
jbd2 There are two major features for this merge window. The first is 2012-12-16 17:33:01 -08:00
jffs2 jffs2: hold erase_completion_lock on exit 2012-11-18 11:59:01 +02:00
jfs
lockd lockd: Remove BUG_ON()s from fs/lockd/clntproc.c 2012-11-04 14:43:40 -05:00
logfs Fix misspellings of "whether" in comments. 2012-11-19 14:31:35 +01:00
minix
ncpfs propagate name change to comments in kernel source 2012-12-06 10:39:54 +01:00
nfs NFS client updates for Linux 3.8 2012-12-18 09:36:34 -08:00
nfs_common
nfsd
nilfs2 mm: redefine address_space.assoc_mapping 2012-12-11 17:22:26 -08:00
nls
notify fs, fanotify: add @mflags field to fanotify output 2012-12-17 17:15:28 -08:00
ntfs
ocfs2 lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
omfs
openpromfs
proc Merge branch 'akpm' (Andrew's patch-bomb) 2012-12-17 20:58:12 -08:00
pstore lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
qnx4
qnx6
quota quota: Use the pre-processor to compile out quotactl_cmd_write when !CONFIG_BLOCK 2012-12-13 16:33:24 +01:00
ramfs
reiserfs reiserfs: Move quota calls out of write lock 2012-11-19 21:34:33 +01:00
romfs
squashfs
sysfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-17 15:44:47 -08:00
sysv
ubifs ubifs: use prandom_bytes 2012-12-17 17:15:26 -08:00
udf udf: remove un-needed variable from inode_getblk 2012-12-13 16:33:23 +01:00
ufs
xfs xfs: fix sparse reported log CRC endian issue 2012-12-03 12:10:59 -06:00
aio.c
anon_inodes.c
attr.c userns: Allow chown and setgid preservation 2012-11-20 04:17:24 -08:00
bad_inode.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
binfmt_aout.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
binfmt_elf_fdpic.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
binfmt_elf.c binfmt_elf: fix corner case kfree of uninitialized data 2012-12-17 17:15:19 -08:00
binfmt_em86.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_flat.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
binfmt_misc.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_script.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_som.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
bio-integrity.c
bio.c vfs: fix: don't increase bio_slab_max if krealloc() fails 2012-10-22 22:00:26 +02:00
block_dev.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
buffer.c fs/buffer.c: remove redundant initialization in alloc_page_buffers() 2012-12-12 17:38:35 -08:00
char_dev.c char_dev: pin parent kobject 2012-10-22 08:50:37 +03:00
compat_binfmt_elf.c
compat_ioctl.c Merge 3.7-rc3 into tty-next 2012-10-29 09:00:57 -07:00
compat.c
coredump.c do_coredump(): get rid of pt_regs argument 2012-11-29 00:01:25 -05:00
coredump.h
dcache.c
dcookies.c
direct-io.c direct-io: don't read inode->i_blkbits multiple times 2012-11-29 12:38:44 -08:00
drop_caches.c
eventfd.c fs, eventfd: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
eventpoll.c fs, epoll: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
exec.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-12-17 20:58:12 -08:00
fcntl.c
fhandle.c Fix misspellings of "whether" in comments. 2012-11-19 14:31:35 +01:00
fifo.c
file_table.c
file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-12-12 12:22:13 -08:00
filesystems.c
fs_struct.c kill daemonize() 2012-11-28 21:49:02 -05:00
fs-writeback.c writeback: fix a typo in comment 2012-12-12 17:38:34 -08:00
generic_acl.c
inode.c mm: redefine address_space.assoc_mapping 2012-12-11 17:22:26 -08:00
internal.h writeback: put unused inodes to LRU after writeback completion 2012-11-26 17:41:24 -08:00
ioctl.c
ioprio.c
Kconfig ext4: Remove CONFIG_EXT4_FS_XATTR 2012-12-10 16:30:43 -05:00
Kconfig.binfmt
libfs.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
locks.c
Makefile
mbcache.c
mount.h proc: Usable inode numbers for the namespace file descriptors. 2012-11-20 04:19:49 -08:00
mpage.c
namei.c lookup_one_len: don't accept . and .. 2012-11-29 22:17:21 -05:00
namespace.c userns: Require CAP_SYS_ADMIN for most uses of setns. 2012-12-14 16:12:03 -08:00
no-block.c
open.c vfs: Allow chroot if you have CAP_SYS_CHROOT in your user namespace 2012-11-19 05:59:17 -08:00
pipe.c
pnode.c
pnode.h vfs: Only support slave subtrees across different user namespaces 2012-11-19 05:59:20 -08:00
posix_acl.c
proc_namespace.c
read_write.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
read_write.h
readdir.c
select.c
seq_file.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
signalfd.c fs, epoll: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
splice.c writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr() 2012-12-11 17:22:21 -08:00
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c
utimes.c
xattr_acl.c
xattr.c