linux/mm
Michal Hocko 063d99b4fa mm, fs: obey gfp_mapping for add_to_page_cache()
Commit 6afdb859b7 ("mm: do not ignore mapping_gfp_mask in page cache
allocation paths") has caught some users of hardcoded GFP_KERNEL used in
the page cache allocation paths.  This, however, wasn't complete and
there were others which went unnoticed.

Dave Chinner has reported the following deadlock for xfs on loop device:
: With the recent merge of the loop device changes, I'm now seeing
: XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
:
: The deadlocked is as follows:
:
: kloopd1: loop_queue_read_work
:       xfs_file_iter_read
:       lock XFS inode XFS_IOLOCK_SHARED (on image file)
:       page cache read (GFP_KERNEL)
:       radix tree alloc
:       memory reclaim
:       reclaim XFS inodes
:       log force to unpin inodes
:       <wait for log IO completion>
:
: xfs-cil/loop1: <does log force IO work>
:       xlog_cil_push
:       xlog_write
:       <loop issuing log writes>
:               xlog_state_get_iclog_space()
:               <blocks due to all log buffers under write io>
:               <waits for IO completion>
:
: kloopd1: loop_queue_write_work
:       xfs_file_write_iter
:       lock XFS inode XFS_IOLOCK_EXCL (on image file)
:       <wait for inode to be unlocked>
:
: i.e. the kloopd, with it's split read and write work queues, has
: introduced a dependency through memory reclaim. i.e. that writes
: need to be able to progress for reads make progress.
:
: The problem, fundamentally, is that mpage_readpages() does a
: GFP_KERNEL allocation, rather than paying attention to the inode's
: mapping gfp mask, which is set to GFP_NOFS.
:
: The didn't used to happen, because the loop device used to issue
: reads through the splice path and that does:
:
:       error = add_to_page_cache_lru(page, mapping, index,
:                       GFP_KERNEL & mapping_gfp_mask(mapping));

This has changed by commit aa4d86163e ("block: loop: switch to VFS
ITER_BVEC").

This patch changes mpage_readpage{s} to follow gfp mask set for the
mapping.  There are, however, other places which are doing basically the
same.

lustre:ll_dir_filler is doing GFP_KERNEL from the function which
apparently uses GFP_NOFS for other allocations so let's make this
consistent.

cifs:readpages_get_pages is called from cifs_readpages and
__cifs_readpages_from_fscache called from the same path obeys mapping
gfp.

ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well
regardless it uses mapping_gfp_mask for the page allocation.

ext4_mpage_readpages is the called from the page cache allocation path
same as read_pages and read_cache_pages

As I've noticed in my previous post I cannot say I would be happy about
sprinkling mapping_gfp_mask all over the place and it sounds like we
should drop gfp_mask argument altogether and use it internally in
__add_to_page_cache_locked that would require all the filesystems to use
mapping gfp consistently which I am not sure is the case here.  From a
quick glance it seems that some file system use it all the time while
others are selective.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-16 11:42:28 -07:00
..
kasan kasan: fix last shadow judgement in memory_is_poisoned_16() 2015-09-17 21:16:07 -07:00
backing-dev.c Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block 2015-09-10 18:56:14 -07:00
balloon_compaction.c mm/balloon_compaction: fix deflation when compaction is disabled 2014-10-29 16:33:15 -07:00
bootmem.c bootmem: avoid freeing to bootmem after bootmem is done 2015-09-08 15:35:28 -07:00
cleancache.c cleancache: remove limit on the number of cleancache enabled filesystems 2015-04-14 16:49:03 -07:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
cma.c mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute 2015-06-24 17:49:44 -07:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
compaction.c mm/compaction: correct to flush migrated pages if pageblock skip happens 2015-09-08 15:35:28 -07:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
dmapool.c dmapool: fix overflow condition in pool_find_page() 2015-10-01 21:42:35 -04:00
early_ioremap.c mm/early_ioremap: add explicit #include of asm/early_ioremap.h 2015-09-11 15:21:34 -07:00
fadvise.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
failslab.c
filemap.c Revert "fs: do not prefault sys_write() user buffer pages" 2015-10-07 08:32:38 +01:00
frame_vector.c [media] mm: Provide new get_vaddr_frames() helper 2015-08-16 13:02:47 -03:00
frontswap.c frontswap: allow multiple backends 2015-06-24 17:49:45 -07:00
gup.c mm: make GUP handle pfn mapping unless FOLL_GET is requested 2015-09-04 16:54:41 -07:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
hugetlb_cgroup.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
hugetlb.c mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault 2015-10-01 21:42:35 -04:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm/compaction: correct to flush migrated pages if pageblock skip happens 2015-09-08 15:35:28 -07:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
Kconfig media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
Kconfig.debug mm/debug_pagealloc: remove obsolete Kconfig options 2015-01-08 15:10:52 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c kmemleak: use seq_hex_dump() to dump buffers 2015-09-10 13:29:01 -07:00
ksm.c mm: remove rest of ACCESS_ONCE() usages 2015-04-15 16:35:18 -07:00
list_lru.c list_lru: don't call list_lru_from_kmem if the list_head is empty 2015-09-08 15:35:28 -07:00
maccess.c lib: move strncpy_from_unsafe() into mm/maccess.c 2015-08-31 12:36:10 -07:00
madvise.c mm: madvise allow remove operation for hugetlbfs 2015-09-08 15:35:28 -07:00
Makefile media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
memblock.c mm/memblock.c: fix comment in __next_mem_range() 2015-09-08 15:35:28 -07:00
memcontrol.c memcg: remove pcp_counter_lock 2015-10-01 21:42:35 -04:00
memory_hotplug.c libnvdimm for 4.3: 2015-09-08 14:35:59 -07:00
memory-failure.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
memory.c mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() 2015-09-10 13:29:01 -07:00
mempolicy.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
mempool.c mm/mempool: allow NULL `pool' pointer in mempool_destroy() 2015-09-08 15:35:28 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c memcg: fix dirty page migration 2015-10-01 21:42:35 -04:00
mincore.c mincore: apply page table walker on do_mincore() 2015-02-11 17:06:06 -08:00
mlock.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified 2015-09-22 15:09:53 -07:00
mmu_context.c
mmu_notifier.c mmu-notifier: add clear_young callback 2015-09-10 13:29:01 -07:00
mmzone.c mm: microoptimize zonelist operations 2015-02-11 17:06:02 -08:00
mprotect.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mremap.c mremap: simplify the "overlap" check in mremap_to() 2015-09-04 16:54:41 -07:00
msync.c mm: remove rest usage of VM_NONLINEAR and pte_file() 2015-02-10 14:30:31 -08:00
nobootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
nommu.c mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() 2015-09-10 13:29:01 -07:00
oom_kill.c mm, oom: remove unnecessary variable 2015-09-08 15:35:28 -07:00
page_alloc.c Merge branch 'akpm' (patches from Andrew) 2015-09-08 17:52:23 -07:00
page_counter.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
page_ext.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_idle.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_io.c fs: use helper bio_add_page() instead of open coding on bi_io_vec 2015-08-13 12:32:00 -06:00
page_isolation.c mm, page_isolation: make set/unset_migratetype_isolate() file-local 2015-09-08 15:35:28 -07:00
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
page-writeback.c Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block 2015-09-10 18:56:14 -07:00
pagewalk.c mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers 2015-03-25 16:20:30 -07:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: clean up of schunk->map[] assignment in pcpu_setup_first_chunk 2015-07-21 11:31:00 -04:00
pgtable-generic.c mm: clarify that the function operates on hugepage pte 2015-06-24 17:49:44 -07:00
process_vm_access.c process_vm_access: switch to {compat_,}import_iovec() 2015-04-11 22:27:12 -04:00
quicklist.c
readahead.c mm, fs: obey gfp_mapping for add_to_page_cache() 2015-10-16 11:42:28 -07:00
rmap.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
shmem.c shmem: recalculate file inode when fstat 2015-09-08 15:35:28 -07:00
slab_common.c memcg: export struct mem_cgroup 2015-09-08 15:35:28 -07:00
slab.c mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1) 2015-10-01 21:42:35 -04:00
slab.h mm/slab.h: fix argument order in cache_from_obj's error message 2015-09-04 16:54:41 -07:00
slob.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
slub.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: swap: zswap: maybe_preload & refactoring 2015-09-08 15:35:28 -07:00
swap.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
swapfile.c mm: /proc/pid/smaps:: show proportional swap share of the mapping 2015-09-08 15:35:28 -07:00
truncate.c memcg: add per cgroup dirty page accounting 2015-06-02 08:33:33 -06:00
userfaultfd.c userfaultfd: avoid mmap_sem read recursion in mcopy_atomic 2015-09-04 16:54:41 -07:00
util.c mm: uninline and cleanup page-mapping related helpers 2015-04-15 16:35:19 -07:00
vmacache.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
vmalloc.c mm/vmalloc: get rid of dirty bitmap inside vmap_block structure 2015-04-15 16:35:18 -07:00
vmpressure.c mm/vmpressure.c: fix race in vmpressure_work_fn() 2014-12-02 17:32:07 -08:00
vmscan.c vmscan: fix sane_reclaim helper for legacy memcg 2015-09-22 15:09:53 -07:00
vmstat.c vmstat: Reduce time interval to stat update on idle cpu 2015-02-11 17:06:07 -08:00
workingset.c list_lru: add helpers to isolate items 2015-02-12 18:54:10 -08:00
zbud.c mm: zbud: constify the zbud_ops 2015-09-08 15:35:28 -07:00
zpool.c zpool: add zpool_has_pool() 2015-09-10 13:29:01 -07:00
zsmalloc.c mm: zpool: constify the zpool_ops 2015-09-08 15:35:28 -07:00
zswap.c zswap: change zpool/compressor at runtime 2015-09-10 13:29:01 -07:00