linux/mm
Minchan Kim f05714293a mm: support anonymous stable page
During developemnt for zram-swap asynchronous writeback, I found strange
corruption of compressed page, resulting in:

  Modules linked in: zram(E)
  CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G            E   4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  task: ffff88007620b840 task.stack: ffff880078090000
  RIP: set_freeobj.part.43+0x1c/0x1f
  RSP: 0018:ffff880078093ca8  EFLAGS: 00010246
  RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8
  RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246
  RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88
  R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0
  Call Trace:
    obj_malloc+0x22b/0x260
    zs_malloc+0x1e4/0x580
    zram_bvec_rw+0x4cd/0x830 [zram]
    page_requests_rw+0x9c/0x130 [zram]
    zram_thread+0xe6/0x173 [zram]
    kthread+0xca/0xe0
    ret_from_fork+0x25/0x30

With investigation, it reveals currently stable page doesn't support
anonymous page.  IOW, reuse_swap_page can reuse the page without waiting
writeback completion so it can overwrite page zram is compressing.

Unfortunately, zram has used per-cpu stream feature from v4.7.
It aims for increasing cache hit ratio of scratch buffer for
compressing. Downside of that approach is that zram should ask
memory space for compressed page in per-cpu context which requires
stricted gfp flag which could be failed. If so, it retries to
allocate memory space out of per-cpu context so it could get memory
this time and compress the data again, copies it to the memory space.

In this scenario, zram assumes the data should never be changed
but it is not true unless stable page supports. So, If the data is
changed under us, zram can make buffer overrun because second
compression size could be bigger than one we got in previous trial
and blindly, copy bigger size object to smaller buffer which is
buffer overrun. The overrun breaks zsmalloc free object chaining
so system goes crash like above.

I think below is same problem.
https://bugzilla.suse.com/show_bug.cgi?id=997574

Unfortunately, reuse_swap_page should be atomic so that we cannot wait on
writeback in there so the approach in this patch is simply return false if
we found it needs stable page.  Although it increases memory footprint
temporarily, it happens rarely and it should be reclaimed easily althoug
it happened.  Also, It would be better than waiting of IO completion,
which is critial path for application latency.

Fixes: da9556a236 ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox
Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
..
kasan Power management material for v4.10-rc1 2016-12-13 10:41:53 -08:00
backing-dev.c writeback: track if we're sleeping on progress in balance_dirty_pages() 2016-11-08 08:28:55 -07:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma_debug.c
cma.c mm/cma.c: check the max limit for cma allocation 2016-11-11 08:12:37 -08:00
cma.h
compaction.c mm, compaction: allow compaction for GFP_NOFS requests 2016-12-14 16:04:07 -08:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
debug.c mm, debug: print raw struct page data in __dump_page() 2016-12-12 18:55:08 -08:00
dmapool.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
early_ioremap.c
fadvise.c mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED 2016-12-20 09:48:46 -08:00
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c dax: fix deadlock with DAX 4k holes 2017-01-10 18:31:54 -08:00
frame_vector.c mm: replace get_vaddr_frames() write/force parameters with gup_flags 2016-10-19 08:11:24 -07:00
frontswap.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
gup.c mm: unexport __get_user_pages_unlocked() 2016-12-14 16:04:09 -08:00
highmem.c mm/highmem: make nr_free_highpages() handles all highmem zones by itself 2016-05-19 19:12:14 -07:00
huge_memory.c mm: pmd dirty emulation in page fault handler 2017-01-10 18:31:55 -08:00
hugetlb_cgroup.c mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size 2016-05-20 17:58:30 -07:00
hugetlb.c mm: add tlb_remove_check_page_size_change to track page size change 2016-12-12 18:55:07 -08:00
hwpoison-inject.c
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2016-11-22 11:49:48 -06:00
internal.h mm: add PageWaiters indicating tasks are waiting for a page bit 2016-12-25 11:54:48 -08:00
interval_tree.c
Kconfig mm: THP page cache support for ppc64 2016-12-12 18:55:08 -08:00
Kconfig.debug PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO 2016-09-13 02:35:27 +02:00
khugepaged.c mm: get rid of __GFP_OTHER_NODE 2017-01-10 18:31:55 -08:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c kmemleak: fix reference to Documentation 2016-12-12 18:55:07 -08:00
ksm.c mm,ksm: add __GFP_HIGH to the allocation in alloc_stable_node() 2016-10-07 18:46:29 -07:00
list_lru.c mm/list_lru.c: avoid error-path NULL pointer deref 2016-10-27 18:43:42 -07:00
maccess.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
madvise.c mm: add tlb_remove_check_page_size_change to track page size change 2016-12-12 18:55:07 -08:00
Makefile Disable the __builtin_return_address() warning globally after all 2016-10-12 10:23:41 -07:00
memblock.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
memcontrol.c mm, memcg: fix the active list aging for lowmem requests when memcg is enabled 2017-01-10 18:31:55 -08:00
memory_hotplug.c mm: remove x86-only restriction of movable_node 2016-12-12 18:55:07 -08:00
memory-failure.c mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked 2016-12-25 11:54:48 -08:00
memory.c dax: wrprotect pmd_t in dax_mapping_entry_mkclean 2017-01-10 18:31:54 -08:00
mempolicy.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mempool.c Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" 2016-07-28 16:07:41 -07:00
memtest.c
migrate.c mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked 2016-12-25 11:54:48 -08:00
mincore.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mlock.c thp: fix corner case of munlock() of PTE-mapped THPs 2016-11-30 16:32:52 -08:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mmu_context.c mm/mmu_context, sched/core: Fix mmu_context.h assumption 2016-04-28 11:44:19 +02:00
mmu_notifier.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
mmzone.c mm, page_alloc: inline the fast path of the zonelist iterator 2016-05-19 19:12:14 -07:00
mprotect.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
mremap.c mremap: move_ptes: check pte dirty after its removal 2016-11-29 08:20:24 -08:00
msync.c
nobootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
nommu.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
oom_kill.c oom: print nodemask in the oom report 2016-10-07 18:46:29 -07:00
page_alloc.c mm: rename __page_frag functions to __page_frag_cache, drop order from drain 2017-01-10 18:31:55 -08:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm/page_ext: support extra space allocation by page_ext user 2016-10-07 18:46:27 -07:00
page_idle.c mm, vmscan: move lru_lock to the node 2016-07-28 16:07:41 -07:00
page_io.c writeback: add wbc_to_write_flags() 2016-11-02 10:24:03 -06:00
page_isolation.c mm/page_isolation: fix typo: "paes" -> "pages" 2016-10-07 18:46:29 -07:00
page_owner.c mm/page_owner: don't define fields on struct page_ext by hard-coding 2016-10-07 18:46:27 -07:00
page_poison.c mm: check the return value of lookup_page_ext for all call sites 2016-06-03 15:06:22 -07:00
page-writeback.c radix-tree: delete radix_tree_range_tag_if_tagged() 2016-12-14 16:04:10 -08:00
pagewalk.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c
percpu.c Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2016-12-13 12:34:47 -08:00
pgtable-generic.c mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range 2016-03-17 15:09:34 -07:00
process_vm_access.c mm: unexport __get_user_pages_unlocked() 2016-12-14 16:04:09 -08:00
quicklist.c fix Christoph's email addresses 2016-03-17 15:09:34 -07:00
readahead.c mm: don't cap request size based on read-ahead setting 2016-12-12 18:55:08 -08:00
rmap.c mm, rmap: handle anon_vma_prepare() common case inline 2016-12-12 18:55:08 -08:00
shmem.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
slab_common.c mm/slab_common.c: check kmem_create_cache flags are common 2016-12-12 18:55:06 -08:00
slab.c Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-12-13 12:59:57 -08:00
slab.h mm, slab: maintain total slab count instead of active count 2016-12-12 18:55:07 -08:00
slob.c slub: move synchronize_sched out of slab_mutex on shrink 2016-12-12 18:55:06 -08:00
slub.c slub: avoid false-postive warning 2016-12-12 18:55:06 -08:00
sparse-vmemmap.c treewide: replace obsolete _refok by __ref 2016-08-02 17:31:41 -04:00
sparse.c treewide: replace obsolete _refok by __ref 2016-08-02 17:31:41 -04:00
swap_cgroup.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap_state.c mm, swap: use offset of swap entry as key of swap cache 2016-10-07 18:46:28 -07:00
swap.c mm: add PageWaiters indicating tasks are waiting for a page bit 2016-12-25 11:54:48 -08:00
swapfile.c mm: support anonymous stable page 2017-01-10 18:31:55 -08:00
truncate.c mm: Invalidate DAX radix tree entries only if appropriate 2016-12-26 20:29:24 -08:00
usercopy.c mm: usercopy: Check for module addresses 2016-09-20 16:07:39 -07:00
userfaultfd.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
util.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
vmacache.c mm: unrig VMA cache hit ratio 2016-10-07 18:46:27 -07:00
vmalloc.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
vmpressure.c mm/vmpressure.c: fix subtree pressure detection 2016-02-03 08:28:43 -08:00
vmscan.c mm, memcg: fix the active list aging for lowmem requests when memcg is enabled 2017-01-10 18:31:55 -08:00
vmstat.c mm/vmstat: Convert to hotplug state machine 2016-12-02 00:52:35 +01:00
workingset.c mm: workingset: fix use-after-free in shadow node shrinker 2017-01-07 18:22:40 -08:00
z3fold.c mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup 2016-06-03 16:02:55 -07:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c mm/zsmalloc: Convert to hotplug state machine 2016-12-02 00:52:36 +01:00
zswap.c mm/zswap: Convert pool to hotplug state machine 2016-12-02 00:52:36 +01:00