linux/fs/ubifs
Zhihao Cheng 3b67db8a6c ubifs: Fix to add refcount once page is set private
The MM rule [1] is clear: once a page has the PG_private flag set, the
refcount of that page must be incremented. Core paths such as pageout()
and migrate_page() assume there is one additional page reference if
page_has_private() returns true. Otherwise, we may get a BUG in page
migration:

  page:0000000080d05b9d refcount:-1 mapcount:0 mapping:000000005f4d82a8
  index:0xe2 pfn:0x14c12
  aops:ubifs_file_address_operations [ubifs] ino:8f1 dentry name:"f30e"
  flags: 0x1fffff80002405(locked|uptodate|owner_priv_1|private|node=0|
  zone=1|lastcpupid=0x1fffff)
  page dumped because: VM_BUG_ON_PAGE(page_count(page) != 0)
  ------------[ cut here ]------------
  kernel BUG at include/linux/page_ref.h:184!
  invalid opcode: 0000 [#1] SMP
  CPU: 3 PID: 38 Comm: kcompactd0 Not tainted 5.15.0-rc5
  RIP: 0010:migrate_page_move_mapping+0xac3/0xe70
  Call Trace:
    ubifs_migrate_page+0x22/0xc0 [ubifs]
    move_to_new_page+0xb4/0x600
    migrate_pages+0x1523/0x1cc0
    compact_zone+0x8c5/0x14b0
    kcompactd+0x2bc/0x560
    kthread+0x18c/0x1e0
    ret_from_fork+0x1f/0x30

First, we should clarify what the refcount means for a page obtained
from grab_cache_page_write_begin(). There are 2 situations:
Situation 1: refcount is 3, page is created by __page_cache_alloc.
  TYPE_A - the write process is using this page
  TYPE_B - page is assigned to one certain mapping by calling
	   __add_to_page_cache_locked()
  TYPE_C - page is added to the current cpu's pagevec list by calling
	   lru_cache_add()
Situation 2: refcount is 2, page is gotten from the mapping's tree
  TYPE_B - page has been assigned to one certain mapping
  TYPE_A - the write process is using this page (by calling
	   page_cache_get_speculative())
The filesystem releases one refcount by calling put_page() in
xxx_write_end(); the released refcount corresponds to TYPE_A (the write
task is using the page). If any process is still using a page, page
migration skips it by checking whether expected_page_refs() equals the
page refcount.
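
The check described above can be sketched as a small user-space model.
This is not kernel code: struct model_page and the model_* functions are
hypothetical stand-ins, and the count is simplified to one reference for
the migration caller, one for the mapping, and one if the page is private.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct page (assumption). */
struct model_page {
	int refcount;
	bool has_mapping;
	bool has_private;
};

/*
 * Simplified model of the expected count: 1 for the migration caller,
 * plus 1 for the mapping and 1 more if the private flag is set.
 */
int model_expected_refs(const struct model_page *p)
{
	int expected = 1;

	if (p->has_mapping)
		expected += 1 + (p->has_private ? 1 : 0);
	return expected;
}

/* Migration may proceed only if nobody else holds a reference. */
bool model_can_migrate(const struct model_page *p)
{
	return p->refcount == model_expected_refs(p);
}
```

With a mapped, non-private page, a refcount of 3 (mapping, migration and
a writer) makes model_can_migrate() return false; once the writer drops
its reference the count is 2 and the model allows migration.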

The BUG is caused by the following sequence:
    PA(cpu 0)                           kcompactd(cpu 1)
				compact_zone
ubifs_write_begin
  page_a = grab_cache_page_write_begin
    add_to_page_cache_lru
      lru_cache_add
        pagevec_add // put page into cpu 0's pagevec
  (refcnt = 3, for page creation process)
ubifs_write_end
  SetPagePrivate(page_a) // doesn't increase page count !
  unlock_page(page_a)
  put_page(page_a)  // refcnt = 2
				[...]

    PB(cpu 0)
filemap_read
  filemap_get_pages
    add_to_page_cache_lru
      lru_cache_add
        __pagevec_lru_add // traverse all pages in cpu 0's pagevec
	  __pagevec_lru_add_fn
	    SetPageLRU(page_a)
				isolate_migratepages
                                  isolate_migratepages_block
				    get_page_unless_zero(page_a)
				    // refcnt = 3
                                      list_add(page_a, from_list)
				migrate_pages(from_list)
				  __unmap_and_move
				    move_to_new_page
				      ubifs_migrate_page(page_a)
				        migrate_page_move_mapping
					  expected_page_refs get 3
                                  (migration[1] + mapping[1] + private[1])
	 release_pages
	   put_page_testzero(page_a) // refcnt = 3
                                          page_ref_freeze  // refcnt = 0
	     page_ref_dec_and_test(0 - 1 = -1)
                                          page_ref_unfreeze
                                            VM_BUG_ON_PAGE(-1 != 0, page)

UBIFS doesn't increase the page refcount after setting the private flag,
so the page migration task believes the page is not used by any other
process and migrates it. This leads to concurrent access to the page
refcount between put_page(), called by another process (e.g. a read
process calling lru_cache_add), and page_ref_unfreeze(), called by the
migration task.
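
The interleaving above can be replayed sequentially with plain integer
arithmetic. This is only a user-space sketch under stated assumptions:
struct model_page, model_freeze() and model_replay_buggy() are
hypothetical names, there is no real concurrency, and each step is a
simplified stand-in for the kernel call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct page (assumption). */
struct model_page {
	int refcount;
	bool has_private;
};

/* Models page_ref_freeze(): succeeds only if refcount == expected. */
bool model_freeze(struct model_page *p, int expected)
{
	if (p->refcount != expected)
		return false;
	p->refcount = 0;
	return true;
}

/* Replays the buggy sequence and returns the refcount seen at unfreeze. */
int model_replay_buggy(void)
{
	/* After write_begin: writer + mapping + pagevec = 3. */
	struct model_page p = { .refcount = 3, .has_private = false };
	int expected;

	p.has_private = true;	/* SetPagePrivate(), no extra reference */
	p.refcount--;		/* put_page() in write_end: 2 */
	p.refcount++;		/* get_page_unless_zero() by isolation: 3 */

	/* migration[1] + mapping[1] + private[1] */
	expected = 1 + 1 + (p.has_private ? 1 : 0);
	if (!model_freeze(&p, expected))
		return p.refcount;	/* migration would be skipped */

	p.refcount--;		/* release_pages() drops the pagevec ref */
	return p.refcount;	/* value checked when unfreezing */
}
```

In this model the freeze succeeds because the missing private reference
is masked by the pagevec reference, and the later put drives the count
below zero, which is the condition the VM_BUG_ON_PAGE reports.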

zhangjun has tried to fix this problem [2] by recalculating the page
refcnt in ubifs_migrate_page(). It is better to follow the MM rule [1],
because, as Kirill suggested in [2], we would otherwise need to check all
users of the page_has_private() helper. As f2fs does in [3], fix it by
adding/dropping a refcount when setting/clearing private for a page.
According to [4], we set 'page->private' to 1 because ubifs simply calls
SetPagePrivate(). [5] provides a common helper to set/clear page private;
ubifs can use this helper, following the example of iomap, afs, btrfs, etc.
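
The effect of pairing the private flag with a reference can be sketched
in user space as follows. The model_attach_private() and
model_detach_private() names are hypothetical; they loosely mirror the
kernel's attach_page_private()/detach_page_private(), which are assumed
here to be the helpers [5] refers to. struct model_page is a simplified
stand-in, not the kernel type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct page (assumption). */
struct model_page {
	int refcount;
	bool has_private;
	unsigned long private_data;
};

/* Set the private flag and data, taking one reference for it. */
void model_attach_private(struct model_page *p, unsigned long data)
{
	p->refcount++;
	p->private_data = data;
	p->has_private = true;
}

/* Clear the private flag and data, dropping the reference taken above. */
unsigned long model_detach_private(struct model_page *p)
{
	unsigned long data = p->private_data;

	if (!p->has_private)
		return 0;
	p->has_private = false;
	p->private_data = 0;
	p->refcount--;
	return data;
}

/* Simplified migration check: migration + mapping + private. */
bool model_migration_allowed(const struct model_page *p)
{
	return p->refcount == 1 + 1 + (p->has_private ? 1 : 0);
}
```

Starting from a refcount of 3 after write_begin, attaching private data
gives 4, the writer's put gives 3, and isolation gives 4 again. The
expected count is 3, so the model refuses migration while the pagevec
still holds its reference.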

See [6] for a reproducer.

[1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com
[2] https://www.spinics.net/lists/linux-mtd/msg04018.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1903.0/03313.html
[4] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org
[5] https://lore.kernel.org/all/20200517214718.468-1-guoqing.jiang@cloud.ionos.com
[6] https://bugzilla.kernel.org/show_bug.cgi?id=214961

Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-01-10 23:08:14 +01:00
auth.c ubifs: Fix memleak in ubifs_init_authentication 2021-02-12 21:53:22 +01:00
budget.c
commit.c ubifs: Pass node length in all node dumping callers 2020-12-13 22:12:32 +01:00
compress.c
crypto.c fscrypt: remove fscrypt_operations::max_namelen 2021-09-20 19:32:33 -07:00
debug.c ubifs: fix snprintf() checking 2021-06-18 22:04:47 +02:00
debug.h ubifs: ubifs_dump_sleb: Remove unused function 2020-12-13 22:12:38 +01:00
dir.c ubifs: Rectify space amount budget for mkdir/tmpfile operations 2022-01-10 22:12:14 +01:00
file.c ubifs: Fix to add refcount once page is set private 2022-01-10 23:08:14 +01:00
find.c
gc.c ubifs: read-only if LEB may always be taken in ubifs_garbage_collect 2021-12-23 22:30:38 +01:00
io.c ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock() 2022-01-10 22:58:27 +01:00
ioctl.c ubifs: setflags: Make dirtied_ino_d 8 bytes aligned 2022-01-10 22:18:42 +01:00
journal.c ubifs: Rename whiteout atomically 2022-01-10 21:58:37 +01:00
Kconfig
key.h
log.c
lprops.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
lpt_commit.c mm: remove the pgprot argument to __vmalloc 2020-06-02 10:59:11 -07:00
lpt.c ubifs: Fix the printing type of c->big_lpt 2020-12-13 21:57:10 +01:00
Makefile ubifs: Export filesystem error counters 2021-12-23 20:23:42 +01:00
master.c ubifs: Fix spelling mistakes 2021-06-22 09:21:39 +02:00
misc.c
misc.h ubifs: misc.h: delete a duplicated word 2020-08-02 22:59:03 +02:00
orphan.c ubifs: Pass node length in all node dumping callers 2020-12-13 22:12:32 +01:00
recovery.c ubifs: Pass node length in all node dumping callers 2020-12-13 22:12:32 +01:00
replay.c ubifs: Fix spelling mistakes 2021-12-23 20:23:40 +01:00
sb.c ubifs: Default to zstd compression 2021-04-15 22:00:26 +02:00
scan.c ubifs: Pass node length in all node dumping callers 2020-12-13 22:12:32 +01:00
shrinker.c
super.c ubifs: Export filesystem error counters 2021-12-23 20:23:42 +01:00
sysfs.c ubifs: fix snprintf() length check 2021-12-23 22:08:19 +01:00
tnc_commit.c ubifs: Fix spelling mistakes 2021-06-22 09:21:39 +02:00
tnc_misc.c ubifs: Pass node length in all node dumping callers 2020-12-13 22:12:32 +01:00
tnc.c ubifs: Pass node length in all node dumping callers 2020-12-13 22:12:32 +01:00
ubifs-media.h
ubifs.h ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment 2022-01-09 21:35:38 +01:00
xattr.c ubifs: Remove ui_mutex in ubifs_xattr_get and change_xattr 2021-06-18 22:04:47 +02:00