linux/fs/xfs
Dave Chinner 993f951f50 xfs: make inode reclaim almost non-blocking
Now that dirty inode writeback doesn't cause read-modify-write
cycles on the inode cluster buffer under memory pressure, the need
to throttle memory reclaim to the rate at which we can clean dirty
inodes goes away. That is due to the fact that we no longer thrash
inode cluster buffers under memory pressure to clean dirty inodes.

This means inode writeback no longer stalls on memory allocation
or read IO, and hence can be done asynchronously without generating
memory pressure. As a result, blocking inode writeback in reclaim is
no longer necessary to prevent reclaim priority windup as cleaning
dirty inodes is no longer dependent on having memory reserves
available for the filesystem to make progress reclaiming inodes.

Hence we can convert inode reclaim to be non-blocking for shrinker
callouts, both for direct reclaim and kswapd.

On a vanilla kernel, running a 16-way fsmark create workload on a
4 node/16p/16GB RAM machine, I can reliably pin 14.75GB of RAM via
userspace mlock(). The OOM killer gets invoked at 15GB of
pinned RAM.

Without the inode cluster pinning, this non-blocking reclaim patch
triggers premature OOM killer invocation with the same memory
pinning, sometimes with as much as 45% of RAM being free.  It's
trivially easy to trigger the OOM killer when reclaim does not
block.

With pinning inode clusters in RAM and then adding this patch, I can
reliably pin 14.5GB of RAM and still have the fsmark workload run to
completion. The OOM killer gets invoked 14.75GB of pinned RAM, which
is only a small amount of memory less than the vanilla kernel. It is
much more reliable than just with async reclaim alone.

simoops shows that allocation stalls go away when async reclaim is
used. Vanilla kernel:

Run time: 1924 seconds
Read latency (p50: 3,305,472) (p95: 3,723,264) (p99: 4,001,792)
Write latency (p50: 184,064) (p95: 553,984) (p99: 807,936)
Allocation latency (p50: 2,641,920) (p95: 3,911,680) (p99: 4,464,640)
work rate = 13.45/sec (avg 13.44/sec) (p50: 13.46) (p95: 13.58) (p99: 13.70)
alloc stall rate = 3.80/sec (avg: 2.59) (p50: 2.54) (p95: 2.96) (p99: 3.02)

With inode cluster pinning and async reclaim:

Run time: 1924 seconds
Read latency (p50: 3,305,472) (p95: 3,715,072) (p99: 3,977,216)
Write latency (p50: 187,648) (p95: 553,984) (p99: 789,504)
Allocation latency (p50: 2,748,416) (p95: 3,919,872) (p99: 4,448,256)
work rate = 13.28/sec (avg 13.32/sec) (p50: 13.26) (p95: 13.34) (p99: 13.34)
alloc stall rate = 0.02/sec (avg: 0.02) (p50: 0.01) (p95: 0.03) (p99: 0.03)

Latencies don't really change much, nor does the work rate. However,
allocation almost never stalls with these changes, whilst the
vanilla kernel is sometimes reporting 20 stalls/s over a 60s sample
period. This difference is due to inode reclaim being largely
non-blocking now.

IOWs, once we have pinned inode cluster buffers, we can make inode
reclaim non-blocking without a major risk of premature and/or
spurious OOM killer invocation, and without any changes to memory
reclaim infrastructure.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-07 07:15:07 -07:00
..
libxfs xfs: pin inode backing buffer to the inode log item 2020-07-07 07:15:07 -07:00
scrub xfs: don't eat an EIO/ENOSPC writeback error when scrubbing data fork 2020-07-06 10:46:56 -07:00
Kconfig treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
kmem.c mm: remove the pgprot argument to __vmalloc 2020-06-02 10:59:11 -07:00
kmem.h xfs: more lockdep whackamole with kmem_alloc* 2020-05-27 08:49:28 -07:00
Makefile xfs: refactor log recovery item sorting into a generic dispatch structure 2020-05-08 08:49:58 -07:00
mrlock.h
xfs_acl.c xfs: only allocate the buffer size actually needed in __xfs_set_acl 2020-03-02 20:55:55 -08:00
xfs_acl.h xfs: improve xfs_forget_acl 2020-03-02 20:55:55 -08:00
xfs_aops.c New code for 5.8: 2020-06-02 19:21:40 -07:00
xfs_aops.h xfs: add a xfs_inode_buftarg helper 2019-10-28 08:37:54 -07:00
xfs_attr_inactive.c xfs: cleanup xfs_idestroy_fork 2020-05-19 09:40:59 -07:00
xfs_attr_list.c xfs: move the fork format fields into struct xfs_ifork 2020-05-19 09:40:58 -07:00
xfs_bio_io.c xfs: chain bios the right way around in xfs_rw_bdev 2019-07-10 10:04:16 -07:00
xfs_bmap_item.c xfs: hoist setting of XFS_LI_RECOVERED to caller 2020-05-08 08:50:01 -07:00
xfs_bmap_item.h xfs: refactor intent item RECOVERED flag into the log item 2020-05-08 08:50:01 -07:00
xfs_bmap_util.c xfs: preserve rmapbt swapext block reservation from freed blocks 2020-07-06 10:46:56 -07:00
xfs_bmap_util.h xfs: simplify xfs_iomap_eof_align_last_fsb 2019-11-03 10:22:30 -08:00
xfs_buf_item_recover.c xfs: mark log recovery buffers for completion 2020-07-06 10:46:58 -07:00
xfs_buf_item.c xfs: pin inode backing buffer to the inode log item 2020-07-07 07:15:07 -07:00
xfs_buf_item.h xfs: get rid of log item callbacks 2020-07-07 07:15:07 -07:00
xfs_buf.c xfs: call xfs_buf_iodone directly 2020-07-06 10:46:58 -07:00
xfs_buf.h xfs: call xfs_buf_iodone directly 2020-07-06 10:46:58 -07:00
xfs_dir2_readdir.c xfs: move the fork format fields into struct xfs_ifork 2020-05-19 09:40:58 -07:00
xfs_discard.c xfs: remove XFS_BUF_TO_AGF 2020-03-11 09:11:39 -07:00
xfs_discard.h
xfs_dquot_item_recover.c xfs: mark log recovery buffers for completion 2020-07-06 10:46:58 -07:00
xfs_dquot_item.c xfs: unwind log item error flagging 2020-07-07 07:15:07 -07:00
xfs_dquot_item.h xfs: factor out quotaoff intent AIL removal and memory free 2020-03-18 08:12:23 -07:00
xfs_dquot.c xfs: move xfs_clear_li_failed out of xfs_ail_delete_one() 2020-07-07 07:15:07 -07:00
xfs_dquot.h xfs: pass xfs_dquot to xfs_qm_adjust_dqtimers 2020-05-27 08:49:26 -07:00
xfs_error.c xfs: random buffer write failure errortag 2020-05-07 08:27:48 -07:00
xfs_error.h xfs: xfs_buf_corruption_error should take __this_address 2020-03-12 07:58:12 -07:00
xfs_export.c xfs: factor out a new xfs_log_force_inode helper 2020-04-06 08:44:35 -07:00
xfs_export.h
xfs_extent_busy.c xfs: cleanup use of the XFS_ALLOC_ flags 2019-11-03 10:22:31 -08:00
xfs_extent_busy.h
xfs_extfree_item.c xfs: hoist setting of XFS_LI_RECOVERED to caller 2020-05-08 08:50:01 -07:00
xfs_extfree_item.h xfs: refactor intent item RECOVERED flag into the log item 2020-05-08 08:50:01 -07:00
xfs_file.c xfs: add an inode item lock 2020-07-06 10:46:58 -07:00
xfs_filestream.c xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers 2020-01-26 14:32:26 -08:00
xfs_filestream.h
xfs_fsmap.c xfs: prohibit fs freezing when using empty transactions 2020-03-26 08:19:24 -07:00
xfs_fsmap.h
xfs_fsops.c xfs: remove unused shutdown types 2020-05-07 08:27:48 -07:00
xfs_fsops.h xfs: change some error-less functions to void types 2019-05-01 20:26:30 -07:00
xfs_globals.c xfs: multithreaded iwalk implementation 2019-07-03 07:33:26 -07:00
xfs_health.c xfs: introduce new v5 bulkstat structure 2019-07-03 20:36:26 -07:00
xfs_icache.c xfs: make inode reclaim almost non-blocking 2020-07-07 07:15:07 -07:00
xfs_icache.h xfs: straighten out all the naming around incore inode tree walks 2020-05-27 08:49:27 -07:00
xfs_icreate_item.c xfs: refactor log recovery icreate item dispatch for pass2 commit functions 2020-05-08 08:49:59 -07:00
xfs_icreate_item.h
xfs_inode_item_recover.c xfs: mark log recovery buffers for completion 2020-07-06 10:46:58 -07:00
xfs_inode_item.c xfs: pin inode backing buffer to the inode log item 2020-07-07 07:15:07 -07:00
xfs_inode_item.h xfs: make inode IO completion buffer centric 2020-07-06 10:46:59 -07:00
xfs_inode.c xfs: get rid of log item callbacks 2020-07-07 07:15:07 -07:00
xfs_inode.h xfs: move helpers that lock and unlock two inodes against userspace IO 2020-07-06 10:46:57 -07:00
xfs_ioctl32.c xfs: lift cursor copy in/out into xfs_ioc_attr_list 2020-03-02 20:55:54 -08:00
xfs_ioctl32.h xfs: rename compat_time_t to old_time32_t 2020-01-06 08:57:36 -08:00
xfs_ioctl.c Third part of new DAX code for 5.8: 2020-06-11 10:48:12 -07:00
xfs_ioctl.h xfs: embedded the attrlist cursor into struct xfs_attr_list_context 2020-03-02 20:55:55 -08:00
xfs_iomap.c xfs: refactor xfs_iomap_prealloc_size 2020-05-27 08:49:28 -07:00
xfs_iomap.h xfs: simplify the xfs_iomap_write_direct calling 2019-11-03 10:22:30 -08:00
xfs_iops.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
xfs_iops.h
xfs_itable.c xfs: move the fork format fields into struct xfs_ifork 2020-05-19 09:40:58 -07:00
xfs_itable.h xfs: remove all *_ITER_ABORT values 2019-08-29 21:22:41 -07:00
xfs_iwalk.c xfs: kill the XFS_WANT_CORRUPT_* macros 2019-11-12 17:19:02 -08:00
xfs_iwalk.h xfs: remove all *_ITER_CONTINUE values 2019-08-30 22:43:56 -07:00
xfs_linux.h xfs: remove useless definitions in xfs_linux.h 2020-07-06 10:46:58 -07:00
xfs_log_cil.c xfs: fix use-after-free on CIL context on shutdown 2020-06-22 19:22:57 -07:00
xfs_log_priv.h xfs: fix use-after-free on CIL context on shutdown 2020-06-22 19:22:57 -07:00
xfs_log_recover.c xfs: mark log recovery buffers for completion 2020-07-06 10:46:58 -07:00
xfs_log.c xfs: don't write a corrupt unmount record to force summary counter recalc 2020-03-27 08:32:55 -07:00
xfs_log.h xfs: refactor and split xfs_log_done() 2020-03-27 08:32:53 -07:00
xfs_message.c xfs: refactor ratelimited buffer error messages into helper 2020-05-07 08:27:46 -07:00
xfs_message.h xfs: refactor ratelimited buffer error messages into helper 2020-05-07 08:27:46 -07:00
xfs_mount.c xfs: reduce free inode accounting overhead 2020-05-27 08:49:25 -07:00
xfs_mount.h fs/xfs: Make DAX mount option a tri-state 2020-05-29 20:13:20 -07:00
xfs_mru_cache.c fs: xfs: Remove KM_NOSLEEP and KM_SLEEP. 2019-08-26 12:06:22 -07:00
xfs_mru_cache.h
xfs_ondisk.h xfs: make struct xfs_buf_log_format have a consistent size 2020-01-16 08:07:23 -08:00
xfs_pnfs.c xfs: define printk_once variants for xfs messages 2020-05-04 09:03:15 -07:00
xfs_pnfs.h
xfs_pwork.c xfs: poll waiting for quotacheck 2019-07-03 08:21:58 -07:00
xfs_pwork.h xfs: poll waiting for quotacheck 2019-07-03 08:21:58 -07:00
xfs_qm_bhv.c xfs: remove the xfs_disk_dquot_t and xfs_dquot_t 2019-11-13 11:13:45 -08:00
xfs_qm_syscalls.c xfs: straighten out all the naming around incore inode tree walks 2020-05-27 08:49:27 -07:00
xfs_qm.c xfs: per-type quota timers and warn limits 2020-05-27 08:49:26 -07:00
xfs_qm.h xfs: per-type quota timers and warn limits 2020-05-27 08:49:26 -07:00
xfs_quota.h xfs: use direct calls for dquot IO completion 2020-07-06 10:46:59 -07:00
xfs_quotaops.c xfs: per-type quota timers and warn limits 2020-05-27 08:49:26 -07:00
xfs_refcount_item.c xfs: hoist setting of XFS_LI_RECOVERED to caller 2020-05-08 08:50:01 -07:00
xfs_refcount_item.h xfs: refactor intent item RECOVERED flag into the log item 2020-05-08 08:50:01 -07:00
xfs_reflink.c xfs: move helpers that lock and unlock two inodes against userspace IO 2020-07-06 10:46:57 -07:00
xfs_reflink.h xfs: move helpers that lock and unlock two inodes against userspace IO 2020-07-06 10:46:57 -07:00
xfs_rmap_item.c xfs: hoist setting of XFS_LI_RECOVERED to caller 2020-05-08 08:50:01 -07:00
xfs_rmap_item.h xfs: refactor intent item RECOVERED flag into the log item 2020-05-08 08:50:01 -07:00
xfs_rtalloc.c xfs: make xfs_trans_get_buf return an error code 2020-01-26 14:32:26 -08:00
xfs_rtalloc.h
xfs_stats.c xfs: Use scnprintf() for avoiding potential buffer overflow 2020-03-12 07:58:13 -07:00
xfs_stats.h
xfs_super.c (More) new code for 5.8: 2020-06-02 19:48:41 -07:00
xfs_super.h xfs: include QUOTA, FATAL ASSERT build options in XFS_BUILD_OPTIONS 2019-10-21 09:04:57 -07:00
xfs_symlink.c xfs: move the fork format fields into struct xfs_ifork 2020-05-19 09:40:58 -07:00
xfs_symlink.h xfs: Correct comment tyops -> typos 2019-11-10 10:21:57 -08:00
xfs_sysctl.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
xfs_sysctl.h xfs: multithreaded iwalk implementation 2019-07-03 07:33:26 -07:00
xfs_sysfs.c xfs: avoid unused to_mp() function warning 2019-09-24 09:40:19 -07:00
xfs_sysfs.h
xfs_trace.c xfs: support bulk loading of staged btrees 2020-03-18 08:12:23 -07:00
xfs_trace.h xfs: redesign the reflink remap loop to fix blkres depletion crash 2020-07-06 10:46:57 -07:00
xfs_trans_ail.c xfs: pin inode backing buffer to the inode log item 2020-07-07 07:15:07 -07:00
xfs_trans_buf.c xfs: clean up the buffer iodone callback functions 2020-07-07 07:15:07 -07:00
xfs_trans_dquot.c xfs: per-type quota timers and warn limits 2020-05-27 08:49:26 -07:00
xfs_trans_priv.h xfs: refactor adding recovered intent items to the log 2020-05-08 08:50:00 -07:00
xfs_trans.c xfs: preserve rmapbt swapext block reservation from freed blocks 2020-07-06 10:46:56 -07:00
xfs_trans.h xfs: unwind log item error flagging 2020-07-07 07:15:07 -07:00
xfs_xattr.c xfs: remove duplicate headers 2020-05-08 08:51:34 -07:00
xfs.h