linux/fs
Dave Chinner 108a42358a xfs: Lower CIL flush limit for large logs
The current CIL size aggregation limit is 1/8th the log size. This
means for large logs we might be aggregating at least 250MB of dirty objects
in memory before the CIL is flushed to the journal. With CIL shadow
buffers sitting around, this means the CIL is often consuming >500MB
of temporary memory that is all allocated under GFP_NOFS conditions.

Flushing the CIL can take some time to do if there is other IO
ongoing, and can introduce substantial log force latency by itself.
It also pins the memory until the objects are in the AIL and can be
written back and reclaimed by shrinkers. Hence this threshold also
tends to determine the minimum amount of memory XFS can operate in
under heavy modification without triggering the OOM killer.

Modify the CIL space limit to prevent such huge amounts of pinned
metadata from aggregating. We can have 2MB of log IO in flight at
once, so limit aggregation to 16x this size. This threshold was
chosen as it little impact on performance (on 16-way fsmark) or log
traffic but pins a lot less memory on large logs especially under
heavy memory pressure.  An aggregation limit of 8x had 5-10%
performance degradation and a 50% increase in log throughput for
the same workload, so clearly that was too small for highly
concurrent workloads on large logs.

This was found via trace analysis of AIL behaviour. e.g. insertion
from a single CIL flush:

xfs_ail_insert: old lsn 0/0 new lsn 1/3033090 type XFS_LI_INODE flags IN_AIL

$ grep xfs_ail_insert /mnt/scratch/s.t |grep "new lsn 1/3033090" |wc -l
1721823
$

So there were 1.7 million objects inserted into the AIL from this
CIL checkpoint, the first at 2323.392108, the last at 2325.667566 which
was the end of the trace (i.e. it hadn't finished). Clearly a major
problem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-27 08:32:54 -07:00
..
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
adfs fs/adfs: bigdir: Fix an error code in adfs_fplus_read() 2020-01-25 11:31:59 -05:00
affs affs: fix a memory leak in affs_remount 2019-11-18 14:26:43 +01:00
afs Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
autofs Merge branch 'next.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-05 17:11:48 -08:00
befs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
bfs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
btrfs for-5.6-rc2-tag 2020-02-23 09:43:50 -08:00
cachefiles cachefiles: drop direct usage of ->bmap method. 2020-02-03 08:05:56 -05:00
ceph ceph: noacl mount option is effectively ignored 2020-02-11 17:04:40 +01:00
cifs cifs: make sure we do not overflow the max EA buffer size 2020-02-14 11:10:24 -06:00
coda y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
configfs utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
cramfs cramfs: switch to use of errofc() et.al. 2020-02-07 14:48:41 -05:00
crypto fscrypt: improve format of no-key names 2020-01-22 14:50:03 -08:00
debugfs Merge branch 'work.recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:09:46 +00:00
devpts devpts_pty_kill(): don't bother with d_delete() 2019-09-03 09:30:56 -04:00
dlm dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD 2019-12-18 18:07:31 +01:00
ecryptfs eCryptfs fixes for 5.6-rc3 2020-02-17 21:08:37 -08:00
efivarfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
efs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
erofs erofs: clean up z_erofs_submit_queue() 2020-01-21 16:46:23 +08:00
exportfs race in exportfs_decode_fh() 2019-11-11 09:21:59 -05:00
ext2 dax fixes 5.6-rc1 2020-02-11 16:52:08 -08:00
ext4 ext4: potential crash on allocation error in ext4_alloc_flex_bg_array() 2020-02-29 17:48:08 -05:00
f2fs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:04:49 -08:00
fat Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:02:42 +00:00
freevxfs fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
fscache proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
fuse Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
gfs2 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
hfs hfs/hfsplus: use 64-bit inode timestamps 2019-12-18 18:07:32 +01:00
hfsplus hfs/hfsplus: use 64-bit inode timestamps 2019-12-18 18:07:32 +01:00
hostfs hostfs: pass 64-bit timestamps to/from user space 2019-12-18 18:07:32 +01:00
hpfs fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
hugetlbfs hugetlbfs: switch to use of invalfc() 2020-02-07 14:48:42 -05:00
iomap fs: Fix page_mkwrite off-by-one errors 2020-01-06 08:58:23 -08:00
isofs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
jbd2 jbd2: fix data races at struct journal_head 2020-02-29 13:40:02 -05:00
jffs2 fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
jfs Trivial cleanup for jfs 2020-02-05 05:28:20 +00:00
kernfs Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:02:42 +00:00
lockd proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
minix fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
nfs NFSv4: Ensure the delegation cred is pinned when we call delegreturn 2020-02-13 16:23:02 -05:00
nfs_common
nfsd Highlights: 2020-02-07 17:50:21 -08:00
nilfs2 fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
nls
notify fs: call fsnotify_sb_delete after evict_inodes 2019-12-18 00:03:01 -05:00
ntfs utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
ocfs2 treewide: remove redundant IS_ERR() before error code check 2020-02-04 03:05:27 +00:00
omfs fs: omfs: Initialize filesystem timestamp ranges 2019-08-30 08:11:25 -07:00
openpromfs Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
orangefs help_next should increase position index 2020-02-04 15:22:04 -05:00
overlayfs ovl: fix lseek overflow on 32bit 2020-02-03 11:41:53 +01:00
proc Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
pstore pstore/ram: Regularize prz label allocation lifetime 2020-01-08 17:05:45 -08:00
qnx4 fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
qnx6 fs: Fill in max and min timestamps in superblock 2019-08-30 07:27:17 -07:00
quota \n 2020-01-30 15:37:41 -08:00
ramfs fs_parse: fold fs_parameter_desc/fs_parameter_spec 2020-02-07 14:48:37 -05:00
reiserfs Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
romfs Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-19 10:06:57 -07:00
squashfs Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-09-19 10:06:57 -07:00
sysfs treewide: remove redundant IS_ERR() before error code check 2020-02-04 03:05:27 +00:00
sysv fs: sysv: Initialize filesystem timestamp ranges 2019-08-30 07:27:18 -07:00
tracefs simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems 2019-12-10 22:29:58 -05:00
ubifs Merge branch 'imm.timestamp' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-05 05:02:42 +00:00
udf udf: Clarify meaning of f_files in udf_statfs 2020-01-20 13:59:41 +01:00
ufs y2038: add inode timestamp clamping 2019-09-19 09:42:37 -07:00
unicode kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
vboxsf fs: Add VirtualBox guest shared folder (vboxsf) support 2020-02-08 17:34:58 -05:00
verity fs-verity: use u64_to_user_ptr() 2020-01-14 13:28:28 -08:00
xfs xfs: Lower CIL flush limit for large logs 2020-03-27 08:32:54 -07:00
zonefs zonefs: select FS_IOMAP 2020-02-26 16:58:15 +09:00
aio.c aio: prevent potential eventfd recursion on poll 2020-02-03 17:27:47 -07:00
anon_inodes.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
attr.c utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
binfmt_elf.c fs/binfmt_elf.c: coredump: allow process with empty address space to coredump 2020-01-31 10:30:41 -08:00
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: remove set but not used variable 'inode' 2019-07-16 19:23:22 -07:00
binfmt_misc.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
binfmt_script.c
block_dev.c block: don't send uevent for empty disk when not invalidating 2019-12-02 18:49:30 -07:00
buffer.c smp: Remove allocation mask from on_each_cpu_cond.*() 2020-01-24 20:40:09 +01:00
char_dev.c chardev: Avoid potential use-after-free in 'chrdev_open()' 2020-01-06 20:10:26 +01:00
compat_binfmt_elf.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
compat.c
coredump.c pipe: use exclusive waits when reading or writing 2020-02-08 11:39:19 -08:00
d_path.c [PATCH] fix d_absolute_path() interplay with fsmount() 2019-08-30 19:31:09 -04:00
dax.c dax: pass NOWAIT flag to iomap_apply 2020-02-05 20:34:32 -08:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
dcookies.c
direct-io.c fs/direct-io.c: include fs/internal.h for missing prototype 2020-01-04 13:55:09 -08:00
drop_caches.c fs: avoid softlockups in s_inodes iterators 2019-12-18 00:03:01 -05:00
eventfd.c eventfd: track eventfd_signal() recursion depth 2020-02-03 17:27:38 -07:00
eventpoll.c eventpoll: support non-blocking do_epoll_ctl() calls 2020-01-29 15:45:47 -07:00
exec.c Merge branch 'akpm' (patches from Andrew) 2020-01-31 12:16:36 -08:00
fcntl.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
fhandle.c fs/handle.c - fix up kerneldoc 2019-08-07 21:51:47 -04:00
file_table.c vfs: Export flush_delayed_fput for use by knfsd. 2019-08-19 11:00:39 -04:00
file.c threads-v5.6 2020-01-29 19:38:34 -08:00
filesystems.c fs_parser: remove fs_parameter_description name field 2020-02-07 14:48:36 -05:00
fs_context.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
fs_parser.c turn fs_param_is_... into functions 2020-02-07 14:48:38 -05:00
fs_pin.c switch the remnants of releasing the mountpoint away from fs_pin 2019-07-16 22:52:37 -04:00
fs_struct.c
fs_types.c
fs-writeback.c memcg: fix a crash in wb_workfn when a device disappears 2020-01-31 10:30:36 -08:00
fsopen.c add prefix to fs_context->log 2020-02-07 14:48:35 -05:00
inode.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:04:49 -08:00
internal.h for-5.6/io_uring-vfs-2020-01-29 2020-01-29 18:53:37 -08:00
io_uring.c io_uring: fix 32-bit compatability with sendmsg/recvmsg 2020-02-27 14:17:49 -07:00
io-wq.c io-wq: remove spin-for-work optimization 2020-02-25 08:57:37 -07:00
io-wq.h io-wq: ensure work->task_pid is cleared on init 2020-02-25 13:23:48 -07:00
ioctl.c compat-ioctl fix for v5.6 2020-02-08 13:44:41 -08:00
Kconfig fs: New zonefs file system 2020-02-09 15:51:46 -08:00
Kconfig.binfmt
libfs.c simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems 2019-12-10 22:29:58 -05:00
locks.c locks: print unsigned ino in /proc/locks 2019-12-29 09:00:58 -05:00
Makefile fs: New zonefs file system 2020-02-09 15:51:46 -08:00
mbcache.c
mount.h switch the remnants of releasing the mountpoint away from fs_pin 2019-07-16 22:52:37 -04:00
mpage.c fs: move guard_bio_eod() after bio_set_op_attrs 2020-01-09 08:16:12 -07:00
namei.c vfs: fix do_last() regression 2020-02-01 10:36:49 -08:00
namespace.c saner copy_mount_options() 2020-02-03 21:23:33 -05:00
no-block.c
nsfs.c Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-01-29 11:20:24 -08:00
open.c fs: make build_open_flags() available internally 2020-01-20 17:01:53 -07:00
pipe.c pipe: make sure to wake up everybody when the last reader/writer closes 2020-02-18 14:34:36 -08:00
pnode.c
pnode.h
posix_acl.c fs/posix_acl.c: fix kernel-doc warnings 2020-01-04 13:55:09 -08:00
proc_namespace.c vfs: subtype handling moved to fuse 2019-09-06 21:28:49 +02:00
read_write.c overlayfs update for 5.6 2020-02-04 11:45:21 +00:00
readdir.c readdir: make user_access_begin() use the real access range 2020-01-23 10:15:28 -08:00
select.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
seq_file.c seq_file: fix problem when seeking mid-record 2019-08-13 16:06:52 -07:00
signalfd.c
splice.c pipe: use exclusive waits when reading or writing 2020-02-08 11:39:19 -08:00
stack.c sched/rt, fs: Use CONFIG_PREEMPTION 2019-12-08 14:37:36 +01:00
stat.c fs: make two stat prep helpers available 2020-01-20 17:03:54 -07:00
statfs.c vfs: Fix EOVERFLOW testing in put_compat_statfs64 2019-10-03 14:21:35 -07:00
super.c fs: call fsnotify_sb_delete after evict_inodes 2019-12-18 00:03:01 -05:00
sync.c
timerfd.c timerfd: Make timerfd_settime() time namespace aware 2020-01-14 12:20:53 +01:00
userfaultfd.c Merge branch 'akpm' (patches from Andrew) 2019-12-01 20:36:41 -08:00
utimes.c utimes: Clamp the timestamps in notify_change() 2019-12-08 19:10:50 -05:00
xattr.c