linux/fs/nfs
Max Kellermann db531db951 Revert "NFS: readdirplus optimization by cache mechanism" (memleak)
This reverts commit be4c2d4723.

That commit caused a severe memory leak in nfs_readdir_make_qstr().

When listing a directory with more than 100 files (this is how many
struct nfs_cache_array_entry elements fit in one 4kB page), every
file name string allocated beyond the first 100 is leaked.

The root cause of the leak is that those string pointers are stored in
pages which are never linked into the page cache.

fs/nfs/dir.c puts pages into the page cache by calling
read_cache_page(); the callback function nfs_readdir_filler() will
then fill the given page struct which was passed to it, which is
already linked in the page cache (by do_read_cache_page() calling
add_to_page_cache_lru()).

Commit be4c2d4723 added another (local) array of allocated pages, to
be filled with more data, instead of discarding excess items received
from the NFS server.  Those additional pages can be used by the next
nfs_readdir_filler() call (from within the same nfs_readdir() call).

The leak happens when some of those additional pages are never used
(that is, never copied to the page cache using copy_highpage()).  The
pages themselves will be freed by nfs_readdir_free_pages(), but the
name strings they point to will not.  The commit did not invoke
nfs_readdir_clear_array() (and doing so would have been dangerous,
because it did not track which of those pages were already copied to
the page cache, risking double-free bugs).

How to reproduce the leak:

- Use a kernel with CONFIG_SLUB_DEBUG_ON.

- Create a directory on an NFS mount with more than 100 files with
  names long enough to use the "kmalloc-32" slab (so we can easily
  look up the allocation counts):

  for i in `seq 110`; do touch ${i}_0123456789abcdef; done

- Drop all caches:

  echo 3 >/proc/sys/vm/drop_caches

- Check the allocation counter:

  grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
  30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1

- Request a directory listing and check the allocation counters again:

  ls
  [...]
  grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
  30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1

There are now 120 new allocations.

- Drop all caches and check the counters again:

  echo 3 >/proc/sys/vm/drop_caches
  grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
  30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1

110 allocations are gone, but 10 have leaked and will never be freed.
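
The counter arithmetic in the steps above can be scripted. The
following is only a minimal sketch and not part of the original
commit: the helper name alloc_count is hypothetical, and the three
sample lines are the counter values quoted above with the trailing
fields dropped. On a live system the helper would read
/sys/kernel/slab/kmalloc-32/alloc_calls instead.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): print the allocation count
# (first field) of the nfs_readdir_add_to_array line read from stdin.
alloc_count() {
	awk '/nfs_readdir_add_to_array/ { print $1; exit }'
}

# Counter values quoted in the steps above; trailing fields omitted.
# On a live system, use instead:
#   alloc_count </sys/kernel/slab/kmalloc-32/alloc_calls
before=$(echo '30564391 nfs_readdir_add_to_array+0x73/0xd0' | alloc_count)
after_ls=$(echo '30564511 nfs_readdir_add_to_array+0x73/0xd0' | alloc_count)
after_drop=$(echo '30564401 nfs_readdir_add_to_array+0x73/0xd0' | alloc_count)

echo "allocated by ls: $((after_ls - before))"
echo "freed by drop_caches: $((after_ls - after_drop))"
echo "leaked: $((after_drop - before))"
```

With the sample values this prints 120, 110 and 10, matching the
numbers above. On a busy machine other NFS activity may change the
counter between the measurements, so the result is only meaningful on
an otherwise idle mount.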

Unhelpfully, those allocations are explicitly excluded from KMEMLEAK,
which is why my initial attempts with KMEMLEAK were not successful:

	/*
	 * Avoid a kmemleak false positive. The pointer to the name is stored
	 * in a page cache page which kmemleak does not scan.
	 */
	kmemleak_not_leak(string->name);

It would be possible to solve this bug without reverting the whole
commit:

- keep track of which pages were not used, and call
  nfs_readdir_clear_array() on them, or
- manually link those pages into the page cache

But for now I have decided to just revert the commit, because a real
fix would require careful analysis and would risk introducing more
dangerous (crash) bugs, which seems unsuitable for the stable branches.

Signed-off-by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-12 16:01:37 -04:00
blocklayout treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
filelayout treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
flexfilelayout NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O 2019-06-28 11:48:52 -04:00
cache_lib.c NFS client updates for Linux 4.15 2017-11-17 14:18:00 -08:00
cache_lib.h NFS client updates for Linux 4.15 2017-11-17 14:18:00 -08:00
callback_proc.c NFS4: Add a trace event to record invalid CB sequence IDs 2019-07-09 10:30:25 -04:00
callback_xdr.c SUNRPC/nfs: Fix return value for nfs4_callback_compound() 2019-04-24 09:46:34 -04:00
callback.c SUNRPC: Cache the process user cred in the RPC server listener 2019-04-24 09:46:35 -04:00
callback.h NFS CB_OFFLOAD xdr 2018-08-09 12:56:38 -04:00
client.c NFS: Cleanup if nfs_match_client is interrupted 2019-07-06 14:54:53 -04:00
delegation.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
delegation.h NFSv4: don't mark all open state for recovery when handling recallable state revoked flag 2019-05-09 16:26:57 -04:00
dir.c Revert "NFS: readdirplus optimization by cache mechanism" (memleak) 2019-07-12 16:01:37 -04:00
direct.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
dns_resolve.c dns_resolver: Allow used keys to be invalidated 2019-05-15 17:35:54 +01:00
dns_resolve.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
export.c NFS: Pass the inode down to the getattr() callback 2018-06-04 12:07:07 -04:00
file.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
fscache-index.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 2019-05-24 17:27:11 +02:00
fscache.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 2019-05-24 17:27:11 +02:00
fscache.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 2019-05-24 17:27:11 +02:00
getroot.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
inode.c Merge branch 'containers' 2019-07-06 14:54:52 -04:00
internal.h Revert "NFS: readdirplus optimization by cache mechanism" (memleak) 2019-07-12 16:01:37 -04:00
io.c NFS: Fix up documentation warnings 2019-02-20 15:14:21 -05:00
iostat.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Makefile NFS: Create a root NFS directory in /sys/fs/nfs 2019-07-06 14:54:49 -04:00
mount_clnt.c SUNRPC: Cache cred of process creating the rpc_client 2019-04-26 16:00:48 -04:00
namespace.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
netns.h NFS: Add sysfs support for per-container identifier 2019-07-06 14:54:49 -04:00
nfs2super.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
nfs2xdr.c NFS: Record task, client ID, and XID in xdr_status trace points 2019-07-09 10:30:25 -04:00
nfs3_fs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nfs3acl.c nfs: fix xfstest generic/099 failed on nfsv3 2019-02-20 17:33:55 -05:00
nfs3client.c pNFS: Allow multiple connections to the DS 2019-07-06 14:54:50 -04:00
nfs3proc.c NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'. 2018-12-19 13:52:46 -05:00
nfs3super.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
nfs3xdr.c NFS: Record task, client ID, and XID in xdr_status trace points 2019-07-09 10:30:25 -04:00
nfs4_fs.h NFS: Allow signal interruption of NFS4ERR_DELAYed operations 2019-04-25 14:18:14 -04:00
nfs4client.c pNFS: Allow multiple connections to the DS 2019-07-06 14:54:50 -04:00
nfs4file.c nfs: disable client side deduplication 2019-07-06 14:54:53 -04:00
nfs4getroot.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nfs4idmap.c NFSv4: Convert the NFS client idmapper to use the container user namespace 2019-04-26 17:10:53 -04:00
nfs4idmap.h NFS: Move nfs_idmap.h into fs/nfs/ 2015-04-23 15:16:14 -04:00
nfs4namespace.c NFS: Fix up documentation warnings 2019-02-20 15:14:21 -05:00
nfs4proc.c NFS: send state management on a single connection. 2019-07-06 14:54:50 -04:00
nfs4renewd.c NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'. 2018-12-19 13:52:46 -05:00
nfs4session.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
nfs4session.h NFSv4.1: Bump the default callback session slot count to 16 2019-03-02 16:25:26 -05:00
nfs4state.c NFSv4: don't mark all open state for recovery when handling recallable state revoked flag 2019-05-09 16:26:57 -04:00
nfs4super.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
nfs4sysctl.c nfs: Do not convert nfs_idmap_cache_timeout to jiffies 2018-01-18 15:10:47 -05:00
nfs4trace.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nfs4trace.h NFS: Record task, client ID, and XID in xdr_status trace points 2019-07-09 10:30:25 -04:00
nfs4xdr.c NFS: Record task, client ID, and XID in xdr_status trace points 2019-07-09 10:30:25 -04:00
nfs42.h NFSv4.2: Add client support for the generic 'layouterror' RPC call 2019-03-01 16:20:16 -05:00
nfs42proc.c NFSv4.1 fix incorrect return value in copy_file_range 2019-04-11 15:23:48 -04:00
nfs42xdr.c NFSv4.2: Add client support for the generic 'layouterror' RPC call 2019-03-01 16:20:16 -05:00
nfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nfsroot.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nfstrace.c NFS: Add trace events to report non-zero NFS status codes 2019-02-13 12:03:21 -05:00
nfstrace.h NFS: Record task, client ID, and XID in xdr_status trace points 2019-07-09 10:30:25 -04:00
pagelist.c NFS: Clean up writeback code 2019-07-06 14:54:52 -04:00
pnfs_dev.c NFS/flexfiles: Speed up read failover when DSes are down 2019-03-01 22:37:38 -05:00
pnfs_nfs.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
pnfs.c NFS: Clean up writeback code 2019-07-06 14:54:52 -04:00
pnfs.h NFS: Add a helper to return a pointer to the open context of a struct nfs_page 2019-04-25 14:18:15 -04:00
proc.c NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'. 2018-12-19 13:52:46 -05:00
read.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
super.c NFSv4: Add lease_time and lease_expired to 'nfs4:' line of mountstats 2019-07-06 14:54:52 -04:00
symlink.c nfs: pass the correct prototype to read_cache_page 2019-05-09 16:26:57 -04:00
sysctl.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysfs.c NFS: Add sysfs support for per-container identifier 2019-07-06 14:54:49 -04:00
sysfs.h NFS: Add sysfs support for per-container identifier 2019-07-06 14:54:49 -04:00
unlink.c NFS: Fix up documentation warnings 2019-02-20 15:14:21 -05:00
write.c NFS: Clean up writeback code 2019-07-06 14:54:52 -04:00