linux/fs/netfs
Max Kellermann f71aa06398
fs/netfs/fscache_cookie: add missing "n_accesses" check
This fixes a NULL pointer dereference bug due to a data race which
looks like this:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43
  Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
  Workqueue: events_unbound netfs_rreq_write_to_cache_work
  RIP: 0010:cachefiles_prepare_write+0x30/0xa0
  Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10
  RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286
  RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000
  RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438
  RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001
  R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68
  R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00
  FS:  0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0
  Call Trace:
   <TASK>
   ? __die+0x1f/0x70
   ? page_fault_oops+0x15d/0x440
   ? search_module_extables+0xe/0x40
   ? fixup_exception+0x22/0x2f0
   ? exc_page_fault+0x5f/0x100
   ? asm_exc_page_fault+0x22/0x30
   ? cachefiles_prepare_write+0x30/0xa0
   netfs_rreq_write_to_cache_work+0x135/0x2e0
   process_one_work+0x137/0x2c0
   worker_thread+0x2e9/0x400
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xcc/0x100
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x50
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30
   </TASK>
  Modules linked in:
  CR2: 0000000000000008
  ---[ end trace 0000000000000000 ]---

This happened because fscache_cookie_state_machine() was slow and was
still running when another process invoked fscache_unuse_cookie().
That led to a fscache_cookie_lru_do_one() call, which set the
FSCACHE_COOKIE_DO_LRU_DISCARD flag.  The still-running
fscache_cookie_state_machine() picked up the flag and withdrew the
cookie via cachefiles_withdraw_cookie(), which cleared
cookie->cache_priv.

At the same time, yet another process invoked
cachefiles_prepare_write(), which got a NULL pointer back from this
line:

  struct cachefiles_object *object = cachefiles_cres_object(cres);

The next line dereferences that NULL pointer and crashes:

  struct cachefiles_cache *cache = object->volume->cache;
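
The register dump is consistent with this reading: RBP is 0, CR2 is 8,
and the faulting instruction bytes "48 8b 45 08" decode to a load from
8(%rbp).  A minimal userspace sketch of why a NULL object faults at
address 8, assuming a simplified layout in which "volume" is the second
pointer-sized member (the model_* names are hypothetical and are not
the kernel's definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the cachefiles structures. */
struct model_volume {
	void *cache;
};

struct model_object {
	void *cookie;                /* assumed first member */
	struct model_volume *volume; /* assumed second member */
};

/*
 * Reading object->volume through a NULL object accesses address
 * 0 + offsetof(struct model_object, volume).
 */
size_t model_fault_address(void)
{
	return offsetof(struct model_object, volume);
}
```

On a 64-bit build this returns 8, which matches the reported fault
address under the layout assumed above.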

During cachefiles_prepare_write(), the "n_accesses" counter is
non-zero (via fscache_begin_operation()).  The cookie must not be
withdrawn until it drops to zero.

The counter is checked by fscache_cookie_state_machine() before
switching to FSCACHE_COOKIE_STATE_RELINQUISHING and
FSCACHE_COOKIE_STATE_WITHDRAWING (in "case
FSCACHE_COOKIE_STATE_FAILED"), but not for
FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case
FSCACHE_COOKIE_STATE_ACTIVE").

This patch adds the missing check.  With a non-zero access counter,
the function returns and the next fscache_end_cookie_access() call
will queue another fscache_cookie_state_machine() call to handle the
still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
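
The deferral described above can be sketched in userspace as follows.
This is a single-threaded model, not the kernel code: every model_*
name is hypothetical, the flag is a plain bool rather than a bit in
cookie->flags, and model_end_access() calls the state machine directly
where the kernel would queue work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state { MODEL_ACTIVE, MODEL_LRU_DISCARDING };

struct model_cookie {
	atomic_int n_accesses;  /* in-flight cache operations */
	bool do_lru_discard;    /* stands in for FSCACHE_COOKIE_DO_LRU_DISCARD */
	enum model_state state;
	void *cache_priv;       /* cleared when the cookie is withdrawn */
};

void model_cookie_init(struct model_cookie *c, void *priv)
{
	atomic_init(&c->n_accesses, 0);
	c->do_lru_discard = false;
	c->state = MODEL_ACTIVE;
	c->cache_priv = priv;
}

void model_state_machine(struct model_cookie *c)
{
	if (c->state != MODEL_ACTIVE || !c->do_lru_discard)
		return;

	/* The added check: still being accessed, so postpone the discard
	 * and leave the flag set for the next run. */
	if (atomic_load(&c->n_accesses) != 0)
		return;

	c->do_lru_discard = false;
	c->state = MODEL_LRU_DISCARDING;
	c->cache_priv = NULL;   /* models the withdrawal */
}

void model_begin_access(struct model_cookie *c)
{
	atomic_fetch_add(&c->n_accesses, 1);
}

void model_end_access(struct model_cookie *c)
{
	/* atomic_fetch_sub() returns the old value: 1 means we just hit 0. */
	if (atomic_fetch_sub(&c->n_accesses, 1) == 1)
		model_state_machine(c);
}
```

In this model, a discard requested while an access is outstanding
leaves the cookie active with cache_priv intact; the discard only
proceeds once the last access ends.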

Fixes: 12bb21a29c ("fscache: Implement cookie user counting and resource pinning")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20240729162002.3436763-2-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12 22:03:26 +02:00
buffered_read.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
buffered_write.c netfs: Fault in smaller chunks for non-large folio mappings 2024-08-12 22:03:25 +02:00
direct_read.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
direct_write.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
fscache_cache.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
fscache_cookie.c fs/netfs/fscache_cookie: add missing "n_accesses" check 2024-08-12 22:03:26 +02:00
fscache_internal.h netfs, fscache: Combine fscache with netfs 2023-12-24 15:08:46 +00:00
fscache_io.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
fscache_main.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
fscache_proc.c netfs: Fix proc/fs/fscache symlink to point to "netfs" not "../netfs" 2024-01-04 13:15:32 +00:00
fscache_stats.c netfs: Fix interaction between write-streaming and cachefiles culling 2024-01-05 15:42:25 +00:00
fscache_volume.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
internal.h netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
io.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
iterator.c netfs: Add func to calculate pagecount/size-limited span of an iterator 2023-12-28 09:45:18 +00:00
Kconfig netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG 2024-07-24 10:15:38 +02:00
locking.c netfs: Implement unbuffered/DIO vs buffered I/O locking 2023-12-24 15:08:52 +00:00
main.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
Makefile netfs: Cut over to using new writeback code 2024-05-01 18:07:37 +01:00
misc.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
objects.c netfs, 9p: Fix race between umount and async request completion 2024-05-27 13:12:13 +02:00
stats.c netfs: Add some write-side stats and clean up some stat names 2024-05-01 18:07:36 +01:00
write_collect.c netfs: Revert "netfs: Switch debug logging to pr_debug()" 2024-07-24 10:15:37 +02:00
write_issue.c netfs: Fix writeback that needs to go to both server and cache 2024-07-24 10:53:13 +02:00