linux/include
Shiyang Ruan fa422b353d mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind
Now, if we suddenly remove a PMEM device(by calling unbind) which
contains FSDAX while programs are still accessing data in this device,
e.g.:
```
 $FSSTRESS_PROG -d $SCRATCH_MNT -n 99999 -p 4 &
 # $FSX_PROG -N 1000000 -o 8192 -l 500000 $SCRATCH_MNT/t001 &
 echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
```
it could come into an unacceptable state:
  1. device has gone but mount point still exists, and umount will fail
       with "target is busy"
  2. programs will hang and cannot be killed
  3. may crash with NULL pointer dereference

To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we
are going to remove the whole device, and make sure all related processes
could be notified so that they could end up gracefully.

This patch is inspired by Dan's "mm, dax, pmem: Introduce
dev_pagemap_failure()"[1].  With the help of dax_holder and
->notify_failure() mechanism, the pmem driver is able to ask filesystem
on it to unmap all files in use, and notify processes who are using
those files.

Call trace:
trigger unbind
 -> unbind_store()
  -> ... (skip)
   -> devres_release_all()
    -> kill_dax()
     -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
      -> xfs_dax_notify_failure()
      `-> freeze_super()             // freeze (kernel call)
      `-> do xfs rmap
      ` -> mf_dax_kill_procs()
      `  -> collect_procs_fsdax()    // all associated processes
      `  -> unmap_and_kill()
      ` -> invalidate_inode_pages2_range() // drop file's cache
      `-> thaw_super()               // thaw (both kernel & user call)

Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
new dax mapping from being created.  Do not shutdown filesystem directly
if configuration is not supported, or if failure range includes metadata
area.  Make sure all files and processes(not only the current progress)
are handled correctly.  Also drop the cache of associated files before
pmem is removed.

[1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
[2]: https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-07 14:34:26 +05:30
..
acpi ACPI: PM: Add acpi_device_fix_up_power_children() function 2023-11-20 17:31:49 +01:00
asm-generic asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation 2023-11-22 09:32:49 -08:00
clocksource
crypto crypto: FIPS 202 SHA-3 register in hash info for IMA 2023-10-27 18:04:30 +08:00
drm amd-drm-fixes-6.7-2023-11-30: 2023-12-01 13:57:11 +10:00
dt-bindings linux-watchdog 6.7-rc1 tag 2023-11-09 13:54:25 -08:00
keys
kunit
kvm KVM/arm64 updates for 6.7 2023-10-31 16:37:07 -04:00
linux mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind 2023-12-07 14:34:26 +05:30
math-emu
media
memory
misc
net wireless fixes: 2023-11-29 19:43:34 -08:00
pcmcia
ras
rdma
rv
scsi scsi: sd: Fix system start for ATA devices 2023-11-24 20:44:21 -05:00
soc IOMMU Updates for Linux v6.7 2023-11-09 13:37:28 -08:00
sound ALSA: cs35l41: Fix for old systems which do not support command 2023-11-20 12:37:01 +01:00
target
trace rxrpc: Fix RTT determination to use any ACK as a source 2023-11-17 02:50:33 +00:00
uapi hardening fixes for v6.7-rc4 2023-12-01 14:17:54 +09:00
ufs
vdso
video fbdev: stifb: Make the STI next font pointer a 32-bit signed offset 2023-10-30 14:54:41 +01:00
xen xen/events: reduce externally visible helper functions 2023-11-14 09:29:28 +01:00