linux/fs
Ross Zwisler 642261ac99 dax: add struct iomap based DAX PMD support
DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
locking.  This patch allows DAX PMDs to participate in the DAX radix tree
based locking scheme so that they can be re-enabled using the new struct
iomap based fault handlers.

There are currently three types of DAX 4k entries: 4k zero pages, 4k DAX
mappings that have an associated block allocation, and 4k DAX empty
entries.  The empty entries exist to provide locking for the duration of a
given page fault.

This patch adds three equivalent 2MiB DAX entries: Huge Zero Page (HZP)
entries, PMD DAX entries that have associated block allocations, and 2 MiB
DAX empty entries.

Unlike the 4k case where we insert a struct page* into the radix tree for
4k zero pages, for HZP we insert a DAX exceptional entry with the new
RADIX_DAX_HZP flag set.  This is because we use a single 2 MiB zero page in
every 2MiB hole mapping, and it doesn't make sense to have that same struct
page* with multiple entries in multiple trees.  This would cause contention
on the single page lock for the one Huge Zero Page, and it would break the
page->index and page->mapping associations that are assumed to be valid in
many other places in the kernel.

One difficult use case is when one thread is trying to use 4k entries in
radix tree for a given offset, and another thread is using 2 MiB entries
for that same offset.  The current code handles this by making the 2 MiB
user fall back to 4k entries for most cases.  This was done because it is
the simplest solution, and because the use of 2MiB pages is already
opportunistic.

If we were to try to upgrade from 4k pages to 2MiB pages for a given range,
we run into the problem of how we lock out 4k page faults for the entire
2MiB range while we clean out the radix tree so we can insert the 2MiB
entry.  We can solve this problem if we need to, but I think that the cases
where both 2MiB entries and 4K entries are being used for the same range
will be rare enough and the gain small enough that it probably won't be
worth the complexity.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:34:45 +11:00
..
9p Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
adfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
afs fs: use mapping_set_error instead of opencoded set_bit 2016-10-11 15:06:33 -07:00
autofs4 autofs: refactor ioctl fn vector in iookup_dev_ioctl() 2016-10-11 15:06:31 -07:00
befs befs fixes for 4.9-rc1 2016-10-15 12:09:13 -07:00
bfs Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
btrfs Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-10-14 17:44:56 -07:00
cachefiles Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
cifs CIFS: Retrieve uid and gid from special sid if enabled 2016-10-14 14:22:16 -05:00
coda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
configfs Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
cramfs
crypto Lots of bug fixes and cleanups. 2016-10-07 15:15:33 -07:00
debugfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
devpts Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
dlm dlm: free workqueues after the connections 2016-10-10 09:54:00 -05:00
ecryptfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
efivarfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
efs
exofs fs: use mapping_set_error instead of opencoded set_bit 2016-10-11 15:06:33 -07:00
exportfs exportfs: be careful to only return expected errors. 2016-10-06 09:07:44 -04:00
ext2 dax: correct dax iomap code namespace 2016-11-08 11:32:46 +11:00
ext4 ext4: tell DAX the size of allocation holes 2016-11-08 11:30:58 +11:00
f2fs fs: use mapping_set_error instead of opencoded set_bit 2016-10-11 15:06:33 -07:00
fat Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
freevxfs
fscache Merge branch 'd_real' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into work.misc 2016-06-30 23:34:49 -04:00
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hostfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hugetlbfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
isofs get rid of 'parent' argument of ->d_compare() 2016-07-31 16:37:25 -04:00
jbd2 fs: use mapping_set_error instead of opencoded set_bit 2016-10-11 15:06:33 -07:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
kernfs Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-10-14 12:18:50 -07:00
lockd treewide: remove redundant #include <linux/kconfig.h> 2016-10-11 15:06:33 -07:00
logfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
nfs NFS client updates for Linux 4.9 2016-10-13 21:28:20 -07:00
nfs_common
nfsd Some RDMA work and some good bugfixes, and two new features that could 2016-10-13 21:04:42 -07:00
nilfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
nls
notify fsnotify: clean up spinlock assertions 2016-10-07 18:46:26 -07:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ocfs2 ocfs2: fix memory leak in dlm_migrate_request_handler() 2016-10-11 15:06:30 -07:00
omfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
openpromfs fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
orangefs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
overlayfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-14 18:19:05 -07:00
proc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
pstore Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
qnx4
qnx6
quota quota: fill in Q_XGETQSTAT inode information for inactive quotas 2016-08-15 17:43:31 +02:00
ramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
reiserfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
romfs
squashfs vfs: Remove {get,set,remove}xattr inode operations 2016-10-07 21:48:36 -04:00
sysfs Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-10-14 12:18:50 -07:00
sysv Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
tracefs fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
ubifs This pull request contains: 2016-10-11 10:49:44 -07:00
udf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ufs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
xfs dax: correct dax iomap code namespace 2016-11-08 11:32:46 +11:00
aio.c fs/aio.c: eliminate redundant loads in put_aio_ring_file 2016-09-27 21:45:46 -04:00
anon_inodes.c
attr.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
bad_inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
binfmt_aout.c
binfmt_elf_fdpic.c elf_fdpic_transfer_args_to_stack(): make it generic 2016-07-25 16:51:49 +10:00
binfmt_elf.c x86/coredump: Use pr_reg size, rather that TIF_IA32 flag 2016-09-14 21:28:10 +02:00
binfmt_em86.c fs/binfmt_em86.c: fix incompatible pointer type 2016-08-02 19:35:15 -04:00
binfmt_flat.c binfmt_flat: allow compressed flat binary format to work on MMU systems 2016-07-28 13:29:12 +10:00
binfmt_misc.c fs: Replace current_fs_time() with current_time() 2016-09-27 21:06:22 -04:00
binfmt_script.c
block_dev.c block: implement (some of) fallocate for block devices 2016-10-11 15:06:30 -07:00
buffer.c fs: use mapping_set_error instead of opencoded set_bit 2016-10-11 15:06:33 -07:00
char_dev.c dax: define a unified inode/address_space for device-dax mappings 2016-08-23 22:58:51 -07:00
compat_binfmt_elf.c
compat_ioctl.c fs: compat_ioctl: add pretimeout functions for watchdogs 2016-09-24 09:27:18 +02:00
compat.c compat: remove compat_printk() 2016-09-27 21:20:53 -04:00
coredump.c
dax.c dax: add struct iomap based DAX PMD support 2016-11-08 11:34:45 +11:00
dcache.c Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
dcookies.c
direct-io.c consistent treatment of EFAULT on O_DIRECT read/write 2016-10-03 20:38:55 -04:00
drop_caches.c
eventfd.c
eventpoll.c
exec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2016-08-04 18:04:44 -04:00
fcntl.c
fhandle.c
file_table.c
file.c fs/file: more unsigned file descriptors 2016-09-27 18:47:38 -04:00
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c mm, writeback: flush plugged IO in wakeup_flusher_threads() 2016-08-09 19:58:06 -06:00
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
internal.h Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 13:04:49 -07:00
ioctl.c vfs: cap dedupe request structure size at PAGE_SIZE 2016-09-15 13:29:52 -07:00
iomap.c Merge branch 'iomap-4.9-dax' into for-next 2016-10-03 09:53:59 +11:00
Kconfig mm/hugetlb: introduce ARCH_HAS_GIGANTIC_PAGE 2016-10-07 18:46:29 -07:00
Kconfig.binfmt ARM: 8594/1: enable binfmt_flat on systems with an MMU 2016-08-12 16:47:05 +01:00
libfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
locks.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
Makefile fs: introduce iomap infrastructure 2016-06-21 09:23:11 +10:00
mbcache.c mbcache: fix to detect failure of register_shrinker 2016-08-31 11:44:36 -04:00
mount.h mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
mpage.c block/mm: make bdev_ops->rw_page() take a bool for read/write 2016-08-07 14:41:02 -06:00
namei.c Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-10-14 17:23:33 -07:00
namespace.c This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
no-block.c
nsfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
open.c xfs: reflink update for 4.9-rc1 2016-10-13 20:28:22 -07:00
pipe.c pipe: cap initial pipe capacity according to pipe-max-size limit 2016-10-11 15:06:32 -07:00
pnode.c mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
pnode.h mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
posix_acl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
proc_namespace.c
read_write.c iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector() 2016-10-14 20:00:34 -04:00
readdir.c
select.c fs/select: add vmalloc fallback for select(2) 2016-10-11 15:06:30 -07:00
seq_file.c seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char 2016-10-07 18:46:30 -07:00
signalfd.c
splice.c fix ITER_PIPE interaction with direct_IO 2016-10-10 13:36:06 -04:00
stack.c
stat.c
statfs.c
super.c fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths 2016-10-14 20:41:59 -04:00
sync.c
timerfd.c
userfaultfd.c mm: introduce fault_env 2016-07-26 16:19:19 -07:00
utimes.c Merge remote-tracking branch 'jk/vfs' into work.misc 2016-10-08 11:06:08 -04:00
xattr.c vfs: Remove {get,set,remove}xattr inode operations 2016-10-07 21:48:36 -04:00