Combine inode creation with opening a file.
There are six separate objects that are being set up: the backing inode,
dentry and file, and the overlay inode, dentry and file. Cleanup in case
of an error is a bit of a challenge and is difficult to test, so careful
review is needed.
All tmpfile testcases except generic/509 now run/pass, and no regressions
are observed with full xfstests.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Assert that the file object is allocated in a backing_file container
so that file_user_path() could be used to display the user path and
not the backing file's path in /proc/<pid>/maps.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
There is not much in those helpers, but it makes sense to have them
logically next to the backing_file_{read,write}_iter() helpers as they
may grow more common logic in the future.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
In preparation for factoring out some backing file io helpers from
overlayfs, move backing_file_open() into a new file fs/backing-file.c
and header.
Add a MAINTAINERS entry for stackable filesystems and add a Kconfig
FS_STACK which stackable filesystems need to select.
For now, the backing_file struct, the backing_file alloc/free functions
and the backing_file_real_path() accessor remain internal to file_table.c.
We may change that in the future.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
In vfs code, sb_start_write() is usually called after the permission hook
in rw_verify_area(). vfs_iocb_iter_write() is an exception to this rule,
where kiocb_start_write() is called by its callers.
Move kiocb_start_write() from the callers into vfs_iocb_iter_write()
after the rw_verify_area() checks, to make them "start-write-safe".
The semantics of vfs_iocb_iter_write() is changed, so that the caller is
responsible for calling kiocb_end_write() on completion only if async
iocb was queued. The completion handlers of both callers were adapted
to this semantic change.
This is needed for fanotify "pre content" events.
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-14-amir73il@gmail.com
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
All the callers of vfs_iter_write() call file_start_write() just before
calling vfs_iter_write() except for target_core_file's fd_do_rw().
Move file_start_write() from the callers into vfs_iter_write().
fd_do_rw() calls vfs_iter_write() with a non-regular file, so
file_start_write() is a no-op.
This is needed for fanotify "pre content" events.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-11-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
We want to protect concurrent updates of ovl inode size and mtime
(i.e. ovl_copyattr()) from aio completion context.
Punt write aio completion to a workqueue so that we can protect
ovl_copyattr() with a spinlock.
Export sb_init_dio_done_wq(), so that overlayfs can use its own
dio workqueue to punt aio completions.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/8620dfd3-372d-4ae0-aa3f-2fe97dda1bca@kernel.dk/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
If ovl file is opened O_APPEND, the underlying realfile is also
opened O_APPEND, so it makes sense to propagate the IOCB_APPEND flags
on sync writes to realfile, just as we do with aio writes.
Effectively, because sync ovl writes are protected by inode lock,
this change only makes a difference if the realfile is written to (size
extending writes) from underneath overlayfs. The behavior in this case
is undefined, so it is ok if we change the behavior (to fail the ovl
IOCB_APPEND write).
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Overlayfs implements its own function to translate iocb flags into rw
flags, so that they can be passed into another vfs call.
With commit ce71bfea20 ("fs: align IOCB_* flags with RWF_* flags")
Jens created a 1:1 matching between the iocb flags and rw flags,
simplifying the conversion.
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Commit 724768a393 ("ovl: fix incorrect fdput() on aio completion")
took a refcount on real file before submitting aio, but forgot to
avoid clearing FDPUT_FPUT from real.flags stack variable.
This can result in a file reference leak.
Fixes: 724768a393 ("ovl: fix incorrect fdput() on aio completion")
Reported-by: Gil Lev <contact@levgil.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZRKHuAAKCRCRxhvAZXjc
ohOLAQDU9Fxq5UdqCdmsyi/b24XJFZlQhcVIZy2Hrhcor9TiVQEAjuECGlxFPSgj
atVOWLdugDJquiHextqTEMgIecJpNw4=
=uINF
-----END PGP SIGNATURE-----
Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains the usual miscellaneous fixes and cleanups for vfs and
individual fses:
Fixes:
- Revert ki_pos on error from buffered writes for direct io fallback
- Add missing documentation for block device and superblock handling
for changes merged this cycle
- Fix reiserfs flexible array usage
- Ensure that overlayfs sets ctime when setting mtime and atime
- Disable deferred caller completions with overlayfs writes until
proper support exists
Cleanups:
- Remove duplicate initialization in pipe code
- Annotate aio kioctx_table with __counted_by"
* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
overlayfs: set ctime when setting mtime and atime
ntfs3: put resources during ntfs_fill_super()
ovl: disable IOCB_DIO_CALLER_COMP
porting: document superblock as block device holder
porting: document new block device opening order
fs/pipe: remove duplicate "offset" initializer
fs-writeback: do not requeue a clean inode having skipped pages
aio: Annotate struct kioctx_table with __counted_by
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
reiserfs: Replace 1-element array with C99 style flex-array
overlayfs copies the kiocb flags when it sets up a new kiocb to handle
a write, but it doesn't properly support dealing with the deferred
caller completions of the kiocb. This means it doesn't get the final
write completion value, and hence will complete the write with '0' as
the result.
We could support the caller completions in overlayfs, but for now let's
just disable them in the generated write kiocb.
Reported-by: Zorro Lang <zlang@redhat.com>
Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
Fixes: 8c052fb300 ("iomap: support IOCB_DIO_CALLER_COMP")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <71897125-e570-46ce-946a-d4729725e28f@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
ovl_{read,write}_iter() always call fdput(real) to put one or zero
refcounts of the real file, but for aio, whether it was submitted or not,
ovl_aio_put() also calls fdput(), which is not balanced. This is only a
problem in the less common case when FDPUT_FPUT flag is set.
To fix the problem use get_file() to take file refcount and use fput()
instead of fdput() in ovl_aio_put().
Fixes: 2406a307ac ("ovl: implement async IO routines")
Cc: <stable@vger.kernel.org> # v5.6
Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmTu0QoACgkQEVvVyTe/
1WpbzBAAjIZXzhn8KldDpG0muw9JKaSOxM45uhZE1s/2uKsVCyp4k3lubTbxxYO1
S9rUjhF2gSJFOfuSOK/XXEKXyu4MGT7iy7pKswu0k8+AHDDRBksPXJKA/AkhLPUr
vX1pU6aWw2OSn1xdhIgY+F4DveyzYQL/CEoUzFyRPxSB0G/yjktRAjdZ2HL4cAvN
eVXPyTj0bd4LVj1ITla4uj8DbgivrqmRJbZ9bKnSRE8GXWBriJhV//M2Q3QRno+W
04TtAvyh+klQeqZFVOQ0reZUFZzYBBZZTmqoFiUzTny7oljWl5F0+JfJOHhRGknG
LYZCia34+T6TZPhOnZzT/szTDoXVvNJhEf+vBQCqhaCugqJc/2uJdw9CW8ZcDvA9
ZNOMxEbXE4VgGjJ0HM6MoDMUoIEUiNWEnXWEaKyCAfOPqgYwPy+QeDO4JtBPQpRn
fwZx7Xpc1FLpTc9feHxzox9o81S8rPRMycUBg2c3KZB6TFnYNDxWIIo365naMCzz
A8IDVGf+gd+S4NaZvh9FUijciIslYfyFgqwQERZmJnpDk1d1NyeUC7Nn7EkmUpyp
guRaC+rUcqYP4CpuSHTCPle94qHqiAkbsKSJWebZ2M1j9fjZ+okPw0k83Nih79vu
vRhs70Ah51v1lpBb0mlDjsV3vKm3Apv8nMJKZvVuC+Cw6Qiob5s=
=F4Hi
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
- add verification feature needed by composefs (Alexander Larsson)
- improve integration of overlayfs and fanotify (Amir Goldstein)
- fortify some overlayfs code (Andrea Righi)
* tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: validate superblock in OVL_FS()
ovl: make consistent use of OVL_FS()
ovl: Kconfig: introduce CONFIG_OVERLAY_FS_DEBUG
ovl: auto generate uuid for new overlay filesystems
ovl: store persistent uuid/fsid with uuid=on
ovl: add support for unique fsid per instance
ovl: support encoding non-decodable file handles
ovl: Handle verity during copy-up
ovl: Validate verity xattr when resolving lowerdata
ovl: Add versioned header for overlay.metacopy xattr
ovl: Add framework for verity support
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTxQAKCRCRxhvAZXjc
okaVAP94WAlItvDRt/z2Wtzf0+RqPZeTXEdGTxua8+RxqCyYIQD+OO5nRfKQPHlV
AqqGJMKItQMSMIYgB5ftqVhNWZfnHgM=
=pSEW
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual filesystems.
Features:
- Block mode changes on symlinks and rectify our broken semantics
- Report file modifications via fsnotify() for splice
- Allow specifying an explicit timeout for the "rootwait" kernel
command line option. This allows to timeout and reboot instead of
always waiting indefinitely for the root device to show up
- Use synchronous fput for the close system call
Cleanups:
- Get rid of open-coded lockdep workarounds for async io submitters
and replace it all with a single consolidated helper
- Simplify epoll allocation helper
- Convert simple_write_begin and simple_write_end to use a folio
- Convert page_cache_pipe_buf_confirm() to use a folio
- Simplify __range_close to avoid pointless locking
- Disable per-cpu buffer head cache for isolated cpus
- Port ecryptfs to kmap_local_page() api
- Remove redundant initialization of pointer buf in pipe code
- Unexport the d_genocide() function which is only used within core
vfs
- Replace printk(KERN_ERR) and WARN_ON() with WARN()
Fixes:
- Fix various kernel-doc issues
- Fix refcount underflow for eventfds when used as EFD_SEMAPHORE
- Fix a mainly theoretical issue in devpts
- Check the return value of __getblk() in reiserfs
- Fix a racy assert in i_readcount_dec
- Fix integer conversion issues in various functions
- Fix LSM security context handling during automounts that prevented
NFS superblock sharing"
* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
cachefiles: use kiocb_{start,end}_write() helpers
ovl: use kiocb_{start,end}_write() helpers
aio: use kiocb_{start,end}_write() helpers
io_uring: use kiocb_{start,end}_write() helpers
fs: create kiocb_{start,end}_write() helpers
fs: add kerneldoc to file_{start,end}_write() helpers
io_uring: rename kiocb_end_write() local helper
splice: Convert page_cache_pipe_buf_confirm() to use a folio
libfs: Convert simple_write_begin and simple_write_end to use a folio
fs/dcache: Replace printk and WARN_ON by WARN
fs/pipe: remove redundant initialization of pointer buf
fs: Fix kernel-doc warnings
devpts: Fix kernel-doc warnings
doc: idmappings: fix an error and rephrase a paragraph
init: Add support for rootwait timeout parameter
vfs: fix up the assert in i_readcount_dec
fs: Fix one kernel-doc comment
docs: filesystems: idmappings: clarify from where idmappings are taken
fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
...
Use helpers instead of the open coded dance to silence lockdep warnings.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <20230817141337.1025891-7-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The new digest field in the metacopy xattr is used during lookup to
record whether the header contained a digest in the OVL_HAS_DIGEST
flags.
When accessing file data the first time, if OVL_HAS_DIGEST is set, we
reload the metadata and check that the source lowerdata inode matches
the specified digest in it (according to the enabled verity
options). If the verity check passes we store this info in the inode
flags as OVL_VERIFIED_DIGEST, so that we can avoid doing it again if
the inode remains in memory.
The verification is done in ovl_maybe_validate_verity() which needs to
be called in the same places as ovl_maybe_lookup_lowerdata(), so there
is a new ovl_verify_lowerdata() helper that calls these in the right
order, and all current callers of ovl_maybe_lookup_lowerdata() are
changed to call it instead.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-64-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmSTFh8ACgkQEVvVyTe/
1Wpy5g/+NTL00ptBHHnEZJG7rRu+PzvjB+gzI8YpMT4kiGkJBNKZIxvgURvisDJ/
coOXySq0yk0y901r2Rb+hP8YSYcSo8nrLV+jJwlZcmFsudtbSbOdUUJiAUjqGZk2
YxiCl9NrAZJMVQj6AfL1LE0L00NmIxXruGiCXWEnbHjYvmy/LIsLOg9b+vlkPPZM
xqxQq//hVZel5Lh6rmz9svJIqFF/e3zobX7NBV0MiFGHX926u2TV0ImAnjmhuxdV
mZ8zKCQ4kGTU++PjOCyKPS+xF9eg6mlCVW9uq/QIN83muhwaUlOimJEBunbNxBQz
if/OWxSMZv5UFXqYbF6YsHQrWupCV2uEcs8ypcGNsPX3AeX9i8JTGdVWMPax+4HO
IMZ7Ltxf8Udihsn4s9n0ZMgSDbMTcPRFMNrHXtWU6FuEEjbP5SuD1KREJrh58myL
I99/0tOICAx3ZRueZNmqz4BJ9zev+86PtPNDqHbZy2qrwBn7C6ly3AFWCCv32mza
ehINgUv/TtzUBKqkbhznrjinrv5upV3CV9xg4L8ohhuVeoa8h3kGf8H5XSfnhY56
x1DNWAZK+yZpLf/I39hrWwSME0T1D3GEuI2NlPmAKpqZSgwYdhSAcfebfQhmwD3Q
B3qY3Dh74dsN9zW9oygcXXtHHizzsomOc/qXkXvYP4MXkQi0+h0=
=KPxq
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs update from Amir Goldstein:
- fix two NULL pointer deref bugs (Zhihao Cheng)
- add support for "data-only" lower layers destined to be used by
composefs
- port overlayfs to the new mount api (Christian Brauner)
* tag 'ovl-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: (26 commits)
ovl: add Amir as co-maintainer
ovl: reserve ability to reconfigure mount options with new mount api
ovl: modify layer parameter parsing
ovl: port to new mount api
ovl: factor out ovl_parse_options() helper
ovl: store enum redirect_mode in config instead of a string
ovl: pass ovl_fs to xino helpers
ovl: clarify ovl_get_root() semantics
ovl: negate the ofs->share_whiteout boolean
ovl: check type and offset of struct vfsmount in ovl_entry
ovl: implement lazy lookup of lowerdata in data-only layers
ovl: prepare for lazy lookup of lowerdata inode
ovl: prepare to store lowerdata redirect for lazy lowerdata lookup
ovl: implement lookup in data-only layers
ovl: introduce data-only lower layers
ovl: remove unneeded goto instructions
ovl: deduplicate lowerdata and lowerstack[]
ovl: deduplicate lowerpath and lowerstack[]
ovl: move ovl_entry into ovl_inode
ovl: factor out ovl_free_entry() and ovl_stack_*() helpers
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8QgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpupIEADKEZvpxDyaxHjYZFFeoSJRkh+AEJHe0Xtr
J5vUL8t8zmAV3F7i8XaoAEcR0dC0VQcoTc8fAOty71+5hsc7gvtyyNjqU/YWRVqK
Xr+VJuSJ+OGx3MzpRWEkepagfPyqP5cyyCOK6gqIgqzc3IwqkR/3QHVRc6oR8YbY
AQd7tqm2fQXK9WDHEy5hcaQeqb9uKZjQQoZejpPPerpJM+9RMgKxpCGtnLLIUhr/
sgl7KyLIQPBmveO2vfOR+dmsJBqsLqneqkXDKMAIfpeVEEkHHAlCH4E5Ne1XUS+s
ie4If+reuyn1Ktt5Ry1t7w2wr8cX1fcay3K28tgwjE2Bvremc5YnYgb3pyUDW38f
tXXkpg/eTXd/Pn0Crpagoa9zJ927tt5JXIO1/PagPEP1XOqUuthshDFsrVqfqbs+
36gqX2JWB4NJTg9B9KBHA3+iVCJyZLjUqOqws7hOJOvhQytZVm/IwkGBg1Slhe1a
J5WemBlqX8lTgXz0nM7cOhPYTZeKe6hazCcb5VwxTUTj9SGyYtsMfqqTwRJO9kiF
j1VzbOAgExDYe+GvfqOFPh9VqZho66+DyOD/Xtca4eH7oYyHSmP66o8nhRyPBPZA
maBxQhUkPQn4/V/0fL2TwIdWYKsbj8bUyINKPZ2L35YfeICiaYIctTwNJxtRmItB
M3VxWD3GZQ==
=KhW4
-----END PGP SIGNATURE-----
Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux
Pull splice updates from Jens Axboe:
"This kills off ITER_PIPE to avoid a race between truncate,
iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio
with unpinned/unref'ed pages from an O_DIRECT splice read. This causes
memory corruption.
Instead, we either use (a) filemap_splice_read(), which invokes the
buffered file reading code and splices from the pagecache into the
pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads
into it and then pushes the filled pages into the pipe; or (c) handle
it in filesystem-specific code.
Summary:
- Rename direct_splice_read() to copy_splice_read()
- Simplify the calculations for the number of pages to be reclaimed
in copy_splice_read()
- Turn do_splice_to() into a helper, vfs_splice_read(), so that it
can be used by overlayfs and coda to perform the checks on the
lower fs
- Make vfs_splice_read() jump to copy_splice_read() to handle
direct-I/O and DAX
- Provide shmem with its own splice_read to handle non-existent pages
in the pagecache. We don't want a ->read_folio() as we don't want
to populate holes, but filemap_get_pages() requires it
- Provide overlayfs with its own splice_read to call down to a lower
layer as overlayfs doesn't provide ->read_folio()
- Provide coda with its own splice_read to call down to a lower layer
as coda doesn't provide ->read_folio()
- Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs
and random files as they just copy to the output buffer and don't
splice pages
- Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3,
ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation
- Make cifs use filemap_splice_read()
- Replace pointers to generic_file_splice_read() with pointers to
filemap_splice_read() as DIO and DAX are handled in the caller;
filesystems can still provide their own alternate ->splice_read()
op
- Remove generic_file_splice_read()
- Remove ITER_PIPE and its paraphernalia as generic_file_splice_read
was the only user"
* tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits)
splice: kdoc for filemap_splice_read() and copy_splice_read()
iov_iter: Kill ITER_PIPE
splice: Remove generic_file_splice_read()
splice: Use filemap_splice_read() instead of generic_file_splice_read()
cifs: Use filemap_splice_read()
trace: Convert trace/seq to use copy_splice_read()
zonefs: Provide a splice-read wrapper
xfs: Provide a splice-read wrapper
orangefs: Provide a splice-read wrapper
ocfs2: Provide a splice-read wrapper
ntfs3: Provide a splice-read wrapper
nfs: Provide a splice-read wrapper
f2fs: Provide a splice-read wrapper
ext4: Provide a splice-read wrapper
ecryptfs: Provide a splice-read wrapper
ceph: Provide a splice-read wrapper
afs: Provide a splice-read wrapper
9p: Add splice_read wrapper
net: Make sock_splice_read() use copy_splice_read() by default
tty, proc, kernfs, random: Use copy_splice_read()
...
Overlayfs creates the real underlying files with fake f_path, whose
f_inode is on the underlying fs and f_path on overlayfs.
Those real files were open with FMODE_NONOTIFY, because fsnotify code was
not prapared to handle fsnotify hooks on files with fake path correctly
and fanotify would report unexpected event->fd with fake overlayfs path,
when the underlying fs was being watched.
Teach fsnotify to handle events on the real files, and do not set real
files to FMODE_NONOTIFY to allow operations on real file (e.g. open,
access, modify, close) to generate async and permission events.
Because fsnotify does not have notifications on address space
operations, we do not need to worry about ->vm_file not reporting
events to a watched overlayfs when users are accessing a mapped
overlayfs file.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230615112229.2143178-6-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Overlayfs uses open_with_fake_path() to allocate internal kernel files,
with a "fake" path - whose f_path is not on the same fs as f_inode.
Allocate a container struct backing_file for those internal files, that
is used to hold the "fake" ovl path along with the real path.
backing_file_real_path() can be used to access the stored real path.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230615112229.2143178-5-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Defer lookup of lowerdata in the data-only layers to first data access
or before copy up.
We perform lowerdata lookup before copy up even if copy up is metadata
only copy up. We can further optimize this lookup later if needed.
We do best effort lazy lookup of lowerdata for d_real_inode(), because
this interface does not expect errors. The only current in-tree caller
of d_real_inode() is trace_uprobe and this caller is likely going to be
followed reading from the file, before placing uprobes on offset within
the file, so lowerdata should be available when setting the uprobe.
Tested-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Make the code handle the case of numlower > 1 and missing lowerdata
dentry gracefully.
Missing lowerdata dentry is an indication for lazy lookup of lowerdata
and in that case the lowerdata_redirect path is stored in ovl_inode.
Following commits will defer lookup and perform the lazy lookup on
access.
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Implement splice-read for overlayfs by passing the request down a layer
rather than going through generic_file_splice_read() which is going to be
changed to assume that ->read_folio() is present on buffered files.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Christian Brauner <brauner@kernel.org>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Amir Goldstein <amir73il@gmail.com>
cc: linux-unionfs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-11-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCY5b3+gAKCRDh3BK/laaZ
PIPxAQCPgyV/X/yJFd3wVgKa3/JxcHl5qdPbwHXFuYiJCBd69QEA9LYQEeEoTLCY
veGiQPkl6Sp8ZqmTbDBxqw5OaBTSMwM=
=7TiE
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Fix a couple of bugs found by syzbot
- Don't ingore some open flags set by fcntl(F_SETFL)
- Fix failure to create a hard link in certain cases
- Use type safe helpers for some mnt_userns transformations
- Improve performance of mount
- Misc cleanups
* tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: Kconfig: Fix spelling mistake "undelying" -> "underlying"
ovl: use inode instead of dentry where possible
ovl: Add comment on upperredirect reassignment
ovl: use plain list filler in indexdir and workdir cleanup
ovl: do not reconnect upper index records in ovl_indexdir_cleanup()
ovl: fix comment typos
ovl: port to vfs{g,u}id_t and associated helpers
ovl: Use ovl mounter's fsuid and fsgid in ovl_link()
ovl: Use "buf" flexible array for memcpy() destination
ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags
ovl: fix use inode directly in rcu-walk mode
ovl_change_flags() is an open-coded variant of fs/fcntl.c:setfl() and it
got missed by commit 164f4064ca ("keep iocb_flags() result cached in
struct file"); the same change applies there.
Reported-by: Pierre Labastie <pierre.labastie@neuf.fr>
Fixes: 164f4064ca ("keep iocb_flags() result cached in struct file")
Cc: <stable@vger.kernel.org> # v6.0
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216738
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Underlying fs doesn't remove privs because fallocate is called with
privileged mounter credentials.
This fixes some failure in fstests generic/683..687.
Fixes: aab8848cee ("ovl: add ovl_fallocate()")
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Underlying fs doesn't remove privs because copy_range/remap_range are
called with privileged mounter credentials.
This fixes some failures in fstest generic/673.
Fixes: 8ede205541 ("ovl: add reflink/copyfile/dedup support")
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYo+nGwAKCRDh3BK/laaZ
PBouAP0VBH/jygclzc42jlRkKjp+wJnF1FifpWOJEtTPiYqhtAD/UWjR/2Sy4TMT
fRsw9N9/FXxcXShjg3U42fpCNSVEqgM=
=oY4z
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
- Support idmapped layers in overlayfs (Christian Brauner)
- Add a fix to exportfs that is relevant to open_by_handle_at(2) as
well
- Introduce new lookup helpers that allow passing mnt_userns into
inode_permission()
* tag 'ovl-update-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: support idmapped layers
ovl: handle idmappings in ovl_xattr_{g,s}et()
ovl: handle idmappings in layer open helpers
ovl: handle idmappings in ovl_permission()
ovl: use ovl_copy_{real,upper}attr() wrappers
ovl: store lower path in ovl_inode
ovl: handle idmappings for layer lookup
ovl: handle idmappings for layer fileattrs
ovl: use ovl_path_getxattr() wrapper
ovl: use ovl_lookup_upper() wrapper
ovl: use ovl_do_notify_change() wrapper
ovl: pass layer mnt to ovl_open_realfile()
ovl: pass ofs to setattr operations
ovl: handle idmappings in creation operations
ovl: add ovl_upper_mnt_userns() wrapper
ovl: pass ofs to creation operations
ovl: use wrappers to all vfs_*xattr() calls
exportfs: support idmapped mounts
fs: add two trivial lookup helpers
Currently various places test if direct IO is possible on a file by
checking for the existence of the direct_IO address space operation.
This is a poor choice, as the direct_IO operation may not be used - it is
only used if the generic_file_*_iter functions are called for direct IO
and some filesystems - particularly NFS - don't do this.
Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the
various places to check this (avoiding pointer dereferences).
do_dentry_open() will set this flag if ->direct_IO is present, so
filesystems do not need to be changed.
NFS *is* changed, to set the flag explicitly and discard the direct_IO
entry in the address_space_operations for files.
Other filesystems which currently use noop_direct_IO could usefully be
changed to set this flag instead.
Link: https://lkml.kernel.org/r/164859778128.29473.15189737957277399416.stgit@noble.brown
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In earlier patches we already passed down the relevant upper or lower
path to ovl_open_realfile(). Now let the open helpers actually take the
idmapping of the relevant mount into account when checking permissions.
This is needed to support idmapped base layers with overlay.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When copying inode attributes from the upper or lower layer to ovl inodes
we need to take the upper or lower layer's mount's idmapping into
account. In a lot of places we call ovl_copyattr() only on upper inodes and
in some we call it on either upper or lower inodes. Split this into two
separate helpers.
The first one should only be called on upper
inodes and is thus called ovl_copy_upperattr(). The second one can be
called on upper or lower inodes. We add ovl_copy_realattr() for this
task. The new helper makes use of the previously added ovl_i_path_real()
helper. This is needed to support idmapped base layers with overlay.
When overlay copies the inode information from an upper or lower layer
to the relevant overlay inode it will apply the idmapping of the upper
or lower layer when doing so. The ovl inode ownership will thus always
correctly reflect the ownership of the idmapped upper or lower layer.
All idmapping helpers are nops when no idmapped base layers are used.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Ensure that ovl_open_realfile() takes the mount's idmapping into
account. We add a new helper ovl_path_realdata() that can be used to
easily retrieve the relevant path which we can pass down. This is needed
to support idmapped base layers with overlay.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYYk9CAAKCRDh3BK/laaZ
PNk0AP4l7/V9QdvJ3kwzSsADZaw2859H8oIw0unJF4RaGYY/IwD/QUnj20vMpyN5
HbcyOCKdZfE6egtMKHik10ltOV8ZQQY=
=LgJR
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
- Fix a regression introduced in the last cycle
- Fix a use-after-free in the AIO path
- Fix a bogus warning reported by syzbot
* tag 'ovl-update-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix filattr copy-up failure
ovl: fix warning in ovl_create_real()
ovl: fix use after free in struct ovl_aio_req
Example for triggering use after free in a overlay on ext4 setup:
aio_read
ovl_read_iter
vfs_iter_read
ext4_file_read_iter
ext4_dio_read_iter
iomap_dio_rw -> -EIOCBQUEUED
/*
* Here IO is completed in a separate thread,
* ovl_aio_cleanup_handler() frees aio_req which has iocb embedded
*/
file_accessed(iocb->ki_filp); /**BOOM**/
Fix by introducing a refcount in ovl_aio_req similarly to aio_kiocb. This
guarantees that iocb is only freed after vfs_read/write_iter() returns on
underlying fs.
Fixes: 2406a307ac ("ovl: implement async IO routines")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20210930032228.3199690-3-yangerkun@huawei.com/
Cc: <stable@vger.kernel.org> # v5.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.
Now that everybody is passing in zero, kill off the extra argument.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Normally the check at open time suffices, but e.g loop device does set
IOCB_DIRECT after doing its own checks (which are not sufficent for
overlayfs).
Make sure we don't call the underlying filesystem read/write method with
the IOCB_DIRECT if it's not supported.
Reported-by: Huang Jianan <huangjianan@oppo.com>
Fixes: 16914e6fc7 ("ovl: add ovl_read_iter()")
Cc: <stable@vger.kernel.org> # v4.19
Tested-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There's possibility of an ABBA deadlock in case of a splice write to an
overlayfs file and a concurrent splice write to a corresponding real file.
The call chain for splice to an overlay file:
-> do_splice [takes sb_writers on overlay file]
-> do_splice_from
-> iter_file_splice_write [takes pipe->mutex]
-> vfs_iter_write
...
-> ovl_write_iter [takes sb_writers on real file]
And the call chain for splice to a real file:
-> do_splice [takes sb_writers on real file]
-> do_splice_from
-> iter_file_splice_write [takes pipe->mutex]
Syzbot successfully bisected this to commit 82a763e61e ("ovl: simplify
file splice").
Fix by reverting the write part of the above commit and by adding missing
bits from ovl_write_iter() into ovl_splice_write().
Fixes: 82a763e61e ("ovl: simplify file splice")
Reported-and-tested-by: syzbot+579885d1a9a833336209@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYIwTsgAKCRDh3BK/laaZ
PDktAP41eScbCiFzXDRjXw9S7Wfd8HEct0y1p+9BUh8m3VdHfwEA0pDlJWNaJdYW
nFixPJ5GsAfxo+1ags0vn06CUS/K4gA=
=QlbJ
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Fix a regression introduced in 5.2 that resulted in valid overlayfs
mounts being rejected with ELOOP (Too many levels of symbolic links)
- Fix bugs found by various tools
- Miscellaneous improvements and cleanups
* tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: add debug print to ovl_do_getxattr()
ovl: invalidate readdir cache on changes to dir with origin
ovl: allow upperdir inside lowerdir
ovl: show "userxattr" in the mount data
ovl: trivial typo fixes in the file inode.c
ovl: fix misspellings using codespell tool
ovl: do not copy attr several times
ovl: remove ovl_map_dev_ino() return value
ovl: fix error for ovl_fill_super()
ovl: fix missing revert_creds() on error path
ovl: fix leaked dentry
ovl: restrict lower null uuid for "xino=auto"
ovl: check that upperdir path is not on a read-only mount
ovl: plumb through flush method
Pull fileattr conversion updates from Miklos Szeredi via Al Viro:
"This splits the handling of FS_IOC_[GS]ETFLAGS from ->ioctl() into a
separate method.
The interface is reasonably uniform across the filesystems that
support it and gives nice boilerplate removal"
* 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
ovl: remove unneeded ioctls
fuse: convert to fileattr
fuse: add internal open/release helpers
fuse: unsigned open flags
fuse: move ioctl to separate source file
vfs: remove unused ioctl helpers
ubifs: convert to fileattr
reiserfs: convert to fileattr
ocfs2: convert to fileattr
nilfs2: convert to fileattr
jfs: convert to fileattr
hfsplus: convert to fileattr
efivars: convert to fileattr
xfs: convert to fileattr
orangefs: convert to fileattr
gfs2: convert to fileattr
f2fs: convert to fileattr
ext4: convert to fileattr
ext2: convert to fileattr
btrfs: convert to fileattr
...
mmap_region() now calls fput() on the vma->vm_file.
Fix this by using vma_set_file() so it doesn't need to be handled
manually here any more.
Link: https://lkml.kernel.org/r/20210421132012.82354-2-christian.koenig@amd.com
Fixes: 1527f926fd ("mm: mmap: fix fput in error path v2")
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: <stable@vger.kernel.org> [5.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The FS_IOC_[GS]ETFLAGS/FS_IOC_FS[GS]ETXATTR ioctls are now handled via the
fileattr api. The only unconverted filesystem remaining is CIFS and it is
not allowed to be overlayed due to case insensitive filenames.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>