Commit Graph

733 Commits

Author SHA1 Message Date
NeilBrown
a2ad63daa8 VFS: add FMODE_CAN_ODIRECT file flag
Currently various places test if direct IO is possible on a file by
checking for the existence of the direct_IO address space operation.
This is a poor choice, as the direct_IO operation may not be used - it is
only used if the generic_file_*_iter functions are called for direct IO
and some filesystems - particularly NFS - don't do this.

Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the
various places to check this (avoiding pointer dereferences).
do_dentry_open() will set this flag if ->direct_IO is present, so
filesystems do not need to be changed.

NFS *is* changed, to set the flag explicitly and discard the direct_IO
entry in the address_space_operations for files.

Other filesystems which currently use noop_direct_IO could usefully be
changed to set this flag instead.

Link: https://lkml.kernel.org/r/164859778128.29473.15189737957277399416.stgit@noble.brown
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09 18:20:49 -07:00
Muchun Song
fd60b28842 fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>		[ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:03 -07:00
Linus Torvalds
24d7f48c72 overlayfs fixes for 5.17-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYfkylwAKCRDh3BK/laaZ
 PLY6AP9rui/UD3RE/koFfRVTTPuZkv7I14mHmCzDloYcmPDJCgEAwebzCiOo22kQ
 Jn3V/B8mmG2vBv+qu+iM3/WbkPqCtgg=
 =ajWT
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "Fix a regression introduced in v5.15, affecting copy up of files with
  'noatime' or 'sync' attributes to a tmpfs upper layer"

* tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: don't fail copy up if no fileattr support on upper
  ovl: fix NULL pointer dereference in copy up warning
2022-02-01 11:23:02 -08:00
Miklos Szeredi
94fd19752b ovl: don't fail copy up if no fileattr support on upper
Christoph Fritz is reporting that failure to copy up fileattr when upper
doesn't support fileattr or xattr results in a regression.

Return success in these failure cases; this reverts overlayfs to the old
behavior.

Add a pr_warn_once() in these cases to still let the user know about the
copy up failures.

Reported-by: Christoph Fritz <chf.fritz@googlemail.com>
Fixes: 72db82115d ("ovl: copy up sync/noatime fileattr flags")
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-01-14 16:57:56 +01:00
Christoph Fritz
4ee7e4a6c9 ovl: fix NULL pointer dereference in copy up warning
This patch is fixing a NULL pointer dereference to get a recently
introduced warning message working.

Fixes: 5b0a414d06 ("ovl: fix filattr copy-up failure")
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-01-14 15:59:22 +01:00
Christian Brauner
bb49e9e730
fs: add is_idmapped_mnt() helper
Multiple places open-code the same check to determine whether a given
mount is idmapped. Introduce a simple helper function that can be used
instead. This allows us to get rid of the fragile open-coding. We will
later change the check that is used to determine whether a given mount
is idmapped. Introducing a helper allows us to do this in a single
place instead of doing it for multiple places.

Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1)
Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2)
Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-03 18:44:06 +01:00
Linus Torvalds
1bdd629e5a overlayfs update for 5.16
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYYk9CAAKCRDh3BK/laaZ
 PNk0AP4l7/V9QdvJ3kwzSsADZaw2859H8oIw0unJF4RaGYY/IwD/QUnj20vMpyN5
 HbcyOCKdZfE6egtMKHik10ltOV8ZQQY=
 =LgJR
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:

 - Fix a regression introduced in the last cycle

 - Fix a use-after-free in the AIO path

 - Fix a bogus warning reported by syzbot

* tag 'ovl-update-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix filattr copy-up failure
  ovl: fix warning in ovl_create_real()
  ovl: fix use after free in struct ovl_aio_req
2021-11-09 10:51:12 -08:00
Miklos Szeredi
5b0a414d06 ovl: fix filattr copy-up failure
This regression can be reproduced with ntfs-3g and overlayfs:

  mkdir lower upper work overlay
  dd if=/dev/zero of=ntfs.raw bs=1M count=2
  mkntfs -F ntfs.raw
  mount ntfs.raw lower
  touch lower/file.txt
  mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work - overlay
  mv overlay/file.txt overlay/file2.txt

mv fails and (misleadingly) prints

  mv: cannot move 'overlay/file.txt' to a subdirectory of itself, 'overlay/file2.txt'

The reason is that ovl_copy_fileattr() is triggered due to S_NOATIME being
set on all inodes (by fuse) regardless of fileattr.

ovl_copy_fileattr() tries to retrieve file attributes from lower file, but
that fails because filesystem does not support this ioctl (this should fail
with ENOTTY, but ntfs-3g return EINVAL instead).  This failure is
propagated to origial operation (in this case rename) that triggered the
copy-up.

The fix is to ignore ENOTTY and EINVAL errors from fileattr_get() in copy
up.  This also requires turning the internal ENOIOCTLCMD into ENOTTY.

As a further measure to prevent unnecessary failures, only try the
fileattr_get/set on upper if there are any flags to copy up.

Side note: a number of filesystems set S_NOATIME (and sometimes other inode
flags) irrespective of fileattr flags.  This causes unnecessary calls
during copy up, which might lead to a performance issue, especially if
latency is high.  To fix this, the kernel would need to differentiate
between the two cases.  E.g. introduce SB_NOATIME_UPDATE, a per-sb variant
of S_NOATIME.  SB_NOATIME doesn't work, because that's interpreted as
"filesystem doesn't store an atime attribute"

Reported-and-tested-by: Kevin Locke <kevin@kevinlocke.name>
Fixes: 72db82115d ("ovl: copy up sync/noatime fileattr flags")
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-11-04 14:04:52 +01:00
Miklos Szeredi
1f5573cfe7 ovl: fix warning in ovl_create_real()
Syzbot triggered the following warning in ovl_workdir_create() ->
ovl_create_real():

	if (!err && WARN_ON(!newdentry->d_inode)) {

The reason is that the cgroup2 filesystem returns from mkdir without
instantiating the new dentry.

Weird filesystems such as this will be rejected by overlayfs at a later
stage during setup, but to prevent such a warning, call ovl_mkdir_real()
directly from ovl_workdir_create() and reject this case early.

Reported-and-tested-by: syzbot+75eab84fd0af9e8bf66b@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-11-04 10:55:34 +01:00
yangerkun
9a25440376 ovl: fix use after free in struct ovl_aio_req
Example for triggering use after free in a overlay on ext4 setup:

aio_read
  ovl_read_iter
    vfs_iter_read
      ext4_file_read_iter
        ext4_dio_read_iter
          iomap_dio_rw -> -EIOCBQUEUED
          /*
	   * Here IO is completed in a separate thread,
	   * ovl_aio_cleanup_handler() frees aio_req which has iocb embedded
	   */
          file_accessed(iocb->ki_filp); /**BOOM**/

Fix by introducing a refcount in ovl_aio_req similarly to aio_kiocb.  This
guarantees that iocb is only freed after vfs_read/write_iter() returns on
underlying fs.

Fixes: 2406a307ac ("ovl: implement async IO routines")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20210930032228.3199690-3-yangerkun@huawei.com/
Cc: <stable@vger.kernel.org> # v5.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-29 13:48:19 +02:00
Jens Axboe
6b19b766e8 fs: get rid of the res2 iocb->ki_complete argument
The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25 10:36:24 -06:00
Miklos Szeredi
1dc1eed46f ovl: fix IOCB_DIRECT if underlying fs doesn't support direct IO
Normally the check at open time suffices, but e.g loop device does set
IOCB_DIRECT after doing its own checks (which are not sufficent for
overlayfs).

Make sure we don't call the underlying filesystem read/write method with
the IOCB_DIRECT if it's not supported.

Reported-by: Huang Jianan <huangjianan@oppo.com>
Fixes: 16914e6fc7 ("ovl: add ovl_read_iter()")
Cc: <stable@vger.kernel.org> # v4.19
Tested-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-09-28 09:16:12 +02:00
Zheng Liang
a295aef603 ovl: fix missing negative dentry check in ovl_rename()
The following reproducer

  mkdir lower upper work merge
  touch lower/old
  touch lower/new
  mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merge
  rm merge/new
  mv merge/old merge/new & unlink upper/new

may result in this race:

PROCESS A:
  rename("merge/old", "merge/new");
  overwrite=true,ovl_lower_positive(old)=true,
  ovl_dentry_is_whiteout(new)=true -> flags |= RENAME_EXCHANGE

PROCESS B:
  unlink("upper/new");

PROCESS A:
  lookup newdentry in new_upperdir
  call vfs_rename() with negative newdentry and RENAME_EXCHANGE

Fix by adding the missing check for negative newdentry.

Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Fixes: e9be9d5e76 ("overlay filesystem")
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-09-24 21:00:31 +02:00
Miklos Szeredi
332f606b32 ovl: enable RCU'd ->get_acl()
Overlayfs does not cache ACL's (to avoid double caching).  Instead it just
calls the underlying filesystem's i_op->get_acl(), which will return the
cached value, if possible.

In rcu path walk, however, get_cached_acl_rcu() is employed to get the
value from the cache, which will fail on overlayfs resulting in dropping
out of rcu walk mode.  This can result in a big performance hit in certain
situations.

Fix by calling ->get_acl() with rcu=true in case of ACL_DONT_CACHE (which
indicates pass-through)

Reported-by: garyhuang <zjh.20052005@163.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-18 22:08:24 +02:00
Miklos Szeredi
0cad624662 vfs: add rcu argument to ->get_acl() callback
Add a rcu argument to the ->get_acl() callback to allow
get_cached_acl_rcu() to call the ->get_acl() method in the next patch.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-18 22:08:24 +02:00
chenying
52d5a0c6bd ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
If function ovl_instantiate() returns an error, ovl_cleanup will be called
and try to remove newdentry from wdir, but the newdentry has been moved to
udir at this time.  This will causes BUG_ON(victim->d_parent->d_inode !=
dir) in fs/namei.c:may_delete.

Signed-off-by: chenying <chenying.kernel@bytedance.com>
Fixes: 01b39dcc95 ("ovl: use inode_insert5() to hash a newly created inode")
Link: https://lore.kernel.org/linux-unionfs/e6496a94-a161-dc04-c38a-d2544633acb4@bytedance.com/
Cc: <stable@vger.kernel.org> # v4.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 17:37:53 +02:00
Miklos Szeredi
f945ca1963 ovl: use kvalloc in xattr copy-up
Extended attributes are usually small, but could be up to 64k in size, so
use the most efficient method for doing the allocation.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:45 +02:00
Chengguang Xu
d8991e8622 ovl: update ctime when changing fileattr
Currently we keep size, mode and times of overlay inode
as the same as upper inode, so should update ctime when
changing file attribution as well.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:44 +02:00
Chengguang Xu
b71759ef1e ovl: skip checking lower file's i_writecount on truncate
It is possible that a directory tree is shared between multiple overlay
instances as a lower layer.  In this case when one instance executes a file
residing on the lower layer, the other instance denies a truncate(2) call
on this file.

This only happens for truncate(2) and not for open(2) with the O_TRUNC
flag.

Fix this interference and inconsistency by removing the preliminary
i_writecount check before copy-up.

This means that unlike on normal filesystems truncate(argv[0]) will now
succeed.  If this ever causes a regression in a real world use case this
needs to be revisited.

One way to fix this properly would be to keep a correct i_writecount in the
overlay inode, but that is difficult due to memory mapping code only
dealing with the real file/inode.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:44 +02:00
Amir Goldstein
ffb24e3c65 ovl: relax lookup error on mismatch origin ftype
We get occasional reports of lookup errors due to mismatched
origin ftype from users that re-format a lower squashfs image.

Commit 13c6ad0f45 ("ovl: document lower modification caveats")
tries to discourage the practice of re-formating lower layers and
describes the expected behavior as undefined.

Commit b0e0f69731 ("ovl: restrict lower null uuid for "xino=auto"")
limits the configurations in which origin file handles are followed.

In addition to these measures, change the behavior in case of detecting
a mismatch origin ftype in lookup to issue a warning, not follow origin,
but not fail the lookup operation either.

That should make overall more users happy without any big consequences.

Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxgPq9E9xxwU2CDyHy-_yCZZeymg+3n+-6AqkGGE1YtwvQ@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:44 +02:00
Vyacheslav Yurkov
1fc31aac96 ovl: do not set overlay.opaque for new directories
Enable optimizations only if user opted-in for any of extended features.
If optimization is enabled, it breaks existing use case when a lower layer
directory appears after directory was created on a merged layer. If
overlay.opaque is applied, new files on lower layer are not visible.

Consider the following scenario:
- /lower and /upper are mounted to /merged
- directory /merged/new-dir is created with a file test1
- overlay is unmounted
- directory /lower/new-dir is created with a file test2
- overlay is mounted again

If opaque is applied by default, file test2 is not going to be visible
without explicitly clearing the overlay.opaque attribute

Signed-off-by: Vyacheslav Yurkov <Vyacheslav.Yurkov@bruker.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:44 +02:00
Vyacheslav Yurkov
ca45275cd6 ovl: add ovl_allow_offline_changes() helper
Allows to check whether any of extended features are enabled

Signed-off-by: Vyacheslav Yurkov <Vyacheslav.Yurkov@bruker.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:44 +02:00
Vyacheslav Yurkov
e4522bc873 ovl: disable decoding null uuid with redirect_dir
Currently decoding origin with lower null uuid is not allowed unless user
opted-in to one of the new features that require following the lower inode
of non-dir upper (index, xino, metacopy). Now we add redirect_dir too to
that feature list.

Signed-off-by: Vyacheslav Yurkov <Vyacheslav.Yurkov@bruker.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:44 +02:00
Amir Goldstein
096a218a58 ovl: consistent behavior for immutable/append-only inodes
When a lower file has immutable/append-only fileattr flags, the behavior of
overlayfs post copy up is inconsistent.

Immediattely after copy up, ovl inode still has the S_IMMUTABLE/S_APPEND
inode flags copied from lower inode, so vfs code still treats the ovl inode
as immutable/append-only.  After ovl inode evict or mount cycle, the ovl
inode does not have these inode flags anymore.

We cannot copy up the immutable and append-only fileattr flags, because
immutable/append-only inodes cannot be linked and because overlayfs will
not be able to set overlay.* xattr on the upper inodes.

Instead, if any of the fileattr flags of interest exist on the lower inode,
we store them in overlay.protattr xattr on the upper inode and we read the
flags from xattr on lookup and on fileattr_get().

This gives consistent behavior post copy up regardless of inode eviction
from cache.

When user sets new fileattr flags, we update or remove the overlay.protattr
xattr.

Storing immutable/append-only fileattr flags in an xattr instead of upper
fileattr also solves other non-standard behavior issues - overlayfs can now
copy up children of "ovl-immutable" directories and lower aliases of
"ovl-immutable" hardlinks.

Reported-by: Chengguang Xu <cgxu519@mykernel.net>
Link: https://lore.kernel.org/linux-unionfs/20201226104618.239739-1-cgxu519@mykernel.net/
Link: https://lore.kernel.org/linux-unionfs/20210210190334.1212210-5-amir73il@gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:43 +02:00
Amir Goldstein
72db82115d ovl: copy up sync/noatime fileattr flags
When a lower file has sync/noatime fileattr flags, the behavior of
overlayfs post copy up is inconsistent.

Immediately after copy up, ovl inode still has the S_SYNC/S_NOATIME
inode flags copied from lower inode, so vfs code still treats the ovl
inode as sync/noatime.  After ovl inode evict or mount cycle,
the ovl inode does not have these inode flags anymore.

To fix this inconsistency, try to copy the fileattr flags on copy up
if the upper fs supports the fileattr_set() method.

This gives consistent behavior post copy up regardless of inode eviction
from cache.

We cannot copy up the immutable/append-only inode flags in a similar
manner, because immutable/append-only inodes cannot be linked and because
overlayfs will not be able to set overlay.* xattr on the upper inodes.

Those flags will be addressed by a followup patch.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:43 +02:00
Amir Goldstein
a0c236b117 ovl: pass ovl_fs to ovl_check_setxattr()
Instead of passing the overlay dentry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:43 +02:00
Miklos Szeredi
580c610429 ovl: fix uninitialized pointer read in ovl_lookup_real_one()
One error path can result in release_dentry_name_snapshot() being called
before "name" was initialized by take_dentry_name_snapshot().

Fix by moving the release_dentry_name_snapshot() to immediately after the
only use.

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-10 10:21:30 +02:00
Miklos Szeredi
9b91b6b019 ovl: fix deadlock in splice write
There's possibility of an ABBA deadlock in case of a splice write to an
overlayfs file and a concurrent splice write to a corresponding real file.

The call chain for splice to an overlay file:

 -> do_splice                     [takes sb_writers on overlay file]
   -> do_splice_from
     -> iter_file_splice_write    [takes pipe->mutex]
       -> vfs_iter_write
         ...
         -> ovl_write_iter        [takes sb_writers on real file]

And the call chain for splice to a real file:

 -> do_splice                     [takes sb_writers on real file]
   -> do_splice_from
     -> iter_file_splice_write    [takes pipe->mutex]

Syzbot successfully bisected this to commit 82a763e61e ("ovl: simplify
file splice").

Fix by reverting the write part of the above commit and by adding missing
bits from ovl_write_iter() into ovl_splice_write().

Fixes: 82a763e61e ("ovl: simplify file splice")
Reported-and-tested-by: syzbot+579885d1a9a833336209@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-10 10:21:30 +02:00
Amir Goldstein
9011c2791e ovl: skip stale entries in merge dir cache iteration
On the first getdents call, ovl_iterate() populates the readdir cache
with a list of entries, but for upper entries with origin lower inode,
p->ino remains zero.

Following getdents calls traverse the readdir cache list and call
ovl_cache_update_ino() for entries with zero p->ino to lookup the entry
in the overlay and return d_ino that is consistent with st_ino.

If the upper file was unlinked between the first getdents call and the
getdents call that lists the file entry, ovl_cache_update_ino() will not
find the entry and fall back to setting d_ino to the upper real st_ino,
which is inconsistent with how this object was presented to users.

Instead of listing a stale entry with inconsistent d_ino, simply skip
the stale entry, which is better for users.

xfstest overlay/077 is failing without this patch.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/fstests/CAOQ4uxgR_cLnC_vdU5=seP3fwqVkuZM_-WfD6maFTMbMYq=a9w@mail.gmail.com/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-10 10:21:30 +02:00
Linus Torvalds
d652502ef4 overlayfs update for 5.13
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYIwTsgAKCRDh3BK/laaZ
 PDktAP41eScbCiFzXDRjXw9S7Wfd8HEct0y1p+9BUh8m3VdHfwEA0pDlJWNaJdYW
 nFixPJ5GsAfxo+1ags0vn06CUS/K4gA=
 =QlbJ
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs update from Miklos Szeredi:

 - Fix a regression introduced in 5.2 that resulted in valid overlayfs
   mounts being rejected with ELOOP (Too many levels of symbolic links)

 - Fix bugs found by various tools

 - Miscellaneous improvements and cleanups

* tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: add debug print to ovl_do_getxattr()
  ovl: invalidate readdir cache on changes to dir with origin
  ovl: allow upperdir inside lowerdir
  ovl: show "userxattr" in the mount data
  ovl: trivial typo fixes in the file inode.c
  ovl: fix misspellings using codespell tool
  ovl: do not copy attr several times
  ovl: remove ovl_map_dev_ino() return value
  ovl: fix error for ovl_fill_super()
  ovl: fix missing revert_creds() on error path
  ovl: fix leaked dentry
  ovl: restrict lower null uuid for "xino=auto"
  ovl: check that upperdir path is not on a read-only mount
  ovl: plumb through flush method
2021-04-30 15:17:08 -07:00
Linus Torvalds
a4f7fae101 Merge branch 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fileattr conversion updates from Miklos Szeredi via Al Viro:
 "This splits the handling of FS_IOC_[GS]ETFLAGS from ->ioctl() into a
  separate method.

  The interface is reasonably uniform across the filesystems that
  support it and gives nice boilerplate removal"

* 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
  ovl: remove unneeded ioctls
  fuse: convert to fileattr
  fuse: add internal open/release helpers
  fuse: unsigned open flags
  fuse: move ioctl to separate source file
  vfs: remove unused ioctl helpers
  ubifs: convert to fileattr
  reiserfs: convert to fileattr
  ocfs2: convert to fileattr
  nilfs2: convert to fileattr
  jfs: convert to fileattr
  hfsplus: convert to fileattr
  efivars: convert to fileattr
  xfs: convert to fileattr
  orangefs: convert to fileattr
  gfs2: convert to fileattr
  f2fs: convert to fileattr
  ext4: convert to fileattr
  ext2: convert to fileattr
  btrfs: convert to fileattr
  ...
2021-04-27 11:18:24 -07:00
Linus Torvalds
d1466bc583 Merge branch 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode type handling updates from Al Viro:
 "We should never change the type bits of ->i_mode or the method tables
  (->i_op and ->i_fop) of a live inode.

  Unfortunately, not all filesystems took care to prevent that"

* 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  spufs: fix bogosity in S_ISGID handling
  9p: missing chunk of "fs/9p: Don't update file type when updating file attributes"
  openpromfs: don't do unlock_new_inode() until the new inode is set up
  hostfs_mknod(): don't bother with init_special_inode()
  cifs: have cifs_fattr_to_inode() refuse to change type on live inode
  cifs: have ->mkdir() handle race with another client sanely
  do_cifs_create(): don't set ->i_mode of something we had not created
  gfs2: be careful with inode refresh
  ocfs2_inode_lock_update(): make sure we don't change the type bits of i_mode
  orangefs_inode_is_stale(): i_mode type bits do *not* form a bitmap...
  vboxsf: don't allow to change the inode type
  afs: Fix updating of i_mode due to 3rd party change
  ceph: don't allow type or device number to change on non-I_NEW inodes
  ceph: fix up error handling with snapdirs
  new helper: inode_wrong_type()
2021-04-27 10:57:42 -07:00
Christian König
2896900e22 ovl: fix reference counting in ovl_mmap error path
mmap_region() now calls fput() on the vma->vm_file.

Fix this by using vma_set_file() so it doesn't need to be handled
manually here any more.

Link: https://lkml.kernel.org/r/20210421132012.82354-2-christian.koenig@amd.com
Fixes: 1527f926fd ("mm: mmap: fix fput in error path v2")
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: <stable@vger.kernel.org>	[5.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-23 14:42:39 -07:00
Miklos Szeredi
c4fe8aef2f ovl: remove unneeded ioctls
The FS_IOC_[GS]ETFLAGS/FS_IOC_FS[GS]ETXATTR ioctls are now handled via the
fileattr api.  The only unconverted filesystem remaining is CIFS and it is
not allowed to be overlayed due to case insensitive filenames.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:30 +02:00
Miklos Szeredi
66dbfabf10 ovl: stack fileattr ops
Add stacking for the fileattr operations.

Add hack for calling security_file_ioctl() for now.  Probably better to
have a pair of specific hooks for these operations.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:29 +02:00
Amir Goldstein
5e717c6fa4 ovl: add debug print to ovl_do_getxattr()
It was the only ovl_do helper missing it.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:37 +02:00
Amir Goldstein
65cd913ec9 ovl: invalidate readdir cache on changes to dir with origin
The test in ovl_dentry_version_inc() was out-dated and did not include
the case where readdir cache is used on a non-merge dir that has origin
xattr, indicating that it may contain leftover whiteouts.

To make the code more robust, use the same helper ovl_dir_is_real()
to determine if readdir cache should be used and if readdir cache should
be invalidated.

Fixes: b79e05aaa1 ("ovl: no direct iteration for dir with origin xattr")
Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxht70nODhNHNwGFMSqDyOKLXOKrY0H6g849os4BQ7cokA@mail.gmail.com/
Cc: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:37 +02:00
Miklos Szeredi
708fa01597 ovl: allow upperdir inside lowerdir
Commit 146d62e5a5 ("ovl: detect overlapping layers") made sure we don't
have overlapping layers, but it also broke the arguably valid use case of

 mount -olowerdir=/,upperdir=/subdir,..

where upperdir overlaps lowerdir on the same filesystem.  This has been
causing regressions.

Revert the check, but only for the specific case where upperdir and/or
workdir are subdirectories of lowerdir.  Any other overlap (e.g. lowerdir
is subdirectory of upperdir, etc) case is crazy, so leave the check in
place for those.

Overlaps are detected at lookup time too, so reverting the mount time check
should be safe.

Fixes: 146d62e5a5 ("ovl: detect overlapping layers")
Cc: <stable@vger.kernel.org> # v5.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:37 +02:00
Giuseppe Scrivano
321b46b904 ovl: show "userxattr" in the mount data
This was missed when adding the option.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Fixes: 2d2f2d7322 ("ovl: user xattr")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:37 +02:00
Bhaskar Chowdhury
f48bbfb20e ovl: trivial typo fixes in the file inode.c
s/peresistent/persistent/
s/xatts/xattrs/
s/annotaion/annotation/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:37 +02:00
Xiong Zhenwu
597534e7bc ovl: fix misspellings using codespell tool
A typo is found out by codespell tool:

$ codespell ./fs/overlayfs/
./fs/overlayfs/util.c:217: dependig  ==> depending

Fix a typo found by codespell.

Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:37 +02:00
Chengguang Xu
568edee485 ovl: do not copy attr several times
In ovl_xattr_set() we have already copied attr of real inode
so no need to copy it again in ovl_posix_acl_xattr_set().

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:37 +02:00
youngjun
c68e7ec53a ovl: remove ovl_map_dev_ino() return value
ovl_map_dev_ino() always returns success.  Remove unnecessary return value.

Signed-off-by: youngjun <her0gyugyu@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:37 +02:00
Chengguang Xu
d7b49b10d5 ovl: fix error for ovl_fill_super()
There are some places should return -EINVAL instead of -ENOMEM in
ovl_fill_super().

[Amir] Consistently set error before checking the error condition.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:36 +02:00
Dan Carpenter
7b279bbfd2 ovl: fix missing revert_creds() on error path
Smatch complains about missing that the ovl_override_creds() doesn't
have a matching revert_creds() if the dentry is disconnected.  Fix this
by moving the ovl_override_creds() until after the disconnected check.

Fixes: aa3ff3c152 ("ovl: copy up of disconnected dentries")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:36 +02:00
Mickaël Salaün
eaab1d45cd ovl: fix leaked dentry
Since commit 6815f479ca ("ovl: use only uppermetacopy state in
ovl_lookup()"), overlayfs doesn't put temporary dentry when there is a
metacopy error, which leads to dentry leaks when shutting down the related
superblock:

  overlayfs: refusing to follow metacopy origin for (/file0)
  ...
  BUG: Dentry (____ptrval____){i=3f33,n=file3}  still in use (1) [unmount of overlay overlay]
  ...
  WARNING: CPU: 1 PID: 432 at umount_check.cold+0x107/0x14d
  CPU: 1 PID: 432 Comm: unmount-overlay Not tainted 5.12.0-rc5 #1
  ...
  RIP: 0010:umount_check.cold+0x107/0x14d
  ...
  Call Trace:
   d_walk+0x28c/0x950
   ? dentry_lru_isolate+0x2b0/0x2b0
   ? __kasan_slab_free+0x12/0x20
   do_one_tree+0x33/0x60
   shrink_dcache_for_umount+0x78/0x1d0
   generic_shutdown_super+0x70/0x440
   kill_anon_super+0x3e/0x70
   deactivate_locked_super+0xc4/0x160
   deactivate_super+0xfa/0x140
   cleanup_mnt+0x22e/0x370
   __cleanup_mnt+0x1a/0x30
   task_work_run+0x139/0x210
   do_exit+0xb0c/0x2820
   ? __kasan_check_read+0x1d/0x30
   ? find_held_lock+0x35/0x160
   ? lock_release+0x1b6/0x660
   ? mm_update_next_owner+0xa20/0xa20
   ? reacquire_held_locks+0x3f0/0x3f0
   ? __sanitizer_cov_trace_const_cmp4+0x22/0x30
   do_group_exit+0x135/0x380
   __do_sys_exit_group.isra.0+0x20/0x20
   __x64_sys_exit_group+0x3c/0x50
   do_syscall_64+0x45/0x70
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  ...
  VFS: Busy inodes after unmount of overlay. Self-destruct in 5 seconds.  Have a nice day...

This fix has been tested with a syzkaller reproducer.

Cc: Amir Goldstein <amir73il@gmail.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 6815f479ca ("ovl: use only uppermetacopy state in ovl_lookup()")
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210329164907.2133175-1-mic@digikod.net
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:36 +02:00
Amir Goldstein
b0e0f69731 ovl: restrict lower null uuid for "xino=auto"
Commit a888db3101 ("ovl: fix regression with re-formatted lower
squashfs") attempted to fix a regression with existing setups that
use a practice that we are trying to discourage.

The discourage part was described this way in the commit message:
"To avoid the reported regression while still allowing the new features
 with single lower squashfs, do not allow decoding origin with lower null
 uuid unless user opted-in to one of the new features that require
 following the lower inode of non-dir upper (index, xino, metacopy)."

The three mentioned features are disabled by default in Kconfig, so
it was assumed that if they are enabled, the user opted-in for them.
Apparently, distros started to configure CONFIG_OVERLAY_FS_XINO_AUTO=y
some time ago, so users upgrading their kernels can still be affected
by said regression even though they never opted-in for any new feature.

To fix this, treat "xino=on" as "user opted-in", but not "xino=auto".
Since we are changing the behavior of "xino=auto" to no longer follow
to lower origin with null uuid, take this one step further and disable
xino in that corner case.  To be consistent, disable xino also in cases
of lower fs without file handle support and upper fs without xattr
support.

Update documentation w.r.t the new "xino=auto" behavior and fix the out
dated bits of documentation regarding "xino" and regarding offline
modifications to lower layers.

Link: https://lore.kernel.org/linux-unionfs/b36a429d7c563730c28d763d4d57a6fc30508a4f.1615216996.git.kevin@kevinlocke.name/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:36 +02:00
Amir Goldstein
e21a6c57e3 ovl: check that upperdir path is not on a read-only mount
So far we only checked that sb is not read-only.

Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:36 +02:00
Sargun Dhillon
1f0cb8bcc7 ovl: plumb through flush method
Filesystems can implement their own flush method that release
resources, or manipulate caches. Currently if one of these
filesystems is used with overlayfs, the flush method is not called.

[Amir: fix fd leak in ovl_flush()]

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 12:00:36 +02:00
Al Viro
6e3e2c4362 new helper: inode_wrong_type()
inode_wrong_type(inode, mode) returns true if setting inode->i_mode
to given value would've changed the inode type.  We have enough of
those checks open-coded to make a helper worthwhile.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-03-08 10:19:35 -05:00