Commit Graph

821 Commits

Author SHA1 Message Date
David Howells
d4120d87a0 overlayfs: Implement splice-read
Implement splice-read for overlayfs by passing the request down a layer
rather than going through generic_file_splice_read() which is going to be
changed to assume that ->read_folio() is present on buffered files.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Christian Brauner <brauner@kernel.org>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Amir Goldstein <amir73il@gmail.com>
cc: linux-unionfs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-11-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24 08:42:16 -06:00
Christian Brauner
a1fbb60734
ovl: check for ->listxattr() support
We have decoupled vfs_listxattr() from IOP_XATTR. Instead we just need
to check whether inode->i_op->listxattr is implemented.

Cc: linux-unionfs@vger.kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-06 09:57:13 +01:00
Christian Brauner
0c95c025a0
fs: drop unused posix acl handlers
Remove struct posix_acl_{access,default}_handler for all filesystems
that don't depend on the xattr handler in their inode->i_op->listxattr()
method in any way. There's nothing more to do than to simply remove the
handler. It's been effectively unused ever since we introduced the new
posix acl api.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-06 09:57:12 +01:00
Linus Torvalds
05e6295f7b fs.idmapped.v6.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY+5NlQAKCRCRxhvAZXjc
 orOaAP9i2h3OJy95nO2Fpde0Bt2UT+oulKCCcGlvXJ8/+TQpyQD/ZQq47gFQ0EAz
 Br5NxeyGeecAb0lHpFz+CpLGsxMrMwQ=
 =+BG5
 -----END PGP SIGNATURE-----

Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs idmapping updates from Christian Brauner:

 - Last cycle we introduced the dedicated struct mnt_idmap type for
   mount idmapping and the required infrastucture in 256c8aed2b ("fs:
   introduce dedicated idmap type for mounts"). As promised in last
   cycle's pull request message this converts everything to rely on
   struct mnt_idmap.

   Currently we still pass around the plain namespace that was attached
   to a mount. This is in general pretty convenient but it makes it easy
   to conflate namespaces that are relevant on the filesystem with
   namespaces that are relevant on the mount level. Especially for
   non-vfs developers without detailed knowledge in this area this was a
   potential source for bugs.

   This finishes the conversion. Instead of passing the plain namespace
   around this updates all places that currently take a pointer to a
   mnt_userns with a pointer to struct mnt_idmap.

   Now that the conversion is done all helpers down to the really
   low-level helpers only accept a struct mnt_idmap argument instead of
   two namespace arguments.

   Conflating mount and other idmappings will now cause the compiler to
   complain loudly thus eliminating the possibility of any bugs. This
   makes it impossible for filesystem developers to mix up mount and
   filesystem idmappings as they are two distinct types and require
   distinct helpers that cannot be used interchangeably.

   Everything associated with struct mnt_idmap is moved into a single
   separate file. With that change no code can poke around in struct
   mnt_idmap. It can only be interacted with through dedicated helpers.
   That means all filesystems are and all of the vfs is completely
   oblivious to the actual implementation of idmappings.

   We are now also able to extend struct mnt_idmap as we see fit. For
   example, we can decouple it completely from namespaces for users that
   don't require or don't want to use them at all. We can also extend
   the concept of idmappings so we can cover filesystem specific
   requirements.

   In combination with the vfs{g,u}id_t work we finished in v6.2 this
   makes this feature substantially more robust and thus difficult to
   implement wrong by a given filesystem and also protects the vfs.

 - Enable idmapped mounts for tmpfs and fulfill a longstanding request.

   A long-standing request from users had been to make it possible to
   create idmapped mounts for tmpfs. For example, to share the host's
   tmpfs mount between multiple sandboxes. This is a prerequisite for
   some advanced Kubernetes cases. Systemd also has a range of use-cases
   to increase service isolation. And there are more users of this.

   However, with all of the other work going on this was way down on the
   priority list but luckily someone other than ourselves picked this
   up.

   As usual the patch is tiny as all the infrastructure work had been
   done multiple kernel releases ago. In addition to all the tests that
   we already have I requested that Rodrigo add a dedicated tmpfs
   testsuite for idmapped mounts to xfstests. It is to be included into
   xfstests during the v6.3 development cycle. This should add a slew of
   additional tests.

* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
  shmem: support idmapped mounts for tmpfs
  fs: move mnt_idmap
  fs: port vfs{g,u}id helpers to mnt_idmap
  fs: port fs{g,u}id helpers to mnt_idmap
  fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
  fs: port i_{g,u}id_{needs_}update() to mnt_idmap
  quota: port to mnt_idmap
  fs: port privilege checking helpers to mnt_idmap
  fs: port inode_owner_or_capable() to mnt_idmap
  fs: port inode_init_owner() to mnt_idmap
  fs: port acl to mnt_idmap
  fs: port xattr to mnt_idmap
  fs: port ->permission() to pass mnt_idmap
  fs: port ->fileattr_set() to pass mnt_idmap
  fs: port ->set_acl() to pass mnt_idmap
  fs: port ->get_acl() to pass mnt_idmap
  fs: port ->tmpfile() to pass mnt_idmap
  fs: port ->rename() to pass mnt_idmap
  fs: port ->mknod() to pass mnt_idmap
  fs: port ->mkdir() to pass mnt_idmap
  ...
2023-02-20 11:53:11 -08:00
Miklos Szeredi
4f11ada10d ovl: fail on invalid uid/gid mapping at copy up
If st_uid/st_gid doesn't have a mapping in the mounter's user_ns, then
copy-up should fail, just like it would fail if the mounter task was doing
the copy using "cp -a".

There's a corner case where the "cp -a" would succeed but copy up fail: if
there's a mapping of the invalid uid/gid (65534 by default) in the user
namespace.  This is because stat(2) will return this value if the mapping
doesn't exist in the current user_ns and "cp -a" will in turn be able to
create a file with this uid/gid.

This behavior would be inconsistent with POSIX ACL's, which return -1 for
invalid uid/gid which result in a failed copy.

For consistency and simplicity fail the copy of the st_uid/st_gid are
invalid.

Fixes: 459c7c565a ("ovl: unprivieged mounts")
Cc: <stable@vger.kernel.org> # v5.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Seth Forshee <sforshee@kernel.org>
2023-01-27 16:17:19 +01:00
Miklos Szeredi
baabaa5055 ovl: fix tmpfile leak
Missed an error cleanup.

Reported-by: syzbot+fd749a7ea127a84e0ffd@syzkaller.appspotmail.com
Fixes: 2b1a77461f ("ovl: use vfs_tmpfile_open() helper")
Cc: <stable@vger.kernel.org> # v6.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-01-27 16:16:12 +01:00
Christian Brauner
4d7ca40901
fs: port vfs{g,u}id helpers to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:30 +01:00
Christian Brauner
e67fe63341
fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
Convert to struct mnt_idmap.
Remove legacy file_mnt_user_ns() and mnt_user_ns().

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:29 +01:00
Christian Brauner
9452e93e6d
fs: port privilege checking helpers to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:29 +01:00
Christian Brauner
01beba7957
fs: port inode_owner_or_capable() to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:29 +01:00
Christian Brauner
f2d40141d5
fs: port inode_init_owner() to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
39f60c1cce
fs: port xattr to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
4609e1f18e
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
8782a9aea3
fs: port ->fileattr_set() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:27 +01:00
Christian Brauner
13e83a4923
fs: port ->set_acl() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:27 +01:00
Christian Brauner
7743532277
fs: port ->get_acl() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:27 +01:00
Christian Brauner
e18275ae55
fs: port ->rename() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:26 +01:00
Christian Brauner
5ebb29bee8
fs: port ->mknod() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:26 +01:00
Christian Brauner
c54bd91e9e
fs: port ->mkdir() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:26 +01:00
Christian Brauner
7a77db9551
fs: port ->symlink() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:25 +01:00
Christian Brauner
6c960e68aa
fs: port ->create() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:25 +01:00
Christian Brauner
b74d24f7a7
fs: port ->getattr() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:25 +01:00
Christian Brauner
c1632a0f11
fs: port ->setattr() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:02 +01:00
Christian Brauner
abf08576af
fs: port vfs_*() helpers to struct mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-18 17:51:45 +01:00
Linus Torvalds
6df7cc2268 overlayfs update for 6.2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCY5b3+gAKCRDh3BK/laaZ
 PIPxAQCPgyV/X/yJFd3wVgKa3/JxcHl5qdPbwHXFuYiJCBd69QEA9LYQEeEoTLCY
 veGiQPkl6Sp8ZqmTbDBxqw5OaBTSMwM=
 =7TiE
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs update from Miklos Szeredi:

 - Fix a couple of bugs found by syzbot

 - Don't ingore some open flags set by fcntl(F_SETFL)

 - Fix failure to create a hard link in certain cases

 - Use type safe helpers for some mnt_userns transformations

 - Improve performance of mount

 - Misc cleanups

* tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: Kconfig: Fix spelling mistake "undelying" -> "underlying"
  ovl: use inode instead of dentry where possible
  ovl: Add comment on upperredirect reassignment
  ovl: use plain list filler in indexdir and workdir cleanup
  ovl: do not reconnect upper index records in ovl_indexdir_cleanup()
  ovl: fix comment typos
  ovl: port to vfs{g,u}id_t and associated helpers
  ovl: Use ovl mounter's fsuid and fsgid in ovl_link()
  ovl: Use "buf" flexible array for memcpy() destination
  ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags
  ovl: fix use inode directly in rcu-walk mode
2022-12-12 20:18:26 -08:00
Linus Torvalds
e1212e9b6f fs.vfsuid.conversion.v6.2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bspgAKCRCRxhvAZXjc
 opEWAQDpF5rnZn1vv4/uOTij9ztcA4yLxu/Q19CdqBaoHlWZ9AD/d3eecee3bh5h
 iPHtlUK5/VspfD9LPpdc5ZbPCdZ2pA4=
 =t6NN
 -----END PGP SIGNATURE-----

Merge tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfsuid updates from Christian Brauner:
 "Last cycle we introduced the vfs{g,u}id_t types and associated helpers
  to gain type safety when dealing with idmapped mounts. That initial
  work already converted a lot of places over but there were still some
  left,

  This converts all remaining places that still make use of non-type
  safe idmapping helpers to rely on the new type safe vfs{g,u}id based
  helpers.

  Afterwards it removes all the old non-type safe helpers"

* tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: remove unused idmapping helpers
  ovl: port to vfs{g,u}id_t and associated helpers
  fuse: port to vfs{g,u}id_t and associated helpers
  ima: use type safe idmapping helpers
  apparmor: use type safe idmapping helpers
  caps: use type safe idmapping helpers
  fs: use type safe idmapping helpers
  mnt_idmapping: add missing helpers
2022-12-12 19:20:05 -08:00
Linus Torvalds
cf619f8919 fs.ovl.setgid.v6.2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bt7AAKCRCRxhvAZXjc
 ovAOAP9qcrUqs2MoyBDe6qUXThYY9w2rgX/ZI4ZZmbtsXEDGtQEA/LPddq8lD8o9
 m17zpvMGbXXRwz4/zVGuyWsHgg0HsQ0=
 =ioRq
 -----END PGP SIGNATURE-----

Merge tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull setgid inheritance updates from Christian Brauner:
 "This contains the work to make setgid inheritance consistent between
  modifying a file and when changing ownership or mode as this has been
  a repeated source of very subtle bugs. The gist is that we perform the
  same permission checks in the write path as we do in the ownership and
  mode changing paths after this series where we're currently doing
  different things.

  We've already made setgid inheritance a lot more consistent and
  reliable in the last releases by moving setgid stripping from the
  individual filesystems up into the vfs. This aims to make the logic
  even more consistent and easier to understand and also to fix
  long-standing overlayfs setgid inheritance bugs. Miklos was nice
  enough to just let me carry the trivial overlayfs patches from Amir
  too.

  Below is a more detailed explanation how the current difference in
  setgid handling lead to very subtle bugs exemplified via overlayfs
  which is a victim of the current rules. I hope this explains why I
  think taking the regression risk here is worth it.

  A long while ago I found a few setgid inheritance bugs in overlayfs in
  the write path in certain conditions. Amir recently picked this back
  up in [1] and I jumped on board to fix this more generally.

  On the surface all that overlayfs would need to fix setgid inheritance
  would be to call file_remove_privs() or file_modified() but actually
  that isn't enough because the setgid inheritance api is wildly
  inconsistent in that area.

  Before this pr setgid stripping in file_remove_privs()'s old
  should_remove_suid() helper was inconsistent with other parts of the
  vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is
  S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups
  and the caller isn't privileged over the inode although we require
  this already in setattr_prepare() and setattr_copy() and so all
  filesystem implement this requirement implicitly because they have to
  use setattr_{prepare,copy}() anyway.

  But the inconsistency shows up in setgid stripping bugs for overlayfs
  in xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
  generic/687). For example, we test whether suid and setgid stripping
  works correctly when performing various write-like operations as an
  unprivileged user (fallocate, reflink, write, etc.):

      echo "Test 1 - qa_user, non-exec file $verb"
      setup_testfile
      chmod a+rws $junk_file
      commit_and_check "$qa_user" "$verb" 64k 64k

  The test basically creates a file with 6666 permissions. While the
  file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP
  set.

  On a regular filesystem like xfs what will happen is:

      sys_fallocate()
      -> vfs_fallocate()
         -> xfs_file_fallocate()
            -> file_modified()
               -> __file_remove_privs()
                  -> dentry_needs_remove_privs()
                     -> should_remove_suid()
                  -> __remove_privs()
                     newattrs.ia_valid = ATTR_FORCE | kill;
                     -> notify_change()
                        -> setattr_copy()

  In should_remove_suid() we can see that ATTR_KILL_SUID is raised
  unconditionally because the file in the test has S_ISUID set.

  But we also see that ATTR_KILL_SGID won't be set because while the
  file is S_ISGID it is not S_IXGRP (see above) which is a condition for
  ATTR_KILL_SGID being raised.

  So by the time we call notify_change() we have attr->ia_valid set to
  ATTR_KILL_SUID | ATTR_FORCE.

  Now notify_change() sees that ATTR_KILL_SUID is set and does:

      ia_valid      = attr->ia_valid |= ATTR_MODE
      attr->ia_mode = (inode->i_mode & ~S_ISUID);

  which means that when we call setattr_copy() later we will definitely
  update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.

  Now we call into the filesystem's ->setattr() inode operation which
  will end up calling setattr_copy(). Since ATTR_MODE is set we will
  hit:

      if (ia_valid & ATTR_MODE) {
              umode_t mode = attr->ia_mode;
              vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
              if (!vfsgid_in_group_p(vfsgid) &&
                  !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                      mode &= ~S_ISGID;
              inode->i_mode = mode;
      }

  and since the caller in the test is neither capable nor in the group
  of the inode the S_ISGID bit is stripped.

  But assume the file isn't suid then ATTR_KILL_SUID won't be raised
  which has the consequence that neither the setgid nor the suid bits
  are stripped even though it should be stripped because the inode isn't
  in the caller's groups and the caller isn't privileged over the inode.

  If overlayfs is in the mix things become a bit more complicated and
  the bug shows up more clearly.

  When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to
  file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be
  raised but because the check in notify_change() is questioning the
  ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped
  the S_ISGID bit isn't removed even though it should be stripped:

      sys_fallocate()
      -> vfs_fallocate()
         -> ovl_fallocate()
            -> file_remove_privs()
               -> dentry_needs_remove_privs()
                  -> should_remove_suid()
               -> __remove_privs()
                  newattrs.ia_valid = ATTR_FORCE | kill;
                  -> notify_change()
                     -> ovl_setattr()
                        /* TAKE ON MOUNTER'S CREDS */
                        -> ovl_do_notify_change()
                           -> notify_change()
                        /* GIVE UP MOUNTER'S CREDS */
           /* TAKE ON MOUNTER'S CREDS */
           -> vfs_fallocate()
              -> xfs_file_fallocate()
                 -> file_modified()
                    -> __file_remove_privs()
                       -> dentry_needs_remove_privs()
                          -> should_remove_suid()
                       -> __remove_privs()
                          newattrs.ia_valid = attr_force | kill;
                          -> notify_change()

  The fix for all of this is to make file_remove_privs()'s
  should_remove_suid() helper perform the same checks as we already
  require in setattr_prepare() and setattr_copy() and have
  notify_change() not pointlessly requiring S_IXGRP again. It doesn't
  make any sense in the first place because the caller must calculate
  the flags via should_remove_suid() anyway which would raise
  ATTR_KILL_SGID

  Note that some xfstests will now fail as these patches will cause the
  setgid bit to be lost in certain conditions for unprivileged users
  modifying a setgid file when they would've been kept otherwise. I
  think this risk is worth taking and I explained and mentioned this
  multiple times on the list [2].

  Enforcing the rules consistently across write operations and
  chmod/chown will lead to losing the setgid bit in cases were it
  might've been retained before.

  While I've mentioned this a few times but it's worth repeating just to
  make sure that this is understood. For the sake of maintainability,
  consistency, and security this is a risk worth taking.

  If we really see regressions for workloads the fix is to have special
  setgid handling in the write path again with different semantics from
  chmod/chown and possibly additional duct tape for overlayfs. I'll
  update the relevant xfstests with if you should decide to merge this
  second setgid cleanup.

  Before that people should be aware that there might be failures for
  fstests where unprivileged users modify a setgid file"

Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1]
Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2]

* tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: use consistent setgid checks in is_sxid()
  ovl: remove privs in ovl_fallocate()
  ovl: remove privs in ovl_copyfile()
  attr: use consistent sgid stripping checks
  attr: add setattr_should_drop_sgid()
  fs: move should_remove_suid()
  attr: add in_group_or_capable()
2022-12-12 19:03:10 -08:00
Colin Ian King
637d13b57d ovl: Kconfig: Fix spelling mistake "undelying" -> "underlying"
There is a spelling mistake in a Kconfig description. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Miklos Szeredi
1fa9c5c5ed ovl: use inode instead of dentry where possible
Passing dentry to some helpers is unnecessary.  Simplify these cases.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Stanislav Goriainov
cf4ef7801a ovl: Add comment on upperredirect reassignment
If memory for uperredirect was allocated with kstrdup() in upperdir != NULL
and d.redirect != NULL path, it may seem that it can be lost when
upperredirect is reassigned later, but it's not possible.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 0a2d0d3f2f ("ovl: Check redirect on index as well")
Signed-off-by: Stanislav Goriainov <goriainov@ispras.ru>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Amir Goldstein
af4dcb6d78 ovl: use plain list filler in indexdir and workdir cleanup
Those two cleanup routines are using the helper ovl_dir_read() with the
merge dir filler, which populates an rb tree, that is never used.

The index dir entry names all have a long (42 bytes) constant prefix, so it
is not surprising that perf top has demostrated high CPU usage by rb tree
population during cleanup of a large index dir:

      - 9.53% ovl_fill_merge
         - 78.41% ovl_cache_entry_find_link.constprop.27
            + 72.11% strncmp

Use the plain list filler that does not populate the unneeded rb tree.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Amir Goldstein
8ea2876577 ovl: do not reconnect upper index records in ovl_indexdir_cleanup()
ovl_indexdir_cleanup() is called on mount of overayfs with nfs_export
feature to cleanup stale index records for lower and upper files that have
been deleted while overlayfs was offline.

This has the side effect (good or bad) of pre populating inode cache with
all the copied up upper inodes, while verifying the index entries.

For copied up directories, the upper file handles are decoded to conncted
upper dentries.  This has the even bigger side effect of reading the
content of all the parent upper directories which may take significantly
more time and IO than just reading the upper inodes.

Do not request connceted upper dentries for verifying upper directory index
entries, because we have no use for the connected dentry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Jiangshan Yi
cdf5c9d1af ovl: fix comment typos
Fix two typos.

Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Christian Brauner
73db6a063c ovl: port to vfs{g,u}id_t and associated helpers
A while ago we introduced a dedicated vfs{g,u}id_t type in commit
1e5267cd08 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched
over a good part of the VFS. Ultimately we will remove all legacy
idmapped mount helpers that operate only on k{g,u}id_t in favor of the
new type safe helpers that operate on vfs{g,u}id_t.

Cc: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Zhang Tianci
5b0db51215 ovl: Use ovl mounter's fsuid and fsgid in ovl_link()
There is a wrong case of link() on overlay:
  $ mkdir /lower /fuse /merge
  $ mount -t fuse /fuse
  $ mkdir /fuse/upper /fuse/work
  $ mount -t overlay /merge -o lowerdir=/lower,upperdir=/fuse/upper,\
    workdir=work
  $ touch /merge/file
  $ chown bin.bin /merge/file // the file's caller becomes "bin"
  $ ln /merge/file /merge/lnkfile

Then we will get an error(EACCES) because fuse daemon checks the link()'s
caller is "bin", it denied this request.

In the changing history of ovl_link(), there are two key commits:

The first is commit bb0d2b8ad2 ("ovl: fix sgid on directory") which
overrides the cred's fsuid/fsgid using the new inode. The new inode's
owner is initialized by inode_init_owner(), and inode->fsuid is
assigned to the current user. So the override fsuid becomes the
current user. We know link() is actually modifying the directory, so
the caller must have the MAY_WRITE permission on the directory. The
current caller may should have this permission. This is acceptable
to use the caller's fsuid.

The second is commit 51f7e52dc9 ("ovl: share inode for hard link")
which removed the inode creation in ovl_link(). This commit move
inode_init_owner() into ovl_create_object(), so the ovl_link() just
give the old inode to ovl_create_or_link(). Then the override fsuid
becomes the old inode's fsuid, neither the caller nor the overlay's
mounter! So this is incorrect.

Fix this bug by using ovl mounter's fsuid/fsgid to do underlying
fs's link().

Link: https://lore.kernel.org/all/20220817102952.xnvesg3a7rbv576x@wittgenstein/T
Link: https://lore.kernel.org/lkml/20220825130552.29587-1-zhangtianci.1997@bytedance.com/t
Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Fixes: 51f7e52dc9 ("ovl: share inode for hard link")
Cc: <stable@vger.kernel.org> # v4.8
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Kees Cook
cf8aa9bf97 ovl: Use "buf" flexible array for memcpy() destination
The "buf" flexible array needs to be the memcpy() destination to avoid
false positive run-time warning from the recent FORTIFY_SOURCE
hardening:

  memcpy: detected field-spanning write (size 93) of single field "&fh->fb"
  at fs/overlayfs/export.c:799 (size 21)

Reported-by: syzbot+9d14351a171d0d1c7955@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000763a6c05e95a5985@google.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-12-08 10:49:46 +01:00
Al Viro
456b59e757 ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags
ovl_change_flags() is an open-coded variant of fs/fcntl.c:setfl() and it
got missed by commit 164f4064ca ("keep iocb_flags() result cached in
struct file"); the same change applies there.

Reported-by: Pierre Labastie <pierre.labastie@neuf.fr>
Fixes: 164f4064ca ("keep iocb_flags() result cached in struct file")
Cc: <stable@vger.kernel.org> # v6.0
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216738
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-11-28 11:53:28 +01:00
Chen Zhongjin
672e4268b2 ovl: fix use inode directly in rcu-walk mode
ovl_dentry_revalidate_common() can be called in rcu-walk mode.  As document
said, "in rcu-walk mode, d_parent and d_inode should not be used without
care".

Check inode here to protect access under rcu-walk mode.

Fixes: bccece1ead ("ovl: allow remote upper")
Reported-and-tested-by: syzbot+a4055c78774bbf3498bb@syzkaller.appspotmail.com
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: <stable@vger.kernel.org> # v5.7
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-11-28 11:33:05 +01:00
Christian Brauner
5b52aebef8
ovl: call posix_acl_release() after error checking
The current placement of posix_acl_release() in ovl_set_or_remove_acl()
means it can be called on an error pointer instead of actual acls.
Fix this by moving the posix_acl_release() call after the error handling.

Fixes: 0e64185732 ("ovl: implement set acl method") # mainline only
Reported-by: syzbot+3f6ef1c4586bb6fd1f61@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-03 08:18:46 +01:00
Christian Brauner
c12db92d62
ovl: port to vfs{g,u}id_t and associated helpers
A while ago we introduced a dedicated vfs{g,u}id_t type in commit
1e5267cd08 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched
over a good part of the VFS. Ultimately we will remove all legacy
idmapped mount helpers that operate only on k{g,u}id_t in favor of the
new type safe helpers that operate on vfs{g,u}id_t.

Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-26 10:03:34 +02:00
Christian Brauner
200afb77cd
ovl: use stub posix acl handlers
Now that ovl supports the get and set acl inode operations and the vfs
has been switched to the new posi api, ovl can simply rely on the stub
posix acl handlers. The custom xattr handlers and associated unused
helpers can be removed.

Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20 10:13:32 +02:00
Christian Brauner
31acceb975
ovl: use posix acl api
Now that posix acls have a proper api us it to copy them.

All filesystems that can serve as lower or upper layers for overlayfs
have gained support for the new posix acl api in previous patches.
So switch all internal overlayfs codepaths for copying posix acls to the
new posix acl api.

Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20 10:13:31 +02:00
Christian Brauner
0e64185732
ovl: implement set acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].

In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.

Now that we have added get and set acl inode operations that allow easy
access to the dentry we give overlayfs it's own get and set acl inode
operations.

The set acl inode operation is duplicates most of the ovl posix acl
xattr handler. The main difference being that the set acl inode
operation relies on the new posix acl api. Once the vfs has been
switched over the custom posix acl xattr handler will be removed
completely.

Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.

Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20 10:13:31 +02:00
Christian Brauner
6c0a8bfb84
ovl: implement get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].

In order to build a type safe posix api around get and set acl we need
all filesystem to implement get and set acl.

Now that we have added get and set acl inode operations that allow easy
access to the dentry we give overlayfs it's own get and set acl inode
operations.

Since overlayfs is a stacking filesystem it will use the newly added
posix acl api when retrieving posix acls from the relevant layer.

Since overlayfs can also be mounted on top of idmapped layers. If
idmapped layers are used overlayfs must take the layer's idmapping into
account after it retrieved the posix acls from the relevant layer.

Note, until the vfs has been switched to the new posix acl api this
patch is a non-functional change.

Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20 10:13:30 +02:00
Christian Brauner
cac2f8b8d8
fs: rename current get acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].

The current inode operation for getting posix acls takes an inode
argument but various filesystems (e.g., 9p, cifs, overlayfs) need access
to the dentry. In contrast to the ->set_acl() inode operation we cannot
simply extend ->get_acl() to take a dentry argument. The ->get_acl()
inode operation is called from:

acl_permission_check()
-> check_acl()
   -> get_acl()

which is part of generic_permission() which in turn is part of
inode_permission(). Both generic_permission() and inode_permission() are
called in the ->permission() handler of various filesystems (e.g.,
overlayfs). So simply passing a dentry argument to ->get_acl() would
amount to also having to pass a dentry argument to ->permission(). We
should avoid this unnecessary change.

So instead of extending the existing inode operation rename it from
->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that
passes a dentry argument and which filesystems that need access to the
dentry can implement instead of ->get_inode_acl(). Filesystems like cifs
which allow setting and getting posix acls but not using them for
permission checking during lookup can simply not implement
->get_inode_acl().

This is intended to be a non-functional change.

Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20 10:13:27 +02:00
Amir Goldstein
23a8ce1641
ovl: remove privs in ovl_fallocate()
Underlying fs doesn't remove privs because fallocate is called with
privileged mounter credentials.

This fixes some failure in fstests generic/683..687.

Fixes: aab8848cee ("ovl: add ovl_fallocate()")
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-18 10:09:48 +02:00
Amir Goldstein
b306e90ffa
ovl: remove privs in ovl_copyfile()
Underlying fs doesn't remove privs because copy_range/remap_range are
called with privileged mounter credentials.

This fixes some failures in fstest generic/673.

Fixes: 8ede205541 ("ovl: add reflink/copyfile/dedup support")
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-18 10:09:48 +02:00
Linus Torvalds
f721d24e5d tmpfile API change
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY0DP2AAKCRBZ7Krx/gZQ
 6/+qAQCEGQWpcC5MB17zylaX7gqzhgAsDrwtpevlno3aIv/1pQD/YWr/E8tf7WTW
 ERXRXMRx1cAzBJhUhVgIY+3ANfU2Rg4=
 =cko4
 -----END PGP SIGNATURE-----

Merge tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs tmpfile updates from Al Viro:
 "Miklos' ->tmpfile() signature change; pass an unopened struct file to
  it, let it open the damn thing. Allows to add tmpfile support to FUSE"

* tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fuse: implement ->tmpfile()
  vfs: open inside ->tmpfile()
  vfs: move open right after ->tmpfile()
  vfs: make vfs_tmpfile() static
  ovl: use vfs_tmpfile_open() helper
  cachefiles: use vfs_tmpfile_open() helper
  cachefiles: only pass inode to *mark_inode_inuse() helpers
  cachefiles: tmpfile error handling cleanup
  hugetlbfs: cleanup mknod and tmpfile
  vfs: add vfs_tmpfile_open() helper
2022-10-10 19:45:17 -07:00
Linus Torvalds
4c0ed7d8d6 whack-a-mole: constifying struct path *
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzxmRQAKCRBZ7Krx/gZQ
 6+/kAQD2xyf+i4zOYVBr1NB3qBbhVS1zrni1NbC/kT3dJPgTvwEA7z7eqwnrN4zg
 scKFP8a3yPoaQBfs4do5PolhuSr2ngA=
 =NBI+
 -----END PGP SIGNATURE-----

Merge tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs constification updates from Al Viro:
 "whack-a-mole: constifying struct path *"

* tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ecryptfs: constify path
  spufs: constify path
  nd_jump_link(): constify path
  audit_init_parent(): constify path
  __io_setxattr(): constify path
  do_proc_readlink(): constify path
  overlayfs: constify path
  fs/notify: constify path
  may_linkat(): constify path
  do_sys_name_to_handle(): constify path
  ->getprocattr(): attribute name is const char *, TYVM...
2022-10-06 17:31:02 -07:00
Linus Torvalds
7a3353c5c4 struct file-related stuff
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzxjIQAKCRBZ7Krx/gZQ
 6/FPAQCNCZygQzd+54//vo4kTwv5T2Bv3hS8J51rASPJT87/BQD/TfCLS5urt/Gt
 81A1dFOfnTXseofuBKyGSXwQm0dWpgA=
 =PLre
 -----END PGP SIGNATURE-----

Merge tag 'pull-file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs file updates from Al Viro:
 "struct file-related stuff"

* tag 'pull-file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dma_buf_getfile(): don't bother with ->f_flags reassignments
  Change calling conventions for filldir_t
  locks: fix TOCTOU race when granting write lease
2022-10-06 17:13:18 -07:00