linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-04 01:51:34 +00:00

Author	SHA1	Message	Date
Christian Brauner	1784fbc2ed	ovl: port to new mount api We recently ported util-linux to the new mount api. Now the mount(8) tool will by default use the new mount api. While trying hard to fall back to the old mount api gracefully there are still cases where we run into issues that are difficult to handle nicely. Now with mount(8) and libmount supporting the new mount api I expect an increase in the number of bug reports and issues we're going to see with filesystems that don't yet support the new mount api. So it's time we rectify this. When ovl_fill_super() fails before setting sb->s_root, we need to cleanup sb->s_fs_info. The logic is a bit convoluted but tl;dr: If sget_fc() has succeeded fc->s_fs_info will have been transferred to sb->s_fs_info. So by the time ->fill_super()/ovl_fill_super() is called fc->s_fs_info is NULL consequently fs_context->free() won't call ovl_free_fs(). If we fail before sb->s_root() is set then ->put_super() won't be called which would call ovl_free_fs(). IOW, if we fail in ->fill_super() before sb->s_root we have to clean it up. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-06-19 14:02:01 +03:00
Amir Goldstein	ac519625ed	ovl: factor out ovl_parse_options() helper For parsing a single mount option. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-06-19 14:02:01 +03:00
Amir Goldstein	af5f2396b6	ovl: store enum redirect_mode in config instead of a string Do all the logic to set the mode during mount options parsing and do not keep the option string around. Use a constant_table to translate from enum redirect mode to string in preperation for new mount api option parsing. The mount option "off" is translated to either "follow" or "nofollow", depending on the "redirect_always_follow" build/module config, so in effect, there are only three possible redirect modes. This results in a minor change to the string that is displayed in show_options() - when redirect_dir is enabled by default and the user mounts with the option "redirect_dir=off", instead of displaying the mode "redirect_dir=off" in show_options(), the displayed mode will be either "redirect_dir=follow" or "redirect_dir=nofollow", depending on the value of "redirect_always_follow" build/module config. The displayed mode reflects the effective mode, so mounting overlayfs again with the dispalyed redirect_dir option will result with the same effective and displayed mode. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-06-19 14:02:01 +03:00
Amir Goldstein	dcb399de1e	ovl: pass ovl_fs to xino helpers Internal ovl methods should use ovl_fs and not sb as much as possible. Use a constant_table to translate from enum xino mode to string in preperation for new mount api option parsing. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-06-19 14:02:00 +03:00
Amir Goldstein	367d002d6c	ovl: clarify ovl_get_root() semantics Change the semantics to take a reference on upperdentry instead of transferrig the reference. This is needed for upcoming port to new mount api. Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-06-19 14:02:00 +03:00
Amir Goldstein	e4599d4b1a	ovl: negate the ofs->share_whiteout boolean The default common case is that whiteout sharing is enabled. Change to storing the negated no_shared_whiteout state, so we will not need to initialize it. This is the first step towards removing all config and feature initializations out of ovl_fill_super(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-06-19 14:02:00 +03:00
Amir Goldstein	42dd69ae1a	ovl: implement lazy lookup of lowerdata in data-only layers Defer lookup of lowerdata in the data-only layers to first data access or before copy up. We perform lowerdata lookup before copy up even if copy up is metadata only copy up. We can further optimize this lookup later if needed. We do best effort lazy lookup of lowerdata for d_real_inode(), because this interface does not expect errors. The only current in-tree caller of d_real_inode() is trace_uprobe and this caller is likely going to be followed reading from the file, before placing uprobes on offset within the file, so lowerdata should be available when setting the uprobe. Tested-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:14 +03:00
Amir Goldstein	4166564478	ovl: prepare for lazy lookup of lowerdata inode Make the code handle the case of numlower > 1 and missing lowerdata dentry gracefully. Missing lowerdata dentry is an indication for lazy lookup of lowerdata and in that case the lowerdata_redirect path is stored in ovl_inode. Following commits will defer lookup and perform the lazy lookup on access. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:14 +03:00
Amir Goldstein	2b21da9208	ovl: prepare to store lowerdata redirect for lazy lowerdata lookup Prepare to allow ovl_lookup() to leave the last entry in a non-dir lowerstack empty to signify lazy lowerdata lookup. In this case, ovl_lookup() stores the redirect path from metacopy to lowerdata in ovl_inode, which is going to be used later to perform the lazy lowerdata lookup. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:14 +03:00
Amir Goldstein	37ebf056d6	ovl: introduce data-only lower layers Introduce the format lowerdir=lower1:lower2::lowerdata1::lowerdata2 where the lower layers on the right of the :: separators are not merged into the overlayfs merge dirs. Data-only lower layers are only allowed at the bottom of the stack. The files in those layers are only meant to be accessible via absolute redirect from metacopy files in lower layers. Following changes will implement lookup in the data layers. This feature was requested for composefs ostree use case, where the lower data layer should only be accessiable via absolute redirects from metacopy inodes. The lower data layers are not required to a have a unique uuid or any uuid at all, because they are never used to compose the overlayfs inode st_ino/st_dev. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	9e88f90524	ovl: remove unneeded goto instructions There is nothing in the out goto target of ovl_get_layers(). Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	ab1eb5ffb7	ovl: deduplicate lowerdata and lowerstack[] The ovl_inode contains a copy of lowerdata in lowerstack[], so the lowerdata inode member can be removed. Use accessors ovl_lowerdata*() to get the lowerdata whereever the member was accessed directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	ac900ed4f2	ovl: deduplicate lowerpath and lowerstack[] The ovl_inode contains a copy of lowerpath in lowerstack[0], so the lowerpath member can be removed. Use accessor ovl_lowerpath() to get the lowerpath whereever the member was accessed directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	0af950f57f	ovl: move ovl_entry into ovl_inode The lower stacks of all the ovl inode aliases should be identical and there is redundant information in ovl_entry and ovl_inode. Move lowerstack into ovl_inode and keep only the OVL_E_FLAGS per overlay dentry. Following patches will deduplicate redundant ovl_inode fields. Note that for pure upper and negative dentries, OVL_E(dentry) may be NULL now, so it is imporatnt to use the ovl_numlower() accessor. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	163db0da35	ovl: factor out ovl_free_entry() and ovl_stack_*() helpers In preparation for moving lowerstack into ovl_inode. Note that in ovl_lookup() the temp stack dentry refs are now cloned into the final ovl_lowerstack instead of being transferred, so cleanup always needs to call ovl_stack_free(stack). Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	5522c9c7cb	ovl: use ovl_numlower() and ovl_lowerstack() accessors This helps fortify against dereferencing a NULL ovl_entry, before we move the ovl_entry reference into ovl_inode. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:13 +03:00
Amir Goldstein	a6ff2bc0be	ovl: use OVL_E() and OVL_E_FLAGS() accessors Instead of open coded instances, because we are about to split the two apart. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:12 +03:00
Amir Goldstein	b07d5cc93e	ovl: update of dentry revalidate flags after copy up After copy up, we may need to update d_flags if upper dentry is on a remote fs and lower dentries are not. Add helpers to allow incremental update of the revalidate flags. Fixes: `bccece1ead` ("ovl: allow remote upper") Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2023-06-19 14:01:12 +03:00
Christian Brauner	0c95c025a0	fs: drop unused posix acl handlers Remove struct posix_acl_{access,default}_handler for all filesystems that don't depend on the xattr handler in their inode->i_op->listxattr() method in any way. There's nothing more to do than to simply remove the handler. It's been effectively unused ever since we introduced the new posix acl api. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2023-03-06 09:57:12 +01:00
Christian Brauner	39f60c1cce	fs: port xattr to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in `256c8aed2b` ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2023-01-19 09:24:28 +01:00
Linus Torvalds	6df7cc2268	overlayfs update for 6.2 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCY5b3+gAKCRDh3BK/laaZ PIPxAQCPgyV/X/yJFd3wVgKa3/JxcHl5qdPbwHXFuYiJCBd69QEA9LYQEeEoTLCY veGiQPkl6Sp8ZqmTbDBxqw5OaBTSMwM= =7TiE -----END PGP SIGNATURE----- Merge tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: - Fix a couple of bugs found by syzbot - Don't ingore some open flags set by fcntl(F_SETFL) - Fix failure to create a hard link in certain cases - Use type safe helpers for some mnt_userns transformations - Improve performance of mount - Misc cleanups * tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: Kconfig: Fix spelling mistake "undelying" -> "underlying" ovl: use inode instead of dentry where possible ovl: Add comment on upperredirect reassignment ovl: use plain list filler in indexdir and workdir cleanup ovl: do not reconnect upper index records in ovl_indexdir_cleanup() ovl: fix comment typos ovl: port to vfs{g,u}id_t and associated helpers ovl: Use ovl mounter's fsuid and fsgid in ovl_link() ovl: Use "buf" flexible array for memcpy() destination ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags ovl: fix use inode directly in rcu-walk mode	2022-12-12 20:18:26 -08:00
Chen Zhongjin	672e4268b2	ovl: fix use inode directly in rcu-walk mode ovl_dentry_revalidate_common() can be called in rcu-walk mode. As document said, "in rcu-walk mode, d_parent and d_inode should not be used without care". Check inode here to protect access under rcu-walk mode. Fixes: `bccece1ead` ("ovl: allow remote upper") Reported-and-tested-by: syzbot+a4055c78774bbf3498bb@syzkaller.appspotmail.com Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Cc: <stable@vger.kernel.org> # v5.7 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-11-28 11:33:05 +01:00
Christian Brauner	200afb77cd	ovl: use stub posix acl handlers Now that ovl supports the get and set acl inode operations and the vfs has been switched to the new posi api, ovl can simply rely on the stub posix acl handlers. The custom xattr handlers and associated unused helpers can be removed. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-10-20 10:13:32 +02:00
Christian Brauner	31acceb975	ovl: use posix acl api Now that posix acls have a proper api us it to copy them. All filesystems that can serve as lower or upper layers for overlayfs have gained support for the new posix acl api in previous patches. So switch all internal overlayfs codepaths for copying posix acls to the new posix acl api. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-10-20 10:13:31 +02:00
Linus Torvalds	f721d24e5d	tmpfile API change -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY0DP2AAKCRBZ7Krx/gZQ 6/+qAQCEGQWpcC5MB17zylaX7gqzhgAsDrwtpevlno3aIv/1pQD/YWr/E8tf7WTW ERXRXMRx1cAzBJhUhVgIY+3ANfU2Rg4= =cko4 -----END PGP SIGNATURE----- Merge tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs tmpfile updates from Al Viro: "Miklos' ->tmpfile() signature change; pass an unopened struct file to it, let it open the damn thing. Allows to add tmpfile support to FUSE" * tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fuse: implement ->tmpfile() vfs: open inside ->tmpfile() vfs: move open right after ->tmpfile() vfs: make vfs_tmpfile() static ovl: use vfs_tmpfile_open() helper cachefiles: use vfs_tmpfile_open() helper cachefiles: only pass inode to *mark_inode_inuse() helpers cachefiles: tmpfile error handling cleanup hugetlbfs: cleanup mknod and tmpfile vfs: add vfs_tmpfile_open() helper	2022-10-10 19:45:17 -07:00
Linus Torvalds	4c0ed7d8d6	whack-a-mole: constifying struct path * -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzxmRQAKCRBZ7Krx/gZQ 6+/kAQD2xyf+i4zOYVBr1NB3qBbhVS1zrni1NbC/kT3dJPgTvwEA7z7eqwnrN4zg scKFP8a3yPoaQBfs4do5PolhuSr2ngA= =NBI+ -----END PGP SIGNATURE----- Merge tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs constification updates from Al Viro: "whack-a-mole: constifying struct path " tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs: constify path spufs: constify path nd_jump_link(): constify path audit_init_parent(): constify path __io_setxattr(): constify path do_proc_readlink(): constify path overlayfs: constify path fs/notify: constify path may_linkat(): constify path do_sys_name_to_handle(): constify path ->getprocattr(): attribute name is const char *, TYVM...	2022-10-06 17:31:02 -07:00
Miklos Szeredi	2b1a77461f	ovl: use vfs_tmpfile_open() helper If tmpfile is used for copy up, then use this helper to create the tmpfile and open it at the same time. This will later allow filesystems such as fuse to do this operation atomically. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-09-24 07:00:00 +02:00
Al Viro	2d3430875a	overlayfs: constify path Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-09-01 17:38:07 -04:00
Christian Brauner	7e1401acd9	ovl: use vfs_set_acl_prepare() The posix_acl_from_xattr() helper should mainly be used in i_op->get_acl() handlers. It translates from the uapi struct into the kernel internal POSIX ACL representation and doesn't care about mount idmappings. Use the vfs_set_acl_prepare() helper to generate a kernel internal POSIX ACL representation in struct posix_acl format taking care to map from the mount idmapping into the filesystem's idmapping. The returned struct posix_acl is in the correct format to be cached by the VFS or passed to the filesystem's i_op->set_acl() method to write to the backing store. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>	2022-08-31 16:38:07 +02:00
Linus Torvalds	65512eb0e9	overlayfs update for 6.0 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYvDeCAAKCRDh3BK/laaZ PGjCAP9TVuId3X7Akroc9W+qswPzwlW3fwtE6+9F6ABeNJNPZAEAgU2bp95vqZRh OWP+ptnskceBcX/cRkfxkmgtiNE21wk= =sucY -----END PGP SIGNATURE----- Merge tag 'ovl-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: "Just a small update" * tag 'ovl-update-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix spelling mistakes ovl: drop WARN_ON() dentry is NULL in ovl_encode_fh() ovl: improve ovl_get_acl() if POSIX ACL support is off ovl: fix some kernel-doc comments ovl: warn if trusted xattr creation fails	2022-08-08 11:03:11 -07:00
William Dean	4f1196288d	ovl: fix spelling mistakes fix follow spelling misktakes: decendant ==> descendant indentify ==> identify underlaying ==> underlying Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: William Dean <williamsukatube@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-08-02 15:41:10 +02:00
Yang Li	9c5dd8034e	ovl: fix some kernel-doc comments Remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/overlayfs/super.c:311: warning: Function parameter or member 'dentry' not described in 'ovl_statfs' fs/overlayfs/super.c:311: warning: Excess function parameter 'sb' description in 'ovl_statfs' fs/overlayfs/super.c:357: warning: Function parameter or member 'm' not described in 'ovl_show_options' fs/overlayfs/super.c:357: warning: Function parameter or member 'dentry' not described in 'ovl_show_options' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-07-27 16:31:31 +02:00
Miklos Szeredi	b10b85fe51	ovl: warn if trusted xattr creation fails When mounting overlayfs in an unprivileged user namespace, trusted xattr creation will fail. This will lead to failures in some file operations, e.g. in the following situation: mkdir lower upper work merged mkdir lower/directory mount -toverlay -olowerdir=lower,upperdir=upper,workdir=work none merged rmdir merged/directory mkdir merged/directory The last mkdir will fail: mkdir: cannot create directory 'merged/directory': Input/output error The cause for these failures is currently extremely non-obvious and hard to debug. Hence, warn the user and suggest using the userxattr mount option, if it is not already supplied and xattr creation fails during the self-check. Reported-by: Alois Wohlschlager <alois1@gmx-topmail.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-07-27 16:31:30 +02:00
Christian Brauner	7c4d37c269	Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily" This reverts commit `4a47c6385b`. Now that we have a proper fix for POSIX ACLs with overlayfs on top of idmapped layers revert the temporary fix. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>	2022-07-15 22:10:51 +02:00
Christian Brauner	4a47c6385b	ovl: turn of SB_POSIXACL with idmapped layers temporarily This cycle we added support for mounting overlayfs on top of idmapped mounts. Recently I've started looking into potential corner cases when trying to add additional tests and I noticed that reporting for POSIX ACLs is currently wrong when using idmapped layers with overlayfs mounted on top of it. I have sent out an patch that fixes this and makes POSIX ACLs work correctly but the patch is a bit bigger and we're already at -rc5 so I recommend we simply don't raise SB_POSIXACL when idmapped layers are used. Then we can fix the VFS part described below for the next merge window so we can have good exposure in -next. I'm going to give a rather detailed explanation to both the origin of the problem and mention the solution so people know what's going on. Let's assume the user creates the following directory layout and they have a rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would expect files on your host system to be owned. For example, ~/.bashrc for your regular user would be owned by 1000:1000 and /root/.bashrc would be owned by 0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs filesystem. The user chooses to set POSIX ACLs using the setfacl binary granting the user with uid 4 read, write, and execute permissions for their .bashrc file: setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc Now they to expose the whole rootfs to a container using an idmapped mount. So they first create: mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap} mkdir -pv /vol/contpool/ctrover/{over,work} chown 10000000:10000000 /vol/contpool/ctrover/{over,work} The user now creates an idmapped mount for the rootfs: mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \ /var/lib/lxc/c2/rootfs \ /vol/contpool/lowermap This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at /vol/contpool/lowermap/home/ubuntu/.bashrc. Assume the user wants to expose these idmapped mounts through an overlayfs mount to a container. mount -t overlay overlay \ -o lowerdir=/vol/contpool/lowermap, \ upperdir=/vol/contpool/overmap/over, \ workdir=/vol/contpool/overmap/work \ /vol/contpool/merge The user can do this in two ways: (1) Mount overlayfs in the initial user namespace and expose it to the container. (2) Mount overlayfs on top of the idmapped mounts inside of the container's user namespace. Let's assume the user chooses the (1) option and mounts overlayfs on the host and then changes into a container which uses the idmapping 0:10000000:65536 which is the same used for the two idmapped mounts. Now the user tries to retrieve the POSIX ACLs using the getfacl command getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc and to their surprise they see: # file: vol/contpool/merge/home/ubuntu/.bashrc # owner: 1000 # group: 1000 user::rw- user:4294967295:rwx group::r-- mask::rwx other::r-- indicating the uid wasn't correctly translated according to the idmapped mount. The problem is how we currently translate POSIX ACLs. Let's inspect the callchain in this example: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping / sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() \|> vfs_getxattr() \| -> __vfs_getxattr() \| -> handler->get == ovl_posix_acl_xattr_get() \| -> ovl_xattr_get() \| -> vfs_getxattr() \| -> __vfs_getxattr() \| -> handler->get() / lower filesystem callback / \|> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns / no idmapped mount /, 4); / FAILURE / -1 = from_kuid(0:10000000:65536 / caller's idmapping /, 4); } If the user chooses to use option (2) and mounts overlayfs on top of idmapped mounts inside the container things don't look that much better: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() \|> vfs_getxattr() \| -> __vfs_getxattr() \| -> handler->get == ovl_posix_acl_xattr_get() \| -> ovl_xattr_get() \| -> vfs_getxattr() \| -> __vfs_getxattr() \| -> handler->get() / lower filesystem callback / \|> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns, 4); / FAILURE / -1 = from_kuid(0:10000000:65536 / caller's idmapping /, 4); } As is easily seen the problem arises because the idmapping of the lower mount isn't taken into account as all of this happens in do_gexattr(). But do_getxattr() is always called on an overlayfs mount and inode and thus cannot possible take the idmapping of the lower layers into account. This problem is similar for fscaps but there the translation happens as part of vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain: setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc The expected outcome here is that we'll receive the cap_net_raw capability as we are able to map the uid associated with the fscap to 0 within our container. IOW, we want to see 0 as the result of the idmapping translations. If the user chooses option (1) we get the following callchain for fscaps: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k / initial idmapping / sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() ________________________________ -> cap_inode_getsecurity() \| \| { V \| 10000000 = make_kuid(0:0:4k / overlayfs idmapping /, 10000000); \| 10000000 = mapped_kuid_fs(0:0:4k / no idmapped mount /, 10000000); \| / Expected result is 0 and thus that we own the fscap. / \| 0 = from_kuid(0:10000000:65536 / caller's idmapping /, 10000000); \| } \| -> vfs_getxattr_alloc() \| -> handler->get == ovl_other_xattr_get() \| -> vfs_getxattr() \| -> xattr_getsecurity() \| -> security_inode_getsecurity() \| -> cap_inode_getsecurity() \| { \| 0 = make_kuid(0:0:4k / lower s_user_ns /, 0); \| 10000000 = mapped_kuid_fs(0:10000000:65536 / idmapped mount /, 0); \| 10000000 = from_kuid(0:0:4k / overlayfs idmapping /, 10000000); \| \|____________________________________________________________________\| } -> vfs_getxattr_alloc() -> handler->get == / lower filesystem callback / And if the user chooses option (2) we get: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() _______________________________ -> cap_inode_getsecurity() \| \| { V \| 10000000 = make_kuid(0:10000000:65536 / overlayfs idmapping /, 0); \| 10000000 = mapped_kuid_fs(0:0:4k / no idmapped mount /, 10000000); \| / Expected result is 0 and thus that we own the fscap. / \| 0 = from_kuid(0:10000000:65536 / caller's idmapping /, 10000000); \| } \| -> vfs_getxattr_alloc() \| -> handler->get == ovl_other_xattr_get() \| \|-> vfs_getxattr() \| -> xattr_getsecurity() \| -> security_inode_getsecurity() \| -> cap_inode_getsecurity() \| { \| 0 = make_kuid(0:0:4k / lower s_user_ns /, 0); \| 10000000 = mapped_kuid_fs(0:10000000:65536 / idmapped mount /, 0); \| 0 = from_kuid(0:10000000:65536 / overlayfs idmapping /, 10000000); \| \|____________________________________________________________________\| } -> vfs_getxattr_alloc() -> handler->get == / lower filesystem callback / We can see how the translation happens correctly in those cases as the conversion happens within the vfs_getxattr() helper. For POSIX ACLs we need to do something similar. However, in contrast to fscaps we cannot apply the fix directly to the kernel internal posix acl data structure as this would alter the cached values and would also require a rework of how we currently deal with POSIX ACLs in general which almost never take the filesystem idmapping into account (the noteable exception being FUSE but even there the implementation is special) and instead retrieve the raw values based on the initial idmapping. The correct values are then generated right before returning to userspace. The fix for this is to move taking the mount's idmapping into account directly in vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user(). To this end we simply move the idmapped mount translation into a separate step performed in vfs_{g,s}etxattr() instead of in posix_acl_fix_xattr_{from,to}_user(). To see how this fixes things let's go back to the original example. Assume the user chose option (1) and mounted overlayfs on top of idmapped mounts on the host: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k / initial idmapping / sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() \|> vfs_getxattr() \| \|> __vfs_getxattr() \| \| -> handler->get == ovl_posix_acl_xattr_get() \| \| -> ovl_xattr_get() \| \| -> vfs_getxattr() \| \| \|> __vfs_getxattr() \| \| \| -> handler->get() / lower filesystem callback / \| \| \|> posix_acl_getxattr_idmapped_mnt() \| \| { \| \| 4 = make_kuid(&init_user_ns, 4); \| \| 10000004 = mapped_kuid_fs(0:10000000:65536 / lower idmapped mount /, 4); \| \| 10000004 = from_kuid(&init_user_ns, 10000004); \| \| \|_______________________ \| \| } \| \| \| \| \| \|> posix_acl_getxattr_idmapped_mnt() \| \| { \| \| V \| 10000004 = make_kuid(&init_user_ns, 10000004); \| 10000004 = mapped_kuid_fs(&init_user_ns / no idmapped mount /, 10000004); \| 10000004 = from_kuid(&init_user_ns, 10000004); \| } \|_________________________________________________ \| \| \| \| \|> posix_acl_fix_xattr_to_user() \| { V 10000004 = make_kuid(0:0:4k / init_user_ns /, 10000004); / SUCCESS / 4 = from_kuid(0:10000000:65536 / caller's idmapping /, 10000004); } And similarly if the user chooses option (1) and mounted overayfs on top of idmapped mounts inside the container: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() \|> vfs_getxattr() \| \|> __vfs_getxattr() \| \| -> handler->get == ovl_posix_acl_xattr_get() \| \| -> ovl_xattr_get() \| \| -> vfs_getxattr() \| \| \|> __vfs_getxattr() \| \| \| -> handler->get() / lower filesystem callback / \| \| \|> posix_acl_getxattr_idmapped_mnt() \| \| { \| \| 4 = make_kuid(&init_user_ns, 4); \| \| 10000004 = mapped_kuid_fs(0:10000000:65536 / lower idmapped mount /, 4); \| \| 10000004 = from_kuid(&init_user_ns, 10000004); \| \| \|_______________________ \| \| } \| \| \| \| \| \|> posix_acl_getxattr_idmapped_mnt() \| \| { V \| 10000004 = make_kuid(&init_user_ns, 10000004); \| 10000004 = mapped_kuid_fs(&init_user_ns / no idmapped mount /, 10000004); \| 10000004 = from_kuid(0(&init_user_ns, 10000004); \| \|_________________________________________________ \| } \| \| \| \|> posix_acl_fix_xattr_to_user() \| { V 10000004 = make_kuid(0:0:4k / init_user_ns /, 10000004); / SUCCESS / 4 = from_kuid(0:10000000:65536 / caller's idmappings /, 10000004); } The last remaining problem we need to fix here is ovl_get_acl(). During ovl_permission() overlayfs will call: ovl_permission() -> generic_permission() -> acl_permission_check() -> check_acl() -> get_acl() -> inode->i_op->get_acl() == ovl_get_acl() > get_acl() / on the underlying filesystem) ->inode->i_op->get_acl() == /lower filesystem callback / -> posix_acl_permission() passing through the get_acl request to the underlying filesystem. This will retrieve the acls stored in the lower filesystem without taking the idmapping of the underlying mount into account as this would mean altering the cached values for the lower filesystem. The simple solution is to have ovl_get_acl() simply duplicate the ACLs, update the values according to the idmapped mount and return it to acl_permission_check() so it can be used in posix_acl_permission(). Since overlayfs doesn't cache ACLs they'll be released right after. Link: https://github.com/brauner/mount-idmapped/issues/9 Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: linux-unionfs@vger.kernel.org Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Fixes: `bc70682a49` ("ovl: support idmapped layers") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-07-08 15:48:31 +02:00
Christian Brauner	bc70682a49	ovl: support idmapped layers Now that overlay is able to take a layers idmapping into account allow overlay mounts to be created on top of idmapped mounts. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-04-28 16:31:12 +02:00
Amir Goldstein	ffa5723c6d	ovl: store lower path in ovl_inode Create some ovl_i_* helpers to get real path from ovl inode. Instead of just stashing struct inode for the lower layer we stash struct path for the lower layer. The helpers allow to retrieve a struct path for the relevant upper or lower layer. This will be used when retrieving information based on struct inode when copying up inode attributes from upper or lower inodes to ovl inodes and when checking permissions in ovl_permission() in following patches. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-04-28 16:31:12 +02:00
Christian Brauner	22f289ce1f	ovl: use ovl_lookup_upper() wrapper Introduce ovl_lookup_upper() as a simple wrapper around lookup_one(). Make it clear in the helper's name that this only operates on the upper layer. The wrapper will take upper layer's idmapping into account when checking permission in lookup_one(). Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-04-28 16:31:11 +02:00
Christian Brauner	a15506eac9	ovl: use ovl_do_notify_change() wrapper Introduce ovl_do_notify_change() as a simple wrapper around notify_change() to support idmapped layers. The helper mirrors other ovl_do_*() helpers that operate on the upper layers. When changing ownership of an upper object the intended ownership needs to be mapped according to the upper layer's idmapping. This mapping is the inverse to the mapping applied when copying inode information from an upper layer to the corresponding overlay inode. So e.g., when an upper mount maps files that are stored on-disk as owned by id 1001 to 1000 this means that calling stat on this object from an idmapped mount will report the file as being owned by id 1000. Consequently in order to change ownership of an object in this filesystem so it appears as being owned by id 1000 in the upper idmapped layer it needs to store id 1001 on disk. The mnt mapping helpers take care of this. All idmapping helpers are nops when no idmapped base layers are used. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-04-28 16:31:11 +02:00
Christian Brauner	576bb26345	ovl: pass ofs to creation operations Pass down struct ovl_fs to all creation helpers so we can ultimately retrieve the relevant upper mount and take the mount's idmapping into account when creating new filesystem objects. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-04-28 16:31:10 +02:00
Amir Goldstein	c914c0e27e	ovl: use wrappers to all vfs_xattr() calls Use helpers ovl_xattr() to access user/trusted.overlay.* xattrs and use helpers ovl_do_xattr() to access generic xattrs. This is a preparatory patch for using idmapped base layers with overlay. Note that a few of those places called vfs_xattr() calls directly to reduce the amount of debug output. But as Miklos pointed out since overlayfs has been stable for quite some time the debug output isn't all that relevant anymore and the additional debug in all locations was actually quite helpful when developing this patch series. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2022-04-28 16:31:10 +02:00
Muchun Song	fd60b28842	fs: allocate inode by using alloc_inode_sb() The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-03-22 15:57:03 -07:00
Christian Brauner	bb49e9e730	fs: add is_idmapped_mnt() helper Multiple places open-code the same check to determine whether a given mount is idmapped. Introduce a simple helper function that can be used instead. This allows us to get rid of the fragile open-coding. We will later change the check that is used to determine whether a given mount is idmapped. Introducing a helper allows us to do this in a single place instead of doing it for multiple places. Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-12-03 18:44:06 +01:00
Miklos Szeredi	1f5573cfe7	ovl: fix warning in ovl_create_real() Syzbot triggered the following warning in ovl_workdir_create() -> ovl_create_real(): if (!err && WARN_ON(!newdentry->d_inode)) { The reason is that the cgroup2 filesystem returns from mkdir without instantiating the new dentry. Weird filesystems such as this will be rejected by overlayfs at a later stage during setup, but to prevent such a warning, call ovl_mkdir_real() directly from ovl_workdir_create() and reject this case early. Reported-and-tested-by: syzbot+75eab84fd0af9e8bf66b@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-11-04 10:55:34 +01:00
Vyacheslav Yurkov	ca45275cd6	ovl: add ovl_allow_offline_changes() helper Allows to check whether any of extended features are enabled Signed-off-by: Vyacheslav Yurkov <Vyacheslav.Yurkov@bruker.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-08-17 11:47:44 +02:00
Vyacheslav Yurkov	e4522bc873	ovl: disable decoding null uuid with redirect_dir Currently decoding origin with lower null uuid is not allowed unless user opted-in to one of the new features that require following the lower inode of non-dir upper (index, xino, metacopy). Now we add redirect_dir too to that feature list. Signed-off-by: Vyacheslav Yurkov <Vyacheslav.Yurkov@bruker.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-08-17 11:47:44 +02:00
Miklos Szeredi	708fa01597	ovl: allow upperdir inside lowerdir Commit `146d62e5a5` ("ovl: detect overlapping layers") made sure we don't have overlapping layers, but it also broke the arguably valid use case of mount -olowerdir=/,upperdir=/subdir,.. where upperdir overlaps lowerdir on the same filesystem. This has been causing regressions. Revert the check, but only for the specific case where upperdir and/or workdir are subdirectories of lowerdir. Any other overlap (e.g. lowerdir is subdirectory of upperdir, etc) case is crazy, so leave the check in place for those. Overlaps are detected at lookup time too, so reverting the mount time check should be safe. Fixes: `146d62e5a5` ("ovl: detect overlapping layers") Cc: <stable@vger.kernel.org> # v5.2 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-04-12 12:00:37 +02:00
Giuseppe Scrivano	321b46b904	ovl: show "userxattr" in the mount data This was missed when adding the option. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Fixes: `2d2f2d7322` ("ovl: user xattr") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-04-12 12:00:37 +02:00
Chengguang Xu	568edee485	ovl: do not copy attr several times In ovl_xattr_set() we have already copied attr of real inode so no need to copy it again in ovl_posix_acl_xattr_set(). Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-04-12 12:00:37 +02:00
Chengguang Xu	d7b49b10d5	ovl: fix error for ovl_fill_super() There are some places should return -EINVAL instead of -ENOMEM in ovl_fill_super(). [Amir] Consistently set error before checking the error condition. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-04-12 12:00:36 +02:00

1 2 3 4 5 ...

313 Commits