Commit Graph

527 Commits

Author SHA1 Message Date
Miklos Szeredi
5ef88da56a ovl: helper to iterate layers
Add helper to iterate through all the layers, starting from the upper layer
(if exists) and continuing down through the lower layers.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-13 00:59:43 +01:00
Miklos Szeredi
dd662667e6 ovl: add mutli-layer infrastructure
Add multiple lower layers to 'struct ovl_fs' and 'struct ovl_entry'.

ovl_entry will have an array of paths, instead of just the dentry.  This
allows a compact array containing just the layers which exist at current
point in the tree (which is expected to be a small number for the majority
of dentries).

The number of layers is not limited by this infrastructure.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-13 00:59:43 +01:00
Miklos Szeredi
263b4a0fee ovl: dont replace opaque dir
When removing an empty opaque directory, then it makes no sense to replace
it with an exact replica of itself before removal.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-13 00:59:43 +01:00
Miklos Szeredi
1afaba1ecb ovl: make path-type a bitmap
OVL_PATH_PURE_UPPER -> __OVL_PATH_UPPER | __OVL_PATH_PURE
OVL_PATH_UPPER      -> __OVL_PATH_UPPER
OVL_PATH_MERGE      -> __OVL_PATH_UPPER | __OVL_PATH_MERGE
OVL_PATH_LOWER      -> 0

Multiple R/O layers will allow __OVL_PATH_MERGE without __OVL_PATH_UPPER.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-13 00:59:42 +01:00
Miklos Szeredi
49c21e1cac ovl: check whiteout while reading directory
Don't make a separate pass for checking whiteouts, since we can do it while
reading the upper directory.

This will make it easier to handle multiple layers.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-13 00:59:42 +01:00
Al Viro
ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
Miklos Szeredi
7676895f47 ovl: ovl_dir_fsync() cleanup
Check against !OVL_PATH_LOWER instead of OVL_PATH_MERGE.  For a copied up
directory the two are currently equivalent.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:02 +01:00
Miklos Szeredi
c9f00fdb9a ovl: pass dentry into ovl_dir_read_merged()
Pass dentry into ovl_dir_read_merged() insted of upperpath and lowerpath.
This cleans up callers and paves the way for multi-layer directory reads.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:01 +01:00
Miklos Szeredi
71d509280f ovl: use lockless_dereference() for upperdentry
Don't open code lockless_dereference() in ovl_upperdentry_dereference().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:01 +01:00
Miklos Szeredi
91c7794713 ovl: allow filenames with comma
Allow option separator (comma) to be escaped with backslash.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:00 +01:00
Miklos Szeredi
521484639e ovl: fix race in private xattr checks
Xattr operations can race with copy up.  This does not matter as long as
we consistently fiter out "trunsted.overlay.opaque" attribute on upper
directories.

Previously we checked parent against OVL_PATH_MERGE.  This is too general,
and prone to race with copy-up.  I.e. we found the parent to be on the
lower layer but ovl_dentry_real() would return the copied-up dentry,
possibly with the "opaque" attribute.

So instead use ovl_path_real() and decide to filter the attributes based on
the actual type of the dentry we'll use.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:00 +01:00
Miklos Szeredi
a105d685a8 ovl: fix remove/copy-up race
ovl_remove_and_whiteout() needs to check if upper dentry exists or not
after having locked upper parent directory.

Previously we used a "type" value computed before locking the upper parent
directory, which is susceptible to racing with copy-up.

There's a similar check in ovl_check_empty_and_clear().  This one is not
actually racy, since copy-up doesn't change the "emptyness" property of a
directory.  Add a comment to this effect, and check the existence of upper
dentry locally to make the code cleaner.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:39:59 +01:00
Miklos Szeredi
ef94b1864d ovl: rename filesystem type to "overlay"
Some distributions carry an "old" format of overlayfs while mainline has a
"new" format.

The distros will possibly want to keep the old overlayfs alongside the new
for compatibility reasons.

To make it possible to differentiate the two versions change the name of
the new one from "overlayfs" to "overlay".

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Andy Whitcroft <apw@canonical.com>
2014-11-20 16:39:59 +01:00
Miklos Szeredi
3f822c6264 ovl: don't poison cursor
ovl_cache_put() can be called from ovl_dir_reset() if the cache needs to be
rebuilt.  We did list_del() on the cursor, which results in an Oops on the
poisoned pointer in ovl_seek_cursor().

Reported-by: Jordi Pujol Palomer <jordipujolp@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Jordi Pujol Palomer <jordipujolp@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-05 08:49:38 -05:00
Miklos Szeredi
ac7576f4b1 vfs: make first argument of dir_context.actor typed
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-31 17:48:54 -04:00
Miklos Szeredi
9f2f7d4c8d ovl: initialize ->is_cursor
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-31 17:47:51 -04:00
Miklos Szeredi
d1b72cc6d8 overlayfs: fix lockdep misannotation
In an overlay directory that shadows an empty lower directory, say
/mnt/a/empty102, do:

 	touch /mnt/a/empty102/x
 	unlink /mnt/a/empty102/x
 	rmdir /mnt/a/empty102

It's actually harmless, but needs another level of nesting between
I_MUTEX_CHILD and I_MUTEX_NORMAL.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-28 18:32:47 -04:00
Miklos Szeredi
c2096537d4 ovl: fix check for cursor
ovl_cache_entry.name is now an array not a pointer, so it makes no sense
test for it being NULL.

Detected by coverity.

From: Miklos Szeredi <mszeredi@suse.cz>
Fixes: 68bf861107 ("overlayfs: make ovl_cache_entry->name an array instead of
+pointer")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-28 18:31:54 -04:00
Al Viro
d45f00ae43 overlayfs: barriers for opening upper-layer directory
make sure that
	a) all stores done by opening struct file don't leak past storing
the reference in od->upperfile
	b) the lockless side has read dependency barrier

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-28 18:27:28 -04:00
Al Viro
db6ec212b5 overlayfs: embed middle into overlay_readdir_data
same story...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-24 20:25:23 -04:00
Al Viro
49be4fb9cc overlayfs: embed root into overlay_readdir_data
no sense having it a pointer - all instances have it pointing to
local variable in the same stack frame

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-24 20:25:23 -04:00
Al Viro
68bf861107 overlayfs: make ovl_cache_entry->name an array instead of pointer
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-24 20:25:22 -04:00
Al Viro
3d268c9b13 overlayfs: don't hold ->i_mutex over opening the real directory
just use it to serialize the assignment

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-24 20:24:11 -04:00
Miklos Szeredi
69c433ed2e fs: limit filesystem stacking depth
Add a simple read-only counter to super_block that indicates how deep this
is in the stack of filesystems.  Previously ecryptfs was the only stackable
filesystem and it explicitly disallowed multiple layers of itself.

Overlayfs, however, can be stacked recursively and also may be stacked
on top of ecryptfs or vice versa.

To limit the kernel stack usage we must limit the depth of the
filesystem stack.  Initially the limit is set to 2.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-24 00:14:39 +02:00
Erez Zadok
f45827e841 overlayfs: implement show_options
This is useful because of the stacking nature of overlayfs.  Users like to
find out (via /proc/mounts) which lower/upper directory were used at mount
time.

AV: even failing ovl_parse_opt() could've done some kstrdup()
AV: failure of ovl_alloc_entry() should end up with ENOMEM, not EINVAL

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-24 00:14:38 +02:00
Andy Whitcroft
cc2596392a overlayfs: add statfs support
Add support for statfs to the overlayfs filesystem.  As the upper layer
is the target of all write operations assume that the space in that
filesystem is the space in the overlayfs.  There will be some inaccuracy as
overwriting a file will copy it up and consume space we were not expecting,
but it is better than nothing.

Use the upper layer dentry and mount from the overlayfs root inode,
passing the statfs call to that filesystem.

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-24 00:14:38 +02:00
Miklos Szeredi
e9be9d5e76 overlay filesystem
Overlayfs allows one, usually read-write, directory tree to be
overlaid onto another, read-only directory tree.  All modifications
go to the upper, writable layer.

This type of mechanism is most often used for live CDs but there's a
wide variety of other uses.

The implementation differs from other "union filesystem"
implementations in that after a file is opened all operations go
directly to the underlying, lower or upper, filesystems.  This
simplifies the implementation and allows native performance in these
cases.

The dentry tree is duplicated from the underlying filesystems, this
enables fast cached lookups without adding special support into the
VFS.  This uses slightly more memory than union mounts, but dentries
are relatively small.

Currently inodes are duplicated as well, but it is a possible
optimization to share inodes for non-directories.

Opening non directories results in the open forwarded to the
underlying filesystem.  This makes the behavior very similar to union
mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
descriptors).

Usage:

  mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay

The following cotributions have been folded into this patch:

Neil Brown <neilb@suse.de>:
 - minimal remount support
 - use correct seek function for directories
 - initialise is_real before use
 - rename ovl_fill_cache to ovl_dir_read

Felix Fietkau <nbd@openwrt.org>:
 - fix a deadlock in ovl_dir_read_merged
 - fix a deadlock in ovl_remove_whiteouts

Erez Zadok <ezk@fsl.cs.sunysb.edu>
 - fix cleanup after WARN_ON

Sedat Dilek <sedat.dilek@googlemail.com>
 - fix up permission to confirm to new API

Robin Dong <hao.bigrat@gmail.com>
 - fix possible leak in ovl_new_inode
 - create new inode in ovl_link

Andy Whitcroft <apw@canonical.com>
 - switch to __inode_permission()
 - copy up i_uid/i_gid from the underlying inode

AV:
 - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
 - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
   lack of check for udir being the parent of upper, dropping and regaining
   the lock on udir (which would require _another_ check for parent being
   right).
 - bogus d_drop() in copyup and rename [fix from your mail]
 - copyup/remove and copyup/rename races [fix from your mail]
 - ovl_dir_fsync() leaving ERR_PTR() in ->realfile
 - ovl_entry_free() is pointless - it's just a kfree_rcu()
 - fold ovl_do_lookup() into ovl_lookup()
 - manually assigning ->d_op is wrong.  Just use ->s_d_op.
 [patches picked from Miklos]:
 * copyup/remove and copyup/rename races
 * bogus d_drop() in copyup and rename

Also thanks to the following people for testing and reporting bugs:

  Jordi Pujol <jordipujolp@gmail.com>
  Andy Whitcroft <apw@canonical.com>
  Michal Suchanek <hramrach@centrum.cz>
  Felix Fietkau <nbd@openwrt.org>
  Erez Zadok <ezk@fsl.cs.sunysb.edu>
  Randy Dunlap <rdunlap@xenotime.net>

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-24 00:14:38 +02:00