linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-04 18:13:04 +00:00

Author	SHA1	Message	Date
Al Viro	9b40bc90ab	get rid of unprotected dereferencing of mnt->mnt_ns It's safe only under namespace_sem or vfsmount_lock; all places in fs/namespace.c that want mnt->mnt_ns->user_ns actually want to use current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in there). Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:05 -05:00
Miao Xie	1e75529e3c	vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flags The compiler may optimize the while loop and make the check just be done once, so we should use ACCESS_ONCE() to guard access to ->mnt_flags Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-20 13:36:18 -05:00
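Here is a minimal user-space sketch of why the volatile access matters. It assumes GCC or Clang, because it uses `__typeof__`. The macro matches the kernel's ACCESS_ONCE() definition of that era. `DEMO_WRITE_HOLD` and `wait_while_held()` are hypothetical: they stand in for a flag bit in ->mnt_flags and the loop that polls it. The spin is bounded only so the sketch cannot hang. Without the volatile cast, the compiler could load `*flags` once before the loop and then test a stale register copy.

```c
#include <stdbool.h>

/* ACCESS_ONCE() as the kernel defined it at the time: a volatile-qualified
 * access, so every evaluation performs a real load from memory. It is not a
 * memory barrier; real cross-CPU code also needs proper ordering. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for a flag bit in ->mnt_flags. */
#define DEMO_WRITE_HOLD 0x200

/* Spin while DEMO_WRITE_HOLD is set in *flags, re-reading memory on each
 * iteration. max_spins bounds the loop so this sketch cannot hang.
 * Returns true if the bit was observed clear. */
static bool wait_while_held(int *flags, long max_spins)
{
	for (long i = 0; i < max_spins; i++) {
		if (!(ACCESS_ONCE(*flags) & DEMO_WRITE_HOLD))
			return true;
		/* the kernel would call cpu_relax() here */
	}
	return false;
}
```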
Eric W. Biederman	5e4a08476b	userns: Require CAP_SYS_ADMIN for most uses of setns. Andy Lutomirski <luto@amacapital.net> found a nasty little bug in the permissions of setns. With unprivileged user namespaces it became possible to create new namespaces without privilege. However, the setns calls were relaxed to only require CAP_SYS_ADMIN in the user namespace of the targeted namespace, which made the following nasty sequence possible:
    pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
    if (pid == 0) { /* child */
        system("mount --bind /home/me/passwd /etc/passwd");
    } else if (pid != 0) { /* parent */
        char path[PATH_MAX];
        snprintf(path, sizeof(path), "/proc/%d/ns/mnt", pid);
        fd = open(path, O_RDONLY);
        setns(fd, 0);
        system("su -");
    }
Prevent this possibility by requiring CAP_SYS_ADMIN in the current user namespace when joining all but the user namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-12-14 16:12:03 -08:00
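The permission change can be summarized as a boolean rule. This is a hypothetical, simplified model and not kernel code. The real checks are made in each namespace type's install hook using ns_capable()-style helpers, and the user namespace case carries extra conditions that are not shown here.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the setns() permission rule. */
enum ns_kind { NS_USER, NS_MNT, NS_UTS, NS_IPC, NS_NET };

struct caller_caps {
	bool admin_in_target_owner; /* CAP_SYS_ADMIN in the user ns owning the target ns */
	bool admin_in_current;      /* CAP_SYS_ADMIN in the caller's own user ns */
};

/* Before the fix: capability in the target's owning user ns was enough. */
static bool may_setns_old(enum ns_kind kind, struct caller_caps c)
{
	(void)kind;
	return c.admin_in_target_owner;
}

/* After the fix: joining anything but a user namespace also requires
 * CAP_SYS_ADMIN in the caller's current user namespace. */
static bool may_setns_new(enum ns_kind kind, struct caller_caps c)
{
	if (kind == NS_USER)
		return c.admin_in_target_owner;
	return c.admin_in_target_owner && c.admin_in_current;
}
```

In the exploit above, the parent owns the child's new user namespace, so it holds CAP_SYS_ADMIN over the child's mount namespace. It lacks that capability in its own user namespace, so the new rule refuses the join.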
Eric W. Biederman	98f842e675	proc: Usable inode numbers for the namespace file descriptors. Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowed by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-20 04:19:49 -08:00
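User space can use this to test namespace membership by comparing the identity of two /proc/<pid>/ns files. The sketch below assumes a Linux system with /proc mounted. `same_inode()` and `same_mnt_ns()` are illustrative helper names. Checking st_dev as well as st_ino follows the advice in namespaces(7), not anything stated in this commit.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if both paths name the same inode, 0 if they differ, -1 if
 * either stat() fails. stat() follows the /proc/<pid>/ns/<type> link to the
 * namespace inode, so equal (st_dev, st_ino) pairs mean a shared namespace. */
static int same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Compare the mount namespaces of two processes by PID. */
static int same_mnt_ns(pid_t p1, pid_t p2)
{
	char a[64], b[64];

	snprintf(a, sizeof(a), "/proc/%d/ns/mnt", (int)p1);
	snprintf(b, sizeof(b), "/proc/%d/ns/mnt", (int)p2);
	return same_inode(a, b);
}
```

Reading another process's ns files requires ptrace-level access to it, so `same_mnt_ns()` can return -1 for processes the caller is not allowed to inspect.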
Zhao Hongjiang	ae11e0f184	userns: fix return value on mntns_install() failure Change return value from -EINVAL to -EPERM when the permission check fails. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:22 -08:00
Eric W. Biederman	0c55cfc416	vfs: Allow unprivileged manipulation of the mount namespace. - Add a filesystem flag to mark filesystems that are safe to mount as an unprivileged user. - Add a filesystem flag to mark filesystems that don't need MNT_NODEV when mounted by an unprivileged user. - Relax the permission checks to allow unprivileged users that have CAP_SYS_ADMIN permissions in the user namespace referred to by the current mount namespace to be allowed to mount, unmount, and move filesystems. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:21 -08:00
Eric W. Biederman	7a472ef4be	vfs: Only support slave subtrees across different user namespaces Sharing mount subtrees with mount namespaces created by unprivileged users allows unprivileged mounts created by unprivileged users to propagate to mount namespaces controlled by privileged users. Prevent nasty consequences by changing shared subtrees to slave subtrees when an unprivileged user creates a new mount namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:20 -08:00
Eric W. Biederman	771b137168	vfs: Add a user namespace reference from struct mnt_namespace This will allow for support for unprivileged mounts in a new user namespace. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:19 -08:00
Eric W. Biederman	8823c079ba	vfs: Add setns support for the mount namespace setns support for the mount namespace is a little tricky as an arbitrary decision must be made about what to set fs->root and fs->pwd to, as there is no expectation of a relationship between the two mount namespaces. Therefore I arbitrarily find the root mount point, and follow every mount on top of it to find the top of the mount stack. Then I set fs->root and fs->pwd to that location. The topmost root of the mount stack seems like a reasonable place to be. Bind mount support for the mount namespace inodes has the possibility of creating circular dependencies between mount namespaces. Circular dependencies can result in loops that prevent mount namespaces from ever being freed. I avoid creating those circular dependencies by adding a sequence number to the mount namespace and require all bind mounts be of a younger mount namespace into an older mount namespace. Add a helper function proc_ns_inode so it is possible to detect when we are attempting to bind mount a namespace inode. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:18 -08:00
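Here is a minimal model of the loop-avoidance rule, using hypothetical names. Each namespace receives an increasing sequence number when it is created. Binding a namespace inode is allowed only when an older namespace references a younger one. Every reference therefore points to a strictly larger sequence number, so no chain of references can lead back to where it started.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; only the sequence-number idea comes from the commit. */
struct demo_mnt_ns {
	uint64_t seq; /* assigned from a global counter at creation */
};

static uint64_t demo_ns_seq;

static struct demo_mnt_ns demo_new_ns(void)
{
	struct demo_mnt_ns ns = { .seq = ++demo_ns_seq };
	return ns;
}

/* Binding the inode of 'target' into a mount inside 'into' is allowed only
 * when target is strictly younger (larger seq), i.e. younger into older. */
static bool demo_bind_allowed(const struct demo_mnt_ns *into,
			      const struct demo_mnt_ns *target)
{
	return target->seq > into->seq;
}
```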
Jeff Layton	91a27b2a75	vfs: define struct filename and have getname() return it getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-12 20:14:55 -04:00
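The shape of the new interface can be sketched in user space under a few assumptions. The two fields follow the commit's description: the kernel copy of the string plus the original user pointer. The copy from user space is modeled with memcpy(). `demo_getname()` and `demo_putname()` are hypothetical names. Unlike this sketch, the real getname() reports failure with an error pointer rather than NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of a getname() result that carries more than a bare string. */
struct filename {
	const char *name; /* kernel-side copy of the pathname */
	const char *uptr; /* original user-space pointer */
};

static struct filename *demo_getname(const char *user_path)
{
	size_t len = strlen(user_path) + 1;
	struct filename *f = malloc(sizeof(*f));
	char *copy = malloc(len);

	if (!f || !copy) {
		free(f);
		free(copy);
		return NULL;
	}
	memcpy(copy, user_path, len); /* stands in for copying from user space */
	f->name = copy;
	f->uptr = user_path;
	return f;
}

static void demo_putname(struct filename *f)
{
	free((void *)f->name);
	free(f);
}
```

Because callers now hold a struct, extra fields (for example audit state) can be added later without changing every getname() user again.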
Al Viro	808d4e3cfd	constify do_mount() arguments Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-11 20:02:04 -04:00
Al Viro	156cacb1d0	do_add_mount()/umount -l races normally we deal with lock_mount()/umount races by checking that the mountpoint-to-be is still in our namespace after lock_mount() has been done. However, do_add_mount() skips that check when called with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The reason is that ->mnt_ns may be a temporary namespace created exactly to contain automounts a-la NFS4 referral handling. It's not the namespace of the caller, though, so check_mnt() would fail here. We still need to check that ->mnt_ns is non-NULL in that case, though. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-22 20:48:18 -04:00
Jan Kara	eb04c28288	fs: Add freezing handling to mnt_want_write() / mnt_drop_write() Most of the places where we want freeze protection coincide with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:40:38 +04:00
David Howells	f015f1267b	VFS: Comment mount following code Add comments describing what the directions "up" and "down" mean and ref count handling to the VFS mount following family of functions. Signed-off-by: Valerie Aurora <vaurora@redhat.com> (Original author) Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:38:32 +04:00
David Howells	be34d1a3bc	VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors copy_tree() can theoretically fail in a case other than ENOMEM, but always returns NULL which is interpreted by callers as -ENOMEM. Change it to return an explicit error. Also change clone_mnt() for consistency and because union mounts will add new error cases. Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix. [AV: folded braino fix by Dan Carpenter] Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Valerie Aurora <valerie.aurora@gmail.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:37:27 +04:00
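The error-pointer convention lets copy_tree() and clone_mnt() report a specific errno, and it can be reproduced in user space. The three helpers below mirror the logic of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() from <linux/err.h>, without the unlikely() annotation. `demo_copy_tree()` and its -EINVAL case are hypothetical. They exist only to show a failure other than -ENOMEM.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Small negative errno values are encoded in the top page of the address
 * space, where no valid object can live. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct demo_tree { int depth; };

/* Hypothetical copy_tree()-like function: returns a new object or an encoded
 * error, so callers no longer have to assume that NULL means -ENOMEM. */
static struct demo_tree *demo_copy_tree(int depth)
{
	struct demo_tree *t;

	if (depth < 0)
		return ERR_PTR(-EINVAL);
	t = malloc(sizeof(*t));
	if (!t)
		return ERR_PTR(-ENOMEM);
	t->depth = depth;
	return t;
}
```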
Al Viro	6ce6e24e72	get rid of magic in proc_namespace.c don't rely on proc_mounts->m being the first field; container_of() is there for that purpose. No need to bother with ->private, while we are at it - the same container_of will do nicely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:48 +04:00
Al Viro	f7a99c5b7c	get rid of ->mnt_longterm it's enough to set ->mnt_ns of internal vfsmounts to something distinct from all struct mnt_namespace out there; then we can just use the check for ->mnt_ns != NULL in the fast path of mntput_no_expire() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:47 +04:00
Al Viro	63d37a84ab	vfs: umount_tree() might be called on subtree that had never made it __mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm() we'd done back when we set ->mnt_ns non-NULL; it should not be done to vfsmounts that had never gone through commit_tree() and friends. Kudos to lczerner for catching that one... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:55 -04:00
Andi Kleen	962830df36	brlocks/lglocks: API cleanups lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. In preparation, this patch changes the API to look more like normal function calls with pointers, not magic macros. The patch is rather large because I move over all users in one go to keep it bisectable. This impacts the VFS somewhat in terms of lines changed. But no actual behaviour change. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:41 -04:00
Linus Torvalds	98793265b4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)	2012-01-08 13:21:22 -08:00
Miklos Szeredi	8e8b87964b	vfs: prevent remount read-only if pending removes If there are any inodes on the super block that have been unlinked (i_nlink == 0) but have not yet been deleted, then prevent remounting the super block read-only. Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-06 23:20:13 -05:00
Miklos Szeredi	4ed5e82fe7	vfs: protect remounting superblock read-only Currently remounting a superblock read-only is racy in a major way. With the per mount read-only infrastructure it is now possible to prevent most races, which this patch attempts. Before starting the remount read-only, iterate through all mounts belonging to the superblock and if none of them have any pending writes, set sb->s_readonly_remount. This indicates that remount is in progress and no further write requests are allowed. If the remount succeeds set MS_RDONLY and reset s_readonly_remount. If the remounting is unsuccessful just reset s_readonly_remount. This can result in transient EROFS errors, despite the fact that the remount failed. Unfortunately holding off writes is difficult as remount itself may touch the filesystem (e.g. through load_nls()) which would deadlock. A later patch deals with delayed writes due to nlink going to zero. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-06 23:20:12 -05:00
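Here is a single-threaded model of the remount protocol described above. It assumes one plain write counter per mount, and all names are hypothetical. The kernel's version relies on per-mount write counters and locking that are not reproduced here. The model does show the transient -EROFS that a writer can see while a remount is in progress, even if that remount later fails.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of the read-only remount protocol. */
struct demo_rw_mount { int pending_writes; };

struct demo_sb {
	struct demo_rw_mount *mounts; /* every mount of this superblock */
	size_t nr_mounts;
	bool readonly;
	bool readonly_remount;        /* models sb->s_readonly_remount */
};

/* Step 1: refuse if any mount has writes in flight, otherwise block new ones. */
static int demo_prepare_remount_ro(struct demo_sb *sb)
{
	for (size_t i = 0; i < sb->nr_mounts; i++)
		if (sb->mounts[i].pending_writes)
			return -EBUSY;
	sb->readonly_remount = true;
	return 0;
}

/* Writers check the flag; while it is set they get -EROFS. */
static int demo_want_write(struct demo_sb *sb, struct demo_rw_mount *m)
{
	if (sb->readonly || sb->readonly_remount)
		return -EROFS;
	m->pending_writes++;
	return 0;
}

/* Step 2: on success mark the superblock read-only; either way clear the flag. */
static void demo_finish_remount_ro(struct demo_sb *sb, bool success)
{
	if (success)
		sb->readonly = true;
	sb->readonly_remount = false;
}
```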
Miklos Szeredi	39f7c4db1d	vfs: keep list of mounts for each superblock Keep track of vfsmounts belonging to a superblock. List is protected by vfsmount_lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-06 23:20:12 -05:00
Al Viro	34c80b1d93	vfs: switch ->show_options() to struct dentry * Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-06 23:19:54 -05:00
Al Viro	d10577a8d8	vfs: trim includes a bit [folded fix for missing magic.h from Tetsuo Handa] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:13 -05:00
Al Viro	be08d6d260	switch mnt_namespace ->root to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:13 -05:00
Al Viro	0226f4923f	vfs: take /proc/*/mounts and friends to fs/proc_namespace.c rationale: that stuff is far tighter bound to fs/namespace.c than to the guts of procfs proper. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:13 -05:00
Al Viro	3a2393d71d	vfs: opencode mntget() mnt_set_mountpoint() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:12 -05:00
Al Viro	909b0a88ef	vfs: spread struct mount - remaining argument of next_mnt() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:12 -05:00
Al Viro	c63181e6b6	vfs: move fsnotify junk to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:12 -05:00
Al Viro	52ba1621de	vfs: move mnt_devname Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:11 -05:00
Al Viro	1a4eeaf2a8	vfs: move mnt_list to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:11 -05:00
Al Viro	fc7be130c7	vfs: switch pnode.h macros to struct mount * Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:11 -05:00
Al Viro	863d684f94	vfs: move the rest of int fields to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:10 -05:00
Al Viro	15169fe784	vfs: mnt_id/mnt_group_id moved Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:10 -05:00
Al Viro	143c8c91ce	vfs: mnt_ns moved to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:09 -05:00
Al Viro	900148dcac	vfs: spread struct mount - mntput_no_expire Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:09 -05:00
Al Viro	95bc5f25c1	vfs: spread struct mount - do_add_mount and graft_tree Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:09 -05:00
Al Viro	6776db3d32	vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:08 -05:00
Al Viro	32301920f4	vfs: and now we can make ->mnt_master point to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:08 -05:00
Al Viro	d10e8def07	vfs: take mnt_master to struct mount make IS_MNT_SLAVE take struct mount * at the same time Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:08 -05:00
Al Viro	14cf1fa8f5	vfs: spread struct mount - remaining argument of mnt_set_mountpoint() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:07 -05:00
Al Viro	a8d56d8e4f	vfs: spread struct mount - propagate_mnt() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:07 -05:00
Al Viro	6fc7871fed	vfs: spread struct mount - get_dominating_id / do_make_slave next pile of horrors, similar to mnt_parent one; this time it's mnt_master. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:06 -05:00
Al Viro	6b41d536f7	vfs: take mnt_child/mnt_mounts to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:06 -05:00
Al Viro	68e8a9feab	vfs: all counters taken to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:06 -05:00
Al Viro	83adc75322	vfs: spread struct mount - work with counters Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:05 -05:00
Al Viro	a73324da7a	vfs: move mnt_mountpoint to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:05 -05:00
Al Viro	0714a53380	vfs: now it can be done - make mnt_parent point to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:05 -05:00
Al Viro	3376f34fff	vfs: mnt_parent moved to struct mount the second victim... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:04 -05:00
Al Viro	643822b41e	vfs: spread struct mount - is_path_reachable Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:04 -05:00
Al Viro	676da58df7	vfs: spread struct mount - mnt_has_parent Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:04 -05:00
Al Viro	1ab5973862	vfs: spread struct mount - do_umount/propagate_mount_busy Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:03 -05:00
Al Viro	44d964d609	vfs: spread struct mount mnt_set_mountpoint child argument Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:03 -05:00
Al Viro	87129cc0e3	vfs: spread struct mount - clone_mnt/copy_tree argument Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:03 -05:00
Al Viro	692afc312b	vfs: spread struct mount - shrink_submounts/select_submounts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:02 -05:00
Al Viro	761d5c38eb	vfs: spread struct mount - umount_tree argument Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:02 -05:00
Al Viro	1b8e5564b9	vfs: the first spoils - mnt_hash moved taken out of struct vfsmount into struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:02 -05:00
Al Viro	d5e50f74dd	vfs: spread struct mount to remaining users of ->mnt_hash Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:01 -05:00
Al Viro	cb338d06e9	vfs: spread struct mount - clone_mnt/copy_tree result Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:01 -05:00
Al Viro	0f0afb1dcf	vfs: spread struct mount - change_mnt_propagation/set_mnt_shared Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:01 -05:00
Al Viro	b105e270b4	vfs: spread struct mount - alloc_vfsmnt/free_vfsmnt/mnt_alloc_id/mnt_free_id Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:00 -05:00
Al Viro	cbbe362cd6	vfs: spread struct mount - tree_contains_unbindable Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:00 -05:00
Al Viro	0fb54e5056	vfs: spread struct mount - attach_recursive_mnt Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:00 -05:00
Al Viro	4b8b21f4fe	vfs: spread struct mount - mount group id handling Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:56:59 -05:00
Al Viro	4b2619a571	vfs: spread struct mount - commit_tree Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:56:59 -05:00
Al Viro	419148da6e	vfs: spread struct mount - attach_mnt/detach_mnt Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:56:59 -05:00
Al Viro	315fc83e56	vfs: spread struct mount - namespace.c internal iterators next_mnt() return value, first argument skip_mnt_tree() return value and argument Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:56:58 -05:00
Al Viro	c71053659e	vfs: spread struct mount - __lookup_mnt() result switch __lookup_mnt() to returning struct mount *; callers adjusted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:56:58 -05:00
Al Viro	7d6fec45a5	vfs: start hiding vfsmount guts series Almost all fields of struct vfsmount are used only by core VFS (and a fairly small part of it, at that). The plan: embed struct vfsmount into struct mount, making the latter visible only to core parts of VFS. Then move fields from vfsmount to mount, eventually leaving only mnt_root/mnt_sb/mnt_flags in struct vfsmount. Filesystem code still gets pointers to struct vfsmount and remains unchanged; all such pointers go to struct vfsmount embedded into the instances of struct mount allocated by fs/namespace.c. When fs/namespace.c et al. get a pointer to vfsmount, they turn it into pointer to mount (using container_of) and work with that. This is the first part of series; struct mount is introduced, allocation switched to using it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:56:57 -05:00
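The embedding pattern can be sketched in plain C. The `container_of()` here is a simplified version of the kernel macro, without its type checking. The field lists are trimmed and partly illustrative. Only the idea that struct vfsmount is a member of struct mount, recovered with container_of(), comes from the commit. The kernel helper for this conversion is real_mount() in fs/mount.h.

```c
#include <stddef.h>

/* Simplified container_of(); the kernel version adds type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Public part, as filesystems see it. */
struct vfsmount {
	void *mnt_root;
	void *mnt_sb;
	int mnt_flags;
};

/* Private part, visible only to core VFS. The embedded member is not the
 * first field, so the conversion really subtracts a non-zero offset. */
struct mount {
	struct mount *mnt_parent;
	struct vfsmount mnt;      /* embedded, not pointed to */
	int mnt_count;
};

/* Recover the enclosing struct mount from a vfsmount pointer that was
 * handed out to filesystem code. */
static inline struct mount *real_mount(struct vfsmount *mnt)
{
	return container_of(mnt, struct mount, mnt);
}
```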
Al Viro	2a79f17e4a	vfs: mnt_drop_write_file() new helper (wrapper around mnt_drop_write()) to be used in pair with mnt_want_write_file(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:40 -05:00
Al Viro	79e801a906	vfs: make do_kern_mount() static the only user outside of fs/namespace.c has died Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:39 -05:00
Al Viro	aa0a4cf0ab	vfs: dentry_reset_mounted() doesn't use vfsmount argument lose it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:37 -05:00
Al Viro	6c449c8dfe	unexport put_mnt_ns(), make create_mnt_ns() static outright Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:37 -05:00
Al Viro	afac7cba7e	vfs: more mnt_parent cleanups a) mount --move is checking that ->mnt_parent is non-NULL before looking if that parent happens to be shared; ->mnt_parent is never NULL and it's not even a misspelled !mnt_has_parent() b) pivot_root open-codes is_path_reachable(), poorly. c) so does path_is_under(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:36 -05:00
Al Viro	b2dba1af3c	vfs: new internal helper: mnt_has_parent(mnt) vfsmounts have ->mnt_parent pointing either to a different vfsmount or to itself; it's never NULL and termination condition in loops traversing the tree towards root is mnt == mnt->mnt_parent. At least one place (see the next patch) is confused about what's going on; let's add an explicit helper checking it the right way and use it in all places where we need it. Not that there had been too many, but... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:36 -05:00
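Here is a minimal model of the termination rule. It assumes a trimmed struct with only a parent pointer; `struct demo_mount` and `demo_depth()` are hypothetical. The root is the mount whose parent is itself. A walk towards the root therefore stops on `mnt == mnt->mnt_parent`, never on NULL.

```c
#include <stdbool.h>

/* Trimmed-down mount: the root of a tree points to itself, not to NULL. */
struct demo_mount {
	struct demo_mount *mnt_parent;
};

/* A mount has a parent iff its parent pointer is not itself. */
static inline bool mnt_has_parent(const struct demo_mount *mnt)
{
	return mnt != mnt->mnt_parent;
}

/* Walk towards the root, counting hops; terminates at the self-parented root. */
static int demo_depth(const struct demo_mount *mnt)
{
	int depth = 0;

	while (mnt_has_parent(mnt)) {
		mnt = mnt->mnt_parent;
		depth++;
	}
	return depth;
}
```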
Al Viro	aa9c0e07bb	vfs: kill pointless helpers in namespace.c mnt_{inc,dec}_count() is not cleaner than doing the corresponding mnt_add_count() directly and mnt_set_count() is not used at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:36 -05:00
Al Viro	02125a8264	fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API __d_path() API is asking for trouble, and in the case of apparmor, d_namespace_path() is getting just that. The root cause is that when __d_path() misses the root it had been told to look for, it stores the location of the most remote ancestor in root. Without grabbing references. Sure, at the moment of call it had been pinned down by what we have in path. And if we raced with umount -l, we could have very well stopped at vfsmount/dentry that got freed as soon as prepend_path() dropped vfsmount_lock. It is safe to compare these pointers with pre-existing (and known to be still alive) vfsmount and dentry, as long as all we are asking is "is it the same address?". Dereferencing is not safe and apparmor ended up stepping into that. d_namespace_path() really wants to examine the place where we stopped, even if it's not connected to our namespace. As the result, it looked at ->d_sb->s_magic of a dentry that might've been already freed by that point. All other callers had been careful enough to avoid that, but it's really a bad interface - it invites that kind of trouble. The fix is fairly straightforward, even though it's bigger than I'd like:
  * prepend_path() root argument becomes const.
  * __d_path() is never called with NULL/NULL root. It was a kludge to start with. Instead, we have an explicit function - d_absolute_path(). Same as __d_path(), except that it doesn't get root passed and stops where it stops. apparmor and tomoyo are using it.
  * __d_path() returns NULL on path outside of root. The main caller is show_mountinfo() and that's precisely what we pass root for - to skip those outside chroot jail. Those who don't want that can (and do) use d_path().
  * __d_path() root argument becomes const. Everyone agrees, I hope.
  * apparmor does NOT try to use __d_path() or any of its variants when it sees that path->mnt is an internal vfsmount. In that case it's definitely not mounted anywhere and dentry_path() is exactly what we want there. Handling of sysctl()-triggered weirdness is moved to that place.
  * if apparmor is asked to do pathname relative to chroot jail and __d_path() tells it that it's not in that jail, the sucker just calls d_absolute_path() instead. That's the other remaining caller of __d_path(), BTW.
  * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway - the normal seq_file logics will take care of growing the buffer and redoing the call of ->show() just fine). However, if it gets a path not reachable from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped ignoring the return value as it used to do).
Reviewed-by: John Johansen <john.johansen@canonical.com> ACKed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org	2011-12-06 23:57:18 -05:00
Al Viro	d31da0f0ba	mount_subtree() pointless use-after-free d'oh... we'd carefully pinned mnt->mnt_sb down, dropped mnt and attempt to grab s_umount on mnt->mnt_sb. The trouble is, *mnt might've been overwritten by now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-11-22 12:31:21 -05:00
Al Viro	ea441d1104	new helper: mount_subtree() takes vfsmount and relative path, does lookup within that vfsmount (possibly triggering automounts) and returns the result as root of subtree suitable for return by ->mount() (i.e. a reference to dentry and an active reference to its superblock grabbed, superblock locked exclusive). btrfs and nfs switched to it instead of open-coding the sucker. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-11-16 22:00:34 -05:00
Al Viro	c133449587	switch create_mnt_ns() to saner calling conventions, fix double mntput() in nfs Life is much saner if create_mnt_ns(mnt) drops mnt in case of error... Switch it to such calling conventions, switch callers, fix double mntput() in fs/nfs/super.c one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-11-16 16:12:14 -05:00
Jiri Kosina	2290c0d06d	Merge branch 'master' into for-next Sync with Linus tree to have `157550ff` ("mtd: add GPMI-NAND driver in the config and Makefile") as I have patch depending on that one.	2011-11-13 20:55:53 +01:00
Kautuk Consul	a127e2d518	namespace: mnt_want_write: Remove unused label 'out' I was studying the code and I saw that the out label is not being used at all so I removed it and its usage from the function. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-10-29 21:22:14 +02:00
Bryan Schumaker	a877ee03ac	vfs: add "device" tag to /proc/self/mountstats nfsiostat was failing to find mounted filesystems on kernels after 2.6.38 because of changes to show_vfsstat() by commit `c7f404b40a`. This patch adds back the "device" tag before the nfs server entry so scripts can parse the mountstats file correctly. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> CC: stable@kernel.org [>=2.6.39] Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-10-28 13:55:08 +02:00
Trond Myklebust	815d405cef	VFS: Fix the remaining automounter semantics regressions The consensus seems to be that system calls such as stat() etc should not trigger an automount. Neither should the l* versions. This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups that _should_ trigger an automount on the last path element. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [ Edited to leave out the cases that are already covered by LOOKUP_OPEN, LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally force automounting for their own reasons - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-26 19:16:46 -07:00
Tim Chen	423e0ab086	VFS : mount lock scalability for internal mounts A number of file systems that don't have a mount point (e.g. sockfs and pipefs) are not marked as long term. Therefore in mntput_no_expire, all the locks in vfs_mount lock are taken, instead of just the local CPU's lock, to aggregate reference counts when we release a reference to file objects. In fact, only the local lock needs to be taken to update ref counts, as these file systems are in no danger of going away until we are ready to unregister them. The attached patch marks file systems using kern_mount without a mount point as long term. Contention on vfs_mount lock is now eliminated. Before unregistering such a file system, kern_unmount should be called to remove the long term flag and make the mount point ready to be freed. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:08:32 -04:00
Kay Sievers	f15146380d	fs: seq_file - add event counter to simplify poll() support Moving the event counter into the dynamically allocated 'struct seq_file' allows poll() support without the need to allocate its own tracking structure. All current users are switched over to use the new counter. Requested-by: Andrew Morton akpm@linux-foundation.org Acked-by: NeilBrown <neilb@suse.de> Tested-by: Lucas De Marchi lucas.demarchi@profusion.mobi Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:50 -04:00
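The poll logic that this counter enables can be modeled as follows. The structs and `demo_*` names are hypothetical. The comparison of a per-file `poll_event` against the namespace's event count, and the resulting POLLERR | POLLPRI, follow the /proc/<pid>/mounts poll handler in simplified form. The kernel handler also reports POLLRDNORM, which is omitted here to keep the header requirements minimal.

```c
#include <poll.h>

/* The namespace bumps 'event' on every mount-table change; each open
 * seq_file remembers in poll_event the last value it reported. */
struct demo_ns { int event; };
struct demo_seq_file { int poll_event; };

static void demo_open(struct demo_seq_file *m, const struct demo_ns *ns)
{
	m->poll_event = ns->event;
}

/* Report POLLERR | POLLPRI once per change, then resynchronize. */
static int demo_poll(struct demo_seq_file *m, const struct demo_ns *ns)
{
	int res = POLLIN;

	if (m->poll_event != ns->event) {
		m->poll_event = ns->event;
		res |= POLLERR | POLLPRI;
	}
	return res;
}
```

On the user-space side, proc(5) documents the matching pattern. A program opens /proc/self/mounts, calls poll() on it for POLLPRI, and re-reads the file when the event fires.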
Roman Borisov	7c6e984dfc	fs/namespace.c: bound mount propagation fix This issue was discovered by users of busybox. And the bug is actual for busybox users, I don't know how it affects others. Apparently, mount is called with and without MS_SILENT, and this affects mount() behaviour. But MS_SILENT is only supposed to affect kernel logging verbosity. The following script was run in an empty test directory: mkdir -p mount.dir mount.shared1 mount.shared2 touch mount.dir/a mount.dir/b mount -vv --bind mount.shared1 mount.shared1 mount -vv --make-rshared mount.shared1 mount -vv --bind mount.shared2 mount.shared2 mount -vv --make-rshared mount.shared2 mount -vv --bind mount.shared2 mount.shared1 mount -vv --bind mount.dir mount.shared2 ls -R mount.dir mount.shared1 mount.shared2 umount mount.dir mount.shared1 mount.shared2 2>/dev/null umount mount.dir mount.shared1 mount.shared2 2>/dev/null umount mount.dir mount.shared1 mount.shared2 2>/dev/null rm -f mount.dir/a mount.dir/b mount.dir/c rmdir mount.dir mount.shared1 mount.shared2 mount -vv was used to show the mount() call arguments and result. 
Output shows that flag argument has 0x00008000 = MS_SILENT bit: mount: mount('mount.shared1','mount.shared1','(null)',0x00009000,'(null)'):0 mount: mount('','mount.shared1','',0x0010c000,''):0 mount: mount('mount.shared2','mount.shared2','(null)',0x00009000,'(null)'):0 mount: mount('','mount.shared2','',0x0010c000,''):0 mount: mount('mount.shared2','mount.shared1','(null)',0x00009000,'(null)'):0 mount: mount('mount.dir','mount.shared2','(null)',0x00009000,'(null)'):0 mount.dir: a b mount.shared1: mount.shared2: a b After adding --loud option to remove MS_SILENT bit from just one mount cmd: mkdir -p mount.dir mount.shared1 mount.shared2 touch mount.dir/a mount.dir/b mount -vv --bind mount.shared1 mount.shared1 2>&1 mount -vv --make-rshared mount.shared1 2>&1 mount -vv --bind mount.shared2 mount.shared2 2>&1 mount -vv --loud --make-rshared mount.shared2 2>&1 # <-HERE mount -vv --bind mount.shared2 mount.shared1 2>&1 mount -vv --bind mount.dir mount.shared2 2>&1 ls -R mount.dir mount.shared1 mount.shared2 2>&1 umount mount.dir mount.shared1 mount.shared2 2>/dev/null umount mount.dir mount.shared1 mount.shared2 2>/dev/null umount mount.dir mount.shared1 mount.shared2 2>/dev/null rm -f mount.dir/a mount.dir/b mount.dir/c rmdir mount.dir mount.shared1 mount.shared2 The result is different now - look closely at mount.shared1 directory listing. 
Now it does show files 'a' and 'b': mount: mount('mount.shared1','mount.shared1','(null)',0x00009000,'(null)'):0 mount: mount('','mount.shared1','',0x0010c000,''):0 mount: mount('mount.shared2','mount.shared2','(null)',0x00009000,'(null)'):0 mount: mount('','mount.shared2','',0x00104000,''):0 mount: mount('mount.shared2','mount.shared1','(null)',0x00009000,'(null)'):0 mount: mount('mount.dir','mount.shared2','(null)',0x00009000,'(null)'):0 mount.dir: a b mount.shared1: a b mount.shared2: a b The analysis shows that the MS_SILENT flag, which is ON by default in all busybox mount operations, reaches the flags_to_propagation_type() function and causes an error return during the is_power_of_2 check, because the function expects only one bit to be set. This makes it impossible to run busybox mount with any of the --make-[r]shared, --make-[r]private etc. options. Moreover, the recently added flags_to_propagation_type() function doesn't allow us to do such operations as --make-[r]private --make-[r]shared etc. when MS_SILENT is on. The idea of clearing the MS_SILENT flag came from Denys Vlasenko. Signed-off-by: Roman Borisov <ext-roman.borisov@nokia.com> Reported-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: Alexander Shishkin <virtuoso@slind.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-05-26 07:26:44 -04:00
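The check being fixed, which is the "exactly one propagation flag" sanity check added by the 7a2e8a8faa commit further down this log, can be sketched in user space as follows. This is a minimal sketch, not the kernel source. The flag values are the Linux UAPI `MS_*` constants (they match the `0x0010c000` = `MS_SHARED | MS_REC | MS_SILENT` in the trace above), and `propagation_type()` and its two wrappers are illustrative names. Ignoring only `MS_REC` rejects busybox's request; ignoring `MS_SILENT` as well lets exactly one propagation bit through.

```c
#include <assert.h>

/* Values from the Linux UAPI mount flags. */
#define MS_RDONLY     0x00000001
#define MS_REC        0x00004000
#define MS_SILENT     0x00008000
#define MS_UNBINDABLE 0x00020000
#define MS_PRIVATE    0x00040000
#define MS_SLAVE      0x00080000
#define MS_SHARED     0x00100000

#define PROPAGATION_FLAGS (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE)

static int is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Returns the single requested propagation type, or 0 if the flags are invalid. */
static int propagation_type(int flags, int ignored)
{
	int type = flags & ~ignored;

	if (type & ~PROPAGATION_FLAGS)   /* some non-propagation flag is set */
		return 0;
	if (!is_power_of_2(type))        /* zero or several propagation flags */
		return 0;
	return type;
}

/* Before the fix only MS_REC was ignored; afterwards MS_SILENT is ignored too. */
static int propagation_type_old(int flags) { return propagation_type(flags, MS_REC); }
static int propagation_type_new(int flags) { return propagation_type(flags, MS_REC | MS_SILENT); }
```

With the old mask, busybox's `0x0010c000` is rejected because the leftover `MS_SILENT` bit is not a propagation flag; with the new mask it yields `MS_SHARED`, while genuinely mixed requests such as `MS_SHARED | MS_RDONLY` are still refused.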
Linus Torvalds	be85bccaa5	Revert "vfs: Export file system uuid via /proc/<pid>/mountinfo" This reverts commit `93f1c20bc8`. It turns out that libmount misparses it because it adds a '-' character in the uuid string, which libmount then incorrectly confuses with the separator string (" - ") at the end of all the optional arguments. Upstream libmount (in the util-linux tree) has been fixed, but until that fix actually percolates up to users, we'd better not expose this change in the kernel. Let's revisit this later (possibly by exposing the UUID without any '-' characters in it, avoiding the user-space bug). Reported-by: Dave Jones <davej@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Karel Zak <kzak@redhat.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-12 13:35:56 -07:00
Mandeep Singh Baines	80cdc6dae7	fs: use appropriate printk priority levels printk()s without a priority level default to KERN_WARNING. To reduce noise at KERN_WARNING, this patch sets the priority level appropriately for unleveled printk()s. This should be useful to folks that look at dmesg warnings closely. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:10 -07:00
Al Viro	b12cea9198	change the locking order for namespace_sem Have it nested inside ->i_mutex. Instead of using follow_down() under namespace_sem, followed by grabbing i_mutex and checking that mountpoint to be is not dead, do the following: grab i_mutex; check that it's not dead; grab namespace_sem; see if anything is mounted there; if not, we've won; otherwise drop locks, put_path on what we had, replace with what's mounted, and retry everything with new mountpoint to be. New helper (lock_mount()) does that. do_add_mount(), do_move_mount(), do_loopback() and pivot_root() switched to it; in case of the last two that eliminates a race we used to have - original code didn't do follow_down(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-18 08:55:38 -04:00
Al Viro	27cb1572e3	fix deadlock in pivot_root() Don't hold vfsmount_lock over the loop traversing ->mnt_parent; do check_mnt(new.mnt) under namespace_sem instead; combined with namespace_sem held over all that code it'll guarantee the stability of ->mnt_parent chain all the way to the root. Doing check_mnt() outside of namespace_sem in case of pivot_root() is wrong anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-18 08:54:59 -04:00
Al Viro	9d412a43c3	vfs: split off vfsmount-related parts of vfs_kern_mount() new function: mount_fs(). Does all work done by vfs_kern_mount() except the allocation and filling of vfsmount; returns root dentry or ERR_PTR(). vfs_kern_mount() switched to using it and taken to fs/namespace.c, along with its wrappers. alloc_vfsmnt()/free_vfsmnt() made static. functions in namespace.c slightly reordered. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-17 22:10:41 -04:00
Al Viro	474a00ee13	kill simple_set_mnt() not needed anymore, since all users (->get_sb() instances) are gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-17 21:31:32 -04:00
Linus Torvalds	054cfaacf8	Merge branch 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: vfs: bury ->get_sb() nfs: switch NFS from ->get_sb() to ->mount() nfs: stop mangling ->mnt_devname on NFS vfs: new superblock methods to override /proc/*/mount{s,info} nfs: nfs_do_{ref,sub}mount() superblock argument is redundant nfs: make nfs_path() work without vfsmount nfs: store devname at disconnected NFS roots nfs: propagate devname to nfs{,4}_get_root()	2011-03-16 19:09:57 -07:00
Al Viro	c7f404b40a	vfs: new superblock methods to override /proc/*/mount{s,info} a) ->show_devname(m, mnt) - what to put into devname columns in mounts, mountinfo and mountstats b) ->show_path(m, mnt) - what to put into relative path column in mountinfo Leaving those NULL gives old behaviour. NFS switched to using those. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Linus Torvalds	0f6e0e8448	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits) AppArmor: kill unused macros in lsm.c AppArmor: cleanup generated files correctly KEYS: Add an iovec version of KEYCTL_INSTANTIATE KEYS: Add a new keyctl op to reject a key with a specified error code KEYS: Add a key type op to permit the key description to be vetted KEYS: Add an RCU payload dereference macro AppArmor: Cleanup make file to remove cruft and make it easier to read SELinux: implement the new sb_remount LSM hook LSM: Pass -o remount options to the LSM SELinux: Compute SID for the newly created socket SELinux: Socket retains creator role and MLS attribute SELinux: Auto-generate security_is_socket_class TOMOYO: Fix memory leak upon file open. Revert "selinux: simplify ioctl checking" selinux: drop unused packet flow permissions selinux: Fix packet forwarding checks on postrouting selinux: Fix wrong checks for selinux_policycap_netpeer selinux: Fix check for xfrm selinux context algorithm ima: remove unnecessary call to ima_must_measure IMA: remove IMA imbalance checking ...	2011-03-16 09:15:43 -07:00
Aneesh Kumar K.V	93f1c20bc8	vfs: Export file system uuid via /proc/<pid>/mountinfo We add a per superblock uuid field. File systems should update the uuid in the fill_super callback Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
James Morris	fe3fa43039	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next	2011-03-08 11:38:10 +11:00
Eric Paris	ff36fe2c84	LSM: Pass -o remount options to the LSM The VFS mount code passes the mount options to the LSM. The LSM will remove options it understands from the data and the VFS will then pass the remaining options onto the underlying filesystem. This is how options like the SELinux context= work. The problem comes in that -o remount never calls into LSM code. So if you include an LSM specific option it will get passed to the filesystem and will cause the remount to fail. An example of where this is a problem is the 'seclabel' option. The SELinux LSM hook will print this word in /proc/mounts if the filesystem is being labeled using xattrs. If you pass this word on mount it will be silently stripped and ignored. But if you pass this word on remount the LSM never gets called and it will be passed to the FS. The FS doesn't know what seclabel means and thus should fail the mount. For example an ext3 fs mounted over loop # mount -o loop /tmp/fs /mnt/tmp # cat /proc/mounts | grep /mnt/tmp /dev/loop0 /mnt/tmp ext3 rw,seclabel,relatime,errors=continue,barrier=0,data=ordered 0 0 # mount -o remount /mnt/tmp mount: /mnt/tmp not mounted already, or bad option # dmesg EXT3-fs (loop0): error: unrecognized mount option "seclabel" or missing value This patch passes the remount mount options to a new LSM hook. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: James Morris <jmorris@namei.org>	2011-03-03 16:12:27 -05:00
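The mount-time split described above (the LSM consumes the options it recognises, and the VFS hands the rest to the filesystem) can be sketched as an in-place string filter. This is a hypothetical illustration only: `lsm_owns_option()` and `lsm_strip_options()` are invented names, the set of recognised options (`seclabel` and anything starting with `context=`) is an assumption, and real SELinux option parsing is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed set of LSM-owned options: exactly "seclabel", or any "context=..." */
static bool lsm_owns_option(const char *opt, size_t len)
{
	if (len == strlen("seclabel") && strncmp(opt, "seclabel", len) == 0)
		return true;
	return len >= strlen("context=") &&
	       strncmp(opt, "context=", strlen("context=")) == 0;
}

/* Removes LSM-owned options from a comma-separated string in place,
 * leaving only the options the filesystem should see. */
static void lsm_strip_options(char *data)
{
	char *out = data;          /* write position; never ahead of p */
	const char *p = data;      /* read position */

	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		if (len > 0 && !lsm_owns_option(p, len)) {
			if (out != data)
				*out++ = ',';
			memmove(out, p, len);   /* regions may overlap */
			out += len;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	*out = '\0';
}
```

Before this commit, the equivalent filtering simply never ran on remount, so `seclabel` reached ext3 unmodified.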
J. R. Okajima	bf9faa2aa3	Unlock vfsmount_lock in do_umount By the commit `b3e19d9` 2011-01-07 fs: scale mntget/mntput vfsmount_lock was introduced around testing mnt_count. Fix the mis-typed 'unlock' Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-24 02:10:57 -05:00
Al Viro	b1e75df45a	tidy up around finish_automount() do_add_mount() and mnt_clear_expiry() are not needed outside of namespace.c anymore, now that namei has finish_automount() to use. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:47:59 -05:00
Al Viro	15f9a3f3e1	don't drop newmnt on error in do_add_mount() That gets rid of the kludge in finish_automount() - we need to keep refcount on the vfsmount as-is until we evict it from expiry list. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:41:58 -05:00
Al Viro	19a167af7c	Take the completion of automount into new helper ... and shift it from namei.c to namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:35:23 -05:00
Al Viro	7e3d0eb0b0	VFS: Fix UP compile error in fs/namespace.c mnt_longterm is there only on SMP Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-16 14:59:45 -08:00
Al Viro	f03c65993b	sanitize vfsmount refcounting changes Instead of splitting refcount between (per-cpu) mnt_count and (SMP-only) mnt_longrefs, make all references contribute to mnt_count again and keep track of how many are longterm ones. Accounting rules for longterm count: * 1 for each fs_struct.root.mnt * 1 for each fs_struct.pwd.mnt * 1 for having non-NULL ->mnt_ns * decrement to 0 happens only under vfsmount lock exclusive That allows nice common case for mntput() - since we can't drop the final reference until after mnt_longterm has reached 0 due to the rules above, mntput() can grab vfsmount lock shared and check mnt_longterm. If it turns out to be non-zero (which is the common case), we know that this is not the final mntput() and can just blindly decrement percpu mnt_count. Otherwise we grab vfsmount lock exclusive and do usual decrement-and-check of percpu mnt_count. For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm(); namespace.c uses the latter in places where we don't already hold vfsmount lock exclusive and opencodes a few remaining spots where we need to manipulate mnt_longterm. Note that we mostly revert the code outside of fs/namespace.c back to what we used to have; in particular, normal code doesn't need to care about two kinds of references, etc. And we get to keep the optimization Nick's variant had bought us... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-16 13:47:07 -05:00
Al Viro	7b8a53fd81	fix old umount_tree() breakage Expiry-related code calls umount_tree() several times with the same list to collect vfsmounts to. Which is fine, except that umount_tree() implicitly assumed that the list would be empty on each call - it moves the victims over there and then iterates through the list kicking them out. It's almost idempotent, so everything nearly worked. However, mnt->ghosts handling (and thus expirability checks) had been broken - that part was not idempotent... The fix is trivial - use local temporary list, splice it to the collector list when we are through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-16 13:47:01 -05:00
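The "collect into a local list, then splice" fix can be shown with a minimal sketch using a hand-rolled circular list in the style of the kernel's `list_head`. All names here are hypothetical. Because each call walks only its own temporary list, calling it repeatedly with the same collector list processes every victim exactly once.

```c
#include <assert.h>
#include <stddef.h>

struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }
static int list_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move every node of src to the tail of dst, leaving src empty. */
static void list_splice_tail_init(struct list_node *src, struct list_node *dst)
{
	if (list_empty(src))
		return;
	struct list_node *first = src->next, *last = src->prev;
	first->prev = dst->prev;
	dst->prev->next = first;
	last->next = dst;
	dst->prev = last;
	list_init(src);
}

static int list_count(const struct list_node *h)
{
	int n = 0;
	for (const struct list_node *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

struct mnt_sketch {
	struct list_node link;  /* first member, so list_node* casts back safely */
	int times_processed;    /* stands in for the non-idempotent ghosts work */
};

/* Hypothetical stand-in for umount_tree(): gather victims on a local list,
 * process only those, then hand them to the caller's collector list. */
static void umount_tree_sketch(struct mnt_sketch *victims, int n,
			       struct list_node *kill)
{
	struct list_node tmp;

	list_init(&tmp);
	for (int i = 0; i < n; i++)
		list_add_tail(&victims[i].link, &tmp);
	for (struct list_node *p = tmp.next; p != &tmp; p = p->next)
		((struct mnt_sketch *)p)->times_processed++;
	list_splice_tail_init(&tmp, kill);
}
```

Iterating over `kill` instead of `tmp` would re-process the victims of earlier calls, which is the non-idempotent behaviour the commit describes.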
David Howells	ea5b778a8b	Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Unexport do_add_mount() and make ->d_automount() return the vfsmount to be added rather than calling do_add_mount() itself. follow_automount() will then do the addition. This slightly complicates things as ->d_automount() normally wants to add the new vfsmount to an expiration list and start an expiration timer. The problem with that is that the vfsmount will be deleted if it has a refcount of 1 and the timer will not repeat if the expiration list is empty. To this end, we require the vfsmount to be returned from d_automount() with a refcount of (at least) 2. One of these refs will be dropped unconditionally. In addition, follow_automount() must get a 3rd ref around the call to do_add_mount() lest it eat a ref and return an error, leaving the mount we have open to being expired as we would otherwise have only 1 ref on it. d_automount() should also add the vfsmount to the expiration list (by calling mnt_set_expiry()) and start the expiration timer before returning, if this mechanism is to be used. The vfsmount will be unlinked from the expiration list by follow_automount() if do_add_mount() fails. This patch also fixes the call to do_add_mount() for AFS to propagate the mount flags from the parent vfsmount. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:48 -05:00
David Howells	cc53ce53c8	Add a dentry op to allow processes to be held during pathwalk transit Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (*d_manage)(struct path *path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs).
A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every transit from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon.
The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automounter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:31 -05:00
Nick Piggin	b3e19d924b	fs: scale mntget/mntput The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, which often go to the same mount point. The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. To do that requires communication with all other CPUs that may have taken a reference count. We can make refcounts more scalable in a couple of ways, involving keeping distributed counters, and checking for the global-zero condition less frequently. - check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts). - keep a local count and only taking the global sum when local reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU which requires more locking). - keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long lasting references, and only take the global sum of local counters when the long refcount is 0. This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference. This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them. This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. 
However code is otherwise bigger and heavier, so single threaded performance is basically a wash. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:33 +11:00
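The "local difference plus long refcount" scheme described above can be simulated single-threaded in a few lines. This is a sketch under stated assumptions: a fixed, hypothetical CPU count, plain ints instead of per-CPU variables, and no locking (in the kernel the global sum has to be taken under vfsmount_lock, which this sketch omits). Individual per-CPU counts may go negative; only their sum is meaningful.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4  /* hypothetical CPU count for the simulation */

struct mnt_sketch {
	int pcp_count[NR_CPUS_SKETCH];  /* per-CPU increments minus decrements */
	int longrefs;                   /* long references: attached, root, pwd */
};

/* Slow path: sum the local differences across all CPUs. */
static int mnt_short_sum(const struct mnt_sketch *m)
{
	int sum = 0;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += m->pcp_count[cpu];
	return sum;
}

/* Fast path: touch only the local CPU's counter. */
static void mntget_sketch(struct mnt_sketch *m, int cpu)
{
	m->pcp_count[cpu]++;
}

/* Returns true if this put dropped the last reference. */
static bool mntput_sketch(struct mnt_sketch *m, int cpu)
{
	m->pcp_count[cpu]--;
	if (m->longrefs > 0)
		return false;             /* pinned by a long ref: no global sum needed */
	return mnt_short_sum(m) == 0;     /* only now pay for visiting every CPU */
}
```

A reference taken on one CPU and dropped on another leaves the two local counts at +1 and -1, which is why zero can only be detected by summing, and why the long refcount is used to skip that sum in the common case.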
Nick Piggin	c6653a838b	fs: rename vfsmount counter helpers Suggested by Andreas, mnt_ prefix is clearer namespace, follows kernel conventions better, and is easier for tab complete. I introduced these names so I'll admit they were not good choices. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:33 +11:00
Nick Piggin	5f57cbcc02	fs: dcache remove d_mounted Rather than keep a d_mounted count in the dentry, set a dentry flag instead. The flag can be cleared by checking the hash table to see if there are any mounts left, which is not time critical because it is performed at detach time. The mounted state of a dentry is only used to speculatively take a look in the mount hash table if it is set -- before following the mount, vfsmount lock is taken and mount re-checked without races. This saves 4 bytes on 32-bit, nothing on 64-bit but it does provide a hole I might use later (and some configs have larger than 32-bit spinlocks which might make use of the hole). Autofs4 conversion and changelog by Ian Kent <raven@themaw.net>: In autofs4, when expiring direct (or offset) mounts we need to ensure that we block user path walks into the autofs mount, which is covered by another mount. To do this we clear the mounted status so that follows stop before walking into the mount and are essentially blocked until the expire is completed. The automount daemon still finds the correct dentry for the umount due to the follow mount logic in fs/autofs4/root.c:autofs4_follow_link(), which is set as an inode operation for direct and offset mounts only and is called following the lookup that stopped at the covered mount. At the end of the expire the covering mount probably has gone away so the mounted status need not be restored. But we need to check this and only restore the mounted status if the expire failed. XXX: autofs may not work right if we have other mounts go over the top of it? Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Miklos Szeredi	be1a16a0ae	vfs: fix infinite loop caused by clone_mnt race If clone_mnt() happens while mnt_make_readonly() is running, the cloned mount might have MNT_WRITE_HOLD flag set, which results in mnt_want_write() spinning forever on this mount. Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen accidentally. But if it does happen it can hang the machine. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:24:16 -04:00
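The essence of this fix is that a clone must not inherit a transient flag that only the code holding it on the original mount will ever clear. A minimal sketch, assuming hypothetical flag values (`MNT_WRITE_HOLD_SKETCH`, `MNT_READONLY_SKETCH`) and a bounded stand-in for the `mnt_want_write()` spin loop; the real loop has no bound, which is exactly why an inherited hold bit hangs it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real MNT_* constants live in the kernel headers. */
#define MNT_READONLY_SKETCH   0x40
#define MNT_WRITE_HOLD_SKETCH 0x200

/* Copy the parent's flags but drop the transient write-hold bit, which
 * mnt_make_readonly() would only ever clear on the original mount. */
static int clone_flags_sketch(int old_flags)
{
	return old_flags & ~MNT_WRITE_HOLD_SKETCH;
}

/* Bounded stand-in for the wait in mnt_want_write(): returns true if the
 * hold bit is clear within max_spins re-reads of the flags. */
static bool write_hold_clears_sketch(const volatile int *flags, int max_spins)
{
	for (int i = 0; i < max_spins; i++)
		if (!(*flags & MNT_WRITE_HOLD_SKETCH))
			return true;
	return false;  /* nothing will ever clear the bit on this copy */
}
```

Other inherited bits, such as read-only state, are kept; only the hold bit is specific to an in-progress operation on the source mount.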
Jan Blunck	6841c05021	BKL: Remove BKL from do_new_mount() After pushing down the BKL to the get_sb/fill_super operations of the filesystems that still make usage of the BKL it is safe to remove it from do_new_mount(). I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. Signed-off-by: Jan Blunck <jblunck@infradead.org> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:43 +02:00
Valerie Aurora	7a2e8a8faa	VFS: Sanity check mount flags passed to change_mnt_propagation() Sanity check the flags passed to change_mnt_propagation(). Exactly one flag should be set. Return EINVAL otherwise. Userspace can pass in arbitrary combinations of MS_* flags to mount(). do_change_type() is called if any of MS_SHARED, MS_PRIVATE, MS_SLAVE, or MS_UNBINDABLE is set. do_change_type() clears MS_REC and then calls change_mnt_propagation() with the rest of the user-supplied flags. change_mnt_propagation() clearly assumes only one flag is set but do_change_type() does not check that this is true. For example, mount() with flags MS_SHARED | MS_RDONLY does not actually make the mount shared or read-only but does clear MNT_UNBINDABLE. Signed-off-by: Valerie Aurora <vaurora@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-07 13:46:20 -07:00
Nick Piggin	99b7db7b8f	fs: brlock vfsmount_lock Use a brlock for the vfsmount lock. It must be taken for write whenever modifying the mount hash or associated fields, and may be taken for read when performing mount hash lookups. A new lock is added for the mnt-id allocator, so it doesn't need to take the heavy vfsmount write-lock. The number of atomics should remain the same for fastpath rlock cases, though code would be slightly slower due to per-cpu access. Scalability is not much improved in common cases yet, due to other locks (ie. dcache_lock) getting in the way. However path lookups crossing mountpoints should be one case where scalability is improved (currently requiring the global lock). The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node Altix system (high latency to remote nodes), a simple umount microbenchmark (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it took 6.8s, afterwards took 7.1s, about 5% slower. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:48 -04:00
Miklos Szeredi	532490f0a5	vfs: remove unused MNT_STRICTATIME Commit `d0adde574b` added MNT_STRICTATIME but it isn't actually used (MS_STRICTATIME clears MNT_RELATIME and MNT_NOATIME rather than setting any mount flag). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:29:47 -04:00
Miklos Szeredi	f7ad3c6be9	vfs: add helpers to get root and pwd Add three helpers that retrieve a refcounted copy of the root and cwd from the supplied fs_struct. get_fs_root() get_fs_pwd() get_fs_root_and_pwd() Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:28:20 -04:00
Linus Torvalds	8c8946f509	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits) fanotify: use both marks when possible fsnotify: pass both the vfsmount mark and inode mark fsnotify: walk the inode and vfsmount lists simultaneously fsnotify: rework ignored mark flushing fsnotify: remove global fsnotify groups lists fsnotify: remove group->mask fsnotify: remove the global masks fsnotify: cleanup should_send_event fanotify: use the mark in handler functions audit: use the mark in handler functions dnotify: use the mark in handler functions inotify: use the mark in handler functions fsnotify: send fsnotify_mark to groups in event handling functions fsnotify: Exchange list heads instead of moving elements fsnotify: srcu to protect read side of inode and vfsmount locks fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called fsnotify: use _rcu functions for mark list traversal fsnotify: place marks on object in order of group memory address vfs/fsnotify: fsnotify_close can delay the final work in fput fsnotify: store struct file not struct path ... Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.	2010-08-10 11:39:13 -07:00
Al Viro	7a4dec5389	Fix sget() race with failing mount If sget() finds a matching superblock being set up, it'll grab an active reference to it and grab s_umount. That's fine - we'll wait for completion of foofs_get_sb() that way. However, if said foofs_get_sb() fails we'll end up holding the halfway-created superblock. deactivate_locked_super() called by foofs_get_sb() will just unlock the sucker since we are holding another active reference to it. What we need is a way to tell if superblock has been successfully set up. Unfortunately, neither ->s_root nor the check for MS_ACTIVE quite fit. Cheap and easy way, suitable for backport: new flag set by the (only) caller of ->get_sb(). If that flag isn't present by the time sget() grabbed s_umount on preexisting superblock it has found, it's seeing a stillborn and should just bury it with deactivate_locked_super() (and repeat the search). Longer term we want to set that flag in ->get_sb() instances (and check for it to distinguish between "sget() found us a live sb" and "sget() has allocated an sb, we need to set it up" in there, instead of checking ->s_root as we do now). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org	2010-08-09 16:49:01 -04:00
Andreas Gruenbacher	ca9c726eea	fsnotify: Infrastructure for per-mount watches Per-mount watches allow groups to listen to fsnotify events on an entire mount. This patch simply adds and initializes the fields needed in the vfsmount struct to make this happen. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:57 -04:00
Andreas Gruenbacher	2504c5d63b	fsnotify/vfsmount: add fsnotify fields to struct vfsmount This patch adds the list and mask fields needed to support vfsmount marks. These are the same fields fsnotify needs on an inode. They are not used, just declared and we note where the cleanup hook should be (the function is not yet defined) Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:57 -04:00
James Morris	539c99fd7f	Merge branch 'next' into for-linus	2010-05-18 08:57:00 +10:00
Al Viro	d83c49f3e3	Fix the regression created by "set S_DEAD on unlink()..." commit 1) i_flags simply doesn't work for mount/unlink race prevention; we may have many links to file and rm on one of those obviously shouldn't prevent bind on top of another later on. To fix it right way we need to mark _dentry_ as unsuitable for mounting upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and i_mutex on the inode in question. Set it (with dont_mount(dentry)) in unlink/rmdir/etc., check (with cant_mount(dentry)) in places in namespace.c that used to check for S_DEAD. Setting S_DEAD is still needed in places where we used to set it (for directories getting killed), since we rely on it for readdir/rmdir race prevention. 2) rename()/mount() protection has another bogosity - we unhash the target before we'd checked that it's not a mountpoint. Fixed. 3) ancient bogosity in pivot_root() - we locked i_mutex on the right directory, but checked S_DEAD on the different (and wrong) one. Noticed and fixed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-15 07:16:33 -04:00
Eric Paris	91a9420f58	security: remove dead hook sb_post_pivotroot Unused hook. Remove. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2010-04-12 12:18:32 +10:00
Eric Paris	3db2910177	security: remove dead hook sb_post_addmount Unused hook. Remove. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2010-04-12 12:18:31 +10:00
Eric Paris	82dab10453	security: remove dead hook sb_post_remount Unused hook. Remove. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2010-04-12 12:18:30 +10:00
Eric Paris	4b61d12c84	security: remove dead hook sb_umount_busy Unused hook. Remove. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2010-04-12 12:18:30 +10:00
Eric Paris	231923bd0e	security: remove dead hook sb_umount_close Unused hook. Remove. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2010-04-12 12:18:29 +10:00
Eric Paris	353633100d	security: remove sb_check_sb hooks Unused hook. Remove it. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2010-04-12 12:18:28 +10:00
Miklos Szeredi	db1f05bb85	vfs: add NOFOLLOW flag to umount(2) Add a new UMOUNT_NOFOLLOW flag to umount(2). This is needed to prevent symlink attacks in unprivileged unmounts (fuse, samba, ncpfs). Additionally, return -EINVAL if an unknown flag is used (and specify an explicitly unused flag: UMOUNT_UNUSED). This makes it possible for the caller to determine if a flag is supported or not. CC: Eugene Teo <eugene@redhat.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:08:00 -05:00
Al Viro	8089352a13	Mirror MS_KERNMOUNT in ->mnt_flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:08:00 -05:00
Al Viro	d498b25a4f	get rid of useless vfsmount_lock use in put_mnt_ns() It hadn't been needed since we'd sanitized the logics in mark_mounts_for_expiry() (which, in turn, used to be a rudiment of bad old times when namespace_sem was per-ns). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:59 -05:00
Al Viro	9f5596af44	take check for new events in namespace (guts of mounts_poll()) to namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:59 -05:00
Al Viro	1f707137b5	new helper: iterate_mounts() apply function to vfsmounts in set returned by collect_mounts(), stop if it returns non-zero. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:57 -05:00
Valerie Aurora	495d6c9c65	VFS: Clean up shared mount flag propagation The handling of mount flags in set_mnt_shared() got a little tangled up during previous cleanups, with the following problems: * MNT_PNODE_MASK is defined as a literal constant when it should be a bitwise xor of other MNT_* flags * set_mnt_shared() clears and then sets MNT_SHARED (part of MNT_PNODE_MASK) * MNT_PNODE_MASK could use a comment in mount.h * MNT_PNODE_MASK is a terrible name, change to MNT_SHARED_MASK This patch fixes these problems. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:55 -05:00
Al Viro	796a6b521d	Kill CL_PROPAGATION, sanitize fs/pnode.c:get_source() First of all, get_source() never results in CL_PROPAGATION alone. We either get CL_MAKE_SHARED (for the continuation of peer group) or CL_SLAVE (slave that is not shared) or both (beginning of peer group among slaves). Massage the code to make that explicit, kill CL_PROPAGATION test in clone_mnt() (nothing sets CL_MAKE_SHARED without CL_PROPAGATION and in clone_mnt() we are checking CL_PROPAGATION after we'd found that there's no CL_SLAVE, so the check for CL_MAKE_SHARED would do just as well). Fix comments, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 13:00:22 -05:00
Al Viro	27d55f1f4c	do_add_mount() should sanitize mnt_flags MNT_WRITE_HOLD shouldn't leak into new vfsmount and neither should MNT_SHARED (the latter will be set properly, along with the rest of shared-subtree data structures) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 13:07:36 -05:00
Al Viro	7b43a79f32	mnt_flags fixes in do_remount() * need vfsmount_lock over modifying it * need to preserve MNT_SHARED/MNT_UNBINDABLE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 13:01:26 -05:00
Al Viro	df1a1ad297	attach_recursive_mnt() needs to hold vfsmount_lock over set_mnt_shared() race in mnt_flags update Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 12:57:40 -05:00
Al Viro	8ad08d8a0c	may_umount() needs namespace_sem otherwise it races with clone_mnt() changing mnt_share/mnt_slaves Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 12:56:08 -05:00
Linus Torvalds	a2770d86b3	Revert "fix mismerge with Trond's stuff (create_mnt_ns() export is gone now)" This reverts commit `e9496ff46a`. Quoth Al: "it's dependent on a lot of other stuff not currently in mainline and badly broken with current fs/namespace.c. Sorry, badly out-of-order cherry-pick from old queue. PS: there's a large pending series reworking the refcounting and lifetime rules for vfsmounts that will, among other things, allow to rip a subtree away _without_ dissolving connections in it, to be garbage-collected when all active references are gone. It's considerably saner wrt "is the subtree busy" logics, but it's nowhere near being ready for merge at the moment; this changeset is one of the things becoming possible with that sucker, but it certainly shouldn't have been picked during this cycle. My apologies..." Noticed-by: Eric Paris <eparis@redhat.com> Requested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-17 12:51:05 -08:00
Al Viro	e9496ff46a	fix mismerge with Trond's stuff (create_mnt_ns() export is gone now) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:44 -05:00
Tetsuo Handa	a27ab9f26b	LSM: Pass original mount flags to security_sb_mount(). This patch allows LSM modules to determine based on original mount flags passed to mount(). A LSM module can get masked mount flags (if needed) by flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME); Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: James Morris <jmorris@namei.org>	2009-10-12 10:56:03 +11:00
Vegard Nossum	eca6f534e6	fs: fix overflow in sys_mount() for in-kernel calls sys_mount() reads/copies a whole page for its "type" parameter. When do_mount_root() passes a kernel address that points to an object which is smaller than a whole page, copy_mount_options() will happily go past this memory object, possibly dereferencing "wild" pointers that could be in any state (hence the kmemcheck warning, which shows that parts of the next page are not even allocated). (The likelihood of something going wrong here is pretty low -- first of all this only applies to kernel calls to sys_mount(), which are mostly found in the boot code. Secondly, I guess if the page was not mapped, exact_copy_from_user() _would_ in fact handle it correctly because of its access_ok(), etc. checks.) But it is much nicer to avoid the dubious reads altogether, by stopping as soon as we find a NUL byte. Is there a good reason why we can't do something like this, using the already existing strndup_from_user()? [akpm@linux-foundation.org: make copy_mount_string() static] [AV: fix compat mount breakage, which involves undoing akpm's change above] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: al <al@dizzy.pdmi.ras.ru>	2009-09-24 08:40:15 -04:00
OGAWA Hirofumi	2d8dd38a5a	vfs: mnt_want_write_file(): fix special file handling I suspect that mnt_want_write_file() may have wrong assumption. I think mnt_want_write_file() is assuming it increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's special_file(), it is false? Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-07 10:39:56 -07:00
Alexey Dobriyan	b43f3cbd21	headers: mnt_namespace.h redux Fix various silly problems wrt mnt_namespace.h: - exit_mnt_ns() isn't used, remove it - done that, sched.h and nsproxy.h inclusions aren't needed - mount.h inclusion was need for vfsmount_lock, but no longer - remove mnt_namespace.h inclusion from files which don't use anything from mnt_namespace.h Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 09:31:56 -07:00
Al Viro	f21f62208a	... and the same for vfsmount id/mount group id Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:26 -04:00
Trond Myklebust	3b22edc573	VFS: Switch init_mount_tree() to use the new create_mnt_ns() helper Eliminates some duplicated code... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
Trond Myklebust	cf8d2c11cb	VFS: Add VFS helper functions for setting up private namespaces The purpose of this patch is to improve the remote mount path lookup support for distributed filesystems such as the NFSv4 client. When given a mount command of the form "mount server:/foo/bar /mnt", the NFSv4 client is required to look up the filehandle for "server:/", and then look up each component of the remote mount path "foo/bar" in order to find the directory that is actually going to be mounted on /mnt. Following that remote mount path may involve following symlinks, crossing server-side mount points and even following referrals to filesystem volumes on other servers. Since the standard VFS path lookup code already supports walking paths that contain all these features (using in-kernel automounts for following referrals) we would like to be able to reuse that rather than duplicate the full path traversal functionality in the NFSv4 client code. This patch therefore defines a VFS helper function create_mnt_ns(), that sets up a temporary filesystem namespace and attaches a root filesystem to it. It exports the create_mnt_ns() and put_mnt_ns() function for use by filesystem modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	616511d039	VFS: Uninline the function put_mnt_ns() In order to allow modules to use it without having to export vfsmount_lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Al Viro	4aa98cf768	Push BKL down into do_remount_sb() [folded fix from Jiri Slaby] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Al Viro	7f78d4cd4c	Push BKL down beyond VFS-only parts of do_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Al Viro	6fac98dd21	Push BKL into do_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Alexey Dobriyan	f3da392e9f	dcache: extrace and use d_unlinked() d_unlinked() will be used in middle-term to ban checkpointing when opened but unlinked file is detected, and in long term, to detect such situation and special case on it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:06 -04:00
npiggin@suse.de	96029c4e09	fs: introduce mnt_clone_write This patch speeds up lmbench lat_mmap test by about another 2% after the first patch. Before: avg = 462.286 std = 5.46106 After: avg = 453.12 std = 9.58257 (50 runs of each, stddev gives a reasonable confidence) It does this by introducing mnt_clone_write, which avoids some heavyweight operations of mnt_want_write if called on a vfsmount which we know already has a write count; and mnt_want_write_file, which can call mnt_clone_write if the file is open for write. After these two patches, mnt_want_write and mnt_drop_write go from 7% on the profile down to 1.3% (including mnt_clone_write). [AV: mnt_want_write_file() should take file alone and derive mnt from it; not only all callers have that form, but that's the only mnt about which we know that it's already held for write if file is opened for write] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:02 -04:00
npiggin@suse.de	d3ef3d7351	fs: mnt_want_write speedup This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it. A microbenchmark yes, but it exercises some important paths in the mm. Before: avg = 501.9 std = 14.7773 After: avg = 462.286 std = 5.46106 (50 runs of each, stddev gives a reasonable confidence, but there is quite a bit of variation there still) It does this by removing the complex per-cpu locking and counter-cache and replaces it with a percpu counter in struct vfsmount. This makes the code much simpler, and avoids spinlocks (although the msync is still pretty costly, unfortunately). It results in about 900 bytes smaller code too. It does increase the size of a vfsmount, however. It should also give a speedup on large systems if CPUs are frequently operating on different mounts (because the existing scheme has to operate on an atomic in the struct vfsmount when switching between mounts). But I'm most interested in the single threaded path performance for the moment. [AV: minor cleanup] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:02 -04:00
Al Viro	1c755af4df	switch lookup_mnt() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:01 -04:00
Al Viro	9393bd07cf	switch follow_down() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:01 -04:00
Al Viro	589ff870ed	Switch collect_mounts() to struct path Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:01 -04:00
Al Viro	dd5cae6e97	Don't bother with check_mnt() in do_add_mount() on shrinkable ones These guys are what we add as submounts; checks for "is that attached in our namespace" are simply irrelevant for those and counterproductive for use of private vfsmount trees a-la what NFS folks want. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:35:59 -04:00
Al Viro	2a32cebd6c	Fix races around the access to ->s_options Put generic_show_options read access to s_options under rcu_read_lock, split save_mount_options() into "we are setting it the first time" (uses in foo_fill_super()) and "we are relacing and freeing the old one", synchronize_rcu() before kfree() in the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-05-09 10:51:34 -04:00
Alessio Igor Bogani	67e55205ec	vfs: umount_begin BKL pushdown Push BKL down into ->umount_begin() Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-05-09 10:49:38 -04:00
Al Viro	e5d67f0715	Touch all affected namespaces on propagation of mount We shouldn't just touch the namespace of current process Caught-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-04-20 23:01:15 -04:00
Andi Kleen	613cbe3d48	Don't set relatime when noatime is specified Since commit `0a1c01c947` ("Make relatime default") when a file system is mounted explicitely with noatime it gets both the MNT_RELATIME and MNT_NOATIME bits set. This shows up like this in /proc/mounts: /dev/xxx /yyy ext3 rw,noatime,relatime,errors=continue,data=writeback 0 0 That looks strange. The VFS uses noatime in this case, but both flags are set. So it's more a cosmetic issue, but still better to fix. Cc: mjg@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-19 10:46:47 -07:00
Al Viro	5ad4e53bd5	Get rid of indirect include of fs_struct.h Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:27 -04:00
Al Viro	3e93cd6718	Take fs_struct handling to new file (fs/fs_struct.c) Pure code move; two new helper functions for nfsd and daemonize (unshare_fs_struct() and daemonize_fs_struct() resp.; for now - the same code as used to be in callers). unshare_fs_struct() exported (for nfsd, as copy_fs_struct()/exit_fs() used to be), copy_fs_struct() and exit_fs() don't need exports anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Al Viro	f8ef3ed2be	Get rid of bumping fs_struct refcount in pivot_root(2) Not because execve races with _that_ are serious - we really need a situation when final drop of fs_struct refcount is done by something that used to have it as current->fs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:25 -04:00
Linus Torvalds	3ae5080f4c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...	2009-03-27 16:23:12 -07:00
Sukadev Bhattiprolu	a3ec947c85	vfs: simple_set_mnt() should return void simple_set_mnt() is defined as returning 'int' but always returns 0. Callers assume simple_set_mnt() never fails and don't properly cleanup if it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev() should: up_write(sb->s_unmount); deactivate_super(sb); if simple_set_mnt() fails. Since simple_set_mnt() never fails, would be cleaner if it did not return anything. [akpm@linux-foundation.org: fix build] Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
Matthew Garrett	0a1c01c947	Make relatime default Change the default behaviour of the kernel to use relatime for all filesystems. This can be overridden with the "strictatime" mount option. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-26 11:01:10 -07:00
Matthew Garrett	d0adde574b	Add a strictatime mount option Add support for explicitly requesting full atime updates. This makes it possible for kernels to default to relatime but still allow userspace to override it. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-26 10:56:35 -07:00
Al Viro	1a88b5364b	Fix incomplete __mntput locking Getting this wrong caused WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() due to optimistically checking cpu_writer->mnt outside the spinlock. Here's what we really want: * we know that nobody will set cpu_writer->mnt to mnt from now on * all changes to that sucker are done under cpu_writer->lock * we want the laziest equivalent of spin_lock(&cpu_writer->lock); if (likely(cpu_writer->mnt != mnt)) { spin_unlock(&cpu_writer->lock); continue; } /* do stuff */ that would make sure we won't miss earlier setting of ->mnt done by another CPU. Anyway, for now we just move the spin_lock() earlier and move the test into the properly locked region. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-and-tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-17 14:02:08 -08:00
Heiko Carstens	3480b25743	[CVE-2009-0029] System call wrappers part 14 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:24 +01:00
Heiko Carstens	bdc480e3be	[CVE-2009-0029] System call wrappers part 10 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:22 +01:00
Julia Lawall	5cc4a0341a	fs/namespace.c: drop code after return The extra semicolon serves no purpose. Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Richard Genoud <richard.genoud@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-31 18:07:38 -05:00
James Morris	2b82892565	Merge branch 'master' into next Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 11:29:12 +11:00
David Howells	da9592edeb	CRED: Wrap task credential accesses in the filesystem subsystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:05 +11:00
Eric W. Biederman	afef80b3d8	vfs: fix shrink_submounts In the last refactoring of shrink_submounts a variable was not completely renamed. So finish the renaming of mnt to m now. Without this if you attempt to mount an nfs mount that has both automatic nfs sub mounts on it, and has normal mounts on it. The unmount will succeed when it should not. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-12 17:17:17 -08:00
Dan Williams	0e55a7cca4	[RFC PATCH] touch_mnt_namespace when the mount flags change Daemons that need to be launched while the rootfs is read-only can now poll /proc/mounts to be notified when their O_RDWR requests may no longer end in EROFS. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-23 05:13:23 -04:00
Al Viro	0a0d8a4675	[PATCH] no need for noinline stuff in fs/namespace.c anymore Stack footprint from hell had been due to many struct nameidata in there. No more. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-23 03:34:22 -04:00
Al Viro	2d92ab3c62	[PATCH] finally get rid of nameidata in namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-23 03:34:20 -04:00
Al Viro	8d66bf5481	[PATCH] pass struct path * to do_add_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:32 -04:00
Al Viro	2d8f30380a	[PATCH] sanitize __user_walk_fd() et.al. * do not pass nameidata; struct path is all the callers want. * switch to new helpers: user_path_at(dfd, pathname, flags, &path) user_path(pathname, &path) user_lpath(pathname, &path) user_path_dir(pathname, &path) (fail if not a directory) The last 3 are trivial macro wrappers for the first one. * remove nameidata in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:34 -04:00
Li Zefan	88b387824f	[PATCH] vfs: use kstrdup() and check failing allocation - use kstrdup() instead of kmalloc() + memcpy() - return NULL if allocating ->mnt_devname failed - mnt_devname should be const Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:24 -04:00
Al Viro	7f2da1e7d0	[PATCH] kill altroot long overdue... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:20 -04:00
Arjan van de Ven	5c752ad9f3	Use WARN() in fs/ Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Eric Paris	2069f45784	LSM/SELinux: show LSM mount options in /proc/mounts This patch causes SELinux mount options to show up in /proc/mounts. As with other code in the area seq_put errors are ignored. Other LSM's will not have their mount options displayed until they fill in their own security_sb_show_options() function. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:05 +10:00
Harvey Harrison	8e24eea728	fs: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:54 -07:00
Jan Blunck	7ec02ef159	vfs: remove lives_below_in_same_fs() Remove lives_below_in_same_fs() since is_subdir() from fs/dcache.c is providing the same functionality. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:06 -07:00
Jan Kara	8794b5b246	quota: remove superfluous DQUOT_OFF() in fs/namespace.c We don't need to turn quotas off before remounting root ro, because do_remount_sb() already handles this. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:33 -07:00
Al Viro	42faad9965	[PATCH] restore sane ->umount_begin() API Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-25 09:23:25 -04:00
Miklos Szeredi	97e7e0f71d	[patch 7/7] vfs: mountinfo: show dominating group id Show peer group ID of nearest dominating group that has intersection with the mount's namespace. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:05:09 -04:00
Ram Pai	2d4d4864ac	[patch 6/7] vfs: mountinfo: add /proc/<pid>/mountinfo [mszeredi@suse.cz] rewrite and split big patch into managable chunks /proc/mounts in its current form lacks important information: - propagation state - root of mount for bind mounts - the st_dev value used within the filesystem - identifier for each mount and it's parent It also suffers from the following problems: - not easily extendable - ambiguity of mountpoints within a chrooted environment - doesn't distinguish between filesystem dependent and independent options - doesn't distinguish between per mount and per super block options This patch introduces /proc/<pid>/mountinfo which attempts to address all these deficiencies. Code shared between /proc/<pid>/mounts and /proc/<pid>/mountinfo is extracted into separate functions. Thanks to Al Viro for the help in getting the design right. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:05:03 -04:00
Miklos Szeredi	a1a2c409b6	[patch 5/7] vfs: mountinfo: allow using process root Allow /proc/<pid>/mountinfo to use the root of <pid> to calculate mountpoints. - move definition of 'struct proc_mounts' to <linux/mnt_namespace.h> - add the process's namespace and root to this structure - pass a pointer to 'struct proc_mounts' into seq_operations In addition the following cleanups are made: - use a common open function for /proc/<pid>/{mounts,mountstat} - surround namespace.c part of these proc files with #ifdef CONFIG_PROC_FS - make the seq_operations structures const Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:04:57 -04:00
Miklos Szeredi	719f5d7f0b	[patch 4/7] vfs: mountinfo: add mount peer group ID Add a unique ID to each peer group using the IDR infrastructure. The identifiers are reused after the peer group dissolves. The IDR structures are protected by holding namepspace_sem for write while allocating or deallocating IDs. IDs are allocated when a previously unshared vfsmount becomes the first member of a peer group. When a new member is added to an existing group, the ID is copied from one of the old members. IDs are freed when the last member of a peer group is unshared. Setting the MNT_SHARED flag on members of a subtree is done as a separate step, after all the IDs have been allocated. This way an allocation failure can be cleaned up easilty, without affecting the propagation state. Based on design sketch by Al Viro. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:04:51 -04:00
Miklos Szeredi	73cd49ecdd	[patch 3/7] vfs: mountinfo: add mount ID Add a unique ID to each vfsmount using the IDR infrastructure. The identifiers are reused after the vfsmount is freed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:04:45 -04:00
Al Viro	8c3ee42e80	[PATCH] get rid of more nameidata passing in namespace.c Further reduction of stack footprint (sys_pivot_root()); lose useless BKL in there, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-21 23:13:47 -04:00
Al Viro	b5266eb4c8	[PATCH] switch a bunch of LSM hooks from nameidata to path Namely, ones from namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-21 23:13:23 -04:00
Al Viro	1a60a28077	[PATCH] lock exclusively in collect_mounts() and drop_collected_mounts() Taking namespace_sem shared there isn't worth the trouble, especially with vfsmount ID allocation about to be added. That way we know that umount_tree(), copy_tree() and clone_mnt() are _always_ serialized by namespace_sem. umount_tree() still needs vfsmount_lock (it manipulates hash chains, among other things), but that's a separate story. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-21 23:11:09 -04:00
Dave Hansen	2e4b7fcd92	[PATCH] r/o bind mounts: honor mount writer counts at remount Originally from: Herbert Poetzl <herbert@13thfloor.at> This is the core of the read-only bind mount patch set. Note that this does _not_ add a "ro" option directly to the bind mount operation. If you require such a mount, you must first do the bind, then follow it up with a 'mount -o remount,ro' operation: If you wish to have a r/o bind mount of /foo on /bar: mount --bind /foo /bar mount -o remount,ro /bar Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-19 00:29:27 -04:00
Dave Hansen	3d733633a6	[PATCH] r/o bind mounts: track numbers of writers to mounts This is the real meat of the entire series. It actually implements the tracking of the number of writers to a mount. However, it causes scalability problems because there can be hundreds of cpus doing open()/close() on files on the same mnt at the same time. Even an atomic_t in the mnt has massive scaling problems because the cacheline gets so terribly contended. This uses a statically-allocated percpu variable. All want/drop operations are local to a cpu as long as that cpu operates on the same mount, and there are no writer count imbalances. Writer count imbalances happen when a write is taken on one cpu, and released on another, like when an open/close pair is performed on two cpus. Upon a remount,ro request, all of the data from the percpu variables is collected (expensive, but very rare) and we determine if there are any outstanding writers to the mount. I've written a little benchmark to sit in a loop for a couple of seconds in several cpus in parallel doing open/write/close loops. http://sr71.net/~dave/linux/openbench.c The code in here is a worst-possible case for this patch. It does opens on a _pair_ of files in two different mounts in parallel. This should cause my code to lose its "operate on the same mount" optimization completely. This worst-case scenario causes a 3% degradation in the benchmark. I could probably get rid of even this 3%, but it would be more complex than what I have here, and I think this is getting into acceptable territory. In practice, I expect writing more than 3 bytes to a file, as well as disk I/O to mask any effects that this has. (To get rid of that 3%, we could have an #defined number of mounts in the percpu variable. So, instead of a CPU getting to operate only on percpu data when it accesses only one mount, it could stay on percpu data when it only accesses N or fewer mounts.) 
[AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-19 00:29:27 -04:00
Dave Hansen	8366025eb8	[PATCH] r/o bind mounts: stub functions This patch adds two functions, mnt_want_write() and mnt_drop_write(). These are used like a lock pair around any fs operations that might cause a write to the filesystem. Before these can become useful, we must first cover each place in the VFS where writes are performed with a want/drop pair. When that is complete, we can actually introduce code that will safely check the counts before allowing r/w<->r/o transitions to occur. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-19 00:25:32 -04:00
Al Viro	6758f953d0	[PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:48:04 -04:00
Al Viro	c35038beca	[PATCH] do shrink_submounts() for all fs types ... and take it out of ->umount_begin() instances. Call with all locks already taken (by do_umount()) and leave calling release_mounts() to caller (it will do release_mounts() anyway, so we can just put into the same list). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:47:58 -04:00
Al Viro	bcc5c7d2b6	[PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts() ... and fix a race on access of ->mnt_share et al. without namespace_sem in the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:47:52 -04:00
Al Viro	7c4b93d826	[PATCH] count ghost references to vfsmounts make propagate_mount_busy() exclude references from the vfsmounts that had been isolated by umount_tree() and are just waiting for release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:47:46 -04:00
Al Viro	1a39068954	[PATCH] reduce stack footprint in namespace.c A lot of places misuse struct nameidata when they need struct path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:47:40 -04:00
Jan Blunck	c32c2f63a9	d_path: Make seq_path() use a struct path argument seq_path() is always called with a dentry and a vfsmount from a struct path. Make seq_path() take it directly as an argument. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	ac748a09fc	Make set_fs_{root,pwd} take a struct path In nearly all cases the set_fs_{root,pwd}() calls work on a struct path. Change the function to reflect this and use path_get() here. Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	6ac08c39a1	Use struct path in fs_struct * Use struct path in fs_struct. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	1d957f9bf8	Introduce path_put() * Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	4ac9137858	Embed a struct path into struct nameidata instead of nd->{dentry,mnt} This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 6894212 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	429731b155	Remove path_release_on_umount() path_release_on_umount() should only be called from sys_umount(). I merged the function into sys_umount() instead of having it in namei.c. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:32 -08:00
Eric Sandeen	2dafe1c4d6	reduce large do_mount stack usage with noinlines do_mount() uses a whopping 616 bytes of stack on x86_64 in 2.6.24-mm1, largely thanks to gcc inlining the various helper functions. noinlining these can slim it down a lot; on my box this patch gets it down to 168, which is mostly the struct nameidata nd; left on the stack. These functions are called only as do_mount() helpers; none of them should be in any path that would see a performance benefit from inlining... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:44 -08:00
Miklos Szeredi	b3b304a23a	mount options: add generic_show_options() Add a new s_options field to struct super_block. Filesystems can save mount options passed to them in mount or remount. It is automatically freed when the superblock is destroyed. A new helper function, generic_show_options() is introduced, which uses this field to display the mount options in /proc/mounts. Another helper function, save_mount_options() may be used by filesystems to save the options in the super block. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:39 -08:00
Eric Dumazet	13f14b4d8b	Use ilog2() in fs/namespace.c We can use ilog2() in fs/namespace.c to compute hash_bits and hash_mask at compile time, not runtime. [akpm@linux-foundation.org: clean it all up] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:09 -08:00
Greg Kroah-Hartman	00d2666623	kobject: convert main fs kobject to use kobject_create This also renames fs_subsys to fs_kobj to catch all current users with a build error instead of a build warning which can easily be missed. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:13 -08:00
Greg Kroah-Hartman	3514faca19	kobject: remove struct kobj_type from struct kset We don't need a "default" ktype for a kset. We should set this explicitly every time for each kset. This change is needed so that we can make ksets dynamic, and cleans up one of the odd, undocumented assumptions that the kset/kobject/ktype model has. This patch is based on a lot of help from Kay Sievers. Nasty bug in the block code was found by Dave Young <hidave.darkstar@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:10 -08:00
Al Viro	8aec080945	[PATCH] new helpers - collect_mounts() and release_collected_mounts() Get a snapshot of a subtree, creating private clones of vfsmounts for all its components and release such snapshot resp. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-10-21 02:37:25 -04:00
Pavel Emelyanov	8bf9725c29	pid namespaces: introduce MS_KERNMOUNT flag This flag tells the .get_sb callback that this is a kern_mount() call so that it can trust data pointer to be valid in-kernel one. If this flag is passed from the user process, it is cleared since the data pointer is not a valid kernel object. Running a few steps forward - this will be needed for proc to create the superblock and store a valid pid namespace on it during the namespace creation. The reason, why the namespace cannot live without proc mount is described in the appropriate patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Denis Cheng	74bf17cffc	fs: remove the unused mempages parameter Since the mempages parameter is actually not used, they should be removed. Now only files_init uses the mempages parameter, files_init(mempages); but I don't think the adaptation to mempages in files_init is really useful; and if files_init also changed to the prototype void (*func)(void), the wrapper vfs_caches_init would also not need the mempages parameter. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:49 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's c59def9f22 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Adrian Bunk	948730b0e3	fs/namespace.c should #include "internal.h" Every file should include the headers containing the prototypes for its global functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:50 -07:00
Eric W. Biederman	213dd266d4	namespace: ensure clone_flags are always stored in an unsigned long While working on unshare support for the network namespace I noticed we were putting clone flags in an int. Which is weird because the syscall uses unsigned long and we at least need an unsigned to properly hold all of the unshare flags. So to make the code consistent, this patch updates the code to use unsigned long instead of int for the clone flags in those places where we get it wrong today. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:48 -07:00
Cedric Le Goater	467e9f4b50	fix create_new_namespaces() return value dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong, create_new_namespaces() uses ERR_PTR() to catch an error. This means that the subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or copy_utsname(). Modify create_new_namespaces() to also use the errors returned by the copy_*_ns routines and not to systematically return ENOMEM. [oleg@tv-sign.ru: better changelog] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Pavel Emelianov	b0765fb857	Make /proc/self/mounts(tats) use seq_list_xxx helpers One more simple and stupid switching to the new API. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:42 -07:00
Miklos Szeredi	ee6f958291	check privileges before setting mount propagation There's a missing check for CAP_SYS_ADMIN in do_change_type(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:12 -07:00
Pavel Emelianov	b5e618181a	Introduce a handy list_first_entry macro There are many places in the kernel where the construction like foo = list_entry(head->next, struct foo_struct, list); are used. The code might look more descriptive and neat if using the macro list_first_entry(head, type, member) \ list_entry((head)->next, type, member) Here is the macro itself and the examples of its usage in the generic code. If it will turn out to be useful, I can prepare the set of patches to inject it into arch-specific code, drivers, networking, etc. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:11 -07:00
Miklos Szeredi	79c0b2df79	add filesystem subtype support There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the filesystem is fuse based or not. So there's a conflict of interest in how this should be represented in fstab, mtab and /proc/mounts. The current scheme is to encode the real filesystem type in the mount source. So an sshfs mount looks like this: sshfs#user@server:/ /mnt/server fuse rw,nosuid,nodev,... This url-ish syntax works OK for sshfs and similar filesystems. However for block device based filesystems (ntfs-3g, zfs) it doesn't work, since the kernel expects the mount source to be a real device name. A possibly better scheme would be to encode the real type in the type field as "type.subtype". So fuse mounts would look like this: /dev/hda1 /mnt/windows fuseblk.ntfs-3g rw,... user@server:/ /mnt/server fuse.sshfs rw,nosuid,nodev,... This patch adds the necessary code to the kernel so that this can be correctly displayed in /proc/mounts. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:01 -07:00
Badari Pulavarty	e3222c4ecc	Merge sys_clone()/sys_unshare() nsproxy and namespace handling sys_clone() and sys_unshare() both make copies of nsproxy and its associated namespaces. But they have different code paths. This patch merges all the nsproxy and its associated namespace copy/clone handling (as much as possible). Posted on container list earlier for feedback. - Create a new nsproxy and its associated namespaces and pass it back to caller to attach it to right process. - Changed all copy_*_ns() routines to return a new copy of namespace instead of attaching it to task->nsproxy. - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines. - Removed unnecessary !ns checks from copy_*_ns() and added BUG_ON() just in case. - Get rid of all individual unshare_*_ns() routines and make use of copy_*_ns() instead. [akpm@osdl.org: cleanups, warning fix] [clg@fr.ibm.com: remove dup_namespaces() declaration] [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval] [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Robert P. J. Day	c376222960	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:27 -08:00
Valerie Henson	47ae32d6a5	[PATCH] relative atime Add "relatime" (relative atime) support. Relative atime only updates the atime if the previous atime is older than the mtime or ctime. Like noatime, but useful for applications like mutt that need to know when a file has been read since it was last modified. A corresponding patch against mount(8) is available at http://userweb.kernel.org/~akpm/mount-relative-atime.txt Signed-off-by: Valerie Henson <val_henson@linux.intel.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Kirill Korotaev	6b3286ed11	[PATCH] rename struct namespace to struct mnt_namespace Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with other namespaces being developed for the containers: pid, uts, ipc, etc. 'namespace' variables and attributes are also renamed to 'mnt_ns' Signed-off-by: Kirill Korotaev <dev@sw.ru> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:51 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Serge E. Hallyn	1651e14e28	[PATCH] namespaces: incorporate fs namespace into nsproxy This moves the mount namespace into the nsproxy. The mount namespace count now refers to the number of nsproxies pointing to it, rather than the number of tasks. As a result, the unshare_namespace() function in kernel/fork.c no longer checks whether it is being shared. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
David Howells	07f3f05c1e	[PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6 ] Create a new header file, fs/internal.h, for common definitions local to the sources in the fs/ directory. Move extern definitions that should be in header files from fs/*.c to fs/internal.h or other main header files where they span directories. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:18 +02:00
Randy Dunlap	15a67dd8cc	[PATCH] fs/namespace: handle init/registration errors Check and handle init errors. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:05 -07:00
Andrew Morton	f20a9ead0d	sysfs: add proper sysfs_init() prototype Don't be crufty. Mark it __must_check too. Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-09-25 21:08:39 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Akinobu Mita	1bfba4e8ea	[PATCH] core: use list_move() This patch converts the combination of list_del(A) and list_add(A, B) to list_move(A, B). Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:17 -07:00
Trond Myklebust	816724e65c	Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ Conflicts: fs/nfs/inode.c fs/super.c Fix conflicts between patch 'NFS: Split fs/nfs/inode.c' and patch 'VFS: Permit filesystem to override root dentry on mount'	2006-06-24 13:07:53 -04:00
David Howells	454e2398be	[PATCH] VFS: Permit filesystem to override root dentry on mount Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience functions is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. 
However, with the way NFS superblock sharing is currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-23 07:42:45 -07:00
Trond Myklebust	8b512d9a88	VFS: Remove dependency of ->umount_begin() call on MNT_FORCE Allow filesystems to decide to perform pre-umount processing whether or not MNT_FORCE is set. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-06-09 09:34:18 -04:00
Trond Myklebust	5528f911b4	VFS: Add shrink_submounts() Allow a submount to be marked as being 'shrinkable' by means of the vfsmount->mnt_flags, and then add a function 'shrink_submounts()' which attempts to recursively unmount these submounts. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-06-09 09:34:17 -04:00
Andrew Morton	eee391a66d	[PATCH] revert "vfs: propagate mnt_flags into do_loopback/vfsmount" Revert commit f6422f17d3, due to Valdis.Kletnieks@vt.edu wrote: > > There seems to have been a bug introduced in this changeset: > > Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind' > misbehaves: > > > # mkdir /foo > > # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo > > # mkdir /foo/bar > > # mount --bind /foo/bar /foo > > # tail -2 /proc/mounts > > none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0 > > none /foo tmpfs rw 0 0 > > Reverting this changeset causes both mounts to have the same options. > > (Thanks to Stephen Smalley for tracking down the changeset...) > Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Christoph Hellwig <hch@infradead.org> Cc: <Valdis.Kletnieks@vt.edu> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-15 11:20:57 -07:00
Herbert Poetzl	f6422f17d3	[PATCH] vfs: propagate mnt_flags into do_loopback/vfsmount The mnt_flags are propagated into do_loopback(), so that they can be stored with the vfsmount Signed-off-by: Herbert Poetzl <herbert@13thfloor.at> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:41 -07:00
Ian Kent	e3474a8eb3	[PATCH] autofs4: change may_umount* functions to boolean Change the functions may_umount and may_umount_tree to boolean functions to aid code readability. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:40 -08:00
Eric Dumazet	fa3536cc14	[PATCH] Use __read_mostly on some hot fs variables I discovered on oprofile hunting on a SMP platform that dentry lookups were slowed down because d_hash_mask, d_hash_shift and dentry_hashtable were in a cache line that contained inodes_stat. So each time inodes_stats is changed by a cpu, other cpus have to refill their cache line. This patch moves some variables to the __read_mostly section, in order to avoid false sharing. RCU dentry lookups can go full speed. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:56:56 -08:00
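Commit 47ae32d6a5 (relative atime) states its rule in one sentence: atime is updated only if the previous atime is older than the mtime or the ctime. Below is a minimal sketch of that predicate. It assumes whole-second `time_t` timestamps instead of the kernel's `struct timespec`, and it models only the rule as worded in the commit message, not the kernel's actual atime-update code. `relatime_should_update()` and `relatime_examples()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper (not a kernel function) modeling the relatime rule
 * from commit 47ae32d6a5: update atime only if the previous atime is older
 * than the mtime or the ctime. Timestamps are simplified to whole seconds.
 */
bool relatime_should_update(time_t atime_sec, time_t mtime_sec, time_t ctime_sec)
{
    /* The file was modified, or its inode changed, since it was last read. */
    return atime_sec < mtime_sec || atime_sec < ctime_sec;
}

/* Usage example: the three cases the rule distinguishes. */
void relatime_examples(void)
{
    /* Written after the last read: update, so "read since modified" stays visible. */
    assert(relatime_should_update(100, 200, 50));
    /* Already read since the last change: skip the atime write, as noatime would. */
    assert(!relatime_should_update(300, 200, 250));
    /* Only the ctime is newer (e.g. after a chmod): still update. */
    assert(relatime_should_update(300, 200, 400));
}
```

The middle case is where relatime saves a disk write. The first case is what keeps tools like mutt working, because they compare atime against mtime to decide whether a mailbox has been read since it last changed.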

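Commit b5e618181a introduces `list_first_entry(head, type, member)` as shorthand for `list_entry((head)->next, type, member)`. The sketch below shows the macro at work on a tiny, self-contained list. It assumes simplified userspace stand-ins modeled on `<linux/list.h>` (`struct list_head`, `container_of()`, `list_init()`, `list_add_tail()`) rather than the kernel's own definitions. The kernel's `container_of()`, for instance, also performs type checking. `struct foo_struct` follows the commit's example, and `first_foo_id()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Portable container_of(); the kernel version adds compile-time type checks. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* The helper from the commit: the entry that follows the list head. */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

/* An empty list is a head whose next and prev point back at itself. */
static inline void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical element type, named after the commit's example. */
struct foo_struct {
    int id;
    struct list_head list;
};

/* Usage: read the first element; the list must not be empty. */
static inline int first_foo_id(struct list_head *head)
{
    assert(head->next != head); /* on an empty list, ->next is the head itself */
    return list_first_entry(head, struct foo_struct, list)->id;
}
```

The macro adds no new capability. It only names the intent of the common `list_entry(head->next, ...)` idiom, and like that idiom it must not be used on an empty list, where `head->next` points back at the head rather than at an element.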

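Commit 3d733633a6 keeps mount writer counts in per-CPU data so that want/drop operations stay CPU-local. A writer taken on one CPU and released on another only creates an imbalance between slots, and a remount,ro request pays the cost of summing everything. Below is a deliberately simplified, single-threaded model of that idea under stated assumptions. It uses a fixed, hypothetical `SKETCH_NR_CPUS` and gives each mount its own array of per-CPU slots. The patch itself uses one statically allocated per-CPU variable and optimizes for a CPU staying on the same mount. The model also ignores all the locking the real code needs. Every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4 /* hypothetical fixed CPU count */

/* Hypothetical per-mount writer tracking: one counter slot per CPU. */
struct mnt_writers_sketch {
    long count[SKETCH_NR_CPUS];
};

/* Taking a write reference touches only the local CPU's slot. */
static inline void sketch_want_write(struct mnt_writers_sketch *w, int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    w->count[cpu]++;
}

/* The drop may run on another CPU, so a single slot can go negative. */
static inline void sketch_drop_write(struct mnt_writers_sketch *w, int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    w->count[cpu]--;
}

/* remount,ro path: sum every slot (expensive, but rare). */
static inline long sketch_total_writers(const struct mnt_writers_sketch *w)
{
    long total = 0;
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        total += w->count[cpu];
    return total;
}

/* A mount may go read-only only when no writers are outstanding. */
static inline bool sketch_can_remount_ro(const struct mnt_writers_sketch *w)
{
    return sketch_total_writers(w) == 0;
}
```

A want on CPU 0 followed by a drop on CPU 1 leaves the slots at +1 and -1. That is the "writer count imbalance" the commit describes. The sum is still zero, and the sum is all the rare remount,ro check needs.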
491 Commits