linux

Author	SHA1	Message	Date
Olof Johansson	e9a404580c	nfs: Fix build break with CONFIG_NFS_V4=n Signed-off-by: Olof Johansson <olof@lixom.net> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 19:27:46 -07:00
Chris Malley	3932bf6059	sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake Spelling error in sysfs_create_file kerneldoc. Signed-off-by: Chris Malley <mail@chrismalley.co.uk> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 03:14:32 +02:00
Uwe Kleine-König	dbe7f76dd6	fix typo "insted" -> "instead" Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 01:55:04 +02:00
Linus Torvalds	b04cde34cf	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: NFSv4: Fix an rpc_cred reference leakage in fs/nfs/delegation.c NFSv4: Ensure that we wait for the CLOSE request to complete NFS: Fix a race in sillyrename NFS: Fix a writeback race...	2007-10-19 14:31:42 -07:00
Jan Engelhardt	96de0e252c	Convert files to UTF-8 and some cleanups * Convert files to UTF-8. * Also correct some people's names (one example is Eißfeldt, which was found in a source file. Given that the author used an ß at all in a source file indicates that the real name has in fact a 'ß' and not an 'ss', which is commonly used as a substitute for 'ß' when limited to 7bit.) * Correct town names (Goettingen -> Göttingen) * Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313) Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:21:04 +02:00
Trond Myklebust	603c83da19	NFSv4: Fix an rpc_cred reference leakage in fs/nfs/delegation.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-19 17:19:30 -04:00
Trond Myklebust	a49c3c7736	NFSv4: Ensure that we wait for the CLOSE request to complete Otherwise, we do end up breaking close-to-open semantics. We also end up breaking some of the silly-rename tests in Connectathon on some setups. Please refer to the bug-report at http://bugzilla.linux-nfs.org/show_bug.cgi?id=150 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-19 17:19:25 -04:00
Trond Myklebust	565277f63c	NFS: Fix a race in sillyrename lookup() and sillyrename() can race one another because the sillyrename() completion cannot take the parent directory's inode->i_mutex since the latter may be held by whoever is calling dput(). We therefore have little option but to add extra locking to ensure that nfs_lookup() and nfs_atomic_open() do not race with the sillyrename completion. If somebody has looked up the sillyrenamed file in the meantime, we just transfer the sillydelete information to the new dentry. Please refer to the bug-report at http://bugzilla.linux-nfs.org/show_bug.cgi?id=150 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-19 17:19:16 -04:00
Trond Myklebust	61e930a904	NFS: Fix a writeback race... This patch fixes a regression that was introduced by commit `44dd151d5c` We cannot zero the user page in nfs_mark_uptodate() any more, since a) We'd be modifying the page without holding the page lock b) We can race with other updates of the page, most notably because of the call to nfs_wb_page() in nfs_writepage_setup(). Instead, we do the zeroing in nfs_update_request() if we see that we're creating a request that might potentially be marked as up to date. Thanks to Olivier Paquet for reporting the bug and providing a test-case. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-19 17:18:57 -04:00
Robert P. J. Day	3a4fa0a25d	Fix misspellings of "system", "controller", "interrupt" and "necessary". Fix the various misspellings of "system", controller", "interrupt" and "[un]necessary". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:10:43 +02:00
Linus Torvalds	ec2626815b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: fix guest time accounting going faster than user time accounting	2007-10-19 12:07:03 -07:00
Linus Torvalds	2843483d2e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (51 commits) [CIFS] log better errors on failed mounts [CIFS] Return better error when server requires signing but client forbids [CIFS] fix typo [CIFS] acl support part 4 [CIFS] Fix minor problems noticed by scan [CIFS] fix bad handling of EAGAIN error on kernel_recvmsg in cifs_demultiplex_thread [CIFS] build break [CIFS] endian fixes [CIFS] endian fixes in new acl code [CIFS] Fix some endianness problems in new acl code [CIFS] missing #endif from a previous patch [CIFS] formatting fixes [CIFS] Break up unicode_sessetup string functions [CIFS] parse server_GUID in SPNEGO negProt response [CIFS] [CIFS] Fix endian conversion problem in posix mkdir [CIFS] fix build break when lanman not enabled [CIFS] remove two sparse warnings [CIFS] remove compile warnings when debug disabled [CIFS] CIFS ACL support part 3 ...	2007-10-19 12:00:58 -07:00
Linus Torvalds	f768f9d375	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] cleanup fid types mess [XFS] fixups after behavior removal merge into mainline git	2007-10-19 11:55:50 -07:00
Pavel Emelyanov	69cccb887a	Use task_pid_nr() instead of pid_nr(task_pid()) There are two places that do so - the cgroups subsystem and the autofs code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Ian Kent <raven@themaw.net> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:43 -07:00
Pavel Emelyanov	457c25107b	Remove unused variables from fs/proc/base.c When removing the explicit task_struct->pid usage I found that proc_readfd_common() and proc_pident_readdir() get this field, but do not use it at all. So this cleanup is a cheap help with the task_struct->pid isolation. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:43 -07:00
Pavel Emelyanov	ba25f9dcc4	Use helpers to obtain task pid in printks The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:43 -07:00
Eugene Teo	270f722d4d	Fix tsk->exit_state usage tsk->exit_state can only be 0, EXIT_ZOMBIE, or EXIT_DEAD. A non-zero test is the same as tsk->exit_state & (EXIT_ZOMBIE \| EXIT_DEAD), so just testing tsk->exit_state is sufficient. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:42 -07:00
Neil Horman	d85f50d5e1	proc: export a processes resource limits via /proc/pid Currently, there exists no method for a process to query the resource limits of another process. They can be inferred via some mechanisms but they cannot be explicitly determined. Given that this information can be usefull to know during the debugging of an application, I've written this patch which exports all of a processes limits via /proc/<pid>/limits. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:42 -07:00
Jiri Slaby	1276b103c2	fs/select, remove unused macros fs/select, remove unused macros this is due to preparation for global BIT macro Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:41 -07:00
Pavel Emelyanov	bac0abd617	Isolate some explicit usage of task->tgid With pid namespaces this field is now dangerous to use explicitly, so hide it behind the helpers. Also the pid and pgrp fields o task_struct and signal_struct are to be deprecated. Unfortunately this patch cannot be sent right now as this leads to tons of warnings, so start isolating them, and deprecate later. Actually the p->tgid == pid has to be changed to has_group_leader_pid(), but Oleg pointed out that in case of posix cpu timers this is the same, and thread_group_leader() is more preferable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Pavel Emelyanov	228ebcbe63	Uninline find_task_by_xxx set of functions The find_task_by_something is a set of macros are used to find task by pid depending on what kind of pid is proposed - global or virtual one. All of them are wrappers above the most generic one - find_task_by_pid_type_ns() - and just substitute some args for it. It turned out, that dereferencing the current->nsproxy->pid_ns construction and pushing one more argument on the stack inline cause kernel text size to grow. This patch moves all this stuff out-of-line into kernel/pid.c. Together with the next patch it saves a bit less than 400 bytes from the .text section. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Pavel Emelyanov	b488893a39	pid namespaces: changes to show virtual ids to user This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Pavel Emelyanov	6f4e643353	pid namespaces: initialize the namespace's proc_mnt The namespace's proc_mnt must be kern_mount-ed to make this pointer always valid, independently of whether the user space mounted the proc or not. This solves raced in proc_flush_task, etc. with the proc_mnt switching from NULL to not-NULL. The initialization is done after the init's pid is created and hashed to make proc_get_sb() finr it and get for root inode. Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the superblock holds the namespace we must explicitly break this circle to destroy all the stuff. This is done after the init of the namespace dies. Running a few steps forward - when init exits it will kill all its children, so no proc_mnt will be needed after its death. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Pavel Emelyanov	130f77ecb2	pid namespaces: make proc_flush_task() actually from entries from multiple namespaces This means that proc_flush_task_mnt() is to be called for many proc mounts and with different ids, depending on the namespace this pid is to be flushed from. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	07543f5c75	pid namespaces: make proc have multiple superblocks - one for each namespace Each pid namespace have to be visible through its own proc mount. Thus we need to have per-namespace proc trees with their own superblocks. We cannot easily show different pid namespace via one global proc tree, since each pid refers to different tasks in different namespaces. E.g. pid 1 refers to the init task in the initial namespace and to some other task when seeing from another namespace. Moreover - pid, exisintg in one namespace may not exist in the other. This approach has one move advantage is that the tasks from the init namespace can see what tasks live in another namespace by reading entries from another proc tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	198fe21b0a	pid namespaces: helpers to find the task by its numerical ids When searching the task by numerical id on may need to find it using global pid (as it is done now in kernel) or by its virtual id, e.g. when sending a signal to a task from one namespace the sender will specify the task's virtual id and we should find the task by this value. [akpm@linux-foundation.org: fix gfs2 linkage] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	60347f6716	pid namespaces: prepare proc_flust_task() to flush entries from multiple proc trees The first part is trivial - we just make the proc_flush_task() to operate on arbitrary vfsmount with arbitrary ids and pass the pid and global proc_mnt to it. The other change is more tricky: I moved the proc_flush_task() call in release_task() higher to address the following problem. When flushing task from many proc trees we need to know the set of ids (not just one pid) to find the dentries' names to flush. Thus we need to pass the task's pid to proc_flush_task() as struct pid is the only object that can provide all the pid numbers. But after __exit_signal() task has detached all his pids and this information is lost. This creates a tiny gap for proc_pid_lookup() to bring some dentries back to tree and keep them in hash (since pids are still alive before __exit_signal()) till the next shrink, but since proc_flush_task() does not provide a 100% guarantee that the dentries will be flushed, this is OK to do so. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Pavel Emelyanov	8bf9725c29	pid namespaces: introduce MS_KERNMOUNT flag This flag tells the .get_sb callback that this is a kern_mount() call so that it can trust data pointer to be valid in-kernel one. If this flag is passed from the user process, it is cleared since the data pointer is not a valid kernel object. Running a few steps forward - this will be needed for proc to create the superblock and store a valid pid namespace on it during the namespace creation. The reason, why the namespace cannot live without proc mount is described in the appropriate patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Matthias Kaehlcke	d473012710	fs/super.c: use list_for_each_entry() instead of list_for_each() fs/super.c: use list_for_each_entry() instead of list_for_each() in sget() [akpm@linux-foundation.org: clean up some crap while we're there] Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Matthias Kaehlcke	b70c394099	fs/eventpoll.c: use list_for_each_entry() instead of list_for_each() fs/eventpoll.c: use list_for_each_entry() instead of list_for_each() in ep_poll_safewake() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Matthias Kaehlcke	cfdaf9e5f9	fs/file_table.c: use list_for_each_entry() instead of list_for_each() fs/file_table.c: use list_for_each_entry() instead of list_for_each() in fs_may_remount_ro() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Pavel Emelyanov	cf7b708c8d	Make access to task's nsproxy lighter When someone wants to deal with some other taks's namespaces it has to lock the task and then to get the desired namespace if the one exists. This is slow on read-only paths and may be impossible in some cases. E.g. Oleg recently noticed a race between unshare() and the (sent for review in cgroups) pid namespaces - when the task notifies the parent it has to know the parent's namespace, but taking the task_lock() is impossible there - the code is under write locked tasklist lock. On the other hand switching the namespace on task (daemonize) and releasing the namespace (after the last task exit) is rather rare operation and we can sacrifice its speed to solve the issues above. The access to other task namespaces is proposed to be performed like this: rcu_read_lock(); nsproxy = task_nsproxy(tsk); if (nsproxy != NULL) { / * * work with the namespaces here * e.g. get the reference on one of them * / } / * * NULL task_nsproxy() means that this task is * almost dead (zombie) * / rcu_read_unlock(); This patch has passed the review by Eric and Oleg :) and, of course, tested. [clg@fr.ibm.com: fix unshare()] [ebiederm@xmission.com: Update get_net_ns_by_pid] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Sukadev Bhattiprolu	3743ca05ff	pid namespaces: use task_pid() to find leader's pid Use task_pid() to get leader's 'struct pid' and avoid the find_pid(). Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Sukadev Bhattiprolu	88f21d8182	pid namespaces: rename child_reaper() function Rename the child_reaper() function to task_child_reaper() to be similar to other task_* functions and to distinguish the function from 'struct pid_namspace.child_reaper'. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Sukadev Bhattiprolu	2894d650cd	pid namespaces: define and use task_active_pid_ns() wrapper With multiple pid namespaces, a process is known by some pid_t in every ancestor pid namespace. Every time the process forks, the child process also gets a pid_t in every ancestor pid namespace. While a process is visible in >=1 pid namespaces, it can see pid_t's in only one pid namespace. We call this pid namespace it's "active pid namespace", and it is always the youngest pid namespace in which the process is known. This patch defines and uses a wrapper to find the active pid namespace of a process. The implementation of the wrapper will be changed in when support for multiple pid namespaces are added. Changelog: 2.6.22-rc4-mm2-pidns1: - [Pavel Emelianov, Alexey Dobriyan] Back out the change to use task_active_pid_ns() in child_reaper() since task->nsproxy can be NULL during task exit (so child_reaper() continues to use init_pid_ns). to implement child_reaper() since init_pid_ns.child_reaper to implement child_reaper() since tsk->nsproxy can be NULL during exit. 2.6.21-rc6-mm1: - Rename task_pid_ns() to task_active_pid_ns() to reflect that a process can have multiple pid namespaces. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Pavel Emelianov	a47afb0f9d	pid namespaces: round up the API The set of functions process_session, task_session, process_group and task_pgrp is confusing, as the names can be mixed with each other when looking at the code for a long time. The proposals are to * equip the functions that return the integer with _nr suffix to represent that fact, * and to make all functions work with task (not process) by making the common prefix of the same name. For monotony the routines signal_session() and set_signal_session() are replaced with task_session_nr() and set_task_session(), especially since they are only used with the explicit task->signal dereference. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Paul Menage	8793d854ed	Task Control Groups: make cpusets a client of cgroups Remove the filesystem support logic from the cpusets system and makes cpusets a cgroup subsystem The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get passed through to the cgroup filesystem with the appropriate options to emulate the old cpuset filesystem behaviour. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	a424316ca1	Task Control Groups: add procfs interface Add: /proc/cgroups - general system info /proc/*/cgroup - per-task cgroup membership info [a.p.zijlstra@chello.nl: cgroups: bdi init hooks] Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Jeff Mahoney	cb680c1be6	reiserfs: ignore on disk s_bmap_nr value Implement support for file systems larger than 8 TiB. The reiserfs superblock contains a 16 bit value for counting the number of bitmap blocks. The rest of the disk format supports file systems up to 2^32 blocks, but the bitmap block limitation artificially limits this to 8 TiB with a 4KiB block size. Rather than trust the superblock's 16-bit bitmap block count, we calculate it dynamically based on the number of blocks in the file system. When an incorrect value is observed in the superblock, it is zeroed out, ensuring that older kernels will not be able to mount the file system. Userspace support has already been implemented and shipped in reiserfsprogs 3.6.20. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	4d20851d37	reiserfs: remove first_zero_hint The first_zero_hint metadata caching was never actually used, and it's of dubious optimization quality. This patch removes it. It doesn't actually shrink the size of the reiserfs_bitmap_info struct, since that doesn't work with block sizes larger than 8K. There was a big fixme in there, and with all the work lately in allowing block size > page size, I might as well kill the fixme as well. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	3ee1667042	reiserfs: fix usage of signed ints for block numbers Do a quick signedness check for block numbers. There are a number of places where signed integers are used for block numbers, which limits the usable file system size to 8 TiB. The disk format, excepting a problem which will be fixed in the following patch, supports file systems up to 16 TiB in size. This patch cleans up those sites so that we can enable the full usable size. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	6c57c2c8d3	reiserfs: fix memset byte count during resize Correct the memset in reiserfs_resize to clear the memory allocated for the new bitmap info structs. Previously, it would clear the memory used by the old size. Depending on the contents of memory, this could cause incorrect caching behavior for bitmap blocks in the newly allocated area. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	d4c3d19d0c	reiserfs: use is_reusable to catch corruption Build in is_reusable() unconditionally and use it to catch corruption before it reaches the block freeing paths. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	8e186e454e	reiserfs: dont use BUG when panicking Change reiserfs_panic() to use panic() initially instead of BUG(). Using BUG() ignores the configurable panic behavior, so systems that should be failing and rebooting are left hanging. This causes problems in active/standby HA scenarios. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	7598392894	reiserfs: fix up lockdep warnings Add I_MUTEX_XATTR annotations to the inode locking in the reiserfs xattr code. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jose R. Santos	9ad163ae0d	JBD: Fix JBD warnings when compiling with CONFIG_JBD_DEBUG Note from Mingming's JBD2 fix: Noticed all warnings are occurs when the debug level is 0. Then found the "jbd2: Move jbd2-debug file to debugfs" patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b changed the jbd2_journal_enable_debug from int type to u8, makes the jbd_debug comparision is always true when the debugging level is 0. Thus the compile warning occurs. Thought about changing the jbd2_journal_enable_debug data type back to int, but can't, because the jbd2-debug is moved to debug fs, where calling debugfs_create_u8() to create the debugfs entry needs the value to be u8 type. Even if we changed the data type back to int, the code is still buggy, kernel should not print jbd2 debug message if the jbd2_journal_enable_debug is set to 0. But this is not the case. The fix is change the level of debugging to 1. The same should fixed in ext3/JBD, but currently ext3 jbd-debug via /proc fs is broken, so we probably should fix it all together. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jan Kara	7a266e75cf	jbd: fix commit code to properly abort journal We should really call journal_abort() and not __journal_abort_hard() in case of errors. The latter call does not record the error in the journal superblock and thus filesystem won't be marked as with errors later (and user could happily mount it without any warning). Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jose R. Santos	c2a9159cdd	jbd: config_jbd_debug cannot create /proc entry The jbd-debug file used to be located in /proc/sys/fs/jbd-debug, but create_proc_entry() does not do lookups on file names that are more that one directory deep. This causes the entry creation to fail and hence, no proc file is created. Instead of fixing this on procfs might as well move the jbd2-debug file to debugfs which would be the preferred location for this kind of tunable. The new location is now /sys/kernel/debug/jbd/jbd-debug. [akpm@linux-foundation.org: zillions of cleanups] Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Mingming Cao	8c3478a523	JBD/ext3 cleanups: convert to kzalloc Convert kmalloc to kzalloc() and get rid of the memset(). Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:34 -07:00
Coly Li	3ed75eb8f1	setup vma->vm_page_prot by vm_get_page_prot() This patch uses vm_get_page_prot() to setup vma->vm_page_prot. Though inside vm_get_page_prot() the protection flags is AND with (VM_READ\|VM_WRITE\|VM_EXEC\|VM_SHARED), it does not hurt correct code. Signed-off-by: Coly Li <coyli@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:34 -07:00
Miklos Szeredi	c18479fe01	put declaration of put_filesystem() in fs.h Declarations go into headers. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Ram Pai <linuxram@us.ibm.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:33 -07:00
Christian Borntraeger	f9e26291be	sched: fix guest time accounting going faster than user time accounting cputime_add already adds, dont do it twice. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-19 20:52:40 +02:00
Christoph Hellwig	c6143911a7	[XFS] cleanup fid types mess Currently XFs has three different fid types: struct fid, struct xfs_fid and struct xfs_fid2 with hte latter two beeing identicaly and the first one beeing the same size but an unstructured array with the same size. This patch consolidates all this to alway uuse struct xfs_fid. This patch is required for an upcoming patch series from me that revamps the nfs exporting code and introduces a Linux-wide struct fid. SGI-PV: 970336 SGI-Modid: xfs-linux-melb:xfs-kern:29651a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-19 18:02:55 +10:00
Christoph Hellwig	c8fcfac5a2	[XFS] fixups after behavior removal merge into mainline git Fixup for lack of dmapi support and no quota module support. SGI-PV: 969985 Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-19 17:14:45 +10:00
Steve French	a761ac579b	[CIFS] log better errors on failed mounts Also returns more accurate errors to mount for the cases of account expired and password expired Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-10-18 21:45:27 +00:00
Stephen Hemminger	c80544dc0b	sparse pointer use of zero as null Get rid of sparse related warnings from places that use integer as NULL pointer. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Jeff Garzik <jeff@garzik.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Ian Kent <raven@themaw.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	0e9663ee45	fuse: add blksize field to fuse_attr There are cases when the filesystem will be passed the buffer from a single read or write call, namely: 1) in 'direct-io' mode (not O_DIRECT), read/write requests don't go through the page cache, but go directly to the userspace fs 2) currently buffered writes are done with single page requests, but if Nick's ->perform_write() patch goes it, it will be possible to do larger write requests. But only if the original write() was also bigger than a page. In these cases the filesystem might want to give a hint to the app about the optimal I/O size. Allow the userspace filesystem to supply a blksize value to be returned by stat() and friends. If the field is zero, it defaults to the old PAGE_CACHE_SIZE value. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	f33321141b	fuse: add support for mandatory locking For mandatory locking the userspace filesystem needs to know the lock ownership for read, write and truncate operations. This patch adds the necessary fields to the protocol. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	b25e82e567	fuse: add helper for asynchronous writes This patch adds a new helper function fuse_write_fill() which makes it possible to send WRITE requests asynchronously. A new flag for WRITE requests is also added which indicates that this a write from the page cache, and not a "normal" file write. This patch is in preparation for writable mmap support. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	93a8c3cd9e	fuse: add list of writable files to fuse_inode Each WRITE request must carry a valid file descriptor. When a page is written back from a memory mapping, the file through which the page was dirtied is not available, so a new mechananism is needed to find a suitable file in ->writepage(s). A list of fuse_files is added to fuse_inode. The file is removed from the list in fuse_release(). This patch is in preparation for writable mmap support. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	a9ff4f8705	fuse: support BSD locking semantics It is trivial to add support for flock(2) semantics to the existing protocol, by setting the lock owner field to the file pointer, and passing a new FUSE_LK_FLOCK flag with the locking request. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	6ff958edbf	fuse: add atomic open+truncate support This patch allows fuse filesystems to implement open(..., O_TRUNC) as a single request, instead of separate truncate and open requests. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	17637cbaba	fuse: improve utimes support Add two new flags for setattr: FATTR_ATIME_NOW and FATTR_MTIME_NOW. These mean, that atime or mtime should be changed to the current time. Also it is now possible to update atime or mtime individually, not just together. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Miklos Szeredi	d139d7ffd0	VFS: allow filesystems to implement atomic open+truncate Add a new attribute flag ATTR_OPEN, with the meaning: "truncation was initiated by open() due to the O_TRUNC flag". This way filesystems wanting to implement truncation within their ->open() method can ignore such truncate requests. This is a quick & dirty hack, but it comes for free. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Miklos Szeredi	49d4914fd7	fuse: clean up open file passing in setattr Clean up supplying open file to the setattr operation. In addition to being a cleanup it prepares for the changes in the way the open file is passed to the setattr method. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Miklos Szeredi	c79e322f63	fuse: add file handle to getattr operation Add necessary protocol changes for supplying a file handle with the getattr operation. Step the API version to 7.9. This patch doesn't actually supply the file handle, because that needs some kind of VFS support, which we haven't yet been able to agree upon. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Miklos Szeredi	1fb69e7817	fuse: fix race between getattr and write Getattr and lookup operations can be running in parallel to attribute changing operations, such as write and setattr. This means, that if for example getattr was slower than a write, the cached size attribute could be set to a stale value. To prevent this race, introduce a per-filesystem attribute version counter. This counter is incremented whenever cached attributes are modified, and the incremented value stored in the inode. Before storing new attributes in the cache, getattr and lookup check, using the version number, whether the attributes have been modified during the request's lifetime. If so, the returned attributes are not cached, because they might be stale. Thanks to Jakub Bogusz for the bug report and test program. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Jakub Bogusz <jakub.bogusz@gemius.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Miklos Szeredi	e57ac68378	fuse: fix allowing operations The following operation didn't check if sending the request was allowed: setattr listxattr statfs Some other operations don't explicitly do the check, but VFS calls ->permission() which checks this. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:29 -07:00
Eric Sandeen	42a2b6ad71	ext3: fix setup_new_group_blocks locking setup_new_group_blocks() manipulates the group descriptor block bh under the block_bitmap bh's lock. It shouldn't matter since nobody but resize should be touching these blocks, but it's worth fixing up. Signed-off-by: Eric Sandeen <sandeen@redhat.com> C: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:29 -07:00
Takashi Sato	0f0a89ebe1	ext3: support large blocksize up to PAGESIZE This patch set supports large block size(>4k, <=64k) in ext3 just enlarging the block size limit. But it is NOT possible to have 64kB blocksize on ext3 without some changes to the directory handling code. The reason is that an empty 64kB directory block would have a rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the filesystem. The proposed solution is treat 64k rec_len with a an impossible value like rec_len = 0xffff to handle this. The Patch-set consists of the following 2 patches. [1/2] ext3: enlarge blocksize - Allow blocksize up to pagesize [2/2] ext3: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize Now on 64k page ppc64 box runs with this patch set we could create a 64k block size ext3, and able to handle empty directory block. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:29 -07:00
Andi Drebes	4176ed5938	fs/cramfs/inode.c: replace hardcoded value with preprocessor constant Remove the hardcoded value 256 in fs/cramfs/inode.c and replaces it with CRAMFS_MAXPATHLEN. Tested on an i386 box. Signed-off-by: Andi Drebes <lists-receive@programmierforen.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:29 -07:00
Andi Drebes	6bbfb07766	fs/cramfs/inode.c: remove unused variable Remove a variable that is never read. Signed-off-by: Andi Drebes <lists-receive@programmierforen.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:29 -07:00
Jeff Layton	d32c4f2626	CIFS: ignore mode change if it's just for clearing setuid/setgid bits If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing the setuid/setgid bits. For CIFS, skip the mode change and let the server handle it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Jeff Layton	188b95dd8e	NFS: if ATTR_KILL_SID bits are set, then skip mode change If the ATTR_KILL_SID bits are set then any mode change is only for clearing the setuid/setgid bits. For NFS, skip the mode change and let the server handle it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Jeff Layton	6de0ec00ba	VFS: make notify_change pass ATTR_KILL_SID to setattr operations When an unprivileged process attempts to modify a file that has the setuid or setgid bits set, the VFS will attempt to clear these bits. The VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid mask, and then call notify_change to clear these bits and set the mode accordingly. With a networked filesystem (NFS and CIFS in particular but likely others), the client machine or process may not have credentials that allow for setting the mode. In some situations, this can lead to file corruption, an operation failing outright because the setattr fails, or to races that lead to a mode change being reverted. In this situation, we'd like to just leave the handling of this to the server and ignore these bits. The problem is that by the time the setattr op is called, the VFS has already reinterpreted the ATTR_KILL_ bits into a mode change. The setattr operation has no way to know its intent. The following patch fixes this by making notify_change no longer clear the ATTR_KILL_SUID and ATTR_KILL_SGID bits in the ia_valid before handing it off to the setattr inode op. setattr can then check for the presence of these bits, and if they're set it can assume that the mode change was only for the purposes of clearing these bits. This means that we now have an implicit assumption that notify_change is never called with ATTR_MODE and either ATTR_KILL_S*ID bit set. Nothing currently enforces that, so this patch also adds a BUG() if that occurs. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Jeff Layton	cdd6fe6e2f	reiserfs: turn of ATTR_KILL_S*ID at beginning of reiserfs_setattr reiserfs_setattr can call notify_change recursively using the same iattr struct. This could cause it to trip the BUG() in notify_change. Fix reiserfs to clear those bits near the beginning of the function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Jeff Layton	8a0ce7d99a	knfsd: only set ATTR_KILL_SID if ATTR_MODE isn't being explicitly set It's theoretically possible for a single SETATTR call to come in that sets the mode and the uid/gid. In that case, don't set the ATTR_KILL_SID bits since that would trip the BUG() in notify_change. Just fix up the mode to have the same effect. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Jeff Layton	1ac564ecab	ecryptfs: allow lower fs to interpret ATTR_KILL_SID Make sure ecryptfs doesn't trip the BUG() in notify_change. This also allows the lower filesystem to interpret ATTR_KILL_SID in its own way. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:21 -07:00
Alexey Dobriyan	6212e3a388	Remove struct task_struct::io_wait Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b during 2.6.9 development. Commit introduced io_wait field which remained write-only than and still remains write-only. Also garbage collect macros which "use" io_wait. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:20 -07:00
Steve French	abb63d6c3d	[CIFS] Return better error when server requires signing but client forbids Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-10-18 02:58:40 +00:00
Steve French	d628ddb62d	[CIFS] fix typo Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-10-17 23:06:07 +00:00
Steve French	a750e77c21	[CIFS] acl support part 4 Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-10-17 22:50:39 +00:00
Eric Sandeen	149041070d	ext4: lighten up resize transaction requirements When resizing online, setup_new_group_blocks attempts to reserve a potentially very large transaction, depending on the current filesystem geometry. For some journal sizes, there may not be enough room for this transaction, and the online resize will fail. The patch below resizes & restarts the transaction as necessary while setting up the new group, and should work with even the smallest journal. Tested with something like: [root@newbox ~]# dd if=/dev/zero of=fsfile bs=1024 count=32768 [root@newbox ~]# mkfs.ext3 -b 1024 fsfile 16384 [root@newbox ~]# mount -o loop fsfile mnt/ [root@newbox ~]# resize2fs /dev/loop0 resize2fs 1.40.2 (12-Jul-2007) Filesystem at /dev/loop0 is mounted on /root/mnt; on-line resizing required old desc_blocks = 1, new_desc_blocks = 1 Performing an on-line resize of /dev/loop0 to 32768 (1k) blocks. resize2fs: No space left on device While trying to add group #2 [root@newbox ~]# dmesg \| tail -n 1 JBD: resize2fs wants too many credits (258 > 256) [root@newbox ~]# With the below change, it works. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: Andreas Dilger <adilger@clusterfs.com>	2007-10-17 18:50:04 -04:00
Eric Sandeen	5b615287b3	ext4: fix setup_new_group_blocks locking setup_new_group_blocks() manipulates the group descriptor block bh under the block_bitmap bh's lock. It shouldn't matter since nobody but resize should be touching these blocks, but it's worth fixing up. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2007-10-17 18:50:04 -04:00
Aneesh Kumar K.V	ac39849ddc	ext4: sparse fixes Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:03 -04:00
Aneesh Kumar K.V	d8dd0b4543	ext4: Convert ext4_extent_idx.ei_leaf to ext4_extent_idx.ei_leaf_lo Convert ext4_extent_idx.ei_leaf ext4_extent_idx.ei_leaf_lo This helps in finding BUGs due to direct partial access of these split 48 bit values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:03 -04:00
Aneesh Kumar K.V	b377611d11	ext4: Convert ext4_extent.ee_start to ext4_extent.ee_start_lo Convert ext4_extent.ee_start to ext4_extent.ee_start_lo This helps in finding BUGs due to direct partial access of these split 48 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:03 -04:00
Aneesh Kumar K.V	308ba3ece7	ext4: Convert s_r_blocks_count and s_free_blocks_count Convert s_r_blocks_count and s_free_blocks_count to s_r_blocks_count_lo and s_free_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	6bc9feff14	ext4: Convert s_blocks_count to s_blocks_count_lo Convert s_blocks_count to s_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	5272f83727	ext4: Convert bg_inode_bitmap and bg_inode_table Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo and bg_inode_table_lo. This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix one direct partial access Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	3a14589cce	ext4: Convert bg_block_bitmap to bg_block_bitmap_lo Convert bg_block_bitmap to bg_block_bitmap_lo This helps in catching some BUGS due to direct partial access of these split fields. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:01 -04:00
Jose R. Santos	ce42158179	ext4: FLEX_BG Kernel support v2. This feature relaxes check restrictions on where each block groups meta data is located within the storage media. This allows for the allocation of bitmaps or inode tables outside the block group boundaries in cases where bad blocks forces us to look for new blocks which the owning block group can not satisfy. This will also allow for new meta-data allocation schemes to improve performance and scalability. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:01 -04:00
Aneesh Kumar K.V	c1bddad949	ext4: Fix sparse warnings Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:01 -04:00
Andreas Dilger	717d50e497	Ext4: Uninitialized Block Groups In pass1 of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use. This is this the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. In each group descriptor if we have EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck. EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group. We also add two new fields to group descriptor as a part of uninitialized group patch. __le16 bg_itable_unused; /* Unused inodes count / __le16 bg_checksum; / crc16(sb_uuid+group+desc) */ bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused. bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:00 -04:00
Eric Sandeen	4074fe3736	ext4: remove #ifdef CONFIG_EXT4_INDEX CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is unconditionally defined in ext4_fs.h. tune2fs is already able to turn off dir indexing, so at this point it's just cluttering up the code. Remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:00 -04:00
Coly Li	f077d0d7ea	ext4: Remove (partial, never completed) fragment support Fragment support in ext2/3/4 was never implemented, and it probably will never be implemented. So remove it from ext4. Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:49:59 -04:00
Jose R. Santos	6f38c74f5a	JBD2: debug code cleanup. Mostly stolen from akpm's JBD cleanup patch. - use `#ifdef foo' instead of `#if defined(foo)' - Make journal_enable_debug __read_mostly just for the heck of it - Make jbd_debugfs_dir and jbd_debug static - debugfs_remove(NULL) is legal: remove unneeded tests - remove unnecessary empty loops Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:49:59 -04:00
Jan Kara	a7fa2baf8e	jbd2: fix commit code to properly abort journal We should really call journal_abort() and not __journal_abort_hard() in case of errors. The latter call does not record the error in the journal superblock and thus filesystem won't be marked as with errors later (and user could happily mount it without any warning). Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:49:58 -04:00
Mingming Cao	cd02ff0b14	jbd2: JBD_XXX to JBD2_XXX naming cleanup change JBD_XXX macros to JBD2_XXX in JBD2/Ext4 Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:49:58 -04:00
Mingming Cao	d802ffa885	JBD2/Ext4: Convert kmalloc to kzalloc in jbd2/ext4 Convert kmalloc to kzalloc() and get rid of the memset(). Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2007-10-17 18:49:57 -04:00

1 2 3 4 5 ...

7072 Commits