linux

Author	SHA1	Message	Date
Eric W. Biederman	8e96e3b7b8	userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-03 03:29:34 -07:00
David Howells	1dce27c5aa	Wrap accesses to the fd_sets in struct fdtable Wrap accesses to the fd_sets in struct fdtable (for recording open files and close-on-exec flags) so that we can move away from using fd_sets since we abuse the fd_set structs by not allocating the full-sized structure under normal circumstances and by non-core code looking at the internals of the fd_sets. The first abuse means that use of FD_ZERO() on these fd_sets is not permitted, since that cannot be told about their abnormal lengths. This introduces six wrapper functions for setting, clearing and testing close-on-exec flags and fd-is-open flags: void __set_close_on_exec(int fd, struct fdtable fdt); void __clear_close_on_exec(int fd, struct fdtable fdt); bool close_on_exec(int fd, const struct fdtable fdt); void __set_open_fd(int fd, struct fdtable fdt); void __clear_open_fd(int fd, struct fdtable fdt); bool fd_is_open(int fd, const struct fdtable fdt); Note that I've prepended '__' to the names of the set/clear functions because they require the caller to hold a lock to use them. Note also that I haven't added wrappers for looking behind the scenes at the the array. Possibly that should exist too. Signed-off-by: David Howells <dhowells@redhat.com> Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk>	2012-02-19 10:30:52 -08:00
Serge E. Hallyn	2e14967075	userns: rename is_owner_or_cap to inode_owner_or_capable And give it a kernel-doc comment. [akpm@linux-foundation.org: btrfs changed in linux-next] Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-23 19:47:13 -07:00
Al Viro	1abf0c718f	New kind of open files - "location only". New flag for open(2) - O_PATH. Semantics: * pathname is resolved, but the file itself is _NOT_ opened as far as filesystem is concerned. * almost all operations on the resulting descriptors shall fail with -EBADF. Exceptions are: 1) operations on descriptors themselves (i.e. close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD), fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD), fcntl(fd, F_SETFD, ...)) 2) fcntl(fd, F_GETFL), for a common non-destructive way to check if descriptor is open 3) "dfd" arguments of ...at(2) syscalls, i.e. the starting points of pathname resolution * closing such descriptor does NOT affect dnotify or posix locks. * permissions are checked as usual along the way to file; no permission checks are applied to the file itself. Of course, giving such thing to syscall will result in permission checks (at the moment it means checking that starting point of ....at() is a directory and caller has exec permissions on it). fget() and fget_light() return NULL on such descriptors; use of fget_raw() and fget_raw_light() is needed to get them. That protects existing code from dealing with those things. There are two things still missing (they come in the next commits): one is handling of symlinks (right now we refuse to open them that way; see the next commit for semantics related to those) and another is descriptor passing via SCM_RIGHTS datagrams. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Namhyung Kim	3cd90ea42f	vfs: sparse: add __FMODE_EXEC FMODE_EXEC is a constant type of fmode_t but was used with normal integer constants. This results in following warnings from sparse. Fix it using new macro __FMODE_EXEC. fs/exec.c:116:58: warning: restricted fmode_t degrades to integer fs/exec.c:689:58: warning: restricted fmode_t degrades to integer fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 16:03:19 -08:00
Linus Torvalds	55f335a885	fasync: Fix placement of FASYNC flag comment In commit `f7347ce4ee` ("fasync: re-organize fasync entry insertion to allow it under a spinlock") Arnd took an earlier patch of mine that had the comment about the FASYNC flag above the wrong function. When the fasync_add_entry() function was split to introduce the new fasync_insert_entry() helper function, the code that actually cares about the FASYNC bit moved to that new helper. So just move the comment to the right point. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:17:02 -07:00
Linus Torvalds	f7347ce4ee	fasync: re-organize fasync entry insertion to allow it under a spinlock You currently cannot use "fasync_helper()" in an atomic environment to insert a new fasync entry, because it will need to allocate the new "struct fasync_struct". Yet fcntl_setlease() wants to call this under lock_flocks(), which is in the process of being converted from the BKL to a spinlock. In order to fix this, this abstracts out the actual fasync list insertion and the fasync allocations into functions of their own, and teaches fs/locks.c to pre-allocate the fasync_struct entry. That way the actual list insertion can happen while holding the required spinlock. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bfields@redhat.com: rebase on top of my changes to Arnd's patch] Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-27 22:06:17 +02:00
James Bottomley	3ab04d5cf9	vfs: take O_NONBLOCK out of the O_* uniqueness test O_NONBLOCK on parisc has a dual value: #define O_NONBLOCK 000200004 /* HPUX has separate NDELAY & NONBLOCK / It is caught by the O_ bits uniqueness check and leads to a parisc compile error. The fix would be to take O_NONBLOCK out. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: Jamie Lokier <jamie@shareable.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-09 18:57:25 -07:00
Wu Fengguang	454eedb890	vfs: O_* bit numbers uniqueness check The O_* bit numbers are defined in 20+ arch/*, and can silently overlap. Add a compile time check to ensure the uniqueness as suggested by David Miller. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Eric Paris <eparis@redhat.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Jamie Lokier <jamie@shareable.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:02 -07:00
Andrew Morton	f4985dc714	fs/fcntl.c:kill_fasync_rcu() fa_lock must be IRQ-safe Fix a lockdep-splat-causing regression introduced by commit `989a297920` ("fasync: RCU and fine grained locking"). kill_fasync() can be called from both process and hard-irq context, so fa_lock must be taken with IRQs disabled. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16230 Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Tested-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:32 -07:00
Dan Carpenter	5b54470dad	fcntl: return -EFAULT if copy_to_user fails copy_to_user() returns the number of bytes remaining, but we want to return -EFAULT. ret = fcntl(fd, F_SETOWN_EX, NULL); With the original code ret would be 8 here. V2: Takuya Yoshikawa pointed out a similar issue in f_getown_ex() Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-06-04 17:16:28 -04:00
Jens Axboe	ee9a3607fb	Merge branch 'master' into for-2.6.35 Conflicts: fs/ext3/fsync.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-05-21 21:27:26 +02:00
Jens Axboe	35f3d14dbb	pipe: add support for shrinking and growing pipes This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for growing and shrinking the size of a pipe and adjusts pipe.c and splice.c (and relay and network splice) usage to work with these larger (or smaller) pipes. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-05-21 21:12:40 +02:00
Eric Dumazet	989a297920	fasync: RCU and fine grained locking kill_fasync() uses a central rwlock, candidate for RCU conversion, to avoid cache line ping pongs on SMP. fasync_remove_entry() and fasync_add_entry() can disable IRQS on a short section instead during whole list scan. Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and fasync_{remove\|add}_entry(). This spinlock is IRQ safe, so sock_fasync() doesnt need its own implementation and can use fasync_helper(), to reduce code size and complexity. We can remove __kill_fasync() direct use in net/socket.c, and rename it to kill_fasync_rcu(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-21 16:19:29 -07:00
Jiri Slaby	d554ed895d	fs: use rlimit helpers Make sure compiler won't do weird things with limits. E.g. fetching them twice may return 2 different values after writable limits are implemented. I.e. either use rlimit helpers added in commit `3e10e716ab` ("resource: add helpers for fetching rlimits") or ACCESS_ONCE if not applicable. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-06 11:26:29 -08:00
Linus Torvalds	80e1e82398	Fix race in tty_fasync() properly This reverts commit `7036251180` ("tty: fix race in tty_fasync") and commit `b04da8bfdf` ("fnctl: f_modown should call write_lock_irqsave/ restore") that tried to fix up some of the fallout but was incomplete. It turns out that we really cannot hold 'tty->ctrl_lock' over calling __f_setown, because not only did that cause problems with interrupt disables (which the second commit fixed), it also causes a potential ABBA deadlock due to lock ordering. Thanks to Tetsuo Handa for following up on the issue, and running lockdep to show the problem. It goes roughly like this: - f_getown gets filp->f_owner.lock for reading without interrupts disabled, so an interrupt that happens while that lock is held can cause a lockdep chain from f_owner.lock -> sighand->siglock. - at the same time, the tty->ctrl_lock -> f_owner.lock chain that commit `7036251180` introduced, together with the pre-existing sighand->siglock -> tty->ctrl_lock chain means that we have a lock dependency the other way too. So instead of extending tty->ctrl_lock over the whole __f_setown() call, we now just take a reference to the 'pid' structure while holding the lock, and then release it after having done the __f_setown. That still guarantees that 'struct pid' won't go away from under us, which is all we really ever needed. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Américo Wang <xiyou.wangcong@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-02-07 10:26:01 -08:00
Greg Kroah-Hartman	b04da8bfdf	fnctl: f_modown should call write_lock_irqsave/restore Commit `7036251180` exposed that f_modown() should call write_lock_irqsave instead of just write_lock_irq so that because a caller could have a spinlock held and it would not be good to renable interrupts. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tavis Ormandy <taviso@google.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-26 17:25:38 -08:00
Linus Torvalds	53281b6d34	fasync: split 'fasync_helper()' into separate add/remove functions Yes, the add and remove cases do share the same basic loop and the locking, but the compiler can inline and then CSE some of the end result anyway. And splitting it up makes the code way easier to follow, and makes it clearer exactly what the semantics are. In particular, we must make sure that the FASYNC flag in file->f_flags exactly matches the state of "is this file on any fasync list", since not only is that flag visible to user space (F_GETFL), but we also use that flag to check whether we need to remove any fasync entries on file close. We got that wrong for the case of a mixed use of file locking (which tries to remove any fasync entries for file leases) and fasync. Splitting the function up also makes it possible to do some future optimizations without making the function even messier. In particular, since the FASYNC flag has to match the state of "is this on a list", we can do the following future optimizations: - on remove, we don't even need to get the locks and traverse the list if FASYNC isn't set, since we can know a priori that there is no point (this is effectively the same optimization that we already do in __fput() wrt removing fasync on file close) - on add, we can use the FASYNC flag to decide whether we are changing an existing entry or need to allocate a new one. but this is just the cleanup + fix for the FASYNC flag. Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Tested-by: Tavis Ormandy <taviso@google.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Matt Mackall <mpm@selenic.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 10:05:29 -08:00
Peter Zijlstra	978b4053ae	fcntl: rename F_OWNER_GID to F_OWNER_PGRP This is for consistency with various ioctl() operations that include the suffix "PGRP" in their names, and also for consistency with PRIO_PGRP, used with setpriority() and getpriority(). Also, using PGRP instead of GID avoids confusion with the common abbreviation of "group ID". I'm fine with anything that makes it more consistent, and if PGRP is what is the predominant abbreviation then I see no need to further confuse matters by adding a third one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-11-17 17:40:33 -08:00
Peter Zijlstra	ba0a6c9f6f	fcntl: add F_[SG]ETOWN_EX In order to direct the SIGIO signal to a particular thread of a multi-threaded application we cannot, like suggested by the manpage, put a TID into the regular fcntl(F_SETOWN) call. It will still be send to the whole process of which that thread is part. Since people do want to properly direct SIGIO we introduce F_SETOWN_EX. The need to direct SIGIO comes from self-monitoring profiling such as with perf-counters. Perf-counters uses SIGIO to notify that new sample data is available. If the signal is delivered to the same task that generated the new sample it can augment that data by inspecting the task's user-space state right after it returns from the kernel. This is esp. convenient for interpreted or virtual machine driven environments. Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex as argument: struct f_owner_ex { int type; pid_t pid; }; Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Tested-by: stephane eranian <eranian@googlemail.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Roland McGrath <roland@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 07:21:01 -07:00
Oleg Nesterov	06f1631a16	signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() group_send_sig_info()->check_kill_permission() assumes that current is the sender and uses current_cred(). This is not true in send_sigio_to_task() case. From the security pov the sender is not current, but the task which did fcntl(F_SETOWN), that is why we have sigio_perm() which uses the right creds to check. Fortunately, send_sigio() always sends either SEND_SIG_PRIV or SI_FROMKERNEL() signal, so check_kill_permission() does nothing. But still it would be tidier to avoid this bogus security check and save a couple of cycles. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stephane eranian <eranian@googlemail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 07:21:01 -07:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Oleg Nesterov	8eeee4e2f0	send_sigio_to_task: sanitize the usage of fown->signum send_sigio_to_task() reads fown->signum several times, we can race with F_SETSIG which changes ->signum lockless. In theory, this can fool security checks or we can call group_send_sig_info() with the wrong ->si_signo which does not match "int sig". Change the code to cache ->signum. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 15:36:17 -07:00
Oleg Nesterov	2f38d70fb4	shift current_cred() from __f_setown() to f_modown() Shift current_cred() from __f_setown() to f_modown(). This reduces the number of arguments and saves 48 bytes from fs/fcntl.o. [ Note: this doesn't clear euid/uid when pid is set to NULL. But if f_owner.pid == NULL we never use f_owner.uid/euid. Otherwise we'd have a bug anyway: we must not send signals if pid was reset to NULL. ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 14:19:00 -07:00
Jeff Mahoney	2b79bc4f7e	dup2: Fix return value with oldfd == newfd and invalid fd The return value of dup2 when oldfd == newfd and the fd isn't valid is not getting properly sign extended. We end up with 4294967287 instead of -EBADF. I've reproduced this on SLE11 (2.6.27.21), openSUSE Factory (2.6.29-rc5), and Ubuntu 9.04 (2.6.28). This patch uses a signed int for the error value so it is properly extended. Commit `6c5d0512a0` introduced this regression. Reported-by: Jiri Dluhos <jdluhos@novell.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-05-11 12:18:06 -07:00
Jonathan Corbet	4a6a449969	Fix a lockdep warning in fasync_helper() Lockdep gripes if file->f_lock is taken in a no-IRQ situation, since that is not always the case. We don't really want to disable IRQs for every acquisition of f_lock; instead, just move it outside of fasync_lock. Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2009-03-30 08:00:24 -06:00
Jonathan Corbet	60aa49243d	Rationalize fasync return values Most fasync implementations do something like: return fasync_helper(...); But fasync_helper() will return a positive value at times - a feature used in at least one place. Thus, a number of other drivers do: err = fasync_helper(...); if (err < 0) return err; return 0; In the interests of consistency and more concise code, it makes sense to map positive return values onto zero where ->fasync() is called. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2009-03-16 08:34:35 -06:00
Jonathan Corbet	76398425bb	Move FASYNC bit handling to f_op->fasync() Removing the BKL from FASYNC handling ran into the challenge of keeping the setting of the FASYNC bit in filp->f_flags atomic with regard to calls to the underlying fasync() function. Andi Kleen suggested moving the handling of that bit into fasync(); this patch does exactly that. As a result, we have a couple of internal API changes: fasync() must now manage the FASYNC bit, and it will be called without the BKL held. As it happens, every fasync() implementation in the kernel with one exception calls fasync_helper(). So, if we make fasync_helper() set the FASYNC bit, we can avoid making any changes to the other fasync() functions - as long as those functions, themselves, have proper locking. Most fasync() implementations do nothing but call fasync_helper() - which has its own lock - so they are easily verified as correct. The BKL had already been pushed down into the rest. The networking code has its own version of fasync_helper(), so that code has been augmented with explicit FASYNC bit handling. Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Miller <davem@davemloft.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2009-03-16 08:32:27 -06:00
Jonathan Corbet	db1dd4d376	Use f_lock to protect f_flags Traditionally, changes to struct file->f_flags have been done under BKL protection, or with no protection at all. This patch causes all f_flags changes after file open/creation time to be done under protection of f_lock. This allows the removal of some BKL usage and fixes a number of longstanding (if microscopic) races. Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2009-03-16 08:32:27 -06:00
Heiko Carstens	a26eab2400	[CVE-2009-0029] System call wrappers part 15 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:24 +01:00
James Morris	cbacc2c7f0	Merge branch 'next' into for-linus	2008-12-25 11:40:09 +11:00
Jonathan Corbet	218d11a8b0	Fix a race condition in FASYNC handling Changeset `a238b790d5` (Call fasync() functions without the BKL) introduced a race which could leave file->f_flags in a state inconsistent with what the underlying driver/filesystem believes. Revert that change, and also fix the same races in ioctl_fioasync() and ioctl_fionbio(). This is a minimal, short-term fix; the real fix will not involve the BKL. Reported-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-05 15:35:10 -08:00
David Howells	c69e8d9c01	CRED: Use RCU to access another task's creds and to release a task's own creds Use RCU to access another task's creds and to release a task's own creds. This means that it will be possible for the credentials of a task to be replaced without another task (a) requiring a full lock to read them, and (b) seeing deallocated memory. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:19 +11:00
David Howells	86a264abe5	CRED: Wrap current->cred and a few other accessors Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:18 +11:00
David Howells	b6dff3ec5e	CRED: Separate task security context from task_struct Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:16 +11:00
David Howells	da9592edeb	CRED: Wrap task credential accesses in the filesystem subsystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:05 +11:00
Al Viro	1b7e190b47	[PATCH] clean dup2() up a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:24 -04:00
Al Viro	1027abe882	[PATCH] merge locate_fd() and get_unused_fd() New primitive: alloc_fd(start, flags). get_unused_fd() and get_unused_fd_flags() become wrappers on top of it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:23 -04:00
Al Viro	4e1e018ecc	[PATCH] fix RLIM_NOFILE handling * dup2() should return -EBADF on exceeded sysctl_nr_open * dup() should not return -EINVAL even if you have rlimit set to 0; it should get -EMFILE instead. Check for orig_start exceeding rlimit taken to sys_fcntl(). Failing expand_files() in dup{2,3}() now gets -EMFILE remapped to -EBADF. Consequently, remaining checks for rlimit are taken to expand_files(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:45 -04:00
Al Viro	6c5d0512a0	[PATCH] get rid of corner case in dup3() entirely Since Ulrich is OK with getting rid of dup3(fd, fd, flags) completely, to hell the damn thing goes. Corner case for dup2() is handled in sys_dup2() (complete with -EBADF if dup2(fd, fd) is called with fd that is not open), the rest is done in dup3(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:44 -04:00
Ulrich Drepper	3c333937ee	[PATCH] dup3 fix Al Viro notice one cornercase that the new dup3() code. The dup2() function, as a special case, handles dup-ing to the same file descriptor. In this case the current dup3() code does nothing at all. I.e., it ingnores the flags parameter. This shouldn't happen, the close-on-exec flag should be set if requested. In case the O_CLOEXEC bit in the flags parameter is not set the dup3() function should behave in this respect identical to dup2(). This means dup3(fd, fd, 0) should not actively reset the c-o-e flag. The patch below implements this minor change. [AV: credits to Artur Grabowski for bringing that up as potential subtle point in dup2() behaviour] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:39 -04:00
Ulrich Drepper	336dd1f70f	flag parameters: dup2 This patch adds the new dup3 syscall. It extends the old dup2 syscall by one parameter which is meant to hold a flag value. Support for the O_CLOEXEC flag is added in this patch. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_dup3 # ifdef __x86_64__ # define __NR_dup3 292 # elif defined __i386__ # define __NR_dup3 330 # else # error "need __NR_dup3" # endif #endif int main (void) { int fd = syscall (__NR_dup3, 1, 4, 0); if (fd == -1) { puts ("dup3(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("dup3(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC); if (fd == -1) { puts ("dup3(O_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("dup3(O_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Jonathan Corbet	a238b790d5	Call fasync() functions without the BKL lock_kernel() calls have been pushed down into code which needs it, so there is no need to take the BKL at this level anymore. This work inspired and aided by Andi Kleen's unlocked_fasync() patches. Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-07-02 15:06:28 -06:00
Al Viro	9f3acc3140	[PATCH] split linux/file.h Initial splitoff of the low-level stuff; taken to fdtable.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-05-01 13:08:16 -04:00
Al Viro	f8f95702f0	[PATCH] sanitize locate_fd() * 'file' argument is unused; lose it. * move setting flags from the caller (dupfd()) to locate_fd(); pass cloexec flag as new argument. Note that files_fdtable() that used to be in dupfd() isn't needed in the place in locate_fd() where the moved code ends up - we know that ->file_lock hadn't been dropped since the last time we calculated fdt because we can get there only if expand_files() returns 0 and it doesn't drop/reacquire in that case. * move getting/dropping ->file_lock into locate_fd(). Now the caller doesn't need to do anything with files_struct files anymore and we can move that inside locate_fd() as well, killing the struct files_struct argument. At that point locate_fd() is extremely similar to get_unused_fd_flags() and the next patches will merge those two. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-25 09:24:05 -04:00
Harvey Harrison	fc9b52cd8f	fs: remove fastcall, it is always empty [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Pavel Emelyanov	6c5f3e7b43	Pidns: make full use of xxx_vnr() calls Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were _all_ converted to operate on the current pid namespace. After this each call like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo) one. Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where appropriate. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:29 -08:00
Pavel Emelyanov	b488893a39	pid namespaces: changes to show virtual ids to user This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Ulrich Drepper	22d2b35b20	F_DUPFD_CLOEXEC implementation One more small change to extend the availability of creation of file descriptors with FD_CLOEXEC set. Adding a new command to fcntl() requires no new system call and the overall impact on code size if minimal. If this patch gets accepted we will also add this change to the next revision of the POSIX spec. To test the patch, use the following little program. Adjust the value of F_DUPFD_CLOEXEC appropriately. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #ifndef F_DUPFD_CLOEXEC # define F_DUPFD_CLOEXEC 12 #endif int main (int argc, char *argv[]) { if (argc > 1) { if (fcntl (3, F_GETFD) == 0) { puts ("descriptor not closed"); exit (1); } if (errno != EBADF) { puts ("error not EBADF"); exit (1); } exit (0); } int fd = fcntl (STDOUT_FILENO, F_DUPFD_CLOEXEC, 0); if (fd == -1 && errno == EINVAL) { puts ("F_DUPFD_CLOEXEC not supported"); return 0; } if (fd != 3) { puts ("program called with descriptors other than 0,1,2"); return 1; } execl ("/proc/self/exe", "/proc/self/exe", "1", NULL); puts ("execl failed"); return 1; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:01 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00

1 2

71 Commits