sys_mount() reads/copies a whole page for its "type" parameter. When
do_mount_root() passes a kernel address that points to an object which is
smaller than a whole page, copy_mount_options() will happily go past this
memory object, possibly dereferencing "wild" pointers that could be in any
state (hence the kmemcheck warning, which shows that parts of the next
page are not even allocated).
(The likelihood of something going wrong here is pretty low -- first of
all this only applies to kernel calls to sys_mount(), which are mostly
found in the boot code. Secondly, I guess if the page was not mapped,
exact_copy_from_user() _would_ in fact handle it correctly because of its
access_ok(), etc. checks.)
But it is much nicer to avoid the dubious reads altogether, by stopping as
soon as we find a NUL byte. Is there a good reason why we can't do
something like this, using the already existing strndup_from_user()?
[akpm@linux-foundation.org: make copy_mount_string() static]
[AV: fix compat mount breakage, which involves undoing akpm's change above]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: al <al@dizzy.pdmi.ras.ru>
Most call sites of unload_nls() do:
if (nls)
unload_nls(nls);
Check the pointer inside unload_nls() like we do in kfree() and
simplify the call sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steve French <sfrench@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently we held s_umount while a filesystem is frozen, despite that we
might return to userspace and unlock it from a different process. Instead
grab an active reference to keep the file system busy and add an explicit
check for frozen filesystems in remount and reject the remount instead
of blocking on s_umount.
Add a new get_active_super helper to super.c for use by freeze_bdev that
grabs an active reference to a superblock from a given block device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we have the freeze count there is not much reason for bd_mount_sem
anymore. The actual freeze/thaw operations are serialized using the
bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is
get_sb_bdev which tries to prevent mounting a filesystem while the block
device is frozen. Instead of add a check for bd_fsfreeze_count and
return -EBUSY if a filesystem is frozen. While that is a change in user
visible behaviour a failing mount is much better for this case rather
than having the mount process stuck uninterruptible for a long time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
the two places inside exofs that where taking the BKL were:
exofs_put_super() - .put_super
and
exofs_sync_fs() - which is .sync_fs and is also called from
.write_super.
Now exofs_sync_fs() is protected from itself by also taking
the sb_lock.
exofs_put_super() directly calls exofs_sync_fs() so there is no
danger between these two either.
In anyway there is absolutely nothing dangerous been done
inside exofs_sync_fs().
Unless there is some subtle race with the actual lifetime of
the super_block in regard to .put_super and some other parts
of the VFS. Which is highly unlikely.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
romfs_fill_super() assumes that romfs_iget() returns NULL when
it fails. romfs_iget() actually returns ERR_PTR(-ve) in that
case...
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add two helpers that allow access to the seq_file's own buffer, but
hide the internal details of seq_files.
This allows easier implementation of special purpose filling
functions. It also cleans up some existing functions which duplicated
the seq_file logic.
Make these inline functions in seq_file.h, as suggested by Al.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As Johannes Weiner pointed out, one of the range checks in do_sendfile
is redundant and is already checked in rw_verify_area.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
sb->s_maxbytes is supposed to indicate the maximum size of a file that can
exist on the filesystem. It's declared as an unsigned long long.
Even if a filesystem has no inherent limit that prevents it from using
every bit in that unsigned long long, it's still problematic to set it to
anything larger than MAX_LFS_FILESIZE. There are places in the kernel
that cast s_maxbytes to a signed value. If it's set too large then this
cast makes it a negative number and generally breaks the comparison.
Change s_maxbytes to be loff_t instead. That should help eliminate the
temptation to set it too large by making it a signed value.
Also, add a warning for couple of releases to help catch filesystems that
set s_maxbytes too large. Eventually we can either convert this to a
BUG() or just remove it and in the hope that no one will get it wrong now
that it's a signed value.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If fiemap_check_ranges is passed a large enough value, then it's
possible that the value would be cast to a signed value for comparison
against s_maxbytes when we change it to loff_t. Make sure that doesn't
happen by explicitly casting s_maxbytes to an unsigned value for the
purposes of comparison.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Love <rlove@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently all simple_attr.set handlers return 0 on success and negative
codes on error. Fix simple_attr_write() to return these error codes.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
seq_path_root() is returning a return value of successful __d_path()
instead of returning a negative value when mangle_path() failed.
This is not a bug so far because nobody is using return value of
seq_path_root().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Do a similar optimization as earlier for touch_atime. Getting the lock in
mnt_get_write is relatively costly, so try all avenues to avoid it first.
This patch is careful to still only update inode fields inside the lock
region.
This didn't show up in benchmarks, but it's easy enough to do.
[akpm@linux-foundation.org: fix typo in comment]
[hugh.dickins@tiscali.co.uk: fix inverted test of mnt_want_write_file()]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Valerie Aurora <vaurora@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Some benchmark testing shows touch_atime to be high up in profile logs for
IO intensive workloads. Most likely that's due to the lock in
mnt_want_write(). Unfortunately touch_atime first takes the lock, and
then does all the other tests that could avoid atime updates (like noatime
or relatime).
Do it the other way round -- first try to avoid the update and only then
if that didn't succeed take the lock. That works because none of the
atime avoidance tests rely on locking.
This also eliminates a goto.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Valerie Aurora <vaurora@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hugetlbfs needs to do special things instead of truncate_inode_pages().
Currently, it copied generic_forget_inode() except for
truncate_inode_pages() call which is asking for trouble (the code there
isn't trivial). So create a separate function generic_detach_inode()
which does all the list magic done in generic_forget_inode() and call
it from hugetlbfs_forget_inode().
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add device-id and inode number for better debugging. This was suggested
by Andreas in one of the threads
http://article.gmane.org/gmane.comp.file-systems.ext4/12062 .
"If anyone has a chance, fixing this error message to be not-useless would
be good... Including the device name and the inode number would help
track down the source of the problem."
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Impact: have simple_read_from_buffer conform to standards
It was brought to my attention by Andrew Morton, Theodore Tso, and H.
Peter Anvin that a read from userspace should only return -EFAULT if
nothing was actually read.
Looking at the simple_read_from_buffer I noticed that this function does
not conform to that rule. This patch fixes that function.
[akpm@linux-foundation.org: simplification suggested by hpa]
[hpa@zytor.com: fix count==0 handling]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
cpumask: Move deprecated functions to end of header.
cpumask: remove unused deprecated functions, avoid accusations of insanity
cpumask: use new-style cpumask ops in mm/quicklist.
cpumask: use mm_cpumask() wrapper: x86
cpumask: use mm_cpumask() wrapper: um
cpumask: use mm_cpumask() wrapper: mips
cpumask: use mm_cpumask() wrapper: mn10300
cpumask: use mm_cpumask() wrapper: m32r
cpumask: use mm_cpumask() wrapper: arm
cpumask: Use accessors for cpu_*_mask: um
cpumask: Use accessors for cpu_*_mask: powerpc
cpumask: Use accessors for cpu_*_mask: mips
cpumask: Use accessors for cpu_*_mask: m32r
cpumask: remove arch_send_call_function_ipi
cpumask: arch_send_call_function_ipi_mask: s390
cpumask: arch_send_call_function_ipi_mask: powerpc
cpumask: arch_send_call_function_ipi_mask: mips
cpumask: arch_send_call_function_ipi_mask: m32r
cpumask: arch_send_call_function_ipi_mask: alpha
cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
...
* remove asm/atomic.h inclusion from linux/utsname.h --
not needed after kref conversion
* remove linux/utsname.h inclusion from files which do not need it
NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit c02e3f361c ("kmod: fix race in usermodehelper code")
The patch is wrong. UMH_WAIT_EXEC is called with VFORK what ensures
that the child finishes prior returing back to the parent. No race.
In fact, the patch makes it even worse because it does the thing it
claims not do:
- It calls ->complete() on UMH_WAIT_EXEC
- the complete() callback may de-allocated subinfo as seen in the
following call chain:
[<c009f904>] (__link_path_walk+0x20/0xeb4) from [<c00a094c>] (path_walk+0x48/0x94)
[<c00a094c>] (path_walk+0x48/0x94) from [<c00a0a34>] (do_path_lookup+0x24/0x4c)
[<c00a0a34>] (do_path_lookup+0x24/0x4c) from [<c00a158c>] (do_filp_open+0xa4/0x83c)
[<c00a158c>] (do_filp_open+0xa4/0x83c) from [<c009ba90>] (open_exec+0x24/0xe0)
[<c009ba90>] (open_exec+0x24/0xe0) from [<c009bfa8>] (do_execve+0x7c/0x2e4)
[<c009bfa8>] (do_execve+0x7c/0x2e4) from [<c0026a80>] (kernel_execve+0x34/0x80)
[<c0026a80>] (kernel_execve+0x34/0x80) from [<c004b514>] (____call_usermodehelper+0x130/0x148)
[<c004b514>] (____call_usermodehelper+0x130/0x148) from [<c0024858>] (kernel_thread_exit+0x0/0x8)
and the path pointer was NULL. Good that ARM's kernel_execve()
doesn't check the pointer for NULL or else I wouldn't notice it.
The only race there might be is with UMH_NO_WAIT but it is too late for
me to investigate it now. UMH_WAIT_PROC could probably also use VFORK
and we could save one exec. So the only race I see is with UMH_NO_WAIT
and recent scheduler changes where the child does not always run first
might have trigger here something but as I said, it is late....
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new ones have pretty kerneldoc. Move the old ones to the end to
avoid confusing people.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: benh@kernel.crashing.org
We're not forcing removal of the old cpu_ functions, but we might as
well delete the now-unused ones.
Especially CPUMASK_ALLOC and friends. I actually got a phone call (!)
from a hacker who thought I had introduced them as the new cpumask
API. He seemed bewildered that I had lost all taste.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: benh@kernel.crashing.org
Makes code futureproof against the impending change to mm->cpu_vm_mask (to be a pointer).
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes code futureproof against the impending change to mm->cpu_vm_mask
(to be a pointer).
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Also change the actual arg name here to "mm" (which it is), not "task".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Hirokazu Takata <takata@linux-m32r.org> (fixes)
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use the accessors rather than frobbing bits directly (the new versions
are const).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Use the accessors rather than frobbing bits directly (the new versions
are const).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Use the accessors rather than frobbing bits directly (the new versions
are const).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Use the accessors rather than frobbing bits directly (the new versions
are const).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask(), and by defining
it, the old arch_send_call_function_ipi is defined by the core code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask(), and by defining
it, the old arch_send_call_function_ipi is defined by the core code.
We also take the chance to wean the implementations off the
obsolescent for_each_cpu_mask(): making send_ipi_mask take the pointer
seemed the most natural way to ensure all implementations used
for_each_cpu.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask(), and by defining
it, the old arch_send_call_function_ipi is defined by the core code.
We also take the chance to wean the implementations off the
obsolescent for_each_cpu_mask(): making send_ipi_mask take the pointer
seemed the most natural way to ensure all implementations used
for_each_cpu.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask().
We also take the chance to wean the send_ipi_message off the
obsolescent for_each_cpu_mask(): making it take a pointer seemed the
most natural way to do this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
smp_call_function_many is the new version: it takes a pointer. Also,
use mm accessor macro while we're changing this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
set_cpus_allowed() is on the way out; replace it with
set_cpus_allowed_ptr().
Reference: http://lkml.org/lkml/2008/11/6/448
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
By 7be23e278f, mask field was deleted by irqaction. However, it was not
deleted from comment.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Up until 1.1.83, the primitive human tribes used struct sigaction for
interrupts. The sa_mask field was overloaded to hold a pointer to the
name.
When someone created the new "struct irqaction" they carried across
the "mask" field as a kind of ancestor worship: the fact that it was
unused makes clear its spiritual significance.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>