We don't change pid_ns->child_reaper when the main thread of the
subnamespace init exits. As Robert Rex <robert.rex@exasol.com> pointed
out this is wrong.
Yes, the re-parenting itself works correctly, but if the reparented task
exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the
main thread is zombie its ->nsproxy was already cleared by
exit_task_namespaces().
Introduce the new function, find_new_reaper(), which finds the new
->parent for the re-parenting and changes ->child_reaper if needed. Kill
the now unneeded exit_child_reaper().
Also move the changing of ->child_reaper from zap_pid_ns_processes() to
find_new_reaper(), this consolidates the games with ->child_reaper and
makes it stable under tasklist_lock.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11391
Reported-by: Robert Rex <robert.rex@exasol.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
Yes, we have already killed all tasks in this namespace, and sys_wait4()
doesn't see any child. But this doesn't mean ->children list is empty, we
may have EXIT_DEAD tasks which are not visible to do_wait(). In that case
the subsequent forget_original_parent() will crash the kernel because it
will try to re-parent these tasks to the NULL reaper.
Even if there are no childs, it is not good that forget_original_parent()
uses reaper == NULL.
Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
We could use pid_ns->parent->child_reaper as well, I think this does not
really matter. These EXIT_DEAD tasks are not visible to the new ->parent
after re-parenting, they will silently do release_task() eventually.
Note that we must change ->child_reaper, otherwise
forget_original_parent() will use reaper == father, and in that case we
will hit the (correct) BUG_ON(!list_empty(&father->children)).
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At91_mci is abusing dma_free_coherent(), which may not be called with IRQs
disabled. I saw "mkfs.ext3" on an MMC card objecting voluminously as each
write completed:
WARNING: at arch/arm/mm/consistent.c:368 dma_free_coherent+0x2c/0x224()
[<c002726c>] (dump_stack+0x0/0x14) from [<c00387d4>] (warn_on_slowpath+0x4c/0x68)
[<c0038788>] (warn_on_slowpath+0x0/0x68) from [<c0028768>] (dma_free_coherent+0x2c/0x224)
r6:00008008 r5:ffc06000 r4:00000000
[<c002873c>] (dma_free_coherent+0x0/0x224) from [<c01918ac>] (at91_mci_irq+0x374/0x420)
[<c0191538>] (at91_mci_irq+0x0/0x420) from [<c0065d9c>] (handle_IRQ_event+0x2c/0x6c)
...
This bug has been around for a LONG time. The MM warning is from late
2005, but the driver merged a year later ... so I'm puzzled why nobody
noticed this before now.
The fix involves noting that this buffer shouldn't be DMA-coherent; it's
just used for normal DMA writes. So replace it with standard kmalloc()
buffering and DMA mapping calls.
This is the quickie fix. A better one would not rely on allocating large
bounce buffers. (Note that dma_alloc_coherent could have failed too, but
that case was ignored... kmalloc is a bit more likely to fail though.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Pierre Ossman <drzeus-mmc@drzeus.cx>
Cc: Andrew Victor <linux@maxim.org.za>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes to tighten the check for UARTs that don't correctly
re-assert THRE (01c194d927: "serial 8250:
tighten test for using backup timer") caused problems when such a UART was
opened for the second time - the bug could only successfully be detected
at first initialization. For users of this version of this particular
UART IP it is fatal.
This patch stores the information about the bug in the bugs field of the
port structure when the port is first started up so subsequent opens can
check this bit even if the test for the bug fails.
David Brownell: "My own exposure to this is that the UART on DaVinci
hardware, which TI allegedly derived from its original 16550 logic, has
periodically gone from working to unusable with the mainline 8250.c ...
and back and forth a bunch. Currently it's "unusable", a regression from
some previous versions. With this patch from Will, it's usable."
Signed-off-by: Will Newton <will.newton@gmail.com>
Acked-by: Alex Williamson <alex.williamson@hp.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Brownell <david-b@pacbell.net>
Cc: <stable@kernel.org> [2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARNING: vmlinux.o(.data+0x1f5c0): Section mismatch in reference from the variable contig_page_data to the variable .init.data:bootmem_node_data
The variable contig_page_data references
the variable __initdata bootmem_node_data
If the reference is valid then annotate the
variable with __init* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: Sean MacLennan <smaclennan@pikatech.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The exit function neglects to remove debugfs entries, leading to a BUG
on reload.
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Russ Dill <Russ.Dill@gmail.com>
Acked-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dio write returns EIO when try_to_release_page fails because bh is
still referenced.
The patch
commit 3f31fddfa2
Author: Mingming Cao <cmm@us.ibm.com>
Date: Fri Jul 25 01:46:22 2008 -0700
jbd: fix race between free buffer and commit transaction
was merged into 2.6.27-rc1, but I noticed that this patch is not enough
to fix the race.
I did fsstress test heavily to 2.6.27-rc1, and found that dio write still
sometimes got EIO through this test.
The patch above fixed race between freeing buffer(dio) and committing
transaction(jbd) but I discovered that there is another race, freeing
buffer(dio) and ext3/4_ordered_writepage.
: background_writeout()
->write_cache_pages()
->ext3_ordered_writepage()
walk_page_buffers() -> take a bh ref
block_write_full_page() -> unlock_page
: <- end_page_writeback
: <- race! (dio write->try_to_release_page fails)
walk_page_buffers() ->release a bh ref
ext3_ordered_writepage holds bh ref and does unlock_page remaining
taking a bh ref, so this causes the race and failure of
try_to_release_page.
To fix this race, I used the approach of falling back to buffered
writes if try_to_release_page() fails on a page.
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have gotten to the root cause of the hugetlb badness I reported back on
August 15th. My system has the following memory topology (note the
overlapping node):
Node 0 Memory: 0x8000000-0x44000000
Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000
setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking
for a pageblock to move onto the MIGRATE_RESERVE list. Finding no
candidates, it happily continues the scan into 0x8000000-0x44000000. When
a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on
the wrong zone. Oops.
setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update the location of the NTFS homepage in several files.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix device tree ... don't forget to set the parent device
* let init/exit code be removed where practical
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
[bart: splitted it from bigger DaVinci patch, s/hw.parent/hw.dev/]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
hwif_to_node() incorrectly assumes that hwif->dev always belongs to
a PCI device. This results in ide-cs oopsing in init_irq() after
commit c56c5648a3 accidentally fixed
device tree registration for ide-cs. Fix it by using dev_to_node().
Thanks to Martin Michlmayr and Larry Finger for help with debugging
the issue.
Reported-by: Martin Michlmayr <tbm@cyrius.com>
Tested-by: Martin Michlmayr <tbm@cyrius.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The sff_dma_ops struct should be wrapped by BLK_DEV_IDEDMA_SFF instead
of BLK_DEV_IDEDMA_PCI.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'for-linus' of git://neil.brown.name/md:
Fix problem with waiting while holding rcu read lock in md/bitmap.c
Remove invalidate_partition call from do_md_stop.
With the new firmware infrastructure in 2.6.27, some files are generated and shouldn't be
diffed; add these 2 to the "dontdiff" file
Signed-off-by: Arjan van de Ven <arjan@Linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a typo in commit 2a2a64714d "Disable MWAIT via DMI on broken Compal board".
It allows the nomwait dmi check to actually detect the Acer 5220.
Signed-off-by: Dennis Jansen <dennis.jansen@web.de>
Tested-by: Dennis Jansen <dennis.jansen@web.de>
Acked-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux:
nfsd: fix buffer overrun decoding NFSv4 acl
sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports
nfsd: fix compound state allocation error handling
svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet
Fix operator precedence bug in atari_keyb_init, which caused a failure on CT60
Signed-off-by: Michael Schmitz <schmitz@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A parisc allmodconfig build produces this:
arch/parisc/hpux/fs.c:107: error: 'buffer' undeclared (first use in this function)
Introduced by commit da574983de ("[PATCH]
fix hpux_getdents()").
Helge Dille also reported this in bugzilla 11461:
http://bugzilla.kernel.org/show_bug.cgi?id=11461
and he posted an identical patch.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The recent commit 16d9679f33caf7e683471647d1472bfe133d858 changed
check_hung_task() to filter out the TASK_KILLABLE tasks. We can
move this check to the caller which has to test t->state anyway.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warning for new function:
Warning(linux-2.6.27-rc5-git2//kernel/resource.c:448): No description found for parameter 'root'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: restore original behavior of /proc/partition when there's no partition
remove blk_register_filter and blk_unregister_filter in gendisk
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: setup_valid_addr_bitmap_from_pavail() should be __init
sparc: Fix resource flags for PCI children in OF device tree.
sparc32: Implement smp_call_function_single().
Breaking lines due to some imaginary problem with a long line length is
often stupid and wrong, but never more so when it splits a string that
is printed out into multiple lines. This really ended up making it much
harder to find where some error strings were printed out, because a
simple 'grep' didn't work.
I'm sure there is tons more of this particular idiocy hiding in other
places, but this particular case hit me once more last week. So fix it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The MEMGETREGIONINFO ioctl() in mtdchar.c was clobbering user memory by
overwriting more than intended, due the size of struct mtd_erase_region_info
changing in commit 0ecbc81adf ('Support
for auto locking flash on power up').
Fix avoids this by copying struct members one by one with put_user(), as there
is no longer a convenient struct to use the size of as the length argument to
copy_to_user().
Signed-off-by: Zev Weiss <zevweiss@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch fixes a memory leak in an error path.
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
got rid of compilation warning:
ISO C90 forbids mixed declarations and code
Signed-off-by: Cordelia Sam <cordesam@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The array we kmalloc() here is not large enough.
Thanks to Johann Dahm and David Richter for bug report and testing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: David Richter <richterd@citi.umich.edu>
Tested-by: Johann Dahm <jdahm@umich.edu>
Vegard Nossum reported
----------------------
> I noticed that something weird is going on with /proc/sys/sunrpc/transports.
> This file is generated in net/sunrpc/sysctl.c, function proc_do_xprt(). When
> I "cat" this file, I get the expected output:
> $ cat /proc/sys/sunrpc/transports
> tcp 1048576
> udp 32768
> But I think that it does not check the length of the buffer supplied by
> userspace to read(). With my original program, I found that the stack was
> being overwritten by the characters above, even when the length given to
> read() was just 1.
David Wagner added (among other things) that copy_to_user could be
probably used here.
Ingo Oeser suggested to use simple_read_from_buffer() here.
The conclusion is that proc_do_xprt doesn't check for userside buffer
size indeed so fix this by using Ingo's suggestion.
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
CC: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Greg Banks <gnb@sgi.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the cstate_alloc call so that if it fails, the response is setup to
encode the NFS error. The out label now means that the
nfsd4_compound_state has not been allocated.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
/proc/partitions didn't use to write out the header if there was no
partition. However, recent commit 66c64afe changed the behavior.
This is nothing major but there's no reason to change user visible
behavior without a good rationale. Restore the original behavior.
Note that 2.6.28 has clean up changes scheduled which will replace
this rather hacky implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
pxa2xx-i2s: probe actual device and use it for clk_get call
thus fixing error during startup hook
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Added the EQ distortion fix to the dell_m6_core_init.
Signed-off-by: Matthew Ranostay <mranostay@embeddedalley.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
A recent patch to protect the rdev list with rcu locking leaves us
with a problem because we can sleep on memalloc while holding the
rcu lock.
The rcu lock is only needed while walking the linked list as
uninteresting devices (failed or spares) can be removed at any time.
So only take the rcu lock while actually walking the linked list.
Take a refcount on the rdev during the time when we drop the lock
and do the memalloc to start IO.
When we return to the locked code, all the interesting devices
on the list will not have moved, so we can simply use
list_for_each_continue_rcu to pick up where we left off.
Signed-off-by: NeilBrown <neilb@suse.de>
When stopping an md array, or just switching to read-only, we
currently call invalidate_partition while holding the mddev lock.
The main reason for this is probably to ensure all dirty buffers
are flushed (invalidate_partition calls fsync_bdev).
However if any dirty buffers are found, it will almost certainly cause
a deadlock as starting writeout will require an update to the
superblock, and performing that updates requires taking the mddev
lock - which is already held.
This deadlock can be demonstrated by running "reboot -f -n" with
a root filesystem on md/raid, and some dirty buffers in memory.
All other calls to stop an array should already happen after a flush.
The normal sequence is to stop using the array (e.g. umount) which
will cause __blkdev_put to call sync_blockdev. Then open the
array and issue the STOP_ARRAY ioctl while the buffers are all still
clean.
So this invalidate_partition is normally a no-op, except for one case
where it will cause a deadlock.
So remove it.
This patch possibly addresses the regression recored in
http://bugzilla.kernel.org/show_bug.cgi?id=11460
and
http://bugzilla.kernel.org/show_bug.cgi?id=11452
though it isn't yet clear how it ever worked.
Signed-off-by: NeilBrown <neilb@suse.de>
If this triggers its bad, however some machines seem to have been
triggering it for ages and we didn't know until we added the debug.
So downgrade the debug now so people don't call this a regression.
Signed-off-by: Dave Airlie <airlied@redhat.com>