linux

Author	SHA1	Message	Date
Wim Van Sebroeck	78d88fc012	watchdog: WatchDog Timer Driver Core - Add ioctl call Add support for extra ioctl calls by adding a ioctl watchdog operation. This operation will be called before we do our own handling of ioctl commands. This way we can override the internal ioctl command handling and we can also add extra ioctl commands. The ioctl watchdog operation should return the appropriate error codes or -ENOIOCTLCMD if the ioctl command should be handled through the internal ioctl handling of the framework. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Wolfram Sang <w.sang@pengutronix.de>	2011-07-28 08:01:16 +00:00
Wim Van Sebroeck	7e192b9c42	watchdog: WatchDog Timer Driver Core - Add nowayout feature Add support for the nowayout feature to the WatchDog Timer Driver Core framework. This feature prevents the watchdog timer from being stopped. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Wolfram Sang <w.sang@pengutronix.de>	2011-07-28 08:01:14 +00:00
Wim Van Sebroeck	017cf08051	watchdog: WatchDog Timer Driver Core - Add Magic Close feature Add support for the Magic Close feature to the WatchDog Timer Driver Core framework. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Wolfram Sang <w.sang@pengutronix.de>	2011-07-28 08:01:12 +00:00
Wim Van Sebroeck	014d694e5d	watchdog: WatchDog Timer Driver Core - Add WDIOC_SETTIMEOUT and WDIOC_GETTIMEOUT ioctl This part add's the WDIOC_SETTIMEOUT and WDIOC_GETTIMEOUT ioctl functionality to the WatchDog Timer Driver Core framework. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Wolfram Sang <w.sang@pengutronix.de>	2011-07-28 08:01:11 +00:00
Wim Van Sebroeck	234445b4e4	watchdog: WatchDog Timer Driver Core - Add WDIOC_SETOPTIONS ioctl This part add's the WDIOC_SETOPTIONS ioctl functionality to the WatchDog Timer Driver Core framework. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Wolfram Sang <w.sang@pengutronix.de>	2011-07-28 08:01:09 +00:00
Wim Van Sebroeck	c2dc00e494	watchdog: WatchDog Timer Driver Core - Add WDIOC_KEEPALIVE ioctl This part add's the WDIOC_KEEPALIVE ioctl functionality to the WatchDog Timer Driver Core framework. Please note that the WDIOF_KEEPALIVEPING bit has to be set in the watchdog_info options field. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Wolfram Sang <w.sang@pengutronix.de>	2011-07-28 08:01:07 +00:00
Wim Van Sebroeck	2fa03560ab	watchdog: WatchDog Timer Driver Core - Add basic ioctl functionality This part add's the basic ioctl functionality to the WatchDog Timer Driver Core framework. The supported ioctl call's are: WDIOC_GETSUPPORT WDIOC_GETSTATUS WDIOC_GETBOOTSTATUS Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Wolfram Sang <w.sang@pengutronix.de>	2011-07-28 08:01:05 +00:00
Wim Van Sebroeck	43316044d4	watchdog: WatchDog Timer Driver Core - Add basic framework The WatchDog Timer Driver Core is a framework that contains the common code for all watchdog-driver's. It also introduces a watchdog device structure and the operations that go with it. This is the introduction of this framework. This part supports the minimal watchdog userspace API (or with other words: the functionality to use /dev/watchdog's open, release and write functionality as defined in the simplest watchdog API). Extra functionality will follow in the next set of patches. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Wolfram Sang <w.sang@pengutronix.de>	2011-07-28 08:01:04 +00:00
Thomas Mingarelli	5efc7a6222	watchdog: hpwdt: add next gen HP servers This patch is required to enable hpwdt to work on next generation HP servers with iLO. Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-28 08:00:56 +00:00
Wim Van Sebroeck	2260286886	watchdog: it8712f_wdt.c: improve includes remove unneeded pci.h include. and include ioport.h to avoid build errors for the region functions. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-26 21:28:23 +00:00
Jean-Christophe Plagniol-Villard	e7b39145b5	watchdog: at91sam9/wdt: move register header to drivers move register header to drivers Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-26 21:22:15 +00:00
Alejandro Cabrera	e9659e69b0	watchdog: Add Xilinx watchdog timer driver Watchdog timer device driver for Xilinx xps_timebase_wdt compatible ip cores. It takes watchdog timer configuration from device tree and it needs that its parent has defined the property "clock-frecuency". It is compatible with watchdog timer kernel API, so user apps like watchdogd may talk with it. Signed-off-by: Alejandro Cabrera <aldaya@gmail.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-26 21:21:29 +00:00
Wolfram Sang	2fc5d52b21	watchdog: remove empty pm-functions While checking what watchdog drivers usually do in suspend/resume to spot common behaviour for the watchdog framework, I found these drivers which do nothing but add some cruft. Remove it, it is superfluous. New approaches should probably be done with pm_ops anyway. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-26 21:21:16 +00:00
Nick Bowler	081d83a339	watchdog: sp805: Flush posted writes in enable/disable. There are no reads in these functions, so if MMIO writes are posted, the writes in enable/disable may not have completed by the time these functions return. If the functions run from different CPUs, it's in theory possible for the writes to be interleaved, which would be disastrous for this driver. At the very least, we need an mmiowb() before releasing the lock, but since it seems desirable for the watchdog timer to be actually stopped or reset when these functions return, read the lock register to force the writes out. Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-26 21:19:56 +00:00
Nick Bowler	da3e515024	watchdog: sp805: Don't write 0 to the load value register. At least on the Versatile Express' V2M, calling wdt_disable followed by wdt_enable, for instance by running the following sequence: echo V > /dev/watchdog; echo V > /dev/watchdog results in an immediate reset. The wdt_disable function writes 0 to the load register; while the watchdog interrupts are disabled at this point, this special value is defined to trigger an interrupt immediately. It appears that in this instance, the reset happens when the interrupts are subsequently enabled by wdt_enable. Putting in a short delay after writing a new load value in wdt_enable solves the issue, but it seems cleaner to simply never write 0 to the load register at all: according to the hardware docs, writing 0 to the control register suffices to stop the counter, and the write of 0 to the load register is questionable anyway since this register resets to 0xffffffff. Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-26 21:19:43 +00:00
Shawn Guo	f5a427eede	watchdog: imx2_wdt: add device tree probe support Adds device tree probe support for imx2_wdt driver. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-26 21:19:16 +00:00
Thomas Abraham	9487a9cc71	watchdog: s3c2410: Add support for device tree based probe This patch adds the of_match_table to enable s3c2410-wdt driver to be probed when watchdog device node is found in the device tree. Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-22 09:00:43 +00:00
Peter Fordham	641e4f4495	watchdog: mpcore_wdt: Add suspend/resume support. Add support for suspend and resume to the MPCore watchdog driver. Signed-off-by: Peter Fordham <peter.fordham@gmail.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-22 09:00:20 +00:00
Florian Fainelli	fad0a9dd0d	watchdog: mtx1-wdt: use dev_{err,info} instead of printk() use dev_{err,info} instead of printk(KERN_{ERR,INFO} ...) Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-22 08:59:43 +00:00
Sascha Hauer	bfbc5e272d	watchdog: i.MX: use IMX_HAVE_PLATFORM_IMX2_WDT to depend on The i.MX architecture provides IMX_HAVE_PLATFORM_* macros to signal that a selected SoC supports a certain hardware. Use them instead of depending on ARCH_* directly. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Cc: linux-watchdog@vger.kernel.org	2011-07-22 08:58:38 +00:00
Jamie Iles	c9353ae1c6	watchdog: add support for the Synopsys DesignWare WDT The Synopsys DesignWare watchdog is found in several ARM based systems and provides a choice of 16 timeout periods depending on the clock input. The watchdog cannot be disabled once started. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Acked-by: Viresh Kumar <viresh.kumar@st.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-22 08:57:38 +00:00
Jonathan McDowell	7ccdb9467b	watchdog: pc87413_wdt: Cleanup pc87413 watchdog driver to use Inspired by Nat Gurumoorthy's recent patches for cleaning up the it87 drivers to use request_muxed_region for accessing the SuperIO area on these chips, and the fact I have a GPIO driver for the pc8741x basically ready for submission, here is a patch to cleanup the pc87413 watchdog driver to use request_muxed_region for accessing the SuperIO area. It also pulls out the details about the SWC IO area on initial driver load, and properly does a request_region for that area - there's no requirement to touch the SuperIO area after doing the initial watchdog enable and IO base retrieval. While I have hardware with a pc87413 on it it is not wired in a way that allows the watchdog to reboot the machine, so I have not been able to fully test these changes - I have checked that the driver correctly initialises itself still and requests the SWC io region ok. Signed-Off-By: Jonathan McDowell <noodles@earth.li> Signed-Off-By: Wim Van Sebroeck <wim@iguana.be>	2011-07-22 08:56:41 +00:00
Wim Van Sebroeck	97b08a6221	watchdog: iTCO_wdt: clean-up PCI device ID's Clean up of the iTCO_wdt PCI device ID's. Own macro is replaced by the PCI_VDEVICE macro. Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-22 08:56:11 +00:00
Nat Gurumoorthy	a134b82560	watchdog: Use "request_muxed_region" in it87 watchdog drivers Changes the it87 watchdog drivers to use "request_muxed_region". Serialize access to the hardware by using "request_muxed_region" macro defined by Alan Cox. Call to this macro will hold off the requestor if the resource is currently busy. The use of the above macro makes it possible to get rid of spinlocks in it8712f_wdt.c and it87_wdt.c watchdog drivers. This also greatly simplifies the implementation of it87_wdt.c driver. "superio_enter" will return an error if call to "request_muxed_region" fails. Rest of the code change is to ripple an error return from superio_enter to the top level. Signed-off-by: Nat Gurumoorthy <natg@google.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-07-22 08:55:23 +00:00
Linus Torvalds	02f8c6aee8	Linux 3.0	2011-07-21 19:17:23 -07:00
Linus Torvalds	1f922d0770	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: sparc,kgdbts: fix compile regression with kgdb test suite	2011-07-21 17:20:57 -07:00
Jason Wessel	33d8881af5	sparc,kgdbts: fix compile regression with kgdb test suite Commit `63ab25ebbc` (kgdbts: unify/generalize gdb breakpoint adjustment) introduced a compile regression on sparc. kgdbts.c: In function 'check_and_rewind_pc': kgdbts.c:307: error: implicit declaration of function 'instruction_pointer_set' Simply add the correct macro definition for instruction pointer on the Sparc architecture. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Acked-by: David S. Miller <davem@davemloft.net>	2011-07-21 17:29:49 -05:00
Linus Torvalds	2bafc7a275	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: CIFS: Fix wrong length in cifs_iovec_read	2011-07-21 14:28:01 -07:00
Linus Torvalds	57a6fa9acd	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Make Dell Latitude E6420 use reboot=pci x86: Make Dell Latitude E5420 use reboot=pci	2011-07-21 12:25:39 -07:00
H. Peter Anvin	a536877e77	x86: Make Dell Latitude E6420 use reboot=pci Yet another variant of the Dell Latitude series which requires reboot=pci. From the E5420 bug report by Daniel J Blueman: > The E6420 is affected also (same platform, different casing and > features), which provides an external confirmation of the issue; I can > submit a patch for that later or include it if you prefer: > http://linux.koolsolutions.com/2009/08/04/howto-fix-linux-hangfreeze-during-reboots-and-restarts/ Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>	2011-07-21 11:47:17 -07:00
Daniel J Blueman	b7798d28ec	x86: Make Dell Latitude E5420 use reboot=pci Rebooting on the Dell E5420 often hangs with the keyboard or ACPI methods, but is reliable via the PCI method. [ hpa: this was deferred because we believed for a long time that the recent reshuffling of the boot priorities in commit `660e34cebf` fixed this platform. Unfortunately that turned out to be incorrect. ] Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Link: http://lkml.kernel.org/r/1305248699-2347-1-git-send-email-daniel.blueman@gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>	2011-07-21 11:45:49 -07:00
Linus Torvalds	ad21b11577	Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6 * 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6: drm/i915: Fix unfenced alignment on pre-G33 hardware drm/i915: Add quirk to disable SSC on Lenovo U160 LVDS	2011-07-21 11:07:18 -07:00
Linus Torvalds	b91da88fed	vfs: drop conditional inode prefetch in __do_lookup_rcu It seems to hurt performance in real life. Yes, the inode will be used later, but the conditional doesn't seem to predict all that well (negative dentries are not uncommon) and it looks like the cost of prefetching is simply higher than depending on the cache doing the right thing. As usual. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-21 11:01:42 -07:00
Jan Beulich	b307d4655a	FS-Cache: Fix __fscache_uncache_all_inode_pages()'s outer loop The compiler, at least for ix86 and m68k, validly warns that the comparison: next <= (loff_t)-1 is always true (and it's always true also for x86-64 and probably all other arches - as long as pgoff_t isn't wider than loff_t). The intention appears to be to avoid wrapping of "next", so rather than eliminating the pointless comparison, fix the loop to indeed get exited when "next" would otherwise wrap. On m68k the following warning is observed: fs/fscache/page.c: In function '__fscache_uncache_all_inode_pages': fs/fscache/page.c:979: warning: comparison is always false due to limited range of data type Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-21 10:59:16 -07:00
Pavel Shilovsky	2cebaa58b7	CIFS: Fix wrong length in cifs_iovec_read Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-21 00:48:05 +00:00
Linus Torvalds	cf6ace16a3	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: signal: align __lock_task_sighand() irq disabling and RCU softirq,rcu: Inform RCU of irq_exit() activity sched: Add irq_{enter,exit}() to scheduler_ipi() rcu: protect __rcu_read_unlock() against scheduler-using irq handlers rcu: Streamline code produced by __rcu_read_unlock() rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special rcu: decrease rcu_report_exp_rnp coupling with scheduler	2011-07-20 15:56:25 -07:00
Linus Torvalds	acc11eab70	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Avoid creating superfluous NUMA domains on non-NUMA systems sched: Allow for overlapping sched_domain spans sched: Break out cpu_power from the sched_group structure	2011-07-20 15:55:48 -07:00
Linus Torvalds	919d25a710	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86. reboot: Make Dell Latitude E6320 use reboot=pci x86, doc only: Correct real-mode kernel header offset for init_size x86: Disable AMD_NUMA for 32bit for now	2011-07-20 15:33:59 -07:00
Ingo Molnar	d1e9ae47a0	Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/urgent	2011-07-20 20:59:26 +02:00
Paul E. McKenney	a841796f11	signal: align __lock_task_sighand() irq disabling and RCU The __lock_task_sighand() function calls rcu_read_lock() with interrupts and preemption enabled, but later calls rcu_read_unlock() with interrupts disabled. It is therefore possible that this RCU read-side critical section will be preempted and later RCU priority boosted, which means that rcu_read_unlock() will call rt_mutex_unlock() in order to deboost itself, but with interrupts disabled. This results in lockdep splats, so this commit nests the RCU read-side critical section within the interrupt-disabled region of code. This prevents the RCU read-side critical section from being preempted, and thus prevents the attempt to deboost with interrupts disabled. It is quite possible that a better long-term fix is to make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's ->wait_lock. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-07-20 11:04:54 -07:00
Peter Zijlstra	ec433f0c51	softirq,rcu: Inform RCU of irq_exit() activity The rcu_read_unlock_special() function relies on in_irq() to exclude scheduler activity from interrupt level. This fails because exit_irq() can invoke the scheduler after clearing the preempt_count() bits that in_irq() uses to determine that it is at interrupt level. This situation can result in failures as follows: $task IRQ SoftIRQ rcu_read_lock() /* do stuff / <preempt> \|= UNLOCK_BLOCKED rcu_read_unlock() --t->rcu_read_lock_nesting irq_enter(); / do stuff, don't use RCU / irq_exit(); sub_preempt_count(IRQ_EXIT_OFFSET); invoke_softirq() ttwu(); spin_lock_irq(&pi->lock) rcu_read_lock(); / do stuff / rcu_read_unlock(); rcu_read_unlock_special() rcu_report_exp_rnp() ttwu() spin_lock_irq(&pi->lock) / deadlock */ rcu_read_unlock_special(t); Ed can simply trigger this 'easy' because invoke_softirq() immediately does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff first, but even without that the above happens. Cure this by also excluding softirqs from the rcu_read_unlock_special() handler and ensuring the force_irqthreads ksoftirqd/# wakeup is done from full softirq context. [ Alternatively, delaying the ->rcu_read_lock_nesting decrement until after the special handling would make the thing more robust in the face of interrupts as well. And there is a separate patch for that. ] Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-tested-by: Ed Tomlinson <edt@aei.ca> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-07-20 10:50:12 -07:00
Peter Zijlstra	c5d753a55a	sched: Add irq_{enter,exit}() to scheduler_ipi() Ensure scheduler_ipi() calls irq_{enter,exit} when it does some actual work. Traditionally we never did any actual work from the resched IPI and all magic happened in the return from interrupt path. Now that we do do some work, we need to ensure irq_{enter,exit} are called so that we don't confuse things. This affects things like timekeeping, NO_HZ and RCU, basically everything with a hook in irq_enter/exit. Explicit examples of things going wrong are: sched_clock_cpu() -- has a callback when leaving NO_HZ state to take a new reading from GTOD and TSC. Without this callback, time is stuck in the past. RCU -- needs in_irq() to work in order to avoid some nasty deadlocks Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-07-20 10:50:11 -07:00
Paul E. McKenney	10f39bb1b2	rcu: protect __rcu_read_unlock() against scheduler-using irq handlers The addition of RCU read-side critical sections within runqueue and priority-inheritance lock critical sections introduced some deadlock cycles, for example, involving interrupts from __rcu_read_unlock() where the interrupt handlers call wake_up(). This situation can cause the instance of __rcu_read_unlock() invoked from interrupt to do some of the processing that would otherwise have been carried out by the task-level instance of __rcu_read_unlock(). When the interrupt-level instance of __rcu_read_unlock() is called with a scheduler lock held from interrupt-entry/exit situations where in_irq() returns false, deadlock can result. This commit resolves these deadlocks by using negative values of the per-task ->rcu_read_lock_nesting counter to indicate that an instance of __rcu_read_unlock() is in flight, which in turn prevents instances from interrupt handlers from doing any special processing. This patch is inspired by Steven Rostedt's earlier patch that similarly made __rcu_read_unlock() guard against interrupt-mediated recursion (see https://lkml.org/lkml/2011/7/15/326), but this commit refines Steven's approach to avoid the need for preemption disabling on the __rcu_read_unlock() fastpath and to also avoid the need for manipulating a separate per-CPU variable. This patch avoids need for preempt_disable() by instead using negative values of the per-task ->rcu_read_lock_nesting counter. Note that nested rcu_read_lock()/rcu_read_unlock() pairs are still permitted, but they will never see ->rcu_read_lock_nesting go to zero, and will therefore never invoke rcu_read_unlock_special(), thus preventing them from seeing the RCU_READ_UNLOCK_BLOCKED bit should it be set in ->rcu_read_unlock_special. This patch also adds a check for ->rcu_read_unlock_special being negative in rcu_check_callbacks(), thus preventing the RCU_READ_UNLOCK_NEED_QS bit from being set should a scheduling-clock interrupt occur while __rcu_read_unlock() is exiting from an outermost RCU read-side critical section. Of course, __rcu_read_unlock() can be preempted during the time that ->rcu_read_lock_nesting is negative. This could result in the setting of the RCU_READ_UNLOCK_BLOCKED bit after __rcu_read_unlock() checks it, and would also result it this task being queued on the corresponding rcu_node structure's blkd_tasks list. Therefore, some later RCU read-side critical section would enter rcu_read_unlock_special() to clean up -- which could result in deadlock if that critical section happened to be in the scheduler where the runqueue or priority-inheritance locks were held. This situation is dealt with by making rcu_preempt_note_context_switch() check for negative ->rcu_read_lock_nesting, thus refraining from queuing the task (and from setting RCU_READ_UNLOCK_BLOCKED) if we are already exiting from the outermost RCU read-side critical section (in other words, we really are no longer actually in that RCU read-side critical section). In addition, rcu_preempt_note_context_switch() invokes rcu_read_unlock_special() to carry out the cleanup in this case, which clears out the ->rcu_read_unlock_special bits and dequeues the task (if necessary), in turn avoiding needless delay of the current RCU grace period and needless RCU priority boosting. It is still illegal to call rcu_read_unlock() while holding a scheduler lock if the prior RCU read-side critical section has ever had either preemption or irqs enabled. However, the common use case is legal, namely where then entire RCU read-side critical section executes with irqs disabled, for example, when the scheduler lock is held across the entire lifetime of the RCU read-side critical section. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-07-20 10:50:11 -07:00
Peter Zijlstra	d110235d2c	sched: Avoid creating superfluous NUMA domains on non-NUMA systems When creating sched_domains, stop when we've covered the entire target span instead of continuing to create domains, only to later find they're redundant and throw them away again. This avoids single node systems from touching funny NUMA sched_domain creation code and reduces the risks of the new SD_OVERLAP code. Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Anton Blanchard <anton@samba.org> Cc: mahesh@linux.vnet.ibm.com Cc: benh@kernel.crashing.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1311180177.29152.57.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-20 18:54:33 +02:00
Peter Zijlstra	e3589f6c81	sched: Allow for overlapping sched_domain spans Allow for sched_domain spans that overlap by giving such domains their own sched_group list instead of sharing the sched_groups amongst each-other. This is needed for machines with more than 16 nodes, because sched_domain_node_span() will generate a node mask from the 16 nearest nodes without regard if these masks have any overlap. Currently sched_domains have a sched_group that maps to their child sched_domain span, and since there is no overlap we share the sched_group between the sched_domains of the various CPUs. If however there is overlap, we would need to link the sched_group list in different ways for each cpu, and hence sharing isn't possible. In order to solve this, allocate private sched_groups for each CPU's sched_domain but have the sched_groups share a sched_group_power structure such that we can uniquely track the power. Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-08bxqw9wis3qti9u5inifh3y@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-20 18:32:41 +02:00
Peter Zijlstra	9c3f75cbd1	sched: Break out cpu_power from the sched_group structure In order to prepare for non-unique sched_groups per domain, we need to carry the cpu_power elsewhere, so put a level of indirection in. Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-qkho2byuhe4482fuknss40ad@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-20 18:32:40 +02:00
Linus Torvalds	e6625fa48e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix file mode calculation	2011-07-19 22:10:28 -07:00
Linus Torvalds	47126d807a	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: davinci: DM365 EVM: fix video input mux bits ARM: davinci: Check for NULL return from irq_alloc_generic_chip arm: davinci: Fix low level gpio irq handlers' argument	2011-07-19 22:10:05 -07:00
Shaohua Li	4746efded8	vmscan: fix a livelock in kswapd I'm running a workload which triggers a lot of swap in a machine with 4 nodes. After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or kswapd2 are keeping running and I can't access filesystem, but most memory is free. This looks like a regression since commit `08951e5459` ("mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely"). Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if all zones have watermark ok, end_zone will keep 0. Later sleeping_prematurely() always returns true. Because this is an order 3 wakeup, and if classzone_idx is 0, both balanced_pages and present_pages in pgdat_balanced() are 0. We add a special case here. If a zone has no page, we think it's balanced. This fixes the livelock. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-19 22:09:31 -07:00
Akinobu Mita	f7b88631a8	fs/libfs.c: fix simple_attr_write() on 32bit machines Assume that /sys/kernel/debug/dummy64 is debugfs file created by debugfs_create_x64(). # cd /sys/kernel/debug # echo 0x1234567812345678 > dummy64 # cat dummy64 0x0000000012345678 # echo 0x80000000 > dummy64 # cat dummy64 0xffffffff80000000 A value larger than INT_MAX cannot be written to the debugfs file created by debugfs_create_u64 or debugfs_create_x64 on 32bit machine. Because simple_attr_write() uses simple_strtol() for the conversion. To fix this, use simple_strtoll() instead. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-19 22:09:30 -07:00

1 2 3 4 5 ...

255063 Commits