* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
NOHZ: reevaluate idle sleep length after add_timer_on()
clocksource: revert: use init_timer_deferrable for clocksource_watchdog
The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly
when the last chunk in the read-list spanned multiple pages. This
resulted in a kernel panic when the wrong context was used to
build the RPC iovec page list.
RDMA_READ is used to fetch RPC data from the client for
NFS_WRITE requests. A scatter-gather is used to map the
advertised client side buffer to the server-side iovec and
associated page list.
WR contexts are used to convey which scatter-gather entries are
handled by each WR. When the write data is large, a single RPC may
require multiple RDMA_READ requests so the contexts for a single RPC
are chained together in a linked list. The last context in this list
is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes,
the CQ handler code can enqueue the RPC for processing.
The code in rdma_read_xdr was setting this bit on the last two
contexts on this list when the last read-list chunk spanned multiple
pages. This caused the svc_rdma_recvfrom logic to incorrectly build
the RPC and caused the kernel to crash because the second-to-last
context doesn't contain the iovec page list.
Modified the condition that sets this bit so that it correctly detects
the last context for the RPC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tested-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 8fa5913d54, which
caused various interesting problems for people, including wrong resource
allocations. See for example bugzilla entry "2.6.25-rc2: ohci1394
problem (MMIO broken)" at
http://bugzilla.kernel.org/show_bug.cgi?id=10080
And Gary Hade says:
"The same change had also exposed an issue reported by Paul Martin that
has been causing an Oops while hotplugging ThinkPads to a ThinkPad
Dock II. See
http://lkml.org/lkml/2008/2/19/405http://bugzilla.kernel.org/show_bug.cgi?id=9961
I have a fix for the ThinkPad docking Oops but if the issue being
discussed here is caused by the transparent bridge sizing removal
change I totally agree that it should be reverted."
The transparent bridge sizing removal change was motivated by
insufficient PCI memory resource for a transparent bridge window that
was being created as a result of expansion ROM(s) being included in
the transparent bridge sizing calculations.
A later "PCI: Remove default PCI expansion ROM memory allocation"
change ( re: http://lkml.org/lkml/2007/12/11/361 ) removes the
expansion ROM(s) from the transparent bridge sizing calculations which
actually resolves the original issue in a different manner. So, even
if the "PCI: remove transparent bridge sizing" is not problematic it
is no longer needed anyway."
Identified-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Tested-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Gary Hade <garyhade@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have been printing these messages at KERN_ERR since 2.6.24,
per http://bugzilla.kernel.org/show_bug.cgi?id=9535
But KERN_ERR pops up on a console booted with "quiet"
and causes users to get alarmed and file bugs
about the message itself:
https://bugzilla.redhat.com/show_bug.cgi?id=436589
So reduce the severity of these messages to
KERN_WARNING, which is not printed by "quiet".
This message will still be seen without "quiet",
but a lot of messages are printed in that mode
and it will be less likely to cause undue alarm.
We could go all the way to KERN_DEBUG, but this
is a real warning after all, so it seems prudent
not to require "debug" to see it.
Signed-off-by: Len Brown <len.brown@intel.com>
Commit 556a169dab ("slab: fix bootstrap on
memoryless node") introduced bootstrap-time cache_cache list3s for all nodes
but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This
patch fixes list_add() corruption in mm/slab.c seen on the ES7000.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Avoid warnings about unused functions if neither SLUB_DEBUG nor CONFIG_SLABINFO
is defined. This patch will be reversed when slab defrag is merged since slab
defrag requires count_partial() to determine the fragmentation status of
slab caches.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
relay doesn't reference the pages it adds, however we need a non-NULL
hook or splice_to_pipe() can oops.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
I found that relay files can be read by pread(2). I fix it,
for relay files are not capable of seeking.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The unlazy_fpu() path calls in to save_fpu() if the task has
TIF_USEDFPU set. save_fpu() being the crap API that it is has the side
effect of clearing the flag itself, which presently doesn't happen
if we're using FPU emulation. Fix this up for now, pending an overhaul
in 2.6.26.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Presently with preempt enabled there's the possibility to be preempted
after the TIF_USEDFPU test and the register save, leading to bogus
state post-__switch_to(). Use an explicit preempt_disable()/enable()
pair around unlazy_fpu()/clear_fpu() to avoid this. Follows the x86
change.
Reported-by: Takuo Koguchi <takuo.koguchi.sw@hitachi.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Commit 8b7817f3a9 ([IPSEC]: Add ICMP host
relookup support) introduced some dst leaks on error paths: the rt
pointer can be forgotten to be put. Fix it bu going to a proper label.
Found after net namespace's lo refused to unregister :) Many thanks to
Den for valuable help during debugging.
Herbert pointed out, that xfrm_lookup() will put the rtable in case
of error itself, so the first goto fix is redundant.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given that there are no apparent calls to lock_kernel() or
unlock_kernel() under net/ax25, delete the TODO reference related to
that.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list
method to determine whether it supports multicast. Drivers implementing
secondary unicast support use set_rx_mode however.
Check for both dev->set_multicast_mode and dev->set_rx_mode to determine
multicast capabilities.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse still doesn't like the funny cast we make from a scalar to a
"union semun" (which is correct by the C language and in particular
works with the sparc64 calling conventions, but sparse doesn't grok
that yet).
Signed-off-by: David S. Miller <davem@davemloft.net>
It should be a "struct ktermios" not a "struct termios".
Based upon a build warning reported by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
add_timer_on() can add a timer on a CPU which is currently in a long
idle sleep, but the timer wheel is not reevaluated by the nohz code on
that CPU. So a timer can be delayed for quite a long time. This
triggered a false positive in the clocksource watchdog code.
To avoid this we need to wake up the idle CPU and enforce the
reevaluation of the timer wheel for the next timer event.
Add a function, which checks a given CPU for idle state, marks the
idle task with NEED_RESCHED and sends a reschedule IPI to notify the
other CPU of the change in the timer wheel.
Call this function from add_timer_on().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
--
include/linux/sched.h | 6 ++++++
kernel/sched.c | 43 +++++++++++++++++++++++++++++++++++++++++++
kernel/timer.c | 10 +++++++++-
3 files changed, 58 insertions(+), 1 deletion(-)
Add 'UL' markers to DCU_* macros.
Declare C functions called from assembler in entry.h
Declare C functions called from within the sparc64 arch
code in include/asm-sparc64/*.h headers as appropriate.
Remove unused routines in traps.c
Signed-off-by: David S. Miller <davem@davemloft.net>
IFF_ALLMULTI is an indication from the network stack to the driver
to disable multicast filters, drivers should never set it directly.
Since the UML networking device doesn't have any filtering capabilites,
it doesn't the set_multicast_list function at all, it is kept so userspace
can still issue SIOCADDMULTI/SIOCDELMULTI ioctls however.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity
or dev_change_flags. Setting it directly causes two unwanted effects:
- the next dev_change_flags call will notice a difference between
dev->gflags and the actual flags, enable promisc/allmulti
mode and incorrectly update dev->gflags
- this keeps the underlying device in promisc/allmulti mode until
the VLAN device is deleted
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 9b12e18cdc
'ACPI: cpuidle: Support C1 idle time accounting'
was implicated in a 100% C0 idle regression.
http://bugzilla.kernel.org/show_bug.cgi?id=10076
It pointed out a potential problem where the menu governor
may get confused by the C-state residency time from poll
idle or C1 idle, where this timing info is not accurate.
This inaccuracy is due to interrupts being handled
before we account for C-state exit.
Do not mark TIME_VALID for CO poll state.
Mark C1 time as valid only with the MWAIT (CSTATE_FFH) entry method.
This makes governors use the timing information only when it is correct and
eliminates any wrong policy decisions that may result from invalid timing
information.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
We create a local header file entry.h, under arch/sparc64/kernel/,
that we can use to declare routines either defined in assembler
or only invoked from assembler. As well as other data objects
which are private to the inner sparc64 kernel arch code.
Signed-off-by: David S. Miller <davem@davemloft.net>
cpuidle C-state sysfs node time and usage are very easy to overflow because
they are all of unsigned int type, time will overflow within about two hours,
usage will take longer time to overflow, but they are increasing for ever.
This patch will convert them to unsigned long long.
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This original patch
http://ussg.iu.edu/hypermail/linux/kernel/0712.2/1451.html
was intending to add acpi_unlazy_tlb() to acpi_idle_enter_bm(),
which is used for C3 entry.
But it was merged incorrectly as commmit
bde6f5f59c
'x86: voluntary leave_mm before entering ACPI C3'
so the call was instead added to acpi_idle_enter_simple()
(which is C2 entry routine), probably due to identical
context in that function.
Move the call back to acpi_idle_enter_bm().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Move them further from the main kernel image area
to facilitate larger kernel sizes.
Adjust comments to match.
Signed-off-by: David S. Miller <davem@davemloft.net>
- Handling TX completions on the same cpu as the sender.
Signed-off-by: Surjit Reang <surjit.reang@neterion.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Some ROMs on embedded devices store incorrect values for
the PHY address of the ethernet device.
It looks like the number is sign-extended.
Truncate the value by applying the PHY-address mask to it.
The patch was tested on a bcm47xx embedded system (where the bug
triggers) and a bcm4400 PCI card.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
According to: Documentation/networking/netdevices.txt:
<cite>
napi->poll:
..........
Context: softirq
will be called with interrupts disabled by netconsole.
</cite>
napi->poll() could be called either with interrupts enabled
(in softirq context) or disabled (by netconsole), so the irq flag
should be preserved.
Inspired by Ingo's resent forcedeth patch :-)
Signed-off-by: Marin Mitov <mitov@issp.bas.bg>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When query for OID_GEN_PHYSICAL_MEDIUM fails, uninitialized pointer
'phym' is being accessed in generic_rndis_bind(), resulting OOPS.
Patch fixes phym to be initialized and setup correctly when
rndis_query() for physical medium fails.
Bug was introduced by following commit:
commit 039ee17d1b
Author: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Date: Sun Jan 27 23:34:33 2008 +0200
Reported-by: Dmitri Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Using iWARP with a Chelsio T3 NIC generates the following lockdep warning:
=================================
[ INFO: inconsistent lock state ]
2.6.25-rc6 #50
---------------------------------
inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
(&adap->sge.reg_lock){-+..}, at: [<ffffffff880e5ee2>] cxgb_offload_ctl+0x3af/0x507 [cxgb3]
The problem is that reg_lock is used with plain spin_lock() in
drivers/net/cxgb3/sge.c but is used with spin_lock_irqsave() in
drivers/net/cxgb3/cxgb3_offload.c. This is technically a false
positive, since the uses in sge.c are only in the initialization and
cleanup paths and cannot overlap with any use in interrupt context.
The best fix is probably just to use spin_lock_irq() with reg_lock in
sge.c. Even though it's not strictly required for correctness, it
avoids triggering lockdep and the extra overhead of disabling
interrupts is not important at all in the initialization and cleanup
slow paths.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The Hirose USB-100 adapter uses a dm9601 chip.
Reported by Robert Brockway.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Marvell PHY m88e1111 (not sure about other models, but think they too)
works in two modes: fiber and copper. In Marvell PHY driver (that we
have in current community kernels) code supported only copper mode,
and this is not configurable, bits for copper mode are simply written
in registers during PHY initialization.
This patch adds support for both modes.
Signed-off-by: Alexandr Smirnov <asmirnov@ru.mvista.com>
Acked-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Don't count rx dropped packets based on return value of netif_receive_skb(),
which is misleading.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Tested-by: Vernon Mauery <mauery@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
o eliminate tx lock in netxen adapter struct, instead pound on netdev
tx lock appropriately.
o remove old "concurrent transmit" code that unnecessarily drops and
reacquires tx lock in hard_xmit_frame(), this is already serialized
the netdev xmit lock.
o reduce scope of tx lock in tx cleanup. tx cleanup operates on
different section of the ring than transmitting cpus and is
guarded by producer and consumer indices. This fixes a race
caused by rx softirq preemption on realtime kernels.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Tested-by: Vernon Mauery <mauery@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
o separate and simpler irq handler for msi interrupts, avoids few checks
than legacy mode.
o avoid redudant tx_has_work() and rx_has_work() checks in interrupt
and napi, which can uncork irq based on racy (lockless) access to tx
and rx ring indices. If we get interrupt, there's sufficient reason to
schedule napi.
o replenish rx ring more often, remove self-imposed threshold rcv_free
that prevents posting rx desc to card. This improves performance in
low memory.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Tested-by: Vernon Mauery <mauery@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Recent netxen firmware has new scheme of generating MSI interrupts, it
raises interrupt and blocks itself, waiting for driver to unmask. This
reduces chance of spurious interrupts.
The driver will be able to deal with older firmware as well.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Tested-by: Vernon Mauery <mauery@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Deepak Saxena <dsaxena@plexity.net>
Cc: Nicolas Pitre <nico@cam.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>