Commit Graph

119871 Commits

Author SHA1 Message Date
john stultz
6c9bacb41c time: catch xtime_nsec underflows and fix them
Impact: fix time warp bug

Alex Shi, along with Yanmin Zhang have been noticing occasional time
inconsistencies recently. Through their great diagnosis, they found that
the xtime_nsec value used in update_wall_time was occasionally going
negative. After looking through the code for awhile, I realized we have
the possibility for an underflow when three conditions are met in
update_wall_time():

  1) We have accumulated a second's worth of nanoseconds, so we
     incremented xtime.tv_sec and appropriately decrement xtime_nsec.
     (This doesn't cause xtime_nsec to go negative, but it can cause it
      to be small).

  2) The remaining offset value is large, but just slightly less then
     cycle_interval.

  3) clocksource_adjust() is speeding up the clock, causing a
     corrective amount (compensating for the increase in the multiplier
     being multiplied against the unaccumulated offset value) to be
     subtracted from xtime_nsec.

This can cause xtime_nsec to underflow.

Unfortunately, since we notify the NTP subsystem via second_overflow()
whenever we accumulate a full second, and this effects the error
accumulation that has already occured, we cannot simply revert the
accumulated second from xtime nor move the second accumulation to after
the clocksource_adjust call without a change in behavior.

This leaves us with (at least) two options:

1) Simply return from clocksource_adjust() without making a change if we
   notice the adjustment would cause xtime_nsec to go negative.

This would work, but I'm concerned that if a large adjustment was needed
(due to the error being large), it may be possible to get stuck with an
ever increasing error that becomes too large to correct (since it may
always force xtime_nsec negative). This may just be paranoia on my part.

2) Catch xtime_nsec if it is negative, then add back the amount its
   negative to both xtime_nsec and the error.

This second method is consistent with how we've handled earlier rounding
issues, and also has the benefit that the error being added is always in
the oposite direction also always equal or smaller then the correction
being applied. So the risk of a corner case where things get out of
control is lessened.

This patch fixes bug 11970, as tested by Yanmin Zhang
http://bugzilla.kernel.org/show_bug.cgi?id=11970

Reported-by: alex.shi@intel.com
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 08:43:02 +01:00
Uwe Kleine-König
2cc002c4bb netx-eth: initialize per device spinlock
The spinlock used in the netx-eth driver was never properly initialized.
This was noticed using CONFIG_DEBUG_SPINLOCK=y

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 22:18:59 -08:00
Ilpo Järvinen
f8269a495a tcp: make urg+gso work for real this time
I should have noticed this earlier... :-) The previous solution
to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
patterns, or even worse, a steep downward slope into packet
counting because each skb pcount would be truncated to pcount
of 2 and then the following fragments of the later portion would
restore the window again.

Basically this reverts "tcp: Do not use TSO/GSO when there is
urgent data" (33cf71cee1). It also removes some unnecessary code
from tcp_current_mss that didn't work as intented either (could
be that something was changed down the road, or it might have
been broken since the dawn of time) because it only works once
urg is already written while this bug shows up starting from
~64k before the urg point.

The retransmissions already are split to mss sized chunks, so
only new data sending paths need splitting in case they have
a segment otherwise suitable for gso/tso. The actually check
can be improved to be more narrow but since this is late -rc
already, I'll postpone thinking the more fine-grained things.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 21:24:48 -08:00
Baruch Siach
5176da7e53 enc28j60: Fix sporadic packet loss (corrected again)
Packet data read from the RX buffer the when the RSV is at the end of the RX
buffer does not warp around. This causes packet loss, as the actual data is
never read. Fix this by calculating the right packet data location.

Thanks to Shachar Shemesh for suggesting the fix.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Claudio Lanconelli <lanconelli.claudio@eptar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 21:16:06 -08:00
Pascal Terjan
bd0914104c hysdn: fix writing outside the field on 64 bits
ifa_local is assumed to be unsigned long which lead to writing the address
at dev->dev_addr-2 instead of +2

noticed thanks to gcc:

drivers/isdn/hysdn/hysdn_net.c: In function `net_open':
drivers/isdn/hysdn/hysdn_net.c:91: warning: array subscript is below array bounds

Signed-off-by: Pascal Terjan <pterjan@mandriva.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 21:01:28 -08:00
Wilfried Klaebe
1c594c05a7 b1isa: fix b1isa_exit() to really remove registered capi controllers
On "/etc/init.d/capiutils stop", this oops happened.

The oops happens on reading /proc/capi/controllers because
capi_ctrl->procinfo is called for the wrongly not unregistered
controller, which points to b1isa_procinfo(), which was removed on
module unload.

b1isa_exit() did not call b1isa_remove() for its controllers because
io[0] == 0 on module unload despite having been 0x340 on module load.

Besides, just removing the controllers that where added on module
load time and not those that were added later via b1isa_add_card() is
wrong too - the place where all added cards are found is isa_dev[].

relevant dmesg lines:

[    0.000000] Linux version 2.6.27.4 (w@shubashi) (gcc version 4.3.2 (Debian 4.3.2-1) ) #3 Thu Oct 30 16:49:03 CET 2008

[   67.403555] CAPI Subsystem Rev 1.1.2.8
[   68.529154] capifs: Rev 1.1.2.3
[   68.563292] capi20: Rev 1.1.2.7: started up with major 68 (middleware+capifs)
[   77.026936] b1: revision 1.1.2.2
[   77.049992] b1isa: revision 1.1.2.3
[   77.722655] kcapi: Controller [001]: b1isa-340 attached
[   77.722671] b1isa: AVM B1 ISA at i/o 0x340, irq 5, revision 255
[   81.272669] b1isa-340: card 1 "B1" ready.
[   81.272683] b1isa-340: card 1 Protocol: DSS1
[   81.272689] b1isa-340: card 1 Linetype: point to multipoint
[   81.272695] b1isa-340: B1-card (3.11-03) now active
[   81.272702] kcapi: card [001] "b1isa-340" ready.

[  153.721281] kcapi: card [001] down.
[  154.151889] BUG: unable to handle kernel paging request at e87af000
[  154.152081] IP: [<e87af000>]
[  154.153292] *pde = 2655b067 *pte = 00000000
[  154.153307] Oops: 0000 [#1]
[  154.153360] Modules linked in: rfcomm l2cap ppdev lp ipt_MASQUERADE tun capi capifs kernelcapi ac battery nfsd exportfs nfs lockd nfs_acl sunrpc sit tunnel4 bridge stp llc ipt_REJECT ipt_LOG xt_tcpudp xt_state iptable_filter iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables nls_utf8 isofs nls_base zlib_inflate loop ipv6 netconsole snd_via82xx dvb_usb_dib0700 gameport dib7000p dib7000m dvb_usb snd_ac97_codec ac97_bus dvb_core mt2266 snd_pcm tuner_xc2028 dib3000mc dibx000_common mt2060 dib0070 snd_page_alloc snd_mpu401_uart snd_seq_midi snd_seq_midi_event btusb snd_rawmidi bluetooth snd_seq snd_timer snd_seq_device snd via686a i2c_viapro soundcore i2c_core parport_pc parport button dm_mirror dm_log dm_snapshot floppy sg ohci1394 uhci_hcd ehci_hcd 8139too mii ieee1394 usbcore sr_mod cdrom sd_mod thermal processor fan [last unloaded: b1]
[  154.153360]
[  154.153360] Pid: 4132, comm: capiinit Not tainted (2.6.27.4 #3)
[  154.153360] EIP: 0060:[<e87af000>] EFLAGS: 00010286 CPU: 0
[  154.153360] EIP is at 0xe87af000
[  154.153360] EAX: e6b9ccc8 EBX: e6b9ccc8 ECX: e87a0c67 EDX: e87af000
[  154.153360] ESI: e142bbc0 EDI: e87a56e0 EBP: e0505f0c ESP: e0505ee4
[  154.153360]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  154.153360] Process capiinit (pid: 4132, ti=e0504000 task=d1196cf0 task.ti=e0504000)
[  154.153360] Stack: e879f650 00000246 e0505ef4 c01472eb e0505f0c 00000246 e7001780 fffffff4
[  154.153360]        fffffff4 e142bbc0 e0505f48 c01a56c6 00000400 b805e000 d102dc80 e142bbe0
[  154.153360]        00000000 e87a56e0 00000246 e12617ac 00000000 00000000 e1261760 fffffffb
[  154.153360] Call Trace:
[  154.153360]  [<e879f650>] ? controller_show+0x20/0x90 [kernelcapi]
[  154.153360]  [<c01472eb>] ? trace_hardirqs_on+0xb/0x10
[  154.153360]  [<c01a56c6>] ? seq_read+0x126/0x2f0
[  154.153360]  [<c01a55a0>] ? seq_read+0x0/0x2f0
[  154.153360]  [<c01c033c>] ? proc_reg_read+0x5c/0x90
[  154.153360]  [<c0189919>] ? vfs_read+0x99/0x140
[  154.153360]  [<c01c02e0>] ? proc_reg_read+0x0/0x90
[  154.153360]  [<c0189a7d>] ? sys_read+0x3d/0x70
[  154.153360]  [<c0103c3d>] ? sysenter_do_call+0x12/0x35
[  154.153360]  =======================
[  154.153360] Code:  Bad EIP value.
[  154.153360] EIP: [<e87af000>] 0xe87af000 SS:ESP 0068:e0505ee4
[  154.153360] ---[ end trace 23750b6c2862de94 ]---

Signed-off-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 20:57:19 -08:00
Joseph Myers
726c12f57d sparc64: Fix VIS emulation bugs
This patch fixes some bugs in VIS emulation that cause the GCC test
failure

FAIL: gcc.target/sparc/pdist-3.c execution test

for both 32-bit and 64-bit testing on hardware lacking these
instructions.  The emulation code for the pdist instruction uses
RS1(insn) for both source registers rs1 and rs2, which is obviously
wrong and leads to the instruction doing nothing (the observed
problem), and further inspection of the code shows that RS1 uses a
shift of 24 and RD a shift of 25, which clearly cannot both be right;
examining SPARC documentation indicates the correct shift for RS1 is
14.

This patch fixes the bug if single-stepping over the affected
instruction in the debugger, but not if the testcase is run
standalone.  For that, Wind River has another patch I hope they will
send as a followup to this patch submission.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 19:36:05 -08:00
Eric Anholt
0235439232 drm/i915: Return error in i915_gem_set_to_gtt_domain if we're not in the GTT.
It's only for flushing caches appropriately for GTT access, not for actually
getting it there.  Prevents potential smashing of cpu read/write domains on
unbound objects.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:24:47 +10:00
Keith Packard
ac94a962b2 drm/i915: Retry execbuffer pinning after clearing the GTT
If we fail to pin all of the buffers in an execbuffer request, go through
and clear the GTT and try again to see if its just a matter of fragmentation

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:22:06 +10:00
Keith Packard
646f0f6e43 drm/i915: Move the execbuffer domain computations together
This eliminates the dev_set_domain function and just in-lines it
where its used, with the goal of moving the manipulation and use of
invalidate_domains and flush_domains closer together. This also
avoids calling add_request unless some domain has been flushed.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:22:02 +10:00
Keith Packard
c0d9082928 drm/i915: Rename object_set_domain to object_set_to_gpu_domain
Now that the CPU and GTT domain operations are isolated to their own
functions, the previously general-purpose set_domain function is now used
only to set GPU domains. It also has no failure cases, which is important as
this eliminates any possible interruption of the computation of new object
domains and subsequent emmission of the flushing instructions into the ring.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:21:58 +10:00
Eric Anholt
e47c68e9c5 drm/i915: Make a single set-to-cpu-domain path and use it wherever needed.
This fixes several domain management bugs, including potential lack of cache
invalidation for pread, potential failure to wait for set_domain(CPU, 0),
and more, along with producing more intelligible code.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:21:55 +10:00
Eric Anholt
2ef7eeaa55 drm/i915: Make a single set-to-gtt-domain path.
This fixes failure to flush caches in the relocation update path, and
failure to wait in the set_domain ioctl, each of which could lead to incorrect
rendering.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:21:52 +10:00
Eric Anholt
b670d81582 drm/i915: If interrupted while setting object domains, still emit the flush.
Otherwise, we would leave the objects in an inconsistent state, such as
write_domain == 0 but on the flushing list.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:21:48 +10:00
Eric Anholt
ce44b0ea3d drm/i915: Move flushing list cleanup from flush request retire to request emit.
obj_priv->write_domain is "write domain if the GPU went idle now", not
"write domain at this moment."  By postponing the clear, we confused the
concept, required more storage, and potentially emitted more flushes than
are required.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:21:45 +10:00
Eric Anholt
a7f014f2de drm/i915: Respect GM965/GM45 bit-17-instead-of-bit-11 option for swizzling.
This fixes readpixels and buffer corruption when swapped out and in by
disabling tiling on them.

Now that we know that the bit 17 mode isn't just a mistake of older chipsets,
we'll need to work on a clever fix so that we can get the performance of
tiling on these chipsets, but that will require intrusive changes targeted
at the next kernel release, not this one.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-12-04 11:21:41 +10:00
David Howells
004b50f4ed MN10300: Introduce barriers to replace removed volatiles in gdbstub 16550 driver
Introduce into the MN10300 gdbstub 16550 driver a couple of barrier() calls to
replace the removed volatility of the input/output index variables for the Rx
ring buffer.  A previous patch added them into the on-chip serial port driver.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-03 16:47:40 -08:00
James Morris
8711cca225 MAINTAINERS: Add security subsystem maintainer
Add myself as overall maintainer of the security subsystem (generally,
components under the top-level security directory).  This addresses
the lack of an official maintainer for the increasing number of
security projects being incorporated into the kernel.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-03 16:47:40 -08:00
Linus Torvalds
feaf3848a8 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: fix setting of max_segment_size and seg_boundary mask
  block: internal dequeue shouldn't start timer
  block: set disk->node_id before it's being used
  When block layer fails to map iov, it calls bio_unmap_user to undo
2008-12-03 16:45:56 -08:00
Linus Torvalds
a771132783 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc/83xx: Fix MCU support merge issue in mpc8349emitx.dts
  powerpc: Fix dma_map_sg() cache flushing on non coherent platforms
2008-12-03 16:41:15 -08:00
Linus Torvalds
2433c41789 Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
  NLM: client-side nlm_lookup_host() should avoid matching on srcaddr
  nfsd: use of unitialized list head on error exit in nfs4recover.c
  Add a reference to sunrpc in svc_addsock
  nfsd: clean up grace period on early exit
2008-12-03 16:40:37 -08:00
Linus Torvalds
cd92a17eec iTCO_wdt: fix typo when setting TCO_EN bit
The code used '&= 0x00002000' when it tried to set the TCO_EN bit, which
obviously didn't set that bit at all, but instead just reset all the
other bits in the SMI_EN register.

This bug seemingly caused various random behavior, with Frans Pop
reporting that X.org just silently hung at startup and Rafael Wysocki
reports the fan spinning with full speed.

See
	http://lkml.org/lkml/2008/12/3/178
	http://bugzilla.kernel.org/show_bug.cgi?id=12162

The problem seems to have been triggered by "[WATCHDOG] iTCO_wdt :
problem with rebooting on new ICH9 based motherboards" (commit
7cd5b08be3), but the bogus code existed
before that too (in the "supermicro_old_pre_stop()" function), it just
apparently never showed up due to different logic.

In that commit the broken code got moved around and now gets executed
much more.

Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Frans Pop <elendil@planet.nl>
Cc: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-03 16:20:19 -08:00
Rusty Russell
e8e8e80ee0 sparc: asm/bitops.h should define __fls
bitops_64.h includes the generic one; pretty sure 32 should too.

(Found by using __fls in generic code and breaking sparc defconfig build:
 thanks Stephen and linux-next!)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 16:04:52 -08:00
Oliver Hartkopp
d253eee201 can: Fix CAN_(EFF|RTR)_FLAG handling in can_filter
Due to a wrong safety check in af_can.c it was not possible to filter
for SFF frames with a specific CAN identifier without getting the
same selected CAN identifier from a received EFF frame also.

This fix has a minimum (but user visible) impact on the CAN filter
API and therefore the CAN version is set to a new date.

Indeed the 'old' API is still working as-is. But when now setting
CAN_(EFF|RTR)_FLAG in can_filter.can_mask you might get less traffic
than before - but still the stuff that you expected to get for your
defined filter ...

Thanks to Kurt Van Dijck for pointing at this issue and for the review.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 15:52:35 -08:00
remi.denis-courmont@nokia
bd7df21920 Phonet: do not dump addresses from other namespaces
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 15:42:09 -08:00
Ingo Molnar
66a05d6b47 Merge branch 'oprofile-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into x86/urgent 2008-12-03 18:52:46 +01:00
William Cohen
3d337c653c x86/oprofile: fix Intel cpu family 6 detection
Alan Jenkins wrote:
> This is on an EeePC 701, /proc/cpuinfo as attached.
>
> Is this expected?  Will the next release work?
>
> Thanks, Alan
>
> # opcontrol --setup --no-vmlinux
> cpu_type 'unset' is not valid
> you should upgrade oprofile or force the use of timer mode
>
> # opcontrol -v
> opcontrol: oprofile 0.9.4 compiled on Nov 29 2008 22:44:10
>
> # cat /dev/oprofile/cpu_type
> i386/p6
> # uname -r
> 2.6.28-rc6eeepc

Hi Alan,

Looking at the kernel driver code for oprofile it can return the "i386/p6" for
the cpu_type. However, looking at the user-space oprofile code there isn't the
matching entry in libop/op_cpu_type.c or the events/unit_mask files in
events/i386 directory.

The Intel AP-485 says this is a "Intel Pentium M processor model D". Seems like
the oprofile kernel driver should be identifying the processor as "i386/p6_mobile"

The driver identification code doesn't look quite right in nmi_init.c

http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=blob;f=arch/x86/oprofile/nmi_int.c;h=022cd41ea9b4106e5884277096e80e9088a7c7a9;hb=HEAD

has:

409         case 10 ... 13:
410                 *cpu_type = "i386/p6";
411                 break;

Referring to the Intel AP-485:
case 10 and 11 should produce "i386/piii"
case 13 should produce "i386/p6_mobile"

I didn't see anything for case 12.

Something like the attached patch. I don't have a celeron machine to verify that
changes in this area of the kernel fix thing.

-Will

Signed-off-by: William Cohen <wcohen@redhat.com>
Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-03 17:17:17 +01:00
Anton Vorontsov
dafdb61313 powerpc/83xx: Fix MCU support merge issue in mpc8349emitx.dts
Just found the merge issue in 442746989d
("powerpc/83xx: Add support for MCU microcontroller in .dts files"):
the commit adds the MCU controller node into the DMA node, which is
wrong because the MCU sits on the I2C bus. Fix this by moving the MCU
node into the I2C controller node.

The original patch[1] was OK though. ;-)

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-03 09:56:02 -06:00
Eric Dumazet
9ea84ad77d oprofile: fix CPU unplug panic in ppro_stop()
If oprofile statically compiled in kernel, a cpu unplug triggers
a panic in ppro_stop(), because a NULL pointer is dereferenced.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-03 15:58:51 +01:00
Milan Broz
0e435ac26e block: fix setting of max_segment_size and seg_boundary mask
Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
devices.

When stacking devices (LVM over MD over SCSI) some of the request queue
parameters are not set up correctly in some cases by default, namely
max_segment_size and and seg_boundary mask.

If you create MD device over SCSI, these attributes are zeroed.

Problem become when there is over this mapping next device-mapper mapping
- queue attributes are set in DM this way:

request_queue   max_segment_size  seg_boundary_mask
SCSI                65536             0xffffffff
MD RAID1                0                      0
LVM                 65536                 -1 (64bit)

Unfortunately bio_add_page (resp.  bio_phys_segments) calculates number of
physical segments according to these parameters.

During the generic_make_request() is segment cout recalculated and can
increase bio->bi_phys_segments count over the allowed limit.  (After
bio_clone() in stack operation.)

Thi is specially problem in CCISS driver, where it produce OOPS here

    BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);

(MAXSEGENTRIES is 31 by default.)

Sometimes even this command is enough to cause oops:

  dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10

This command generates bios with 250 sectors, allocated in 32 4k-pages
(last page uses only 1024 bytes).

For LVM layer, it allocates bio with 31 segments (still OK for CCISS),
unfortunatelly on lower layer it is recalculated to 32 segments and this
violates CCISS restriction and triggers BUG_ON().

The patch tries to fix it by:

 * initializing attributes above in queue request constructor
   blk_queue_make_request()

 * make sure that blk_queue_stack_limits() inherits setting

 (DM uses its own function to set the limits because it
 blk_queue_stack_limits() was introduced later.  It should probably switch
 to use generic stack limit function too.)

 * sets the default seg_boundary value in one place (blkdev.h)

 * use this mask as default in DM (instead of -1, which differs in 64bit)

Bugs related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=471639
http://bugzilla.kernel.org/show_bug.cgi?id=8672

Signed-off-by: Milan Broz <mbroz@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-03 12:55:55 +01:00
Ingo Molnar
c36910c147 Merge branch 'iommu-fixes-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2008-12-03 12:54:45 +01:00
Tejun Heo
53a08807c0 block: internal dequeue shouldn't start timer
blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
both start the timeout timer.  Barrier code dequeues the original
barrier request but doesn't passes the request itself to lower level
driver, only broken down proxy requests; however, as the original
barrier code goes through the same dequeue path and timeout timer is
started on it.  If barrier sequence takes long enough, this timer
expires but the low level driver has no idea about this request and
oops follows.

Timeout timer shouldn't have been started on the original barrier
request as it never goes through actual IO.  This patch unexports
elv_dequeue_request(), which has no external user anyway, and makes it
operate on elevator proper w/o adding the timer and make
blkdev_dequeue_request() call elv_dequeue_request() and add timer.
Internal users which don't pass the request to driver - barrier code
and end_that_request_last() - are converted to use
elv_dequeue_request().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-03 12:41:26 +01:00
Cheng Renquan
bf91db18ac block: set disk->node_id before it's being used
disk->node_id will be refered in allocating in disk_expand_part_tbl, so we
should set it before disk->node_id is refered.

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-03 12:41:20 +01:00
Petr Vandrovec
53cc0b2948 When block layer fails to map iov, it calls bio_unmap_user to undo
mapping.  Which is good if pages were mapped - but if they were provided
by someone else and just copied then bad things happen - pages are
released once here, and once by caller, leading to user triggerable BUG
at include/linux/mm.h:246.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-03 12:41:20 +01:00
Joerg Roedel
09ee17eb8e AMD IOMMU: fix possible race while accessing iommu->need_sync
The access to the iommu->need_sync member needs to be protected by the
iommu->lock. Otherwise this is a possible race condition. Fix it with
this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-03 12:20:46 +01:00
Joerg Roedel
f91ba19064 AMD IOMMU: set device table entry for aliased devices
In some rare cases a request can arrive an IOMMU with its originial
requestor id even it is aliased. Handle this by setting the device table
entry to the same protection domain for the original and the aliased
requestor id.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-03 12:20:46 +01:00
Richard Kennedy
eac9fbc6a9 AMD IOMMU: struct amd_iommu remove padding on 64 bit
Remove 16 bytes of padding from struct amd_iommu on 64bit builds
reducing its size to 120 bytes, allowing it to span one fewer
cachelines.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-03 12:20:46 +01:00
Denis V. Lunev
e93f1be503 [MTD] [NAND] fix OOPS accessing flash operations over STM flash on PXA
STM 2Gb flash is a large-page NAND flash.  Set operations accordingly.
This field is dereferenced without a check in several places resulting in
OOPS.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Eric Miao <ymiao3@marvell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-12-03 10:47:20 +00:00
Roel Kluin
201955463a check_hung_task(): unsigned sysctl_hung_task_warnings cannot be less than 0
Impact: fix warnings-limit cutoff check for debug feature

unsigned sysctl_hung_task_warnings cannot be less than 0

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 10:11:51 +01:00
Joerg Roedel
70d7d35757 x86: fix broken flushing in GART nofullflush path
Impact: remove stale IOTLB entries

In the non-default nofullflush case the GART is only flushed when
next_bit wraps around. But it can happen that an unmap operation unmaps
memory which is behind the current next_bit location. If these addresses
are reused it may result in stale GART IO/TLB entries. Fix this by
setting the GART next_bit always behind an unmapped location.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 10:02:41 +01:00
Chris Torek
ee4ee52727 sparc64: Fix bug in PTRACE_SETFPREGS64 handling.
From: Chris Torek <chris.torek@windriver.com>

>The SPARC64 kernel code for PTRACE_SETFPREGS64 appears to be an exact copy 
>of that for PTRACE_GETFPREGS64.  This means that gdbserver and native 
>64-bit GDB cannot set floating-point registers.

It looks like a simple typo.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 00:47:28 -08:00
Paul Moore
d25830e550 netlabel: Fix a potential NULL pointer dereference
Fix a potential NULL pointer dereference seen when trying to remove a
static label configuration with an invalid address/mask combination.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 00:37:04 -08:00
Michael Chan
efba01803c bnx2: Add workaround to handle missed MSI.
The bnx2 chips do not support per MSI vector masking.  On 5706/5708, new MSI
address/data are stored only when the MSI enable bit is toggled.  As a result,
SMP affinity no longer works in the latest kernel.  A more serious problem is
that the driver will no longer receive interrupts when the MSI receiving CPU
goes offline.

The workaround in this patch only addresses the problem of CPU going offline.
When that happens, the driver's timer function will detect that it is making
no forward progress on pending interrupt events and will recover from it.

Eric Dumazet reported the problem.

We also found that if an interrupt is internally asserted while MSI and INTA
are disabled, the chip will end up in the same state after MSI is re-enabled.
The same workaround is needed for this problem. 

Signed-off-by: Michael Chan <mchan@broadcom.com>
Tested-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 00:36:15 -08:00
Wei Yongjun
d5654efd3f xfrm: Fix kernel panic when flush and dump SPD entries
After flush the SPD entries, dump the SPD entries will cause kernel painc.

Used the following commands to reproduct:

- echo 'spdflush;' | setkey -c
- echo 'spdadd 3ffe:501:ffff:ff01::/64 3ffe:501:ffff:ff04::/64  any -P out ipsec \
  ah/tunnel/3ffe:501:ffff:ff00:200:ff:fe00:b0b0-3ffe:501:ffff:ff02:200:ff:fe00:a1a1/require;\
  spddump;' | setkey -c
- echo 'spdflush; spddump;' | setkey -c
- echo 'spdadd 3ffe:501:ffff:ff01::/64 3ffe:501:ffff:ff04::/64  any -P out ipsec \
  ah/tunnel/3ffe:501:ffff:ff00:200:ff:fe00:b0b0-3ffe:501:ffff:ff02:200:ff:fe00:a1a1/require;\
  spddump;' | setkey -c

This is because when flush the SPD entries, the SPD entry is not remove
from the list.

This patch fix the problem by remove the SPD entry from the list.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-03 00:27:18 -08:00
Benjamin Herrenschmidt
2434bbb30e powerpc: Fix dma_map_sg() cache flushing on non coherent platforms
On PowerPC 4xx or other non cache-coherent platforms, we lost the
appropriate cache flushing in dma_map_sg() when merging the 32 and
64-bit DMA code (commit 4fc665b88a,
"powerpc: Merge 32 and 64-bit dma code").  This restores it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-03 18:24:08 +11:00
Linus Torvalds
f6f7b52e2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [WATCHDOG] hpwdt: Fix kdump when using hpwdt
  [WATCHDOG] hpwdt: set the mapped BIOS address space as executable
  [WATCHDOG] iTCO_wdt: add PCI ID's for ICH9 & ICH10 chipsets
  [WATCHDOG] iTCO_wdt : correct status clearing
  [WATCHDOG] iTCO_wdt : problem with rebooting on new ICH9 based motherboards
  [WATCHDOG] fix mtx1_wdt compilation failure
2008-12-02 15:58:20 -08:00
Linus Torvalds
51eaaa6776 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: pre-allocate bulk-read buffer
  UBIFS: do not allocate too much
  UBIFS: do not print scary memory allocation warnings
  UBIFS: allow for gaps when dirtying the LPT
  UBIFS: fix compilation warnings
  MAINTAINERS: change UBI/UBIFS git tree URLs
  UBIFS: endian handling fixes and annotations
  UBIFS: remove printk
2008-12-02 15:56:55 -08:00
Linus Torvalds
b7d6266062 Merge branch 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: MMU: avoid creation of unreachable pages in the shadow
  KVM: ppc: stop leaking host memory on VM exit
  KVM: MMU: fix sync of ptes addressed at owner pagetable
  KVM: ia64: Fix: Use correct calling convention for PAL_VPS_RESUME_HANDLER
  KVM: ia64: Fix incorrect kbuild CFLAGS override
  KVM: VMX: Fix interrupt loss during race with NMI
  KVM: s390: Fix problem state handling in guest sigp handler
2008-12-02 15:56:17 -08:00
Linus Torvalds
e6d9f0fb5f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix offset calculation in compute_size()
  rtc: rtc-starfire fixes
2008-12-02 15:55:43 -08:00
Linus Torvalds
e1825e7515 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
  MAINTAINERS: add netdev to ATM
  ATM: horizon, fix hrz_probe fail path
  pppol2tp: Add missing sock_put() in pppol2tp_release()
  net: Fix soft lockups/OOM issues w/ unix garbage collector
  macvlan: don't broadcast PAUSE frames to macvlan devices
  Phonet: fix oops in phonet_address_del() on non-Phonet device
  netfilter: ctnetlink: fix GFP_KERNEL allocation under spinlock
  sungem: Fix PCS_MIICTRL register write in gem_init_phy().
  net: make skb_truesize_bug() call WARN()
  net: hp-plus uses eip_poll
  net/wireless/reg.c: fix bad WARN_ON in if statement
  ath5k: disable beacon filter when station is not associated
  ath5k: fix Security issue in DebugFS part of ath5k
  ath9k: correct expected max RX buffer size
  ath9k: Fix SW-IOMMU bounce buffer starvation
  mac80211 : Fix setting ad-hoc mode and non-ibss channel
  iwlagn: fix DMA sync
  phylib: Add Vitesse VSC8221 SGMII PHY
  rose: zero length frame filtering in af_rose.c
  bridge: netfilter: fix update_pmtu crash with GRE
  ...
2008-12-02 15:55:05 -08:00