Commit Graph

45224 Commits

Author SHA1 Message Date
Suresh Siddha
318fe7df9d iommu: Move IOMMU specific code to intel-iommu.c
Move the IOMMU specific routines to intel-iommu.c leaving the
dmar.c to the common ACPI dmar code shared between DMA-remapping
and Interrupt-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: youquan.song@intel.com
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.282401285@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21 10:21:54 +02:00
Suresh Siddha
41750d31fc x86, x2apic: Enable the bios request for x2apic optout
On the platforms which are x2apic and interrupt-remapping
capable, Linux kernel is enabling x2apic even if the BIOS
doesn't. This is to take advantage of the features that x2apic
brings in.

Some of the OEM platforms are running into issues because of
this, as their bios is not x2apic aware. For example, this was
resulting in interrupt migration issues on one of the platforms.
Also if the BIOS SMI handling uses APIC interface to send SMI's,
then the BIOS need to be aware of x2apic mode that OS has
enabled.

On some of these platforms, BIOS doesn't have a HW mechanism to
turnoff the x2apic feature to prevent OS from enabling it.

To resolve this mess, recent changes to the VT-d2 specification:

 http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf

includes a mechanism that provides BIOS a way to request system
software to opt out of enabling x2apic mode.

Look at the x2apic optout flag in the DMAR tables before
enabling the x2apic mode in the platform. Also print a warning
that we have disabled x2apic based on the BIOS request.

Kernel boot parameter "intremap=no_x2apic_optout" can be used to
override the BIOS x2apic optout request.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.171766616@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21 10:21:50 +02:00
Linus Torvalds
1f0772fe27 Merge git://github.com/davem330/net
* git://github.com/davem330/net:
  tcp: fix validation of D-SACK
  tcp: fix build error if !CONFIG_SYN_COOKIES
2011-09-19 07:48:04 -07:00
Eric Dumazet
e05c82d366 tcp: fix build error if !CONFIG_SYN_COOKIES
commit 946cedccbd (tcp: Change possible SYN flooding messages)
added a build error if CONFIG_SYN_COOKIES=n

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-18 21:48:01 -04:00
Linus Torvalds
6bf3b0dc32 Merge branch 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6
* 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6:
  mfd: Fix omap-usb-host build failure
  mfd: Make omap-usb-host TLL mode work again
  mfd: Set MAX8997 irq pointer
  mfd: Fix initialisation of tps65910 interrupts
  mfd: Check for twl4030-madc NULL pointer
  mfd: Copy the device pointer to the twl4030-madc structure
  mfd: Rename wm8350 static gpio_set_debounce()
  mfd: Fix value of WM8994_CONFIGURE_GPIO
2011-09-18 18:18:55 -07:00
Linus Torvalds
b0e7031ac0 Merge git://github.com/davem330/net
* git://github.com/davem330/net: (62 commits)
  ipv6: don't use inetpeer to store metrics for routes.
  can: ti_hecc: include linux/io.h
  IRDA: Fix global type conflicts in net/irda/irsysctl.c v2
  net: Handle different key sizes between address families in flow cache
  net: Align AF-specific flowi structs to long
  ipv4: Fix fib_info->fib_metrics leak
  caif: fix a potential NULL dereference
  sctp: deal with multiple COOKIE_ECHO chunks
  ibmveth: Fix checksum offload failure handling
  ibmveth: Checksum offload is always disabled
  ibmveth: Fix issue with DMA mapping failure
  ibmveth: Fix DMA unmap error
  pch_gbe: support ML7831 IOH
  pch_gbe: added the process of FIFO over run error
  pch_gbe: fixed the issue which receives an unnecessary packet.
  sfc: Use 64-bit writes for TX push where possible
  Revert "sfc: Use write-combining to reduce TX latency" and follow-ups
  bnx2x: Fix ethtool advertisement
  bnx2x: Fix 578xx link LED
  bnx2x: Fix XMAC loopback test
  ...
2011-09-18 11:02:26 -07:00
dpward
aa1c366e4f net: Handle different key sizes between address families in flow cache
With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-16 17:47:28 -04:00
David Ward
728871bc05 net: Align AF-specific flowi structs to long
AF-specific flowi structs are now passed to flow_key_compare, which must
also be aligned to a long.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-16 17:45:07 -04:00
Max Matveev
d5ccd49660 sctp: deal with multiple COOKIE_ECHO chunks
Attempt to reduce the number of IP packets emitted in response to single
SCTP packet (2e3216cd) introduced a complication - if a packet contains
two COOKIE_ECHO chunks and nothing else then SCTP state machine corks the
socket while processing first COOKIE_ECHO and then loses the association
and forgets to uncork the socket. To deal with the issue add new SCTP
command which can be used to set association explictly. Use this new
command when processing second COOKIE_ECHO chunk to restore the context
for SCTP state machine.

Signed-off-by: Max Matveev <makc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-16 17:17:22 -04:00
David S. Miller
52b9aca7ae Merge branch 'master' of ../netdev/ 2011-09-16 01:09:02 -04:00
Michael S. Tsirkin
48c830120f net: copy userspace buffers on device forwarding
dev_forward_skb loops an skb back into host networking
stack which might hang on the memory indefinitely.
In particular, this can happen in macvtap in bridged mode.
Copy the userspace fragments to avoid blocking the
sender in that case.

As this patch makes skb_copy_ubufs extern now,
I also added some documentation and made it clear
the SKBTX_DEV_ZEROCOPY flag automatically instead
of doing it in all callers. This can be made into a separate
patch if people feel it's worth it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15 14:49:44 -04:00
Eric Dumazet
946cedccbd tcp: Change possible SYN flooding messages
"Possible SYN flooding on port xxxx " messages can fill logs on servers.

Change logic to log the message only once per listener, and add two new
SNMP counters to track :

TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client

TCPReqQFullDrop : number of times a SYN request was dropped because
syncookies were not enabled.

Based on a prior patch from Tom Herbert, and suggestions from David.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15 14:49:43 -04:00
Russell King
4f5b04800a drivers/gpio/gpio-generic.c: fix build errors
Building a kernel with hotplug disabled results in a link failure:

  `bgpio_remove' referenced in section `___ksymtab_gpl+bgpio_remove' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o

This is because of bgpio_remove() is exported.  It is illegal to export
symbols which are discarded either at link time or as part of an
init/exit section.

Fix this by dropping the __devexit attributation from bgpio_remove().
Also drop the __devinit attributation from bgpio_init().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-14 18:09:38 -07:00
Johannes Weiner
185efc0f9a memcg: Revert "memcg: add memory.vmscan_stat"
Revert the post-3.0 commit 82f9d486e5 ("memcg: add
memory.vmscan_stat").

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between.  Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.

Since this exports statistics to userspace, this may lead to confusion
and problems with changing things after the release, so revert it now,
we can try again later.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-14 18:09:38 -07:00
Randy Dunlap
bff747c58c regulator: fix kernel-doc warning in consumer.h
Fix kernel-doc warning about internal/private data by marking it
as "private:" so that kernel-doc will ignore it.

  Warning(include/linux/regulator/consumer.h:128): No description found for parameter 'ret'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-08 14:43:03 -07:00
Randy Dunlap
d2f152878d wireless: fix kernel-doc warning in net/cfg80211.h
Fix kernel-doc warning in net/cfg80211.h:

  Warning(include/net/cfg80211.h:1884): No description found for parameter 'registered'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-08 14:43:03 -07:00
Linus Torvalds
b0fb422281 Merge branch 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86, perf: Check that current->mm is alive before getting user callchain
  perf_event: Fix broken calc_timer_values()
  perf events: Fix slow and broken cgroup context switch code
2011-09-07 13:00:11 -07:00
Mark Brown
8efcc57ded mfd: Fix value of WM8994_CONFIGURE_GPIO
This needs to be an out of band value for the register and on this device
registers are 16 bit so we must shift left one to the 17th bit.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@kernel.org
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-09-06 16:37:58 +02:00
Jim Garlick
51b8b4fb32 fs/9p: Use protocol-defined value for lock/getlock 'type' field.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2011-09-06 08:17:16 -05:00
Aneesh Kumar K.V
f88657ce3f fs/9p: Add OS dependent open flags in 9p protocol
Some of the flags are OS/arch dependent we add a 9p
protocol value which maps to asm-generic/fcntl.h values in Linux
Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2011-09-06 08:17:15 -05:00
Maciej Żenczykowski
ec0506dbe4 net: relax PKTINFO non local ipv6 udp xmit check
Allow transparent sockets to be less restrictive about
the source ip of ipv6 udp packets being sent.

Google-Bug-Id: 5018138
Signed-off-by: Maciej Żenczykowski <maze@google.com>
CC: "Erik Kline" <ek@google.com>
CC: "Lorenzo Colitti" <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-30 17:39:01 -04:00
Linus Torvalds
0f43dd546d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (42 commits)
  netpoll: fix incorrect access to skb data in __netpoll_rx
  cassini: init before use in cas_interruptN.
  can: ti_hecc: Fix uninitialized spinlock in probe
  can: ti_hecc: Fix unintialized variable
  net: sh_eth: fix the compile error
  net/phy: fix DP83865 phy interrupt handler
  sendmmsg/sendmsg: fix unsafe user pointer access
  ibmveth: Fix leak when recycling skb and hypervisor returns error
  arp: fix rcu lockdep splat in arp_process()
  bridge: fix a possible use after free
  bridge: Pseudo-header required for the checksum of ICMPv6
  mcast: Fix source address selection for multicast listener report
  MAINTAINERS: Update GIT trees for network development
  ath9k: Fix PS wrappers in ath9k_set_coverage_class
  carl9170: Fix mismatch in carl9170_op_set_key mutex lock-unlock
  wl12xx: add max_sched_scan_ssids value to the hw description
  wl12xx: Fix validation of pm_runtime_get_sync return value
  wl12xx: Remove obsolete testmode NVS push command
  bcma: add uevent to the bus, to autoload drivers
  ath9k_hw: Fix STA (AR9485) bringup issue due to incorrect MAC address
  ...
2011-08-29 13:38:29 -07:00
Stephane Eranian
a8d757ef07 perf events: Fix slow and broken cgroup context switch code
The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10684.51 ctxsw/s

Now start a cgroup perf stat:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

$ ./pong
 Both processes pinned to CPU1, running for 10s
 6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 Performance counter stats for 'sleep 100':

 CPU1 <not counted> cycles   test
 CPU1 16,984,189,138 cycles  #    0.000 GHz

The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10775.30 ctxsw/s

Start perf stat with cgroup:

 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

Run pong outside the cgroup:
 $ /pong
 Both processes pinned to CPU1, running for 10s
 10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 <not counted> cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

Now perf stat reports the correct counts for
for the non cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 22,297,726,205 cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

      10.001457237 seconds time elapsed

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-29 12:28:33 +02:00
NeilBrown
f5b9409973 All Arch: remove linkage for sys_nfsservctl system call
The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-26 15:09:58 -07:00
Linus Torvalds
efe45ab1ee Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  omap-serial: Allow IXON and IXOFF to be disabled.
  TTY: serial, document ignoring of uart->ops->startup error
  TTY: pty, fix pty counting
  8250: Fix race condition in serial8250_backup_timeout().
  serial/8250_pci: delete duplicate data definition
  8250_pci: add support for Rosewill RC-305 4x serial port card
  tty: Add "spi:" prefix for spi modalias
  atmel_serial: fix atmel_default_console_device
  serial: 8250_pnp: add Intermec CV60 touchscreen device
  drivers/serial/ucc_uart.c: Fix compiler warning
  pch_uart: Set PCIe bus number using probe parameter
  serial: samsung: Fix build error
2011-08-26 13:06:06 -07:00
Linus Torvalds
3ab47029d9 Merge branch 'driver-core-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* 'driver-core-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  drivers:misc: ti-st: fix unexpected UART close
  drivers:misc: ti-st: free skb on firmware download
  drivers:misc: ti-st: wait for completion at fail
  drivers:misc: ti-st: reinit completion before send
  drivers:misc: ti-st: fail-safe on wrong pkt type
  drivers:misc: ti-st: reinit completion on ver read
  drivers:misc:ti-st: platform hooks for chip states
  drivers:misc: ti-st: avoid a misleading dbg msg
  base/devres.c: quiet sparse noise about context imbalance
  pti: add missing CONFIG_PCI dependency
  drivers/base/devtmpfs.c: correct annotation of `setup_done'
  driver core: fix kernel-doc warning in platform.c
  firmware: fix google/gsmi.c build warning
2011-08-26 13:05:09 -07:00
John W. Linville
e2e6be56df Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2011-08-26 10:33:51 -04:00
Dilan Lee
cc7993f643 backlight: add a callback 'notify_after' for backlight control
We need a callback to do some things after pwm_enable, pwm_disable
and pwm_config.

Signed-off-by: Dilan Lee <dilee@nvidia.com>
Reviewed-by: Robert Morell <rmorell@nvidia.com>
Reviewed-by: Arun Murthy <arun.murthy@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:34 -07:00
Alexandre Bounine
284fb68d00 rapidio: fix use of non-compatible registers
Replace/remove use of RIO v.1.2 registers/bits that are not
forward-compatible with newer versions of RapidIO specification.

RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.

Use of removed (since RIO v.1.3) register bits affects users of
currently available 1.3 and 2.x compliant devices who may use not so
recent kernel versions.

Removing checks for unsupported bits makes corresponding routines
compatible with all versions of RapidIO specification.  Therefore,
backporting makes stable kernel versions compliant with RIO v.1.3 and
later as well.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Chul Kim <chul.kim@idt.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:34 -07:00
Evgeniy Polyakov
a801876638 MAINTAINERS: Evgeniy has moved
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:33 -07:00
Josh Boyer
e096d0c7e2 lockdep: Add helper function for dir vs file i_mutex annotation
Purely in-memory filesystems do not use the inode hash as the dcache
tells us if an entry already exists.  As a result, they do not call
unlock_new_inode, and thus directory inodes do not get put into a
different lockdep class for i_sem.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes.  Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm->mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

    find/645 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac

    but task is already holding lock:
     (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
    vfs_readdir+0x5b/0xb4

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
          [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
          [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
          [<ffffffff81111557>] mmap_region+0x258/0x432
          [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
          [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
          [<ffffffff8100c858>] sys_mmap+0x22/0x24
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

    -> #0 (&mm->mmap_sem){++++++}:
          [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff81109541>] might_fault+0x89/0xac
          [<ffffffff81149cff>] filldir+0x6f/0xc7
          [<ffffffff811586ea>] dcache_readdir+0x67/0x205
          [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
          [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 10:50:18 -07:00
Linus Torvalds
e33f2d238e Merge branch 'urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback
* 'urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback:
  squeeze max-pause area and drop pass-good area
2011-08-25 10:40:12 -07:00
Linus Torvalds
f385b6974b Merge branch '3.1-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* '3.1-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (21 commits)
  target: Convert acl_node_lock to be IRQ-disabling
  target: Make locking in transport_deregister_session() IRQ safe
  tcm_fc: init/exit functions should not be protected by "#ifdef MODULE"
  target: Print subpage too for unhandled MODE SENSE pages
  iscsi-target: Fix iscsit_allocate_se_cmd_for_tmr failure path bugs
  iscsi-target: Implement iSCSI target IPv6 address printing.
  target: Fix task SGL chaining breakage with transport_allocate_data_tasks
  target: Fix task count > 1 handling breakage and use max_sector page alignment
  target: Add missing DATA_SG_IO transport_cmd_get_valid_sectors check
  target: Fix SYNCHRONIZE_CACHE zero LBA + range breakage
  target: Remove duplicate task completions in transport_emulate_control_cdb
  target: Fix WRITE_SAME usage with transport_get_size
  target: Add WRITE_SAME (10) parsing and refactor passthrough checks
  target: Fix write payload exception handling with ->new_cmd_map
  iscsi-target: forever loop bug in iscsit_attach_ooo_cmdsn()
  iscsi-target: remove duplicate return
  target: Convert target_core_rd.c to use use BUG_ON
  iscsi-target: Fix leak on failure in iscsi_copy_param_list()
  target: Use ERR_CAST inlined function
  target: Make standard INQUIRY return 'not connected' for tpg_virt_lun0
  ...
2011-08-25 10:30:51 -07:00
Andi Kleen
be27425dcc Add a personality to report 2.6.x version numbers
I ran into a couple of programs which broke with the new Linux 3.0
version.  Some of those were binary only.  I tried to use LD_PRELOAD to
work around it, but it was quite difficult and in one case impossible
because of a mix of 32bit and 64bit executables.

For example, all kind of management software from HP doesnt work, unless
we pretend to run a 2.6 kernel.

  $ uname -a
  Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux

  $ hpacucli ctrl all show

  Error: No controllers detected.

  $ rpm -qf /usr/sbin/hpacucli
  hpacucli-8.75-12.0

Another notable case is that Python now reports "linux3" from
sys.platform(); which in turn can break things that were checking
sys.platform() == "linux2":

  https://bugzilla.mozilla.org/show_bug.cgi?id=664564

It seems pretty clear to me though it's a bug in the apps that are using
'==' instead of .startswith(), but this allows us to unbreak broken
programs.

This patch adds a UNAME26 personality that makes the kernel report a
2.6.40+x version number instead.  The x is the x in 3.x.

I know this is somewhat ugly, but I didn't find a better workaround, and
compatibility to existing programs is important.

Some programs also read /proc/sys/kernel/osrelease.  This can be worked
around in user space with mount --bind (and a mount namespace)

To use:

  wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
  gcc -o uname26 uname26.c
  ./uname26 program

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 10:17:28 -07:00
Linus Torvalds
051732bcbe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
  fuse: mark pages accessed when written to
  fuse: delete dead .write_begin and .write_end aops
  fuse: fix flock
  fuse: fix non-ANSI void function notation
2011-08-24 09:14:42 -07:00
Jiri Slaby
24d406a6bf TTY: pty, fix pty counting
tty_operations->remove is normally called like:
queue_release_one_tty
 ->tty_shutdown
   ->tty_driver_remove_tty
     ->tty_operations->remove

However tty_shutdown() is called from queue_release_one_tty() only if
tty_operations->shutdown is NULL. But for pty, it is not.
pty_unix98_shutdown() is used there as ->shutdown.

So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
called. This results in invalid pty_count. I.e. what can be seen in
/proc/sys/kernel/pty/nr.

I see this was already reported at:
  https://lkml.org/lkml/2009/11/5/370
But it was not fixed since then.

This patch is kind of a hackish way. The problem lies in ->install. We
allocate there another tty (so-called tty->link). So ->install is
called once, but ->remove twice, for both tty and tty->link. The fix
here is to count both tty and tty->link and divide the count by 2 for
user.

And to have ->remove called, let's make tty_driver_remove_tty() global
and call that from pty_unix98_shutdown() (tty_operations->shutdown).

While at it, let's document that when ->shutdown is defined,
tty_shutdown() is not called.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-23 10:10:38 -07:00
Pavan Savoy
0d7c5f2572 drivers:misc:ti-st: platform hooks for chip states
Certain platform specific or Host-WiLink Interface specific actions would be
required to be taken when the chip is being enabled and after the chip is
disabled such as configuration of the mux modes for the GPIO of host connected
to the nshutdown of the chip or relinquishing UART after the chip is disabled.

Similar actions can also be taken when the chip is in deep sleep or when the
chip is awake. Performance enhancements such as configuring the host to run
faster when chip is awake and slower when chip is asleep can also be made
here.

Signed-off-by: Pavan Savoy <pavan_savoy@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22 14:13:32 -07:00
Nicholas Bellinger
052605c6ca target: Make standard INQUIRY return 'not connected' for tpg_virt_lun0
This patch changes target_emulate_inquiry_std() to set the 'not connected'
(0x35) bit in standard INQUIRY response data when we are processing a
request to a virtual LUN=0 mapping from struct se_device *g_lun0_dev that
have been setup for us in transport_lookup_cmd_lun().

This addresses an issue where qla2xxx FC clients need to be able
to create demo-mode I_T FC Nexuses by default, but should not be
exposing the default set of TPG LUNs to all FC clients.  This includes
adding an new optional target_core_fabric_ops->tpg_check_demo_mode_login_only()
caller to allow demo_mode nexuses to skip the old default of bulding
a demo-mode MappedLUNs list via core_tpg_add_node_to_devs().

(roland: Add missing tpg_check_demo_mode_login_only check in core_dev_add_lun)

Reported-by: Roland Dreier <roland@purestorage.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@risingtidesystems.com>
2011-08-22 19:25:35 +00:00
Stanislaw Gruszka
ecb4433550 mac80211: fix suspend/resume races with unregister hw
Do not call ->suspend, ->resume methods after we unregister wiphy. Also
delete sta_clanup timer after we finish wiphy unregister to avoid this:

WARNING: at lib/debugobjects.c:262 debug_print_object+0x85/0xa0()
Hardware name: 6369CTO
ODEBUG: free active (active state 0) object type: timer_list hint: sta_info_cleanup+0x0/0x180 [mac80211]
Modules linked in: aes_i586 aes_generic fuse bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq mperf ext2 dm_mod uinput thinkpad_acpi hwmon sg arc4 rt2800usb rt2800lib crc_ccitt rt2x00usb rt2x00lib mac80211 cfg80211 i2c_i801 iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom yenta_socket ahci libahci pata_acpi ata_generic ata_piix i915 drm_kms_helper drm i2c_algo_bit video [last unloaded: microcode]
Pid: 5663, comm: pm-hibernate Not tainted 3.1.0-rc1-wl+ #19
Call Trace:
 [<c0454cfd>] warn_slowpath_common+0x6d/0xa0
 [<c05e05e5>] ? debug_print_object+0x85/0xa0
 [<c05e05e5>] ? debug_print_object+0x85/0xa0
 [<c0454dae>] warn_slowpath_fmt+0x2e/0x30
 [<c05e05e5>] debug_print_object+0x85/0xa0
 [<f8a808e0>] ? sta_info_alloc+0x1a0/0x1a0 [mac80211]
 [<c05e0bd2>] debug_check_no_obj_freed+0xe2/0x180
 [<c051175b>] kfree+0x8b/0x150
 [<f8a126ae>] cfg80211_dev_free+0x7e/0x90 [cfg80211]
 [<f8a13afd>] wiphy_dev_release+0xd/0x10 [cfg80211]
 [<c068d959>] device_release+0x19/0x80
 [<c05d06ba>] kobject_release+0x7a/0x1c0
 [<c07646a8>] ? rtnl_unlock+0x8/0x10
 [<f8a13adb>] ? wiphy_resume+0x6b/0x80 [cfg80211]
 [<c05d0640>] ? kobject_del+0x30/0x30
 [<c05d1a6d>] kref_put+0x2d/0x60
 [<c05d056d>] kobject_put+0x1d/0x50
 [<c08015f4>] ? mutex_lock+0x14/0x40
 [<c068d60f>] put_device+0xf/0x20
 [<c069716a>] dpm_resume+0xca/0x160
 [<c04912bd>] hibernation_snapshot+0xcd/0x260
 [<c04903df>] ? freeze_processes+0x3f/0x90
 [<c049151b>] hibernate+0xcb/0x1e0
 [<c048fdc0>] ? pm_async_store+0x40/0x40
 [<c048fe60>] state_store+0xa0/0xb0
 [<c048fdc0>] ? pm_async_store+0x40/0x40
 [<c05d0200>] kobj_attr_store+0x20/0x30
 [<c0575ea4>] sysfs_write_file+0x94/0xf0
 [<c051e26a>] vfs_write+0x9a/0x160
 [<c0575e10>] ? sysfs_open_file+0x200/0x200
 [<c051e3fd>] sys_write+0x3d/0x70
 [<c080959f>] sysenter_do_call+0x12/0x28

Cc: stable@kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-22 14:21:40 -04:00
Linus Torvalds
5ccc38740a Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
  Revert "cfq: Remove special treatment for metadata rqs."
  block: fix flush machinery for stacking drivers with differring flush flags
  block: improve rq_affinity placement
  blktrace: add FLUSH/FUA support
  Move some REQ flags to the common bio/request area
  allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
  xen/blkback: Make description more obvious.
  cfq-iosched: Add documentation about idling
  block: Make rq_affinity = 1 work as expected
  block: swim3: fix unterminated of_device_id table
  block/genhd.c: remove useless cast in diskstats_show()
  drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
  drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
  bsg-lib: add module.h include
  cfq-iosched: Reduce linked group count upon group destruction
  blk-throttle: correctly determine sync bio
  loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
  loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
  loop: add management interface for on-demand device allocation
  loop: replace linked list of allocated devices with an idr index
  ...
2011-08-19 10:47:07 -07:00
Linus Torvalds
0c3bef6128 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: OF: Don't crash when bridge parent is NULL.
  PCI: export pcie_bus_configure_settings symbol
  PCI: code and comments cleanup
  PCI: make cardbus-bridge resources optional
  PCI: make SRIOV resources optional
  PCI : ability to relocate assigned pci-resources
  PCI: honor child buses add_size in hot plug configuration
  PCI: Set PCI-E Max Payload Size on fabric
2011-08-19 10:02:37 -07:00
Wu Fengguang
bb0822954a squeeze max-pause area and drop pass-good area
Revert the pass-good area introduced in ffd1f609ab ("writeback:
introduce max-pause and pass-good dirty limits") and make the max-pause
area smaller and safe.

This fixes ~30% performance regression in the ext3 data=writeback
fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.

Using deadline scheduler also has a regression, but not that big as CFQ,
so this suggests we have some write starvation.

The test logs show that

- the disks are sometimes under utilized

- global dirty pages sometimes rush high to the pass-good area for
  several hundred seconds, while in the mean time some bdi dirty pages
  drop to very low value (bdi_dirty << bdi_thresh).  Then suddenly the
  global dirty pages dropped under global dirty threshold and bdi_dirty
  rush very high (for example, 2 times higher than bdi_thresh). During
  which time balance_dirty_pages() is not called at all.

So the problems are

1) The random writes progress so slow that they break the assumption of
   the max-pause logic that "8 pages per 200ms is typically more than
   enough to curb heavy dirtiers".

2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility
   for some bdi's to over dirty pages, leading to (bdi_dirty >> bdi_thresh)
   and then (bdi_thresh >> bdi_dirty) for others.

3) The higher max-pause/pass-good thresholds somehow leads to the bad
   swing of dirty pages.

The fix is to allow the task to slightly dirty over task_bdi_thresh, but
no way to exceed bdi_dirty and/or global dirty_thresh.

Tests show that it fixed the JBOD regression completely (both behavior
and performance), while still being able to cut down large pause times
in balance_dirty_pages() for single-disk cases.

Reported-by: Li Shaohua <shaohua.li@intel.com>
Tested-by: Li Shaohua <shaohua.li@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-08-19 22:42:07 +08:00
Linus Torvalds
b4fd4ae6c6 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
2011-08-17 13:15:25 -07:00
Ian Campbell
aa462abe8a mm: fix __page_to_pfn for a const struct page argument
This allows the cast in lowmem_page_address (introduced as a warning
fixup to 33dd4e0ec9 "mm: make some struct page's const") to be
removed.

Propagate const'ness to page_to_section() as well since it is required
by __page_to_pfn.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:00:20 -07:00
Ian Campbell
f991879473 mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const.
Followup to 33dd4e0ec9 "mm: make some struct page's const" which missed the
HASHED_PAGE_VIRTUAL case.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:00:20 -07:00
Linus Torvalds
6cac952960 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rtc: Limit RTC PIE frequency
  rtc: Fix hrtimer deadlock
  rtc: Handle errors correctly in rtc_irq_set_state()

Fixup trivial conflicts in drivers/rtc/interface.c due to slightly
trivially versions of the same patch coming in two different ways.
2011-08-17 10:28:33 -07:00
Linus Torvalds
950d0a10d1 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq: Track the owner of irq descriptor
  irq: Always set IRQF_ONESHOT if no primary handler is specified
  genirq: Fix wrong bit operation
2011-08-17 10:23:50 -07:00
Jeff Moyer
4853abaae7 block: fix flush machinery for stacking drivers with differring flush flags
Commit ae1b153962, block: reimplement
FLUSH/FUA to support merge, introduced a performance regression when
running any sort of fsyncing workload using dm-multipath and certain
storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
that dm-multipath always advertised flush+fua support, and passed
commands on down the stack, where those flags used to get stripped off.
The above commit changed that behavior:

static inline struct request *__elv_next_request(struct request_queue *q)
{
        struct request *rq;

        while (1) {
-               while (!list_empty(&q->queue_head)) {
+               if (!list_empty(&q->queue_head)) {
                        rq = list_entry_rq(q->queue_head.next);
-                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
-                           (rq->cmd_flags & REQ_FLUSH_SEQ))
-                               return rq;
-                       rq = blk_do_flush(q, rq);
-                       if (rq)
-                               return rq;
+                       return rq;
                }

Note that previously, a command would come in here, have
REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

struct request *blk_do_flush(struct request_queue *q, struct request *rq)
{
        unsigned int fflags = q->flush_flags; /* may change, cache it */
        bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
        bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
        bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
        REQ_FUA);
        unsigned skip = 0;
...
        if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                rq->cmd_flags &= ~REQ_FLUSH;
		if (!has_fua)
			rq->cmd_flags &= ~REQ_FUA;
	        return rq;
	}

So, the flush machinery was bypassed in such cases (q->flush_flags == 0
&& rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

Now, however, we don't get into the flush machinery at all.  Instead,
__elv_next_request just hands a request with flush and fua bits set to
the scsi_request_fn, even if the underlying request_queue does not
support flush or fua.

The agreed upon approach is to fix the flush machinery to allow
stacking.  While this isn't used in practice (since there is only one
request-based dm target, and that target will now reflect the flush
flags of the underlying device), it does future-proof the solution, and
make it function as designed.

In order to make this work, I had to add a field to the struct request,
inside the flush structure (to store the original req->end_io).  Shaohua
had suggested overloading the union with rb_node and completion_data,
but the completion data is used by device mapper and can also be used by
other drivers.  So, I didn't see a way around the additional field.

I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
the lost performance.  Comments and other testers, as always, are
appreciated.

Cheers,
Jeff

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-15 21:37:25 +02:00
Linus Torvalds
97c24d1d45 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: remove unused "ddr" parameter in struct mmc_ios
  mmc: dw_mmc: Fix DDR mode support.
  mmc: core: use defined R1_STATE_PRG macro for card status
  mmc: sdhci: use f_max instead of host->clock for timeouts
  mmc: sdhci: move timeout_clk calculation farther down
  mmc: sdhci: check host->clock before using it as a denominator
  mmc: Revert "mmc: sdhci: Fix SDHCI_QUIRK_TIMEOUT_USES_SDCLK"
  mmc: tmio: eliminate unused variable 'mmc' warning
  mmc: esdhc-imx: fix card interrupt loss on freescale eSDHC
  mmc: sdhci-s3c: Fix build for header change
  mmc: dw_mmc: Fix mask in IDMAC_SET_BUFFER1_SIZE macro
  mmc: cb710: fix possible pci_dev leak in cb710_pci_configure()
  mmc: core: Detect eMMC v4.5 ext_csd entries
  mmc: mmc_test: avoid stalled file in debugfs
  mmc: sdhci-s3c: add BROKEN_ADMA_ZEROLEN_DESC quirk
  mmc: sdhci: pxav3: controller needs 32 bit ADMA addressing
  mmc: sdhci: fix retuning timer wrongly deleted in sdhci_tasklet_finish
2011-08-14 12:28:15 -07:00
Rafael J. Wysocki
17f2ae7f67 PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
Function genpd_queue_power_off_work() is not defined for
CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build
error to happen in that case.  Fix the problem by making
pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-14 13:34:31 +02:00