Commit Graph

46461 Commits

Author SHA1 Message Date
Mark Brown
19940b3d55 ASoC: Ensure we get an impedence reported for WM8958 jack detect
Occasionally we may see an accessory reported before we have a stable
impedance for the accessory. If this happens then reread the status in
order to ensure that the handler can take the appropriate action for the
status change.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-11-04 23:24:59 +00:00
Dan Carpenter
589665f5a6 bonding: comparing a u8 with -1 is always false
slave->duplex is a u8 type so the in bond_info_show_slave() when we
check "if (slave->duplex == -1)", it's always false.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-04 18:37:23 -04:00
Johannes Berg
4b6cc7284d netlink: clarify attribute length check documentation
The documentation for how the length of attributes
is checked is wrong ("Exact length" isn't true, the
policy checks are for "minimum length") and a bit
misleading. Make it more complete and explain what
really happens.

Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-04 17:48:23 -04:00
Oleg Nesterov
6f35c4abd7 PM / Freezer: Reimplement wait_event_freezekillable using freezer_do_not_count/freezer_count
Commit 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake
TASK_KILLABLE tasks too" updated fake_signal_wake_up() used by freezer
to wake up KILLABLE tasks.  Sending unsolicited wakeups to tasks in
killable sleep is dangerous as there are code paths which depend on
tasks not waking up spuriously from KILLABLE sleep.

For example. sys_read() or page can sleep in TASK_KILLABLE assuming
that wait/down/whatever _killable can only fail if we can not return
to the usermode.  TASK_TRACED is another obvious example.

The offending commit was to resolve freezer hang during system PM
operations caused by KILLABLE sleeps in network filesystems.
wait_event_freezekillable(), which depends on the spurious KILLABLE
wakeup, was added by f06ac72e92 "cifs, freezer: add
wait_event_freezekillable and have cifs use it" to be used to
implement killable & freezable sleeps in network filesystems.

To prepare for reverting of 27920651fe, this patch reimplements
wait_event_freezekillable() using freezer_do_not_count/freezer_count()
so that it doesn't depend on the spurious KILLABLE wakeup.  This isn't
very nice but should do for now.

[tj: Refreshed patch to apply to linus/master and updated commit
    description on Rafael's request.]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-04 22:28:15 +01:00
Tony Lindgren
a96d69d1b0 PM / OPP: Fix build when CONFIG_PM_OPP is not set
Commit 03ca370fbf (PM / OPP: Add
OPP availability change notifier) does not compile if CONFIG_PM_OPP
is not set:

arch/arm/plat-omap/omap-pm-noop.o: In function `opp_get_notifier':
include/linux/opp.h:103: multiple definition of `opp_get_notifier'
include/linux/opp.h:103: first defined here

Also fix incorrect comment.

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-04 22:28:13 +01:00
Srivatsa S. Bhat
4e71c9545b PM / Sleep: Remove unused symbol 'suspend_cpu_hotplug'
Remove the suspend_cpu_hotplug declaration, which doesn't correspond
to an existing variable.

[rjw: Added the changelog.]

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-04 22:28:09 +01:00
David S. Miller
39b02648d2 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2011-11-04 17:14:34 -04:00
Jeff Layton
1788ea6e3b nfs: when attempting to open a directory, fall back on normal lookup (try #5)
commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return
from an OPEN call. Prior to that patch, that caused the client to fall
back to doing a normal lookup. When that patch went in, the code began
returning that error to userspace. The d_revalidate codepath however
never had the corresponding change, so it was still possible to end up
with a NULL ctx->state pointer after that.

That patch caused a regression. When we attempt to open a directory that
does not have a cached dentry, that open now errors out with EISDIR. If
you attempt the same open with a cached dentry, it will succeed.

Fix this by reverting the change in nfs_atomic_lookup and allowing
attempts to open directories to fall back to a normal lookup

Also, add a NFSv4-specific f_ops->open routine that just returns
-ENOTDIR. This should never be called if things are working properly,
but if it ever is, then the dprintk may help in debugging.

To facilitate this, a new file_operations field is also added to the
nfs_rpc_ops struct.

Cc: stable@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-04 16:39:04 -04:00
Linus Torvalds
6736c04799 Merge branch 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (25 commits)
  nfs: set vs_hidden on nfs4_callback_version4 (try #2)
  pnfs-obj: Support for RAID5 read-4-write interface.
  pnfs-obj: move to ore 03: Remove old raid engine
  pnfs-obj: move to ore 02: move to ORE
  pnfs-obj: move to ore 01: ore_layout & ore_components
  pnfs-obj: Rename objlayout_io_state => objlayout_io_res
  pnfs-obj: Get rid of objlayout_{alloc,free}_io_state
  pnfs-obj: Return PNFS_NOT_ATTEMPTED in case of read/write_pagelist
  pnfs-obj: Remove redundant EOF from objlayout_io_state
  nfs: Remove unused variable from write.c
  nfs: Fix unused variable warning from file.c
  NFS: Remove no-op less-than-zero checks on unsigned variables.
  NFS: Clean up nfs4_xdr_dec_secinfo()
  NFS: Fix documenting comment for nfs_create_request()
  NFS4: fix cb_recallany decode error
  nfs4: serialize layoutcommit
  SUNRPC: remove rpcbind clients destruction on module cleanup
  SUNRPC: remove rpcbind clients creation during service registering
  NFSd: call svc rpcbind cleanup explicitly
  SUNRPC: cleanup service destruction
  ...
2011-11-04 12:27:43 -07:00
John W. Linville
22097fd297 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2011-11-04 14:46:34 -04:00
Johannes Berg
54d5026e7c mac80211: warn only once about not finding a rate
The warning really shouldn't happen, but until we
find the reason why it does don't spew it all the
time, just once is enough to know we've hit it.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-11-04 13:38:06 -04:00
Kuninori Morimoto
dd2c0ca1b1 sh: clkfwk: add clk_rate_mult_range_round()
This provides a clk_rate_mult_range_round() helper for use by some of the
CPG PLL ranged multipliers, following the same approach as used by the
div ranges.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-11-05 00:49:27 +09:00
Linus Torvalds
1046a2c428 Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (144 commits)
  [media] saa7134.h: Suppress compiler warnings when CONFIG_VIDEO_SAA7134_RC is not set
  [media] it913x [VER 1.07] Support for single ITE 9135 devices
  [media] Support for Terratec G1
  [media] cx25821: off by one in cx25821_vidioc_s_input()
  [media] media: tea5764: reconcile Kconfig symbol and macro
  [media] omap_vout: Add poll() support
  [media] omap3isp: preview: Add crop support on the sink pad
  [media] omap3isp: preview: Rename min/max input/output sizes defines
  [media] omap3isp: preview: Remove horizontal averager support
  [media] omap3isp: Report the ISP revision through the media controller API
  [media] omap3isp: ccdc: remove redundant operation
  [media] omap3isp: Fix memory leaks in initialization error paths
  [media] omap3isp: Add missing mutex_destroy() calls
  [media] omap3isp: Move *_init_entities() functions to the init/cleanup section
  [media] omap3isp: Move media_entity_cleanup() from unregister() to cleanup()
  [media] MFC: Change MFC firmware binary name
  [media] vb2: add vb2_get_unmapped_area in vb2 core
  [media] v4l: Add v4l2 subdev driver for S5K6AAFX sensor
  [media] v4l: Add AUTO option for the V4L2_CID_POWER_LINE_FREQUENCY control
  [media] media: ov6650: stylistic improvements
  ...
2011-11-04 07:58:25 -07:00
Linus Torvalds
6b1506c668 Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  dt: add empty of_machine_is_compatible
  ahci: add DT binding for Calxeda AHCI controller
  dt/platform: minor cleanup
  dt: add empty of_alias_get_id() for non-dt builds
2011-11-04 07:49:29 -07:00
Phil Edworthy
3af1f8a41f serial: sh-sci: Fix up SH-2A SCIF support.
This fixes up support for SH-2(A) SCIFs by introducing a new regtype. As
expected, it's close to the SH-4A SCIF with fifodata, but still different
enough to warrant its own type.

Fixes up a number of FIFO overflows and similar for both SH7203/SH7264.

Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Tested-by: Federico Fuga <fuga@studiofuga.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-11-04 22:33:30 +09:00
Christoph Hellwig
d29a5b6acc target: remove SCF_EMULATE_CDB_ASYNC
All ->execute_task instances now need to complete the I/O explicitly,
which can either happen synchronously or asynchronously.

Note that a lot of the CDB emulations appear to return success even if
some lowlevel operations failed.  Given that this is an existing issue
this patch doesn't change that fact.

(nab: Adding missing switch breaks in PR-IN + PR_OUT)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-11-04 10:43:35 +00:00
Christoph Hellwig
e76a35d6c8 target: pass the se_task to the CDB emulation callback
We want to be able to handle all CDBs through it and remove hacks like
always using the first task in a CDB in target_report_luns.

Also rename the callback to ->execute_task to better describe its use.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-11-04 08:00:17 +00:00
Nicholas Bellinger
a17f091d1a target: Add generic active I/O shutdown logic
This patch adds the initial pieces of generic active I/O shutdown logic.
This is intended to be a 'opt-in' feature for fabric modules that
includes the following functions to provide a mechinism for fabric
modules to track se_cmd via se_session->sess_cmd_list:

*) target_get_sess_cmd() - Add se_cmd to sess->sess_cmd_list, called
   from fabric module incoming I/O path.
*) target_put_sess_cmd() - Check for completion or drop se_cmd from
   ->sess_cmd_list
*) target_splice_sess_cmd_list() - Splice active I/O list from
   ->sess_cmd_list to ->sess_wait_list, can called with HW fabric
   lock held.
*) target_wait_for_sess_cmds() - Walk ->sess_wait_list waiting on
   individual ->cmd_wait_comp.  Optional transport_wait_for_tasks()
   call.

target_splice_sess_cmd_list() is allowed to be called under HW fabric
lock, and performs the splice into se_sess->sess_wait_list and set
se_cmd->cmd_wait_set.  Then target_wait_for_sess_cmds() walks the list
waiting for individual target_put_sess_cmd() fabric callbacks to
complete.

It also adds TFO->check_release_cmd() to split the completion and memory
release calls, where a fabric module uses target_put_sess_cmd() to check
for I/O completion during session shutdown.  This is currently pushed out
into fabric modules as current fabric code may sleep here waiting for
TFO->check_stop_free() to complete in main response path, and because
target_wait_for_sess_cmds() calling TFO->release_cmd() to free fabric
descriptor memory directly.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
2011-11-04 07:50:26 +00:00
Stephen Warren
50e07f888c dt: add empty of_machine_is_compatible
The patch adds an empty function for non-dt build, so that
drivers migrating to dt can save some '#ifdef CONFIG_OF'.

v3: New patch

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-11-04 00:12:51 -04:00
Linus Torvalds
6dbbd92522 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
  be2net: Add detect UE feature for Lancer
  be2net: Prevent CQ full condition for Lancer
  be2net: Fix disabling multicast promiscous mode
  be2net: Fix endian issue in RX filter command
  af_packet: de-inline some helper functions
  MAINTAINERS: Add can-gw include to maintained files
  net: Add back alignment for size for __alloc_skb
  net: add missing bh_unlock_sock() calls
  l2tp: fix race in l2tp_recv_dequeue()
  ixgbevf: Update release version
  ixgbe: DCB, return max for IEEE traffic classes
  ixgbe: fix reading of the buffer returned by the firmware
  ixgbe: Fix compiler warnings
  ixgbe: fix smatch splat due to missing NULL check
  ixgbe: fix disabling of Tx laser at probe
  ixgbe: Fix link issues caused by a reset while interface is down
  igb: Fix for I347AT4 PHY cable length unit detection
  e100: make sure vlan support isn't advertised on old adapters
  e1000e: demote a debugging WARN to a debug log message
  net: fix typo in drivers/net/ethernet/xilinx/ll_temac_main.c
  ...
2011-11-03 21:05:43 -07:00
Grant Likely
3983138c01 Merge branch 'for-grant' of git://sources.calxeda.com/kernel/linux into devicetree/merge 2011-11-03 23:32:20 -04:00
Scott Jiang
6f524ec156 [media] vb2: add vb2_get_unmapped_area in vb2 core
no mmu system needs get_unmapped_area file operations to do mmap

Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:29:13 -02:00
Sylwester Nawrocki
bfa8dd3a05 [media] v4l: Add v4l2 subdev driver for S5K6AAFX sensor
This driver exposes preview mode operation of the S5K6AAFX sensor with
embedded SoC ISP. The native capture (snapshot) operation mode is not
supported.
Following controls are available:
 manual/auto exposure and gain, power line frequency (anti-flicker),
 saturation, sharpness, brightness, contrast, white balance temperature,
 color effects, horizontal/vertical image flip, frame interval,
 auto white balance.
RGB component gains are currently exposed through private controls.

Reviewed-by: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:29:11 -02:00
Sylwester Nawrocki
d26a6635b2 [media] v4l: Add AUTO option for the V4L2_CID_POWER_LINE_FREQUENCY control
V4L2_CID_POWER_LINE_FREQUENCY control allows applications to instruct
a driver what is the power line frequency so an appropriate filter
can be used by the device to cancel flicker by compensating the light
intensity ripple. Currently in the menu we have entries for 50 Hz and
60 Hz and for entirely disabling the anti-flicker filter.
However some devices are capable of automatically detecting the
frequency, so add V4L2_CID_POWER_LINE_FREQUENCY_AUTO entry for them.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:29:09 -02:00
Guennadi Liakhovetski
2f0babb7e4 [media] V4L: soc-camera: make (almost) all client drivers re-usable outside of the framework
The most important change in this patch is direct linking to struct
soc_camera_link via the client->dev.platform_data pointer. This makes most
of the soc-camera client drivers also usable outside of the soc-camera
framework. After this change all what is needed for these drivers to
function are inclusions of soc-camera headers for some convenience macros,
suitably configured platform data, which is anyway always required, and
loaded soc-camera core module for library functions. If desired, these
library functions can be made generic in the future and moved to a more
neutral location.

The only two client drivers, that still depend on soc-camera are:

mt9t031: it uses struct video_device for its PM. Since no hardware is
available, alternative methods cannot be tested.

ov6650: it uses struct soc_camera_device to pass its sense data back to
the bridge driver. A generic v4l2-subdevice approach should be developed
to perform this.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:29:03 -02:00
Guennadi Liakhovetski
1a99b972a8 [media] V4L: add .g_std() core V4L2 subdevice operation
VIDIOC_G_STD can return the current TV-norm to the user in one of two ways:
if an .vidioc_g_std() ioctl operation is provided by the driver, it is
called, otherwise the value ot the .current_norm field of struct
video_device is returned. Since subdevice drivers currently have no access
to struct video_device objects, the only way to provide this information to
the user is by implementing a .g_std() method.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:29:02 -02:00
Guennadi Liakhovetski
3e0ec41c5c [media] V4L: dynamically allocate video_device nodes in subdevices
Currently only very few drivers actually use video_device nodes, embedded
in struct v4l2_subdev. Allocate these nodes dynamically for those drivers
to save memory for the rest.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Tested-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:29:01 -02:00
Guennadi Liakhovetski
443f483aa2 [media] V4L: mt9m001, mt9v022: use internally cached pixel code
Using the internally cached pixel code, instead of the one, provided by
the soc-camera, removes one more use of struct soc_camera_device in these
drivers. Also remove the no longer needed soc_camera_from_i2c() inline
function.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:59 -02:00
Guennadi Liakhovetski
14178aa57c [media] V4L: soc-camera: start removing struct soc_camera_device from client drivers
Remove most trivial uses of struct soc_camera_device from most client
drivers, abstracting some of them inside inline functions. Next steps
will eliminate remaining uses and modify inline functions to not use
struct soc_camera_device.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:58 -02:00
Guennadi Liakhovetski
09362ec25c [media] V4L: docbook documentation for struct v4l2_create_buffers
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:57 -02:00
Hans Verkuil
0934d94a52 [media] soc_camera: remove the now obsolete struct soc_camera_ops
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
[g.liakhovetski@gmx.de: mt9m001 hunk moved to an earlier patch]
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:55 -02:00
Hans Verkuil
d34bfcd2a1 [media] sh_mobile_ceu_camera: implement the control handler
And since this is the last and only host driver that uses controls, also
remove the now obsolete control fields from soc_camera.h.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
[g.liakhovetski@gmx.de: moved code around, fixed problems]
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:35 -02:00
Hans Verkuil
ee02da6455 [media] soc_camera: add control handler support
The soc_camera framework is switched over to use the control framework.
After this patch none of the controls in subdevs or host drivers are available,
until those drivers are also converted to the control framework.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
[g.liakhovetski@gmx.de: moved code around, fixed problems]
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:34 -02:00
Guennadi Liakhovetski
2d86401c2c [media] V4L: vb2: add support for buffers of different sizes on a single queue
The two recently added ioctl()s VIDIOC_CREATE_BUFS and VIDIOC_PREPARE_BUF
allow user-space applications to allocate video buffers of different
sizes and hand them over to the driver for fast switching between
different frame formats. This patch adds support for buffers of different
sizes on the same buffer-queue to vb2.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:29 -02:00
Guennadi Liakhovetski
fc714e70dd [media] V4L: vb2: prepare to support multi-size buffers
In preparation for the forthcoming VIDIOC_CREATE_BUFS ioctl add a
"const struct v4l2_format *" argument to the .queue_setup() vb2
operation.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:28 -02:00
Guennadi Liakhovetski
2150158b31 [media] V4L: add two new ioctl()s for multi-size videobuffer management
A possibility to preallocate and initialise buffers of different sizes
in V4L2 is required for an efficient implementation of a snapshot
mode. This patch adds two new ioctl()s: VIDIOC_CREATE_BUFS and
VIDIOC_PREPARE_BUF and defines respective data structures.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:24 -02:00
Guennadi Liakhovetski
ebc087d090 [media] V4L: add a new videobuf2 buffer state VB2_BUF_STATE_PREPARED
This patch prepares for a better separation of the buffer preparation
stage.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:23 -02:00
Guennadi Liakhovetski
d839fe17a1 [media] V4L: soc-camera: remove soc-camera client bus-param operations and supporting code
soc-camera has been completely ported over to V4L2 subdevice mbus-config
operations, soc-camera client bus-param operations and supporting code
can now be removed.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:18 -02:00
Guennadi Liakhovetski
1067247f56 [media] V4L: soc_camera_platform: remove superfluous soc-camera client operations
Now that all soc-camera hosts have been ported to use V4L2 subdevice
mediabus-config operations and soc-camera client bus-parameter operations
have been made optional, they can be removed.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:12 -02:00
Guennadi Liakhovetski
3e1b6b72b9 [media] V4L: ov772x: remove superfluous soc-camera client operations
Now that all soc-camera hosts have been ported to use V4L2 subdevice
mediabus-config operations and soc-camera client bus-parameter operations
have been made optional, they can be removed.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:28:05 -02:00
Guennadi Liakhovetski
9d3baeb462 [media] V4L: soc-camera: compatible bus-width flags
With the new subdevice media-bus configuration methods bus-width is not
configured along with other bus parameters, instead, it is derived from
the data format. With those methods it is convenient to specify
supported bus-widths in the platform data as (1 << (width - 1)). We
redefine SOCAM_DATAWIDTH_* flags to use the same convention to make
platform data seemlessly reusable.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:27:36 -02:00
Guennadi Liakhovetski
84c760a5de [media] V4L: soc_camera_platform: support the new mbus-config subdev ops
Extend the driver to also support [gs]_mbus_config() subdevice video
operations.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:27:35 -02:00
Guennadi Liakhovetski
2891f37c26 [media] V4L: ov772x: rename macros to not pollute the global namespace
Macros, defined in a header under include/ should be kept in a local
namespace and not pollute the global one.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:27:23 -02:00
Guennadi Liakhovetski
32c69fcc78 [media] V4L: soc-camera: add helper functions for new bus configuration type
Add helper functions to process the new media bus configuration type
similar to soc_camera_apply_sensor_flags() and
soc_camera_bus_param_compatible().

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-03 18:27:13 -02:00
Linus Torvalds
a0a4194c94 Merge branch 'for-next' of git://git.infradead.org/users/sameo/mfd-2.6
* 'for-next' of git://git.infradead.org/users/sameo/mfd-2.6: (80 commits)
  mfd: Fix missing abx500 header file updates
  mfd: Add missing <linux/io.h> include to intel_msic
  x86, mrst: add platform support for MSIC MFD driver
  mfd: Expose TurnOnStatus in ab8500 sysfs
  mfd: Remove support for early drop ab8500 chip
  mfd: Add support for ab8500 v3.3
  mfd: Add ab8500 interrupt disable hook
  mfd: Convert db8500-prcmu panic() into pr_crit()
  mfd: Refactor db8500-prcmu request_clock() function
  mfd: Rename db8500-prcmu init function
  mfd: Fix db5500-prcmu defines
  mfd: db8500-prcmu voltage domain consumers additions
  mfd: db8500-prcmu reset code retrieval
  mfd: db8500-prcmu tweak for modem wakeup
  mfd: Add db8500-pcmu watchdog accessor functions for watchdog
  mfd: hwacc power state db8500-prcmu accessor
  mfd: Add db8500-prcmu accessors for PLL and SGA clock
  mfd: Move to the new db500 PRCMU API
  mfd: Create a common interface for dbx500 PRCMU drivers
  mfd: Initialize DB8500 PRCMU regs
  ...

Fix up trivial conflicts in
	arch/arm/mach-imx/mach-mx31moboard.c
	arch/arm/mach-omap2/board-omap3beagle.c
	arch/arm/mach-u300/include/mach/irqs.h
	drivers/mfd/wm831x-spi.c
2011-11-03 09:40:51 -07:00
Linus Torvalds
cf0223503e Merge branch 'sh-latest' of git://github.com/pmundt/linux-sh
* 'sh-latest' of git://github.com/pmundt/linux-sh:
  sh: Add default uImage rule for sh7757lcr
  sh: modify the asm/sh_eth.h to linux/sh_eth.h in sh7757lcr
  sh: userimask.c needs linux/stat.h
  sh: pfc: Add GPIO IRQ support
  sh: modify the asm/sh_eth.h to linux/sh_eth.h in some boards
  sh: pfc: Remove unused gpio_in_use member
  sh: add parameters for EHCI and RIIC in clock-sh7757.c
  sh: kexec: Add PHYSICAL_START
  SH: irq: Remove IRQF_DISABLED
  sh: pfc: get_config_reg() shift clean up
  sh: intc: Add IRQ trigger bit field check
  sh: drop unused Kconfig symbol
  sh: Fix implicit declaration of function numa_node_id
  sh: kexec: Register crashk_res
  sh: ecovec: add renesas_usbhs DMAEngine support
2011-11-03 08:22:06 -07:00
Linus Torvalds
3f8ddb032a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock:
  hwspinlock: add MAINTAINERS entries
  hwspinlock/omap: omap_hwspinlock_remove should be __devexit
  hwspinlock/u8500: add hwspinlock driver
  hwspinlock/core: register a bank of hwspinlocks in a single API call
  hwspinlock/core: remove stubs for register/unregister
  hwspinlock/core: use a mutex to protect the radix tree
  hwspinlock/core/omap: fix id issues on multiple hwspinlock devices
  hwspinlock/omap: simplify allocation scheme
  hwspinlock/core: simplify 'owner' handling
  hwspinlock/core: simplify Kconfig

Fix up trivial conflicts (addition of omap_hwspinlock_pdata, removal of
omap_spinlock_latency) in arch/arm/mach-omap2/hwspinlock.c

Also, do an "evil merge" to fix a compile error in omap_hsmmc.c which
for some reason was reported in the same email thread as the "please
pull hwspinlock changes".
2011-11-03 08:05:35 -07:00
Trond Myklebust
31cbecb4ab Merge branch 'osd-devel' into nfs-for-next 2011-11-02 23:56:40 -04:00
Linus Torvalds
43672a0784 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm:
  dm: raid fix device status indicator when array initializing
  dm log userspace: add log device dependency
  dm log userspace: fix comment hyphens
  dm: add thin provisioning target
  dm: add persistent data library
  dm: add bufio
  dm: export dm get md
  dm table: add immutable feature
  dm table: add always writeable feature
  dm table: add singleton feature
  dm kcopyd: add dm_kcopyd_zero to zero an area
  dm: remove superfluous smp_mb
  dm: use local printk ratelimit
  dm table: propagate non rotational flag
2011-11-02 17:02:37 -07:00
Linus Torvalds
6681ba7ec4 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits)
  MAINTAINERS: add an entry for Edac Sandy Bridge driver
  edac: tag sb_edac as EXPERIMENTAL, as it requires more testing
  EDAC: Fix incorrect edac mode reporting in sb_edac
  edac: sb_edac: Add it to the building system
  edac: Add an experimental new driver to support Sandy Bridge CPU's
  i7300_edac: Fix error cleanup logic
  i7core_edac: Initialize memory name with cpu, channel, bank
  i7core_edac: Fix compilation on 32 bits arch
  i7core_edac: scrubbing fixups
  EDAC: Correct Kconfig dependencies
  i7core_edac: return -ENODEV if no MC is found
  i7core_edac: use edac's own way to print errors
  MAINTAINERS: remove dropped edac_mce.* from the file
  i7core_edac: Drop the edac_mce facility
  x86, MCE: Use notifier chain only for MCE decoding
  EDAC i7core: Use mce socketid for better compatibility
  i7core_edac: Don't enable memory scrubbing for Xeon 35xx
  i7core_edac: Add scrubbing support
  edac: Move edac main structs to include/linux/edac.h
  i7core_edac: Fix oops when trying to inject errors
  ...
2011-11-02 16:55:15 -07:00
Linus Torvalds
092f4c56c1 Merge branch 'akpm' (Andrew's incoming - part two)
Says Andrew:

 "60 patches.  That's good enough for -rc1 I guess.  I have quite a lot
  of detritus to be rechecked, work through maintainers, etc.

 - most of the remains of MM
 - rtc
 - various misc
 - cgroups
 - memcg
 - cpusets
 - procfs
 - ipc
 - rapidio
 - sysctl
 - pps
 - w1
 - drivers/misc
 - aio"

* akpm: (60 commits)
  memcg: replace ss->id_lock with a rwlock
  aio: allocate kiocbs in batches
  drivers/misc/vmw_balloon.c: fix typo in code comment
  drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
  w1: disable irqs in critical section
  drivers/w1/w1_int.c: multiple masters used same init_name
  drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
  drivers/power/ds2780_battery.c: add a nolock function to w1 interface
  drivers/power/ds2780_battery.c: create central point for calling w1 interface
  w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
  pps gpio client: add missing dependency
  pps: new client driver using GPIO
  pps: default echo function
  include/linux/dma-mapping.h: add dma_zalloc_coherent()
  sysctl: make CONFIG_SYSCTL_SYSCALL default to n
  sysctl: add support for poll()
  RapidIO: documentation update
  drivers/net/rionet.c: fix ethernet address macros for LE platforms
  RapidIO: fix potential null deref in rio_setup_device()
  RapidIO: add mport driver for Tsi721 bridge
  ...
2011-11-02 16:07:27 -07:00
Andrew Bresticker
c1e2ee2dc4 memcg: replace ss->id_lock with a rwlock
While back-porting Johannes Weiner's patch "mm: memcg-aware global
reclaim" for an internal effort, we noticed a significant performance
regression during page-reclaim heavy workloads due to high contention of
the ss->id_lock.  This lock protects idr map, and serializes calls to
idr_get_next() in css_get_next() (which is used during the memcg hierarchy
walk).

Since idr_get_next() is just doing a look up, we need only serialize it
with respect to idr_remove()/idr_get_new().  By making the ss->id_lock a
rwlock, contention is greatly reduced and performance improves.

Tested: cat a 256m file from a ramdisk in a 128m container 50 times on
each core (one file + container per core) in parallel on a NUMA machine.
Result is the time for the test to complete in 1 of the containers.
Both kernels included Johannes' memcg-aware global reclaim patches.

Before rwlock patch: 1710.778s
After rwlock patch: 152.227s

Signed-off-by: Andrew Bresticker <abrestic@google.com>
Cc: Paul Menage <menage@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:03 -07:00
Jeff Moyer
080d676de0 aio: allocate kiocbs in batches
In testing aio on a fast storage device, I found that the context lock
takes up a fair amount of cpu time in the I/O submission path.  The reason
is that we take it for every I/O submitted (see __aio_get_req).  Since we
know how many I/Os are passed to io_submit, we can preallocate the kiocbs
in batches, reducing the number of times we take and release the lock.

In my testing, I was able to reduce the amount of time spent in
_raw_spin_lock_irq by .56% (average of 3 runs).  The command I used to
test this was:

   aio-stress -O -o 2 -o 3 -r 8 -d 128 -b 32 -i 32 -s 16384 <dev>

I also tested the patch with various numbers of events passed to
io_submit, and I ran the xfstests aio group of tests to ensure I didn't
break anything.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Daniel Ehrenberg <dehrenberg@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:03 -07:00
James Nuss
161520451d pps: new client driver using GPIO
This client driver allows you to use a GPIO pin as a source for PPS
signals.  Platform data [1] are used to specify the GPIO pin number,
label, assert event edge type, and whether clear events are captured.

This driver is based on the work by Ricardo Martins who submitted an
initial implementation [2] of a PPS IRQ client driver to the linuxpps
mailing-list on Dec 3 2010.

[1] include/linux/pps-gpio.h
[2] http://ml.enneenne.com/pipermail/linuxpps/2010-December/004155.html

[akpm@linux-foundation.org: remove unneeded cast of void*]
Signed-off-by: James Nuss <jamesnuss@nanometrics.ca>
Cc: Ricardo Martins <rasm@fe.up.pt>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Ricardo Martins <rasm@fe.up.pt>
Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Cc: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:02 -07:00
Andrew Morton
842fa69f3e include/linux/dma-mapping.h: add dma_zalloc_coherent()
Lots of driver code does a dma_alloc_coherent() and then zeroes out the
memory with a memset.  Make it easy for them.

Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:02 -07:00
Lucas De Marchi
f1ecf06854 sysctl: add support for poll()
Adding support for poll() in sysctl fs allows userspace to receive
notifications of changes in sysctl entries.  This adds a infrastructure to
allow files in sysctl fs to be pollable and implements it for hostname and
domainname.

[akpm@linux-foundation.org: s/declare/define/ for definitions]
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Greg KH <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:02 -07:00
Alexandre Bounine
48618fb4e5 RapidIO: add mport driver for Tsi721 bridge
Add RapidIO mport driver for IDT TSI721 PCI Express-to-SRIO bridge device.
 The driver provides full set of callback functions defined for mport
devices in RapidIO subsystem.  It also is compatible with current version
of RIONET driver (Ethernet over RapidIO messaging services).

This patch is applicable to kernel versions starting from 2.6.39.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Chul Kim <chul.kim@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:01 -07:00
Manfred Spraul
f567a18590 include/linux/sem.h: make sysv_sem empty if SYSVIPC is disabled
For the sysvsem undo, each task struct contains a sysv_sem structure with
a pointer to the undo information.

This pointer is only necessary if sysvipc is enabled - thus the pointer
can be made conditional on CONFIG_SYSVIPC.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:01 -07:00
Manfred Spraul
e57940d719 ipc/sem.c: remove private structures from public header file
include/linux/sem.h contains several structures that are only used within
ipc/sem.c.

The patch moves them into ipc/sem.c - there is no need to expose the
structures to the whole kernel.

No functional changes, only whitespace cleanups and 80-char per line
fixes.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:01 -07:00
Johannes Weiner
9b272977e3 memcg: skip scanning active lists based on individual size
Reclaim decides to skip scanning an active list when the corresponding
inactive list is above a certain size in comparison to leave the assumed
working set alone while there are still enough reclaim candidates around.

The memcg implementation of comparing those lists instead reports whether
the whole memcg is low on the requested type of inactive pages,
considering all nodes and zones.

This can lead to an oversized active list not being scanned because of the
state of the other lists in the memcg, as well as an active list being
scanned while its corresponding inactive list has enough pages.

Not only is this wrong, it's also a scalability hazard, because the global
memory state over all nodes and zones has to be gathered for each memcg
and zone scanned.

Make these calculations purely based on the size of the two LRU lists
that are actually affected by the outcome of the decision.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:00 -07:00
Raghavendra K T
c0ff4b8540 memcg: rename mem variable to memcg
The memcg code sometimes uses "struct mem_cgroup *mem" and sometimes uses
"struct mem_cgroup *memcg".  Rename all mem variables to memcg in source
file.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:59 -07:00
Sami Kerola
b6eb48d02d minix: describe usage of different magic numbers
One can get this information from minix/inode.c, but adding the
explanations at the definition sites is more appropriate.

Signed-off-by: Sami Kerola <kerolasa@iki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:59 -07:00
Andrea Arcangeli
b35a35b556 thp: share get_huge_page_tail()
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli
70b50f94f1 mm: thp: tail page refcounting fix
Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page->_count in __split_huge_page_refcount() (in addition to the
head_page->_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed to other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
finegrined, so there's no risk of scalability issues with it.  A simple
direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Linus Torvalds
80c2861672 Merge git://github.com/rustyrussell/linux
* git://github.com/rustyrussell/linux:
  virtio-blk: use ida to allocate disk index
  virtio: Add platform bus driver for memory mapped virtio device
  virtio: Dont add "config" to list for !per_vq_vector
  virtio: console: wait for first console port for early console output
  virtio: console: add port stats for bytes received, sent and discarded
  virtio: console: make discard_port_data() use get_inbuf()
  virtio: console: rename variable
  virtio: console: make get_inbuf() return port->inbuf if present
  virtio: console: Fix return type for get_inbuf()
  virtio: console: Use wait_event_freezable instead of _interruptible
  virtio: console: Ignore port name update request if name already set
  virtio: console: Fix indentation
  virtio: modify vring_init and vring_size to take account of the layout containing *_event_idx
  virtio.h: correct comment for struct virtio_driver
  virtio-net: Use virtio_config_val() for retrieving config
  virtio_config: Add virtio_config_val_len()
  virtio-console: Use virtio_config_val() for retrieving config
2011-11-02 15:00:56 -07:00
John W. Linville
c125d5e846 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth 2011-11-02 15:15:51 -04:00
Linus Torvalds
d211858837 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
  vfs: add d_prune dentry operation
  vfs: protect i_nlink
  filesystems: add set_nlink()
  filesystems: add missing nlink wrappers
  logfs: remove unnecessary nlink setting
  ocfs2: remove unnecessary nlink setting
  jfs: remove unnecessary nlink setting
  hypfs: remove unnecessary nlink setting
  vfs: ignore error on forced remount
  readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
  vfs: fix dentry leak in simple_fill_super()
2011-11-02 11:41:01 -07:00
Linus Torvalds
f1f8935a5c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
  jbd2: Unify log messages in jbd2 code
  jbd/jbd2: validate sb->s_first in journal_get_superblock()
  ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
  ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
  ext4: fix a typo in struct ext4_allocation_context
  ext4: Don't normalize an falloc request if it can fit in 1 extent.
  ext4: remove comments about extent mount option in ext4_new_inode()
  ext4: let ext4_discard_partial_buffers handle unaligned range correctly
  ext4: return ENOMEM if find_or_create_pages fails
  ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
  ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
  ext4: optimize locking for end_io extent conversion
  ext4: remove unnecessary call to waitqueue_active()
  ext4: Use correct locking for ext4_end_io_nolock()
  ext4: fix race in xattr block allocation path
  ext4: trace punch_hole correctly in ext4_ext_map_blocks
  ext4: clean up AGGRESSIVE_TEST code
  ext4: move variables to their scope
  ext4: fix quota accounting during migration
  ext4: migrate cleanup
  ...
2011-11-02 10:06:20 -07:00
Linus Torvalds
34116645d9 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Cleanup metadata flags handling
  udf: Skip mirror metadata FE loading when metadata FE is ok
  ext3: Allow quota file use root reservation
  udf: Remove web reference from UDF MAINTAINERS entry
  quota: Drop path reference on error exit from quotactl
  udf: Neaten udf_debug uses
  udf: Neaten logging output, use vsprintf extension %pV
  udf: Convert printks to pr_<level>
  udf: Rename udf_warning to udf_warn
  udf: Rename udf_error to udf_err
  udf: Promote some debugging messages to udf_error
  ext3: Remove the obsolete broken EXT3_IOC32_WAIT_FOR_READONLY.
  udf: Add readpages support for udf.
  ext3/balloc.c: local functions should be static
  ext2: fix the outdated comment in ext2_nfs_get_inode()
  ext3: remove deprecated oldalloc
  fs/ext3/balloc.c: delete useless initialization
  fs/ext2/balloc.c: delete useless initialization
  ext3: fix message in ext3_remount for rw-remount case
  ext3: Remove i_mutex from ext3_sync_file()

Fix up trivial (printf format cleanup) conflicts in fs/udf/udfdecl.h
2011-11-02 10:05:22 -07:00
Linus Walleij
b958f7a7cb mfd: Fix missing abx500 header file updates
I missed to include a patch adding the new silicon revision define
CUT3P3 and remove the retired CUT0 versions of AB8500. Also delete
the reference to the retired AB3550 from the header.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-11-02 17:24:06 +01:00
Nicholas Bellinger
3151d069e9 target: Remove core TRANSPORT_FREE_CMD_INTR usage
This patch drops TRANSPORT_FREE_CMD_INTR usage from target core, which
includes the removal of transport_generic_free_cmd_intr() symbol,
TRANSPORT_FREE_CMD_INTR usage in transport_processing_thread(), and
special case LUN_RESET handling to skip TRANSPORT_FREE_CMD_INTR processing
in core_tmr_drain_cmd_list().  We now expect that fabric modules will
use an internal workqueue to provide process context when releasing
se_cmd descriptor resources via transport_generic_free_cmd().

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Madhuranath Iyengar <mni@risingtidesystems.com>
Signed-off-by: Nicholas Bellinger <nab@risingtidesystems.com>
2011-11-02 15:58:46 +00:00
Nicholas Bellinger
88dd9e26d6 target: Make TFO->check_stop_free return free status
This patch converts target_core_fabric_ops->check_stop_free() usage in
transport_cmd_check_stop() and associated fabric module usage to
return '1' when the passed se_cmd has been released directly within
->check_stop_free(), or return '0' when the passed se_cmd has not
been released.

This addresses an issue where transport_cmd_finish_abort() ->
transport_cmd_check_stop_to_fabric() was leaking descriptors during
LUN_RESET for modules using ->check_stop_free(), but not directly
releasing se_cmd in all cases.

Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@risingtidesystems.com>
2011-11-02 15:58:30 +00:00
Sage Weil
f0023bc617 vfs: add d_prune dentry operation
This adds a d_prune dentry operation that is called by the VFS prior to
pruning (i.e. unhashing and killing) a hashed dentry from the dcache.
Wrap dentry_lru_del() and use the new _prune() helper in the cases where we
are about to unhash and kill the dentry.

This will be used by Ceph to maintain a flag indicating whether the
complete contents of a directory are contained in the dcache, allowing it
to satisfy lookups and readdir without addition server communication.

Renumber a few DCACHE_* #defines to group DCACHE_OP_PRUNE with the other
DCACHE_OP_ bits.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:43 +01:00
Miklos Szeredi
a78ef704a8 vfs: protect i_nlink
Prevent direct modification of i_nlink by making it const and adding a
non-const __i_nlink alias.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:43 +01:00
Miklos Szeredi
bfe8684869 filesystems: add set_nlink()
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:43 +01:00
Andy Whitcroft
1fa1e7f615 readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
Since the commit below which added O_PATH support to the *at() calls, the
error return for readlink/readlinkat for the empty pathname has switched
from ENOENT to EINVAL:

  commit 65cfc67223
  Author: Al Viro <viro@zeniv.linux.org.uk>
  Date:   Sun Mar 13 15:56:26 2011 -0400

    readlinkat(), fchownat() and fstatat() with empty relative pathnames

This is both unexpected for userspace and makes readlink/readlinkat
inconsistant with all other interfaces; and inconsistant with our stated
return for these pathnames.

As the readlinkat call does not have a flags parameter we cannot use the
AT_EMPTY_PATH approach used in the other calls.  Therefore expose whether
the original path is infact entry via a new user_path_at_empty() path
lookup function.  Use this to determine whether to default to EINVAL or
ENOENT for failures.

Addresses http://bugs.launchpad.net/bugs/817187

[akpm@linux-foundation.org: remove unused getname_flags()]
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:42 +01:00
Thomas Hellstrom
cd2b89e7e8 vmwgfx: Reinstate the update_layout ioctl
We need to redefine a connector as "connected" if it matches a window
in the host preferred GUI layout.
Otherwise "smart" window managers would turn on Xorg outputs that we don't
want to be on.

This reinstates the update_layout and adds the following information to
the modesetting system.
a) Connection status <-> Equivalent to real hardware connection status
b) Preferred mode <-> Equivalent to real hardware reading EDID
c) Host window position <-> Equivalent to a real hardware scanout address
dynamic register.

It should be noted that there is no assumption here about what should be
displayed and where. Only how to access the host windows.

This also bumps minor to signal availability of the new IOCTL.

Based on code originally written by Jakob Bornecrantz

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-11-02 08:30:31 +00:00
Linus Torvalds
367069f16e Merge branch 'next/dt' of git://git.linaro.org/people/arnd/arm-soc
* 'next/dt' of git://git.linaro.org/people/arnd/arm-soc:
  ARM: gic: use module.h instead of export.h
  ARM: gic: fix irq_alloc_descs handling for sparse irq
  ARM: gic: add OF based initialization
  ARM: gic: add irq_domain support
  irq: support domains with non-zero hwirq base
  of/irq: introduce of_irq_init
  ARM: at91: add at91sam9g20 and Calao USB A9G20 DT support
  ARM: at91: dt: at91sam9g45 family and board device tree files
  arm/mx5: add device tree support for imx51 babbage
  arm/mx5: add device tree support for imx53 boards
  ARM: msm: Add devicetree support for msm8660-surf
  msm_serial: Add devicetree support
  msm_serial: Use relative resources for iomem

Fix up conflicts in arch/arm/mach-at91/{at91sam9260.c,at91sam9g45.c}
2011-11-01 21:02:35 -07:00
Linus Torvalds
81a3c10ce8 Merge branch 'next/cleanup2' of git://git.linaro.org/people/arnd/arm-soc
* 'next/cleanup2' of git://git.linaro.org/people/arnd/arm-soc: (31 commits)
  ARM: OMAP: Warn if omap_ioremap is called before SoC detection
  ARM: OMAP: Move set_globals initialization to happen in init_early
  ARM: OMAP: Map SRAM later on with ioremap_exec()
  ARM: OMAP: Remove calls to SRAM allocations for framebuffer
  ARM: OMAP: Avoid cpu_is_omapxxxx usage until map_io is done
  ARM: OMAP1: Use generic map_io, init_early and init_irq
  arm/dts: OMAP3+: Add mpu, dsp and iva nodes
  arm/dts: OMAP4: Add a main ocp entry bound to l3-noc driver
  ARM: OMAP2+: l3-noc: Add support for device-tree
  ARM: OMAP2+: board-generic: Add i2c static init
  ARM: OMAP2+: board-generic: Add DT support to generic board
  arm/dts: Add support for OMAP3 Beagle board
  arm/dts: Add initial device tree support for OMAP3 SoC
  arm/dts: Add support for OMAP4 SDP board
  arm/dts: Add support for OMAP4 PandaBoard
  arm/dts: Add initial device tree support for OMAP4 SoC
  ARM: OMAP: omap_device: Add a method to build an omap_device from a DT node
  ARM: OMAP: omap_device: Add omap_device_[alloc|delete] for DT integration
  of: Add helpers to get one string in multiple strings property
  ARM: OMAP2+: devices: Remove all omap_device_pm_latency structures
  ...

Fix up trivial header file conflicts in arch/arm/mach-omap2/board-generic.c
2011-11-01 20:58:25 -07:00
Linus Torvalds
cd9a0b6bd6 Merge branch 'next/pm' of git://git.linaro.org/people/arnd/arm-soc
* 'next/pm' of git://git.linaro.org/people/arnd/arm-soc: (66 commits)
  ARM: CSR: PM: use outer_resume to resume L2 cache
  ARM: CSR: call l2x0_of_init to init L2 cache of SiRFprimaII
  ARM: OMAP: voltage: voltage layer present, even when CONFIG_PM=n
  ARM: CSR: PM: add sleep entry for SiRFprimaII
  ARM: CSR: PM: save/restore irq status in suspend cycle
  ARM: CSR: PM: save/restore timer status in suspend cycle
  OMAP4: PM: TWL6030: add cmd register
  OMAP4: PM: TWL6030: fix ON/RET/OFF voltages
  OMAP4: PM: TWL6030: address 0V conversions
  OMAP4: PM: TWL6030: fix uv to voltage for >0x39
  OMAP4: PM: TWL6030: fix voltage conversion formula
  omap: voltage: add a stub header file for external/regulator use
  OMAP2+: VC: more registers are per-channel starting with OMAP5
  OMAP3+: voltage: update nominal voltage in voltdm_scale() not VC post-scale
  OMAP3+: voltage: rename omap_voltage_get_nom_volt -> voltdm_get_voltage
  OMAP3+: voltdm: final removal of omap_vdd_info
  OMAP3+: voltage: move/rename curr_volt from vdd_info into struct voltagedomain
  OMAP3+: voltage: rename scale and reset functions using voltdm_ prefix
  OMAP3+: VP: combine setting init voltage into common function
  OMAP3+: VP: remove unused omap_vp_get_curr_volt()
  ...

Fix up trivial conflict in arch/arm/mach-prima2/l2x0.c (code removal vs
edit)
2011-11-01 20:22:01 -07:00
Linus Torvalds
ac5761a650 Merge branch 'next/timer' of git://git.linaro.org/people/arnd/arm-soc
* 'next/timer' of git://git.linaro.org/people/arnd/arm-soc:
  clocksource: fixup ux500 build problems
  ARM: omap: use __devexit_p in dmtimer driver
  ARM: ux500: Reprogram timers upon resume
  ARM: plat-nomadik: timer: Export reset functions
  ARM: plat-nomadik: timer: Add support for periodic timers
  ARM: ux500: Move timer code to separate file
  ARM: ux500: add support for clocksource DBX500 PRCMU
  clocksource: add DBX500 PRCMU Timer support
  ARM: plat-nomadik: MTU sched_clock as an option
  ARM: OMAP: dmtimer: add error handling to export APIs
  ARM: OMAP: dmtimer: low-power mode support
  ARM: OMAP: dmtimer: skip reserved timers
  ARM: OMAP: dmtimer: pm_runtime support
  ARM: OMAP: dmtimer: switch-over to platform device driver
  ARM: OMAP: dmtimer: platform driver
  ARM: OMAP2+: dmtimer: convert to platform devices
  ARM: OMAP1: dmtimer: conversion to platform devices
  ARM: OMAP2+: dmtimer: add device names to flck nodes
  ARM: OMAP: Add support for dmtimer v2 ip
2011-11-01 20:18:05 -07:00
Pawel Moll
edfd52e636 virtio: Add platform bus driver for memory mapped virtio device
This patch, based on virtio PCI driver, adds support for memory
mapped (platform) virtio device. This should allow environments
like qemu to use virtio-based block & network devices even on
platforms without PCI support.

One can define and register a platform device which resources
will describe memory mapped control registers and "mailbox"
interrupt. Such device can be also instantiated using the Device
Tree node with compatible property equal "virtio,mmio".

Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Michael S.Tsirkin <mst@redhat.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:41:01 +10:30
Wang Sheng-Hui
00b894e874 virtio: modify vring_init and vring_size to take account of the layout containing *_event_idx
Based on the layout description in the comments, take account of
the *_event_idx in functions vring_init and vring_size.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:40:59 +10:30
Wang Sheng-Hui
5f41f8bfc9 virtio.h: correct comment for struct virtio_driver
The patch is against 3.0.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:40:59 +10:30
Sasha Levin
3ead6f4d42 virtio_config: Add virtio_config_val_len()
This patch adds virtio_config_val_len() which allows retrieving variable
length data from the virtio config space only if a specific feature is on.

Cc: Amit Shah <amit.shah@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:40:58 +10:30
Linus Torvalds
b4beb4bf99 Merge branch 'for-linus/i2c-3.2' of git://git.fluff.org/bjdooks/linux
* 'for-linus/i2c-3.2' of git://git.fluff.org/bjdooks/linux: (47 commits)
  i2c-s3c2410: Add device tree support
  i2c-s3c2410: Keep a copy of platform data and use it
  i2c-nomadik: cosmetic coding style corrections
  i2c-au1550: dev_pm_ops conversion
  i2c-au1550: increase timeout waiting for master done
  i2c-au1550: remove unused ack_timeout
  i2c-au1550: remove usage of volatile keyword
  i2c-tegra: __iomem annotation fix
  i2c-eg20t: Add initialize processing in case i2c-error occurs
  i2c-eg20t: Fix flag setting issue
  i2c-eg20t: add stop sequence in case wait-event timeout occurs
  i2c-eg20t: Separate error processing
  i2c-eg20t: Fix 10bit access issue
  i2c-eg20t: Modify returned value s32 to long
  i2c-eg20t: Fix bus-idle waiting issue
  i2c-designware: Fix PCI core warning on suspend/resume
  i2c-designware: Add runtime power management support
  i2c-designware: Add support for Designware core behind PCI devices.
  i2c-designware: Push all register reads/writes into the core code.
  i2c-designware: Support multiple cores using same ISR
  ...
2011-11-01 15:07:19 -07:00
Linus Torvalds
f3c3f06705 Merge branch 'for-linus' of git://opensource.wolfsonmicro.com/regulator
* 'for-linus' of git://opensource.wolfsonmicro.com/regulator: (22 commits)
  regulator: Constify constraints name
  regulator: Fix possible nullpointer dereference in regulator_enable()
  regulator: gpio-regulator add dependency on GENERIC_GPIO
  regulator: Add module.h include to gpio-regulator
  regulator: Add driver for gpio-controlled regulators
  regulator: remove duplicate REG_CTRL2 defines in tps65023
  regulator: Clarify documentation for regulator-regulator supplies
  regulator: Fix some bitrot in the machine driver documentation
  regulator: tps65023: Added support for the similiar TPS65020 chip
  regulator: tps65023: Setting correct core regulator for tps65021
  regulator: tps65023: Set missing bit for update core-voltage
  regulator: tps65023: Fixes i2c configuration issues
  regulator: Add debugfs file showing the supply map table
  regulator: tps6586x: add SMx slew rate setting
  regulator: tps65023: Fixes i2c configuration issues
  regulator: tps6507x: Remove num_voltages array
  regulator: max8952: removed unused mutex.
  regulator: fix regulator/consumer.h kernel-doc warning
  regulator: Ensure enough enable time for max8649
  regulator: 88pm8607: Fix off-by-one value range checking in the case of no id is matched
  ...
2011-11-01 15:06:20 -07:00
Arjan van de Ven
73cb88ecb9 net: make the tcp and udp file_operations for the /proc stuff const
the tcp and udp code creates a set of struct file_operations at runtime
while it can also be done at compile time, with the added benefit of then
having these file operations be const.

the trickiest part was to get the "THIS_MODULE" reference right; the naive
method of declaring a struct in the place of registration would not work
for this reason.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01 17:56:14 -04:00
Linus Torvalds
1c39865151 Merge branch 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
* 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: make pstore write function return normal success/fail value
  pstore: change mutex locking to spin_locks
  pstore: defer inserting OOPS entries into pstore
2011-11-01 10:52:29 -07:00
Linus Torvalds
f470f8d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...
2011-11-01 10:51:38 -07:00
Roland Dreier
504255f8d0 Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next 2011-11-01 09:37:08 -07:00
Linus Torvalds
dc47d3810c Merge git://github.com/herbertx/crypto
* git://github.com/herbertx/crypto: (48 commits)
  crypto: user - Depend on NET instead of selecting it
  crypto: user - Add dependency on NET
  crypto: talitos - handle descriptor not found in error path
  crypto: user - Initialise match in crypto_alg_match
  crypto: testmgr - add twofish tests
  crypto: testmgr - add blowfish test-vectors
  crypto: Make hifn_795x build depend on !ARCH_DMA_ADDR_T_64BIT
  crypto: twofish-x86_64-3way - fix ctr blocksize to 1
  crypto: blowfish-x86_64 - fix ctr blocksize to 1
  crypto: whirlpool - count rounds from 0
  crypto: Add userspace report for compress type algorithms
  crypto: Add userspace report for cipher type algorithms
  crypto: Add userspace report for rng type algorithms
  crypto: Add userspace report for pcompress type algorithms
  crypto: Add userspace report for nivaead type algorithms
  crypto: Add userspace report for aead type algorithms
  crypto: Add userspace report for givcipher type algorithms
  crypto: Add userspace report for ablkcipher type algorithms
  crypto: Add userspace report for blkcipher type algorithms
  crypto: Add userspace report for ahash type algorithms
  ...
2011-11-01 09:24:41 -07:00
Alex Deucher
00dfb8df5b drm/radeon/kms: properly set panel mode for eDP
This should make eDP more reliable.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-11-01 16:01:58 +00:00
Thomas Hellstrom
a7331e5cb2 drm: Introduce "Virtual" connectors and encoders
This will allow us to attach various properties specific to virtual
monitors in the future.

Note that we don't export an EDID property for "Virtual" connectors.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-11-01 16:01:42 +00:00
Linus Torvalds
c87d5d5947 Merge Qualcom Hexagon architecture
This is the fifth version of the patchset (with one tiny whitespace fix)
to the Linux kernel to support the Qualcomm Hexagon architecture.

Between now and the next pull requests, Richard Kuo should have his key
signed, etc., and should be back on kernel.org.  In the meantime, this
got merged as a emailed patch-series.

* Hexagon: (36 commits)
  Add extra arch overrides to asm-generic/checksum.h
  Hexagon: Add self to MAINTAINERS
  Hexagon: Add basic stacktrace functionality for Hexagon architecture.
  Hexagon: Add configuration and makefiles for the Hexagon architecture.
  Hexagon: Comet platform support
  Hexagon: kgdb support files
  Hexagon: Add page-fault support.
  Hexagon: Add page table header files & etc.
  Hexagon: Add ioremap support
  Hexagon: Provide DMA implementation
  Hexagon: Implement basic TLB management routines for Hexagon.
  Hexagon: Implement basic cache-flush support
  Hexagon: Provide basic implementation and/or stubs for I/O routines.
  Hexagon: Add user access functions
  Hexagon: Add locking types and functions
  Hexagon: Add SMP support
  Hexagon: Provide basic debugging and system trap support.
  Hexagon: Add ptrace support
  Hexagon: Add time and timer functions
  Hexagon: Add interrupts
  ...
2011-11-01 07:48:13 -07:00
Linas Vepstas
4e29198e1c Add extra arch overrides to asm-generic/checksum.h
There are plausible reasons for architectures to provide their own
versions of csum_partial_copy_nocheck and csum_tcpudp_magic.
By protecting these, the architecture can still re-use the
asm-generic checksum.h, instead of copying it.

Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-01 07:34:21 -07:00
Richard Kuo
dd472da380 Hexagon: Add locking types and functions
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-01 07:34:20 -07:00
Borislav Petkov
4140c54266 i7core_edac: Drop the edac_mce facility
Remove edac_mce pieces and use the normal MCE decoder notifier chain by
retaining the same functionality with considerably less code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:24 -02:00
Krzysztof Wilczynski
e23ebf0fa9 ipvs: Fix compilation error in ip_vs.h for ip_vs_confirm_conntrack function.
This is to address the following error during the compilation:

  In file included from kernel/sysctl_binary.c:6:
  include/net/ip_vs.h:1406: error: expected identifier or ‘(’ before ‘{’ token
  make[1]: *** [kernel/sysctl_binary.o] Error 1
  make[1]: *** Waiting for unfinished jobs....

That manifests itself when CONFIG_IP_VS_NFCT is undefined in .config file.

Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-11-01 09:20:01 +01:00
Pablo Neira Ayuso
8d83f63b19 netfilter: export NAT definitions through linux/netfilter_ipv4/nf_nat.h
This patch exports several definitions that used to live under
include/net/netfilter/nf_nat.h. These definitions, although not
exported, have been used by iptables and other userspace
applications like miniupnpd since long time. Basically, these
userspace tools included some internal definition of the required
structures and they assume no changes in the binary representation
(which is OK indeed).

To resolve this situation, this patch makes public the required
structure and install them in INSTALL_HDR_PATH.

See: https://bugs.gentoo.org/376873, for more information.

This patch is heavily based on the initial patch sent by:

Anthony G. Basile <blueness@gentoo.org>

Which was entitled:

netfilter: export sanitized nf_nat.h to INSTALL_HDR_PATH

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:52 +01:00
Simon Horman
4a516f1108 ipvs: Remove unused return value of protocol state transitions
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:33 +01:00
Simon Horman
3c2de2ae02 ipvs: Remove unused parameter from ip_vs_confirm_conntrack()
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:29 +01:00
Marcos Paulo de Souza
f83347df57 include: linux: skbuf.h: Fix parameter documentation
Fixes parameter name of skb_frag_dmamap function to silence warning on
make htmldocs.

Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01 00:55:48 -04:00
Linus Torvalds
094803e0aa Merge branch 'akpm' (Andrew's incoming)
Quoth Andrew:

 - Most of MM.  Still waiting for the poweroc guys to get off their
   butts and review some threaded hugepages patches.

 - alpha

 - vfs bits

 - drivers/misc

 - a few core kerenl tweaks

 - printk() features

 - MAINTAINERS updates

 - backlight merge

 - leds merge

 - various lib/ updates

 - checkpatch updates

* akpm: (127 commits)
  epoll: fix spurious lockdep warnings
  checkpatch: add a --strict check for utf-8 in commit logs
  kernel.h/checkpatch: mark strict_strto<foo> and simple_strto<foo> as obsolete
  llist-return-whether-list-is-empty-before-adding-in-llist_add-fix
  wireless: at76c50x: follow rename pack_hex_byte to hex_byte_pack
  fat: follow rename pack_hex_byte() to hex_byte_pack()
  security: follow rename pack_hex_byte() to hex_byte_pack()
  kgdb: follow rename pack_hex_byte() to hex_byte_pack()
  lib: rename pack_hex_byte() to hex_byte_pack()
  lib/string.c: fix strim() semantics for strings that have only blanks
  lib/idr.c: fix comment for ida_get_new_above()
  lib/percpu_counter.c: enclose hotplug only variables in hotplug ifdef
  lib/bitmap.c: quiet sparse noise about address space
  lib/spinlock_debug.c: print owner on spinlock lockup
  lib/kstrtox: common code between kstrto*() and simple_strto*() functions
  drivers/leds/leds-lp5521.c: check if reset is successful
  leds: turn the blink_timer off before starting to blink
  leds: save the delay values after a successful call to blink_set()
  drivers/leds/leds-gpio.c: use gpio_get_value_cansleep() when initializing
  drivers/leds/leds-lm3530.c: add __devexit_p where needed
  ...
2011-10-31 17:46:07 -07:00
Joe Perches
67d0a07544 kernel.h/checkpatch: mark strict_strto<foo> and simple_strto<foo> as obsolete
Mark obsolete/deprecated strict_strto<foo> and simple_strto<foo> functions
and macros as obsolete.

Update checkpatch to warn about their use.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:57 -07:00
Andrew Morton
fc23af34b0 llist-return-whether-list-is-empty-before-adding-in-llist_add-fix
clarify comment

Cc: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:57 -07:00
Andy Shevchenko
55036ba76b lib: rename pack_hex_byte() to hex_byte_pack()
As suggested by Andrew Morton in [1] there is better to have most
significant part first in the function name.

[1] https://lkml.org/lkml/2011/9/20/22

There is no functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Mimi Zohar <zohar@us.ibm.com>
Cc: James Morris <jmorris@namei.org>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:56 -07:00
Magnus Damm
2b67c95b74 drivers/leds/leds-renesas-tpu.c: move Renesas TPU LED driver platform data
Use the platform_data include directory for the TPU LED driver, as
suggested by Paul Mundt.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:55 -07:00
Magnus Damm
f59b6f9f32 leds: Renesas TPU LED driver
Add V2 of the LED driver for a single timer channel for the TPU hardware
block commonly found in Renesas SoCs.

The driver has been written with optimal Power Management in mind, so to
save power the LED is driven as a regular GPIO pin in case of maximum
brightness and power off which allows the TPU hardware to be idle and
which in turn allows the clocks to be stopped and the power domain to be
turned off transparently.

Any other brightness level requires use of the TPU hardware in PWM mode.
TPU hardware device clocks and power are managed through Runtime PM.
System suspend and resume is known to be working - during suspend the LED
is set to off by the generic LED code.

The TPU hardware timer is equipeed with a 16-bit counter together with an
up-to-divide-by-64 prescaler which makes the hardware suitable for
brightness control.  Hardware blink is unsupported.

The LED PWM waveform has been verified with a Fluke 123 Scope meter on a
sh7372 Mackerel board.  Tested with experimental sh7372 A3SP power domain
patches.  Platform device bind/unbind tested ok.

V2 has been tested on the DS2 LED of the sh73a0-based AG5EVM.

[axel.lin@gmail.com: include linux/module.h]
Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:54 -07:00
Mark Brown
0556dc340e backlight: fix broken regulator API usage in l4f00242t03
The regulator support in the l4f00242t03 is very non-idiomatic.  Rather
than requesting the regulators based on the device name and the supply
names used by the device the driver requires boards to pass system
specific supply names around through platform data.  The driver also
conditionally requests the regulators based on this platform data, adding
unneeded conditional code to the driver.

Fix this by removing the platform data and converting to the standard
idiom, also updating all in tree users of the driver.  As no datasheet
appears to be available for the LCD I'm guessing the names for the
supplies based on the existing users and I've no ability to do anything
more than compile test.

The use of regulator_set_voltage() in the driver is also problematic,
since fixed voltages are required the expectation would be that the
voltages would be fixed in the constraints set by the machines rather than
manually configured by the driver, but is less problematic.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:54 -07:00
Joe Perches
b9075fa968 treewide: use __printf not __attribute__((format(printf,...)))
Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:54 -07:00
Mark Brown
ec400c9fab lis3lv02d: make regulator API usage unconditional
The regulator API contains a range of features for stubbing itself out
when not in use and for transparently restricting the actual effect of
regulator API calls where they can't be supported on a particular system
so that drivers don't need to individually implement this.  Simplify the
driver slightly by making use of this idiom.

The only in tree user is ecovec24 which does not use the regulator API.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:52 -07:00
Kyungmin Park
d43a87e68e mm: compaction: make compact_zone_order() static
There's no compact_zone_order() user outside file scope, so make it static.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:49 -07:00
Mikulas Patocka
09f363c736 vmscan: fix shrinker callback bug in fs/super.c
The callback must not return -1 when nr_to_scan is zero. Fix the bug in
fs/super.c and add this requirement to the callback specification.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:49 -07:00
Joe Perches
3ee9a4f086 mm: neaten warn_alloc_failed
Add __attribute__((format (printf...) to the function to validate format
and arguments.  Use vsprintf extension %pV to avoid any possible message
interleaving.  Coalesce format string.  Convert printks/pr_warning to
pr_warn.

[akpm@linux-foundation.org: use the __printf() macro]
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:48 -07:00
Sonic Zhang
06d5e032ad include/asm-generic/page.h: calculate virt_to_page and page_to_virt via predefined macro
On NOMMU architectures, if physical memory doesn't start from 0,
ARCH_PFN_OFFSET is defined to generate page index in mem_map array.
Because virtual address is equal to physical address, PAGE_OFFSET is
always 0.  virt_to_page and page_to_virt should not index page by
PAGE_OFFSET directly.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:48 -07:00
Andrea Arcangeli
37a1c49a91 thp: mremap support and TLB optimization
This adds THP support to mremap (decreases the number of split_huge_page()
calls).

Here are also some benchmarks with a proggy like this:

===
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define SIZE (5UL*1024*1024*1024)

int main()
{
        static struct timeval oldstamp, newstamp;
	long diffsec;
	char *p, *p2, *p3, *p4;
	if (posix_memalign((void **)&p, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p2, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p3, 2*1024*1024, 4096))
		perror("memalign"), exit(1);

	memset(p, 0xff, SIZE);
	memset(p2, 0xff, SIZE);
	memset(p3, 0x77, 4096);
	gettimeofday(&oldstamp, NULL);
	p4 = mremap(p, SIZE, SIZE, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
	gettimeofday(&newstamp, NULL);
	diffsec = newstamp.tv_sec - oldstamp.tv_sec;
	diffsec = newstamp.tv_usec - oldstamp.tv_usec + 1000000 * diffsec;
	printf("usec %ld\n", diffsec);
	if (p == MAP_FAILED || p4 != p3)
	//if (p == MAP_FAILED)
		perror("mremap"), exit(1);
	if (memcmp(p4, p2, SIZE))
		printf("mremap bug\n"), exit(1);
	printf("ok\n");

	return 0;
}
===

THP on

 Performance counter stats for './largepage13' (3 runs):

          69195836 dTLB-loads                 ( +-   3.546% )  (scaled from 50.30%)
             60708 dTLB-load-misses           ( +-  11.776% )  (scaled from 52.62%)
         676266476 dTLB-stores                ( +-   5.654% )  (scaled from 69.54%)
             29856 dTLB-store-misses          ( +-   4.081% )  (scaled from 89.22%)
        1055848782 iTLB-loads                 ( +-   4.526% )  (scaled from 80.18%)
              8689 iTLB-load-misses           ( +-   2.987% )  (scaled from 58.20%)

        7.314454164  seconds time elapsed   ( +-   0.023% )

THP off

 Performance counter stats for './largepage13' (3 runs):

        1967379311 dTLB-loads                 ( +-   0.506% )  (scaled from 60.59%)
           9238687 dTLB-load-misses           ( +-  22.547% )  (scaled from 61.87%)
        2014239444 dTLB-stores                ( +-   0.692% )  (scaled from 60.40%)
           3312335 dTLB-store-misses          ( +-   7.304% )  (scaled from 67.60%)
        6764372065 iTLB-loads                 ( +-   0.925% )  (scaled from 79.00%)
              8202 iTLB-load-misses           ( +-   0.475% )  (scaled from 70.55%)

        9.693655243  seconds time elapsed   ( +-   0.069% )

grep thp /proc/vmstat
thp_fault_alloc 35849
thp_fault_fallback 0
thp_collapse_alloc 3
thp_collapse_alloc_failed 0
thp_split 0

thp_split 0 confirms no thp split despite plenty of hugepages allocated.

The measurement of only the mremap time (so excluding the 3 long
memset and final long 10GB memory accessing memcmp):

THP on

usec 14824
usec 14862
usec 14859

THP off

usec 256416
usec 255981
usec 255847

With an older kernel without the mremap optimizations (the below patch
optimizes the non THP version too).

THP on

usec 392107
usec 390237
usec 404124

THP off

usec 444294
usec 445237
usec 445820

I guess with a threaded program that sends more IPI on large SMP it'd
create an even larger difference.

All debug options are off except DEBUG_VM to avoid skewing the
results.

The only problem for native 2M mremap like it happens above both the
source and destination address must be 2M aligned or the hugepmd can't be
moved without a split but that is an hardware limitation.

[akpm@linux-foundation.org: coding-style nitpicking]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:48 -07:00
Sam Ravnborg
0a93ebef69 memblock: add memblock_start_of_DRAM()
SPARC32 require access to the start address.  Add a new helper
memblock_start_of_DRAM() to give access to the address of the first
memblock - which contains the lowest address.

The awkward name was chosen to match the already present
memblock_end_of_DRAM().

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:47 -07:00
Mitsuo Hayasaka
f5252e009d mm: avoid null pointer access in vm_struct via /proc/vmallocinfo
The /proc/vmallocinfo shows information about vmalloc allocations in
vmlist that is a linklist of vm_struct.  It, however, may access pages
field of vm_struct where a page was not allocated.  This results in a null
pointer access and leads to a kernel panic.

Why this happens: In __vmalloc_node_range() called from vmalloc(), newly
allocated vm_struct is added to vmlist at __get_vm_area_node() and then,
some fields of vm_struct such as nr_pages and pages are set at
__vmalloc_area_node().  In other words, it is added to vmlist before it is
fully initialized.  At the same time, when the /proc/vmallocinfo is read,
it accesses the pages field of vm_struct according to the nr_pages field
at show_numa_info().  Thus, a null pointer access happens.

The patch adds the newly allocated vm_struct to the vmlist *after* it is
fully initialized.  So, it can avoid accessing the pages field with
unallocated page when show_numa_info() is called.

Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:47 -07:00
Akinobu Mita
798248206b lib/string.c: introduce memchr_inv()
memchr_inv() is mainly used to check whether the whole buffer is filled
with just a specified byte.

The function name and prototype are stolen from logfs and the
implementation is from SLUB.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: Joern Engel <joern@logfs.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:47 -07:00
Mel Gorman
49ea7eb65e mm: vmscan: immediately reclaim end-of-LRU dirty pages when writeback completes
When direct reclaim encounters a dirty page, it gets recycled around the
LRU for another cycle.  This patch marks the page PageReclaim similar to
deactivate_page() so that the page gets reclaimed almost immediately after
the page gets cleaned.  This is to avoid reclaiming clean pages that are
younger than a dirty page encountered at the end of the LRU that might
have been something like a use-once page.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:47 -07:00
Mel Gorman
ee72886d8e mm: vmscan: do not writeback filesystem pages in direct reclaim
Testing from the XFS folk revealed that there is still too much I/O from
the end of the LRU in kswapd.  Previously it was considered acceptable by
VM people for a small number of pages to be written back from reclaim with
testing generally showing about 0.3% of pages reclaimed were written back
(higher if memory was low).  That writing back a small number of pages is
ok has been heavily disputed for quite some time and Dave Chinner
explained it well;

	It doesn't have to be a very high number to be a problem. IO
	is orders of magnitude slower than the CPU time it takes to
	flush a page, so the cost of making a bad flush decision is
	very high. And single page writeback from the LRU is almost
	always a bad flush decision.

To complicate matters, filesystems respond very differently to requests
from reclaim according to Christoph Hellwig;

	xfs tries to write it back if the requester is kswapd
	ext4 ignores the request if it's a delayed allocation
	btrfs ignores the request

As a result, each filesystem has different performance characteristics
when under memory pressure and there are many pages being dirtied.  In
some cases, the request is ignored entirely so the VM cannot depend on the
IO being dispatched.

The objective of this series is to reduce writing of filesystem-backed
pages from reclaim, play nicely with writeback that is already in progress
and throttle reclaim appropriately when writeback pages are encountered.
The assumption is that the flushers will always write pages faster than if
reclaim issues the IO.

A secondary goal is to avoid the problem whereby direct reclaim splices
two potentially deep call stacks together.

There is a potential new problem as reclaim has less control over how long
before a page in a particularly zone or container is cleaned and direct
reclaimers depend on kswapd or flusher threads to do the necessary work.
However, as filesystems sometimes ignore direct reclaim requests already,
it is not expected to be a serious issue.

Patch 1 disables writeback of filesystem pages from direct reclaim
	entirely. Anonymous pages are still written.

Patch 2 removes dead code in lumpy reclaim as it is no longer able
	to synchronously write pages. This hurts lumpy reclaim but
	there is an expectation that compaction is used for hugepage
	allocations these days and lumpy reclaim's days are numbered.

Patches 3-4 add warnings to XFS and ext4 if called from
	direct reclaim. With patch 1, this "never happens" and is
	intended to catch regressions in this logic in the future.

Patch 5 disables writeback of filesystem pages from kswapd unless
	the priority is raised to the point where kswapd is considered
	to be in trouble.

Patch 6 throttles reclaimers if too many dirty pages are being
	encountered and the zones or backing devices are congested.

Patch 7 invalidates dirty pages found at the end of the LRU so they
	are reclaimed quickly after being written back rather than
	waiting for a reclaimer to find them

I consider this series to be orthogonal to the writeback work but it is
worth noting that the writeback work affects the viability of patch 8 in
particular.

I tested this on ext4 and xfs using fs_mark, a simple writeback test based
on dd and a micro benchmark that does a streaming write to a large mapping
(exercises use-once LRU logic) followed by streaming writes to a mix of
anonymous and file-backed mappings.  The command line for fs_mark when
botted with 512M looked something like

./fs_mark -d  /tmp/fsmark-2676  -D  100  -N  150  -n  150  -L  25  -t  1  -S0  -s  10485760

The number of files was adjusted depending on the amount of available
memory so that the files created was about 3xRAM.  For multiple threads,
the -d switch is specified multiple times.

The test machine is x86-64 with an older generation of AMD processor with
4 cores.  The underlying storage was 4 disks configured as RAID-0 as this
was the best configuration of storage I had available.  Swap is on a
separate disk.  Dirty ratio was tuned to 40% instead of the default of
20%.

Testing was run with and without monitors to both verify that the patches
were operating as expected and that any performance gain was real and not
due to interference from monitors.

Here is a summary of results based on testing XFS.

512M1P-xfs           Files/s  mean                 32.69 ( 0.00%)     34.44 ( 5.08%)
512M1P-xfs           Elapsed Time fsmark                    51.41     48.29
512M1P-xfs           Elapsed Time simple-wb                114.09    108.61
512M1P-xfs           Elapsed Time mmap-strm                113.46    109.34
512M1P-xfs           Kswapd efficiency fsmark                 62%       63%
512M1P-xfs           Kswapd efficiency simple-wb              56%       61%
512M1P-xfs           Kswapd efficiency mmap-strm              44%       42%
512M-xfs             Files/s  mean                 30.78 ( 0.00%)     35.94 (14.36%)
512M-xfs             Elapsed Time fsmark                    56.08     48.90
512M-xfs             Elapsed Time simple-wb                112.22     98.13
512M-xfs             Elapsed Time mmap-strm                219.15    196.67
512M-xfs             Kswapd efficiency fsmark                 54%       56%
512M-xfs             Kswapd efficiency simple-wb              54%       55%
512M-xfs             Kswapd efficiency mmap-strm              45%       44%
512M-4X-xfs          Files/s  mean                 30.31 ( 0.00%)     33.33 ( 9.06%)
512M-4X-xfs          Elapsed Time fsmark                    63.26     55.88
512M-4X-xfs          Elapsed Time simple-wb                100.90     90.25
512M-4X-xfs          Elapsed Time mmap-strm                261.73    255.38
512M-4X-xfs          Kswapd efficiency fsmark                 49%       50%
512M-4X-xfs          Kswapd efficiency simple-wb              54%       56%
512M-4X-xfs          Kswapd efficiency mmap-strm              37%       36%
512M-16X-xfs         Files/s  mean                 60.89 ( 0.00%)     65.22 ( 6.64%)
512M-16X-xfs         Elapsed Time fsmark                    67.47     58.25
512M-16X-xfs         Elapsed Time simple-wb                103.22     90.89
512M-16X-xfs         Elapsed Time mmap-strm                237.09    198.82
512M-16X-xfs         Kswapd efficiency fsmark                 45%       46%
512M-16X-xfs         Kswapd efficiency simple-wb              53%       55%
512M-16X-xfs         Kswapd efficiency mmap-strm              33%       33%

Up until 512-4X, the FSmark improvements were statistically significant.
For the 4X and 16X tests the results were within standard deviations but
just barely.  The time to completion for all tests is improved which is an
important result.  In general, kswapd efficiency is not affected by
skipping dirty pages.

1024M1P-xfs          Files/s  mean                 39.09 ( 0.00%)     41.15 ( 5.01%)
1024M1P-xfs          Elapsed Time fsmark                    84.14     80.41
1024M1P-xfs          Elapsed Time simple-wb                210.77    184.78
1024M1P-xfs          Elapsed Time mmap-strm                162.00    160.34
1024M1P-xfs          Kswapd efficiency fsmark                 69%       75%
1024M1P-xfs          Kswapd efficiency simple-wb              71%       77%
1024M1P-xfs          Kswapd efficiency mmap-strm              43%       44%
1024M-xfs            Files/s  mean                 35.45 ( 0.00%)     37.00 ( 4.19%)
1024M-xfs            Elapsed Time fsmark                    94.59     91.00
1024M-xfs            Elapsed Time simple-wb                229.84    195.08
1024M-xfs            Elapsed Time mmap-strm                405.38    440.29
1024M-xfs            Kswapd efficiency fsmark                 79%       71%
1024M-xfs            Kswapd efficiency simple-wb              74%       74%
1024M-xfs            Kswapd efficiency mmap-strm              39%       42%
1024M-4X-xfs         Files/s  mean                 32.63 ( 0.00%)     35.05 ( 6.90%)
1024M-4X-xfs         Elapsed Time fsmark                   103.33     97.74
1024M-4X-xfs         Elapsed Time simple-wb                204.48    178.57
1024M-4X-xfs         Elapsed Time mmap-strm                528.38    511.88
1024M-4X-xfs         Kswapd efficiency fsmark                 81%       70%
1024M-4X-xfs         Kswapd efficiency simple-wb              73%       72%
1024M-4X-xfs         Kswapd efficiency mmap-strm              39%       38%
1024M-16X-xfs        Files/s  mean                 42.65 ( 0.00%)     42.97 ( 0.74%)
1024M-16X-xfs        Elapsed Time fsmark                   103.11     99.11
1024M-16X-xfs        Elapsed Time simple-wb                200.83    178.24
1024M-16X-xfs        Elapsed Time mmap-strm                397.35    459.82
1024M-16X-xfs        Kswapd efficiency fsmark                 84%       69%
1024M-16X-xfs        Kswapd efficiency simple-wb              74%       73%
1024M-16X-xfs        Kswapd efficiency mmap-strm              39%       40%

All FSMark tests up to 16X had statistically significant improvements.
For the most part, tests are completing faster with the exception of the
streaming writes to a mixture of anonymous and file-backed mappings which
were slower in two cases

In the cases where the mmap-strm tests were slower, there was more
swapping due to dirty pages being skipped.  The number of additional pages
swapped is almost identical to the fewer number of pages written from
reclaim.  In other words, roughly the same number of pages were reclaimed
but swapping was slower.  As the test is a bit unrealistic and stresses
memory heavily, the small shift is acceptable.

4608M1P-xfs          Files/s  mean                 29.75 ( 0.00%)     30.96 ( 3.91%)
4608M1P-xfs          Elapsed Time fsmark                   512.01    492.15
4608M1P-xfs          Elapsed Time simple-wb                618.18    566.24
4608M1P-xfs          Elapsed Time mmap-strm                488.05    465.07
4608M1P-xfs          Kswapd efficiency fsmark                 93%       86%
4608M1P-xfs          Kswapd efficiency simple-wb              88%       84%
4608M1P-xfs          Kswapd efficiency mmap-strm              46%       45%
4608M-xfs            Files/s  mean                 27.60 ( 0.00%)     28.85 ( 4.33%)
4608M-xfs            Elapsed Time fsmark                   555.96    532.34
4608M-xfs            Elapsed Time simple-wb                659.72    571.85
4608M-xfs            Elapsed Time mmap-strm               1082.57   1146.38
4608M-xfs            Kswapd efficiency fsmark                 89%       91%
4608M-xfs            Kswapd efficiency simple-wb              88%       82%
4608M-xfs            Kswapd efficiency mmap-strm              48%       46%
4608M-4X-xfs         Files/s  mean                 26.00 ( 0.00%)     27.47 ( 5.35%)
4608M-4X-xfs         Elapsed Time fsmark                   592.91    564.00
4608M-4X-xfs         Elapsed Time simple-wb                616.65    575.07
4608M-4X-xfs         Elapsed Time mmap-strm               1773.02   1631.53
4608M-4X-xfs         Kswapd efficiency fsmark                 90%       94%
4608M-4X-xfs         Kswapd efficiency simple-wb              87%       82%
4608M-4X-xfs         Kswapd efficiency mmap-strm              43%       43%
4608M-16X-xfs        Files/s  mean                 26.07 ( 0.00%)     26.42 ( 1.32%)
4608M-16X-xfs        Elapsed Time fsmark                   602.69    585.78
4608M-16X-xfs        Elapsed Time simple-wb                606.60    573.81
4608M-16X-xfs        Elapsed Time mmap-strm               1549.75   1441.86
4608M-16X-xfs        Kswapd efficiency fsmark                 98%       98%
4608M-16X-xfs        Kswapd efficiency simple-wb              88%       82%
4608M-16X-xfs        Kswapd efficiency mmap-strm              44%       42%

Unlike the other tests, the fsmark results are not statistically
significant but the min and max times are both improved and for the most
part, tests completed faster.

There are other indications that this is an improvement as well.  For
example, in the vast majority of cases, there were fewer pages scanned by
direct reclaim implying in many cases that stalls due to direct reclaim
are reduced.  KSwapd is scanning more due to skipping dirty pages which is
unfortunate but the CPU usage is still acceptable

In an earlier set of tests, I used blktrace and in almost all cases
throughput throughout the entire test was higher.  However, I ended up
discarding those results as recording blktrace data was too heavy for my
liking.

On a laptop, I plugged in a USB stick and ran a similar tests of tests
using it as backing storage.  A desktop environment was running and for
the entire duration of the tests, firefox and gnome terminal were
launching and exiting to vaguely simulate a user.

1024M-xfs            Files/s  mean               0.41 ( 0.00%)        0.44 ( 6.82%)
1024M-xfs            Elapsed Time fsmark               2053.52   1641.03
1024M-xfs            Elapsed Time simple-wb            1229.53    768.05
1024M-xfs            Elapsed Time mmap-strm            4126.44   4597.03
1024M-xfs            Kswapd efficiency fsmark              84%       85%
1024M-xfs            Kswapd efficiency simple-wb           92%       81%
1024M-xfs            Kswapd efficiency mmap-strm           60%       51%
1024M-xfs            Avg wait ms fsmark                5404.53     4473.87
1024M-xfs            Avg wait ms simple-wb             2541.35     1453.54
1024M-xfs            Avg wait ms mmap-strm             3400.25     3852.53

The mmap-strm results were hurt because firefox launching had a tendency
to push the test out of memory.  On the postive side, firefox launched
marginally faster with the patches applied.  Time to completion for many
tests was faster but more importantly - the "Avg wait" time as measured by
iostat was far lower implying the system would be more responsive.  It was
also the case that "Avg wait ms" on the root filesystem was lower.  I
tested it manually and while the system felt slightly more responsive
while copying data to a USB stick, it was marginal enough that it could be
my imagination.

This patch: do not writeback filesystem pages in direct reclaim.

When kswapd is failing to keep zones above the min watermark, a process
will enter direct reclaim in the same manner kswapd does.  If a dirty page
is encountered during the scan, this page is written to backing storage
using mapping->writepage.

This causes two problems.  First, it can result in very deep call stacks,
particularly if the target storage or filesystem are complex.  Some
filesystems ignore write requests from direct reclaim as a result.  The
second is that a single-page flush is inefficient in terms of IO.  While
there is an expectation that the elevator will merge requests, this does
not always happen.  Quoting Christoph Hellwig;

	The elevator has a relatively small window it can operate on,
	and can never fix up a bad large scale writeback pattern.

This patch prevents direct reclaim writing back filesystem pages by
checking if current is kswapd.  Anonymous pages are still written to swap
as there is not the equivalent of a flusher thread for anonymous pages.
If the dirty pages cannot be written back, they are placed back on the LRU
lists.  There is now a direct dependency on dirty page balancing to
prevent too many pages in the system being dirtied which would prevent
reclaim making forward progress.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:46 -07:00
Christoph Lameter
e10d59f2c3 mm: add comments to explain mm_struct fields
Add comments to explain the page statistics field in the mm_struct.

[akpm@linux-foundation.org: add missing ;]
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:46 -07:00
Christoph Lameter
bc3e53f682 mm: distinguish between mlocked and pinned pages
Some kernel components pin user space memory (infiniband and perf) (by
increasing the page count) and account that memory as "mlocked".

The difference between mlocking and pinning is:

A. mlocked pages are marked with PG_mlocked and are exempt from
   swapping. Page migration may move them around though.
   They are kept on a special LRU list.

B. Pinned pages cannot be moved because something needs to
   directly access physical memory. They may not be on any
   LRU list.

I recently saw an mlockalled process where mm->locked_vm became
bigger than the virtual size of the process (!) because some
memory was accounted for twice:

Once when the page was mlocked and once when the Infiniband
layer increased the refcount because it needt to pin the RDMA
memory.

This patch introduces a separate counter for pinned pages and
accounts them seperately.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Mike Marciniszyn <infinipath@qlogic.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:46 -07:00
David Rientjes
43362a4977 oom: fix race while temporarily setting current's oom_score_adj
test_set_oom_score_adj() was introduced in 72788c3856 ("oom: replace
PF_OOM_ORIGIN with toggling oom_score_adj") to temporarily elevate
current's oom_score_adj for ksm and swapoff without requiring an
additional per-process flag.

Using that function to both set oom_score_adj to OOM_SCORE_ADJ_MAX and
then reinstate the previous value is racy since it's possible that
userspace can set the value to something else itself before the old value
is reinstated.  That results in userspace setting current's oom_score_adj
to a different value and then the kernel immediately setting it back to
its previous value without notification.

To fix this, a new compare_swap_oom_score_adj() function is introduced
with the same semantics as the compare and swap CAS instruction, or
CMPXCHG on x86.  It is used to reinstate the previous value of
oom_score_adj if and only if the present value is the same as the old
value.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:45 -07:00
David Rientjes
c9f01245b6 oom: remove oom_disable_count
This removes mm->oom_disable_count entirely since it's unnecessary and
currently buggy.  The counter was intended to be per-process but it's
currently decremented in the exit path for each thread that exits, causing
it to underflow.

The count was originally intended to prevent oom killing threads that
share memory with threads that cannot be killed since it doesn't lead to
future memory freeing.  The counter could be fixed to represent all
threads sharing the same mm, but it's better to remove the count since:

 - it is possible that the OOM_DISABLE thread sharing memory with the
   victim is waiting on that thread to exit and will actually cause
   future memory freeing, and

 - there is no guarantee that a thread is disabled from oom killing just
   because another thread sharing its mm is oom disabled.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:45 -07:00
Minchan Kim
f80c067361 mm: zone_reclaim: make isolate_lru_page() filter-aware
In __zone_reclaim case, we don't want to shrink mapped page.  Nonetheless,
we have isolated mapped page and re-add it into LRU's head.  It's
unnecessary CPU overhead and makes LRU churning.

Of course, when we isolate the page, the page might be mapped but when we
try to migrate the page, the page would be not mapped.  So it could be
migrated.  But race is rare and although it happens, it's no big deal.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:44 -07:00
Minchan Kim
39deaf8585 mm: compaction: make isolate_lru_page() filter-aware
In async mode, compaction doesn't migrate dirty or writeback pages.  So,
it's meaningless to pick the page and re-add it to lru list.

Of course, when we isolate the page in compaction, the page might be dirty
or writeback but when we try to migrate the page, the page would be not
dirty, writeback.  So it could be migrated.  But it's very unlikely as
isolate and migration cycle is much faster than writeout.

So, this patch helps cpu overhead and prevent unnecessary LRU churning.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:44 -07:00
Minchan Kim
4356f21d09 mm: change isolate mode from #define to bitwise type
Change ISOLATE_XXX macro with bitwise isolate_mode_t type.  Normally,
macro isn't recommended as it's type-unsafe and making debugging harder as
symbol cannot be passed throught to the debugger.

Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags.  INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."

This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:44 -07:00
Christopher Yeoh
fcf634098c Cross Memory Attach
The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
  written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
  ptrace'ing the target and are still able to ptrace the target to read
  from the target. This check could possibly be moved to the open call,
  but its not clear exactly what race this restriction is stopping
  (reason  appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
  domain socket is a bit ugly from a userspace point of view,
  especially when you may have hundreds if not (eventually) thousands
  of processes  that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
  consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
  involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:44 -07:00
Andrew Morton
6eea69dd8b include/linux/dmar.h: forward-declare struct acpi_dmar_header
x86_64 allnoconfig:

In file included from arch/x86/kernel/pci-dma.c:3:
include/linux/dmar.h:248: warning: 'struct acpi_dmar_header' declared inside parameter list
include/linux/dmar.h:248: warning: its scope is only this definition or declaration, which is probably not what you want

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:44 -07:00
Clemens Ladisch
07a723097c dma-mapping: fix sync_single_range_* DMA debugging
Commit 5fd75a7850 (dma-mapping: remove unnecessary sync_single_range_*
in dma_map_ops) unified not only the dma_map_ops but also the
corresponding debug_dma_sync_* calls.  This led to spurious WARN()ings
like the following because the DMA debug code was no longer able to detect
the DMA buffer base address without the separate offset parameter:

WARNING: at lib/dma-debug.c:911 check_sync+0xce/0x446()
firewire_ohci 0000:04:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x00000000cedaa400] [size=1024 bytes]
Call Trace: ...
 [<ffffffff811326a5>] check_sync+0xce/0x446
 [<ffffffff81132ad9>] debug_dma_sync_single_for_device+0x39/0x3b
 [<ffffffffa01d6e6a>] ohci_queue_iso+0x4f3/0x77d [firewire_ohci]
 ...

To fix this, unshare the sync_single_* and sync_single_range_*
implementations so that we are able to call the correct debug_dma_sync_*
functions.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:44 -07:00
Paul Gortmaker
67b84999b1 Revert "tracing: Include module.h in define_trace.h"
This reverts commit 3a9f987b31.

With all the files that are real modules now having module.h
explicitly called out for inclusion, and no reliance on any
implicit presence of module.h assumed, we should no longer
need this workaround.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:35 -04:00
Paul Gortmaker
ec53cf23c0 irq: don't put module.h into irq.h for tracking irqgen modules.
Recent commit "irq: Track the  owner of irq descriptor" in
commit ID b6873807a7 placed module.h into linux/irq.h
but we are trying to limit module.h inclusion to just C files
that really need it, due to its size and number of children
includes.  This targets just reversing that include.

Add in the basic "struct module" since that is all we really need
to ensure things compile.  In theory, b687380 should have added the
module.h include to the irqdesc.h header as well, but the implicit
module.h everywhere presence masked this from showing up.  So give
it the "struct module" as well.

As for the C files, irqdesc.c is only using THIS_MODULE, so it
does not need module.h - give it export.h instead.  The C file
irq/manage.c is now (as of b687380) using try_module_get and
module_put and so it needs module.h (which it already has).

Also convert the irq_alloc_descs variants to macros, since all
they really do is is call the __irq_alloc_descs primitive.
This avoids including export.h and no debug info is lost.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:35 -04:00
Paul Gortmaker
1d58996da6 bluetooth: macroize two small inlines to avoid module.h
These two small inlines make calls to try_module_get() and
module_put() which would force us to keep module.h present
within yet another common include header.  We can avoid this
by turning them into macros.  The hci_dev_hold construct
is patterned off of raw_spin_trylock_irqsave() in spinlock.h

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:34 -04:00
Paul Gortmaker
69e7dae409 ip_vs.h: fix implicit use of module_get/module_put from module.h
This file was using the module get/put functions in two simple inline
functions.  But module_get/put were only within scope because of
the implicit presence of module.h being everywhere.

Rather than add module.h to another file in include/  -- which is
exactly the thing we are trying to avoid, simply convert these
one-line functions into a define, as per what was done for the
device_schedule_callback() in commit 523ded71de.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:34 -04:00
Paul Gortmaker
34641c6d00 nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
The implicit presence of module.h everywhere meant that this header
also was getting moduleparam.h which defines struct kernel_param.

Since it only needs to know that kernel_param is a struct, call that
out instead of adding an include of moduleparam.h -- to get rid of this:

include/net/netfilter/nf_conntrack.h:316: warning: 'struct kernel_param' declared inside parameter list
include/net/netfilter/nf_conntrack.h:316: warning: its scope is only this definition or declaration, which is probably not what you want

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:33 -04:00
Paul Gortmaker
de47725421 include: replace linux/module.h with "struct module" wherever possible
The <linux/module.h> pretty much brings in the kitchen sink along
with it, so it should be avoided wherever reasonably possible in
terms of being included from other commonly used <linux/something.h>
files, as it results in a measureable increase on compile times.

The worst culprit was probably device.h since it is used everywhere.
This file also had an implicit dependency/usage of mutex.h which was
masked by module.h, and is also fixed here at the same time.

There are over a dozen other headers that simply declare the
struct instead of pulling in the whole file, so follow their lead
and simply make it a few more.

Most of the implicit dependencies on module.h being present by
these headers pulling it in have been now weeded out, so we can
finally make this change with hopefully minimal breakage.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:32 -04:00
Paul Gortmaker
eb5589a8f0 include: convert various register fcns to macros to avoid include chaining
The original implementations reference THIS_MODULE in an inline.
We could include <linux/export.h>, but it is better to avoid chaining.

Fortunately someone else already thought of this, and made a similar
inline into a #define in <linux/device.h> for device_schedule_callback(),
[see commit 523ded71de] so follow that precedent here.

Also bubble up any __must_check that were used on the prev. wrapper inline
functions up one to the real __register functions, to preserve any prev.
sanity checks that were used in those instances.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:32 -04:00
Paul Gortmaker
7c926402a7 crypto.h: remove unused crypto_tfm_alg_modname() inline
The <linux/crypto.h> (which is in turn in common headers
like tcp.h) wants to use module_name() in an inline fcn.
But having all of <linux/module.h> along for the ride is
overkill and slows down compiles by a measureable amount,
since it in turn includes lots of headers.

Since the inline is never used anywhere in the kernel[1],
we can just remove it, and then also remove the module.h
include as well.

In all the many crypto modules, there were some relying on
crypto.h including module.h -- for them we now explicitly
call out module.h for inclusion.

[1] git grep shows some staging drivers also define the same
static inline, but they also never ever use it.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:31 -04:00
Paul Gortmaker
e2ffa376f6 uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
Once we clean up the implicit presence of module.h (and all its
sub-includes), we'll see an implicit dependency on page.h for
the PAGE_SIZE define.  So fix it in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:31 -04:00
Paul Gortmaker
246359d379 pm_runtime.h: explicitly requires notifier.h
This file was getting notifier.h via device.h --> module.h but
the module.h inclusion is going away, so add notifier.h directly.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:30 -04:00
Paul Gortmaker
a8efa9d6bf linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
The implicit presence of module.h and all its sub-includes was
masking these implicit header usages:

include/linux/dmaengine.h:684: warning: 'struct page' declared inside parameter list
include/linux/dmaengine.h:684: warning: its scope is only this definition or declaration, which is probably not what you want
include/linux/dmaengine.h:687: warning: 'struct page' declared inside parameter list
include/linux/dmaengine.h:736:2: error: implicit declaration of function 'bitmap_zero'

With input from Stephen Rothwell <sfr@canb.auug.org.au>

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:30 -04:00
Paul Gortmaker
1986c93f09 miscdevice.h: fix up implicit use of lists and types
By removing the implicit presence of module.h from this file, we
will see things like:

In file included from fs/dlm/user.c:9:
include/linux/miscdevice.h:50: error: field ‘list’ has incomplete type
include/linux/miscdevice.h:54: error: expected specifier-qualifier-list before ‘mode_t’

Call out lists.h and types.h for inclusion to fix each of the
above respectively.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:29 -04:00
Paul Gortmaker
bb2eac66ee stop_machine.h: fix implicit use of smp.h for smp_processor_id
This will show up on MIPS when we fix all the implicit header presences
that are because of module.h being everywhere.

In file included from kernel/trace/ftrace.c:16:
include/linux/stop_machine.h: In function 'stop_one_cpu':
include/linux/stop_machine.h:50: error: implicit declaration of function 'smp_processor_id'
include/linux/stop_machine.h: In function 'stop_cpus':
include/linux/stop_machine.h:80: error: implicit declaration of function 'raw_smp_processor_id'

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:28 -04:00
Paul Gortmaker
d0a9940289 of: fix implicit use of errno.h in include/linux/of.h
It shows up as a build failure on MIPS, as it is used in
three of_property function stubs.

include/linux/of.h:275: error: 'ENOSYS' undeclared (first use in this function)
include/linux/of.h:282: error: 'ENOSYS' undeclared (first use in this function)
include/linux/of.h:295: error: 'ENOSYS' undeclared (first use in this function)

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:28 -04:00
Paul Gortmaker
7755c47123 of_platform.h: delete needless include <linux/module.h>
There is nothing modular in this file, and no reason to drag
in all the 357 headers that module.h brings with it, since
it just slows down compiles.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:27 -04:00
Paul Gortmaker
feede37ec3 acpi: remove module.h include from platform/aclinux.h
This file had an include of module.h which was probably added
in relation to this line:

  #define ACPI_EXPORT_SYMBOL(symbol)  EXPORT_SYMBOL(symbol);

However, we really expect symbol exporters to grab export.h
themselves, and since this is only a define, we can remove
the module.h include without aclinux.h itself causing any
compile issues.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:27 -04:00
Paul Gortmaker
ddac6021fc miscdevice.h: delete unnecessary inclusion of module.h
This file has a define MODULE_ALIAS_MISCDEV which in turn will
use the MODULE_ALIAS define, but only if the former is explicitly
used by modular device driver code (and such code should be
already including module.h).

Delete the include, since module.h is such a giant thing that we
don't want it implicitly sneaking into compiles where it isn't
specifically required.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:26 -04:00
Paul Gortmaker
8a24454869 device_cgroup.h: delete needless include <linux/module.h>
There is nothing modular in this file, and no reason to drag
in all the extra headers that module.h brings with it, since
it just slows down compiles.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:26 -04:00