When the file LRU lists are dominated by streaming IO pages, evict those
pages first, before considering evicting other pages.
This should be safe from deadlocks or performance problems
because only three things can happen to an inactive file page:
1) referenced twice and promoted to the active list
2) evicted by the pageout code
3) under IO, after which it will get evicted or promoted
The pages freed in this way can either be reused for streaming IO, or
allocated for something else. If the pages are used for streaming IO,
this pageout pattern continues. Otherwise, we will fall back to the
normal pageout pattern.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Elladan <elladan@eskimo.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
num_online_nodes() is called in a number of places but most often by the
page allocator when deciding whether the zonelist needs to be filtered
based on cpusets or the zonelist cache. This is actually a heavy function
and touches a number of cache lines.
This patch stores the number of online nodes at boot time and updates the
value when nodes get onlined and offlined. The value is then used in a
number of important paths in place of num_online_nodes().
[rientjes@google.com: do not override definition of node_set_online() with macro]
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ALLOC_WMARK_MIN, ALLOC_WMARK_LOW and ALLOC_WMARK_HIGH determin whether
pages_min, pages_low or pages_high is used as the zone watermark when
allocating the pages. Two branches in the allocator hotpath determine
which watermark to use.
This patch uses the flags as an array index into a watermark array that is
indexed with WMARK_* defines accessed via helpers. All call sites that
use zone->pages_* are updated to use the helpers for accessing the values
and the array offsets for setting.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On low-memory systems, anti-fragmentation gets disabled as there is
nothing it can do and it would just incur overhead shuffling pages between
lists constantly. Currently the check is made in the free page fast path
for every page. This patch moves it to a slow path. On machines with low
memory, there will be small amount of additional overhead as pages get
shuffled between lists but it should quickly settle.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Callers of alloc_pages_node() can optionally specify -1 as a node to mean
"allocate from the current node". However, a number of the callers in
fast paths know for a fact their node is valid. To avoid a comparison and
branch, this patch adds alloc_pages_exact_node() that only checks the nid
with VM_BUG_ON(). Callers that know their node is valid are then
converted.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Paul Mundt <lethal@linux-sh.org> [for the SLOB NUMA bits]
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No user of the allocator API should be passing in an order >= MAX_ORDER
but we check for it on each and every allocation. Delete this check and
make it a VM_BUG_ON check further down the call path.
[akpm@linux-foundation.org: s/VM_BUG_ON/WARN_ON_ONCE/]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The start of a large patch series to clean up and optimise the page
allocator.
The performance improvements are in a wide range depending on the exact
machine but the results I've seen so fair are approximately;
kernbench: 0 to 0.12% (elapsed time)
0.49% to 3.20% (sys time)
aim9: -4% to 30% (for page_test and brk_test)
tbench: -1% to 4%
hackbench: -2.5% to 3.45% (mostly within the noise though)
netperf-udp -1.34% to 4.06% (varies between machines a bit)
netperf-tcp -0.44% to 5.22% (varies between machines a bit)
I haven't sysbench figures at hand, but previously they were within the
-0.5% to 2% range.
On netperf, the client and server were bound to opposite number CPUs to
maximise the problems with cache line bouncing of the struct pages so I
expect different people to report different results for netperf depending
on their exact machine and how they ran the test (different machines, same
cpus client/server, shared cache but two threads client/server, different
socket client/server etc).
I also measured the vmlinux sizes for a single x86-based config with
CONFIG_DEBUG_INFO enabled but not CONFIG_DEBUG_VM. The core of the
.config is based on the Debian Lenny kernel config so I expect it to be
reasonably typical.
This patch:
__alloc_pages_internal is the core page allocator function but essentially
it is an alias of __alloc_pages_nodemask. Naming a publicly available and
exported function "internal" is also a big ugly. This patch renames
__alloc_pages_internal() to __alloc_pages_nodemask() and deletes the old
nodemask function.
Warning - This patch renames an exported symbol. No kernel driver is
affected by external drivers calling __alloc_pages_internal() should
change the call to __alloc_pages_nodemask() without any alteration of
parameters.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix allocating page cache/slab object on the unallowed node when memory
spread is set by updating tasks' mems_allowed after its cpuset's mems is
changed.
In order to update tasks' mems_allowed in time, we must modify the code of
memory policy. Because the memory policy is applied in the process's
context originally. After applying this patch, one task directly
manipulates anothers mems_allowed, and we use alloc_lock in the
task_struct to protect mems_allowed and memory policy of the task.
But in the fast path, we didn't use lock to protect them, because adding a
lock may lead to performance regression. But if we don't add a lock,the
task might see no nodes when changing cpuset's mems_allowed to some
non-overlapping set. In order to avoid it, we set all new allowed nodes,
then clear newly disallowed ones.
[lee.schermerhorn@hp.com:
The rework of mpol_new() to extract the adjusting of the node mask to
apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
allocation. Fix this by adding the check for MPOL_PREFERRED and empty
node mask to mpol_new_mpolicy().
Remove the now unneeded 'nodes = NULL' from mpol_new().
Note that mpol_new_mempolicy() is always called with a non-NULL
'nodes' parameter now that it has been removed from mpol_new().
Therefore, we don't need to test nodes for NULL before testing it for
'empty'. However, just to be extra paranoid, add a VM_BUG_ON() to
verify this assumption.]
[lee.schermerhorn@hp.com:
I don't think the function name 'mpol_new_mempolicy' is descriptive
enough to differentiate it from mpol_new().
This function applies cpuset set context, usually constraining nodes
to those allowed by the cpuset. However, when the 'RELATIVE_NODES flag
is set, it also translates the nodes. So I settled on
'mpol_set_nodemask()', because the comment block for mpol_new() mentions
that we need to call this function to "set nodes".
Some additional minor line length, whitespace and typo cleanup.]
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move more documentation for get_user_pages_fast into the new kerneldoc comment.
Add some comments for get_user_pages as well.
Also, move get_user_pages_fast declaration up to get_user_pages. It wasn't
there initially because it was once a static inline function.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Andy Grover <andy.grover@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The counterpart of radix_tree_next_hole(). To be used by context readahead.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Vladislav Bolkhovitin <vst@vlnb.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mmap read-around now shares the same code style and data structure with
readahead code.
This also removes do_page_cache_readahead(). Its last user, mmap
read-around, has been changed to call ra_submit().
The no-readahead-if-congested logic is dumped by the way. Users will be
pretty sensitive about the slow loading of executables. So it's
unfavorable to disabled mmap read-around on a congested queue.
[akpm@linux-foundation.org: coding-style fixes]
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the performance impact of possible mmap_miss wrap around to be
temporary and tolerable: i.e. MMAP_LOTSAMISS=100 extra readarounds.
Otherwise if ever mmap_miss wraps around to negative, it takes INT_MAX
cache misses to bring it back to normal state. During the time mmap
readaround will be _enabled_ for whatever wild random workload. That's
almost permanent performance impact.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* create mm/init-mm.c, move init_mm there
* remove INIT_MM, initialize init_mm with C99 initializer
* unexport init_mm on all arches:
init_mm is already unexported on x86.
One strange place is some OMAP driver (drivers/video/omap/) which
won't build modular, but it's already wants get_vm_area() export.
Somebody should look there.
[akpm@linux-foundation.org: add missing #includes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13484
Peer reported:
| The bug is introduced from kernel 2.6.27, if E820 table reserve the memory
| above 4G in 32bit OS(BIOS-e820: 00000000fff80000 - 0000000120000000
| (reserved)), system will report Int 6 error and hang up. The bug is caused by
| the following code in drivers/firmware/memmap.c, the resource_size_t is 32bit
| variable in 32bit OS, the BUG_ON() will be invoked to result in the Int 6
| error. I try the latest 32bit Ubuntu and Fedora distributions, all hit this
| bug.
|======
|static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
| const char *type,
| struct firmware_map_entry *entry)
and it only happen with CONFIG_PHYS_ADDR_T_64BIT is not set.
it turns out we need to pass u64 instead of resource_size_t for that.
[akpm@linux-foundation.org: add comment]
Reported-and-tested-by: Peer Chen <pchen@nvidia.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PIT_TICK_RATE is currently defined in four architectures, but in three
different places. While linux/timex.h is not the perfect place for it, it
is still a reasonable replacement for those drivers that traditionally use
asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency.
Note that for Alpha, the actual value changed from 1193182UL to 1193180UL.
This is unlikely to make a difference, and probably can only improve
accuracy. There was a discussion on the correct value of CLOCK_TICK_RATE
a few years ago, after which every existing instance was getting changed
to 1193182. According to the specification, it should be
1193181.818181...
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sdio: add cards id for sms (Siano Mobile Silicon) MDTV receivers
Add SDIO vendor ID, and multiple device IDs for
various SMS-based MDTV SDIO adapters.
Signed-off-by: Uri Shkolnik <uris@siano-ms.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: core: use more outbound tlabels
firewire: core: don't update Broadcast_Channel if RFC 2734 conditions aren't met
firewire: core: prepare for non-core children of card devices
firewire: core: include linux/uaccess.h instead of asm/uaccess.h
firewire: add parent-of-unit accessor
firewire: rename source files
firewire: reorganize header files
firewire: clean up includes
firewire: ohci: access bus_seconds atomically
firewire: also use vendor ID in root directory for driver matches
firewire: share device ID table type with ieee1394
firewire: core: add sysfs attribute for easier udev rules
firewire: core: check for missing struct update at build time, not run time
firewire: core: improve check for local node
For specific boards, pass initialization data to ir-kbd-i2c instead
of modifying the settings after the device is initialized. This is
more efficient and easier to read.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Let card drivers probe for IR receiver devices and instantiate them if
found. Ultimately it would be better if we could stop probing
completely, but I suspect this won't be possible for all card types.
There's certainly room for cleanups. For example, some drivers are
sharing I2C adapter IDs, so they also had to share the list of I2C
addresses being probed for an IR receiver. Now that each driver
explicitly says which addresses should be probed, maybe some addresses
can be dropped from some drivers.
Also, the special cases in saa7134-i2c should probably be handled on a
per-board basis. This would be more efficient and less risky than always
probing extra addresses on all boards. I'll give it a try later.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
In the standard device driver binding model, the name field of
struct i2c_client is used to match devices to their drivers, so we
must stop using it for internal purposes. Define a separate field
in struct IR_i2c as a replacement, and use it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add ADV7343 I2C based video encoder driver. This follows the
v4l2-subdev framework. This driver has been tested on TI DM646x EVM. It
has been tested for Composite and Component outputs.
Updates as per review by Mauro Chehab, added support for more standards
supported by the encoder. Also adding the missed out signed-offs.Tested
only NTSC and PAL standards.
[hverkuil@xs4all.nl: s_routing API changed, updated driver to use new API]
Signed-off-by: Manjunath Hadli <mrh@ti.com>
Signed-off-by: Brijesh Jadav <brijesh.j@ti.com>
Signed-off-by: Chaithrika U S <chaithrika@ti.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch adds driver for TI THS7303 video amplifier. This driver is
implemented as a v4l2 sub device. Tested on TI DM646x EVM.
This version has updates based on review comments by Mauro Chehab.
Signed-off-by: Chaithrika U S <chaithrika@ti.com>
Reviewed-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add a platform driver to soc_camera.c. This way we preserve backwards
compatibility with existing platforms and can start converting them one by one
to the new platform-device soc-camera interface.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Until now I relied on i2c_del_adapter to unregister the i2c_clients for
me, however, if the i2c bus is a platform bus then it is never deleted.
So instead I need to unregister i2c clients when unregistering the
v4l2_device.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add a utility function that can be used to setup the v4l2_device's name
field in a standard manner.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Make camera devices direct children of host platform devices, move the
inheritance management into the soc_camera.c core driver.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently pcm990 camera bus-width management functions request a GPIO and never
free it again. With this approach the GPIO extender driver cannot be unloaded
once camera drivers have been loaded, also unloading theb i2c-pxa bus driver
produces errors, because the GPIO extender driver cannot unregister properly.
Another problem is, that if camera drivers are once loaded before the GPIO
extender driver, the platform code marks the GPIO unavailable and only a reboot
helps to recover. Adding an explicit free_bus method and using it in mt9m001
and mt9v022 drivers fixes these problems.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The name size limit is gone from the driver core, the BUS_ID_SIZE
value will be removed.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (64 commits)
debugfs: use specified mode to possibly mark files read/write only
debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem.
xen: remove driver_data direct access of struct device from more drivers
usb: gadget: at91_udc: remove driver_data direct access of struct device
uml: remove driver_data direct access of struct device
block/ps3: remove driver_data direct access of struct device
s390: remove driver_data direct access of struct device
parport: remove driver_data direct access of struct device
parisc: remove driver_data direct access of struct device
of_serial: remove driver_data direct access of struct device
mips: remove driver_data direct access of struct device
ipmi: remove driver_data direct access of struct device
infiniband: ehca: remove driver_data direct access of struct device
ibmvscsi: gadget: at91_udc: remove driver_data direct access of struct device
hvcs: remove driver_data direct access of struct device
xen block: remove driver_data direct access of struct device
thermal: remove driver_data direct access of struct device
scsi: remove driver_data direct access of struct device
pcmcia: remove driver_data direct access of struct device
PCIE: remove driver_data direct access of struct device
...
Manually fix up trivial conflicts due to different direct driver_data
direct access fixups in drivers/block/{ps3disk.c,ps3vram.c}
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: remove some includings of blktrace_api.h
mg_disk: seperate mg_disk.h again
block: Introduce helper to reset queue limits to default values
cfq: remove extraneous '\n' in blktrace output
ubifs: register backing_dev_info
btrfs: properly register fs backing device
block: don't overwrite bdi->state after bdi_init() has been run
cfq: cleanup for last_end_request in cfq_data
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (38 commits)
ps3flash: Always read chunks of 256 KiB, and cache them
ps3flash: Cache the last accessed FLASH chunk
ps3: Replace direct file operations by callback
ps3: Switch ps3_os_area_[gs]et_rtc_diff to EXPORT_SYMBOL_GPL()
ps3: Correct debug message in dma_ioc0_map_pages()
drivers/ps3: Add missing annotations
ps3fb: Use ps3_system_bus_[gs]et_drvdata() instead of direct access
ps3flash: Use ps3_system_bus_[gs]et_drvdata() instead of direct access
ps3: shorten ps3_system_bus_[gs]et_driver_data to ps3_system_bus_[gs]et_drvdata
ps3: Use dev_[gs]et_drvdata() instead of direct access for system bus devices
block/ps3: remove driver_data direct access of struct device
ps3vram: Make ps3vram_priv.reports a void *
ps3vram: Remove no longer used ps3vram_priv.ddr_base
ps3vram: Replace mutex by spinlock + bio_list
block: Add bio_list_peek()
powerpc: Use generic atomic64_t implementation on 32-bit processors
lib: Provide generic atomic64_t implementation
powerpc: Add compiler memory barrier to mtmsr macro
powerpc/iseries: Mark signal_vsp_instruction() as maybe unused
powerpc/iseries: Fix unused function warning in iSeries DT code
...
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits)
ACPICA: Update version to 20090521.
ACPICA: Disable preservation of SCI enable bit (SCI_EN)
ACPICA: Region deletion: Ensure region object is removed from handler list
ACPICA: Eliminate extra call to NsGetParentNode
ACPICA: Simplify internal operation region interface
ACPICA: Update Load() to use operation region interfaces
ACPICA: New: AcpiInstallMethod - install a single control method
ACPICA: Invalidate DdbHandle after table unload
ACPICA: Fix reference count issues for DdbHandle object
ACPICA: Simplify and optimize NsGetNextNode function
ACPICA: Additional validation of _PRT packages (resource mgr)
ACPICA: Fix DebugObject output for DdbHandle objects
ACPICA: Fix allowable release order for ASL mutex objects
ACPICA: Mutex support: Fix release ordering issue and current sync level
ACPICA: Update version to 20090422.
ACPICA: Linux OSL: cleanup/update/merge
ACPICA: Fix implementation of AML BreakPoint operator (break to debugger)
ACPICA: Fix miscellaneous warnings under gcc 4+
ACPICA: Miscellaneous lint changes
ACPICA: Fix possible dereference of null pointer
...
This adds a KERN_DEFAULT loglevel marker, for when you cannot decide
which loglevel you want, and just want to keep an existing printk
with the default loglevel.
The difference between having KERN_DEFAULT and having no log-level
marker at all is two-fold:
- having the log-level marker will now force a new-line if the
previous printout had not added one (perhaps because it forgot,
but perhaps because it expected a continuation)
- having a log-level marker is required if you are printing out a
message that otherwise itself could perhaps otherwise be mistaken
for a log-level.
Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
It used to be that we would only look at the log-level in a printk()
after explicit newlines, which can cause annoying problems when the
previous printk() did not end with a '\n'. In that case, the log-level
marker would be just printed out in the middle of the line, and be
seen as just noise rather than change the logging level.
This changes things to always look at the log-level in the first
bytes of the printout. If a log level marker is found, it is always
used as the log-level. Additionally, if no newline existed, one is
added (unless the log-level is the explicit KERN_CONT marker, to
explicitly show that it's a continuation of a previous line).
Acked-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If socket destuction gets delayed to a timer, we try to
lock_sock() from that timer which won't work.
Use bh_lock_sock() in that case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Ingo Molnar <mingo@elte.hu>
eec9462088 fold mg_disk.h into mg_disk.c,
but mg_disk platform driver needs private data for operation. This also
make mg_disk.c as machine independent. Seperate only needed structure and
defines to mg_disk.h
Signed-off-by: unsik Kim <donari75@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
DM reuses the request queue when swapping in a new device table
Introduce blk_set_default_limits() which can be used to reset the the
queue_limits prior to stacking devices.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Differentiate between SuperSpeed endpoint companion descriptor and the
wireless USB endpoint companion descriptor. Make all structure names for
this descriptor have "ss" (SuperSpeed) in them. David Vrabel asked for
this change in http://marc.info/?l=linux-usb&m=124091465109367&w=2
Reported-by: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is the original patch I created before David Vrabel posted a better
patch (http://marc.info/?l=linux-usb&m=123377477209109&w=2) that does
basically the same thing. This patch will get replaced with his
(modified) patch later.
Allow USB device drivers that use usb_sg_init() and usb_sg_wait() to push
bulk endpoint scatter gather lists down to the host controller drivers.
This allows host controller drivers to more efficiently enqueue these
transfers, and allows the xHCI host controller to better take advantage of
USB 3.0 "bursts" for bulk endpoints.
This patch currently only enables scatter gather lists for bulk endpoints.
Other endpoint types that use the usb_sg_* functions will not have their
scatter gather lists pushed down to the host controller. For periodic
endpoints, we want each scatterlist entry to be a separate transfer.
Eventually, HCDs could parse these scatter-gather lists for periodic
endpoints also. For now, we use the old code and call usb_submit_urb()
for each scatterlist entry.
The caller of usb_sg_init() can request that all bytes in the scatter
gather list be transferred by passing in a length of zero. Handle that
request for a bulk endpoint under xHCI by walking the scatter gather list
and calculating the length. We could let the HCD handle a zero length in
this case, but I'm not sure if the core layers in between will get
confused by this.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The USB 3.0 bus specification added an "Endpoint Companion" descriptor that is
supposed to follow all SuperSpeed Endpoint descriptors. This descriptor is used
to extend the bus protocol to allow more packets to be sent to an endpoint per
"microframe". The word microframe was removed from the USB 3.0 specification
because the host controller does not send Start Of Frame (SOF) symbols down the
USB 3.0 wires.
The descriptor defines a bMaxBurst field, which indicates the number of packets
of wMaxPacketSize that a SuperSpeed device can send or recieve in a service
interval. All non-control endpoints may set this value as high as 16 packets
(bMaxBurst = 15).
The descriptor also allows isochronous endpoints to further specify that they
can send and receive multiple bursts per service interval. The bmAttributes
allows them to specify a "Mult" of up to 3 (bmAttributes = 2).
Bulk endpoints use bmAttributes to report the number of "Streams" they support.
This was an extension of the endpoint pipe concept to allow multiple mass
storage device commands to be outstanding for one bulk endpoint at a time. This
should allow USB 3.0 mass storage devices to support SCSI command queueing.
Bulk endpoints can say they support up to 2^16 (65,536) streams.
The information in the endpoint companion descriptor must be stored with the
other device, config, interface, and endpoint descriptors because the host
controller needs to access them quickly, and we need to install some default
values if a SuperSpeed device doesn't provide an endpoint companion descriptor.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Warn users of URB_NO_SETUP_DMA_MAP about xHCI behavior.
Device drivers can choose to DMA map the setup packet of a control transfer
before submitting the URB to the USB core. Drivers then set the
URB_NO_SETUP_DMA_MAP and pass in the DMA memory address in setup_dma, instead of
providing a kernel address for setup_packet. However, xHCI requires that the
setup packet be copied into an internal data structure, and we need a kernel
memory address pointer for that. Warn users of URB_NO_SETUP_DMA_MAP that they
should provide a valid pointer for setup_packet, along with the DMA address.
FIXME: I'm not entirely sure how to work around this in the xHCI driver
or USB core.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add host controller driver API and a slot_id variable to struct
usb_device. This allows the xHCI host controller driver to ask the
hardware to allocate a slot for the device when a struct usb_device is
allocated. The slot needs to be allocated at that point because the
hardware can run out of internal resources, and we want to know that very
early in the device connection process. Don't call this new API for root
hubs, since they aren't real devices.
Add HCD API to let the host controller choose the device address. This is
especially important for xHCI hardware running in a virtualized
environment. The guests running under the VM don't need to know which
addresses on the bus are taken, because the hardware picks the address for
them. Announce SuperSpeed USB devices after the address has been assigned
by the hardware.
Don't use the new get descriptor/set address scheme with xHCI. Unless
special handling is done in the host controller driver, the xHC can't
issue control transfers before you set the device address. Support for
the older addressing scheme will be added when the xHCI driver supports
the Block Set Address Request (BSR) flag in the Address Device command.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds a hex route string to each USB device. The route string is used
by the USB 3.0 host controller to send packets through the device tree. USB 3.0
hubs use this string to route packets to the correct port. This is fundamental
bus change from USB 2.0, where all packets were broadcast across the bus.
Devices (including hubs) under a root port receive the route string 0x0. Every
four bits in the route string represent a port on a hub. This length works
because USB 3.0 hubs are limited to 15 ports, and USB 2.0 hubs (with potentially
more ports) will never see packets with a route string. A port number of 0
means the packet is destined for that hub.
For example, a peripheral device might have a route string of 0x00097.
This means the device is connected to port 9 of the hub at depth 1.
The hub at depth 1 is connected to port 7 of a hub at depth 0.
The hub at depth 0 is connected to a root port.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Modify the USB core to handle the new USB 3.0 speed, "SuperSpeed". This
is 5.0 Gbps (wire speed). There are probably more places that check for
speed that I've missed.
SuperSpeed devices have a 512 byte endpoint 0 max packet size. This shows
up as a bMaxPacketSize0 set to 0x09 (see table 9-8 of the USB 3.0 bus
spec).
xHCI spec says that the xHC can handle intervals up to 2^15 microframes. That
might change when real silicon becomes available.
Add FIXME note for SuperSpeed isochronous endpoints. They can transmit up
to 16 packets in one "burst" before they wait for an acknowledgment of the
packets. They can do up to 3 bursts per microframe (determined by the
mult value in the endpoint companion descriptor). The xHCI driver doesn't
have support for isoc yet, so fix this later.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add PCI initialization code to take control of the xHCI host controller
away from the BIOS, halt, and reset the host controller. The xHCI spec
says that BIOSes must give up the host controller within 5 seconds.
Add some host controller glue functions to handle hardware initialization
and memory allocation for the host controller. The current xHCI
prototypes use PCI interrupts, but the xHCI spec requires MSI-X
interrupts. Add code to support MSI-X interrupts, but use the PCI
interrupts for now.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Description:
This driver is used for Intel Langwell* USB OTG controller in Intel
Moorestown* platform. It tries to implement host/device role switch
according to OTG spec. The actual hsot and device functions are
accomplished in modified EHCI driver and Intel Langwell USB OTG client
controller driver.
* Langwell and Moorestown are names used in development. They are not
approved official name.
Note:
This patch is the first version Intel Langwell USB OTG Transceiver
driver. The development is not finished, and the bug fixing is on going
for some hardware and software issues. The main purpose of this
submission is for code view.
Supported features:
- Data-line Pulsing SRP
- Support HNP to switch roles
- PCI D0/D3 power management support
Known issues:
- HNP is only tested with another Moorestown platform.
- PCI D0/D3 power management support is not fully tested.
- VBus Pulsing SRP is not support in current version.
Signed-off-by: Hao Wu <hao.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Intel Langwell USB Device Controller is a High-Speed USB OTG device
controller in Intel Moorestown platform. It can work in OTG device mode
with Intel Langwell USB OTG transceiver driver as well as device-only
mode. The number of programmable endpoints is different through
controller revision.
NOTE:
This patch is the first version Intel Langwell USB OTG device controller
driver. The bug fixing is on going for some hardware and software
issues. Intel Langwell USB OTG transceiver driver and EHCI driver
patches will be submitted later.
Supported features:
- USB OTG protocol support with Intel Langwell USB OTG transceiver
driver (turn on CONFIG_USB_LANGWELL_OTG)
- Support control, bulk, interrupt and isochronous endpoints
(isochronous not tested)
- PCI D0/D3 power management support
- Link Power Management (LPM) support
Tested gadget drivers:
- g_file_storage
- g_ether
- g_zero
The passed tests:
- g_file_storage: USBCV Chapter 9 tests
- g_file_storage: USBCV MSC tests
- g_file_storage: from/to host files copying
- g_ether: ping, ftp and scp files from/to host
- Hotplug, with and without hubs
Known issues:
- g_ether: failed part of USBCV chap9 tests
- LPM support not fully tested
TODO:
- g_ether: pass all USBCV chap9 tests
- g_zero: pass usbtest tests
- Stress tests on different gadget drivers
- On-chip private SRAM caching support
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1254) splits up the shutdown method of usb_serial_driver
into a disconnect and a release method.
The problem is that the usb-serial core was calling shutdown during
disconnect handling, but drivers didn't expect it to be called until
after all the open file references had been closed. The result was an
oops when the close method tried to use memory that had been
deallocated by shutdown.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1253) prevents the usb-serial core from calling a
driver's port_probe and port_remove methods more than once per port.
It also removes some unnecessary try_module_get() calls and adds a
missing port_remove method call in a failure path.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
CPU/board specific parameters (PLL clock, vif etc...) can be set
by platform_data instead of module_param.
v2: remove irq_sense member in platform_data because it can OR in
IRQF_TRIGGER_LOW or IRQF_TRIGGER_FALLING against IORESOURCE_IRQ in
the struct resource.
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Reviewed-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The usb_debug driver was modified to implement serial break handling
by using a "magic" data packet comprised of the sequence:
0x00 0xff 0x01 0xfe 0x00 0xfe 0x01 0xff
When the tty layer requests a serial break the usb_debug driver sends
the magic packet. On the receiving side the magic packet is thrown
away or a sysrq is activated depending on what kernel .config options
have been set.
The generic serial driver was modified as well as the usb serial
headers to generically implement sysrq processing in the same way the
non usb uart based drivers implement the sysrq handling. This will
allow other usb serial devices to implement sysrq handling as desired.
The new usb serial functions are named similarly and implemented
similarly to the uart functions as follows:
usb_serial_handle_break <-> uart_handle_break
usb_serial_handle_sysrq_char <-> uart_handle_sysrq_char
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The usb_debug driver, when used as the console, will always fail to
insert the carriage return and new line sequence as well as randomly
drop console output. This is a result of only having the single
write_urb and that the tty layer will have a lock that prevents the
processing of the back to back urb requests.
The solution is to allow more than one urb to be outstanding and have
a slightly deeper transmit queue. The idea and some code is borrowed
from the ftdi_sio usb driver.
The generic usb serial driver was modified so as to allow the classic
method of 1 write urb, or a multi write urb scheme with N allowed
outstanding urbs where N is controlled by max_in_flight_urbs. When
max_in_flight_urbs in a "struct usb_serial_driver" is non zero the
multi write urb scheme will be used.
The size of 4000 was selected for the usb_debug driver so that the
driver lowers possibility of losing the queued console messages during
the kernel startup.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use "/* private:" to mark struct members as private so that
scripts/kernel-doc will handle them correctly.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mark internal struct members as /* private: */ so that kernel-doc
won't produce warnings about missing descriptions for them.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1235) adds an array of PCI power-state names, together
with a simple inline accessor routine.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The usb_host class isn't used for anything anymore (it was used for
debug files, but they have moved to debugfs a few kernel releases ago),
so let's delete it before someone accidentally puts a file in it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1239) updates the kernel's treatment of Unicode. The
character-set conversion routines are well behind the current state of
the Unicode specification: They don't recognize the existence of code
points beyond plane 0 or of surrogate pairs in the UTF-16 encoding.
The old wchar_t 16-bit type is retained because it's still used in
lots of places. This shouldn't cause any new problems; if a
conversion now results in an invalid 16-bit code then before it must
have yielded an undefined code.
Difficult-to-read names like "utf_mbstowcs" are replaced with more
transparent names like "utf8s_to_utf16s" and the ordering of the
parameters is rationalized (buffer lengths come immediate after the
pointers they refer to, and the inputs precede the outputs).
Fortunately the low-level conversion routines are used in only a few
places; the interfaces to the higher-level uni2char and char2uni
methods have been left unchanged.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The NOP OTG transceiver driver needs to be usable from modules.
Make sure its symbols are always accessible at both compile and
link time, and make sure the device instance is allocated from
the heap so that device lifetime rules are obeyed.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Many developers use "/debug/" or "/debugfs/" or "/sys/kernel/debug/"
directory name to mount debugfs filesystem for ftrace according to
./Documentation/tracers/ftrace.txt file.
And, three directory names(ex:/debug/, /debugfs/, /sys/kernel/debug/) is
existed in kernel source like ftrace, DRM, Wireless, Documentation,
Network[sky2]files to mount debugfs filesystem.
debugfs means debug filesystem for debugging easy to use by greg kroah
hartman. "/sys/kernel/debug/" name is suitable as directory name
of debugfs filesystem.
- debugfs related reference: http://lwn.net/Articles/334546/
Fix inconsistency of directory name to mount debugfs filesystem.
* From Steven Rostedt
- find_debugfs() and tracing_files() in this patch.
Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com>
Acked-by : Inaky Perez-Gonzalez <inaky@linux.intel.com>
Reviewed-by : Steven Rostedt <rostedt@goodmis.org>
Reviewed-by : James Smart <james.smart@emulex.com>
CC: Jiri Kosina <trivial@kernel.org>
CC: David Airlie <airlied@linux.ie>
CC: Peter Osterlund <petero2@telia.com>
CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device. Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used. These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds support for block drivers to report their requested nodename
to userspace. It also updates a number of block drivers to provide the
needed subdirectory and device name to be used for them.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds support for USB drivers to report their requested nodename to
userspace. It also updates a number of USB drivers to provide the
needed subdirectory and device name to be used for them.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds support for misc devices to report their requested nodename to
userspace. It also updates a number of misc drivers to provide the
needed subdirectory and device name to be used for them.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds the nodename callback for struct class, struct device_type and
struct device, to allow drivers to send userspace hints on the device
name and subdirectory that should be used for it.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As we're allocating the firmware name dynamically, we no longer need this
definition.
This patch must be applied only after the 5 previous patches from this pacth
set have been applied.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This converts resource and IRQ getbyname functions for the platform
bus to use const char *, I ran into compiler moanings when I tried
using a const char * for looking up a certain resource.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds a new bus notifier event which is emitted _after_ a
device is removed from its driver. This event will be used by the
dma-api debug code to check if a driver has released all dma allocations
for that device.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
During bootup performance tracing we see repeated occurrences of
/sys/kernel/uid/* events for the same uid, leading to a,
in this case, rather pointless userspace processing for the
same uid over and over.
This is usually caused by tools which change their uid to "nobody",
to run without privileges to read data supplied by untrusted users.
This change delays the execution of the (already existing) scheduled
work, to cleanup the uid after one second, so the allocated and announced
uid can possibly be re-used by another process.
This is the current behavior, where almost every invocation of a
binary, which changes the uid, creates two events:
$ read START < /sys/kernel/uevent_seqnum; \
for i in `seq 100`; do su --shell=/bin/true bin; done; \
read END < /sys/kernel/uevent_seqnum; \
echo $(($END - $START))
178
With the delayed cleanup, we get only two events, and userspace finishes
a bit faster too:
$ read START < /sys/kernel/uevent_seqnum; \
for i in `seq 100`; do su --shell=/bin/true bin; done; \
read END < /sys/kernel/uevent_seqnum; \
echo $(($END - $START))
1
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change ide_drive_t 'drive_data' field from 'unsigned int' type to 'void *'
type, allowing a wider range of values/types to be stored in this field.
Added 'ide_get_drivedata' and 'ide_set_drivedata' helpers to get and set
the 'drive_data' field.
Fixed all host drivers to maintain coherency with the change in the
'drive_data' field type.
Signed-off-by: Joao Ramos <joao.ramos@inov.pt>
[bart: fix qd65xx build, cast to 'unsigned long', minor Coding Style fixups]
Acked-by: Sergei Shtylyov <sshtylyov@ru.montavista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: Logic to move non pinned timers
timers: /proc/sys sysctl hook to enable timer migration
timers: Identifying the existing pinned timers
timers: Framework for identifying pinned timers
timers: allow deferrable timers for intervals tv2-tv5 to be deferred
Fix up conflicts in kernel/sched.c and kernel/timer.c manually
* 'timers-for-linus-clocksource' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: prevent selection of low resolution clocksourse also for nohz=on
clocksource: sanity check sysfs clocksource changes
Move the ack_intr() method into 'struct ide_port_ops', also renaming it to
test_irq() while at it...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add 'unsigned long port_flags' field to ide_hwif_t.
* Add IDE_PFLAG_PROBING port flag and keep it set during probing.
* Fix ide_pio_need_iordy() to not enable IORDY at a probe time
(IORDY may lead to controller lock up on certain controllers
if the port is not occupied).
Loosely based on the recent libata's fix by Tejun, thanks to Alan
for the hint that IDE may also need it.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Add ide_pio_need_iordy() helper and convert host drivers to use it.
This fixes it8172, it8213, pdc202xx_old, piix, slc90e66 and siimage
host drivers to handle IORDY correctly.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
v2:
Minor fixes per Sergei's review.
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (103 commits)
powerpc: Fix bug in move of altivec code to vector.S
powerpc: Add support for swiotlb on 32-bit
powerpc/spufs: Remove unused error path
powerpc: Fix warning when printing a resource_size_t
powerpc/xmon: Remove unused variable in xmon.c
powerpc/pseries: Fix warnings when printing resource_size_t
powerpc: Shield code specific to 64-bit server processors
powerpc: Separate PACA fields for server CPUs
powerpc: Split exception handling out of head_64.S
powerpc: Introduce CONFIG_PPC_BOOK3S
powerpc: Move VMX and VSX asm code to vector.S
powerpc: Set init_bootmem_done on NUMA platforms as well
powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock
powerpc/mm: Fix some SMP issues with MMU context handling
powerpc: Add PTRACE_SINGLEBLOCK support
fbdev: Add PLB support and cleanup DCR in xilinxfb driver.
powerpc/virtex: Add ml510 reference design device tree
powerpc/virtex: Add Xilinx ML510 reference design support
powerpc/virtex: refactor intc driver and add support for i8259 cascading
powerpc/virtex: Add support for Xilinx PCI host bridge
...
This patch allows the Adaptec firmware to pass on its values for Packetize and
QAS. To do this, the settings max_iu and max_qas have been introduced into
the SPI transport class and populated from the adaptec NVram tables. Domain
validation in the SPI transport class will respect the max settings when
configuring to the highest possible speed for testing.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The purpose of this change is to allow __getname() users to pass a
custom GFP mask to kmem_cache_alloc(). This is needed for annotating
a certain kmemcheck false positive.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
This gets rid of a heap of false-positive warnings from the tracer
code due to the use of bitfields.
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
The use of bitfields here would lead to false positive warnings with
kmemcheck. Silence them.
(Additionally, one erroneous comment related to the bitfield was also
fixed.)
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Add the bitfield API which can be used to annotate bitfields in structs
and get rid of false positive reports.
According to Al Viro, the syntax we were using (putting #ifdef inside
macro arguments) was not valid C. He also suggested using begin/end
markers instead, which is what we do now.
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
This adds support for tracking the initializedness of memory that
was allocated with the page allocator. Highmem requests are not
tracked.
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
[build fix for !CONFIG_KMEMCHECK]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
This patch hooks into the DMA API to prevent the reporting of the
false positives that would otherwise be reported when memory is
accessed that is also used directly by devices.
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
With kmemcheck enabled, the slab allocator needs to do this:
1. Tell kmemcheck to allocate the shadow memory which stores the status of
each byte in the allocation proper, e.g. whether it is initialized or
uninitialized.
2. Tell kmemcheck which parts of memory that should be marked uninitialized.
There are actually a few more states, such as "not yet allocated" and
"recently freed".
If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
memory that can take page faults because of kmemcheck.
If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
request memory with the __GFP_NOTRACK flag. This does not prevent the page
faults from occuring, however, but marks the object in question as being
initialized so that no warnings will ever be produced for this object.
In addition to (and in contrast to) __GFP_NOTRACK, the
__GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should
not be tracked _because_ it would produce a false positive. Their values
are identical, but need not be so in the future (for example, we could now
enable/disable false positives with a config option).
Parts of this patch were contributed by Pekka Enberg but merged for
atomicity.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
The V3 regulator can be configured with an external resistor
connected to the feedback pin (R24 in the data sheet) to
increase the voltage range.
For example, hx4700 has R24 = 3.32 kOhm to achieve a maximum
V3 voltage of 1.55 V which is needed for 624 MHz CPU frequency.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
This patch adds regulator drivers for National Semiconductors LP3971 PMIC.
This LP3971 PMIC controller has 3 DC/DC voltage converters and 5 low
drop-out (LDO) regulators. LP3971 PMIC controller uses I2C interface.
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
The userspace-consumer driver allows control of voltage and current
regulator state from userspace. This is required for fine-grained
power management of devices that are completely controller by userspace
applications, e.g. a GPS transciever connected to a serial port.
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
The Maxim 1586 regulator is a voltage regulator with 2
voltage outputs, specially suitable for Marvell PXA
chips. One output is in the range of required VCC_CORE by
the PXA27x chips, the other in the VCC_USIM required as well
by PXA27x chips.
The chip is controlled through the I2C bus.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS
(like in PSCHED_TICKS_PER_SEC already) to avoid misleading.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce bio_list_peek(), to obtain a pointer to the first bio on the bio_list
without actually removing it from the list. This is needed when you want to
serialize based on the list being empty or not.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Many processor architectures have no 64-bit atomic instructions, but
we need atomic64_t in order to support the perf_counter subsystem.
This adds an implementation of 64-bit atomic operations using hashed
spinlocks to provide atomicity. For each atomic operation, the address
of the atomic64_t variable is hashed to an index into an array of 16
spinlocks. That spinlock is taken (with interrupts disabled) around the
operation, which can then be coded non-atomically within the lock.
On UP, all the spinlock manipulation goes away and we simply disable
interrupts around each operation. In fact gcc eliminates the whole
atomic64_lock variable as well.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add kernel modesetting support to radeon driver, use the ttm memory
manager to manage memory and DRM/GEM to provide userspace API.
In order to avoid backward compatibility issue and to allow clean
design and code the radeon kernel modesetting use different code path
than old radeon/drm driver.
When kernel modesetting is enabled the IOCTL of radeon/drm
driver are considered as invalid and an error message is printed
in the log and they return failure.
KMS enabled userspace will use new API to talk with the radeon/drm
driver. The new API provide functions to create/destroy/share/mmap
buffer object which are then managed by the kernel memory manager
(here TTM). In order to submit command to the GPU the userspace
provide a buffer holding the command stream, along this buffer
userspace have to provide a list of buffer object used by the
command stream. The kernel radeon driver will then place buffer
in GPU accessible memory and will update command stream to reflect
the position of the different buffers.
The kernel will also perform security check on command stream
provided by the user, we want to catch and forbid any illegal use
of the GPU such as DMA into random system memory or into memory
not owned by the process supplying the command stream. This part
of the code is still incomplete and this why we propose that patch
as a staging driver addition, future security might forbid current
experimental userspace to run.
This code support the following hardware : R1XX,R2XX,R3XX,R4XX,R5XX
(radeon up to X1950). Works is underway to provide support for R6XX,
R7XX and newer hardware (radeon from HD2XXX to HD4XXX).
Authors:
Jerome Glisse <jglisse@redhat.com>
Dave Airlie <airlied@redhat.com>
Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
TTM is a GPU memory manager subsystem designed for use with GPU
devices with various memory types (On-card VRAM, AGP,
PCI apertures etc.). It's essentially a helper library that assists
the DRM driver in creating and managing persistent buffer objects.
TTM manages placement of data and CPU map setup and teardown on
data movement. It can also optionally manage synchronization of
data on a per-buffer-object level.
TTM takes care to provide an always valid virtual user-space address
to a buffer object which makes user-space sub-allocation of
big buffer objects feasible.
TTM uses a fine-grained per buffer-object locking scheme, taking
care to release all relevant locks when waiting for the GPU.
Although this implies some locking overhead, it's probably a big
win for devices with multiple command submission mechanisms, since
the lock contention will be minimal.
TTM can be used with whatever user-space interface the driver
chooses, including GEM. It's used by the upcoming Radeon KMS DRM driver
and is also the GPU memory management core of various new experimental
DRM drivers.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits)
.gitignore: ignore *.lzma files
kbuild: add generic --set-str option to scripts/config
kbuild: simplify argument loop in scripts/config
kbuild: handle non-existing options in scripts/config
kallsyms: generalize text region handling
kallsyms: support kernel symbols in Blackfin on-chip memory
documentation: make version fix
kbuild: fix a compile warning
gitignore: Add GNU GLOBAL files to top .gitignore
kbuild: fix delay in setlocalversion on readonly source
README: fix misleading pointer to the defconf directory
vmlinux.lds.h update
kernel-doc: cleanup perl script
Improve vmlinux.lds.h support for arch specific linker scripts
kbuild: fix headers_exports with boolean expression
kbuild/headers_check: refine extern check
kbuild: fix "Argument list too long" error for "make headers_check",
ignore *.patch files
Remove bashisms from scripts
menu: fix embedded menu presentation
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mlx4: Add strong ordering to local inval and fast reg work requests
IB/ehca: Remove superfluous bitmasks from QP control block
RDMA/cxgb3: Limit fast register size based on T3 limitations
RDMA/cxgb3: Report correct port state and MTU
mlx4_core: Add module parameter for number of MTTs per segment
IB/mthca: Add module parameter for number of MTTs per segment
RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()
infiniband: Remove void casts
IB/ehca: Increment version number
IB/ehca: Remove unnecessary memory operations for userspace queue pairs
IB/ehca: Fall back to vmalloc() for big allocations
IB/ehca: Replace vmalloc() with kmalloc() for queue allocation
In addition to KT_DEAD which has limited support for diacriticals,
there is KT_DEAD2 that can support 256 criticals, so let's advertise
it in <linux/keyboard.h>.
This lets userland know abut the drivers/char/keyboard.c function
k_dead2, which supports more than the few trivial ones that k_dead
supports.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: (25 commits)
atmel-mci: add MCI2 register definitions
atmel-mci: Integrate AT91 specific definition in header file
tmio_mmc: allow compilation for ASIC3
mmc_block: do not DMA to stack
sdhci: Print ADMA status and pointer on debug
tmio_mmc: fix clock setup
tmio_mmc: map SD control registers after enabling the MFD cell
tmio_mmc: correct probe return value for num_resources != 3
tmio_mmc: don't use set_irq_type
tmio_mmc: add bus_shift support
MFD,mmc: tmio_mmc: make HCLK configurable
mmc_spi: don't use EINVAL for possible transmission errors
cb710: more cleanup for the DEBUG case.
sdhci: platform driver for SDHCI
mxcmmc: remove frequency workaround
cb710: handle DEBUG define in Makefile
cb710: add missing parenthesis
cb710: fix printk format string
mmc: Driver for CB710/720 memory card reader (MMC part)
pxamci: add regulator support.
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: fix inverted wheel for bluetooth version of apple mighty mouse
HID: no more reinitializtion is needed in post_reset
HID: hidraw -- fix comment about accepted devices
HID: Multitouch support for the N-Trig touchscreen
HID: add new multitouch and digitizer contants
HID: autocentering support for Logitech Force 3D Pro
HID: fix hid-ff drivers so that devices work even without ff support
HID: force feedback support for SmartJoy PLUS PS2/USB adapter
HID: Wacom Graphire Bluetooth driver
HID: autocentering support for Logitech G25 Racing Wheel
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (417 commits)
MAINTAINERS: EB110ATX is not ebsa110
MAINTAINERS: update Eric Miao's email address and status
fb: add support of LCD display controller on pxa168/910 (base layer)
[ARM] 5552/1: ep93xx get_uart_rate(): use EP93XX_SYSCON_PWRCNT and EP93XX_SYSCON_PWRCN
[ARM] pxa/sharpsl_pm: zaurus needs generic pxa suspend/resume routines
[ARM] 5544/1: Trust PrimeCell resource sizes
[ARM] pxa/sharpsl_pm: cleanup of gpio-related code.
[ARM] pxa/sharpsl_pm: drop set_irq_type calls
[ARM] pxa/sharpsl_pm: merge pxa-specific code into generic one
[ARM] pxa/sharpsl_pm: merge the two sharpsl_pm.c since it's now pxa specific
[ARM] sa1100: remove unused collie_pm.c
[ARM] pxa: fix the conflicting non-static declarations of global_gpios[]
[ARM] 5550/1: Add default configure file for w90p910 platform
[ARM] 5549/1: Add clock api for w90p910 platform.
[ARM] 5548/1: Add gpio api for w90p910 platform
[ARM] 5551/1: Add multi-function pin api for w90p910 platform.
[ARM] Make ARM_VIC_NR depend on ARM_VIC
[ARM] 5546/1: ARM PL022 SSP/SPI driver v3
ARM: OMAP4: SMP: Update defconfig for OMAP4430
ARM: OMAP4: SMP: Enable SMP support for OMAP4430
...
Updated after review by Tim Abbott.
- Use HEAD_TEXT_SECTION
- Drop use of section-names.h and delete file
- Introduce EXIT_CALL
Deleting section-names.h required a few simple
updates of init.h
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@ksplice.com>
Tlabel is a 6 bits wide datum. Wrap it after 63 rather than 31 for more
safety against transaction label exhaustion and potential responders'
transaction layer bugs. (As noted by Guus Sliepen, this change requires
an expansion of tlabel_mask to 64 bits.)
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This extra check will avoid Broadcast_Channel register related traffic
to many IIDC, SBP-2, and AV/C devices which aren't IRMC or have a
max_rec < 8 (i.e. support < 512 bytes async payload). This avoids a
little bit of traffic after bus reset and is even more careful with
devices which don't implement this CSR.
The assumption is that no other protocol than IP over 1394 uses the
broadcast channel for streams.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
The Toshiba parts all have a 24 MHz HCLK, but HTC ASIC3 has a 24.576 MHz HCLK
and AMD Imageon w228x's HCLK is 80 MHz. With this patch, the MFD driver
provides the HCLK frequency to tmio_mmc via mfd_cell->driver_data.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Acked-by: Ian Molton <ian@mnementh.co.uk>
Acked-by: Samuel Ortiz <sameo@openedhand.com>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
The code is divided in two parts. There is a virtual 'bus' driver
that handles PCI device and registers three new devices one per card
reader type. The other driver handles SD/MMC part of the reader.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: Fix oops on unaligned user access
avr32: Add support for Mediama RMTx add-on board for ATNGW100
avr32: Change Atmel ATNGW100 config to add choice of add-on board
Fix MIMC200 board LCD init
avr32: Fix clash in ATMEL_USART_ flags
avr32: remove obsolete hw_interrupt_type
avr32: Solves problem with inverted MCI detect pin on Merisc board
atmel-mci: Add support for inverted detect pin
General description: kmemcheck is a patch to the linux kernel that
detects use of uninitialized memory. It does this by trapping every
read and write to memory that was allocated dynamically (e.g. using
kmalloc()). If a memory address is read that has not previously been
written to, a message is printed to the kernel log.
Thanks to Andi Kleen for the set_memory_4k() solution.
Andrew Morton suggested documenting the shadow member of struct page.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
[export kmemcheck_mark_initialized]
[build fix for setup_max_cpus]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.
The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver them
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.
At worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert them in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.
The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.
During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resend.
A simple way to test this patch consist of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.
For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.
This patch can be useful to provide reliable flow-accouting. We
still have to add a new conntrack extension to store the creation
and destroy time.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds the hlist_nulls_add_head() function which is
based on hlist_nulls_add_head_rcu() but without the use of
rcu_assign_pointer(). It also adds hlist_nulls_del which is
exactly the same like hlist_nulls_del_rcu().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch moves the helper destruction to a function that lives
in nf_conntrack_helper.c. This new function is used in the patch
to add ctnetlink reliable event delivery.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch reworks the per-cpu event caching to use the conntrack
extension infrastructure.
The main drawback is that we consume more memory per conntrack
if event delivery is enabled. This patch is required by the
reliable event delivery that follows to this patch.
BTW, this patch allows you to enable/disable event delivery via
/proc/sys/net/netfilter/nf_conntrack_events in runtime, although
you can still disable event caching as compilation option.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
commit 3f68535ada (clocksource: sanity check sysfs clocksource
changes) prevents selection of non high resolution capable
clocksources when high resolution mode is active, but did not take
into account that the same rules apply for highres=off nohz=on.
Check the tick device mode instead of hrtimer_hres_active() to verify
whether the system needs to be protected from a switch to jiffies or
other non highres capable clock sources.
Reported-by: Luming Yu <luming.yu@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There is sometimes a need for the ocores driver to add devices to the
bus when installed.
i2c_register_board_info can not always be used, because the I2C devices
are not known at an early state, they could for instance be connected
on a I2C bus on a PCI device which has the Open Cores IP.
i2c_new_device can not be used in all cases either since the resulting
bus nummer might be unknown.
The solution is the pass a list of I2C devices in the platform data to
the Open Cores driver. This is useful for MFD drivers.
Signed-off-by: Richard Röjfors <richard.rojfors.ext@mocean-labs.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Rationale: kmemcheck needs to be able to schedule a tasklet without
touching any dynamically allocated memory _at_ _all_ (since that would
lead to a recursive page fault). This tasklet is used for writing the
error reports to the kernel log.
The new scheduling function avoids touching any other tasklets by
inserting the new tasklist as the head of the "tasklet_hi" list instead
of on the tail.
Also don't wake up the softirq thread lest the scheduler access some
tracked memory and we go down with a recursive page fault.
In this case, we'd better just wait for the maximum time of 1/HZ for the
message to appear.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Move the SLAB struct kmem_cache definition to <linux/slab_def.h> like
with SLUB so kmemcheck can access ->ctor and ->flags.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (50 commits)
drm: include kernel list header file in hashtab header
drm: Export hash table functionality.
drm: Split out the mm declarations in a separate header. Add atomic operations.
drm/radeon: add support for RV790.
drm/radeon: add rv740 drm support.
drm_calloc_large: check right size, check integer overflow, use GFP_ZERO
drm: Eliminate magic I2C frobbing when reading EDID
drm/i915: duplicate desired mode for use by fbcon.
drm/via: vfree() no need checking before calling it
drm: Replace DRM_DEBUG with DRM_DEBUG_DRIVER in i915 driver
drm: Replace DRM_DEBUG with DRM_DEBUG_MODE in drm_mode
drm/i915: Replace DRM_DEBUG with DRM_DEBUG_KMS in intel_sdvo
drm/i915: replace DRM_DEBUG with DRM_DEBUG_KMS in intel_lvds
drm: add separate drm debugging levels
radeon: remove _DRM_DRIVER from the preadded sarea map
drm: don't associate _DRM_DRIVER maps with a master
drm: simplify kcalloc() call to kzalloc().
intelfb: fix spelling of "CLOCK"
drm: fix LOCK_TEST_WITH_RETURN macro
drm/i915: Hook connector to encoder during load detection (fixes tv/vga detect)
...
This will help kmemcheck (and possibly other debugging tools) since we
can now simply pass regs->bp to the stack tracer instead of specifying
the number of stack frames to skip, which is unreliable if gcc decides
to inline functions, etc.
Note that this makes the API incomplete for other architectures, but I
expect that those can be updated lazily, e.g. when they need it.
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
dlm: use more NOFS allocation
dlm: connect to nodes earlier
dlm: fix use count with multiple joins
dlm: Make name input parameter of {,dlm_}new_lockspace() const
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
perf_counter: Add forward/backward attribute ABI compatibility
perf record: Explicity program a default counter
perf_counter: Remove PERF_TYPE_RAW special casing
perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too
powerpc, perf_counter: Fix performance counter event types
perf_counter/x86: Add a quirk for Atom processors
perf_counter tools: Remove one L1-data alias
git commit 0a0c5168 "PM: Introduce functions for suspending and resuming
device interrupts" introduced some helper functions. However these
functions are only available for architectures which support
GENERIC_HARDIRQS.
Other architectures will see this build error:
drivers/built-in.o: In function `sysdev_suspend':
(.text+0x15138): undefined reference to `check_wakeup_irqs'
drivers/built-in.o: In function `device_power_up':
(.text+0x1cb66): undefined reference to `resume_device_irqs'
drivers/built-in.o: In function `device_power_down':
(.text+0x1cb92): undefined reference to `suspend_device_irqs'
To fix this add some empty inline functions for !GENERIC_HARDIRQS.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The *_nvs_* routines in swsusp.c make use of the io*map()
functions, which are only provided for HAS_IOMEM, thus
breaking compilation if HAS_IOMEM is not set. Fix this
by moving the *_nvs_* routines into hibernate_nvs.c, which
is only compiled if HAS_IOMEM is set.
[rjw: Change the name of the new file to hibernate_nvs.c, add the
license line to the header comment.]
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This patch removes the legacy callbacks ->suspend() and
->resume() from struct device_type. These callbacks seem
unused, and new code should instead make use of struct
dev_pm_ops.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Remove the ->suspend_late() and ->resume_early() callbacks
from struct bus_type V2. These callbacks are legacy stuff
at this point and since there seem to be no in-tree users
we may as well remove them. New users should use dev_pm_ops.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This patch (as1241) renames a bunch of functions in the PM core.
Rather than go through a boring list of name changes, suffice it to
say that in the end we have a bunch of pairs of functions:
device_resume_noirq dpm_resume_noirq
device_resume dpm_resume
device_complete dpm_complete
device_suspend_noirq dpm_suspend_noirq
device_suspend dpm_suspend
device_prepare dpm_prepare
in which device_X does the X operation on a single device and dpm_X
invokes device_X for all devices in the dpm_list.
In addition, the old dpm_power_up and device_resume_noirq have been
combined into a single function (dpm_resume_noirq).
Lastly, dpm_suspend_start and dpm_resume_end are the renamed versions
of the former top-level device_suspend and device_resume routines.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>