Some are needed to build but not actually used on archs not supporting
transparent hugepages. Others like pmdp_clear_flush are used by x86 too.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These returns 0 at compile time when the config option is disabled, to
allow gcc to eliminate the transparent hugepage function calls at compile
time without additional #ifdefs (only the export of those functions have
to be visible to gcc but they won't be required at link time and
huge_memory.o can be not built at all).
_PAGE_BIT_UNUSED1 is never used for pmd, only on pte.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
huge_memory.c needs it too when it fallbacks in copying hugepages into
regular fragmented pages if hugepage allocation fails during COW.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alter compound get_page/put_page to keep references on subpages too, in
order to allow __split_huge_page_refcount to split an hugepage even while
subpages have been pinned by one of the get_user_pages() variants.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new compound_lock() needed to serialize put_page against
__split_huge_page_refcount().
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simon Kirby reported the following problem
We're seeing cases on a number of servers where cache never fully
grows to use all available memory. Sometimes we see servers with 4 GB
of memory that never seem to have less than 1.5 GB free, even with a
constantly-active VM. In some cases, these servers also swap out while
this happens, even though they are constantly reading the working set
into memory. We have been seeing this happening for a long time; I
don't think it's anything recent, and it still happens on 2.6.36.
After some debugging work by Simon, Dave Hansen and others, the prevaling
theory became that kswapd is reclaiming order-3 pages requested by SLUB
too aggressive about it.
There are two apparent problems here. On the target machine, there is a
small Normal zone in comparison to DMA32. As kswapd tries to balance all
zones, it would continually try reclaiming for Normal even though DMA32
was balanced enough for callers. The second problem is that
sleeping_prematurely() does not use the same logic as balance_pgdat() when
deciding whether to sleep or not. This keeps kswapd artifically awake.
A number of tests were run and the figures from previous postings will
look very different for a few reasons. One, the old figures were forcing
my network card to use GFP_ATOMIC in attempt to replicate Simon's problem.
Second, I previous specified slub_min_order=3 again in an attempt to
reproduce Simon's problem. In this posting, I'm depending on Simon to say
whether his problem is fixed or not and these figures are to show the
impact to the ordinary cases. Finally, the "vmscan" figures are taken
from /proc/vmstat instead of the tracepoints. There is less information
but recording is less disruptive.
The first test of relevance was postmark with a process running in the
background reading a large amount of anonymous memory in blocks. The
objective was to vaguely simulate what was happening on Simon's machine
and it's memory intensive enough to have kswapd awake.
POSTMARK
traceonly kanyzone
Transactions per second: 156.00 ( 0.00%) 153.00 (-1.96%)
Data megabytes read per second: 21.51 ( 0.00%) 21.52 ( 0.05%)
Data megabytes written per second: 29.28 ( 0.00%) 29.11 (-0.58%)
Files created alone per second: 250.00 ( 0.00%) 416.00 (39.90%)
Files create/transact per second: 79.00 ( 0.00%) 76.00 (-3.95%)
Files deleted alone per second: 520.00 ( 0.00%) 420.00 (-23.81%)
Files delete/transact per second: 79.00 ( 0.00%) 76.00 (-3.95%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 16.58 17.4
Total Elapsed Time (seconds) 218.48 222.47
VMstat Reclaim Statistics: vmscan
Direct reclaims 0 4
Direct reclaim pages scanned 0 203
Direct reclaim pages reclaimed 0 184
Kswapd pages scanned 326631 322018
Kswapd pages reclaimed 312632 309784
Kswapd low wmark quickly 1 4
Kswapd high wmark quickly 122 475
Kswapd skip congestion_wait 1 0
Pages activated 700040 705317
Pages deactivated 212113 203922
Pages written 9875 6363
Total pages scanned 326631 322221
Total pages reclaimed 312632 309968
%age total pages scanned/reclaimed 95.71% 96.20%
%age total pages scanned/written 3.02% 1.97%
proc vmstat: Faults
Major Faults 300 254
Minor Faults 645183 660284
Page ins 493588 486704
Page outs 4960088 4986704
Swap ins 1230 661
Swap outs 9869 6355
Performance is mildly affected because kswapd is no longer doing as much
work and the background memory consumer process is getting in the way.
Note that kswapd scanned and reclaimed fewer pages as it's less aggressive
and overall fewer pages were scanned and reclaimed. Swap in/out is
particularly reduced again reflecting kswapd throwing out fewer pages.
The slight performance impact is unfortunate here but it looks like a
direct result of kswapd being less aggressive. As the bug report is about
too many pages being freed by kswapd, it may have to be accepted for now.
The second test is a streaming IO benchmark that was previously used by
Johannes to show regressions in page reclaim.
MICRO
traceonly kanyzone
User/Sys Time Running Test (seconds) 29.29 28.87
Total Elapsed Time (seconds) 492.18 488.79
VMstat Reclaim Statistics: vmscan
Direct reclaims 2128 1460
Direct reclaim pages scanned 2284822 1496067
Direct reclaim pages reclaimed 148919 110937
Kswapd pages scanned 15450014 16202876
Kswapd pages reclaimed 8503697 8537897
Kswapd low wmark quickly 3100 3397
Kswapd high wmark quickly 1860 7243
Kswapd skip congestion_wait 708 801
Pages activated 9635 9573
Pages deactivated 1432 1271
Pages written 223 1130
Total pages scanned 17734836 17698943
Total pages reclaimed 8652616 8648834
%age total pages scanned/reclaimed 48.79% 48.87%
%age total pages scanned/written 0.00% 0.01%
proc vmstat: Faults
Major Faults 165 221
Minor Faults 9655785 9656506
Page ins 3880 7228
Page outs 37692940 37480076
Swap ins 0 69
Swap outs 19 15
Again fewer pages are scanned and reclaimed as expected and this time the
test completed faster. Note that kswapd is hitting its watermarks faster
(low and high wmark quickly) which I expect is due to kswapd reclaiming
fewer pages.
I also ran fs-mark, iozone and sysbench but there is nothing interesting
to report in the figures. Performance is not significantly changed and
the reclaim statistics look reasonable.
Tgis patch:
When the allocator enters its slow path, kswapd is woken up to balance the
node. It continues working until all zones within the node are balanced.
For order-0 allocations, this makes perfect sense but for higher orders it
can have unintended side-effects. If the zone sizes are imbalanced,
kswapd may reclaim heavily within a smaller zone discarding an excessive
number of pages. The user-visible behaviour is that kswapd is awake and
reclaiming even though plenty of pages are free from a suitable zone.
This patch alters the "balance" logic for high-order reclaim allowing
kswapd to stop if any suitable zone becomes balanced to reduce the number
of pages it reclaims from other zones. kswapd still tries to ensure that
order-0 watermarks for all zones are met before sleeping.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Eric B Munson <emunson@mgebm.net>
Cc: Simon Kirby <sim@hostway.ca>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_mapping() has a unlikely that the mapping has PAGE_MAPPING_ANON set.
But running the annotated branch profiler on a normal desktop system doing
vairous tasks (xchat, evolution, firefox, distcc), it is not really that
unlikely that the mapping here will have the PAGE_MAPPING_ANON flag set:
correct incorrect % Function File Line
------- --------- - -------- ---- ----
35935762 1270265395 97 page_mapping mm.h 659
1306198001 143659 0 page_mapping mm.h 657
203131478 121586 0 page_mapping mm.h 657
5415491 1116 0 page_mapping mm.h 657
74899487 1116 0 page_mapping mm.h 657
203132845 224 0 page_mapping mm.h 659
5415464 27 0 page_mapping mm.h 659
13552 0 0 page_mapping mm.h 657
13552 0 0 page_mapping mm.h 659
242630 0 0 page_mapping mm.h 657
242630 0 0 page_mapping mm.h 659
74899487 0 0 page_mapping mm.h 659
The page_mapping() is a static inline, which is why it shows up multiple
times.
The unlikely in page_mapping() was correct a total of 1909540379 times and
incorrect 1270533123 times, with a 39% being incorrect. With this much of
an error, it's best to simply remove the unlikely and have the compiler
and branch prediction figure this out.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mapping_unevictable() has a likely() around the mapping parameter.
This mapping parameter comes from page_mapping() which has an unlikely()
that the page will be set as PAGE_MAPPING_ANON, and if so, it will return
NULL. One would think that this unlikely() means that the mapping
returned by page_mapping() would not be NULL, but where page_mapping() is
used just above mapping_unevictable(), that unlikely() is incorrect most
of the time. This means that the "likely(mapping)" in
mapping_unevictable() is incorrect most of the time.
Running the annotated branch profiler on my main box which runs firefox,
evolution, xchat and is part of my distcc farm, I had this:
correct incorrect % Function File Line
------- --------- - -------- ---- ----
12872836 1269443893 98 mapping_unevictable pagemap.h 51
35935762 1270265395 97 page_mapping mm.h 659
1306198001 143659 0 page_mapping mm.h 657
203131478 121586 0 page_mapping mm.h 657
5415491 1116 0 page_mapping mm.h 657
74899487 1116 0 page_mapping mm.h 657
203132845 224 0 page_mapping mm.h 659
5415464 27 0 page_mapping mm.h 659
13552 0 0 page_mapping mm.h 657
13552 0 0 page_mapping mm.h 659
242630 0 0 page_mapping mm.h 657
242630 0 0 page_mapping mm.h 659
74899487 0 0 page_mapping mm.h 659
The page_mapping() is a static inline, which is why it shows up multiple
times. The mapping_unevictable() is also a static inline but seems to be
used only once in my setup.
The unlikely in page_mapping() was correct a total of 1909540379 times and
incorrect 1270533123 times, with a 39% being incorrect. Perhaps this is
enough to remove the unlikely from page_mapping() as well.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Nick Piggin <npiggin@kernel.dk>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the code to mlock pages from __mlock_vma_pages_range() to
follow_page().
This allows __mlock_vma_pages_range() to not have to break down work into
16-page batches.
An additional motivation for doing this within the present patch series is
that it'll make it easier for a later chagne to drop mmap_sem when
blocking on disk (we'd like to be able to resume at the page that was read
from disk instead of at the start of a 16-page batch).
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Temporary IO failures, eg. due to loss of both multipath paths, can
permanently leave the PageError bit set on a page, resulting in msync or
fsync returning -EIO over and over again, even if IO is now getting to the
disk correctly.
We already clear the AS_ENOSPC and AS_IO bits in mapping->flags in the
filemap_fdatawait_range function. Also clearing the PageError bit on the
page allows subsequent msync or fsync calls on this file to return without
an error, if the subsequent IO succeeds.
Unfortunately data written out in the msync or fsync call that returned
-EIO can still get lost, because the page dirty bit appears to not get
restored on IO error. However, the alternative could be potentially all
of memory filling up with uncleanable dirty pages, hanging the system, so
there is no nice choice here...
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Valerie Aurora <vaurora@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We'd like to be able to oom_score_adj a process up/down as it
enters/leaves the foreground. Currently, it is not possible to oom_adj
down without CAP_SYS_RESOURCE. This patch allows a task to decrease its
oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
or its inherited value at fork. Assuming the thread that has forked it
has oom_score_adj of 0, each process could decrease it back from 0 upon
activation unless a CAP_SYS_RESOURCE thread elevated it to something
higher.
Alternative considered:
* a setuid binary
* a daemon with CAP_SYS_RESOURCE
Since you don't wan't all processes to be able to reduce their oom_adj, a
setuid or daemon implementation would be complex. The alternatives also
have much higher overhead.
This patch updated from original patch based on feedback from David
Rientjes.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
module_init(). Much of the code is duplicated and can be generalized in a
globally accessible function, __vmalloc_node_range().
__vmalloc_node() now calls into __vmalloc_node_range() with a range of
[VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.
Each architecture may then use __vmalloc_node_range() directly to remove
the duplication of code.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pcpu_get_vm_areas() only uses GFP_KERNEL allocations, so remove the gfp_t
formal and use the mask internally.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_vm_area_node() is unused in the kernel and can thus be removed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With compaction being used instead of lumpy reclaim, the name lumpy_mode
and associated variables is a bit misleading. Rename lumpy_mode to
reclaim_mode which is a better fit. There is no functional change.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the introduction of the boolean sync parameter, the API looks a
little inconsistent as offlining is still an int. Convert offlining to a
bool for the sake of being tidy.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Migration synchronously waits for writeback if the initial passes fails.
Callers of memory compaction do not necessarily want this behaviour if the
caller is latency sensitive or expects that synchronous migration is not
going to have a significantly better success rate.
This patch adds a sync parameter to migrate_pages() allowing the caller to
indicate if wait_on_page_writeback() is allowed within migration or not.
For reclaim/compaction, try_to_compact_pages() is first called
asynchronously, direct reclaim runs and then try_to_compact_pages() is
called synchronously as there is a greater expectation that it'll succeed.
[akpm@linux-foundation.org: build/merge fix]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lumpy reclaim is disruptive. It reclaims a large number of pages and
ignores the age of the pages it reclaims. This can incur significant
stalls and potentially increase the number of major faults.
Compaction has reached the point where it is considered reasonably stable
(meaning it has passed a lot of testing) and is a potential candidate for
displacing lumpy reclaim. This patch introduces an alternative to lumpy
reclaim whe compaction is available called reclaim/compaction. The basic
operation is very simple - instead of selecting a contiguous range of
pages to reclaim, a number of order-0 pages are reclaimed and then
compaction is later by either kswapd (compact_zone_order()) or direct
compaction (__alloc_pages_direct_compact()).
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: use conventional task_struct naming]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently lumpy_mode is an enum and determines if lumpy reclaim is off,
syncronous or asyncronous. In preparation for using compaction instead of
lumpy reclaim, this patch converts the flags into a bitmap.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for a patches promoting the use of memory compaction over
lumpy reclaim, this patch adds trace points for memory compaction
activity. Using them, we can monitor the scanning activity of the
migration and free page scanners as well as the number and success rates
of pages passed to page migration.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This tracks when balance_dirty_pages() tries to wakeup the flusher thread
for background writeback (if it was not started already).
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
reduce_pgdat_percpu_threshold() and restore_pgdat_percpu_threshold() exist
to adjust the per-cpu vmstat thresholds while kswapd is awake to avoid
errors due to counter drift. The functions duplicate some code so this
patch replaces them with a single set_pgdat_percpu_threshold() that takes
a callback function to calculate the desired threshold as a parameter.
[akpm@linux-foundation.org: readability tweak]
[kosaki.motohiro@jp.fujitsu.com: set_pgdat_percpu_threshold(): don't use for_each_online_cpu]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit aa45484 ("calculate a better estimate of NR_FREE_PAGES when memory
is low") noted that watermarks were based on the vmstat NR_FREE_PAGES. To
avoid synchronization overhead, these counters are maintained on a per-cpu
basis and drained both periodically and when a threshold is above a
threshold. On large CPU systems, the difference between the estimate and
real value of NR_FREE_PAGES can be very high. The system can get into a
case where pages are allocated far below the min watermark potentially
causing livelock issues. The commit solved the problem by taking a better
reading of NR_FREE_PAGES when memory was low.
Unfortately, as reported by Shaohua Li this accurate reading can consume a
large amount of CPU time on systems with many sockets due to cache line
bouncing. This patch takes a different approach. For large machines
where counter drift might be unsafe and while kswapd is awake, the per-cpu
thresholds for the target pgdat are reduced to limit the level of drift to
what should be a safe level. This incurs a performance penalty in heavy
memory pressure by a factor that depends on the workload and the machine
but the machine should function correctly without accidentally exhausting
all memory on a node. There is an additional cost when kswapd wakes and
sleeps but the event is not expected to be frequent - in Shaohua's test
case, there was one recorded sleep and wake event at least.
To ensure that kswapd wakes up, a safe version of zone_watermark_ok() is
introduced that takes a more accurate reading of NR_FREE_PAGES when called
from wakeup_kswapd, when deciding whether it is really safe to go back to
sleep in sleeping_prematurely() and when deciding if a zone is really
balanced or not in balance_pgdat(). We are still using an expensive
function but limiting how often it is called.
When the test case is reproduced, the time spent in the watermark
functions is reduced. The following report is on the percentage of time
spent cumulatively spent in the functions zone_nr_free_pages(),
zone_watermark_ok(), __zone_watermark_ok(), zone_watermark_ok_safe(),
zone_page_state_snapshot(), zone_page_state().
vanilla 11.6615%
disable-threshold 0.2584%
David said:
: We had to pull aa454840 "mm: page allocator: calculate a better estimate
: of NR_FREE_PAGES when memory is low and kswapd is awake" from 2.6.36
: internally because tests showed that it would cause the machine to stall
: as the result of heavy kswapd activity. I merged it back with this fix as
: it is pending in the -mm tree and it solves the issue we were seeing, so I
: definitely think this should be pushed to -stable (and I would seriously
: consider it for 2.6.37 inclusion even at this late date).
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reported-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Tested-by: Nicolas Bareil <nico@chdir.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: <stable@kernel.org> [2.6.37.1, 2.6.36.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This warning was added in commit bdff746a39 ("clone: prepare to recycle
CLONE_STOPPED") three years ago. 2.6.26 came and went. As far as I know,
no-one is actually using CLONE_STOPPED.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use modern per_cpu API to increment {soft|hard}irq counters, and use
per_cpu allocation for (struct irq_desc)->kstats_irq instead of an array.
This gives better SMP/NUMA locality and saves few instructions per irq.
With small nr_cpuids values (8 for example), kstats_irq was a small array
(less than L1_CACHE_BYTES), potentially source of false sharing.
In the !CONFIG_SPARSE_IRQ case, remove the huge, NUMA/cache unfriendly
kstat_irqs_all[NR_IRQS][NR_CPUS] array.
Note: we still populate kstats_irq for all possible irqs in
early_irq_init(). We probably could use on-demand allocations. (Code
included in alloc_descs()). Problem is not all IRQS are used with a prior
alloc_descs() call.
kstat_irqs_this_cpu() is not used anymore, remove it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (32 commits)
dm: raid456 basic support
dm: per target unplug callback support
dm: introduce target callbacks and congestion callback
dm mpath: delay activate_path retry on SCSI_DH_RETRY
dm: remove superfluous irq disablement in dm_request_fn
dm log: use PTR_ERR value instead of ENOMEM
dm snapshot: avoid storing private suspended state
dm snapshot: persistent make metadata_wq multithreaded
dm: use non reentrant workqueues if equivalent
dm: convert workqueues to alloc_ordered
dm stripe: switch from local workqueue to system_wq
dm: dont use flush_scheduled_work
dm snapshot: remove unused dm_snapshot queued_bios_work
dm ioctl: suppress needless warning messages
dm crypt: add loop aes iv generator
dm crypt: add multi key capability
dm crypt: add post iv call to iv generator
dm crypt: use io thread for reads only if mempool exhausted
dm crypt: scale to multiple cpus
dm crypt: simplify compatible table output
...
This reverts commit 0fdae42d36, which
wasn't really supposed to go in, and causes lots of annoying warnings.
Quoth Andrew:
"Complete brainfart - I meant to drop that patch ages ago."
Quoth Greg:
"Ick, yeah, that patch isn't ok to go in as-is, all of the callers
need to be fixed up first, which is what I thought we had agreed on..."
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is defined in include/linux/ieee80211.h. As per IEEE spec.
bit6 to bit15 in block ack parameter represents buffer size.
So the bitmask should be 0xFFC0.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Cc: stable@kernel.org
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
DM currently implements congestion checking by checking on congestion
in each component device. For raid456 we need to also check if the
stripe cache is congested.
Add per-target congestion checker callback support.
Extending the target_callbacks structure with additional callback
functions allows for establishing multiple callbacks per-target (a
callback is also needed for unplug).
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
serve only a single work item from the dm instance, so non-reentrant
workqueues would provide the same ordering guarantees as ordered ones
while allowing CPU affinity and use of the workqueues for other
purposes. Switch them to non-reentrant workqueues.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds a 'version' field to the 'dm_ulog_request'
structure.
The 'version' field is taken from a portion of the unused
'padding' field in the 'dm_ulog_request' structure. This
was done to avoid changing the size of the structure and
possibly disrupting backwards compatibility.
The version number will help notify user-space daemons
when a change has been made to the kernel/userspace
log API.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Allow the uuid of a mapped device to be set after device creation.
Previously the uuid (which is optional) could only be set by
DM_DEV_CREATE. If no uuid was supplied it could not be set later.
Sometimes it's necessary to create the device before the uuid is known,
and in such cases the uuid must be filled in after the creation.
This patch extends DM_DEV_RENAME to accept a uuid accompanied by
a new flag DM_UUID_FLAG. This can only be done once and if no
uuid was previously supplied. It cannot be used to change an
existing uuid.
DM_VERSION_MINOR is also bumped to 19 to indicate this interface
extension is available.
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
block: ensure that completion error gets properly traced
blktrace: add missing probe argument to block_bio_complete
block cfq: don't use atomic_t for cfq_group
block cfq: don't use atomic_t for cfq_queue
block: trace event block fix unassigned field
block: add internal hd part table references
block: fix accounting bug on cross partition merges
kref: add kref_test_and_get
bio-integrity: mark kintegrityd_wq highpri and CPU intensive
block: make kblockd_workqueue smarter
Revert "sd: implement sd_check_events()"
block: Clean up exit_io_context() source code.
Fix compile warnings due to missing removal of a 'ret' variable
fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
cfq-iosched: don't check cfqg in choose_service_tree()
fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
cdrom: export cdrom_check_events()
sd: implement sd_check_events()
sr: implement sr_check_events()
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
fs: add documentation on fallocate hole punching
Gfs2: fail if we try to use hole punch
Btrfs: fail if we try to use hole punch
Ext4: fail if we try to use hole punch
Ocfs2: handle hole punching via fallocate properly
XFS: handle hole punching via fallocate properly
fs: add hole punching to fallocate
vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
fix signedness mess in rw_verify_area() on 64bit architectures
fs: fix kernel-doc for dcache::prepend_path
fs: fix kernel-doc for dcache::d_validate
sanitize ecryptfs ->mount()
switch afs
move internal-only parts of ncpfs headers to fs/ncpfs
switch ncpfs
switch 9p
pass default dentry_operations to mount_pseudo()
switch hostfs
switch affs
switch configfs
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix cleanup when trying to mount inexistent image
net/ceph: make ceph_msgr_wq non-reentrant
ceph: fsc->*_wq's aren't used in memory reclaim path
ceph: Always free allocated memory in osdmap_decode()
ceph: Makefile: Remove unnessary code
ceph: associate requests with opening sessions
ceph: drop redundant r_mds field
ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
ceph: add dir_layout to inode
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (45 commits)
regulator: missing index in PTR_ERR() in isl6271a_probe()
regulator: Assign return value of mc13xxx_reg_rmw to ret
regulator: Add initial per-regulator debugfs support
regulator: Make regulator_has_full_constraints a bool
regulator: Clean up logging a bit
regulator: Optimise out noop voltage changes
regulator: Add API to re-apply voltage to hardware
regulator: Staticise non-exported functions in mc13892
regulator: Only notify voltage changes when they succeed
regulator: Provide a selector based set_voltage_sel() operation
regulator: Factor out voltage set operation into a separate function
regulator: Convert WM8994 to use get_voltage_sel()
regulator: Convert WM835x to use get_voltage_sel()
regulator: Allow modular build of mc13xxx-core
regulator: support PMIC mc13892
make mc13783 regulator code generic
Change the register name definitions for mc13783
mach-ux500: Updated and connected ab8500 regulator board configuration
regulators: Removed macros for initialization of ab8500 regulators
regulators: Added verbose debug messages to ab8500 regulators
...
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
KVM: Initialize fpu state in preemptible context
KVM: VMX: when entering real mode align segment base to 16 bytes
KVM: MMU: handle 'map_writable' in set_spte() function
KVM: MMU: audit: allow audit more guests at the same time
KVM: Fetch guest cr3 from hardware on demand
KVM: Replace reads of vcpu->arch.cr3 by an accessor
KVM: MMU: only write protect mappings at pagetable level
KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
KVM: MMU: Initialize base_role for tdp mmus
KVM: VMX: Optimize atomic EFER load
KVM: VMX: Add definitions for more vm entry/exit control bits
KVM: SVM: copy instruction bytes from VMCB
KVM: SVM: implement enhanced INVLPG intercept
KVM: SVM: enhance mov DR intercept handler
KVM: SVM: enhance MOV CR intercept handler
KVM: SVM: add new SVM feature bit names
KVM: cleanup emulate_instruction
KVM: move complete_insn_gp() into x86.c
KVM: x86: fix CR8 handling
KVM guest: Fix kvm clock initialization when it's configured out
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hid-multitouch: minor fixes based on additional review
HID: Switch turbox/mosart touchscreen to hid-mosart
HID: add Add Cando touch screen 10.1-inch product id
HID: hid-mulitouch: add support for the 'Sensing Win7-TwoFinger'
HID: hid-multitouch: add support for Cypress TrueTouch panels
HID: hid-multitouch: support for PixCir-based panels
HID: set HID_MAX_FIELD at 128
HID: add feature_mapping callback
This implements the API defined in <linux/decompress/generic.h> which is
used for kernel, initramfs, and initrd decompression. This patch together
with the first patch is enough for XZ-compressed initramfs and initrd;
XZ-compressed kernel will need arch-specific changes.
The buffering requirements described in decompress_unxz.c are stricter
than with gzip, so the relevant changes should be done to the
arch-specific code when adding support for XZ-compressed kernel.
Similarly, the heap size in arch-specific pre-boot code may need to be
increased (30 KiB is enough).
The XZ decompressor needs memmove(), memeq() (memcmp() == 0), and
memzero() (memset(ptr, 0, size)), which aren't available in all
arch-specific pre-boot environments. I'm including simple versions in
decompress_unxz.c, but a cleaner solution would naturally be nicer.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In userspace, the .lzma format has become mostly a legacy file format that
got superseded by the .xz format. Similarly, LZMA Utils was superseded by
XZ Utils.
These patches add support for XZ decompression into the kernel. Most of
the code is as is from XZ Embedded <http://tukaani.org/xz/embedded.html>.
It was written for the Linux kernel but is usable in other projects too.
Advantages of XZ over the current LZMA code in the kernel:
- Nice API that can be used by other kernel modules; it's
not limited to kernel, initramfs, and initrd decompression.
- Integrity check support (CRC32)
- BCJ filters improve compression of executable code on
certain architectures. These together with LZMA2 can
produce a few percent smaller kernel or Squashfs images
than plain LZMA without making the decompression slower.
This patch: Add the main decompression code (xz_dec), testing module
(xz_dec_test), wrapper script (xz_wrap.sh) for the xz command line tool,
and documentation. The xz_dec module is enough to have a usable XZ
decompressor e.g. for Squashfs.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently users of mm.h need to include <linux/slab.h> to use the macros
malloc() and free() provided by mm.h. This fixes it.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
set_error_fn() has become a useless complication after c1e7c3ae59
("bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure") fixed
the use of error() in malloc(). Only decompress_unlzma.c had some use for
it and that was easy to change too.
This also gets rid of the static function pointer "error", which
should have been marked as __initdata.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This header uses things like __be32, so pull in linux/types.h.
Further, it uses BLOCK_SIZE, so pull in linux/fs.h.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, 3 kernel function prototypes are present in a header
file exported to userland. This patch fixes it.
Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MONOTONIC_RAW clock timestamps are ideally suited for frequency
calculation and also fit well into the original NTP hardpps design. Now
phase and frequency can be adjusted separately: the former based on
REALTIME clock and the latter based on MONOTONIC_RAW clock.
A new function getnstime_raw_and_real is added to timekeeping subsystem to
capture both timestamps at the same time and atomically.
Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit adds hardpps() implementation based upon the original one from
the NTPv4 reference kernel code from David Mills. However, it is highly
optimized towards very fast syncronization and maximum stickness to PPS
signal. The typical error is less then a microsecond.
To make it sync faster I had to throw away exponential phase filter so
that the full phase offset is corrected immediately. Then I also had to
throw away median phase filter because it gives a bigger error itself if
used without exponential filter.
Maybe we will find an appropriate filtering scheme in the future but it's
not necessary if the signal quality is ok.
Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the code that gatheres timestamp in pps_tty_dcd_change() in case
passed ts parameter is NULL because it never happens in the current code.
Fix comments as well.
Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using device index as a pointer needs some unnecessary work to be done
every time the pointer is needed (in irq handler for example). Using a
direct pointer is much more easy (and safe as well).
Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a helper function to gather timestamps. This way clients don't have
to duplicate it.
Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was a race in PPS_FETCH ioctl handler when several processes want to
obtain PPS data simultaneously using sleeping PPS_FETCH. They all sleep
most of the time in the system call.
With the old approach when the first process waiting on the pps queue is
waken up it makes new system call right away and zeroes pps->go. So other
processes continue to sleep. This is a clear race condition because of
the global 'go' variable.
With the new approach pps->last_ev holds some value increasing at each PPS
event. PPS_FETCH ioctl handler saves current value to the local variable
at the very beginning so it can safely check that there is a new event by
just comparing both variables.
Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here are some very trivial fixes combined:
- add macro definitions to protect header file from including several times
- remove declaration for an unexistent array
- fix typos
Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Creates a new "Near Field Communication" subsystem in drivers/nfc.
http://en.wikipedia.org/wiki/Near_Field_Communication is useful ;)
This is a driver for the pn544 NFC device. The driver transfers
ETSI messages between the device and the user space.
Signed-off-by: Matti J. Aaltonen <matti.j.aaltonen@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently on 64-bit arch the user_namespace is 2096 and when being
kmalloc-ed it resides on a 4k slab wasting 2003 bytes.
If we allocate a separate cache for it and reduce the hash size from 128
to 64 chains the packaging becomes *much* better - the struct is 1072
bytes and the hole between is 98 bytes.
[akpm@linux-foundation.org: s/__initcall/module_init/]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add new sRIO switch device IDs and enable a basic support for them.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add definition of the unique device identifier field in the component tag.
RIO_CTAG_UDEVID does not take all 32 bits of the component tag value to
allow future extensions to the component tag use.
Selected size of the RIO_CTAG_UDEVID field (17 bits) is sufficient to
accommodate maximum number of endpoints in large RIO network (16-bit id)
plus switches.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert RIO switches device structures (rio_dev + rio_switch) into a
single allocation unit.
This change is based on the fact that RIO switches are using common RIO
device objects anyway. Allocating RIO switch objects as RIO devices with
added space for switch information simplifies handling of RIO switch
devices.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change code to use one storage location common for switches and endpoints.
This eliminates unnecessary device type checks during basic access
operations. Logic that assigns destid to RIO devices stays unchanged - as
before, switches use an associated destid because they do not have their
own.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 66fa12c571 ("ieee1394: remove the old IEEE 1394 driver stack")
eliminated the only user of cdev_index(). So it can be removed too.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Because GPIOs can have crucial functions especially in embedded systems,
we are better safe than sorry regarding their configuration. For
gpio_request, the documentation is simply enforced: <quote>"The return
value of gpio_request() must be checked."</quote> For gpio_direction_* and
gpio_request_*, we now act accordingly.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drop the old geode_gpio crud, as well as the raw outl() calls; instead,
use the Linux GPIO API where possible, and the cs5535_gpio API in other
places.
Note that we don't actually clean up the driver properly yet (once loaded,
it always remains loaded). That'll come later..
This patch is necessary for building the driver.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds (well, re-adds actually) handling for events/IRQs through cs5535
GPIOs. In the wild and wooly world of CS5535, setup_event() is for
assigning an IRQ to a GPIO filter/event pair, and set_irq() sets up the
pair to trigger IRQs.
These should really only be used in highly platform-specific drivers (such
as OLPC's DCON driver). Sadly, because set_irq() uses MSRs, this causes
the driver to become X86-specific.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This cleans up a few bits in binfmt_elf.c and binfmts.h:
- the hasvdso field in struct linux_binfmt is unused, so remove it and
the only initialization of it
- the elf_map CPP symbol is not defined anywhere in the kernel, so
remove an unnecessary #ifndef elf_map
- reduce excessive indentation in elf_format's initializer
- add missing spaces, remove extraneous spaces
No functional changes, but tested on x86 (32 and 64 bit), powerpc (32 and
64 bit), sparc64, arm, and alpha.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a 16TB machine, max_user_watches has an integer overflow. Convert it
to use a long and handle the associated fallout.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Move printk_once definitions and add an #ifdef CONFIG_PRINTK
- Add pr_<level>_once so printks can use pr_fmt
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Move no_printk above first CONFIG_PRINTK block so it can be used by
printk_once.
- Convert statement expression if (0) printk macros to no_printk.
- Convert printk_once(x...) to more normally used (fmt, ...) fmt,
##__VA_ARGS__.
- Standardize __attribute__ use.
- Expand single line inline functions.
- Remove space before pointer.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are many uses of printk_once(KERN_<level>, so add pr_<level>_once
macros to avoid printk_once(KERN_<level> pr_fmt(fmt).
Add an #ifdef CONFIG_PRINTK for print_hex_dump and static inline void
functions for the #else cases to reduce embedded code size. Neaten and
organize the rest of the code.
This patch:
Move console functions and variables together.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the %pK printk format specifier and the /proc/sys/kernel/kptr_restrict
sysctl.
The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.
If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".
[akpm@linux-foundation.org: check for IRQ context when !kptr_restrict, save an indent level, s/WARN/WARN_ONCE/]
[akpm@linux-foundation.org: coding-style fixup]
[randy.dunlap@oracle.com: fix kernel/sysctl.c warning]
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The readmostly section should end at a cacheline aligned address,
otherwise the last several data might share cachline with other data and
make the readmostly data still have cache bounce.
For example, in ia64, secpath_cachep is the last readmostly data, and it
shares cacheline with init_uts_ns.
a000000100e80480 d secpath_cachep
a000000100e80488 D init_uts_ns
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Will Newton <will.newton@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, tosh_smm() prototype is present in a header file exported to
userland. This patch fixes it.
Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
Cc: Jonathan Buzzard <jonathan@buzzard.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal reports:
In the framebuffer subsystem the abs() macro is often used as a part of
the calculation of a Manhattan metric, which in turn is used as a measure
of similarity between video modes. The arguments of abs() are sometimes
unsigned numbers. This worked fine until commit a49c59c0 ("Make sure the
value in abs() does not get truncated if it is greater than 2^32:) , which
changed the definition of abs() to prevent truncation. As a result of
this change, in the following piece of code:
u32 a = 0, b = 1;
u32 c = abs(a - b);
'c' will end up with a value of 0xffffffff instead of the expected 0x1.
A problem caused by this change and visible by the end user is that
framebuffer drivers relying on functions from modedb.c will fail to find
high resolution video modes similar to that explicitly requested by the
user if an exact match cannot be found (see e.g.
Fix this by special-casing `long' types within abs().
This patch reduces x86_64 code size a bit - drivers/video/uvesafb.o shrunk
by 15 bytes, presumably because it is doing abs() on 4-byte quantities,
and expanding those to 8-byte longs adds code.
testcase:
#define oldabs(x) ({ \
long __x = (x); \
(__x < 0) ? -__x : __x; \
})
#define newabs(x) ({ \
long ret; \
if (sizeof(x) == sizeof(long)) { \
long __x = (x); \
ret = (__x < 0) ? -__x : __x; \
} else { \
int __x = (x); \
ret = (__x < 0) ? -__x : __x; \
} \
ret; \
})
typedef unsigned int u32;
main()
{
u32 a = 0;
u32 b = 1;
u32 oldc = oldabs(a - b);
u32 newc = newabs(a - b);
printf("%u %u\n", oldc, newc);
}
akpm:/home/akpm> gcc t.c
akpm:/home/akpm> ./a.out
4294967295 1
Reported-by: Michal Januszewski <michalj@gmail.com>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to know the reason why system rebooted in support service.
However, we can't inform our customers of the reason because final
messages are lost on current Linux kernel.
This patch improves the situation above because the final messages are
saved by adding kmsg_dump() to reboot, halt, poweroff and
emergency_restart path.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the led device name is fetched from the device_type in
I2C_BOARD_INFO which comes from the platform data. This name is in turn
used to create an entry in sysfs.
If there exists two or more lp5521 on a particular platform, the
device_type in I2C_BOARD_INFO has to be the same, else lp5521 driver probe
wont be called and if used so, results in run time warning "cannot create
sysfs with same name" and hence a failure.
The name that is used to create sysfs entry is to be passed by the struct
led_platform_data. Hence adding an element of type const char * and
change in lp5521 driver to use this name in creating the led device if
present else use the name obtained by I2C_BOARD_INFO.
Signed-off-by: Arun Murthy <arun.murthy@stericsson.com>
Acked-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently all leds channels begins with string lp5523. Patch adds a
possibility to provide name via platform data. This makes it possible to
have several chips without overlapping sysfs names.
Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: Arun Murthy <arun.murthy@stericsson.com>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The RED statistics structure includes backlog field which is not
set or used by any code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hole punching has already been implemented by XFS and OCFS2, and has the
potential to be implemented on both BTRFS and EXT4 so we need a generic way to
get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE
to fallocate() since it already looks like the normal fallocate() operation.
I've tested this patch with XFS and BTRFS to make sure XFS did what it's
supposed to do and that BTRFS failed like it was supposed to. Thank you,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Coda ->d_revalidate() actually checks for root, ->d_delete() is irrelevant.
So we can use the same d_op for all coda dentries
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ceph messenger code does a rather complex dancing around multithread
workqueue to make sure the same work item isn't executed concurrently
on different CPUs. This restriction can be provided by workqueue with
WQ_NON_REENTRANT.
Make ceph_msgr_wq non-reentrant workqueue with the default concurrency
level and remove the QUEUED/BUSY logic.
* This removes backoff handling in con_work() but it couldn't reliably
block execution of con_work() to begin with - queue_con() can be
called after the work started but before BUSY is set. It seems that
it was an optimization for a rather cold path and can be safely
removed.
* The number of concurrent work items is bound by the number of
connections and connetions are independent from each other. With
the default concurrency level, different connections will be
executed independently.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Add a ceph_dir_layout to the inode, and calculate dentry hash values based
on the parent directory's specified dir_hash function. This is needed
because the old default Linux dcache hash function is extremely week and
leads to a poor distribution of files among dir fragments.
Signed-off-by: Sage Weil <sage@newdream.net>
The newest device firmware stores IB vs. Ethernet protocol in two bits
in members_count field of multicast group table (0: Infiniband, 1:
Ethernet). When changing the QP members count for a multicast group,
it important not to reset this information. When calling multicast
attach first time, the protocol type should be specified. In this
patch we always set it IB, but in the future we will handle Ethernet
too. When looking for a QP, the protocol type shoud be checked too.
Signed-off-by: Aleksey Senin <alekseys@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The recent rework of the NVS saving/restoring code introduced two
build issues for !CONFIG_ACPI, a warning in drivers/acpi/internal.h
and an error in arch/x86/kernel/e820.c.
Fix them by providing suitable static inline definitions of the
relevant functions.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.
This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.
Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As Al Viro pointed out path resolution during Q_QUOTAON calls to quotactl
is prone to deadlocks. We hold s_umount semaphore for reading during the
path resolution and resolution itself may need to acquire the semaphore
for writing when e. g. autofs mountpoint is passed.
Solve the problem by performing the resolution before we get hold of the
superblock (and thus s_umount semaphore). The whole thing is complicated
by the fact that some filesystems (OCFS2) ignore the path argument. So to
distinguish between filesystem which want the path and which do not we
introduce new .quota_on_meta callback which does not get the path. OCFS2
then uses this callback instead of old .quota_on.
CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Christoph Hellwig <hch@lst.de>
CC: Ted Ts'o <tytso@mit.edu>
CC: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'stable/xenbus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/xenbus: making backend support modular is too complex
xen/pci: Make xen-pcifront be dependent on XEN_XENBUS_FRONTEND
xen/xenbus: fixup checkpatch issues in xenbus_probe*
xen/netfront: select XEN_XENBUS_FRONTEND
xen/xenbus: clean up noise in xenbus_probe_frontend.c
xen/xenbus: clean up noise in xenbus_probe_backend.c
xen/xenbus: clean up noise in xenbus_probe.c
xen/xenbus: cleanup debug noise in xenbus_comms.c
xen/xenbus: clean up error handling
xen/xenbus: make frontend bus GPL
xen/xenbus: make sure backend bus is registered earlier
xenbus/frontend: register bus earlier
xen: remove xen/evtchn.h
xen: add backend driver support
xen: separate out frontend xenbus
We only expose the use and open counts to userspace, providing a tiny
bit of insight into what the API is up to.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
When cooperating with an external control source the regulator setup
may be changed underneath the API. Currently consumers can just redo
the regulator_set_voltage() to restore a previously set configuration
but provide an explicit API for doing this as optimsations in the
regulator_set_voltage() implementation will shortly prevent that.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Many regulator drivers implement voltage setting by looping through a
table of possible values, normally because the set of available voltages
can't be mapped onto selectors with simple calcuation. Factor out these
loops by providing a variant of set_voltage() which takes a selector rather
than a voltage range as an argument and implementing a loop through the
available selectors in the core.
This is not going to be suitable for use with all devices as when the
regulator voltage can be mapped onto selector values with a simple
calculation the linear scan through the available values will be more
expensive than just doing the calculation, especially for regulators
that provide fine grained voltage control.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
To make mc13783 and mc13892 share code, the register names should be
changed to fit the new macro definitions in the comming patch.
Signed-off-by: Yong Shen <yong.shen@linaro.org>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
The define for number of regulators is moved from ab8500-core to
ab8500-regulator so that the regulator driver can be updated
independently of ab8500-core. This also changes the platform
configuration structure of ab8500-core so that it contains a
pointer to the regulator_init_data array plus number of
regulators instead of an fixed size array of pointers to
regulator_init_data.
Signed-off-by: Bengt Jonsson <bengt.g.jonsson@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Since drivers already have to provide an API for translating selectors
into voltages they may as well just report the selector values directly
to the core API rather than implement the lookup themselves. The old
interface is left in place for now, but may be removed in future.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Currently the regulator API uses the constraints structure passed in to
the core throughout the lifetime of the object. This means that it is not
possible to mark the constraints as __initdata so if the kernel supports
many boards the constraints for all of them are kept around throughout the
lifetime of the system, consuming memory needlessly. By copying constraints
that are actually used we allow the use of __initdata, saving memory when
multiple boards are supported.
This also means the constraints can be const.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Provide some basic trace facilities to the regulator API. We generate
events on regulator enable, disable and voltage setting over the actual
hardware operations (which are assumed to be the expensive ones which
require interaction with the actual device). This is intended to facilitate
debug of the performance and behaviour with consumers allowing unified
traces to be generated including the regulator operations within the
context of the other components of the system.
For enable we log the explicit delay for the voltage ramp separately to
the interaction with the hardware to highlight the time consumed in I/O.
We should add a similar delay for voltage changes, though there the
relatively small magnitude of the changes in the context of the I/O
costs makes it much less critical for most regulators.
Only hardware interactions are currently traced as the primary focus is
on the performance and synchronisation of actual hardware interactions.
Additional tracepoints for debugging of the logical operations can be
added later if required.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Change the interface used by set_voltage() to report the selected value
to the regulator core in terms of a selector used by list_voltage().
This allows the regulator core to know the voltage that was chosen
without having to do an explict get_voltage(), which would be much more
expensive as it will generally access hardware.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
The wake_capable ACPI device flag is not necessary, because it is
only used in scan.c for recording the information that _PRW is
present for the given device. That information is only used by
acpi_add_single_object() to decide whether or not to call
acpi_bus_get_wakeup_device_flags(), so the flag may be dropped
if the _PRW check is moved to acpi_bus_get_wakeup_device_flags().
Moreover, acpi_bus_get_wakeup_device_flags() always returns 0,
so it really should be void.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
There are no more users of acpi_bus_get_power(), so it can be
dropped. Moreover, it should be dropped, because it modifies
the device->power.state field of an ACPI device without updating
the reference counters of the device's power resources, which is
wrong.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Use the new function acpi_bus_update_power() for manipulating power
resources used by ACPI fan devices, which allows them to be put into
the right state during initialization and resume. Consequently,
remove the flags.force_power_state field from struct acpi_device,
which is not necessary any more.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Add function acpi_bus_update_power() for reading the actual power
state of an ACPI device and updating its device->power.state field
in such a way that its power resources' reference counters will
remain consistent with that field.
For this purpose introduce __acpi_bus_set_power() setting the
power state of an ACPI device without updating its
device->power.state field and make acpi_bus_set_power() and
acpi_bus_update_power() use it (acpi_bus_set_power() retains the
current behavior for now).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Large page information has two elements but one of them, write_count, alone
is accessed by a helper function.
This patch replaces this helper function with more generic one which returns
newly named kvm_lpage_info structure and use it to access the other element
rmap_pde.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Quote from Avi:
| I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
| that is cleareded when we flush the tlb. kvm_mmu_notifier_invalidate_page()
| can consult the bit and force a flush if set.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM compilation fails with the following warning:
include/linux/kvm_host.h: In function 'kvm_irq_routing_update':
include/linux/kvm_host.h:679:2: error: 'struct kvm' has no member named 'irq_routing'
That function is only used and reasonable to have on systems that implement
an in-kernel interrupt chip. PPC doesn't.
Fix by #ifdef'ing it out when no irqchip is available.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Store irq routing table pointer in the irqfd object,
and use that to inject MSI directly without bouncing out to
a kernel thread.
While we touch this structure, rearrange irqfd fields to make fastpath
better packed for better cache utilization.
This also adds some comments about locking rules and rcu usage in code.
Some notes on the design:
- Use pointer into the rt instead of copying an entry,
to make it possible to use rcu, thus side-stepping
locking complexities. We also save some memory this way.
- Old workqueue code is still used for level irqs.
I don't think we DTRT with level anyway, however,
it seems easier to keep the code around as
it has been thought through and debugged, and fix level later than
rip out and re-instate it later.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cosmetic change, but it helps to correlate IRQs with PCI devices.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This improves the IRQ forwarding for assigned devices: By using the
kernel's threaded IRQ scheme, we can get rid of the latency-prone work
queue and simplify the code in the same run.
Moreover, we no longer have to hold assigned_dev_lock while raising the
guest IRQ, which can be a lenghty operation as we may have to iterate
over all VCPUs. The lock is now only used for synchronizing masking vs.
unmasking of INTx-type IRQs, thus is renames to intx_lock.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
IA64 support forces us to abstract the allocation of the kvm structure.
But instead of mixing this up with arch-specific initialization and
doing the same on destruction, split both steps. This allows to move
generic destruction calls into generic code.
It also fixes error clean-up on failures of kvm_create_vm for IA64.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Tracing 'async' and *pfn is useless, since 'async' is always true,
and '*pfn' is always "fault_pfn'
We can trace 'gva' and 'gfn' instead, it can help us to see the
life-cycle of an async_pf
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
vmalloc() which will be used in the next logging and this has been causing
bad effect to VGA and live-migration: vmalloc() consumes extra systime,
triggers tlb flush, etc.
This patch resolves this issue by pre-allocating one more bitmap and switching
between two bitmaps during dirty logging.
Performance improvement:
I measured performance for the case of VGA update by trace-cmd.
The result was 1.5 times faster than the original one.
In the case of live migration, the improvement ratio depends on the workload
and the guest memory size. In general, the larger the memory size is the more
benefits we get.
Note:
This does not change other architectures's logic but the allocation size
becomes twice. This will increase the actual memory consumption only when
the new size changes the number of pages allocated by vmalloc().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Version 20101209.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The global event handler is called whenever a general purpose
or fixed ACPI event occurs.
Also update Linux OSL to collect events counter with
global event handler.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This feature provides an automatic device notification for wake devices
when a wakeup GPE occurs and there is no corresponding GPE method or
handler. Rather than ignoring such a GPE, an implicit AML Notify
operation is performed on the parent device object.
This feature is not part of the ACPI specification and is provided for
Windows compatibility only.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The new GPE handler callback has 2 additional parameters, gpe_device and
gpe_number.
typedef
u32 (*acpi_gpe_handler) (acpi_handle gpe_device, u32 gpe_number, void *context);
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some function and variable names are renamed to be consistent with
ACPICA code base.
acpi_raw_enable_gpe -> acpi_ev_add_gpe_reference
acpi_raw_disable_gpe -> acpi_ev_remove_gpe_reference
acpi_gpe_can_wake -> acpi_setup_gpe_for_wake
acpi_gpe_wakeup -> acpi_set_gpe_wake_mask
acpi_update_gpes -> acpi_update_all_gpes
acpi_all_gpes_initialized -> acpi_gbl_all_gpes_initialized
acpi_handler_info -> acpi_gpe_handler_info
...
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Send async page fault to a PV guest if it accesses swapped out memory.
Guest will choose another task to run upon receiving the fault.
Allow async page fault injection only when guest is in user mode since
otherwise guest may be in non-sleepable context and will not be able
to reschedule.
Vcpu will be halted if guest will fault on the same page again or if
vcpu executes kernel code.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Keep track of memslots changes by keeping generation number in memslots
structure. Provide kvm_write_guest_cached() function that skips
gfn_to_hva() translation if memslots was not changed since previous
invocation.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.
Interrupts will still be delivered to the guest and if interrupt will
cause reschedule guest will continue to run another task.
[avi: remove call to get_user_pages_noio(), nacked by Linus; this
makes everything synchrnous again]
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This switch is used to signal that user want to disable screen
transitions from portrait to landscape mode and back.
Signed-off-by: Jekyll Lai <jekyll_lai@wistron.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This patch adds event notification support to the generic
thermal sysfs framework in the kernel. The notification is in the
form of a netlink event.
Signed-off-by: R.Durgadoss <durgadoss.r@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (67 commits)
cxgb4vf: recover from failure in cxgb4vf_open()
netfilter: ebtables: make broute table work again
netfilter: fix race in conntrack between dump_table and destroy
ah: reload pointers to skb data after calling skb_cow_data()
ah: update maximum truncated ICV length
xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNC
ehea: Increase the skb array usage
net/fec: remove config FEC2 as it's used nowhere
pcnet_cs: add new_id
tcp: disallow bind() to reuse addr/port
net/r8169: Update the function of parsing firmware
net: ppp: use {get,put}_unaligned_be{16,32}
CAIF: Fix IPv6 support in receive path for GPRS/3G
arp: allow to invalidate specific ARP entries
net_sched: factorize qdisc stats handling
mlx4: Call alloc_etherdev to allocate RX and TX queues
net: Add alloc_netdev_mqs function
caif: don't set connection request param size before copying data
cxgb4vf: fix mailbox data/control coherency domain race
qlcnic: change module parameter permissions
...
* 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits)
NFS fix the setting of exchange id flag
NFS: Don't use vm_map_ram() in readdir
NFSv4: Ensure continued open and lockowner name uniqueness
NFS: Move cl_delegations to the nfs_server struct
NFS: Introduce nfs_detach_delegations()
NFS: Move cl_state_owners and related fields to the nfs_server struct
NFS: Allow walking nfs_client.cl_superblocks list outside client.c
pnfs: layout roc code
pnfs: update nfs4_callback_recallany to handle layouts
pnfs: add CB_LAYOUTRECALL handling
pnfs: CB_LAYOUTRECALL xdr code
pnfs: change lo refcounting to atomic_t
pnfs: check that partial LAYOUTGET return is ignored
pnfs: add layout to client list before sending rpc
pnfs: serialize LAYOUTGET(openstateid)
pnfs: layoutget rpc code cleanup
pnfs: change how lsegs are removed from layout list
pnfs: change layout state seqlock to a spinlock
pnfs: add prefix to struct pnfs_layout_hdr fields
pnfs: add prefix to struct pnfs_layout_segment fields
...
broute table init hook sets up the "br_should_route_hook" pointer,
which then gets called from br_input.
commit a386f99025
(bridge: add proper RCU annotation to should_route_hook)
introduced a typedef, and then changed this to:
br_should_route_hook_t *rhook;
[..]
rhook = rcu_dereference(br_should_route_hook);
if (*rhook(skb))
problem is that "br_should_route_hook" contains the address of the function,
so calling *rhook() results in kernel panic.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
ext4: fix trimming starting with block 0 with small blocksize
ext4: revert buggy trim overflow patch
ext4: don't pass entire map to check_eofblocks_fl
ext4: fix memory leak in ext4_free_branches
ext4: remove ext4_mb_return_to_preallocation()
ext4: flush the i_completed_io_list during ext4_truncate
ext4: add error checking to calls to ext4_handle_dirty_metadata()
ext4: fix trimming of a single group
ext4: fix uninitialized variable in ext4_register_li_request
ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary
ext4: drop i_state_flags on architectures with 64-bit longs
ext4: reorder ext4_inode_info structure elements to remove unneeded padding
ext4: drop ec_type from the ext4_ext_cache structure
ext4: use ext4_lblk_t instead of sector_t for logical blocks
ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED
ext4: fix 32bit overflow in ext4_ext_find_goal()
ext4: add more error checks to ext4_mkdir()
ext4: ext4_ext_migrate should use NULL not 0
ext4: Use ext4_error_file() to print the pathname to the corrupted inode
ext4: use IS_ERR() to check for errors in ext4_error_file
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG
ext3: Remove redundant unlikely()
ext2: Remove redundant unlikely()
ext3: speed up file creates by optimizing rec_len functions
ext2: speed up file creates by optimizing rec_len functions
ext3: Add more journal error check
ext3: Add journal error check in resize.c
quota: Use %pV and __attribute__((format (printf in __quota_error and fix fallout
ext3: Add FITRIM handling
ext3: Add batched discard support for ext3
ext3: Add journal error check into ext3_rename()
ext3: Use search_dirblock() in ext3_dx_find_entry()
ext3: Avoid uninitialized memory references with a corrupted htree directory
ext3: Return error code from generic_check_addressable
ext3: Add journal error check into ext3_delete_entry()
ext3: Add error check in ext3_mkdir()
fs/ext3/super.c: Use printf extension %pV
fs/ext2/super.c: Use printf extension %pV
ext3: don't update sb journal_devnum when RO dev
For SHA256, RFC4868 requires to truncate ICV length to 128 bits,
hence MAX_AH_AUTH_LEN should be updated to 16.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stantums multitouch panels sends more than 64 reports and this results
in not being able to handle all the touches given by this device.
This patch is required to be able to include Stantum panels in the
unified hid-multitouch driver.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Acked-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Currently hid doesn't export the features it knows to the specific modules.
Some information can be really important in such features: MosArt and
Cypress devices are by default not in a multitouch mode.
We have to send the value 2 on the right feature.
This patch exports to the module the features report so they can find the
right feature to set up the correct mode.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Acked-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This allows us to reuse the xprt associated with a server connection if
one has already been set up.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This seems obviously transport-level information even if it's currently
used only by the server socket code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Basic xdr and processing for BIND_CONN_TO_SESSION. This adds a
connection to the list of connections associated with a session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
gnttab_map_refs maps some grant refs and uses the new m2p override to
set a proper m2p mapping for the granted pages.
gnttab_unmap_refs unmaps the granted refs and removes th mappings from
the m2p override.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The gntdev driver allows usermode to map granted pages from other
domains. This is typically used to implement a Xen backend driver
in user mode.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Impact: hypercall definitions
These functions populate the gnttab data structures used by the
granttab map and unmap ops and are used in the backend drivers.
Originally xen-unstable.hg 9625:c3bb51c443a7
[ Include Stefano's fix for phys_addr_t ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Indicate support for referrals. Do not set any PNFS roles. Check the flags
returned by the server for validity. Do not use exchange flags from an old
client ID instance when recovering a client ID.
Update the EXCHID4_FLAG_XXX set to RFC 5661.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix Moorestown VRTC fixmap placement
x86/gpio: Implement x86 gpio_to_irq convert function
x86, UV: Fix APICID shift for Westmere processors
x86: Use PCI method for enabling AMD extended config space before MSR method
x86: tsc: Prevent delayed init if initial tsc calibration failed
x86, lapic-timer: Increase the max_delta to 31 bits
x86: Fix sparse non-ANSI function warnings in smpboot.c
x86, numa: Fix CONFIG_DEBUG_PER_CPU_MAPS without NUMA emulation
x86, AMD, PCI: Add AMD northbridge PCI device id for CPU families 12h and 14h
x86, numa: Fix cpu to node mapping for sparse node ids
x86, numa: Fake node-to-cpumask for NUMA emulation
x86, numa: Fake apicid and pxm mappings for NUMA emulation
x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU
x86, numa: Reduce minimum fake node size to 32M
Fix up trivial conflict in arch/x86/kernel/apic/x2apic_uv_x.c
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rtc: Namespace fixup
RTC: Remove UIE emulation
RTC: Rework RTC code to use timerqueue for events
Fix up trivial conflict in drivers/rtc/rtc-dev.c
Now, memory_hotplug_(un)lock() is used for add/remove/offline pages
for avoiding races with hibernation. But this should be held in
online_pages(), too. It seems asymmetric.
There are cases where one has to avoid a race with memory hotplug
notifier and his own local code, and hotplug v.s. hotplug.
This will add a generic solution for avoiding races. In other view,
having lock here has no big impacts. online pages is tend to be
done by udev script at el against each memory section one by one.
Then, it's better to have lock here, too.
Cc: <stable@kernel.org> # 2.6.37
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Currently, the lockdep annotation in flush_work() requires exclusive
access on the workqueue the target work is queued on and triggers
warning if a work is trying to flush another work on the same
workqueue; however, this is no longer true as workqueues can now
execute multiple works concurrently.
This patch adds lock_map_acquire_read() and make process_one_work()
hold read access to the workqueue while executing a work and
start_flush_work() check for write access if concurrnecy level is one
or the workqueue has a rescuer (as only one execution resource - the
rescuer - is guaranteed to be available under memory pressure), and
read access if higher.
This better represents what's going on and removes spurious lockdep
warnings which are triggered by fake dependency chain created through
flush_work().
* Peter pointed out that flushing another work from a WQ_MEM_RECLAIM
wq breaks forward progress guarantee under memory pressure.
Condition check accordingly updated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (390 commits)
drm/radeon/kms: disable underscan by default
drm/radeon/kms: only enable hdmi features if the monitor supports audio
drm: Restore the old_fb upon modeset failure
drm/nouveau: fix hwmon device binding
radeon: consolidate asic-specific function decls for pre-r600
vga_switcheroo: comparing too few characters in strncmp()
drm/radeon/kms: add NI pci ids
drm/radeon/kms: don't enable pcie gen2 on NI yet
drm/radeon/kms: add radeon_asic struct for NI asics
drm/radeon/kms/ni: load default sclk/mclk/vddc at pm init
drm/radeon/kms: add ucode loader for NI
drm/radeon/kms: add support for DCE5 display LUTs
drm/radeon/kms: add ni_reg.h
drm/radeon/kms: add bo blit support for NI
drm/radeon/kms: always use writeback/events for fences on NI
drm/radeon/kms: adjust default clock/vddc tracking for pm on DCE5
drm/radeon/kms: add backend map workaround for barts
drm/radeon/kms: fill gpu init for NI asics
drm/radeon/kms: add disabled vbios accessor for NI asics
drm/radeon/kms: handle NI thermal controller
...
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c: Constify i2c_client where possible
i2c-algo-bit: Complain about masters which can't read SCL
i2c-algo-bit: Refactor adapter registration
i2c: Add generic I2C multiplexer using GPIO API
i2c-nforce2: Remove unnecessary cast of pci_get_drvdata
i2c-i801: Include <linux/slab.h>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: (52 commits)
Blackfin: encode cpu-rev into uImage name
Blackfin: bf54x: don't ack GPIO ints when unmasking them
Blackfin: sram_free_with_lsl: do not ignore return value of sram_free
Blackfin: boards: add missing "static" to peripheral lists
Blackfin: DNP5370: new board port
Blackfin: bf518f-ezbrd: fix dsa resources
Blackfin: move "-m elf32bfin" to general LDFLAGS
Blackfin: kgdb_test: make sure to initialize num2
Blackfin: kgdb: disable preempt schedule when running single step in kgdb
Blackfin: kgdb: disable interrupt when single stepping in ADEOS
Blackfin: SMP: kgdb: apply anomaly 257 work around
Blackfin: fix building IPIPE code when XIP is enabled
Blackfin: SMP: kgdb: flush core internal write buffer before flushinv
Blackfin: sport_uart resources: remove unused secondary RX/TX pins
Blackfin: tll6527m: fix spelling in unused code (struct name)
Blackfin: bf527-ezkit: add adau1373 chip address
Blackfin: no-mpu: fix masking of small uncached dma region
Blackfin: pm: drop irq save/restore in standby and suspend to mem callback
MAINTAINERS: update Analog Devices support info
Blackfin: dpmc.h: pull in new pll.h
...
IPv4 over firewire needs to be able to remove ARP entries
from the ARP cache that belong to nodes that are removed, because
IPv4 over firewire uses ARP packets for private information
about nodes.
This information becomes invalid as soon as node drops
off the bus and when it reconnects, its only possible
to start talking to it after it responded to an ARP packet.
But ARP cache prevents such packets from being sent.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
driver core: Document that device_rename() is only for networking
sysfs: remove useless test from sysfs_merge_group
driver-core: merge private parts of class and bus
driver core: fix whitespace in class_attr_string
HTB takes into account skb is segmented in stats updates.
Generalize this to all schedulers.
They should use qdisc_bstats_update() helper instead of manipulating
bstats.bytes and bstats.packets
Add bstats_update() helper too for classes that use
gnet_stats_basic_packed fields.
Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no
stab is setup on qdisc.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added alloc_netdev_mqs function which allows the number of transmit and
receive queues to be specified independenty. alloc_netdev_mq was
changed to a macro to call the new function. Also added
alloc_etherdev_mqs with same purpose.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (36 commits)
sony-laptop: support new hotkeys on the P, Z and EC series
platform/x86: Consistently select LEDS Kconfig options
sony-laptop: fix sparse non-ANSI function warning
intel_ips: fix sparse non-ANSI function warning
Support KHLB2 in the compal laptop driver
acer-wmi: Enabled Acer Launch Manager mode
[PATCH] intel_pmic_gpio: modify EOI handling following change of kernel irq subsystem
ACPI Thinkpad: We must always call va_end() after va_start() but do not do so in thinkpad_acpi.c::acpi_evalf()
acer-wmi: Initialize wlan/bluetooth/wwan rfkill software block state
acer-wmi: Detect the WiFi/Bluetooth/3G devices available
acer-wmi: Add 3G rfkill sysfs file
acer-wmi: Add acer wmi hotkey events support
platform/x86: Kconfig: Replace select by depends on ACPI_WMI
ideapad: pass ideapad_priv as argument (part 2)
ideapad: pass ideapad_priv as argument (part 1)
ideapad: add markups, unify comments and return result when init
ideapad: add hotkey support
ideapad: let camera power control entry under platform driver
ideapad: add platform driver for ideapad
fujitsu-laptop: fix compiler warning on pnp_ids
...
Dan Rosenberg pointed out that there were some signed comparison bugs
in the phonet protocol.
http://marc.info/?l=full-disclosure&m=129424528425330&w=2
The problem is that we check for array overflows but "protocol" is
signed and we don't check for array underflows. If you have already
have CAP_SYS_ADMIN then you could use the bugs to get root, or someone
could cause an oops by mistake.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Helper functions for I2C and SMBus transactions don't modify the
i2c_client that is passed to them, so it can be marked const.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Add an i2c mux driver providing access to i2c bus segments using a
hardware MUX sitting on a master bus and controlled through gpio pins.
E.G. something like:
---------- ---------- Bus segment 1 - - - - -
| | SCL/SDA | |-------------- | |
| |------------| |
| | | | Bus segment 2 | |
| Linux | GPIO 1..N | MUX |--------------- Devices
| |------------| | | |
| | | | Bus segment M
| | | |---------------| |
---------- ---------- - - - - -
SCL/SDA of the master I2C bus is multiplexed to bus segment 1..M
according to the settings of the GPIO pins 1..N.
Signed-off-by: Peter Korsgaard <peter.korsgaard@barco.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Some mesh attribute/command docs are missing or
have errors in the name so they don't match, fix
all of them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When I made the patch to add mesh join/leave I
didn't pay attention to docs because it was a
proof of concept, and then when we actually did
merge it I forgot -- add docs now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The flag is IEEE80211_TX_CTL_TX_OFFCHAN and I had
added that in a previous patch but forgotten docs.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add documentation for the new callbacks that I
forgot in the patch adding the callbacks.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.
The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org [2.6.37]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (30 commits)
MAINTAINERS: Add tomoyo-dev-en ML.
SELinux: define permissions for DCB netlink messages
encrypted-keys: style and other cleanup
encrypted-keys: verify datablob size before converting to binary
trusted-keys: kzalloc and other cleanup
trusted-keys: additional TSS return code and other error handling
syslog: check cap_syslog when dmesg_restrict
Smack: Transmute labels on specified directories
selinux: cache sidtab_context_to_sid results
SELinux: do not compute transition labels on mountpoint labeled filesystems
This patch adds a new security attribute to Smack called SMACK64EXEC. It defines label that is used while task is running.
SELinux: merge policydb_index_classes and policydb_index_others
selinux: convert part of the sym_val_to_name array to use flex_array
selinux: convert type_val_to_struct to flex_array
flex_array: fix flex_array_put_ptr macro to be valid C
SELinux: do not set automatic i_ino in selinuxfs
selinux: rework security_netlbl_secattr_to_sid
SELinux: standardize return code handling in selinuxfs.c
SELinux: standardize return code handling in selinuxfs.c
SELinux: standardize return code handling in policydb.c
...
Using "iptables -L" with a lot of rules have a too big BH latency.
Jesper mentioned ~6 ms and worried of frame drops.
Switch to a per_cpu seqlock scheme, so that taking a snapshot of
counters doesnt need to block BH (for this cpu, but also other cpus).
This adds two increments on seqlock sequence per ipt_do_table() call,
its a reasonable cost for allowing "iptables -L" not block BH
processing.
Reported-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is driver for EasyPoint AS5011 2 axis joystick chip. This chip is
plugged on an I2C bus.
Tested on ARM processor (i.MX27).
Signed-off-by: Fabien Marteau <fabien.marteau@armadeus.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
When I enable EXT2_XATTR_DEBUG in fs/ext2/xattr.c I get a build error stating
the following:
CC fs/ext2/xattr.o
fs/ext2/xattr.c: In function 'ext2_xattr_cache_insert':
fs/ext2/xattr.c:841: error: dereferencing pointer to incomplete type
fs/ext2/xattr.c:846: error: dereferencing pointer to incomplete type
make[2]: *** [fs/ext2/xattr.o] Error 1
make[1]: *** [fs/ext2] Error 2
make: *** [fs] Error 2
These lines reference ext2_xattr_cache->c_entry_count which is defined
in struct mb_cache. struct mb_cache is currently only defined in fs/mbcache.c.
Moving struct mb_cache definition to include/linux/mbcache.h to resolve the
issue.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jan Kara <jack@suse.cz>
The addition of 64k block capability in the rec_len_from_disk
and rec_len_to_disk functions added a bit of math overhead which
slows down file create workloads needlessly when the architecture
cannot even support 64k blocks, thanks to page size limits.
Similar changes already exist in the ext4 codebase.
The directory entry checking can also be optimized a bit
by sprinkling in some unlikely() conditions to move the
error handling out of line.
bonnie++ sequential file creates on a 512MB ramdisk speeds up
from about 77,000/s to about 82,000/s, about a 6% improvement.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Use %pV in __quota_error so a single printk can not be
interleaved with other logging messages.
Add __attribute__((format (printf, 3, 4))) so format
and arguments can be verified by compiler.
Make sure printk formats and arguments match.
Block # needed a pointer dereference.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).
It search for free extents in allocation groups specified by Byte range
start -> start+len. When the free extent is within this range, blocks are
marked as used and then trimmed. Afterwards these blocks are marked as
free in per-group bitmap.
[JK: Fixed up error handling and trimming of a single group]
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
After one CPU is offlined, it is unnecessary to switch T-state for it.
So it will be better that the throttling is disabled after the cpu
is offline.
At the same time after one cpu is online, we should check whether
the T-state is supported and then set the corresponding T-state
flag.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Replace the jbd2_inode structure (which is 48 bytes) with a pointer
and only allocate the jbd2_inode when it is needed --- that is, when
the file system has a journal present and the inode has been opened
for writing. This allows us to further slim down the ext4_inode_info
structure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add new mappings for assist, VAIO, zoom and eject buttons present on
refurbished P, Z and EC models.
Reported-by: Gyorgy Jeney <nog.lkml@gmail.com>
Reported-by: Stephan Mueller <smueller@chronox.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging: (44 commits)
hwmon: Support for Dallas Semiconductor DS620
hwmon: driver for Sensirion SHT21 humidity and temperature sensor
hwmon: Add humidity attribute to sysfs ABI
hwmon: sysfs ABI updates
hwmon: (via-cputemp) sync hotplug handling with coretemp/pkgtemp
hwmon: (lm95241) Rewrite to avoid using macros
hwmon: (applesmc) Fix checkpatch errors and fix value range checks
hwmon: (applesmc) Update copyright information
hwmon: (applesmc) Silence driver
hwmon: (applesmc) Simplify feature sysfs handling
hwmon: (applesmc) Dynamic creation of fan files
hwmon: (applesmc) Extract all features generically
hwmon: (applesmc) Handle new temperature format
hwmon: (applesmc) Dynamic creation of temperature files
hwmon: (applesmc) Introduce a register lookup table
hwmon: (applesmc) Use pr_fmt and pr_<level>
hwmon: (applesmc) Relax the severity of device init failure
hwmon: (applesmc) Add MacBookAir3,1(3,2) support
hwmon: (w83627hf) Use pr_fmt and pr_<level>
hwmon: (w83627ehf) Use pr_fmt and pr_<level>
...
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: (29 commits)
of/flattree: forward declare struct device_node in of_fdt.h
ipmi: explicitly include of_address.h and of_irq.h
sparc: explicitly cast negative phandle checks to s32
powerpc/405: Fix missing #{address,size}-cells in i2c node
powerpc/5200: dts: refactor dts files
powerpc/5200: dts: Change combatible strings on localbus
powerpc/5200: dts: remove unused properties
powerpc/5200: dts: rename nodes to prepare for refactoring dts files
of/flattree: Update dtc to current mainline.
of/device: Don't register disabled devices
powerpc/dts: fix syntax bugs in bluestone.dts
of: Fixes for OF probing on little endian systems
of: make drivers depend on CONFIG_OF instead of CONFIG_PPC_OF
of/flattree: Add of_flat_dt_match() helper function
of_serial: explicitly include of_irq.h
of/flattree: Refactor unflatten_device_tree and add fdt_unflatten_tree
of/flattree: Reorder unflatten_dt_node
of/flattree: Refactor unflatten_dt_node
of/flattree: Add non-boottime device tree functions
of/flattree: Add Kconfig for EARLY_FLATTREE
...
Fix up trivial conflict in arch/sparc/prom/tree_32.c as per Grant.
Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove path.h from sched.h and other files.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slub: Fix a crash during slabinfo -v
tracing/slab: Move kmalloc tracepoint out of inline code
slub: Fix slub_lock down/up imbalance
slub: Fix build breakage in Documentation/vm
slub tracing: move trace calls out of always inlined functions to reduce kernel code size
slub: move slabinfo.c to tools/slub/slabinfo.c
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
mkuboot.sh: Fail if mkimage is missing
gen_init_cpio: checkpatch fixes
gen_init_cpio: Avoid race between call to stat() and call to open()
modpost: Fix address calculation in reloc_location()
Make fixdep error handling more explicit
checksyscalls: Fix stand-alone usage
modpost: Put .zdebug* section on white list
kbuild: fix interaction of CONFIG_IKCONFIG and KCONFIG_CONFIG
kbuild: export linux/{a.out,kvm,kvm_para}.h on headers_install_all
kbuild: introduce HDR_ARCH_LIST for headers_install_all
headers_install: check exit status of unifdef
gen_init_cpio: remove leading `/' from file names
scripts/genksyms: fix header usage
fixdep: use hash table instead of a single array
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (34 commits)
HID: roccat: Update sysfs attribute doc
HID: roccat: don't use #pragma pack
HID: roccat: Add support for Roccat Kone[+] v2
HID: roccat: reduce number of functions in kone and pyra drivers
HID: roccat: declare meaning of pack pragma usage in driver headers
HID: roccat: use class for char device for sysfs attribute creation
sysfs: Introducing binary attributes for struct class
HID: hidraw: add compatibility ioctl() for 32-bit applications.
HID: hid-picolcd: Fix memory leak in picolcd_debug_out_report()
HID: picolcd: fix misuse of logical operation in place of bitop
HID: usbhid: base runtime PM on modern API
HID: replace offsets values with their corresponding BTN_* defines
HID: hid-mosart: support suspend/resume
HID: hid-mosart: ignore buttons report
HID: hid-picolcd: don't use flush_scheduled_work()
HID: simplify an index check in hid_lookup_collection
HID: Hoist assigns from ifs
HID: Remove superfluous __inline__
HID: Use vzalloc for vmalloc/memset(,0...)
HID: Add and use hid_<level>: dev_<level> equivalents
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
spi / PM: Support dev_pm_ops
PM: Prototype the pm_generic_ operations
PM / Runtime: Generic resume shouldn't set RPM_ACTIVE unconditionally
PM: Use dev_name() in core device suspend and resume routines
PM: Permit registration of parentless devices during system suspend
PM: Replace the device power.status field with a bit field
PM: Remove redundant checks from core device resume routines
PM: Use a different list of devices for each stage of device suspend
PM: Avoid compiler warning in pm_noirq_op()
PM: Use pm_wakeup_pending() in __device_suspend()
PM / Wakeup: Replace pm_check_wakeup_events() with pm_wakeup_pending()
PM: Prevent dpm_prepare() from returning errors unnecessarily
PM: Fix references to basic-pm-debugging.txt in drivers-testing.txt
PM / Runtime: Add synchronous runtime interface for interrupt handlers (v3)
PM / Hibernate: When failed, in_suspend should be reset
PM / Hibernate: hibernation_ops->leave should be checked too
Freezer: Fix a race during freezing of TASK_STOPPED tasks
PM: Use proper ccflag flag in kernel/power/Makefile
PM / Runtime: Fix comments to match runtime callback code
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: use split transaction timeout only for split transactions
firewire: ohci: consolidate context status flags
firewire: ohci: cache the context run bit
firewire: ohci: flush AT contexts after bus reset - addendum
firewire: ohci: flush AT contexts after bus reset for OHCI 1.2
firewire: net: set carrier state at ifup
firewire: net: add carrier detection
firewire: net: ratelimit error messages
firewire: ohci: restart iso DMA contexts on resume from low power mode
firewire: ohci: restore GUID on resume.
firewire: ohci: use common buffer for self IDs and AR descriptors
firewire: ohci: optimize iso context checks in the interrupt handler
firewire: make PHY packet header format consistent
firewire: ohci: properly clear posted write errors
firewire: ohci: flush MMIO writes in the interrupt handler
firewire: ohci: fix AT context initialization error handling
firewire: ohci: Asynchronous Reception rewrite
firewire: core: Update WARN uses
firewire: nosy: char device is not seekable
Introduce the helper function snd_ctl_enum_info() to fill out the
elem_info fields for an enumerated control.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix ioctl ABI
fuse: allow batching of FORGET requests
fuse: separate queue for FORGET requests
fuse: ioctl cleanup
Fix up trivial conflict in fs/fuse/inode.c due to RCU lookup having done
the RCU-freeing of the inode in fuse_destroy_inode().
Fix kernel-doc notation warnings in pipe_fs_i.h:
Warning(include/linux/pipe_fs_i.h:58): No description found for parameter 'buffers'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix new kernel-doc notation warning in hrtimer.h:
Warning(include/linux/hrtimer.h:150): Excess struct/union/enum/typedef member 'first' description in 'hrtimer_clock_base'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix new kernel-doc notation warning in dcache.h:
Warning(include/linux/dcache.h:316): Excess function parameter 'Returns' description in '__d_rcu_to_refcount'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A bunch of arches define reads[bwl]/writes[bwl] helpers for accessing
memory mapped registers. Since the Blackfin ones aren't specific to
Blackfin code, move them to the common asm-generic/io.h for people.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Now that there is a single function that can compute the device
features relevant to a packet, we don't want to run it for each
offload. This converts netif_needs_gso() to take the features
of the device, rather than computing them itself.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_get_vlan_features() is currently only used by netif_needs_gso(),
so it only concerns itself with GSO features. However, several other
places also should take into account the contents of the packet when
deciding whether to offload to hardware. This generalizes the function
to return features about all of the various forms of offloading. Since
offloads tend to be linked together, this avoids duplicating the logic
in each location (i.e. the scatter/gather code also needs the checksum
logic).
Suggested-by: Michał Mirosław <mirqus@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix new kernel-doc notation warning in sock.h by annotating skc_dontcopy_*
as private fields.
Warning(include/net/sock.h:163): No description found for parameter 'skc_dontcopy_end[0]'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add mac field into fec_platform_data and consolidate function
fec_get_mac to get mac address in following order.
1) module parameter via kernel command line fec.macaddr=0x00,0x04,...
2) from flash in case of CONFIG_M5272 or fec_platform_data mac
field for others, which typically have mac stored in fuse
3) fec mac address registers set by bootloader
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'drm-radeon-ni' of ../drm-radeon-next: (30 commits)
radeon: consolidate asic-specific function decls for pre-r600
drm/radeon/kms: add NI pci ids
drm/radeon/kms: don't enable pcie gen2 on NI yet
drm/radeon/kms: add radeon_asic struct for NI asics
drm/radeon/kms/ni: load default sclk/mclk/vddc at pm init
drm/radeon/kms: add ucode loader for NI
drm/radeon/kms: add support for DCE5 display LUTs
drm/radeon/kms: add ni_reg.h
drm/radeon/kms: add bo blit support for NI
drm/radeon/kms: always use writeback/events for fences on NI
drm/radeon/kms: adjust default clock/vddc tracking for pm on DCE5
drm/radeon/kms: add backend map workaround for barts
drm/radeon/kms: fill gpu init for NI asics
drm/radeon/kms: add disabled vbios accessor for NI asics
drm/radeon/kms: handle NI thermal controller
drm/radeon/kms: parse DCE5 encoder caps when setting up encoders
drm/radeon/kms: dvo dpms updates for DCE5
drm/radeon/kms: dac dpms updates for DCE5
drm/radeon/kms: DCE5 atom dig encoder updates
drm/radeon/kms: DCE5 atom transmitter control updates
...
Conflicts:
security/smack/smack_lsm.c
Verified and added fix by Stephen Rothwell <sfr@canb.auug.org.au>
Ok'd by Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
Driver for Dallas Semiconductor DS620 temperature sensor and thermostat
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
This patch implements SDIO IRQ support for mfds which
announce the TMIO_MMC_SDIO_IRQ flag for tmio_mmc.
If MMC_CAP_SDIO_IRQ is also set SDIO IRQ signalling is activated.
Tested with a b43-based wireless SDIO card and sh_mobile_sdhi.
Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
Signed-off-by: Chris Ball <cjb@laptop.org>
For example, with SDIO WLAN cards, some transfers happen with buffers at
odd addresses, whereas the SH-Mobile DMA engine requires even addresses
for SDHI. This patch extends the tmio driver with a bounce buffer, that
is used for single entry scatter-gather lists both for sending and
receiving. If we ever encounter unaligned transfers with multi-element
sg lists, this patch will have to be extended. For now it just falls
back to PIO in this and other unsupported cases.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
This adds the mmc host driver for the Synopsys DesignWare mmc
host controller, found in a number of embedded SoC designs.
Signed-off-by: Will Newton <will.newton@imgtec.com>
Reviewed-by: Matt Fleming <matt@console-pimps.org>
Reviewed-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Some controllers misparse segment length 0 as being 0, not 65536. Add
a quirk to deal with it.
Signed-off-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Some old MMC devices fail with the 4/8 bits the driver tries to use
exclusively. This patch adds a test for the given bus setup and falls
back to the lower bit mode (until 1-bit mode) when the test fails.
[Major rework and refactoring by tiwai]
[Quirk addition and many fixes by prakity]
Signed-off-by: Aries Lee <arieslee@jmicron.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Philip Rakity <prakity@marvell.com>
Tested-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
This patch forward declares struct device_node to fix a compile error
when of_fdt.h is included, but of.h is not. Alternately, including
linux/of.h could have been added to of_fdt.h, but that pulls in a lot
of unnecessary declarations when only working with the flattened
form.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>