There are still some leftovers of commit f59ca058
[locking, lib/atomic64: Annotate atomic64_lock::lock as raw]
[ tglx: Seems I picked the wrong version of that patch :( ]
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shan Hai <haishan.bai@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20110914074924.GA16096@zhy
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Include <linux/cryptohash.h> to pickup the declarations for sha_transform
and sha_init to quite the sparse noise:
warning: symbol 'sha_transform' was not declared. Should it be static?
warning: symbol 'sha_init' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The spinlock protected atomic64 operations must be irq safe as they
are used in hard interrupt context and cannot be preempted on -rt:
NIP [c068b218] rt_spin_lock_slowlock+0x78/0x3a8
LR [c068b1e0] rt_spin_lock_slowlock+0x40/0x3a8
Call Trace:
[eb459b90] [c068b1e0] rt_spin_lock_slowlock+0x40/0x3a8 (unreliable)
[eb459c20] [c068bdb0] rt_spin_lock+0x40/0x98
[eb459c40] [c03d2a14] atomic64_read+0x48/0x84
[eb459c60] [c001aaf4] perf_event_interrupt+0xec/0x28c
[eb459d10] [c0010138] performance_monitor_exception+0x7c/0x150
[eb459d30] [c0014170] ret_from_except_full+0x0/0x4c
So annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Shan Hai <haishan.bai@gmail.com>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no reason to allow the lock protecting rwsems (the
ownerless variant) to be preemptible on -rt. Convert it to raw.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The logbuf_lock lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ merged and fixed it ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The prop_local_percpu::lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The percpu_counter::lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If there are no builtin users of find_next_bit_le() and
find_next_zero_bit_le(), these functions are not present in the kernel
image, causing m68k allmodconfig to fail with:
ERROR: "find_next_zero_bit_le" [fs/ufs/ufs.ko] undefined!
ERROR: "find_next_bit_le" [fs/udf/udf.ko] undefined!
...
This started to happen after commit 171d809df1 ("m68k: merge mmu and
non-mmu bitops.h"), as m68k had its own inline versions before.
commit 63e424c844 ("arch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,
BIT_LE, LAST_BIT}") added find_last_bit.o to obj-y (so it's always
included), but find_next_bit.o to lib-y (so it gets removed by the
linker if there are no builtin users).
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Summary:
Users of the pci_dma_sync_single_* api allow users to sync address ranges within
the range of a mapped entry (i.e. you can dma map address X to dma_addr_t A and
then pci_dma_sync_single on dma_addr_t A+1. The dma-debug library however
assume dma syncs will always occur using the base address of a mapped region,
and uses that assumption to find entries in its hash table. Since thats often
(but not always the case), the dma debug library can give us false errors about
missing entries, which are reported as syncing of memory not allocated by the
driver. This was noted in the cxgb3 driver as this error:
WARNING: at lib/dma-debug.c:902 check_sync+0xdd/0x48c()
Hardware name: To be filled by O.E.M.
cxgb3 0000:01:00.0: DMA-API: device driver tries to sync DMA memory it has not
allocated [device address=0x00000000fff97800] [size=1984 bytes]
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table
mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput
snd_hda_codec_intelhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer e1000e snd soundcore r8169
cxgb3 iTCO_wdt snd_page_alloc mii shpchp i2c_i801 iTCO_vendor_support mdio
microcode firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi i915
drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded:
scsi_wait_scan]
Pid: 1818, comm: ifconfig Not tainted 2.6.35-0.23.rc3.git6.fc14.x86_64 #1
Call Trace:
[<ffffffff81050f71>] warn_slowpath_common+0x85/0x9d
[<ffffffff8105102c>] warn_slowpath_fmt+0x46/0x48
[<ffffffff8124658e>] ? check_sync+0x39/0x48c
[<ffffffff8107c470>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff81246632>] check_sync+0xdd/0x48c
[<ffffffff81246ca6>] debug_dma_sync_single_for_device+0x3f/0x41
[<ffffffffa011615c>] ? pci_map_page+0x84/0x97 [cxgb3]
[<ffffffffa0117bc3>] pci_dma_sync_single_for_device.clone.0+0x65/0x6e [cxgb3]
[<ffffffffa0117ed1>] refill_fl+0x305/0x30a [cxgb3]
[<ffffffffa011857d>] t3_sge_alloc_qset+0x6a7/0x821 [cxgb3]
[<ffffffffa010a07b>] cxgb_up+0x4d0/0xe62 [cxgb3]
[<ffffffff81086037>] ? __module_text_address+0x12/0x58
[<ffffffffa010aa4c>] cxgb_open+0x3f/0x309 [cxgb3]
[<ffffffff813e9f6c>] __dev_open+0x8e/0xbc
[<ffffffff813e7ca5>] __dev_change_flags+0xbe/0x142
[<ffffffff813e9ea8>] dev_change_flags+0x21/0x57
[<ffffffff81445937>] devinet_ioctl+0x29a/0x54b
[<ffffffff811f9a87>] ? inode_has_perm+0xaa/0xce
[<ffffffff81446ed2>] inet_ioctl+0x8f/0xa7
[<ffffffff813d683a>] sock_do_ioctl+0x29/0x48
[<ffffffff813d6c83>] sock_ioctl+0x213/0x222
[<ffffffff81137f78>] vfs_ioctl+0x32/0xa6
[<ffffffff811384e2>] do_vfs_ioctl+0x47a/0x4b3
[<ffffffff81138571>] sys_ioctl+0x56/0x79
[<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
---[ end trace 69a4d4cc77b58004 ]---
(some edits by Joerg Roedel)
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jay Fenalson <fenlason@redhat.com>
CC: Divy LeRay <divy@chelsio.com>
CC: Stanislaw Gruszka <sgruszka@redhat.com>
CC: Joerg Roedel <joerg.roedel@amd.com>
CC: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The patch below removes an extra "it" in the comment.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
kobject_uevent() uses a multicast socket and should ignore
if one of listeners cannot handle messages or nobody is
listening at all.
Easily reproducible when a process in system is cloned
with CLONE_NEWNET flag.
(See also http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/5256)
Signed-off-by: Milan Broz <mbroz@redhat.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Previously, if dynamic debug was enabled netdev_dbg() was using
dynamic_dev_dbg() to print out the underlying msg. Fix this by making
sure netdev_dbg() uses __netdev_printk().
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add pr_fmt(fmt) with __func__.
Converts "ddebug:" prefix to "dynamic_debug:".
Most likely the if (verbose) outputs could
also be converted from pr_info to pr_debug.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Multiple printks with KERN_CONT can be interleaved by
other printks. Reduce the likelihood of that interleaving
by consolidating multiple calls to printk.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adding dynamic_dev_dbg duplicated prefix output.
Consolidate that output to a single routine.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Unlike dynamic_pr_debug, dynamic uses of dev_dbg can not
currently add task_pid/KBUILD_MODNAME/__func__/__LINE__
to selected debug output.
Add a new function similar to dynamic_pr_debug to
optionally emit these prefixes.
Cc: Aloisio Almeida <aloisio.almeida@openbossa.org>
Noticed-by: Aloisio Almeida <aloisio.almeida@openbossa.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: Compute protocol sequence numbers and fragment IDs using MD5.
crypto: Move md5_transform to lib/md5.c
For ChromiumOS, we use SHA-1 to verify the integrity of the root
filesystem. The speed of the kernel sha-1 implementation has a major
impact on our boot performance.
To improve boot performance, we investigated using the heavily optimized
sha-1 implementation used in git. With the git sha-1 implementation, we
see a 11.7% improvement in boot time.
10 reboots, remove slowest/fastest.
Before:
Mean: 6.58 seconds Stdev: 0.14
After (with git sha-1, this patch):
Mean: 5.89 seconds Stdev: 0.07
The other cool thing about the git SHA-1 implementation is that it only
needs 64 bytes of stack for the workspace while the original kernel
implementation needed 320 bytes.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Cc: Nicolas Pitre <nico@cam.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI, APEI, EINJ Param support is disabled by default
APEI GHES: 32-bit buildfix
ACPI: APEI build fix
ACPI, APEI, GHES: Add hardware memory error recovery support
HWPoison: add memory_failure_queue()
ACPI, APEI, GHES, Error records content based throttle
ACPI, APEI, GHES, printk support for recoverable error via NMI
lib, Make gen_pool memory allocator lockless
lib, Add lock-less NULL terminated single list
Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
ACPI, APEI, Add WHEA _OSC support
ACPI, APEI, Add APEI bit support in generic _OSC call
ACPI, APEI, GHES, Support disable GHES at boot time
ACPI, APEI, GHES, Prevent GHES to be built as module
ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST
ACPI, APEI, Add apei_exec_run_optional
ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic
ACPI, APEI, ERST, Fix erst-dbg long record reading issue
ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled
We have already acknowledged that swapoff of a tmpfs file is slower than
it was before conversion to the generic radix_tree: a little slower
there will be acceptable, if the hotter paths are faster.
But it was a shock to find swapoff of a 500MB file 20 times slower on my
laptop, taking 10 minutes; and at that rate it significantly slows down
my testing.
Now, most of that turned out to be overhead from PROVE_LOCKING and
PROVE_RCU: without those it was only 4 times slower than before; and
more realistic tests on other machines don't fare as badly.
I've tried a number of things to improve it, including tagging the swap
entries, then doing lookup by tag: I'd expected that to halve the time,
but in practice it's erratic, and often counter-productive.
The only change I've so far found to make a consistent improvement, is
to short-circuit the way we go back and forth, gang lookup packing
entries into the array supplied, then shmem scanning that array for the
target entry. Scanning in place doubles the speed, so it's now only
twice as slow as before (or three times slower when the PROVEs are on).
So, add radix_tree_locate_item() as an expedient, once-off,
single-caller hack to do the lookup directly in place. #ifdef it on
CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
applicability as save space in other configurations. And, sadly,
#include sched.h for cond_resched().
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its
peculiar swap vector, instead keeping a file's swap entries in the same
radix tree as its struct page pointers: thus saving memory, and
simplifying its code and locking.
This patch:
The radix_tree is used by several subsystems for different purposes. A
major use is to store the struct page pointers of a file's pagecache for
memory management. But what if mm wanted to store something other than
page pointers there too?
The low bit of a radix_tree entry is already used to denote an indirect
pointer, for internal use, and the unlikely radix_tree_deref_retry()
case.
Define the next bit as denoting an exceptional entry, and supply inline
functions radix_tree_exception() to return non-0 in either unlikely
case, and radix_tree_exceptional_entry() to return non-0 in the second
case.
If a subsystem already uses radix_tree with that bit set, no problem: it
does not affect internal workings at all, but is defined for the
convenience of those storing well-aligned pointers in the radix_tree.
The radix_tree_gang_lookups have an implicit assumption that the caller
can deduce the offset of each entry returned e.g. by the page->index of
a struct page. But that may not be feasible for some kinds of item to
be stored there.
radix_tree_gang_lookup_slot() allow for an optional indices argument,
output array in which to return those offsets. The same could be added
to other radix_tree_gang_lookups, but for now keep it to the only one
for which we need it.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current hyper-optimized functions are overkill if you simply want to
allocate an id for a device. Create versions which use an internal
lock.
In followup patches, numerous drivers are converted to use this
interface.
Thanks to Tejun for feedback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
init_fault_attr_dentries() is used to export fault_attr via debugfs.
But it can only export it in debugfs root directory.
Per Forlin is working on mmc_fail_request which adds support to inject
data errors after a completed host transfer in MMC subsystem.
The fault_attr for mmc_fail_request should be defined per mmc host and
export it in debugfs directory per mmc host like
/sys/kernel/debug/mmc0/mmc_fail_request.
init_fault_attr_dentries() doesn't help for mmc_fail_request. So this
introduces fault_create_debugfs_attr() which is able to create a
directory in the arbitrary directory and replace
init_fault_attr_dentries().
[akpm@linux-foundation.org: extraneous semicolon, per Randy]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Per Forlin <per.forlin@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some trivial conflicts due to other various merges
adding to the end of common lists sooner than this one.
arch/ia64/Kconfig
arch/powerpc/Kconfig
arch/x86/Kconfig
lib/Kconfig
lib/Makefile
Signed-off-by: Len Brown <len.brown@intel.com>
This version of the gen_pool memory allocator supports lockless
operation.
This makes it safe to use in NMI handlers and other special
unblockable contexts that could otherwise deadlock on locks. This is
implemented by using atomic operations and retries on any conflicts.
The disadvantage is that there may be livelocks in extreme cases. For
better scalability, one gen_pool allocator can be used for each CPU.
The lockless operation only works if there is enough memory available.
If new memory is added to the pool a lock has to be still taken. So
any user relying on locklessness has to ensure that sufficient memory
is preallocated.
The basic atomic operation of this allocator is cmpxchg on long. On
architectures that don't have NMI-safe cmpxchg implementation, the
allocator can NOT be used in NMI handler. So code uses the allocator
in NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Cmpxchg is used to implement adding new entry to the list, deleting
all entries from the list, deleting first entry of the list and some
other operations.
Because this is a single list, so the tail can not be accessed in O(1).
If there are multiple producers and multiple consumers, llist_add can
be used in producers and llist_del_all can be used in consumers. They
can work simultaneously without lock. But llist_del_first can not be
used here. Because llist_del_first depends on list->first->next does
not changed if list->first is not changed during its operation, but
llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
llist_add) sequence in another consumer may violate that.
If there are multiple producers and one consumer, llist_add can be
used in producers and llist_del_all or llist_del_first can be used in
the consumer.
This can be summarized as follow:
| add | del_first | del_all
add | - | - | -
del_first | | L | L
del_all | | | -
Where "-" stands for no lock is needed, while "L" stands for lock is
needed.
The list entries deleted via llist_del_all can be traversed with
traversing function such as llist_for_each etc. But the list entries
can not be traversed safely before deleted from the list. The order
of deleted entries is from the newest to the oldest added one. If you
want to traverse from the oldest to the newest, you must reverse the
order by yourself before traversing.
The basic atomic operation of this list is cmpxchg on long. On
architectures that don't have NMI-safe cmpxchg implementation, the
list can NOT be used in NMI handler. So code uses the list in NMI
handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use debugfs_remove_recursive() to simplify initialization and
deinitialization of fault injection debugfs files.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minor cosmetic changes for simple attribute of stacktrace_depth:
- use min_t()
- reduce #ifdef by moving a function
- do not use partly capitalized function name
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No need to include linux/kallsyms.h.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NUMA_NO_NODE and numa_node_id() have different meanings. NUMA_NO_NODE is
obviously the recommended fallback.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ This patch has already been accepted as commit 0ac0c0d0f8 but later
reverted (commit 35926ff5fb) because it itroduced arch specific
__node_random which was defined only for x86 code so it broke other
archs. This is a followup without any arch specific code. Other than
that there are no functional changes.]
Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems). Part of the reason is
that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
at node 0 for newly created tasks.
This patch changes the rotor to be initialized to a random node number
of the cpuset.
[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
[mhocko@suse.cz: Make it arch independent]
[akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paul Menage <menage@google.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge akpm patch series: (122 commits)
drivers/connector/cn_proc.c: remove unused local
Documentation/SubmitChecklist: add RCU debug config options
reiserfs: use hweight_long()
reiserfs: use proper little-endian bitops
pnpacpi: register disabled resources
drivers/rtc/rtc-tegra.c: properly initialize spinlock
drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
drivers/rtc: add support for Qualcomm PMIC8xxx RTC
drivers/rtc/rtc-s3c.c: support clock gating
drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
init: skip calibration delay if previously done
misc/eeprom: add eeprom access driver for digsy_mtc board
misc/eeprom: add driver for microwire 93xx46 EEPROMs
checkpatch.pl: update $logFunctions
checkpatch: make utf-8 test --strict
checkpatch.pl: add ability to ignore various messages
checkpatch: add a "prefer __aligned" check
checkpatch: validate signature styles and To: and Cc: lines
checkpatch: add __rcu as a sparse modifier
checkpatch: suggest using min_t or max_t
...
Did this as a merge because of (trivial) conflicts in
- Documentation/feature-removal-schedule.txt
- arch/xtensa/include/asm/uaccess.h
that were just easier to fix up in the merge than in the patch series.
This function is required by *printf and kstrto* functions that are
located in the different modules. This patch makes _tolower() public.
However, it's good idea to not use the helper outside of mentioned
functions.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The symbol 'lcm' is exported to the kernel (EXPORT_SYMBOL_GPL).
Pick up it's definition in <linux/lcm.h> to quiet the sparse noise:
warning: symbol 'lcm' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
devres uses the pointer value as key after it's freed, which is safe but
triggers spurious use-after-free warnings on some static analysis tools.
Rearrange code to avoid such warnings.
Signed-off-by: Maxin B. John <maxin.john@gmail.com>
Reviewed-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
fs: Merge split strings
treewide: fix potentially dangerous trailing ';' in #defined values/expressions
uwb: Fix misspelling of neighbourhood in comment
net, netfilter: Remove redundant goto in ebt_ulog_packet
trivial: don't touch files that are removed in the staging tree
lib/vsprintf: replace link to Draft by final RFC number
doc: Kconfig: `to be' -> `be'
doc: Kconfig: Typo: square -> squared
doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
drivers/net: static should be at beginning of declaration
drivers/media: static should be at beginning of declaration
drivers/i2c: static should be at beginning of declaration
XTENSA: static should be at beginning of declaration
SH: static should be at beginning of declaration
MIPS: static should be at beginning of declaration
ARM: static should be at beginning of declaration
rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
Update my e-mail address
PCIe ASPM: forcedly -> forcibly
gma500: push through device driver tree
...
Fix up trivial conflicts:
- arch/arm/mach-ep93xx/dma-m2p.c (deleted)
- drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
- drivers/net/r8169.c (just context changes)
<linux/kernel.h> is needed for min_t. The old version
happened to work on x86 because <asm/unaligned.h>
indirectly includes <linux/kernel.h>, but it didn't
work on ARM.
<linux/kernel.h> includes <asm/byteorder.h> so it's
not necessary to include it explicitly anymore.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
sched: Cleanup duplicate local variable in [enqueue|dequeue]_task_fair
sched: Replace use of entity_key()
sched: Separate group-scheduling code more clearly
sched: Reorder root_domain to remove 64 bit alignment padding
sched: Do not attempt to destroy uninitialized rt_bandwidth
sched: Remove unused function cpu_cfs_rq()
sched: Fix (harmless) typo 'CONFG_FAIR_GROUP_SCHED'
sched, cgroup: Optimize load_balance_fair()
sched: Don't update shares twice on on_rq parent
sched: update correct entity's runtime in check_preempt_wakeup()
xtensa: Use generic config PREEMPT definition
h8300: Use generic config PREEMPT definition
m32r: Use generic PREEMPT config
sched: Skip autogroup when looking for all rt sched groups
sched: Simplify mutex_spin_on_owner()
sched: Remove rcu_read_lock() from wake_affine()
sched: Generalize sleep inside spinlock detection
sched: Make sleeping inside spinlock detection working in !CONFIG_PREEMPT
sched: Isolate preempt counting in its own config option
sched: Remove pointless in_atomic() definition check
...
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Fix lockdep_no_validate against IRQ states
mutex: Make mutex_destroy() an inline function
plist: Remove the need to supply locks to plist heads
lockup detector: Fix reference to the non-existent CONFIG_DETECT_SOFTLOCKUP option
Use the CONFIG_HAS_IOPORT and CONFIG_PCI options to decide whether or
not functions for mapping these areas are provided.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Arnd Bergmann <arnd@arndb.de>
This was legacy code brought over from the RT tree and
is no longer necessary.
Signed-off-by: Dima Zavin <dima@android.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Walker <dwalker@codeaurora.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/1310084879-10351-2-git-send-email-dima@android.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reduce the number of variables modified by the loop in do_csum() by 1,
which seems like a good idea. On Nios II (a RISC CPU with 3-operand
instruction set) it reduces the loop from 7 to 6 instructions, including
the conditional branch.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sleeping inside spinlock detection is actually used
for more general sleeping inside atomic sections
debugging: preemption disabled, rcu read side critical
sections, interrupts, interrupt disabled, etc...
Change the name of the config and its help section to
reflect its more general role.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Order of initialization look like this:
...
debugobjects
kmemleak
...(lots of other subsystems)...
workqueues (through early initcall)
...
debugobjects use schedule_work for batch freeing of its data and kmemleak
heavily use debugobjects, so when it comes to freeing and workqueues were
not initialized yet, kernel crashes:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff810854d1>] __queue_work+0x29/0x41a
[<ffffffff81085910>] queue_work_on+0x16/0x1d
[<ffffffff81085abc>] queue_work+0x29/0x55
[<ffffffff81085afb>] schedule_work+0x13/0x15
[<ffffffff81242de1>] free_object+0x90/0x95
[<ffffffff81242f6d>] debug_check_no_obj_freed+0x187/0x1d3
[<ffffffff814b6504>] ? _raw_spin_unlock_irqrestore+0x30/0x4d
[<ffffffff8110bd14>] ? free_object_rcu+0x68/0x6d
[<ffffffff8110890c>] kmem_cache_free+0x64/0x12c
[<ffffffff8110bd14>] free_object_rcu+0x68/0x6d
[<ffffffff810b58bc>] __rcu_process_callbacks+0x1b6/0x2d9
...
because system_wq is NULL.
Fix it by checking if workqueues susbystem was initialized before using.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/20110528112342.GA3068@joi.lan
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
AFS: Use i_generation not i_version for the vnode uniquifier
AFS: Set s_id in the superblock to the volume name
vfs: Fix data corruption after failed write in __block_write_begin()
afs: afs_fill_page reads too much, or wrong data
VFS: Fix vfsmount overput on simultaneous automount
fix wrong iput on d_inode introduced by e6bc45d65d
Delay struct net freeing while there's a sysfs instance refering to it
afs: fix sget() races, close leak on umount
ubifs: fix sget races
ubifs: split allocation of ubifs_info into a separate function
fix leak in proc_set_super()
Fix new kernel-doc warnings in lib/bitmap.c:
Warning(lib/bitmap.c:596): No description found for parameter 'buf'
Warning(lib/bitmap.c:596): Excess function parameter 'bp' description in '__bitmap_parselist'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* new refcount in struct net, controlling actual freeing of the memory
* new method in kobj_ns_type_operations (->drop_ns())
* ->current_ns() semantics change - it's supposed to be followed by
corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps
the new refcount; net_drop_ns() decrements it and calls net_free() if the
last reference has been dropped. Method renamed to ->grab_current_ns().
* old net_free() callers call net_drop_ns() instead.
* sysfs_exit_ns() is gone, along with a large part of callchain
leading to it; now that the references stored in ->ns[...] stay valid we
do not need to hunt them down and replace them with NULL. That fixes
problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
of sb->s_instances abuse.
Note that struct net *shutdown* logics has not changed - net_cleanup()
is called exactly when it used to be called. The only thing postponed by
having a sysfs instance refering to that struct net is actual freeing of
memory occupied by struct net.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Select CONFIG_PREEMPT_COUNT when we enable the sleeping inside
spinlock detection, so that the preempt offset gets correctly
incremented/decremented from preempt_disable()/preempt_enable().
This makes the preempt count eventually working in !CONFIG_PREEMPT
when that debug option is set and thus fixes the detection of explicit
preemption disabled sections under such config. Code that sleeps
in explicitly preempt disabled section can be finally spotted
in non-preemptible kernels.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
* 'stable/xen-swiotlb.bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb-2.6:
swiotlb: Export swioltb_nr_tbl and utilize it as appropiate.
RFC 5952 (http://tools.ietf.org/html/rfc5952) mandates that 2 or more
consecutive 0's are required before using :: compression.
Update ip6_compressed_string to match the RFC and update the http
reference as well.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By default the io_tlb_nslabs is set to zero, and gets set to
whatever value is passed in via swiotlb_init_with_tbl function.
The default value passed in is 64MB. However, if the user provides
the 'swiotlb=<nslabs>' the default value is ignored and
the value provided by the user is used... Except when the SWIOTLB
is used under Xen - there the default value of 64MB is used and
the Xen-SWIOTLB has no mechanism to get the 'io_tlb_nslabs' filled
out by setup_io_tlb_npages functions. This patch provides a function
for the Xen-SWIOTLB to call to see if the io_tlb_nslabs is set
and if so use that value.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The brcm80211 driver in the staging tree has a cordic function to
determine cosine and sine for a given angle. Feedback received from
John Linville suggested that these kind of functions should be made
available to others as a library function in the kernel tree. The
b43 driver also has a cordic angle calculation implemented.
Cc: linux-kernel@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Reviewed-by: Roland Vossen <rvossen@broadcom.com>
Reviewed-by: Henry Ptasinski <henryp@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The brcm80211 driver in staging tree uses a crc8 function. Based on
feedback from John Linville to move this to lib directory, the linux
source has been searched. Although there is currently only one other
kernel driver using this algorithm (ie. drivers/ssb) we are providing
this as a library function for others to use.
Cc: linux-kernel@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: Dan Carpenter <error27@gmail.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Reviewed-by: Henry Ptasinski <henryp@broadcom.com>
Reviewed-by: Roland Vossen <rvossen@broadcom.com>
Reviewed-by: "Franky (Zhenhui) Lin" <frankyl@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
By the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT,
CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are not used
to test for existence of find bitops anymore.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The style that we normally use in asm-generic is to test the macro itself
for existence, so in asm-generic, do:
#ifndef find_next_zero_bit_le
extern unsigned long find_next_zero_bit_le(const void *addr,
unsigned long size, unsigned long offset);
#endif
and in the architectures, write
static inline unsigned long find_next_zero_bit_le(const void *addr,
unsigned long size, unsigned long offset)
#define find_next_zero_bit_le find_next_zero_bit_le
This adds the #ifndef for each of the find bitops in the generic header
and source files.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On most architectures division is an expensive operation and accessing an
element currently requires four of them. This performance penalty
effectively precludes flex arrays from being used on any kind of fast
path. However, two of these divisions can be handled at creation time and
the others can be replaced by a reciprocal divide, completely avoiding
real divisions on access.
[eparis@redhat.com: rebase on top of changes to support 0 len elements]
[eparis@redhat.com: initialize part_nr when array fits entirely in base]
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HARDIRQ_ENTER() maps to irq_enter() which calls rcu_irq_enter().
But HARDIRQ_EXIT() maps to __irq_exit() which doesn't call
rcu_irq_exit().
So for every locking selftest that simulates hardirq disabled,
we create an imbalance in the rcu extended quiescent state
internal state.
As a result, after the first missing rcu_irq_exit(), subsequent
irqs won't exit dyntick-idle mode after leaving the interrupt
handler. This means that RCU won't see the affected CPU as being
in an extended quiescent state, resulting in long grace-period
delays (as in grace periods extending for hours).
To fix this, just use __irq_enter() to simulate the hardirq
context. This is sufficient for the locking selftests as we
don't need to exit any extended quiescent state or perform
any check that irqs normally do when they wake up from idle.
As a side effect, this patch makes it possible to restore
"rcu: Decrease memory-barrier usage based on semi-formal proof",
which eventually helped finding this bug.
Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stable <stable@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (26 commits)
arch/tile: prefer "tilepro" as the name of the 32-bit architecture
compat: include aio_abi.h for aio_context_t
arch/tile: cleanups for tilegx compat mode
arch/tile: allocate PCI IRQs later in boot
arch/tile: support signal "exception-trace" hook
arch/tile: use better definitions of xchg() and cmpxchg()
include/linux/compat.h: coding-style fixes
tile: add an RTC driver for the Tilera hypervisor
arch/tile: finish enabling support for TILE-Gx 64-bit chip
compat: fixes to allow working with tile arch
arch/tile: update defconfig file to something more useful
tile: do_hardwall_trap: do not play with task->sighand
tile: replace mm->cpu_vm_mask with mm_cpumask()
tile,mn10300: add device parameter to dma_cache_sync()
audit: support the "standard" <asm-generic/unistd.h>
arch/tile: clarify flush_buffer()/finv_buffer() function names
arch/tile: kernel-related cleanups from removing static page size
arch/tile: various header improvements for building drivers
arch/tile: disable GX prefetcher during cache flush
arch/tile: tolerate disabling CONFIG_BLK_DEV_INITRD
...
Most arches define CONFIG_DEBUG_STACK_USAGE exactly the same way. Move it
to lib/Kconfig.debug so each arch doesn't have to define it. This
obviously makes the option generic, but that's fine because the config is
already used in generic code.
It's not obvious to me that sysrq-P actually does anything caution by
keeping the most inclusive wording.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So we can specify the virtual address as the base of the pool chunk and
then get physical addresses for hardware IP.
For example on at91 we will use this on spi, uart or macb
Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Patrice VILCHEZ <patrice.vilchez@atmel.com>
Cc: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DEBUG_PER_CPU_MAPS is used in lib/cpumask.c as well as in
inlcude/linux/cpumask.h and thus it has outgrown its use within x86 and
powerpc alone. Any arch with SMP support may want to get some more
debugging, so make this option generic.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is quite a lot of code which does copy_from_user() + strict_strto*()
or simple_strto*() combo in slightly different ways.
Before doing conversions all over tree, let's get final API correct.
Enter kstrtoull_from_user() and friends.
Typical code which uses them looks very simple:
TYPE val;
int rv;
rv = kstrtoTYPE_from_user(buf, count, 0, &val);
if (rv < 0)
return rv;
[use val]
return count;
There is a tiny semantic difference from the plain kstrto*() API -- the
latter allows any amount of leading zeroes, while the former copies data
into buffer on stack and thus allows leading zeroes as long as it fits
into buffer.
This shouldn't be a problem for typical usecase "echo 42 > /proc/x".
The point is to make reading one integer from userspace _very_ simple and
very bug free.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This has no actual effect, since sizeof(struct hlist_head) ==
sizeof(struct hlist_head *), but it's still the wrong type to use.
The semantic match that finds this problem:
// <smpl>
@@
type T;
identifier x;
@@
T *x;
...
* x = kzalloc(... * sizeof(T*) * ..., ...);
// </smpl>
[akpm@linux-foundation.org: use kcalloc()]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Lars Ellenberg <lars@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Otherwise, the warning at the top of vsnprintf() gets triggered by
kvasprintf()'s first invocation (with NULL buffer and zero size) of
vsnprintf().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manually adjusting the smp_affinity for IRQ's becomes unwieldy when the
cpu count is large.
Setting smp affinity to cpus 256 to 263 would be:
echo 000000ff,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > smp_affinity
instead of:
echo 256-263 > smp_affinity_list
Think about what it looks like for cpus around say, 4088 to 4095.
We already have many alternate "list" interfaces:
/sys/devices/system/cpu/cpuX/indexY/shared_cpu_list
/sys/devices/system/cpu/cpuX/topology/thread_siblings_list
/sys/devices/system/cpu/cpuX/topology/core_siblings_list
/sys/devices/system/node/nodeX/cpulist
/sys/devices/pci***/***/local_cpulist
Add a companion interface, smp_affinity_list to use cpu lists instead of
cpu maps. This conforms to other companion interfaces where both a map
and a list interface exists.
This required adding a bitmap_parselist_user() function in a manner
similar to the bitmap_parse_user() function.
[akpm@linux-foundation.org: make __bitmap_parselist() static]
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Architectures that implement their own show_mem() function did not pass
the filter argument to show_free_areas() to appropriately avoid emitting
the state of nodes that are disallowed in the current context. This patch
now passes the filter argument to show_free_areas() so those nodes are now
avoided.
This patch also removes the show_free_areas() wrapper around
__show_free_areas() and converts existing callers to pass an empty filter.
ia64 emits additional information for each node, so skip_free_areas_zone()
must be made global to filter disallowed nodes and it is converted to use
a nid argument rather than a zone for this use case.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <jejb@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
b43: fix comment typo reqest -> request
Haavard Skinnemoen has left Atmel
cris: typo in mach-fs Makefile
Kconfig: fix copy/paste-ism for dell-wmi-aio driver
doc: timers-howto: fix a typo ("unsgined")
perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
treewide: fix a few typos in comments
regulator: change debug statement be consistent with the style of the rest
Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
audit: acquire creds selectively to reduce atomic op overhead
rtlwifi: don't touch with treewide double semicolon removal
treewide: cleanup continuations and remove logging message whitespace
ath9k_hw: don't touch with treewide double semicolon removal
include/linux/leds-regulator.h: fix syntax in example code
tty: fix typo in descripton of tty_termios_encode_baud_rate
xtensa: remove obsolete BKL kernel option from defconfig
m68k: fix comment typo 'occcured'
arch:Kconfig.locks Remove unused config option.
treewide: remove extra semicolons
...
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
seqlock: Don't smp_rmb in seqlock reader spin loop
watchdog, hung_task_timeout: Add Kconfig configurable default
lockdep: Remove cmpxchg to update nr_chain_hlocks
lockdep: Print a nicer description for simple irq lock inversions
lockdep: Replace "Bad BFS generated tree" message with something less cryptic
lockdep: Print a nicer description for irq inversion bugs
lockdep: Print a nicer description for simple deadlocks
lockdep: Print a nicer description for normal deadlocks
lockdep: Print a nicer description for irq lock inversions
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, gart: Rename pci-gart_64.c to amd_gart_64.c
x86/amd-iommu: Use threaded interupt handler
arch/x86/kernel/pci-iommu_table.c: Convert sprintf_symbol to %pS
x86/amd-iommu: Add support for invalidate_all command
x86/amd-iommu: Add extended feature detection
x86/amd-iommu: Add ATS enable/disable code
x86/amd-iommu: Add flag to indicate IOTLB support
x86/amd-iommu: Flush device IOTLB if ATS is enabled
x86/amd-iommu: Select PCI_IOV with AMD IOMMU driver
PCI: Move ATS declarations in seperate header file
dma-debug: print information about leaked entry
x86/amd-iommu: Flush all internal TLBs when IOMMUs are enabled
x86/amd-iommu: Rename iommu_flush_device
x86/amd-iommu: Improve handling of full command buffer
x86/amd-iommu: Rename iommu_flush* to domain_flush*
x86/amd-iommu: Remove command buffer resetting logic
x86/amd-iommu: Cleanup completion-wait handling
x86/amd-iommu: Cleanup inv_pages command handling
x86/amd-iommu: Move inv-dte command building to own function
x86/amd-iommu: Move compl-wait command building to own function
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm:
kmemleak: Initialise kmemleak after debug_objects_mem_init()
kmemleak: Select DEBUG_FS unconditionally in DEBUG_KMEMLEAK
kmemleak: Do not return a pointer to an object that kmemleak did not get
In the past DEBUG_FS used to depend on SYSFS and DEBUG_KMEMLEAK selected
it conditionally. This is no longer the case, so always select DEBUG_FS
via DEBUG_KMEMLEAK.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This is a rename of the usr_strtobool proposal, which was a renamed,
relocated and fixed version of previous kstrtobool RFC
Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There a large number hand-coded binary searches in the kernel (run
"git grep search | grep binary" to find many of them). Since in my
experience, hand-coding binary searches can be error-prone, it seems
worth cleaning this up by providing a generic binary search function.
This generic binary search implementation comes from Ksplice. It has
the same basic API as the C library bsearch() function. Ksplice uses
it in half a dozen places with 4 different comparison functions, and I
think our code is substantially cleaner because of this.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Extra-bikeshedding-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Extra-bikeshedding-by: André Goddard Rosa <andre.goddard@gmail.com>
Extra-bikeshedding-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
kptr_restrict has been triggering bugs in apps such as perf, and it also makes
the system less useful by default, so turn it off by default.
This is how we generally handle security features that remove functionality,
such as firewall code or SELinux - they have to be configured and activated
from user-space.
Distributions can turn kptr_restrict on again via this line in
/etc/sysctrl.conf:
kernel.kptr_restrict = 1
( Also mark the variable __read_mostly while at it, as it's typically modified
only once per bootup, or not at all. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The prohibition of DEBUG_OBJECTS_RCU_HEAD from !PREEMPT was due to the
fixup actions. So just produce a warning from !PREEMPT.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The RCU CPU stall warnings can now be controlled using the
rcu_cpu_stall_suppress boot-time parameter or via the same parameter
from sysfs. There is therefore no longer any reason to have
kernel config parameters for this feature. This commit therefore
removes the RCU_CPU_STALL_DETECTOR and RCU_CPU_STALL_DETECTOR_RUNNABLE
kernel config parameters. The RCU_CPU_STALL_TIMEOUT parameter remains
to allow the timeout to be tuned and the RCU_CPU_STALL_VERBOSE parameter
remains to allow task-stall information to be suppressed if desired.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Many of the syscalls mentioned in the audit code are not present
for architectures that implement only the "standard" set of
Linux syscalls (e.g. openat, but not open, etc.). This change
adds proper #ifdefs for all those syscalls.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
The old code considered valid empty LZMA2 streams to be corrupt.
Note that a typical empty .xz file has no LZMA2 data at all,
and thus most .xz files having no uncompressed data are handled
correctly even without this fix.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just like kmalloc will allow one to allocate a 0 length segment of memory
flex arrays should do the same thing. It should bomb if you try to use
something, but it should at least allow the allocation.
This is needed because when SELinux switched to using flex_arrays in 2.6.38
the inability to allocate a 0 length array resulted in SELinux policy load
returning -ENOSPC when previously it worked.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Chris Richards <gizmo@giz-works.com>
Cc: stable@kernel.org [2.6.38+]
Change flex_array_prealloc to take the number of elements for which space
should be allocated instead of the last (inclusive) element. Users
and documentation are updated accordingly. flex_arrays got introduced before
they had users. When folks started using it, they ended up needing a
different API than was coded up originally. This swaps over to the API that
folks apparently need.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Chris Richards <gizmo@giz-works.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: stable@kernel.org [2.6.38+]
flex_arrays are supposed to be a replacement for:
kmalloc(num_elements * sizeof(element))
If kmalloc is given 0 num_elements or a 0 size element it will happily return
ZERO_SIZE_PTR. Which looks like a valid allocation, but which will explode if
something actually try to use it. The current flex_array code will return an
equivalent result if num_elements is 0, but will fail to work if
sizeof(element) is 0. This patch allows allocation to work even for 0 size
elements. It will cause flex_arrays to explode though if they are used.
Imitating the kmalloc behavior.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Just like kmalloc will allow one to allocate a 0 length segment of memory
flex arrays should do the same thing. It should bomb if you try to use
something, but it should at least allow the allocation.
This is needed because when SELinux switched to using flex_arrays in 2.6.38
the inability to allocate a 0 length array resulted in SELinux policy load
returning -ENOSPC when previously it worked.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Chris Richards <gizmo@giz-works.com>
Cc: stable@kernel.org [2.6.38+]
Change flex_array_prealloc to take the number of elements for which space
should be allocated instead of the last (inclusive) element. Users
and documentation are updated accordingly. flex_arrays got introduced before
they had users. When folks started using it, they ended up needing a
different API than was coded up originally. This swaps over to the API that
folks apparently need.
Based-on-patch-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Chris Richards <gizmo@giz-works.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: stable@kernel.org [2.6.38+]
This patch allows the default value for sysctl_hung_task_timeout_secs
to be set at build time. The feature carries virtually no overhead,
so it makes sense to keep it enabled. On heavily loaded systems, though,
it can end up triggering stack traces when there is no bug other than
the system being underprovisioned. We use this patch to keep the hung task
facility available but disabled at boot-time.
The default of 120 seconds is preserved. As a note, commit e162b39a may
have accidentally reverted commit fb822db4, which raised the default from
120 seconds to 480 seconds.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Mandeep Singh Baines <msb@google.com>
Link: http://lkml.kernel.org/r/4DB8600C.8080000@suse.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the following warnings:
CC [M] lib/test-kstrtox.o
lib/test-kstrtox.c: In function 'test_kstrtou64_ok':
lib/test-kstrtox.c:318: warning: this decimal constant is unsigned only in ISO C90
...
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cite Documentation/kernel-parameters.txt for an alternative to
building with PRINTK_TIME compiled in.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When driver leak dma mapping, print additional information about one of
leaked entries, to to help investigate problem. Patch should be useful
for debugging drivers, which maps many different class of buffers.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
"You probably want ... instead." sounds like a recommendation better
not to use the v... functions.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
mm/kmemleak-test.c is used to provide an example of how kmemleak
tool works.
Memory is leaked at module unload-time, so building the test
in kernel (Y) makes the leaks impossible and the test useless.
Qualify DEBUG_KMEMLEAK_TEST config symbol with "depends on m",
to restrict module-only building.
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Fix WARN_ON() test for UP
WARN_ON_SMP(): Allow use in if() statements on UP
x86, dumpstack: Use %pB format specifier for stack trace
vsprintf: Introduce %pB format specifier
lockdep: Remove unused 'factor' variable from lockdep_stats_show()
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (9356 commits)
[media] rc: update for bitop name changes
fs: simplify iget & friends
fs: pull inode->i_lock up out of writeback_single_inode
fs: rename inode_lock to inode_hash_lock
fs: move i_wb_list out from under inode_lock
fs: move i_sb_list out from under inode_lock
fs: remove inode_lock from iput_final and prune_icache
fs: Lock the inode LRU list separately
fs: factor inode disposal
fs: protect inode->i_state with inode->i_lock
lib, arch: add filter argument to show_mem and fix private implementations
SLUB: Write to per cpu data when allocating it
slub: Fix debugobjects with lockless fastpath
autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()
autofs4 - remove autofs4_lock
autofs4 - fix d_manage() return on rcu-walk
autofs4 - fix autofs4_expire_indirect() traversal
autofs4 - fix dentry leak in autofs4_expire_direct()
autofs4 - reinstate last used update on access
vfs - check non-mountpoint dentry might block in __follow_mount_rcu()
...
NOTE!
This merge commit was created to fix compilation error. The block
tree was merged upstream and removed the 'elv_queue_empty()'
function which the new 'mtdswap' driver is using. So a simple
merge of the mtd tree with upstream does not compile. And the
mtd tree has already be published, so re-basing it is not an option.
To fix this unfortunate situation, I had to merge upstream into the
mtd-2.6.git tree without committing, put the fixup patch on top of
this, and then commit this. The result is that we do not have commits
which do not compile.
In other words, this merge commit "merges" 3 things: the MTD tree, the
upstream tree, and the fixup patch.
Commit ddd588b5dd ("oom: suppress nodes that are not allowed from
meminfo on oom kill") moved lib/show_mem.o out of lib/lib.a, which
resulted in build warnings on all architectures that implement their own
versions of show_mem():
lib/lib.a(show_mem.o): In function `show_mem':
show_mem.c:(.text+0x1f4): multiple definition of `show_mem'
arch/sparc/mm/built-in.o:(.text+0xd70): first defined here
The fix is to remove __show_mem() and add its argument to show_mem() in
all implementations to prevent this breakage.
Architectures that implement their own show_mem() actually don't do
anything with the argument yet, but they could be made to filter nodes
that aren't allowed in the current context in the future just like the
generic implementation.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The %pB format specifier is for stack backtrace. Its handler
sprint_backtrace() does symbol lookup using (address-1) to
ensure the address will not point outside of the function.
If there is a tail-call to the function marked "noreturn",
gcc optimized out the code after the call then causes saved
return address points outside of the function (i.e. the start
of the next function), so pollutes call trace somewhat.
This patch adds the %pB printk mechanism that allows architecture
call-trace printout functions to improve backtrace printouts.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
LKML-Reference: <1300934550-21394-1-git-send-email-namhyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This introduces CONFIG_GENERIC_FIND_BIT_LE to tell whether to use generic
implementation of find_*_bit_le() in lib/find_next_bit.c or not.
For now we select CONFIG_GENERIC_FIND_BIT_LE for all architectures which
enable CONFIG_GENERIC_FIND_NEXT_BIT.
But m68knommu wants to define own faster find_next_zero_bit_le() and
continues using generic find_next_{,zero_}bit().
(CONFIG_GENERIC_FIND_NEXT_BIT and !CONFIG_GENERIC_FIND_BIT_LE)
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the little-endian bitops take any pointer types by changing the
prototypes and adding casts in the preprocessor macros.
That would seem to at least make all the filesystem code happier, and they
can continue to do just something like
#define ext2_set_bit __test_and_set_bit_le
(or whatever the exact sequence ends up being).
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of always creating a huge (268K) deflate_workspace with the
maximum compression parameters (windowBits=15, memLevel=8), allow the
caller to obtain a smaller workspace by specifying smaller parameter
values.
For example, when capturing oops and panic reports to a medium with
limited capacity, such as NVRAM, compression may be the only way to
capture the whole report. In this case, a small workspace (24K works
fine) is a win, whether you allocate the workspace when you need it (i.e.,
during an oops or panic) or at boot time.
I've verified that this patch works with all accepted values of windowBits
(positive and negative), memLevel, and compression level.
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1. simple_strto*() do not contain overflow checks and crufty,
libc way to indicate failure.
2. strict_strto*() also do not have overflow checks but the name and
comments pretend they do.
3. Both families have only "long long" and "long" variants,
but users want strtou8()
4. Both "simple" and "strict" prefixes are wrong:
Simple doesn't exactly say what's so simple, strict should not exist
because conversion should be strict by default.
The solution is to use "k" prefix and add convertors for more types.
Enter
kstrtoull()
kstrtoll()
kstrtoul()
kstrtol()
kstrtouint()
kstrtoint()
kstrtou64()
kstrtos64()
kstrtou32()
kstrtos32()
kstrtou16()
kstrtos16()
kstrtou8()
kstrtos8()
Include runtime testsuite (somewhat incomplete) as well.
strict_strto*() become deprecated, stubbed to kstrto*() and
eventually will be removed altogether.
Use kstrto*() in code today!
Note: on some archs _kstrtoul() and _kstrtol() are left in tree, even if
they'll be unused at runtime. This is temporarily solution,
because I don't want to hardcode list of archs where these
functions aren't needed. Current solution with sizeof() and
__alignof__ at least always works.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've been burned by regressions/bugs which we later realized could have
been triaged quicker if only we'd paid closer attention to dmesg. To make
it easier to audit dmesg, we'd like to make DEFAULT_MESSAGE_LEVEL
Kconfig-settable. That way we can set it to KERN_NOTICE and audit any
messages <= KERN_WARNING.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In an effort to reduce kernel address leaks that might be used to help
target kernel privilege escalation exploits, this patch uses %pK when
displaying addresses in /proc/kallsyms, /proc/modules, and
/sys/module/*/sections/*.
Note that this changes %x to %p, so some legitimately 0 values in
/proc/kallsyms would have changed from 00000000 to "(null)". To avoid
this, "(null)" is not used when using the "K" format. Anything that was
already successfully parsing "(null)" in addition to full hex digits
should have no problem with this change. (Thanks to Joe Perches for the
suggestion.) Due to the %x to %p, "void *" casts are needed since these
addresses are already "unsigned long" everywhere internally, due to their
starting life as ELF section offsets.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Cc: Eugene Teo <eugene@redhat.com>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If kptr restrictions are on, just set the passed pointer to NULL.
$ size lib/vsprintf.o.*
text data bss dec hex filename
8247 4 2 8253 203d lib/vsprintf.o.new
8282 4 2 8288 2060 lib/vsprintf.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a cpu is considered stuck, instead of limping along and just printing
a warning, it is sometimes preferred to just panic, let kdump capture the
vmcore and reboot. This gets the machine back into a stable state quickly
while saving the info that got it into a stuck state to begin with.
Add a Kconfig option to allow users to set the hardlockup to panic
by default. Also add in a 'nmi_watchdog=nopanic' to override this.
[akpm@linux-foundation.org: fix strncmp length]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The oom killer is extremely verbose for machines with a large number of
cpus and/or nodes. This verbosity can often be harmful if it causes other
important messages to be scrolled from the kernel log and incurs a
signicant time delay, specifically for kernels with CONFIG_NODES_SHIFT >
8.
This patch causes only memory information to be displayed for nodes that
are allowed by current's cpuset when dumping the VM state. Information
for all other nodes is irrelevant to the oom condition; we don't care if
there's an abundance of memory elsewhere if we can't access it.
This only affects the behavior of dumping memory information when an oom
is triggered. Other dumps, such as for sysrq+m, still display the
unfiltered form when using the existing show_mem() interface.
Additionally, the per-cpu pageset statistics are extremely verbose in oom
killer output, so it is now suppressed. This removes
nodes_weight(current->mems_allowed) * (1 + nr_cpus)
lines from the oom killer output.
Callers may use __show_mem(SHOW_MEM_FILTER_NODES) to filter disallowed
nodes.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
kbuild: Make DEBUG_SECTION_MISMATCH selectable, but not on by default
genksyms: Regenerate lexer and parser
genksyms: Track changes to enum constants
genksyms: simplify usage of find_symbol()
genksyms: Add helpers for building string lists
genksyms: Simplify printing of symbol types
genksyms: Simplify lexer
genksyms: Do not paste the bison header file to lex.c
modpost: fix trailing comma
KBuild: silence "'scripts/unifdef' is up to date."
kbuild: Add extra gcc checks
kbuild: reenable section mismatch analysis
unifdef: update to upstream version 2.5
CONFIG_DEBUG_SECTION_MISMATCH has also runtime effects due to the
-fno-inline-functions-called-once compiler flag, so forcing it on
everyone is not a good idea.
Signed-off-by: Michal Marek <mmarek@suse.cz>
* 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
BKL: That's all, folks
fs/locks.c: Remove stale FIXME left over from BKL conversion
ipx: remove the BKL
appletalk: remove the BKL
x25: remove the BKL
ufs: remove the BKL
hpfs: remove the BKL
drivers: remove extraneous includes of smp_lock.h
tracing: don't trace the BKL
adfs: remove the big kernel lock
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
bonding: enable netpoll without checking link status
xfrm: Refcount destination entry on xfrm_lookup
net: introduce rx_handler results and logic around that
bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
bonding: wrap slave state work
net: get rid of multiple bond-related netdevice->priv_flags
bonding: register slave pointer for rx_handler
be2net: Bump up the version number
be2net: Copyright notice change. Update to Emulex instead of ServerEngines
e1000e: fix kconfig for crc32 dependency
netfilter ebtables: fix xt_AUDIT to work with ebtables
xen network backend driver
bonding: Improve syslog message at device creation time
bonding: Call netif_carrier_off after register_netdevice
bonding: Incorrect TX queue offset
net_sched: fix ip_tos2prio
xfrm: fix __xfrm_route_forward()
be2net: Fix UDP packet detected status in RX compl
Phonet: fix aligned-mode pipe socket buffer header reserve
netxen: support for GbE port settings
...
Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
* 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic()
futex: Deobfuscate handle_futex_death()
plist: Add priority list test
plist: Shrink struct plist_head
futex,plist: Remove debug lock assignment from plist_node
futex,plist: Pass the real head of the priority list to plist_del()
futex: Sanitize futex ops argument types
futex: Sanitize cmpxchg_futex_value_locked API
futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic()
futex: Avoid redudant evaluation of task_pid_vnr()
futex: Update futex_wait_setup comments about locking
Add test code for checking plist when the kernel is booting.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4D107986.1010302@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
struct plist_head is used in struct task_struct as well as struct
rtmutex. If we can make it smaller, it will also make these structures
smaller as well.
The field prio_list in struct plist_head is seldom used and we can get
its information from the plist_nodes. Removing this field will decrease
the size of plist_head by half.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4D107982.9090700@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This is a new software BCH encoding/decoding library, similar to the shared
Reed-Solomon library.
Binary BCH (Bose-Chaudhuri-Hocquenghem) codes are widely used to correct
errors in NAND flash devices requiring more than 1-bit ecc correction; they
are generally better suited for NAND flash than RS codes because NAND bit
errors do not occur in bursts. Latest SLC NAND devices typically require at
least 4-bit ecc protection per 512 bytes block.
This library provides software encoding/decoding, but may also be used with
ASIC/SoC hardware BCH engines to perform error correction. It is being
currently used for this purpose on an OMAP3630 board (4bit/8bit HW BCH). It
has also been used to decode raw dumps of NAND devices with on-die BCH ecc
engines (e.g. Micron 4bit ecc SLC devices).
Latest NAND devices (including SLC) can exhibit high error rates (typically
a dozen or more bitflips per hour during stress tests); in order to
minimize the performance impact of error correction, this library
implements recently developed algorithms for fast polynomial root finding
(see bch.c header for details) instead of the traditional exhaustive Chien
root search; a few performance figures are provided below:
Platform: arm926ejs @ 468 MHz, 32 KiB icache, 16 KiB dcache
BCH ecc : 4-bit per 512 bytes
Encoding average throughput: 250 Mbits/s
Error correction time (compared with Chien search):
average worst average (Chien) worst (Chien)
----------------------------------------------------------
1 bit 8.5 µs 11 µs 200 µs 383 µs
2 bit 9.7 µs 12.5 µs 477 µs 728 µs
3 bit 18.1 µs 20.6 µs 758 µs 1010 µs
4 bit 19.5 µs 23 µs 1028 µs 1280 µs
In the above figures, "worst" is meant in terms of error pattern, not in
terms of cache miss / page faults effects (not taken into account here).
The library has been extensively tested on the following platforms: x86,
x86_64, arm926ejs, omap3630, qemu-ppc64, qemu-mips.
Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In complex subsystems like mac80211 structures can contain several
timers and work structs, so identifying a specific instance from the
call trace and object type output of debugobjects can be hard.
Allow the subsystems which support debugobjects to provide a hint
function. This function returns a pointer to a kernel address
(preferrably the objects callback function) which is printed along
with the debugobjects type.
Add hint methods for timer_list, work_struct and hrtimer.
[ tglx: Massaged changelog, made it compile ]
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <20110307085809.GA9334@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This removes the implementation of the big kernel lock,
at last. A lot of people have worked on this in the
past, I so the credit for this patch should be with
everyone who participated in the hunt.
The names on the Cc list are the people that were the
most active in this, according to the recorded git
history, in alphabetical order.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Blunck <jblunck@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: Paul Menage <menage@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Make CONFIG_AVERAGE selectable for out-of-tree users
such as compat-wireless.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
MAINTAINERS: Add Andy Gospodarek as co-maintainer.
r8169: disable ASPM
RxRPC: Fix v1 keys
AF_RXRPC: Handle receiving ACKALL packets
cnic: Fix lost interrupt on bnx2x
cnic: Prevent status block race conditions with hardware
net: dcbnl: check correct ops in dcbnl_ieee_set()
e1000e: disable broken PHY wakeup for ICH10 LOMs, use MAC wakeup instead
igb: fix sparse warning
e1000: fix sparse warning
netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values
dccp: fix oops on Reset after close
ipvs: fix dst_lock locking on dest update
davinci_emac: Add Carrier Link OK check in Davinci RX Handler
bnx2x: update driver version to 1.62.00-6
bnx2x: properly calculate lro_mss
bnx2x: perform statistics "action" before state transition.
bnx2x: properly configure coefficients for MinBW algorithm (NPAR mode).
bnx2x: Fix ethtool -t link test for MF (non-pmf) devices.
bnx2x: Fix nvram test for single port devices.
...
Currently nla_policy_len always returns n * NLA_HDRLEN:
It loops, but does not advance it's iterator.
NLA_UNSPEC == 0 does not contain a .len in any policy.
Trivially fixed by adding p++.
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
swiotlb's map_page wrongly calls panic() when it can't find a buffer fit
for device's dma mask. It should return an error instead.
Devices with an odd dma mask (i.e. under 4G) like b44 network card hit
this bug (the system crashes):
http://marc.info/?l=linux-kernel&m=129648943830106&w=2
If swiotlb returns an error, b44 driver can use the own bouncing
mechanism.
Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was disabled in commit
e5f95c8 (kbuild: print only total number of section mismatces found)
because there were too many warnings. Now we're down to a reasonable
number again, so we start scaring people with the details.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>