The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files. Arnd Bergman noted that the help text was
slightly misleading and should be fixed to state that enabling this
option isn't a problem when using pre 4.4 gcc.
To simplify the rewording, consolidate the text into lib/Kconfig.debug
and modify it there to be more explicit about when you should say N to
this config.
Also, make the text a bit more generic by stating that this option
enables compile time checks so we can cover architectures which emit
warnings vs. ones which emit errors. The details of how an
architecture decided to implement the checks isn't as important as the
concept of compile time checking of copy_from_user() calls.
While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms. This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.
show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.
* Archs which didn't print debug info now do.
alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
um, xtensa
* Already prints debug info. Replaced with show_regs_print_info().
The printed information is superset of what used to be there.
arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86
* s390 is special in that it used to print arch-specific information
along with generic debug info. Heiko and Martin think that the
arch-specific extra isn't worth keeping s390 specfic implementation.
Converted to use the generic version.
Note that now all archs print the debug info before actual register
dumps.
An example BUG() dump follows.
kernel BUG at /work/os/work/kernel/workqueue.c:4841!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6
RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246
RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
Call Trace:
[<ffffffff81000312>] do_one_initcall+0x122/0x170
[<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
[<ffffffff81c47760>] ? rest_init+0x140/0x140
[<ffffffff81c4776e>] kernel_init+0xe/0xf0
[<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
[<ffffffff81c47760>] ? rest_init+0x140/0x140
...
v2: Typo fix in x86-32.
v3: CPU number dropped from show_regs_print_info() as
dump_stack_print_info() has been updated to print it. s390
specific implementation dropped as requested by s390 maintainers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86 and ia64 can acquire extra hardware identification information
from DMI and print it along with task dumps; however, the usage isn't
consistent.
* x86 show_regs() collects vendor, product and board strings and print
them out with PID, comm and utsname. Some of the information is
printed again later in the same dump.
* warn_slowpath_common() explicitly accesses the DMI board and prints
it out with "Hardware name:" label. This applies to both x86 and
ia64 but is irrelevant on all other archs.
* ia64 doesn't show DMI information on other non-WARN dumps.
This patch introduces arch-specific hardware description used by
dump_stack(). It can be set by calling dump_stack_set_arch_desc()
during boot and, if exists, printed out in a separate line with
"Hardware name:" label.
dmi_set_dump_stack_arch_desc() is added which sets arch-specific
description from DMI data. It uses dmi_ids_string[] which is set from
dmi_present() used for DMI debug message. It is superset of the
information x86 show_regs() is using. The function is called from x86
and ia64 boot code right after dmi_scan_machine().
This makes the explicit DMI handling in warn_slowpath_common()
unnecessary. Removed.
show_regs() isn't yet converted to use generic debug information
printing and this patch doesn't remove the duplicate DMI handling in
x86 show_regs(). The next patch will unify show_regs() handling and
remove the duplication.
An example WARN dump follows.
WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
Call Trace:
[<ffffffff81c614dc>] dump_stack+0x19/0x1b
[<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
[<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8234a0c3>] init_workqueues+0x35/0x505
...
v2: Use the same string as the debug message from dmi_present() which
also contains BIOS information. Move hardware name into its own
line as warn_slowpath_common() did. This change was suggested by
Bjorn Helgaas.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both dump_stack() and show_stack() are currently implemented by each
architecture. show_stack(NULL, NULL) dumps the backtrace for the
current task as does dump_stack(). On some archs, dump_stack() prints
extra information - pid, utsname and so on - in addition to the
backtrace while the two are identical on other archs.
The usages in arch-independent code of the two functions indicate
show_stack(NULL, NULL) should print out bare backtrace while
dump_stack() is used for debugging purposes when something went wrong,
so it does make sense to print additional information on the task which
triggered dump_stack().
There's no reason to require archs to implement two separate but mostly
identical functions. It leads to unnecessary subtle information.
This patch expands the dummy fallback dump_stack() implementation in
lib/dump_stack.c such that it prints out debug information (taken from
x86) and invokes show_stack(NULL, NULL) and drops arch-specific
dump_stack() implementations in all archs except blackfin. Blackfin's
dump_stack() does something wonky that I don't understand.
Debug information can be printed separately by calling
dump_stack_print_info() so that arch-specific dump_stack()
implementation can still emit the same debug information. This is used
in blackfin.
This patch brings the following behavior changes.
* On some archs, an extra level in backtrace for show_stack() could be
printed. This is because the top frame was determined in
dump_stack() on those archs while generic dump_stack() can't do that
reliably. It can be compensated by inlining dump_stack() but not
sure whether that'd be necessary.
* Most archs didn't use to print debug info on dump_stack(). They do
now.
An example WARN dump follows.
WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
Hardware name: empty
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
Call Trace:
[<ffffffff81c614dc>] dump_stack+0x19/0x1b
[<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8234a071>] init_workqueues+0x35/0x505
...
v2: CPU number added to the generic debug info as requested by s390
folks and dropped the s390 specific dump_stack(). This loses %ksp
from the debug message which the maintainers think isn't important
enough to keep the s390-specific dump_stack() implementation.
dump_stack_print_info() is moved to kernel/printk.c from
lib/dump_stack.c. Because linkage is per objecct file,
dump_stack_print_info() living in the same lib file as generic
dump_stack() means that archs which implement custom dump_stack()
- at this point, only blackfin - can't use dump_stack_print_info()
as that will bring in the generic version of dump_stack() too. v1
The v1 patch broke build on blackfin due to this issue. The build
breakage was reported by Fengguang Wu.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390 bits]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are multiple ways a task can be dumped - explicit call to
dump_stack(), triggering WARN() or BUG(), through sysrq-t and so on.
Most of what gets printed is upto each architecture and the current
state is not particularly pretty. Different pieces of information are
presented differently depending on which path the dump takes and which
architecture it's running on. This is messy for no good reason and
makes it exceedingly difficult to add or modify debug information to
task dumps.
In all archs except for s390, there's nothing arch-specific about the
printed debug information. This patchset updates all those archs to use
the same helpers to consistently print out the same debug information.
An example WARN dump after this patchset.
WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
Call Trace:
[<ffffffff81c614dc>] dump_stack+0x19/0x1b
[<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
[<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8234a0c3>] init_workqueues+0x35/0x505
...
And BUG dump.
kernel BUG at kernel/workqueue.c:4841!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6
RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246
RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
Call Trace:
[<ffffffff81000312>] do_one_initcall+0x122/0x170
[<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
[<ffffffff81c47760>] ? rest_init+0x140/0x140
[<ffffffff81c4776e>] kernel_init+0xe/0xf0
[<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
[<ffffffff81c47760>] ? rest_init+0x140/0x140
...
This patchset contains the following seven patches.
0001-x86-don-t-show-trace-beyond-show_stack-NULL-NULL.patch
0002-sparc32-make-show_stack-acquire-fp-if-_ksp-is-not-sp.patch
0003-dump_stack-consolidate-dump_stack-implementations-an.patch
0004-dmi-morph-dmi_dump_ids-into-dmi_format_ids-which-for.patch
0005-dump_stack-implement-arch-specific-hardware-descript.patch
0006-dump_stack-unify-debug-information-printed-by-show_r.patch
0007-arc-print-fatal-signals-reduce-duplicated-informatio.patch
0001-0002 update stack dumping functions in x86 and sparc32 in
preparation.
0003 makes all arches except blackfin use generic dump_stack().
blackfin still uses the generic helper to print the same info.
0004-0005 properly abstract DMI identifier printing in WARN() and
show_regs() so that all dumps print out the information. This enables
show_regs() to use the same debug info message.
0006 updates show_regs() of all arches to use a common generic helper
to print debug info.
0007 removes somem duplicate information from arc dumps.
While this patchset changes how debug info is printed on some archs,
the printed information is always superset of what used to be there.
This patchset makes task dump debug messages consistent and enables
adding more information. Workqueue is scheduled to add worker
information including the workqueue in use and work item specific
description.
While this patch touches a lot of archs, it isn't too likely to cause
non-trivial conflicts with arch-specfic changes and would probably be
best to route together either through -mm.
x86 is tested but other archs are either only compile tested or not
tested at all. Changes to most archs are generally trivial.
This patch:
show_stack(current or NULL, NULL) is used to print the backtrace of the
current task. As trace beyond the function itself isn't of much
interest to anyone, don't show it by determining sp and bp in
show_stack()'s frame and passing them to show_stack_log_lvl().
This brings show_stack(NULL, NULL)'s behavior in line with
dump_stack().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 RAS changes from Ingo Molnar:
- Add an Intel CMCI hotplug fix
- Add AMD family 16h EDAC support
- Make the AMD MCE banks code more flexible for virtual environments
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
amd64_edac: Add Family 16h support
x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMD
x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helper
Pull x86 platform changes from Ingo Molnar:
"Small fixes and cleanups all over the map"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/setup: Drop unneeded include <asm/dmi.h>
x86/olpc/xo1/sci: Don't call input_free_device() after input_unregister_device()
x86/platform/intel/mrst: Remove cast for kmalloc() return value
x86/platform/uv: Replace kmalloc() & memset with kzalloc()
Pull x86 paravirt update from Ingo Molnar:
"Various paravirtualization related changes - the biggest one makes
guest support optional via CONFIG_HYPERVISOR_GUEST"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, wakeup, sleep: Use pvops functions for changing GDT entries
x86, xen, gdt: Remove the pvops variant of store_gdt.
x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed.
x86: Make Linux guest support optional
x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu
Pull x86 mm changes from Ingo Molnar:
"Misc smaller changes all over the map"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/iommu/dmar: Remove warning for HPET scope type
x86/mm/gart: Drop unnecessary check
x86/mm/hotplug: Put kernel_physical_mapping_remove() declaration in CONFIG_MEMORY_HOTREMOVE
x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER
x86/mm/numa: Simplify some bit mangling
x86/mm: Re-enable DEBUG_TLBFLUSH for X86_32
x86/mm/cpa: Cleanup split_large_page() and its callee
x86: Drop always empty .text..page_aligned section
Pull perparatory x86 kasrl changes from Ingo Molnar:
"This contains changes from the ongoing KASLR work, by Kees Cook.
The main changes are the use of a read-only IDT on x86 (which
decouples the userspace visible virtual IDT address from the physical
address), and a rework of ELF relocation support, in preparation of
random, boot-time kernel image relocation."
* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELF
x86, relocs: Build separate 32/64-bit tools
x86, relocs: Add 64-bit ELF support to relocs tool
x86, relocs: Consolidate processing logic
x86, relocs: Generalize ELF structure names
x86: Use a read-only IDT alias on all CPUs
Pull x86 debug update from Ingo Molnar:
"Two small changes: a documentation update and a constification"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, early-printk: Update earlyprintk documentation (and kill x86 copy)
x86: Constify a few items
Pull x86 cleanups from Ingo Molnar:
"Misc smaller cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Fix spelling, put space between a numeral and its units
x86/lib: Fix spelling in the comments
x86, quirks: Shut-up a long-standing gcc warning
x86, msr: Unify variable names
x86-64, docs, mm: Add vsyscall range to virtual address space layout
x86: Drop KERNEL_IMAGE_START
x86_64: Use __BOOT_DS instead_of __KERNEL_DS for safety
Pull core timer updates from Ingo Molnar:
"The main changes in this cycle's merge are:
- Implement shadow timekeeper to shorten in kernel reader side
blocking, by Thomas Gleixner.
- Posix timers enhancements by Pavel Emelyanov:
- allocate timer ID per process, so that exact timer ID allocations
can be re-created be checkpoint/restore code.
- debuggability and tooling (/proc/PID/timers, etc.) improvements.
- suspend/resume enhancements by Feng Tang: on certain new Intel Atom
processors (Penwell and Cloverview), there is a feature that the
TSC won't stop in S3 state, so the TSC value won't be reset to 0
after resume. This can be taken advantage of by the generic via
the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to
recover/approximate sleep time, the main (and precise) clocksource
can be used.
- Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many
CPUs the file goes beyond 4MB of size and thus the current
simplistic seqfile approach fails. Convert /proc/timer_list to a
proper seq_file with its own iterator.
- Cleanups and refactorings of the core timekeeping code by John
Stultz.
- International Atomic Clock time is managed by the NTP code
internally currently but not exposed externally. Separate the TAI
code out and add CLOCK_TAI support and TAI support to the hrtimer
and posix-timer code, by John Stultz.
- Add deep idle support enhacement to the broadcast clockevents core
timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ
clockevents feature (which will be utilized by future clockevents
driver updates), which allows the use of IRQ affinities to avoid
spurious wakeups of idle CPUs - the right CPU with an expiring
timer will be woken.
- Add new ARM bcm281xx clocksource driver, by Christian Daudt
- ... various other fixes and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
clockevents: Set dummy handler on CPU_DEAD shutdown
timekeeping: Update tk->cycle_last in resume
posix-timers: Remove unused variable
clockevents: Switch into oneshot mode even if broadcast registered late
timer_list: Convert timer list to be a proper seq_file
timer_list: Split timer_list_show_tickdevices
posix-timers: Show sigevent info in proc file
posix-timers: Introduce /proc/PID/timers file
posix timers: Allocate timer id per process (v2)
timekeeping: Make sure to notify hrtimers when TAI offset changes
hrtimer: Fix ktime_add_ns() overflow on 32bit architectures
hrtimer: Add expiry time overflow check in hrtimer_interrupt
timekeeping: Shorten seq_count region
timekeeping: Implement a shadow timekeeper
timekeeping: Delay update of clock->cycle_last
timekeeping: Store cycle_last value in timekeeper struct as well
ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state
timekeeping: Simplify tai updating from do_adjtimex
timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()
...
Pull SMP/hotplug changes from Ingo Molnar:
"This is a pretty large, multi-arch series unifying and generalizing
the various disjunct pieces of idle routines that architectures have
historically copied from each other and have grown in random, wildly
inconsistent and sometimes buggy directions:
101 files changed, 455 insertions(+), 1328 deletions(-)
this went through a number of review and test iterations before it was
committed, it was tested on various architectures, was exposed to
linux-next for quite some time - nevertheless it might cause problems
on architectures that don't read the mailing lists and don't regularly
test linux-next.
This cat herding excercise was motivated by the -rt kernel, and was
brought to you by Thomas "the Whip" Gleixner."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
idle: Remove GENERIC_IDLE_LOOP config switch
um: Use generic idle loop
ia64: Make sure interrupts enabled when we "safe_halt()"
sparc: Use generic idle loop
idle: Remove unused ARCH_HAS_DEFAULT_IDLE
bfin: Fix typo in arch_cpu_idle()
xtensa: Use generic idle loop
x86: Use generic idle loop
unicore: Use generic idle loop
tile: Use generic idle loop
tile: Enter idle with preemption disabled
sh: Use generic idle loop
score: Use generic idle loop
s390: Use generic idle loop
powerpc: Use generic idle loop
parisc: Use generic idle loop
openrisc: Use generic idle loop
mn10300: Use generic idle loop
mips: Use generic idle loop
microblaze: Use generic idle loop
...
Pull scheduler changes from Ingo Molnar:
"The main changes in this development cycle were:
- full dynticks preparatory work by Frederic Weisbecker
- factor out the cpu time accounting code better, by Li Zefan
- multi-CPU load balancer cleanups and improvements by Joonsoo Kim
- various smaller fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
sched: Fix init NOHZ_IDLE flag
sched: Prevent to re-select dst-cpu in load_balance()
sched: Rename load_balance_tmpmask to load_balance_mask
sched: Move up affinity check to mitigate useless redoing overhead
sched: Don't consider other cpus in our group in case of NEWLY_IDLE
sched: Explicitly cpu_idle_type checking in rebalance_domains()
sched: Change position of resched_cpu() in load_balance()
sched: Fix wrong rq's runnable_avg update with rt tasks
sched: Document task_struct::personality field
sched/cpuacct/UML: Fix header file dependency bug on the UML build
cgroup: Kill subsys.active flag
sched/cpuacct: No need to check subsys active state
sched/cpuacct: Initialize cpuacct subsystem earlier
sched/cpuacct: Initialize root cpuacct earlier
sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically
sched/cpuacct: Clean up cpuacct.h
sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field()
sched/cpuacct: Remove redundant NULL checks in cpuacct_charge()
sched/cpuacct: Add cpuacct_acount_field()
sched/cpuacct: Add cpuacct_init()
...
Pull perf updates from Ingo Molnar:
"Features:
- Add "uretprobes" - an optimization to uprobes, like kretprobes are
an optimization to kprobes. "perf probe -x file sym%return" now
works like kretprobes. By Oleg Nesterov.
- Introduce per core aggregation in 'perf stat', from Stephane
Eranian.
- Add memory profiling via PEBS, from Stephane Eranian.
- Event group view for 'annotate' in --stdio, --tui and --gtk, from
Namhyung Kim.
- Add support for AMD NB and L2I "uncore" counters, by Jacob Shin.
- Add Ivy Bridge-EP uncore support, by Zheng Yan
- IBM zEnterprise EC12 oprofile support patchlet from Robert Richter.
- Add perf test entries for checking breakpoint overflow signal
handler issues, from Jiri Olsa.
- Add perf test entry for for checking number of EXIT events, from
Namhyung Kim.
- Add perf test entries for checking --cpu in record and stat, from
Jiri Olsa.
- Introduce perf stat --repeat forever, from Frederik Deweerdt.
- Add --no-demangle to report/top, from Namhyung Kim.
- PowerPC fixes plus a couple of cleanups/optimizations in uprobes
and trace_uprobes, by Oleg Nesterov.
Various fixes and refactorings:
- Fix dependency of the python binding wrt libtraceevent, from
Naohiro Aota.
- Simplify some perf_evlist methods and to allow 'stat' to share code
with 'record' and 'trace', by Arnaldo Carvalho de Melo.
- Remove dead code in related to libtraceevent integration, from
Namhyung Kim.
- Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf
sched lat' back working, by Arnaldo Carvalho de Melo
- We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho
de Melo.
- Kill a bunch of die() calls, from Namhyung Kim.
- Fix build on non-glibc systems due to libio.h absence, from Cody P
Schafer.
- Remove some perf_session and tracing dead code, from David Ahern.
- Honor parallel jobs, fix from Borislav Petkov
- Introduce tools/lib/lk library, initially just removing duplication
among tools/perf and tools/vm. from Borislav Petkov
... and many more I missed to list, see the shortlog and git log for
more details."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits)
perf/x86/intel/P4: Robistify P4 PMU types
perf/x86/amd: Fix AMD NB and L2I "uncore" support
perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c
perf/x86: Check all MSRs before passing hw check
perf/x86/amd: Add support for AMD NB and L2I "uncore" counters
perf/x86/intel: Add Ivy Bridge-EP uncore support
perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management
perf/x86: Avoid kfree() in CPU_{STARTING,DYING}
uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty
uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()
uprobes/tracing: Change create_trace_uprobe() to support uretprobes
uprobes/tracing: Make seq_printf() code uretprobe-friendly
uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly
uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly
uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()
uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers
uprobes/tracing: Generalize struct uprobe_trace_entry_head
uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls
uprobes/tracing: Kill the pointless seq_print_ip_sym() call
uprobes/tracing: Kill the pointless task_pt_regs() calls
...
Use preferable function name which implies using a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pageattr-test calls srandom32() once every test iteration. But calling
srandom32() after late_initcalls is not meaningfull. Because the random
states for random32() is mixed by good random numbers in late_initcall
prandom_reseed().
So this removes the call to srandom32().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The early console implementations are the same all over the place. Move
the print function to kernel/printk and get rid of the copies.
[akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list]
[paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory hotplug can happen on a machine under load, memory shortness
and fragmentation, so huge page allocations for the vmemmap are not
guaranteed to succeed.
Try to fall back to regular pages before failing the hotplug event
completely.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We already have generic code to allocate vmemmap with regular pages, use
it.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No need to maintain addr_end and p_end when they are never actually read
anywhere on !pse setups. Remove the dead code.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The sparse code, when asking the architecture to populate the vmemmap,
specifies the section range as a starting page and a number of pages.
This is an awkward interface, because none of the arch-specific code
actually thinks of the range in terms of 'struct page' units and always
translates it to bytes first.
In addition, later patches mix huge page and regular page backing for
the vmemmap. For this, they need to call vmemmap_populate_basepages()
on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these
are not necessarily multiples of the 'struct page' size and so this unit
is too coarse.
Just translate the section range into bytes once in the generic sparse
code, then pass byte ranges down the stack.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset removes vm_struct list management after initializing
vmalloc. Adding and removing an entry to vmlist is linear time
complexity, so it is inefficient. If we maintain this list, overall
time complexity of adding and removing area to vmalloc space is O(N),
although we use rbtree for finding vacant place and it's time complexity
is just O(logN).
And vmlist and vmlist_lock is used many places of outside of vmalloc.c.
It is preferable that we hide this raw data structure and provide
well-defined function for supporting them, because it makes that they
cannot mistake when manipulating theses structure and it makes us easily
maintain vmalloc layer.
For kexec and makedumpfile, I export vmap_area_list, instead of vmlist.
This comes from Atsushi's recommendation. For more information, please
refer below link. https://lkml.org/lkml/2012/12/6/184
This patch:
The purpose of iterating a vmlist is finding vm area with specific virtual
address. find_vm_area() is provided for this purpose and more efficient,
because it uses a rbtree. So change it.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit abf09bed3c ("s390/mm: implement software dirty bits")
introduced another difference in the pte layout vs. the pmd layout on
s390, thoroughly breaking the s390 support for hugetlbfs. This requires
replacing some more pte_xxx functions in mm/hugetlbfs.c with a
huge_pte_xxx version.
This patch introduces those huge_pte_xxx functions and their generic
implementation in asm-generic/hugetlb.h, which will now be included on
all architectures supporting hugetlbfs apart from s390. This change
will be a no-op for those architectures.
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts]
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use helper function free_highmem_page() to free highmem pages into
the buddy system.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Attilio Rao <attilio.rao@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use common help functions to free reserved pages.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generate asm-x86/cpufeature.h with posix-2008 commands instead of perl.
Signed-off-by: Rob Landley <rob@landley.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowell@redhat.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Populate the boot_params with EDD data.
- Cleanups in the IRQ code.
Bug-fixes:
- CPU hotplug offline/online in PVHVM mode.
- Re-upload processor PM data after ACPI S3 suspend/resume cycle.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRcWFDAAoJEFjIrFwIi8fJQScIAJmnsPzoqd/USahqgbOvc0Nb
fJgzvno0Lfuvi5NipMpMDNCpCxRuikoJe6ocoF6E0+poucckRGDFSC3R2KVgLR2O
pXkO7bzggkUYkLehW7gXmM4IlvhLxnsfEofDWxG2VJy6RfWxFk+84v/uimQSIB0D
jI9FB3oUhfA+IT8/3Iofyv2OiPH39zNLlzidAnVXmJ8SoJFDLt3l1IymYqVdq6Lt
AKL0IXa31f+yfj9Vli1mkBsxuEIvIo5tVQNn250B4cMlR8mcWsQBnxIeA2M+8Jhl
d2ereTBwhjohCrFR/TWlXYrMlL+XVucI7J7agf3tF4kmDUqjg+c9hUkszSOPjEc=
=yJWr
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen updates from Konrad Rzeszutek Wilk:
"Features:
- Populate the boot_params with EDD data.
- Cleanups in the IRQ code.
Bug-fixes:
- CPU hotplug offline/online in PVHVM mode.
- Re-upload processor PM data after ACPI S3 suspend/resume cycle."
And Konrad gets a gold star for sending the pull request early when he
thought he'd be away for the first week of the merge window (but because
of 3.9 dragging out to -rc8 he then re-sent the reminder on the first
day of the merge window anyway)
* tag 'stable/for-linus-3.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: resolve section mismatch warnings in xen-acpi-processor
xen: Re-upload processor PM data to hypervisor after S3 resume (v2)
xen/smp: Unifiy some of the PVs and PVHVM offline CPU path
xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one.
xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM
xen/spinlock: Check against default value of -1 for IRQ line.
xen/time: Add default value of -1 for IRQ and check for that.
xen/events: Check that IRQ value passed in is valid.
xen/time: Fix kasprintf splat when allocating timer%d IRQ line.
xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline
xen/smp: Fix leakage of timer interrupt line for every CPU online/offline.
xen kconfig: fix select INPUT_XEN_KBDDEV_FRONTEND
xen: drop tracking of IRQ vector
x86/xen: populate boot_params with EDD data
Linus found, while extending integer type extension checks in the
sparse static code checker, various fragile patterns of mixed
signed/unsigned 64-bit/32-bit integer use in perf_events_p4.c.
The relevant hardware register ABI is 64 bit wide on 32-bit
kernels as well, so clean it all up a bit, remove unnecessary
casts, and make sure we use 64-bit unsigned integers in these
places.
[ Unfortunately this patch was not tested on real P4 hardware,
those are pretty rare already. If this patch causes any
problems on P4 hardware then please holler ... ]
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130424072630.GB1780@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
on some Apple machines because they implement EFI spec 1.10, which
doesn't provide a QueryVariableInfo() runtime function and the logic
used to check for the existence of that function was insufficient.
Fix from Josh Boyer.
* The anti-bricking algorithm also introduced a compiler warning on
32-bit. Fix from Borislav Petkov.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJReOtLAAoJEC84WcCNIz1VFZgP/Aws1NdPo/RdyI6/oGkI7ZV4
+5O79pLcaJt7ESuWjx2/9pto/qTzsWMri40HZivGbgxw+ViEdprGjJUFqSTn1LyJ
QrYamP40jBdLFfh1oDHvsub8HiC72sjB/ILSoDvooHEniDmajrL6zZK7C66gP+na
Q4ZN/Jp3x3XAW0s1mVJC4VnL60489Q/ndR3SH01hr2gqMSvmjwnhfiio6n9gYvdd
egmoalTIst94+X0nW1VHA4HT3SRM7cuwCA/kDxtG6qitbsQMUKUoa+DOpMNfE8mD
QdzmzZL115O+7ORj8Ki/JNS2CSyI83IRSQ3kcM1J5026mWIBMiM3h9Vlu5NwAyFA
bapZSaYr7S5u9BU/vICGnpyYnSsLfjuB3CnAuJFyM0YVFjR6n7moUpnP1LNifGHX
E/Qr1HDyIwwxE8K0f/n86a7BfstoMjzE74an6wOVXKDUY/RnH+FdWG/HDBPd8iG4
Avei1bK2zLLcXK4Kqmx8EkXTK7VSFx6StCPjAVlpgYOAMpRmQEmNpd/3lF7Y70gp
yXIBTSTKaPZ+/5SaeOPL2sgW37Uo9fFMphww2mLXGIdgO3L0BHD5hIq9pZQ7g0VK
noDN7f6ViCuNYuZIrTAtLo9Oc+KKgqOXa0TovUhORkJ8Gk93moL4fgYyFVPvsYnD
rQuTRJ3pZEEHlCmyZzBl
=l/fT
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' into x86/urgent
* The EFI variable anti-bricking algorithm merged in -rc8 broke booting
on some Apple machines because they implement EFI spec 1.10, which
doesn't provide a QueryVariableInfo() runtime function and the logic
used to check for the existence of that function was insufficient.
Fix from Josh Boyer.
* The anti-bricking algorithm also introduced a compiler warning on
32-bit. Fix from Borislav Petkov.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
arch/x86/kernel/setup.c includes <asm/dmi.h> but it doesn't look
like it needs it, <linux/dmi.h> is sufficient.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/1366881845.4186.65.camel@chaos.site
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
We need to check the runtime sys_table for the EFI version the firmware
specifies instead of just checking for a NULL QueryVariableInfo. Older
implementations of EFI don't have QueryVariableInfo but the runtime is
a smaller structure, so the pointer to it may be pointing off into garbage.
This is apparently the case with several Apple firmwares that support EFI
1.10, and the current check causes them to no longer boot. Fix based on
a suggestion from Matthew Garrett.
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Fix this:
arch/x86/boot/compressed/eboot.c: In function ‘setup_efi_vars’:
arch/x86/boot/compressed/eboot.c:269:2: warning: passing argument 1 of ‘efi_call_phys’ makes pointer from integer without a cast [enabled by default]
In file included from arch/x86/boot/compressed/eboot.c:12:0:
/w/kernel/linux/arch/x86/include/asm/efi.h:8:33: note: expected ‘void *’ but argument is of type ‘long unsigned int’
after cc5a080c5d ("efi: Pass boot services variable info to runtime
code").
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Borislav Petkov reported a lockdep splat warning about kzalloc()
done in an IPI (hardirq) handler.
This is a real bug, do not call kzalloc() in a smp_call_function_single()
handler because it can schedule and crash.
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <eranian@google.com>
Cc: <a.p.zijlstra@chello.nl>
Cc: <acme@ghostprotocols.net>
Cc: <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20130421180627.GA21049@jshin-Toonie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Support for NB counters, MSRs 0xc0010240 ~ 0xc0010247, got
moved to perf_event_amd_uncore.c in the following commit:
c43ca5091a perf/x86/amd: Add support for AMD NB and L2I "uncore" counters
AMD Family 10h NB events (events 0xe0 ~ 0xff, on MSRs 0xc001000 ~
0xc001007) will still continue to be handled by perf_event_amd.c
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/1366046483-1765-2-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
check_hw_exists() has a number of checks which go to two exit
paths: msr_fail and bios_fail. Checks classified as msr_fail
will cause check_hw_exists() to return false, causing the PMU
not to be used; bios_fail checks will only cause a warning to be
printed, but will return true.
The problem is that if there are both msr failures and bios
failures, and the routine hits a bios_fail check first, it will
exit early and return true, not finishing the rest of the msr
checks. If those msrs are in fact broken, it will cause them to
be used erroneously.
In the case of a Xen PV VM, the guest OS has read access to all
the MSRs, but write access is white-listed to supported
features. Writes to unsupported MSRs have no effect. The PMU
MSRs are not (typically) supported, because they are expensive
to save and restore on a VM context switch. One of the
"msr_fail" checks is supposed to detect this circumstance (ether
for Xen or KVM) and disable the harware PMU.
However, on one of my AMD boxen, there is (apparently) a broken
BIOS which triggers one of the bios_fail checks. In particular,
MSR_K7_EVNTSEL0 has the ARCH_PERFMON_EVENTSEL_ENABLE bit set.
The guest kernel detects this because it has read access to all
MSRs, and causes it to skip the rest of the checks and try to
use the non-existent hardware PMU. This minimally causes a lot
of useless instruction emulation and Xen console spam; it may
cause other issues with the watchdog as well.
This changset causes check_hw_exists() to go through all of the
msr checks, failing and returning false if any of them fail.
This makes sure that a guest running under Xen without a virtual
PMU will detect that there is no functioning PMU and not attempt
to use it.
This problem affects kernels as far back as 3.2, and should thus
be considered for backport.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Link: http://lkml.kernel.org/r/1365000388-32448-1-git-send-email-george.dunlap@eu.citrix.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for AMD Family 15h [and above] northbridge
performance counters. MSRs 0xc0010240 ~ 0xc0010247 are shared
across all cores that share a common northbridge.
Add support for AMD Family 16h L2 performance counters. MSRs
0xc0010230 ~ 0xc0010237 are shared across all cores that share a
common L2 cache.
We do not enable counter overflow interrupts. Sampling mode and
per-thread events are not supported.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130419213428.GA8229@jshin-Toonie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The existing code assumes all Cbox and PCU events are using
filter, but actually the filter is event specific. Furthermore
the filter is sub-divided into multiple fields which are used
by different events.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1366113067-3262-3-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Stephane Eranian <eranian@google.com>