While writing a new tracer, I had a bug where I caused the ring-buffer
to recurse in a bad way. The bug was with the tracer I was writing
and not the ring-buffer itself. But it took a long time to find the
problem.
This patch adds paranoid checks into the ring-buffer infrastructure
that will catch bugs of this nature.
Note: I put the bug back in the tracer and this patch showed the error
nicely and prevented the lockup.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: ia64+tracing build fix
When a function is kprobed, the return address is set to the
kprobe_trampoline, or something similar. This caused the output
of the trace to look confusing when the parent seemed to be this
"kprobe_trampoline" function.
To fix this, Abhishek Sagar added a test of the instruction pointer
of the parent to see if it matched the kprobe_trampoline. If it
did, the output would print a "[unknown/kretprobe'd]" instead.
Unfortunately, not all archs do this the same way, and the trampoline
function may not be exported, which causes failures in builds.
This patch will compare the name instead of the pointer to see
if it matches. This prevents us from depending on a function from
being exported, and should work on all archs. The worst that can
happen is that an arch might use a different name and then we
go back to the confusing output. At least the arch will still build.
Reported-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Acked-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Impact: build fix on !stacktrace architectures
only select STACKTRACE on architectures that have STACKTRACE_SUPPORT
... since we also need to ifdef out the guts of ftrace_trace_stack().
We also want to disallow setting TRACE_ITER_STACKTRACE in trace_flags
on such configs, but that can wait.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Insufficient dependency - we really want CONFIG_RTC_CLASS=y there.
That will give us CONFIG_RTC_LIB=y, so the old dependency can be
simply replaced.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This one apparently doesn't generate any warnings, because the function
is only used during system bootup, when the warnings are disabled. But
it's still very wrong.
The __reserve_region_with_split() function is called with the
resource_lock held for writing, so it must only ever do GFP_ATOMIC
allocations.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: remove sched-design.txt from 00-INDEX
sched: change sched_debug's mode to 0444
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ftrace: handle archs that do not support irqs_disabled_flags
Impact: build fix on non-lockdep architectures
Some architectures do not support a way to read the irq flags that
is set from "local_irq_save(flags)" to determine if interrupts were
disabled or enabled. Ftrace uses this information to display to the user
if the trace occurred with interrupts enabled or disabled.
Besides the fact that those archs that do not support this will fail to
compile, unless they fix it, we do not want to have the trace simply
say interrupts were not disabled or they were enabled, without knowing
the real answer.
This patch adds a 'X' in the output to let the user know that the
architecture they are running on does not support a way for the tracer
to determine if interrupts were enabled or disabled. It also lets those
same archs compile with tracing enabled.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ftrace: fix trace_nop config select
ftrace: perform an initialization for ftrace to enable it
Currently "kill <sig> -1" kills processes in all namespaces and breaks the
isolation of namespaces. Earlier attempt to fix this was discussed at:
http://lkml.org/lkml/2008/7/23/148
As suggested by Oleg Nesterov in that thread, use "task_pid_vnr() > 1"
check since task_pid_vnr() returns 0 if process is outside the caller's
namespace.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Daniel Hokka Zakrisson <daniel@hozac.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
profile_init() calls in to alloc_bootmem() on early initialization. While
alloc_bootmem() is __init, the reference itself is safe in that it is
tucked below a !slab_is_available() check. So, flag profile_init() as
__ref.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just call unfreeze_cgroup() if goal_state == THAWED, and call
try_to_freeze_cgroup() if goal_state == FROZEN.
No behavior has been changed.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't duplicate the implementation of thaw_process().
[akpm@linux-foundation.org: make __thaw_process() static]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is sufficient to check if @task is frozen, and no need to check if the
original freezer is frozen.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The BUG_ON() should be protected by freezer->lock, otherwise it can be
triggered easily when a task has been unfreezed but the corresponding
cgroup hasn't been changed to FROZEN state.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: change /proc/sched/debug from rw-r--r-- to r--r--r--
/proc/sched_debug is read-only.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix on non-function-tracing architectures
The trace_nop is the tracer that is defined when no tracer is set in
the ftrace infrastructure.
The trace_nop was mistakenly selected by HAVE_FTRACE due to the confusion
between ftrace infrastructure and the ftrace function tracer (which has
been solved by renaming the function tracer).
This patch changes the select to the approriate TRACING.
This patch should fix compile errors on architectures that do not define
the FUNCTION_TRACER.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: avoid false-positive WARN_ON()
Andi Kleen reported:
> When running x86info on a 2.6.27-git8 system I get
>
> resource map sanity check conflict: 0x9e000 0x9efff 0x10000 0x9e7ff System RAM
> ------------[ cut here ]------------
> WARNING: at /home/lsrc/linux/arch/x86/mm/ioremap.c:226 __ioremap_caller+0xf2/0x2d6()
> ...
Some of the pages below the 1MB ISA addresses will be shared typically by both
BIOS and system usable RAM. For example:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
x86info reads the low physical address using /dev/mem, which internally
uses ioremap() for accessing non RAM pages. ioremap() of such low
pages conflicts with multiple resource entities leading to the
above warning.
Change the iomem_map_sanity_check() to allow mapping a page spanning multiple
resource entities (minimum granularity that one can map is a page anyhow).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: corrects a bug which made the non-dyn function tracer not functional
With latest git, the non-dynamic function tracer didn't get any trace.
The problem was the fact that ftrace_enabled wasn't initialized to 1
because ftrace hasn't any init function when DYNAMIC_FTRACE is disabled.
So when a tracer tries to register an ftrace_ops struct,
__register_ftrace_function failed to set the hook.
This patch corrects it by setting an init function to initialize
ftrace during the boot.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
ftrace: fix current_tracer error return
tracing: fix a build error on alpha
ftrace: use a real variable for ftrace_nop in x86
tracing/ftrace: make boot tracer select the sched_switch tracer
tracepoint: check if the probe has been registered
asm-generic: define DIE_OOPS in asm-generic
trace: fix printk warning for u64
ftrace: warning in kernel/trace/ftrace.c
ftrace: fix build failure
ftrace, powerpc, sparc64, x86: remove notrace from arch ftrace file
ftrace: remove ftrace hash
ftrace: remove mcount set
ftrace: remove daemon
ftrace: disable dynamic ftrace for all archs that use daemon
ftrace: add ftrace warn on to disable ftrace
ftrace: only have ftrace_kill atomic
ftrace: use probe_kernel
ftrace: comment arch ftrace code
ftrace: return error on failed modified text.
ftrace: dynamic ftrace process only text section
...
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: fix irqs on/off ip tracing
lockdep: minor fix for debug_show_all_locks()
x86: restore the old swiotlb alloc_coherent behavior
x86: use GFP_DMA for 24bit coherent_dma_mask
swiotlb: remove panic for alloc_coherent failure
xen: compilation fix of drivers/xen/events.c on IA64
xen: portability clean up and some minor clean up for xencomm.c
xen: don't reload cr3 on suspend
kernel/resource: fix reserve_region_with_split() section mismatch
printk: remove unused code from kernel/printk.c
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: fix documentation reference for sched_min_granularity_ns
sched: virtual time buddy preemption
sched: re-instate vruntime based wakeup preemption
sched: weaken sync hint
sched: more accurate min_vruntime accounting
sched: fix a find_busiest_group buglet
sched: add CONFIG_SMP consistency
The commit (in linux-tip) c2931e05ec
( ftrace: return an error when setting a nonexistent tracer )
added useful code that would error when a bad tracer was written into
the current_tracer file.
But this had a bug if the amount written was more than the amount read by
that code. The first iteration would set the tracer correctly, but since
it did not consume the rest of what was written (usually whitespace), the
userspace utility would continue to write what was not consumed. This
second iteration would fail to find a tracer and return -EINVAL. Funny
thing is that the tracer would have already been set.
This patch just consumes all the data that is written to the file.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix lockdep lock-api-caller output when irqsoff tracing is enabled
81d68a96 "ftrace: trace irq disabled critical timings" added wrappers around
trace_hardirqs_on/off_caller. However these functions use
__builtin_return_address(0) to figure out which function actually disabled
or enabled irqs. The result is that we save the ips of trace_hardirqs_on/off
instead of the real caller. Not very helpful.
However since the patch from Steven the ip already gets passed. So use that
and get rid of __builtin_return_address(0) in these two functions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we failed to get tasklist_lock eventually (count equals 0),
we should only print " ignoring it.\n", and not print
" locked it.\n" needlessly.
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix on Alpha
When tracing is enabled, some arch have included <linux/irqflags.h>
on their <asm/system.h> but others like alpha or m68k don't.
Build error on alpha:
kernel/trace/trace.c: In function 'tracing_cpumask_write':
kernel/trace/trace.c:2145: error: implicit declaration of function 'raw_local_irq_disable'
kernel/trace/trace.c:2162: error: implicit declaration of function 'raw_local_irq_enable'
Tested on Alpha through a cross-compiler (should correct a similar issue on m68k).
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
If the boot tracer is selected but not the sched_switch,
there will be a build failure:
kernel/built-in.o: In function `boot_trace_init':
trace_boot.c:(.text+0x5ee38): undefined reference to `sched_switch_trace'
kernel/built-in.o: In function `disable_boot_trace':
(.text+0x5eee1): undefined reference to `tracing_stop_cmdline_record'
kernel/built-in.o: In function `enable_boot_trace':
(.text+0x5ef11): undefined reference to `tracing_start_cmdline_record'
This patch fixes it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix kernel crash that can trigger during tracing
If we try to remove a probe that has not been already registered,
the tracepoint_entry_remove_probe() function will dereference a NULL
pointer.
Check the probe before removing it to avoid crashes.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A powerpc ppc64_defconfig build produces these warnings:
kernel/trace/ring_buffer.c: In function 'rb_add_time_stamp':
kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u64'
kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
kernel/trace/ring_buffer.c:969: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'u64'
Just cast the u64s to unsigned long long like we do everywhere else.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
/scratch/sfr/next/kernel/cgroup.c: In function 'cgroup_tasks_start':
/scratch/sfr/next/kernel/cgroup.c:2107: warning: unused variable 'i'
Introduced in commit cc31edceee "cgroups:
convert tasks file to use a seq_file with shared pid array".
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit a802dd0eb5 by moving
the call to init_workqueues() back where it belongs - after SMP has been
initialized.
It also moves stop_machine_init() - which needs workqueues - to a later
phase using a core_initcall() instead of early_initcall(). That should
satisfy all ordering requirements, and was apparently the reason why
init_workqueues() was moved to be too early.
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
this warning:
kernel/trace/ftrace.c:189: warning: ‘frozen_record_count’ defined but not used
triggers because frozen_record_count is only used in the KCONFIG_MARKERS
case. Move the variable it there.
Alas, this frozen-record facility seems to have little use. The
frozen_record_count variable is not used by anything, nor the flags.
So this section might need a bit of dead-code-removal care as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since we moved wakeup preemption back to virtual time, it makes sense to move
the buddy stuff back as well. The purpose of the buddy scheduling is to allow
a quickly scheduling pair of tasks to run away from the group as far as a
regular busy task would be allowed under wakeup preemption.
This has the advantage that the pair can ping-pong for a while, enjoying
cache-hotness. Without buddy scheduling other tasks would interleave destroying
the cache.
Also, it saves a word in cfs_rq.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The advantage is that vruntime based wakeup preemption has a better
conceptual model. Here wakeup_gran = 0 means: preempt when 'fair'.
Therefore wakeup_gran is the granularity of unfairness we allow in order
to make progress.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mysql+oltp and pgsql+oltp peaks are still shifted right. The below puts
the peaks back to 1 client/server pair per core.
Use the avg_overlap information to weaken the sync hint.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike noticed the current min_vruntime tracking can go wrong and skip the
current task. If the only remaining task in the tree is a nice 19 task
with huge vruntime, new tasks will be inserted too far to the right too,
causing some interactibity issues.
min_vruntime can only change due to the leftmost entry disappearing
(dequeue_entity()), or by the leftmost entry being incremented past the
next entry, which elects a new leftmost (__update_curr())
Due to the current entry not being part of the actual tree, we have to
compare the leftmost tree entry with the current entry, and take the
leftmost of these two.
So create a update_min_vruntime() function that takes computes the
leftmost vruntime in the system (either tree of current) and increases
the cfs_rq->min_vruntime if the computed value is larger than the
previously found min_vruntime. And call this from the two sites we've
identified that can change min_vruntime.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In one of the group load balancer patches:
commit 408ed066b1
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jun 27 13:41:28 2008 +0200
Subject: sched: hierarchical load vs find_busiest_group
The following change:
- if (max_load - this_load + SCHED_LOAD_SCALE_FUZZ >=
+ if (max_load - this_load + 2*busiest_load_per_task >=
busiest_load_per_task * imbn) {
made the condition always true, because imbn is [1,2].
Therefore, remove the 2*, and give the it a fair chance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, small kernel text size reduction, no functionality changed
reserve_region_with_split() calls in to __reserve_region_with_split(),
which is an __init function. The only caller of reserve_region_with_split()
is an __init function, so make it __init too.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move free_module_param_attrs() into the CONFIG_MODULES section, since
it's only used inside there. Thus avoiding the warning
kernel/params.c:514: warning: 'free_module_param_attrs' defined but not used
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
hrtimers: add missing docbook comments to struct hrtimer
hrtimers: simplify hrtimer_peek_ahead_timers()
hrtimers: fix docbook comments
DECLARE_PER_CPU needs linux/percpu.h
hrtimers: fix typo
rangetimers: fix the bug reported by Ingo for real
rangetimer: fix BUG_ON reported by Ingo
rangetimer: fix x86 build failure for the !HRTIMERS case
select: fix alpha OSF wrapper
select: fix alpha OSF wrapper
hrtimer: peek at the timer queue just before going idle
hrtimer: make the futex() system call use the per process slack value
hrtimer: make the nanosleep() syscall use the per process slack
hrtimer: fix signed/unsigned bug in slack estimator
hrtimer: show the timer ranges in /proc/timer_list
hrtimer: incorporate feedback from Peter Zijlstra
hrtimer: add a hrtimer_start_range() function
hrtimer: another build fix
hrtimer: fix build bug found by Ingo
hrtimer: make select() and poll() use the hrtimer range feature
...
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
[PATCH] kill the rest of struct file propagation in block ioctls
[PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
[PATCH] get rid of blkdev_locked_ioctl()
[PATCH] get rid of blkdev_driver_ioctl()
[PATCH] sanitize blkdev_get() and friends
[PATCH] remember mode of reiserfs journal
[PATCH] propagate mode through swsusp_close()
[PATCH] propagate mode through open_bdev_excl/close_bdev_excl
[PATCH] pass fmode_t to blkdev_put()
[PATCH] kill the unused bsize on the send side of /dev/loop
[PATCH] trim file propagation in block/compat_ioctl.c
[PATCH] end of methods switch: remove the old ones
[PATCH] switch sr
[PATCH] switch sd
[PATCH] switch ide-scsi
[PATCH] switch tape_block
[PATCH] switch dcssblk
[PATCH] switch dasd
[PATCH] switch mtd_blkdevs
[PATCH] switch mmc
...
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
stop_machine: fix error code handling on multiple cpus
stop_machine: use workqueues instead of kernel threads
workqueue: introduce create_rt_workqueue
Call init_workqueues before pre smp initcalls.
Make panic= and panic_on_oops into core_params
Make initcall_debug a core_param
core_param() for genuinely core kernel parameters
param: Fix duplicate module prefixes
module: check kernel param length at compile time, not runtime
Remove stop_machine during module load v2
module: simplify load_module.