Impact: fix compile warning and clean up
When I first wrote __tracing_open, instead of passing the error
code via the ERR_PTR macros, I lazily used a separate parameter
to hold the return for errors.
When Frederic Weisbecker updated that function, he used the Linux
kernel ERR_PTR for the returns. This caused the parameter return
to possibly not be initialized on error. gcc correctly pointed this
out with a warning.
This patch converts the entire function to use the Linux kernel
ERR_PTR macro methods.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix to possible race conditions
There's some uses of current_tracer that is not protected by the
trace_types_lock. There is a small chance that a sysadmin changes
the tracer while the current_tracer is being referenced.
If the race is hit, it is unlikely to cause any harm since the
tracers are constant and are not freed. But some strang side
effects may occur.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds the tracer dependent options dynamically to the
options directory when the tracer is activated. These options are
removed when the tracer is deactivated.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch creates an options directory in the debugfs, that contains
the available tracing options. These files contain 1 or 0, where 1
is the option is enabled and 0 it is disabled.
Simply echoing in 1 will enable the option and 0 will disable it.
This patch only contains the core options, not the tracer options.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Allow CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures to still specify
that their sched_clock() implementation is reliable.
This will be used by x86 to switch on a faster sched_clock_cpu()
implementation on certain CPU types.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: implement new tracing timestamp APIs
Add three trace clock variants, with differing scalability/precision
tradeoffs:
- local: CPU-local trace clock
- medium: scalable global clock with some jitter
- global: globally monotonic, serialized clock
Make the ring-buffer use the local trace clock internally.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
make sure we dont execute more complex sched_clock() code in NMI context.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add new tracepoints
Add them to the generic IRQ code, that way every architecture
gets these new tracepoints, not just x86.
Using Steve's new 'TRACE_FORMAT', I can get function graph
trace as follows using the original two IRQ tracepoints:
3) | handle_IRQ_event() {
3) | /* (irq_handler_entry) irq=28 handler=eth0 */
3) | e1000_intr_msi() {
3) 2.460 us | __napi_schedule();
3) 9.416 us | }
3) | /* (irq_handler_exit) irq=28 handler=eth0 return=handled */
3) + 22.935 us | }
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix incorrect condition check
No need to start rt bandwidth timer when rt bandwidth is disabled.
If this timer starts, it may stop at sched_rt_period_timer() on the first time.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: restructure the VFS layout of per CPU trace buffers
The per cpu trace files are all in a single directory:
/debug/tracing/per_cpu. In case of a large number of cpu, the
content of this directory becomes messy so we create now one
directory per cpu inside /debug/tracing/per_cpu which contain
each their own trace_pipe and trace files.
Ie:
/debug/tracing$ ls -R per_cpu
per_cpu:
cpu0 cpu1
per_cpu/cpu0:
trace trace_pipe
per_cpu/cpu1:
trace trace_pipe
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes a bug located by Vegard Nossum with the aid of
kmemcheck, updated based on review comments from Nick Piggin,
Ingo Molnar, and Andrew Morton. And cleans up the variable-name
and function-name language. ;-)
The boot CPU runs in the context of its idle thread during boot-up.
During this time, idle_cpu(0) will always return nonzero, which will
fool Classic and Hierarchical RCU into deciding that a large chunk of
the boot-up sequence is a big long quiescent state. This in turn causes
RCU to prematurely end grace periods during this time.
This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
function to ignore the idle task as a quiescent state until the
system has started up the scheduler in rest_init(), introducing a
new non-API function rcu_idle_now_means_idle() to inform RCU of this
transition. RCU maintains an internal rcu_idle_cpu_truthful variable
to track this state, which is then used by rcu_check_callback() to
determine if it should believe idle_cpu().
Because this patch has the effect of disallowing RCU grace periods
during long stretches of the boot-up sequence, this patch also introduces
Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
no-op if num_online_cpus() returns 1. This allows boot-time code that
calls synchronize_rcu() to proceed normally. Note, however, that RCU
callbacks registered by call_rcu() will likely queue up until later in
the boot sequence. Although rcuclassic and rcutree can also use this
same optimization after boot completes, rcupreempt must restrict its
use of this optimization to the portion of the boot sequence before the
scheduler starts up, given that an rcupreempt RCU read-side critical
section may be preeempted.
In addition, this patch takes Nick Piggin's suggestion to make the
system_state global variable be __read_mostly.
Changes since v4:
o Changes the name of the introduced function and variable to
be less emotional. ;-)
Changes since v3:
o WARN_ON(nr_context_switches() > 0) to verify that RCU
switches out of boot-time mode before the first context
switch, as suggested by Nick Piggin.
Changes since v2:
o Created rcu_blocking_is_gp() internal-to-RCU API that
determines whether a call to synchronize_rcu() is itself
a grace period.
o The definition of rcu_blocking_is_gp() for rcuclassic and
rcutree checks to see if but a single CPU is online.
o The definition of rcu_blocking_is_gp() for rcupreempt
checks to see both if but a single CPU is online and if
the system is still in early boot.
This allows rcupreempt to again work correctly if running
on a single CPU after booting is complete.
o Added check to rcupreempt's synchronize_sched() for there
being but one online CPU.
Tested all three variants both SMP and !SMP, booted fine, passed a short
rcutorture test on both x86 and Power.
Located-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's been a bit confusion to whether DEFINE/DECLARE_TRACE_FMT should
be a DEFINE or a DECLARE. Ingo Molnar suggested simply calling it
TRACE_FORMAT.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Now that several per-cpu files can be read or spliced at the
same, we want the read/splice callbacks for tracing files to be
reentrants.
Until now, a single global mutex (trace_types_lock) serialized
the access to tracing_read_pipe(), tracing_splice_read_pipe(),
and the seq helpers.
Ie: it means that if a user tries to read trace_pipe0 and
trace_pipe1 at the same time, the access to the function
tracing_read_pipe() is contended and one reader must wait for
the other to finish its read call.
The trace_type_lock mutex is mostly here to serialize the access
to the global current tracer (current_trace), which can be
changed concurrently. Although the iter struct keeps a private
pointer to this tracer, its callbacks can be changed by another
function.
The method used here is to not keep anymore private reference to
the tracer inside the iterator but to make a copy of it inside
the iterator. Then it checks on subsequents read calls if the
tracer has changed. This is not costly because the current
tracer is not expected to be changed often, so we use a branch
prediction for that.
Moreover, we add a private mutex to the iterator (there is one
iterator per file descriptor) to serialize the accesses in case
of multiple consumers per file descriptor (which would be a
silly idea from the user). Note that this is not to protect the
ring buffer, since the ring buffer already serializes the
readers accesses. This is to prevent from traces weirdness in
case of concurrent consumers. But these mutexes can be dropped
anyway, that would not result in any crash. Just tell me what
you think about it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: split up tracing output per cpu
Currently, on the tracing debugfs directory, three files are
available to the user to let him extracting the trace output:
- trace is an iterator through the ring-buffer. It's a reader
but not a consumer It doesn't block when no more traces are
available.
- trace pretty similar to the former, except that it adds more
informations such as prempt count, irq flag, ...
- trace_pipe is a reader and a consumer, it will also block
waiting for traces if necessary (heh, yes it's a pipe).
The traces coming from different cpus are curretly mixed up
inside these files. Sometimes it messes up the informations,
sometimes it's useful, depending on what does the tracer
capture.
The tracing_cpumask file is useful to filter the output and
select only the traces captured a custom defined set of cpus.
But still it is not enough powerful to extract at the same time
one trace buffer per cpu.
So this patch creates a new directory: /debug/tracing/per_cpu/.
Inside this directory, you will now find one trace_pipe file and
one trace file per cpu.
Which means if you have two cpus, you will have:
trace0
trace1
trace_pipe0
trace_pipe1
And of course, reading these files will have the same effect
than with the usual tracing files, except that you will only see
the traces from the given cpu.
The original all-in-one cpu trace file are still available on
their original place.
Until now, only one consumer was allowed on trace_pipe to avoid
racy consuming on the ring-buffer. Now the approach changed a
bit, you can have only one consumer per cpu.
Which means you are allowed to read concurrently trace_pipe0 and
trace_pipe1 But you can't have two readers on trace_pipe0 or
trace_pipe1.
Following the same logic, if there is one reader on the common
trace_pipe, you can not have at the same time another reader on
trace_pipe0 or in trace_pipe1. Because in trace_pipe is already
a consumer in all cpu buffers in essence.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: remove old debug/tracing API
/debug/tracing/latency_trace is an old legacy format we kept from
the old latency tracer. Remove the file for now. If there's any
useful bit missing then we'll propagate any useful output bits into
the /debug/tracing/trace output.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix CPU hotplug lockup
bts_hotcpu_handler() is called with irqs disabled, so using mutex_lock()
is a no-no.
All the BTS codepaths here are atomic (they do not schedule), so using
a spinlock is the right solution.
Cc: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the directory /debug/tracing/events/ that will contain
all the registered trace points.
# ls /debug/tracing/events/
sched_kthread_stop sched_process_fork sched_switch
sched_kthread_stop_ret sched_process_free sched_wait_task
sched_migrate_task sched_process_wait sched_wakeup
sched_process_exit sched_signal_send sched_wakeup_new
# ls /debug/tracing/events/sched_switch/
enable
# cat /debug/tracing/events/sched_switch/enable
1
# cat /debug/tracing/set_event
sched_switch
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch changes the trace/sched.h to use the DECLARE_TRACE_FMT
such that they are automatically registered with the event tracer.
And it also adds the tracing sched headers to kernel/trace/events.c
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch creates the event tracing infrastructure of ftrace.
It will create the files:
/debug/tracing/available_events
/debug/tracing/set_event
The available_events will list the trace points that have been
registered with the event tracer.
set_events will allow the user to enable or disable an event hook.
example:
# echo sched_wakeup > /debug/tracing/set_event
Will enable the sched_wakeup event (if it is registered).
# echo "!sched_wakeup" >> /debug/tracing/set_event
Will disable the sched_wakeup event (and only that event).
# echo > /debug/tracing/set_event
Will disable all events (notice the '>')
# cat /debug/tracing/available_events > /debug/tracing/set_event
Will enable all registered event hooks.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Fix an invalid memory reference problem when cpu hotplug support is
disabled and the hw-branch-tracer is set as current tracer.
Initializing the tracer calls bts_trace_init() which has already
been freed at this time.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the sysdev_suspend/resume from the callee to the callers, with
no real change in semantics, so that we can rework the disabling of
interrupts during suspend/hibernation.
This is based on an earlier patch from Linus.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hibernate:
PM: Fix suspend_console and resume_console to use only one semaphore
PM: Wait for console in resume
PM: Fix pm_notifiers during user mode hibernation
swsusp: clean up shrink_all_zones()
swsusp: dont fiddle with swappiness
PM: fix build for CONFIG_PM unset
PM/hibernate: fix "swap breaks after hibernation failures"
PM/resume: wait for device probing to finish
Consolidate driver_probe_done() loops into one place
This fixes a race where a thread acquires the console while the
console is suspended, and the console is resumed before this
thread releases it. In this case, the secondary console
semaphore would be left locked, and the primary semaphore would
be released twice. This in turn would cause the console switch
on suspend or resume to hang forever.
Note that suspend_console does not actually lock the console
for clients that use acquire_console_sem, it only locks it for
clients that use try_acquire_console_sem. If we change
suspend_console to fully lock the console, then the kernel
may deadlock on suspend. One client of try_acquire_console_sem
is acquire_console_semaphore_for_printk, which uses it to
prevent printk from using the console while it is suspended.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoids later waking up to a blinking cursor if the device woke up and
returned to sleep before the console switch happened.
Signed-off-by: Brian Swetland <swetland@google.com>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Snapshot device is opened with O_RDONLY during suspend and O_WRONLY durig
resume. Make sure we also call notifiers with correct parameter telling
them what we are really doing.
Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken
config dependncies. Fix that.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
the resume code does not currently wait for device probing to finish.
Even without async function calls this is dicey and not correct,
but with async function calls during the boot sequence this is going
to get hit more...
This patch adds the synchronization using the newly introduced helper.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: added precaution on failure detection
Break out of the modifying loop as soon as a failure is detected.
This is just an added precaution found by code review and was not
found by any bug chasing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch creates the weak functions: ftrace_arch_code_modify_prepare
and ftrace_arch_code_modify_post_process that are called before and
after the stop machine is called to modify the kernel text.
If the arch needs to do pre or post processing, it only needs to define
these functions.
[ Update: Ingo Molnar suggested using the name ftrace_arch_code_modify_*
over using ftrace_arch_modify_* ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: trace only functions matching a pattern
The set_graph_function file let one to trace only one or several
chosen functions and follow all their code flow.
Currently, only a constant function name is allowed so this patch
allows the ftrace_regex functions:
- matches all functions that end with "name":
echo *name > set_graph_function
- matches all functions that begin with "name":
echo name* > set_graph_function
- matches all functions that contains "name":
echo *name* > set_graph_function
Example:
echo mutex* > set_graph_function
0) | mutex_lock_nested() {
0) 0.563 us | __might_sleep();
0) 2.072 us | }
0) | mutex_unlock() {
0) 1.036 us | __mutex_unlock_slowpath();
0) 2.433 us | }
0) | mutex_unlock() {
0) 0.691 us | __mutex_unlock_slowpath();
0) 1.787 us | }
0) | mutex_lock_interruptible_nested() {
0) 0.548 us | __might_sleep();
0) 1.945 us | }
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new scalable dynamic percpu allocator which allows dynamic
percpu areas to be accessed the same way as static ones
Implement scalable dynamic percpu allocator which can be used for both
static and dynamic percpu areas. This will allow static and dynamic
areas to share faster direct access methods. This feature is optional
and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by
arch. Please read comment on top of mm/percpu.c for details.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Impact: cleanup
There are two allocated per-cpu accessor macros with almost identical
spelling. The original and far more popular is per_cpu_ptr (44
files), so change over the other 4 files.
tj: kill percpu_ptr() and update UP too
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: mingo@redhat.com
Cc: lenb@kernel.org
Cc: cpufreq@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: limit the number of loops the ring buffer self test can make
tracing: have function trace select kallsyms
tracing: disable tracing while testing ring buffer
tracing/function-graph-tracer: trace the idle tasks
Impact: prevent deadlock if ring buffer gets corrupted
This patch adds a paranoid check to make sure the ring buffer consumer
does not go into an infinite loop. Since the ring buffer has been set
to read only, the consumer should not loop for more than the ring buffer
size. A check is added to make sure the consumer does not loop more than
the ring buffer size.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix output of function tracer to be useful
The function tracer is pretty useless if KALLSYMS is not configured.
Unless you are good at reading hex values, the function tracer should
select the KALLSYMS configuration.
Also, the dynamic function tracer will fail its self test if KALLSYMS
is not selected.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix to prevent hard lockup on self tests
If one of the tracers are broken and is constantly filling the ring
buffer while the test of the ring buffer is running, it will hang
the box. The reason is that the test is a consumer that will not
stop till the ring buffer is empty. But if the tracer is broken and
is constantly producing input to the buffer, this test will never
end. The result is a lockup of the box.
This happened when KALLSYMS was not defined and the dynamic ftrace
test constantly filled the ring buffer, because the filter failed
and all functions were being traced. Something was being called
that constantly filled the buffer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
block: fix booting from partitioned md array
block: revert part of 18ce3751cc
cciss: PCI power management reset for kexec
paride/pg.c: xs(): &&/|| confusion
fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
block: fix bad definition of BIO_RW_SYNC
bsg: Fix sense buffer bug in SG_IO
Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken
config dependncies. Fix that.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In cgroup_kill_sb(), root is freed before sb is detached from the list, so
another sget() may find this sb and call cgroup_test_super(), which will
access the root that has been freed.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is nothing really arch specific of the push and pop functions
used by the function graph tracer. This patch moves them to generic
code.
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO
and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before
213d9417fe.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Impact: api and pipe waiting change
Currently, the waiting used in tracing_read_pipe() is done through a
100 msecs schedule_timeout() loop which periodically check if there
are traces on the buffer.
This can cause small latencies for programs which are reading the incoming
events.
This patch makes the reader waiting for the trace_wait waitqueue except
for few tracers such as the sched and functions tracers which might be
already hold the runqueue lock while waking up the reader.
This is performed through a new callback wait_pipe() on struct tracer.
If none is implemented on a specific tracer, the default waiting for
trace_wait queue is attached.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
doc: mmiotrace.txt, buffer size control change
trace: mmiotrace to the tracer menu in Kconfig
mmiotrace: count events lost due to not recording
Impact: clean up
The traceon and traceoff function probes are confusing to developers
to what happens when a counter is not specified. This should help
clear things up.
# echo "*:traceoff" > set_ftrace_filter
# cat /debug/tracing/set_ftrace_filter
#### all functions enabled ####
do_fork:traceoff:unlimited
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: cleanup
Fix incorrect hint message in code and typos in comments.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch is to fix the return value of trace_selftest_startup_sysprof
and trace_selftest_startup_branch on failure.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Pass tsk to tracing_record_cmdline instead of current.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
Ingo Molnar did not like the _hook naming convention used by the
select function tracer. Luis Claudio R. Goncalves suggested using
the "_probe" extension. This patch implements the change of
calling the functions and variables "_hook" and replacing them
with "_probe".
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Ingo Molnar pointed out some coding style issues with the recent ftrace
updates. This patch cleans them up.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds a pretty print version of traceon and traceoff
output for set_ftrace_filter.
# echo 'sys_open:traceon:4' > set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
sys_open:traceon:count=4
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds a call back for the tracers that have hooks to
selected functions. This allows the tracer to show better output
in the set_ftrace_filter file.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds output to show what functions have tracer hooks
attached to them.
# echo 'sys_open:traceon:4' > /debug/tracing/set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
sys_open:ftrace_traceon:0000000000000004
# echo 'do_fork:traceoff:' > set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
sys_open:ftrace_traceon:0000000000000002
do_fork:ftrace_traceoff:ffffffffffffffff
Note the 4 changed to a 2. This is because The code was executed twice
since the traceoff was added. If a cat is done again:
#### all functions enabled ####
sys_open:ftrace_traceon
do_fork:ftrace_traceoff:ffffffffffffffff
The number disappears. That is because it will not print a NULL.
Callbacks to allow the tracer to pretty print will be implemented soon.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds the new function selection commands traceon and
traceoff. traceon sets the function to enable the ring buffers
while traceoff disables the ring buffers. You can pass in the
number of times you want the command to be executed when the function
is hit. It will only execute if the state of the buffers are not
already in that state.
Example:
# echo do_fork:traceon:4
Will enable the ring buffers if they are disabled every time it
hits do_fork, up to 4 times.
# echo sys_close:traceoff
This will disable the ring buffers every time (unlimited) when
sys_close is called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
That is, if the tracer needs to do something special for each
function, that is being traced, and wants to give each function
its own data. The address of the entry data is passed to this
callback, so that the callback may wish to update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is give the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
have its data matching data. (note, if glob is NULL, blank or '*',
all functions will be tested).
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that has an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
Now that ftrace_lock is a mutex, there is no reason to have three
different mutexes protecting similar data. All the mutex paths
are not in hot paths, so having a mutex to cover more data is
not a problem.
This patch removes the ftrace_sysctl_lock and ftrace_start_lock
and uses the ftrace_lock to protect the locations that were protected
by these locks. By doing so, this change also removes some of
the lock nesting that was taking place.
There are still more mutexes in ftrace.c that can probably be
consolidated, but they can be dealt with later. We need to be careful
about the way the locks are nested, and by consolidating, we can cause
a recursive deadlock.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
The older versions of ftrace required doing the ftrace list
search under atomic context. Now all the calls are in non-atomic
context. There is no reason to keep the ftrace_lock as a spinlock.
This patch converts it to a mutex.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Allow for other tracers to add their own commands for function
selection. This interface gives a trace the ability to name a
command for function selection. Right now it is pretty limited
in what it offers, but this is a building step for more features.
The :mod: command is converted to this interface and also serves
as a template for other implementations.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix to prevent empty set_ftrace_filter and no ftrace output
The function filter is used to only trace a given set of functions.
The filter is enabled when a function name is echoed into the
set_ftrace_filter file. But if the name has a typo and the function
is not found, the filter is enabled, but no function is listed.
This makes a confusing situation where set_ftrace_filter is empty
but no functions ever get enabled for tracing.
For example:
# cat /debug/tracing/set_ftrace_filter
#### all functions enabled ####
# echo bad_name > set_ftrace_filter
# cat /debug/tracing/set_ftrace_filter
# echo function > current_tracer
# cat trace
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
This patch changes that to only enable filtering if a function
is set to be filtered on. Now, the filter is not enabled if
a bad name is echoed into set_ftrace_filter.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds a "command" syntax to the function filtering files:
/debugfs/tracing/set_ftrace_filter
/debugfs/tracing/set_ftrace_notrace
Of the format: <function>:<command>:<parameter>
The command is optional, and dependent on the command, so are
the parameters.
echo do_fork > set_ftrace_filter
Will only trace 'do_fork'.
echo 'sched_*' > set_ftrace_filter
Will only trace functions starting with the letters 'sched_'.
echo '*:mod:ext3' > set_ftrace_filter
Will trace only the ext3 module functions.
echo '*write*:mod:ext3' > set_ftrace_notrace
Will prevent the ext3 functions with the letters 'write' in
the name from being traced.
echo '!*_allocate:mod:ext3' > set_ftrace_filter
Will remove the functions in ext3 that end with the letters
'_allocate' from the ftrace filter.
Although this patch implements the 'command' format, only the
'mod' command is supported. More commands to follow.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
ftrace_match_records does a lot of things that other features
can use. This patch breaks up ftrace_match_records and pulls
out ftrace_setup_glob and ftrace_match_record.
ftrace_setup_glob prepares a simple glob expression for use with
ftrace_match_record. ftrace_match_record compares a single record
with a glob type.
Breaking this up will allow for more features to run on individual
records.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
ftrace_match is too generic of a name. What it really does is
search all records and matches the records with the given string,
and either sets or unsets the functions to be traced depending
on if the parameter 'enable' is set or not.
This allows us to make another function called ftrace_match that
can be used to test a single record.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
To iterate over all the functions that dynamic trace knows about
it requires two for loops. One to iterate over the pages and the
other to iterate over the records within the page.
There are several duplications of these loops in ftrace.c. This
patch creates the macros do_for_each_ftrace_rec and
while_for_each_ftrace_rec to handle this logic, and removes the
duplicate code.
While making this change, I also discovered and fixed a small
bug that one of the iterations should exit the loop after it found the
record it was searching for. This used a break when it should have
used a goto, since there were two loops it needed to break out
from. No real harm was done by this bug since it would only continue
to search the other records, and the code was in a slow path anyway.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up, make set_ftrace_filter less confusing
The set_ftrace_filter shows only the functions that will be traced.
But when it is empty, it will trace all functions. This can be a bit
confusing.
This patch makes set_ftrace_filter show:
#### all functions enabled ####
When all functions will be traced, and we do not filter only a select
few.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
The function bts_trace_init() references a variable
bts_hotcpu_notifier which is marked
as __cpuinitdata. Thus causes section mismatch. This patch fixes it.
LD kernel/trace/built-in.o
WARNING: kernel/trace/built-in.o(.text+0xc90c): Section mismatch in
reference from the function bts_trace_init() to the variable
.cpuinit.data:bts_hotcpu_notifier
The function bts_trace_init() references
the variable __cpuinitdata bts_hotcpu_notifier.
This is often because bts_trace_init lacks a __cpuinitdata
annotation or the annotation of bts_hotcpu_notifier is wrong.
WARNING: kernel/trace/built-in.o(.text+0xc92a): Section mismatch in
reference from the function bts_trace_reset() to the variable
.cpuinit.data:bts_hotcpu_notifier
The function bts_trace_reset() references
the variable __cpuinitdata bts_hotcpu_notifier.
This is often because bts_trace_reset lacks a __cpuinitdata
annotation or the annotation of bts_hotcpu_notifier is wrong.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: markus.t.metzger@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cosmetic change in Kconfig menu layout
This patch was originally suggested by Peter Zijlstra, but seems it
was forgotten.
CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
directly under the Kernel hacking / debugging menu in the kernel
configuration system. They were present only for x86 and x86_64.
Other tracers that use the ftrace tracing framework are in their own
sub-menu. This patch moves the mmiotrace configuration options there.
Since the Kconfig file, where the tracer menu is, is not architecture
specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
x86/x86_64. CONFIG_MMIOTRACE now depends on it.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enhances lost events counting in mmiotrace
The tracing framework, or the ring buffer facility it uses, has a switch
to stop recording data. When recording is off, the trace events will be
lost. The framework does not count these, so mmiotrace has to count them
itself.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The __GFP_FS annotations fail to build with CONFIG_LOCKDEP=y,
CONFIG_PROVE_LOCKING=n, ammend that.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arnd pointed out we have the stringify macro magic already in-kernel.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
re-add some of the comments that got lost in the refactoring.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that we have nice numerical relations for the states, remove the macro
magics.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now what its only two functions, they again look rather similar.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Generate the state bit definitions from the lockdep_states.h file.
Also, move LOCK_USED to last, so that the
USED_IN
USED_IN_READ
ENABLED
ENABLED_READ
states are nicely bit aligned -- we're going to use that property
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce a header file to generate all the states from.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
s/HELD_OVER/ENABLED/g
so that its similar to the hard and soft-irq names.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
s/\(LOCKF\?_ENABLED_[^ ]*\)S\(_READ\)\?\>/\1\2/g
So that the USED_IN and ENABLED have the same names.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Here is another version, with the incremental patch rolled up, and
added reclaim context annotation to kswapd, and allocation tracing
to slab allocators (which may only ever reach the page allocator
in rare cases, so it is good to put annotations here too).
Haven't tested this version as such, but it should be getting closer
to merge worthy ;)
--
After noticing some code in mm/filemap.c accidentally perform a __GFP_FS
allocation when it should not have been, I thought it might be a good idea to
try to catch this kind of thing with lockdep.
I coded up a little idea that seems to work. Unfortunately the system has to
actually be in __GFP_FS page reclaim, then take the lock, before it will mark
it. But at least that might still be some orders of magnitude more common
(and more debuggable) than an actual deadlock condition, so we have some
improvement I hope (the concept is no less complete than discovery of a lock's
interrupt contexts).
I guess we could even do the same thing with __GFP_IO (normal reclaim), and
even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
code paths, so let's start there and see how it goes.
It *seems* to work. I did a quick test.
=================================
[ INFO: inconsistent lock state ]
2.6.28-rc6-00007-ged31348-dirty #26
---------------------------------
inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
(testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
{in-reclaim-W} state was registered at:
[<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
[<ffffffff80268f71>] lock_acquire+0x91/0xc0
[<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
[<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
[<ffffffff8020903b>] _stext+0x3b/0x170
[<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
[<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 3929
hardirqs last enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
softirqs last enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
other info that might help us debug this:
1 lock held by modprobe/8526:
#0: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
stack backtrace:
Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26
Call Trace:
[<ffffffff80265483>] print_usage_bug+0x193/0x1d0
[<ffffffff80266530>] mark_lock+0xaf0/0xca0
[<ffffffff80266735>] mark_held_locks+0x55/0xc0
[<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
[<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
[<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
[<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
[<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
[<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
[<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
[<ffffffff8020903b>] _stext+0x3b/0x170
[<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
[<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
[<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
[<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
[<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This modifies the timer code in a way to allow lockdep to detect
deadlocks resulting from a lock being taken in the timer function
as well as around the del_timer_sync() call.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
uids in namespaces other than init don't get a sysfs entry.
For those in the init namespace, while we're waiting to remove
the sysfs entry for the uid the uid is still hashed, and
alloc_uid() may re-grab that uid without getting a new
reference to the user_ns, which we've already put in free_user
before scheduling remove_user_sysfs_dir().
Reported-and-tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert the c/p state "power" tracer to use tracepoints. Avoids a
function call when the tracer is disabled.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
While reviewing the manpages, I noticed I'd missed some clock vs timer sites.
Make sure that all timer functions call cpu_timer_sample_group() and not
cpu_clock_sample_group(). This ensures that we enable the process wide timer
in time, and therefore pay the O(n) thread group cost from the syscall.
Not doing it here, will result in the first jiffy tick after setting the timer
doing this, resulting in a very expensive tick (but only once) and a delay in
actually starting the timer.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
While reviewing the ring buffer code, I thougth I saw a bug with
if (!__raw_spin_trylock(&cpu_buffer->lock))
goto out_unlock;
But I forgot that we use a variable "lock_taken" that is set if
the spinlock is taken, and only unlock it if that variable is set.
To avoid further confusion from other reviewers, this patch
renames the label out_unlock with out_reset, which is the more
appropriate name.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
I enabled all cgroup subsystems when compiling kernel, and then:
# mount -t cgroup -o net_cls xxx /mnt
# mkdir /mnt/0
This showed up immediately:
BUG: MAX_LOCKDEP_SUBCLASSES too low!
turning off the locking correctness validator.
It's caused by the cgroup hierarchy lock:
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (ss->root == root)
mutex_lock_nested(&ss->hierarchy_mutex, i);
}
Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but
MAX_LOCKDEP_SUBCLASSES is 8.
This patch uses different lockdep keys for different subsystems.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to pass an unsigned long as the minimum, because it gets casted
to an unsigned long in the sysctl handler. If we pass an int, we'll
access four more bytes on 64bit arches, resulting in a random minimum
value.
[rientjes@google.com: fix type of `old_bytes']
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Catalin noticed that (38d47c1b70: futex: rely on get_user_pages() for
shared futexes) caused an mm_struct leak.
Some tracing with the function graph tracer quickly pointed out that
futex_wait() has exit paths with unbalanced reference counts.
This regression was discovered by kmemleak.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: fix TIMER_ABSTIME for process wide cpu timers
timers: split process wide cpu clocks/timers, fix
x86: clean up hpet timer reinit
timers: split process wide cpu clocks/timers, remove spurious warning
timers: split process wide cpu clocks/timers
signal: re-add dead task accumulation stats.
x86: fix hpet timer reinit for x86_64
sched: fix nohz load balancer on cpu offline
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ptrace, x86: fix the usage of ptrace_fork()
i8327: fix outb() parameter order
x86: fix math_emu register frame access
x86: math_emu info cleanup
x86: include correct %gs in a.out core dump
x86, vmi: put a missing paravirt_release_pmd in pgd_dtor
x86: find nr_irqs_gsi with mp_ioapic_routing
x86: add clflush before monitor for Intel 7400 series
x86: disable intel_iommu support by default
x86: don't apply __supported_pte_mask to non-present ptes
x86: fix grammar in user-visible BIOS warning
x86/Kconfig.cpu: make Kconfig help readable in the console
x86, 64-bit: print DMI info in the oops trace
Intel reported a 10% regression (mysql+sysbench) on a 16-way machine
with these patches:
1596e29: sched: symmetric sync vs avg_overlap
d942fb6: sched: fix sync wakeups
Revert them.
Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Bisected-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The POSIX timer interface allows for absolute time expiry values through the
TIMER_ABSTIME flag, therefore we have to synchronize the timer to the clock
every time we start it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To decrease the chance of a missed enable, always enable the timer when we
sample it, we'll always disable it when we find that there are no active timers
in the jiffy tick.
This fixes a flood of warnings reported by Mike Galbraith.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the missing pair tracing_{start,stop}_record_cmdline() to record well
the cmdline associated with pid.
Changes in v2:
- fix a build error, the sched_switch tracer is needed to record the
cmdline.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I noticed by pure accident we have ptrace_fork() and friends. This was
added by "x86, bts: add fork and exit handling", commit
bf53de907d.
I can't test this, ds_request_bts() returns -EOPNOTSUPP, but I strongly
believe this needs the fix. I think something like this program
int main(void)
{
int pid = fork();
if (!pid) {
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
kill(getpid(), SIGSTOP);
fork();
} else {
struct ptrace_bts_config bts = {
.flags = PTRACE_BTS_O_ALLOC,
.size = 4 * 4096,
};
wait(NULL);
ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK);
ptrace(PTRACE_BTS_CONFIG, pid, &bts, sizeof(bts));
ptrace(PTRACE_CONT, pid, NULL, NULL);
sleep(1);
}
return 0;
}
should crash the kernel.
If the task is traced by its natural parent ptrace_reparented() returns 0
but we should clear ->btsxxx anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix these sparse warnings:
kernel/trace/ring_buffer.c:70:37: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:84:39: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:96:43: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2500:40: warning: incorrect type in argument 3 (different signedness)
kernel/trace/ring_buffer.c:2505:44: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2507:46: warning: incorrect type in argument 2 (different signedness)
kernel/trace/trace.c:2130:40: warning: incorrect type in argument 3 (different signedness)
kernel/trace/trace.c:2280:40: warning: incorrect type in argument 3 (different signedness)
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make global variables and a global function static
The function '__trace_userstack' does not seem to have a caller, so it
is commented out.
Fix this sparse warnings:
kernel/trace/trace.c:82:5: warning: symbol 'tracing_disabled' was not declared. Should it be static?
kernel/trace/trace.c:600:10: warning: symbol 'trace_record_cmdline_disabled' was not declared. Should it be static?
kernel/trace/trace.c:957:6: warning: symbol '__trace_userstack' was not declared. Should it be static?
kernel/trace/trace.c:1694:5: warning: symbol 'tracing_release' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is to make the function return early on failure, and give
correct return value on success.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>