linux/kernel/trace/trace_functions_graph.c

1441 lines
34 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
// SPDX-License-Identifier: GPL-2.0
/*
*
* Function graph tracer.
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
* Copyright (c) 2008-2009 Frederic Weisbecker <fweisbec@gmail.com>
* Mostly borrowed from function tracer which
* is Copyright (c) Steven Rostedt <srostedt@redhat.com>
*
*/
#include <linux/uaccess.h>
#include <linux/ftrace.h>
#include <linux/interrupt.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 08:04:11 +00:00
#include <linux/slab.h>
#include <linux/fs.h>
#include "trace.h"
#include "trace_output.h"
/* When set, irq functions will be ignored */
static int ftrace_graph_skip_irqs;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
struct fgraph_cpu_data {
pid_t last_pid;
int depth;
int depth_irq;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
int ignore;
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
unsigned long enter_funcs[FTRACE_RETFUNC_DEPTH];
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
};
struct fgraph_data {
struct fgraph_cpu_data __percpu *cpu_data;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
/* Place to preserve last processed entry. */
struct ftrace_graph_ent_entry ent;
struct ftrace_graph_ret_entry ret;
int failed;
int cpu;
};
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
#define TRACE_GRAPH_INDENT 2
unsigned int fgraph_max_depth;
static struct tracer_opt trace_opts[] = {
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
/* Display overruns? (for self-debug purpose) */
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
{ TRACER_OPT(funcgraph-overrun, TRACE_GRAPH_PRINT_OVERRUN) },
/* Display CPU ? */
{ TRACER_OPT(funcgraph-cpu, TRACE_GRAPH_PRINT_CPU) },
/* Display Overhead ? */
{ TRACER_OPT(funcgraph-overhead, TRACE_GRAPH_PRINT_OVERHEAD) },
/* Display proc name/pid */
{ TRACER_OPT(funcgraph-proc, TRACE_GRAPH_PRINT_PROC) },
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
/* Display duration of execution */
{ TRACER_OPT(funcgraph-duration, TRACE_GRAPH_PRINT_DURATION) },
/* Display absolute time of an entry */
{ TRACER_OPT(funcgraph-abstime, TRACE_GRAPH_PRINT_ABS_TIME) },
/* Display interrupts */
{ TRACER_OPT(funcgraph-irqs, TRACE_GRAPH_PRINT_IRQS) },
/* Display function name after trailing } */
{ TRACER_OPT(funcgraph-tail, TRACE_GRAPH_PRINT_TAIL) },
function_graph: Support recording and printing the return value of function Analyzing system call failures with the function_graph tracer can be a time-consuming process, particularly when locating the kernel function that first returns an error in the trace logs. This change aims to simplify the process by recording the function return value to the 'retval' member of 'ftrace_graph_ret' and printing it when outputting the trace log. We have introduced new trace options: funcgraph-retval and funcgraph-retval-hex. The former controls whether to display the return value, while the latter controls the display format. Please note that even if a function's return type is void, a return value will still be printed. You can simply ignore it. This patch only establishes the fundamental infrastructure. Subsequent patches will make this feature available on some commonly used processor architectures. Here is an example: I attempted to attach the demo process to a cpu cgroup, but it failed: echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks -bash: echo: write error: Invalid argument The strace logs indicate that the write system call returned -EINVAL(-22): ... write(1, "273\n", 4) = -1 EINVAL (Invalid argument) ... To capture trace logs during a write system call, use the following commands: cd /sys/kernel/debug/tracing/ echo 0 > tracing_on echo > trace echo *sys_write > set_graph_function echo *spin* > set_graph_notrace echo *rcu* >> set_graph_notrace echo *alloc* >> set_graph_notrace echo preempt* >> set_graph_notrace echo kfree* >> set_graph_notrace echo $$ > set_ftrace_pid echo function_graph > current_tracer echo 1 > options/funcgraph-retval echo 0 > options/funcgraph-retval-hex echo 1 > tracing_on echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks echo 0 > tracing_on cat trace > ~/trace.log To locate the root cause, search for error code -22 directly in the file trace.log and identify the first function that returned -22. Once you have identified this function, examine its code to determine the root cause. For example, in the trace log below, cpu_cgroup_can_attach returned -22 first, so we can focus our analysis on this function to identify the root cause. ... 1) | cgroup_migrate() { 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */ 1) | cgroup_migrate_execute() { 1) | cpu_cgroup_can_attach() { 1) | cgroup_taskset_first() { 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */ 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */ 1) 4.369 us | } /* cgroup_migrate_execute = -22 */ 1) 7.143 us | } /* cgroup_migrate = -22 */ ... Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn Tested-by: Florian Kauer <florian.kauer@linutronix.de> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-08 12:42:15 +00:00
#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
/* Display function return value ? */
{ TRACER_OPT(funcgraph-retval, TRACE_GRAPH_PRINT_RETVAL) },
/* Display function return value in hexadecimal format ? */
{ TRACER_OPT(funcgraph-retval-hex, TRACE_GRAPH_PRINT_RETVAL_HEX) },
#endif
/* Include sleep time (scheduled out) between entry and return */
{ TRACER_OPT(sleep-time, TRACE_GRAPH_SLEEP_TIME) },
#ifdef CONFIG_FUNCTION_PROFILER
/* Include time within nested functions */
{ TRACER_OPT(graph-time, TRACE_GRAPH_GRAPH_TIME) },
#endif
{ } /* Empty entry */
};
static struct tracer_flags tracer_flags = {
/* Don't display overruns, proc, or tail by default */
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
.val = TRACE_GRAPH_PRINT_CPU | TRACE_GRAPH_PRINT_OVERHEAD |
TRACE_GRAPH_PRINT_DURATION | TRACE_GRAPH_PRINT_IRQS |
TRACE_GRAPH_SLEEP_TIME | TRACE_GRAPH_GRAPH_TIME,
.opts = trace_opts
};
static struct trace_array *graph_array;
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
/*
* DURATION column is being also used to display IRQ signs,
* following values are used by print_graph_irq and others
* to fill in space into DURATION column.
*/
enum {
FLAGS_FILL_FULL = 1 << TRACE_GRAPH_PRINT_FILL_SHIFT,
FLAGS_FILL_START = 2 << TRACE_GRAPH_PRINT_FILL_SHIFT,
FLAGS_FILL_END = 3 << TRACE_GRAPH_PRINT_FILL_SHIFT,
};
static void
print_graph_duration(struct trace_array *tr, unsigned long long duration,
struct trace_seq *s, u32 flags);
int __trace_graph_entry(struct trace_array *tr,
struct ftrace_graph_ent *trace,
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
unsigned int trace_ctx)
{
struct trace_event_call *call = &event_funcgraph_entry;
struct ring_buffer_event *event;
struct trace_buffer *buffer = tr->array_buffer.buffer;
struct ftrace_graph_ent_entry *entry;
event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_ENT,
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
sizeof(*entry), trace_ctx);
if (!event)
return 0;
entry = ring_buffer_event_data(event);
entry->graph_ent = *trace;
tracing: Update event filters for multibuffer The trace event filters are still tied to event calls rather than event files, which means you don't get what you'd expect when using filters in the multibuffer case: Before: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 2048 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 Setting the filter in tracing/instances/test1/events shouldn't affect the same event in tracing/events as it does above. After: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 We'd like to just move the filter directly from ftrace_event_call to ftrace_event_file, but there are a couple cases that don't yet have multibuffer support and therefore have to continue using the current event_call-based filters. For those cases, a new USE_CALL_FILTER bit is added to the event_call flags, whose main purpose is to keep the old behavior for those cases until they can be updated with multibuffer support; at that point, the USE_CALL_FILTER flag (and the new associated call_filter_check_discard() function) can go away. The multibuffer support also made filter_current_check_discard() redundant, so this change removes that function as well and replaces it with filter_check_discard() (or call_filter_check_discard() as appropriate). Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 13:34:17 +00:00
if (!call_filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit_nostack(buffer, event);
return 1;
}
static inline int ftrace_graph_ignore_irqs(void)
{
if (!ftrace_graph_skip_irqs || trace_recursion_test(TRACE_IRQ_BIT))
return 0;
return in_hardirq();
}
int trace_graph_entry(struct ftrace_graph_ent *trace)
{
struct trace_array *tr = graph_array;
struct trace_array_cpu *data;
unsigned long flags;
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
unsigned int trace_ctx;
long disabled;
int ret;
int cpu;
if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
return 0;
/*
* Do not trace a function if it's filtered by set_graph_notrace.
* Make the index of ret stack negative to indicate that it should
* ignore further functions. But it needs its own ret stack entry
* to recover the original index in order to continue tracing after
* returning from the function.
*/
if (ftrace_graph_notrace_addr(trace->func)) {
trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
/*
* Need to return 1 to have the return called
* that will clear the NOTRACE bit.
*/
return 1;
}
if (!ftrace_trace_task(tr))
return 0;
if (ftrace_graph_ignore_func(trace))
return 0;
if (ftrace_graph_ignore_irqs())
return 0;
/*
* Stop here if tracing_threshold is set. We only write function return
* events to the ring buffer.
*/
if (tracing_thresh)
return 1;
local_irq_save(flags);
cpu = raw_smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
disabled = atomic_inc_return(&data->disabled);
if (likely(disabled == 1)) {
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
trace_ctx = tracing_gen_ctx_flags(flags);
ret = __trace_graph_entry(tr, trace, trace_ctx);
} else {
ret = 0;
}
atomic_dec(&data->disabled);
local_irq_restore(flags);
return ret;
}
static void
__trace_graph_function(struct trace_array *tr,
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
unsigned long ip, unsigned int trace_ctx)
{
u64 time = trace_clock_local();
struct ftrace_graph_ent ent = {
.func = ip,
.depth = 0,
};
struct ftrace_graph_ret ret = {
.func = ip,
.depth = 0,
.calltime = time,
.rettime = time,
};
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
__trace_graph_entry(tr, &ent, trace_ctx);
__trace_graph_return(tr, &ret, trace_ctx);
}
void
trace_graph_function(struct trace_array *tr,
unsigned long ip, unsigned long parent_ip,
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
unsigned int trace_ctx)
{
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
__trace_graph_function(tr, ip, trace_ctx);
}
void __trace_graph_return(struct trace_array *tr,
struct ftrace_graph_ret *trace,
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
unsigned int trace_ctx)
{
struct trace_event_call *call = &event_funcgraph_exit;
struct ring_buffer_event *event;
struct trace_buffer *buffer = tr->array_buffer.buffer;
struct ftrace_graph_ret_entry *entry;
event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_RET,
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
sizeof(*entry), trace_ctx);
if (!event)
return;
entry = ring_buffer_event_data(event);
entry->ret = *trace;
tracing: Update event filters for multibuffer The trace event filters are still tied to event calls rather than event files, which means you don't get what you'd expect when using filters in the multibuffer case: Before: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 2048 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 Setting the filter in tracing/instances/test1/events shouldn't affect the same event in tracing/events as it does above. After: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 We'd like to just move the filter directly from ftrace_event_call to ftrace_event_file, but there are a couple cases that don't yet have multibuffer support and therefore have to continue using the current event_call-based filters. For those cases, a new USE_CALL_FILTER bit is added to the event_call flags, whose main purpose is to keep the old behavior for those cases until they can be updated with multibuffer support; at that point, the USE_CALL_FILTER flag (and the new associated call_filter_check_discard() function) can go away. The multibuffer support also made filter_current_check_discard() redundant, so this change removes that function as well and replaces it with filter_check_discard() (or call_filter_check_discard() as appropriate). Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 13:34:17 +00:00
if (!call_filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit_nostack(buffer, event);
}
void trace_graph_return(struct ftrace_graph_ret *trace)
{
struct trace_array *tr = graph_array;
struct trace_array_cpu *data;
unsigned long flags;
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
unsigned int trace_ctx;
long disabled;
int cpu;
tracing/fgraph: Fix set_graph_function from showing interrupts The tracefs file set_graph_function is used to only function graph functions that are listed in that file (or all functions if the file is empty). The way this is implemented is that the function graph tracer looks at every function, and if the current depth is zero and the function matches something in the file then it will trace that function. When other functions are called, the depth will be greater than zero (because the original function will be at depth zero), and all functions will be traced where the depth is greater than zero. The issue is that when a function is first entered, and the handler that checks this logic is called, the depth is set to zero. If an interrupt comes in and a function in the interrupt handler is traced, its depth will be greater than zero and it will automatically be traced, even if the original function was not. But because the logic only looks at depth it may trace interrupts when it should not be. The recent design change of the function graph tracer to fix other bugs caused the depth to be zero while the function graph callback handler is being called for a longer time, widening the race of this happening. This bug was actually there for a longer time, but because the race window was so small it seldom happened. The Fixes tag below is for the commit that widen the race window, because that commit belongs to a series that will also help fix the original bug. Cc: stable@kernel.org Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack") Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-29 13:50:27 +00:00
ftrace_graph_addr_finish(trace);
if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
return;
}
local_irq_save(flags);
cpu = raw_smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
disabled = atomic_inc_return(&data->disabled);
if (likely(disabled == 1)) {
tracing: Merge irqflags + preempt counter. The state of the interrupts (irqflags) and the preemption counter are both passed down to tracing_generic_entry_update(). Only one bit of irqflags is actually required: The on/off state. The complete 32bit of the preemption counter isn't needed. Just whether of the upper bits (softirq, hardirq and NMI) are set and the preemption depth is needed. The irqflags and the preemption counter could be evaluated early and the information stored in an integer `trace_ctx'. tracing_generic_entry_update() would use the upper bits as the TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth (considering that one must be substracted from the counter in one special cases). The actual preemption value is not used except for the tracing record. The `irqflags' variable is mostly used only for the tracing record. An exception here is for instance wakeup_tracer_call() or probe_wakeup_sched_switch() which explicilty disable interrupts and use that `irqflags' to save (and restore) the IRQ state and to record the state. Struct trace_event_buffer has also the `pc' and flags' members which can be replaced with `trace_ctx' since their actual value is not used outside of trace recording. This will reduce tracing_generic_entry_update() to simply assign values to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved to _tracing_gen_ctx_flags() which replaces preempt_count() and local_save_flags() invocations. As an example, ftrace_syscall_enter() may invoke: - trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update() - event_trigger_unlock_commit() -> ftrace_trace_stack() -> … -> tracing_generic_entry_update() -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update() In this case the TRACE_FLAG_* bits were evaluated three times. By using the `trace_ctx' they are evaluated once and assigned three times. A build with all tracers enabled on x86-64 with and without the patch: text data bss dec hex filename 21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old 21970293 17084168 7639260 46693721 2c87d59 vmlinux.new text shrank by 379 bytes, data remained constant. Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-25 19:45:08 +00:00
trace_ctx = tracing_gen_ctx_flags(flags);
__trace_graph_return(tr, trace, trace_ctx);
}
atomic_dec(&data->disabled);
local_irq_restore(flags);
}
void set_graph_array(struct trace_array *tr)
{
graph_array = tr;
/* Make graph_array visible before we start tracing */
smp_mb();
}
static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
function-graph: Add tracing_thresh support to function_graph tracer Add support for tracing_thresh to the function_graph tracer. This version of this feature isolates the checks into new entry and return functions, to avoid adding more conditional code into the main function_graph paths. When the tracing_thresh is set and the function graph tracer is enabled, only the functions that took longer than the time in microseconds that was set in tracing_thresh are recorded. To do this efficiently, only the function exits are recorded: [tracing]# echo 100 > tracing_thresh [tracing]# echo function_graph > current_tracer [tracing]# cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) ! 119.214 us | } /* smp_apic_timer_interrupt */ 1) <========== | 0) ! 101.527 us | } /* __rcu_process_callbacks */ 0) ! 126.461 us | } /* rcu_process_callbacks */ 0) ! 145.111 us | } /* __do_softirq */ 0) ! 149.667 us | } /* do_softirq */ 0) ! 168.817 us | } /* irq_exit */ 0) ! 248.254 us | } /* smp_apic_timer_interrupt */ Also, add support for specifying tracing_thresh on the kernel command line. When used like so: "tracing_thresh=200 ftrace=function_graph" this can be used to analyse system startup. It is important to disable tracing soon after boot, in order to avoid losing the trace data. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Tim Bird <tim.bird@am.sony.com> LKML-Reference: <4B87098B.4040308@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-25 23:36:43 +00:00
{
tracing/fgraph: Fix set_graph_function from showing interrupts The tracefs file set_graph_function is used to only function graph functions that are listed in that file (or all functions if the file is empty). The way this is implemented is that the function graph tracer looks at every function, and if the current depth is zero and the function matches something in the file then it will trace that function. When other functions are called, the depth will be greater than zero (because the original function will be at depth zero), and all functions will be traced where the depth is greater than zero. The issue is that when a function is first entered, and the handler that checks this logic is called, the depth is set to zero. If an interrupt comes in and a function in the interrupt handler is traced, its depth will be greater than zero and it will automatically be traced, even if the original function was not. But because the logic only looks at depth it may trace interrupts when it should not be. The recent design change of the function graph tracer to fix other bugs caused the depth to be zero while the function graph callback handler is being called for a longer time, widening the race of this happening. This bug was actually there for a longer time, but because the race window was so small it seldom happened. The Fixes tag below is for the commit that widen the race window, because that commit belongs to a series that will also help fix the original bug. Cc: stable@kernel.org Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack") Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-29 13:50:27 +00:00
ftrace_graph_addr_finish(trace);
if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
return;
}
function-graph: Add tracing_thresh support to function_graph tracer Add support for tracing_thresh to the function_graph tracer. This version of this feature isolates the checks into new entry and return functions, to avoid adding more conditional code into the main function_graph paths. When the tracing_thresh is set and the function graph tracer is enabled, only the functions that took longer than the time in microseconds that was set in tracing_thresh are recorded. To do this efficiently, only the function exits are recorded: [tracing]# echo 100 > tracing_thresh [tracing]# echo function_graph > current_tracer [tracing]# cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) ! 119.214 us | } /* smp_apic_timer_interrupt */ 1) <========== | 0) ! 101.527 us | } /* __rcu_process_callbacks */ 0) ! 126.461 us | } /* rcu_process_callbacks */ 0) ! 145.111 us | } /* __do_softirq */ 0) ! 149.667 us | } /* do_softirq */ 0) ! 168.817 us | } /* irq_exit */ 0) ! 248.254 us | } /* smp_apic_timer_interrupt */ Also, add support for specifying tracing_thresh on the kernel command line. When used like so: "tracing_thresh=200 ftrace=function_graph" this can be used to analyse system startup. It is important to disable tracing soon after boot, in order to avoid losing the trace data. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Tim Bird <tim.bird@am.sony.com> LKML-Reference: <4B87098B.4040308@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-25 23:36:43 +00:00
if (tracing_thresh &&
(trace->rettime - trace->calltime < tracing_thresh))
return;
else
trace_graph_return(trace);
}
static struct fgraph_ops funcgraph_thresh_ops = {
.entryfunc = &trace_graph_entry,
.retfunc = &trace_graph_thresh_return,
};
static struct fgraph_ops funcgraph_ops = {
.entryfunc = &trace_graph_entry,
.retfunc = &trace_graph_return,
};
static int graph_trace_init(struct trace_array *tr)
{
int ret;
set_graph_array(tr);
function-graph: Add tracing_thresh support to function_graph tracer Add support for tracing_thresh to the function_graph tracer. This version of this feature isolates the checks into new entry and return functions, to avoid adding more conditional code into the main function_graph paths. When the tracing_thresh is set and the function graph tracer is enabled, only the functions that took longer than the time in microseconds that was set in tracing_thresh are recorded. To do this efficiently, only the function exits are recorded: [tracing]# echo 100 > tracing_thresh [tracing]# echo function_graph > current_tracer [tracing]# cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) ! 119.214 us | } /* smp_apic_timer_interrupt */ 1) <========== | 0) ! 101.527 us | } /* __rcu_process_callbacks */ 0) ! 126.461 us | } /* rcu_process_callbacks */ 0) ! 145.111 us | } /* __do_softirq */ 0) ! 149.667 us | } /* do_softirq */ 0) ! 168.817 us | } /* irq_exit */ 0) ! 248.254 us | } /* smp_apic_timer_interrupt */ Also, add support for specifying tracing_thresh on the kernel command line. When used like so: "tracing_thresh=200 ftrace=function_graph" this can be used to analyse system startup. It is important to disable tracing soon after boot, in order to avoid losing the trace data. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Tim Bird <tim.bird@am.sony.com> LKML-Reference: <4B87098B.4040308@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-25 23:36:43 +00:00
if (tracing_thresh)
ret = register_ftrace_graph(&funcgraph_thresh_ops);
function-graph: Add tracing_thresh support to function_graph tracer Add support for tracing_thresh to the function_graph tracer. This version of this feature isolates the checks into new entry and return functions, to avoid adding more conditional code into the main function_graph paths. When the tracing_thresh is set and the function graph tracer is enabled, only the functions that took longer than the time in microseconds that was set in tracing_thresh are recorded. To do this efficiently, only the function exits are recorded: [tracing]# echo 100 > tracing_thresh [tracing]# echo function_graph > current_tracer [tracing]# cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) ! 119.214 us | } /* smp_apic_timer_interrupt */ 1) <========== | 0) ! 101.527 us | } /* __rcu_process_callbacks */ 0) ! 126.461 us | } /* rcu_process_callbacks */ 0) ! 145.111 us | } /* __do_softirq */ 0) ! 149.667 us | } /* do_softirq */ 0) ! 168.817 us | } /* irq_exit */ 0) ! 248.254 us | } /* smp_apic_timer_interrupt */ Also, add support for specifying tracing_thresh on the kernel command line. When used like so: "tracing_thresh=200 ftrace=function_graph" this can be used to analyse system startup. It is important to disable tracing soon after boot, in order to avoid losing the trace data. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Tim Bird <tim.bird@am.sony.com> LKML-Reference: <4B87098B.4040308@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-25 23:36:43 +00:00
else
ret = register_ftrace_graph(&funcgraph_ops);
if (ret)
return ret;
tracing_start_cmdline_record();
return 0;
}
static void graph_trace_reset(struct trace_array *tr)
{
tracing_stop_cmdline_record();
if (tracing_thresh)
unregister_ftrace_graph(&funcgraph_thresh_ops);
else
unregister_ftrace_graph(&funcgraph_ops);
}
static int graph_trace_update_thresh(struct trace_array *tr)
{
graph_trace_reset(tr);
return graph_trace_init(tr);
}
static int max_bytes_for_cpu;
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
static void print_graph_cpu(struct trace_seq *s, int cpu)
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
{
/*
* Start with a space character - to make it stand out
* to the right a bit when trace output is pasted into
* email:
*/
trace_seq_printf(s, " %*d) ", max_bytes_for_cpu, cpu);
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
}
#define TRACE_GRAPH_PROCINFO_LENGTH 14
static void print_graph_proc(struct trace_seq *s, pid_t pid)
{
char comm[TASK_COMM_LEN];
/* sign + log10(MAX_INT) + '\0' */
char pid_str[11];
int spaces = 0;
int len;
int i;
trace_find_cmdline(pid, comm);
comm[7] = '\0';
sprintf(pid_str, "%d", pid);
/* 1 stands for the "-" character */
len = strlen(comm) + strlen(pid_str) + 1;
if (len < TRACE_GRAPH_PROCINFO_LENGTH)
spaces = TRACE_GRAPH_PROCINFO_LENGTH - len;
/* First spaces to align center */
for (i = 0; i < spaces / 2; i++)
trace_seq_putc(s, ' ');
trace_seq_printf(s, "%s-%s", comm, pid_str);
/* Last spaces to align center */
for (i = 0; i < spaces - (spaces / 2); i++)
trace_seq_putc(s, ' ');
}
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
static void print_graph_lat_fmt(struct trace_seq *s, struct trace_entry *entry)
{
trace_seq_putc(s, ' ');
trace_print_lat_fmt(s, entry);
tracing: Put a margin between flags and duration for wakeup tracers Don't mix context flags with function duration info. Instead of this: # tracer: wakeup_rt # # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNh5 | /* 0:120:R + [000] 11: 0:R migration/0 */ 2 us | 0) <idle>-0 | dNh5 0.000 us | (null)(); 4 us | 0) <idle>-0 | dNh4 | _raw_spin_unlock() { 4 us | 0) <idle>-0 | dNh4 0.304 us | preempt_count_sub(); 5 us | 0) <idle>-0 | dNh3 1.063 us | } 5 us | 0) <idle>-0 | dNh3 0.266 us | ttwu_stat(); 6 us | 0) <idle>-0 | dNh3 | _raw_spin_unlock_irqrestore() { 6 us | 0) <idle>-0 | dNh3 0.273 us | preempt_count_sub(); 6 us | 0) <idle>-0 | dNh2 0.818 us | } Show this: # tracer: wakeup # # wakeup latency trace v1.1.5 on 4.20.0+ # -------------------------------------------------------------------- # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNs. | | /* 0:120:R + [000] 339:100:R kworker/0:1H */ 3 us | 0) <idle>-0 | dNs. | 0.000 us | (null)(); 67 us | 0) <idle>-0 | dNs. | 0.721 us | ttwu_stat(); 69 us | 0) <idle>-0 | dNs. | 0.607 us | _raw_spin_unlock_irqrestore(); 71 us | 0) <idle>-0 | .Ns. | 0.598 us | _raw_spin_lock_irq(); 72 us | 0) <idle>-0 | .Ns. | 0.584 us | _raw_spin_lock_irq(); 73 us | 0) <idle>-0 | dNs. | + 11.118 us | __next_timer_interrupt(); 75 us | 0) <idle>-0 | dNs. | | call_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | delayed_work_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | __queue_work() { ... Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:12 +00:00
trace_seq_puts(s, " | ");
}
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
/* If the pid changed since the last trace, output this event */
static void
verif_pid(struct trace_seq *s, pid_t pid, int cpu, struct fgraph_data *data)
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
{
pid_t prev_pid;
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
pid_t *last_pid;
if (!data)
return;
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
last_pid = &(per_cpu_ptr(data->cpu_data, cpu)->last_pid);
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
if (*last_pid == pid)
return;
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
prev_pid = *last_pid;
*last_pid = pid;
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
if (prev_pid == -1)
return;
/*
* Context-switch trace line:
------------------------------------------
| 1) migration/0--1 => sshd-1755
------------------------------------------
*/
trace_seq_puts(s, " ------------------------------------------\n");
print_graph_cpu(s, cpu);
print_graph_proc(s, prev_pid);
trace_seq_puts(s, " => ");
print_graph_proc(s, pid);
trace_seq_puts(s, "\n ------------------------------------------\n\n");
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
}
static struct ftrace_graph_ret_entry *
get_return_for_leaf(struct trace_iterator *iter,
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
struct ftrace_graph_ent_entry *curr)
{
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
struct fgraph_data *data = iter->private;
struct ring_buffer_iter *ring_iter = NULL;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
struct ring_buffer_event *event;
struct ftrace_graph_ret_entry *next;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
/*
* If the previous output failed to write to the seq buffer,
* then we just reuse the data from before.
*/
if (data && data->failed) {
curr = &data->ent;
next = &data->ret;
} else {
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
ring_iter = trace_buffer_iter(iter, iter->cpu);
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
/* First peek to compare current entry and the next one */
if (ring_iter)
event = ring_buffer_iter_peek(ring_iter, NULL);
else {
/*
* We need to consume the current entry to see
* the next one.
*/
ring_buffer_consume(iter->array_buffer->buffer, iter->cpu,
ring-buffer: Add place holder recording of dropped events Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put in any place holder where the event was dropped. This is not a trivial thing to add because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In a produce/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add the placeholder for dropped events. When there's more room to write new data, then a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A place holder can not be inserted into the ring buffer since there never may be room. A reader could also come in at anytime and miss the placeholder. Luckily, the way the ring buffer works, the read side can find out if events were lost or not, and how many events. Everytime a write takes place, if it overwrites the header page (the next read) it updates a "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perfom the swap, and then check to see if the number changed, and take the diff if it has, which would be the number of events dropped. This can be stored by the reader and returned to callers of the reader. Since the reader page swap will fail if the writer moved the head page since the time the reader page set up the swap, this gives room to record the overruns without worrying about races. If the reader sets up the pages, records the overrun, than performs the swap, if the swap succeeds, then the overrun variable has not been updated since the setup before the swap. For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer. This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits needs to be used for the actual size (less actually). We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this will change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough. Note, the notification of dropped events is only used by consuming reads or peeking at the ring buffer. Iterating over the ring buffer does not keep this information because the necessary data is only available when a page swap is made, and the iterator does not swap out pages. Cc: Robert Richter <robert.richter@amd.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-31 17:21:56 +00:00
NULL, NULL);
event = ring_buffer_peek(iter->array_buffer->buffer, iter->cpu,
ring-buffer: Add place holder recording of dropped events Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put in any place holder where the event was dropped. This is not a trivial thing to add because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In a produce/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add the placeholder for dropped events. When there's more room to write new data, then a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A place holder can not be inserted into the ring buffer since there never may be room. A reader could also come in at anytime and miss the placeholder. Luckily, the way the ring buffer works, the read side can find out if events were lost or not, and how many events. Everytime a write takes place, if it overwrites the header page (the next read) it updates a "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perfom the swap, and then check to see if the number changed, and take the diff if it has, which would be the number of events dropped. This can be stored by the reader and returned to callers of the reader. Since the reader page swap will fail if the writer moved the head page since the time the reader page set up the swap, this gives room to record the overruns without worrying about races. If the reader sets up the pages, records the overrun, than performs the swap, if the swap succeeds, then the overrun variable has not been updated since the setup before the swap. For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer. This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits needs to be used for the actual size (less actually). We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this will change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough. Note, the notification of dropped events is only used by consuming reads or peeking at the ring buffer. Iterating over the ring buffer does not keep this information because the necessary data is only available when a page swap is made, and the iterator does not swap out pages. Cc: Robert Richter <robert.richter@amd.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-31 17:21:56 +00:00
NULL, NULL);
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
}
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
if (!event)
return NULL;
next = ring_buffer_event_data(event);
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
if (data) {
/*
* Save current and next entries for later reference
* if the output fails.
*/
data->ent = *curr;
/*
* If the next event is not a return type, then
* we only care about what type it is. Otherwise we can
* safely copy the entire event.
*/
if (next->ent.type == TRACE_GRAPH_RET)
data->ret = *next;
else
data->ret.ent.type = next->ent.type;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
}
}
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
if (next->ent.type != TRACE_GRAPH_RET)
return NULL;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
if (curr->ent.pid != next->ent.pid ||
curr->graph_ent.func != next->ret.func)
return NULL;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
/* this is a leaf, now advance the iterator */
if (ring_iter)
ring_buffer_iter_advance(ring_iter);
return next;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
}
static void print_graph_abs_time(u64 t, struct trace_seq *s)
{
unsigned long usecs_rem;
usecs_rem = do_div(t, NSEC_PER_SEC);
usecs_rem /= 1000;
trace_seq_printf(s, "%5lu.%06lu | ",
(unsigned long)t, usecs_rem);
}
function_graph: Support displaying relative timestamp When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:10 +00:00
static void
print_graph_rel_time(struct trace_iterator *iter, struct trace_seq *s)
{
unsigned long long usecs;
usecs = iter->ts - iter->array_buffer->time_start;
function_graph: Support displaying relative timestamp When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:10 +00:00
do_div(usecs, NSEC_PER_USEC);
trace_seq_printf(s, "%9llu us | ", usecs);
}
static void
print_graph_irq(struct trace_iterator *iter, unsigned long addr,
enum trace_type type, int cpu, pid_t pid, u32 flags)
{
struct trace_array *tr = iter->tr;
struct trace_seq *s = &iter->seq;
struct trace_entry *ent = iter->ent;
if (addr < (unsigned long)__irqentry_text_start ||
addr >= (unsigned long)__irqentry_text_end)
return;
if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) {
/* Absolute time */
if (flags & TRACE_GRAPH_PRINT_ABS_TIME)
print_graph_abs_time(iter->ts, s);
function_graph: Support displaying relative timestamp When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:10 +00:00
/* Relative time */
if (flags & TRACE_GRAPH_PRINT_REL_TIME)
print_graph_rel_time(iter, s);
/* Cpu */
if (flags & TRACE_GRAPH_PRINT_CPU)
print_graph_cpu(s, cpu);
/* Proc */
if (flags & TRACE_GRAPH_PRINT_PROC) {
print_graph_proc(s, pid);
trace_seq_puts(s, " | ");
}
/* Latency format */
if (tr->trace_flags & TRACE_ITER_LATENCY_FMT)
print_graph_lat_fmt(s, ent);
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
}
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
/* No overhead */
print_graph_duration(tr, 0, s, flags | FLAGS_FILL_START);
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
if (type == TRACE_GRAPH_ENT)
trace_seq_puts(s, "==========>");
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
else
trace_seq_puts(s, "<==========");
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
print_graph_duration(tr, 0, s, flags | FLAGS_FILL_END);
trace_seq_putc(s, '\n');
}
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
void
trace_print_graph_duration(unsigned long long duration, struct trace_seq *s)
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
{
unsigned long nsecs_rem = do_div(duration, 1000);
/* log10(ULONG_MAX) + '\0' */
char usecs_str[21];
char nsecs_str[5];
int len;
int i;
sprintf(usecs_str, "%lu", (unsigned long) duration);
/* Print msecs */
trace_seq_printf(s, "%s", usecs_str);
len = strlen(usecs_str);
/* Print nsecs (we don't want to exceed 7 numbers) */
if (len < 7) {
size_t slen = min_t(size_t, sizeof(nsecs_str), 8UL - len);
snprintf(nsecs_str, slen, "%03lu", nsecs_rem);
trace_seq_printf(s, ".%s", nsecs_str);
len += strlen(nsecs_str) + 1;
}
trace_seq_puts(s, " us ");
/* Print remaining spaces to fit the row's width */
for (i = len; i < 8; i++)
trace_seq_putc(s, ' ');
}
static void
print_graph_duration(struct trace_array *tr, unsigned long long duration,
struct trace_seq *s, u32 flags)
{
if (!(flags & TRACE_GRAPH_PRINT_DURATION) ||
!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO))
return;
/* No real adata, just filling the column with spaces */
switch (flags & TRACE_GRAPH_PRINT_FILL_MASK) {
case FLAGS_FILL_FULL:
trace_seq_puts(s, " | ");
return;
case FLAGS_FILL_START:
trace_seq_puts(s, " ");
return;
case FLAGS_FILL_END:
trace_seq_puts(s, " |");
return;
}
/* Signal a overhead of time execution to the output */
tracing: Add additional marks to signal very large time deltas Currently, function graph tracer prints "!" or "+" just before function execution time to signal a function overhead, depending on the time. And some tracers tracing latency also print "!" or "+" just after time to signal overhead, depending on the interval between events. Even it is usually enough to do that, we sometimes need to signal for bigger execution time than 100 micro seconds. For example, I used function graph tracer to detect if there is any case that exit_mm() takes too much time. I did following steps in /sys/kernel/debug/tracing. It was easier to detect very large excution time with patched kernel than with original kernel. $ echo exit_mm > set_graph_function $ echo function_graph > current_tracer $ echo > trace $ cat trace_pipe > $LOGFILE ... (do something and terminate logging) $ grep "\\$" $LOGFILE 3) $ 22082032 us | } /* kernel_map_pages */ 3) $ 22082040 us | } /* free_pages_prepare */ 3) $ 22082113 us | } /* free_hot_cold_page */ 3) $ 22083455 us | } /* free_hot_cold_page_list */ 3) $ 22083895 us | } /* release_pages */ 3) $ 22177873 us | } /* free_pages_and_swap_cache */ 3) $ 22178929 us | } /* unmap_single_vma */ 3) $ 22198885 us | } /* unmap_vmas */ 3) $ 22206949 us | } /* exit_mmap */ 3) $ 22207659 us | } /* mmput */ 3) $ 22207793 us | } /* exit_mm */ And then, it was easy to find out that a schedule-out occured by sub_preempt_count() within kernel_map_pages(). To detect very large function exection time caused by either problematic function implementation or scheduling issues, this patch can be useful. Link: http://lkml.kernel.org/r/1416789259-24038-1-git-send-email-byungchul.park@lge.com Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-24 00:34:19 +00:00
if (flags & TRACE_GRAPH_PRINT_OVERHEAD)
trace_seq_printf(s, "%c ", trace_find_mark(duration));
else
trace_seq_puts(s, " ");
trace_print_graph_duration(duration, s);
trace_seq_puts(s, "| ");
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
}
function_graph: Support recording and printing the return value of function Analyzing system call failures with the function_graph tracer can be a time-consuming process, particularly when locating the kernel function that first returns an error in the trace logs. This change aims to simplify the process by recording the function return value to the 'retval' member of 'ftrace_graph_ret' and printing it when outputting the trace log. We have introduced new trace options: funcgraph-retval and funcgraph-retval-hex. The former controls whether to display the return value, while the latter controls the display format. Please note that even if a function's return type is void, a return value will still be printed. You can simply ignore it. This patch only establishes the fundamental infrastructure. Subsequent patches will make this feature available on some commonly used processor architectures. Here is an example: I attempted to attach the demo process to a cpu cgroup, but it failed: echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks -bash: echo: write error: Invalid argument The strace logs indicate that the write system call returned -EINVAL(-22): ... write(1, "273\n", 4) = -1 EINVAL (Invalid argument) ... To capture trace logs during a write system call, use the following commands: cd /sys/kernel/debug/tracing/ echo 0 > tracing_on echo > trace echo *sys_write > set_graph_function echo *spin* > set_graph_notrace echo *rcu* >> set_graph_notrace echo *alloc* >> set_graph_notrace echo preempt* >> set_graph_notrace echo kfree* >> set_graph_notrace echo $$ > set_ftrace_pid echo function_graph > current_tracer echo 1 > options/funcgraph-retval echo 0 > options/funcgraph-retval-hex echo 1 > tracing_on echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks echo 0 > tracing_on cat trace > ~/trace.log To locate the root cause, search for error code -22 directly in the file trace.log and identify the first function that returned -22. Once you have identified this function, examine its code to determine the root cause. For example, in the trace log below, cpu_cgroup_can_attach returned -22 first, so we can focus our analysis on this function to identify the root cause. ... 1) | cgroup_migrate() { 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */ 1) | cgroup_migrate_execute() { 1) | cpu_cgroup_can_attach() { 1) | cgroup_taskset_first() { 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */ 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */ 1) 4.369 us | } /* cgroup_migrate_execute = -22 */ 1) 7.143 us | } /* cgroup_migrate = -22 */ ... Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn Tested-by: Florian Kauer <florian.kauer@linutronix.de> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-08 12:42:15 +00:00
#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
#define __TRACE_GRAPH_PRINT_RETVAL TRACE_GRAPH_PRINT_RETVAL
static void print_graph_retval(struct trace_seq *s, unsigned long retval,
bool leaf, void *func, bool hex_format)
{
unsigned long err_code = 0;
if (retval == 0 || hex_format)
goto done;
/* Check if the return value matches the negative format */
if (IS_ENABLED(CONFIG_64BIT) && (retval & BIT(31)) &&
(((u64)retval) >> 32) == 0) {
/* sign extension */
err_code = (unsigned long)(s32)retval;
} else {
err_code = retval;
}
if (!IS_ERR_VALUE(err_code))
err_code = 0;
done:
if (leaf) {
if (hex_format || (err_code == 0))
trace_seq_printf(s, "%ps(); /* = 0x%lx */\n",
func, retval);
else
trace_seq_printf(s, "%ps(); /* = %ld */\n",
func, err_code);
} else {
if (hex_format || (err_code == 0))
trace_seq_printf(s, "} /* %ps = 0x%lx */\n",
func, retval);
else
trace_seq_printf(s, "} /* %ps = %ld */\n",
func, err_code);
}
}
#else
#define __TRACE_GRAPH_PRINT_RETVAL 0
#define print_graph_retval(_seq, _retval, _leaf, _func, _format) do {} while (0)
#endif
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
/* Case of a leaf function on its call entry */
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
static enum print_line_t
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
print_graph_entry_leaf(struct trace_iterator *iter,
struct ftrace_graph_ent_entry *entry,
struct ftrace_graph_ret_entry *ret_entry,
struct trace_seq *s, u32 flags)
{
struct fgraph_data *data = iter->private;
struct trace_array *tr = iter->tr;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
struct ftrace_graph_ret *graph_ret;
struct ftrace_graph_ent *call;
unsigned long long duration;
int cpu = iter->cpu;
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
int i;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
graph_ret = &ret_entry->ret;
call = &entry->graph_ent;
duration = graph_ret->rettime - graph_ret->calltime;
if (data) {
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
struct fgraph_cpu_data *cpu_data;
cpu_data = per_cpu_ptr(data->cpu_data, cpu);
/*
* Comments display at + 1 to depth. Since
* this is a leaf function, keep the comments
* equal to this depth.
*/
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
cpu_data->depth = call->depth - 1;
/* No need to keep this function around for this depth */
fgraph: Handle a case where a tracer ignores set_graph_notrace Both the wakeup and irqsoff tracers can use the function graph tracer when the display-graph option is set. The problem is that they ignore the notrace file, and record the entry of functions that would be ignored by the function_graph tracer. This causes the trace->depth to be recorded into the ring buffer. The set_graph_notrace uses a trick by adding a large negative number to the trace->depth when a graph function is to be ignored. On trace output, the graph function uses the depth to record a stack of functions. But since the depth is negative, it accesses the array with a negative number and causes an out of bounds access that can cause a kernel oops or corrupt data. Have the print functions handle cases where a tracer still records functions even when they are in set_graph_notrace. Also add warnings if the depth is below zero before accessing the array. Note, the function graph logic will still prevent the return of these functions from being recorded, which means that they will be left hanging without a return. For example: # echo '*spin*' > set_graph_notrace # echo 1 > options/display-graph # echo wakeup > current_tracer # cat trace [...] _raw_spin_lock() { preempt_count_add() { do_raw_spin_lock() { update_rq_clock(); Where it should look like: _raw_spin_lock() { preempt_count_add(); do_raw_spin_lock(); } update_rq_clock(); Cc: stable@vger.kernel.org Cc: Namhyung Kim <namhyung.kim@lge.com> Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 01:54:49 +00:00
if (call->depth < FTRACE_RETFUNC_DEPTH &&
!WARN_ON_ONCE(call->depth < 0))
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
cpu_data->enter_funcs[call->depth] = 0;
}
/* Overhead and duration */
print_graph_duration(tr, duration, s, flags);
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
/* Function */
for (i = 0; i < call->depth * TRACE_GRAPH_INDENT; i++)
trace_seq_putc(s, ' ');
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
function_graph: Support recording and printing the return value of function Analyzing system call failures with the function_graph tracer can be a time-consuming process, particularly when locating the kernel function that first returns an error in the trace logs. This change aims to simplify the process by recording the function return value to the 'retval' member of 'ftrace_graph_ret' and printing it when outputting the trace log. We have introduced new trace options: funcgraph-retval and funcgraph-retval-hex. The former controls whether to display the return value, while the latter controls the display format. Please note that even if a function's return type is void, a return value will still be printed. You can simply ignore it. This patch only establishes the fundamental infrastructure. Subsequent patches will make this feature available on some commonly used processor architectures. Here is an example: I attempted to attach the demo process to a cpu cgroup, but it failed: echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks -bash: echo: write error: Invalid argument The strace logs indicate that the write system call returned -EINVAL(-22): ... write(1, "273\n", 4) = -1 EINVAL (Invalid argument) ... To capture trace logs during a write system call, use the following commands: cd /sys/kernel/debug/tracing/ echo 0 > tracing_on echo > trace echo *sys_write > set_graph_function echo *spin* > set_graph_notrace echo *rcu* >> set_graph_notrace echo *alloc* >> set_graph_notrace echo preempt* >> set_graph_notrace echo kfree* >> set_graph_notrace echo $$ > set_ftrace_pid echo function_graph > current_tracer echo 1 > options/funcgraph-retval echo 0 > options/funcgraph-retval-hex echo 1 > tracing_on echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks echo 0 > tracing_on cat trace > ~/trace.log To locate the root cause, search for error code -22 directly in the file trace.log and identify the first function that returned -22. Once you have identified this function, examine its code to determine the root cause. For example, in the trace log below, cpu_cgroup_can_attach returned -22 first, so we can focus our analysis on this function to identify the root cause. ... 1) | cgroup_migrate() { 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */ 1) | cgroup_migrate_execute() { 1) | cpu_cgroup_can_attach() { 1) | cgroup_taskset_first() { 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */ 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */ 1) 4.369 us | } /* cgroup_migrate_execute = -22 */ 1) 7.143 us | } /* cgroup_migrate = -22 */ ... Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn Tested-by: Florian Kauer <florian.kauer@linutronix.de> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-08 12:42:15 +00:00
/*
* Write out the function return value if the option function-retval is
* enabled.
*/
if (flags & __TRACE_GRAPH_PRINT_RETVAL)
print_graph_retval(s, graph_ret->retval, true, (void *)call->func,
!!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX));
else
trace_seq_printf(s, "%ps();\n", (void *)call->func);
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
print_graph_irq(iter, graph_ret->func, TRACE_GRAPH_RET,
cpu, iter->ent->pid, flags);
return trace_handle_return(s);
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
}
static enum print_line_t
print_graph_entry_nested(struct trace_iterator *iter,
struct ftrace_graph_ent_entry *entry,
struct trace_seq *s, int cpu, u32 flags)
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
{
struct ftrace_graph_ent *call = &entry->graph_ent;
struct fgraph_data *data = iter->private;
struct trace_array *tr = iter->tr;
int i;
if (data) {
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
struct fgraph_cpu_data *cpu_data;
int cpu = iter->cpu;
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
cpu_data = per_cpu_ptr(data->cpu_data, cpu);
cpu_data->depth = call->depth;
/* Save this function pointer to see if the exit matches */
fgraph: Handle a case where a tracer ignores set_graph_notrace Both the wakeup and irqsoff tracers can use the function graph tracer when the display-graph option is set. The problem is that they ignore the notrace file, and record the entry of functions that would be ignored by the function_graph tracer. This causes the trace->depth to be recorded into the ring buffer. The set_graph_notrace uses a trick by adding a large negative number to the trace->depth when a graph function is to be ignored. On trace output, the graph function uses the depth to record a stack of functions. But since the depth is negative, it accesses the array with a negative number and causes an out of bounds access that can cause a kernel oops or corrupt data. Have the print functions handle cases where a tracer still records functions even when they are in set_graph_notrace. Also add warnings if the depth is below zero before accessing the array. Note, the function graph logic will still prevent the return of these functions from being recorded, which means that they will be left hanging without a return. For example: # echo '*spin*' > set_graph_notrace # echo 1 > options/display-graph # echo wakeup > current_tracer # cat trace [...] _raw_spin_lock() { preempt_count_add() { do_raw_spin_lock() { update_rq_clock(); Where it should look like: _raw_spin_lock() { preempt_count_add(); do_raw_spin_lock(); } update_rq_clock(); Cc: stable@vger.kernel.org Cc: Namhyung Kim <namhyung.kim@lge.com> Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 01:54:49 +00:00
if (call->depth < FTRACE_RETFUNC_DEPTH &&
!WARN_ON_ONCE(call->depth < 0))
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
cpu_data->enter_funcs[call->depth] = call->func;
}
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
/* No time */
print_graph_duration(tr, 0, s, flags | FLAGS_FILL_FULL);
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
/* Function */
for (i = 0; i < call->depth * TRACE_GRAPH_INDENT; i++)
trace_seq_putc(s, ' ');
trace_seq_printf(s, "%ps() {\n", (void *)call->func);
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
if (trace_seq_has_overflowed(s))
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
return TRACE_TYPE_PARTIAL_LINE;
/*
* we already consumed the current entry to check the next one
* and see if this is a leaf.
*/
return TRACE_TYPE_NO_CONSUME;
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
}
static void
print_graph_prologue(struct trace_iterator *iter, struct trace_seq *s,
int type, unsigned long addr, u32 flags)
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
{
struct fgraph_data *data = iter->private;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
struct trace_entry *ent = iter->ent;
struct trace_array *tr = iter->tr;
int cpu = iter->cpu;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
/* Pid */
verif_pid(s, ent->pid, cpu, data);
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
if (type)
/* Interrupt */
print_graph_irq(iter, addr, type, cpu, ent->pid, flags);
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
if (!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO))
return;
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
/* Absolute time */
if (flags & TRACE_GRAPH_PRINT_ABS_TIME)
print_graph_abs_time(iter->ts, s);
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
function_graph: Support displaying relative timestamp When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:10 +00:00
/* Relative time */
if (flags & TRACE_GRAPH_PRINT_REL_TIME)
print_graph_rel_time(iter, s);
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
/* Cpu */
if (flags & TRACE_GRAPH_PRINT_CPU)
print_graph_cpu(s, cpu);
/* Proc */
if (flags & TRACE_GRAPH_PRINT_PROC) {
print_graph_proc(s, ent->pid);
trace_seq_puts(s, " | ");
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
}
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
/* Latency format */
if (tr->trace_flags & TRACE_ITER_LATENCY_FMT)
print_graph_lat_fmt(s, ent);
return;
}
/*
* Entry check for irq code
*
* returns 1 if
* - we are inside irq code
* - we just entered irq code
*
* returns 0 if
* - funcgraph-interrupts option is set
* - we are not inside irq code
*/
static int
check_irq_entry(struct trace_iterator *iter, u32 flags,
unsigned long addr, int depth)
{
int cpu = iter->cpu;
int *depth_irq;
struct fgraph_data *data = iter->private;
/*
* If we are either displaying irqs, or we got called as
* a graph event and private data does not exist,
* then we bypass the irq check.
*/
if ((flags & TRACE_GRAPH_PRINT_IRQS) ||
(!data))
return 0;
depth_irq = &(per_cpu_ptr(data->cpu_data, cpu)->depth_irq);
/*
* We are inside the irq code
*/
if (*depth_irq >= 0)
return 1;
if ((addr < (unsigned long)__irqentry_text_start) ||
(addr >= (unsigned long)__irqentry_text_end))
return 0;
/*
* We are entering irq code.
*/
*depth_irq = depth;
return 1;
}
/*
* Return check for irq code
*
* returns 1 if
* - we are inside irq code
* - we just left irq code
*
* returns 0 if
* - funcgraph-interrupts option is set
* - we are not inside irq code
*/
static int
check_irq_return(struct trace_iterator *iter, u32 flags, int depth)
{
int cpu = iter->cpu;
int *depth_irq;
struct fgraph_data *data = iter->private;
/*
* If we are either displaying irqs, or we got called as
* a graph event and private data does not exist,
* then we bypass the irq check.
*/
if ((flags & TRACE_GRAPH_PRINT_IRQS) ||
(!data))
return 0;
depth_irq = &(per_cpu_ptr(data->cpu_data, cpu)->depth_irq);
/*
* We are not inside the irq code.
*/
if (*depth_irq == -1)
return 0;
/*
* We are inside the irq code, and this is returning entry.
* Let's not trace it and clear the entry depth, since
* we are out of irq code.
*
* This condition ensures that we 'leave the irq code' once
* we are out of the entry depth. Thus protecting us from
* the RETURN entry loss.
*/
if (*depth_irq >= depth) {
*depth_irq = -1;
return 1;
}
/*
* We are inside the irq code, and this is not the entry.
*/
return 1;
}
static enum print_line_t
print_graph_entry(struct ftrace_graph_ent_entry *field, struct trace_seq *s,
struct trace_iterator *iter, u32 flags)
{
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
struct fgraph_data *data = iter->private;
struct ftrace_graph_ent *call = &field->graph_ent;
struct ftrace_graph_ret_entry *leaf_ret;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
static enum print_line_t ret;
int cpu = iter->cpu;
if (check_irq_entry(iter, flags, call->func, call->depth))
return TRACE_TYPE_HANDLED;
print_graph_prologue(iter, s, TRACE_GRAPH_ENT, call->func, flags);
leaf_ret = get_return_for_leaf(iter, field);
if (leaf_ret)
ret = print_graph_entry_leaf(iter, field, leaf_ret, s, flags);
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
else
ret = print_graph_entry_nested(iter, field, s, cpu, flags);
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
if (data) {
/*
* If we failed to write our output, then we need to make
* note of it. Because we already consumed our entry.
*/
if (s->full) {
data->failed = 1;
data->cpu = cpu;
} else
data->failed = 0;
}
return ret;
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
}
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
static enum print_line_t
print_graph_return(struct ftrace_graph_ret *trace, struct trace_seq *s,
struct trace_entry *ent, struct trace_iterator *iter,
u32 flags)
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
{
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
unsigned long long duration = trace->rettime - trace->calltime;
struct fgraph_data *data = iter->private;
struct trace_array *tr = iter->tr;
pid_t pid = ent->pid;
int cpu = iter->cpu;
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
int func_match = 1;
int i;
if (check_irq_return(iter, flags, trace->depth))
return TRACE_TYPE_HANDLED;
if (data) {
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
struct fgraph_cpu_data *cpu_data;
int cpu = iter->cpu;
cpu_data = per_cpu_ptr(data->cpu_data, cpu);
/*
* Comments display at + 1 to depth. This is the
* return from a function, we now want the comments
* to display at the same level of the bracket.
*/
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
cpu_data->depth = trace->depth - 1;
fgraph: Handle a case where a tracer ignores set_graph_notrace Both the wakeup and irqsoff tracers can use the function graph tracer when the display-graph option is set. The problem is that they ignore the notrace file, and record the entry of functions that would be ignored by the function_graph tracer. This causes the trace->depth to be recorded into the ring buffer. The set_graph_notrace uses a trick by adding a large negative number to the trace->depth when a graph function is to be ignored. On trace output, the graph function uses the depth to record a stack of functions. But since the depth is negative, it accesses the array with a negative number and causes an out of bounds access that can cause a kernel oops or corrupt data. Have the print functions handle cases where a tracer still records functions even when they are in set_graph_notrace. Also add warnings if the depth is below zero before accessing the array. Note, the function graph logic will still prevent the return of these functions from being recorded, which means that they will be left hanging without a return. For example: # echo '*spin*' > set_graph_notrace # echo 1 > options/display-graph # echo wakeup > current_tracer # cat trace [...] _raw_spin_lock() { preempt_count_add() { do_raw_spin_lock() { update_rq_clock(); Where it should look like: _raw_spin_lock() { preempt_count_add(); do_raw_spin_lock(); } update_rq_clock(); Cc: stable@vger.kernel.org Cc: Namhyung Kim <namhyung.kim@lge.com> Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09 01:54:49 +00:00
if (trace->depth < FTRACE_RETFUNC_DEPTH &&
!WARN_ON_ONCE(trace->depth < 0)) {
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
if (cpu_data->enter_funcs[trace->depth] != trace->func)
func_match = 0;
cpu_data->enter_funcs[trace->depth] = 0;
}
}
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
print_graph_prologue(iter, s, 0, 0, flags);
ftrace: add cpu annotation for function graph tracer Impact: enhancement for function graph tracer When run on a SMP box, the function graph tracer is confusing because it shows the different CPUS as changes in the trace. This patch adds the annotation of 'CPU[###]' where ### is a three digit number. The output will look similar to this: CPU[001] dput() { CPU[000] } 726 CPU[001] } 487 CPU[000] do_softirq() { CPU[001] } 2221 CPU[000] __do_softirq() { CPU[000] __local_bh_disable() { CPU[001] unroll_tree_refs() { CPU[000] } 569 CPU[001] } 501 CPU[000] rcu_process_callbacks() { CPU[001] kfree() { What makes this nice is that now you can grep the file and produce readable format for a particular CPU. # cat /debug/tracing/trace > /tmp/trace # grep '^CPU\[000\]' /tmp/trace > /tmp/trace0 # grep '^CPU\[001\]' /tmp/trace > /tmp/trace1 Will give you: # head /tmp/trace0 CPU[000] ------------8<---------- thread sshd-3899 ------------8<---------- CPU[000] inotify_dentry_parent_queue_event() { CPU[000] } 2531 CPU[000] inotify_inode_queue_event() { CPU[000] } 505 CPU[000] } 69626 CPU[000] } 73089 CPU[000] audit_syscall_exit() { CPU[000] path_put() { CPU[000] dput() { # head /tmp/trace1 CPU[001] ------------8<---------- thread pcscd-3446 ------------8<---------- CPU[001] } 4186 CPU[001] dput() { CPU[001] } 543 CPU[001] vfs_permission() { CPU[001] inode_permission() { CPU[001] shmem_permission() { CPU[001] generic_permission() { CPU[001] } 501 CPU[001] } 2205 Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 05:16:27 +00:00
/* Overhead and duration */
print_graph_duration(tr, duration, s, flags);
tracing/function-graph-tracer: adjustments of the trace informations Impact: increase the visual qualities of the call-graph-tracer output This patch applies various trace output formatting changes: - CPU is now a decimal number, followed by a parenthesis. - Overhead is now on the second column (gives a good visibility) - Cost is now on the third column, can't exceed 9999.99 us. It is followed by a virtual line based on a "|" character. - Functions calls are now the last column on the right. This way, we haven't dynamic column (which flow is harder to follow) on its right. - CPU and Overhead have their own option flag. They are default-on but you can disable them easily: echo nofuncgraph-cpu > trace_options echo nofuncgraph-overhead > trace_options TODO: _ Refactoring of the thread switch output. _ Give a default-off option to output the thread and its pid on each row. _ Provide headers _ .... Here is an example of the new trace style: 0) | mutex_unlock() { 0) 0.639 us | __mutex_unlock_slowpath(); 0) 1.607 us | } 0) | remove_wait_queue() { 0) 0.616 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.779 us | } 0) 0.495 us | n_tty_set_room(); 0) ! 9999.999 us | } 0) | tty_ldisc_deref() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.616 us | _spin_unlock_irqrestore(); 0) 2.793 us | } 0) | current_fs_time() { 0) 0.488 us | current_kernel_time(); 0) 0.495 us | timespec_trunc(); 0) 2.486 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) ! 9999.999 us | } 0) | sys_read() { 0) 0.796 us | fget_light(); 0) | vfs_read() { 0) | rw_verify_area() { 0) | security_file_permission() { 0) 0.488 us | cap_file_permission(); 0) 1.720 us | } 0) 3. 4 us | } 0) | tty_read() { 0) 0.488 us | tty_paranoia_check(); 0) | tty_ldisc_ref_wait() { 0) | tty_ldisc_try() { 0) 0.615 us | _spin_lock_irqsave(); 0) 0.615 us | _spin_unlock_irqrestore(); 0) 5.436 us | } 0) 6.427 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 23:42:46 +00:00
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
/* Closing brace */
for (i = 0; i < trace->depth * TRACE_GRAPH_INDENT; i++)
trace_seq_putc(s, ' ');
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
/*
function_graph: Support recording and printing the return value of function Analyzing system call failures with the function_graph tracer can be a time-consuming process, particularly when locating the kernel function that first returns an error in the trace logs. This change aims to simplify the process by recording the function return value to the 'retval' member of 'ftrace_graph_ret' and printing it when outputting the trace log. We have introduced new trace options: funcgraph-retval and funcgraph-retval-hex. The former controls whether to display the return value, while the latter controls the display format. Please note that even if a function's return type is void, a return value will still be printed. You can simply ignore it. This patch only establishes the fundamental infrastructure. Subsequent patches will make this feature available on some commonly used processor architectures. Here is an example: I attempted to attach the demo process to a cpu cgroup, but it failed: echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks -bash: echo: write error: Invalid argument The strace logs indicate that the write system call returned -EINVAL(-22): ... write(1, "273\n", 4) = -1 EINVAL (Invalid argument) ... To capture trace logs during a write system call, use the following commands: cd /sys/kernel/debug/tracing/ echo 0 > tracing_on echo > trace echo *sys_write > set_graph_function echo *spin* > set_graph_notrace echo *rcu* >> set_graph_notrace echo *alloc* >> set_graph_notrace echo preempt* >> set_graph_notrace echo kfree* >> set_graph_notrace echo $$ > set_ftrace_pid echo function_graph > current_tracer echo 1 > options/funcgraph-retval echo 0 > options/funcgraph-retval-hex echo 1 > tracing_on echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks echo 0 > tracing_on cat trace > ~/trace.log To locate the root cause, search for error code -22 directly in the file trace.log and identify the first function that returned -22. Once you have identified this function, examine its code to determine the root cause. For example, in the trace log below, cpu_cgroup_can_attach returned -22 first, so we can focus our analysis on this function to identify the root cause. ... 1) | cgroup_migrate() { 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */ 1) | cgroup_migrate_execute() { 1) | cpu_cgroup_can_attach() { 1) | cgroup_taskset_first() { 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */ 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */ 1) 4.369 us | } /* cgroup_migrate_execute = -22 */ 1) 7.143 us | } /* cgroup_migrate = -22 */ ... Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn Tested-by: Florian Kauer <florian.kauer@linutronix.de> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-08 12:42:15 +00:00
* Always write out the function name and its return value if the
* function-retval option is enabled.
ftrace: Add function names to dangling } in function graph tracer The function graph tracer is currently the most invasive tracer in the ftrace family. It can easily overflow the buffer even with 10megs per CPU. This means that events can often be lost. On start up, or after events are lost, if the function return is recorded but the function enter was lost, all we get to see is the exiting '}'. Here is how a typical trace output starts: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } 0) ! 567.961 us | } 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } 0) ! 603.361 us | } 0) ! 613.574 us | } 0) ! 623.554 us | } 0) 3.653 us | fget_light(); 0) | sock_poll() { There are a series of '}' with no matching "func() {". There's no information to what functions these ending brackets belong to. This patch adds a stack on the per cpu structure used in outputting the function graph tracer to keep track of what function was outputted. Then on a function exit event, it checks the depth to see if the function exit has a matching entry event. If it does, then it only prints the '}', otherwise it adds the function name after the '}'. This allows function exit events to show what function they belong to at trace output startup, when the entry was lost due to ring buffer overflow, or even after a new task is scheduled in. Here is what the above trace will look like after this patch: [tracing] cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) + 91.897 us | } (irq_exit) 0) ! 567.961 us | } (smp_apic_timer_interrupt) 0) <========== | 0) ! 579.083 us | _raw_spin_lock_irqsave(); 0) 4.694 us | _raw_spin_unlock_irqrestore(); 0) ! 594.862 us | } (add_wait_queue) 0) ! 603.361 us | } (__pollwait) 0) ! 613.574 us | } (tcp_poll) 0) ! 623.554 us | } (sock_poll) 0) 3.653 us | fget_light(); 0) | sock_poll() { Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-26 22:08:16 +00:00
*/
function_graph: Support recording and printing the return value of function Analyzing system call failures with the function_graph tracer can be a time-consuming process, particularly when locating the kernel function that first returns an error in the trace logs. This change aims to simplify the process by recording the function return value to the 'retval' member of 'ftrace_graph_ret' and printing it when outputting the trace log. We have introduced new trace options: funcgraph-retval and funcgraph-retval-hex. The former controls whether to display the return value, while the latter controls the display format. Please note that even if a function's return type is void, a return value will still be printed. You can simply ignore it. This patch only establishes the fundamental infrastructure. Subsequent patches will make this feature available on some commonly used processor architectures. Here is an example: I attempted to attach the demo process to a cpu cgroup, but it failed: echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks -bash: echo: write error: Invalid argument The strace logs indicate that the write system call returned -EINVAL(-22): ... write(1, "273\n", 4) = -1 EINVAL (Invalid argument) ... To capture trace logs during a write system call, use the following commands: cd /sys/kernel/debug/tracing/ echo 0 > tracing_on echo > trace echo *sys_write > set_graph_function echo *spin* > set_graph_notrace echo *rcu* >> set_graph_notrace echo *alloc* >> set_graph_notrace echo preempt* >> set_graph_notrace echo kfree* >> set_graph_notrace echo $$ > set_ftrace_pid echo function_graph > current_tracer echo 1 > options/funcgraph-retval echo 0 > options/funcgraph-retval-hex echo 1 > tracing_on echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks echo 0 > tracing_on cat trace > ~/trace.log To locate the root cause, search for error code -22 directly in the file trace.log and identify the first function that returned -22. Once you have identified this function, examine its code to determine the root cause. For example, in the trace log below, cpu_cgroup_can_attach returned -22 first, so we can focus our analysis on this function to identify the root cause. ... 1) | cgroup_migrate() { 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */ 1) | cgroup_migrate_execute() { 1) | cpu_cgroup_can_attach() { 1) | cgroup_taskset_first() { 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */ 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */ 1) 4.369 us | } /* cgroup_migrate_execute = -22 */ 1) 7.143 us | } /* cgroup_migrate = -22 */ ... Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn Tested-by: Florian Kauer <florian.kauer@linutronix.de> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-08 12:42:15 +00:00
if (flags & __TRACE_GRAPH_PRINT_RETVAL) {
print_graph_retval(s, trace->retval, false, (void *)trace->func,
!!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX));
} else {
/*
* If the return function does not have a matching entry,
* then the entry was lost. Instead of just printing
* the '}' and letting the user guess what function this
* belongs to, write out the function name. Always do
* that if the funcgraph-tail option is enabled.
*/
if (func_match && !(flags & TRACE_GRAPH_PRINT_TAIL))
trace_seq_puts(s, "}\n");
else
trace_seq_printf(s, "} /* %ps */\n", (void *)trace->func);
}
tracing/function-graph-tracer: enhancements for the trace output Impact: enhance the output of the graph-tracer This patch applies some ideas of Ingo Molnar and Steven Rostedt. * Output leaf functions in one line with parenthesis, semicolon and duration output. * Add a second column (after cpu) for an overhead sign. if duration > 100 us, "!" if duration > 10 us, "+" else " " * Print output in us with remaining nanosec: u.n * Print duration on the right end, following the indentation of the functions. Use also visual clues: "-" on entry call (no duration to output) and "+" on return (duration output). The name of the tracer has been fixed as well: function-branch becomes function_branch. Here is an example of the new output: CPU[000] dequeue_entity() { - CPU[000] update_curr() { - CPU[000] update_min_vruntime(); + 0.512 us CPU[000] } + 1.504 us CPU[000] clear_buddies(); + 0.481 us CPU[000] update_min_vruntime(); + 0.504 us CPU[000] } + 4.557 us CPU[000] hrtick_update() { - CPU[000] hrtick_start_fair(); + 0.489 us CPU[000] } + 1.443 us CPU[000] + } + 14.655 us CPU[000] + } + 15.678 us CPU[000] + } + 16.686 us CPU[000] msecs_to_jiffies(); + 0.481 us CPU[000] put_prev_task_fair(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.482 us CPU[000] pick_next_task_rt(); + 0.504 us CPU[000] pick_next_task_fair(); + 0.481 us CPU[000] pick_next_task_idle(); + 0.489 us CPU[000] _spin_trylock(); + 0.655 us CPU[000] _spin_unlock(); + 0.609 us CPU[000] ------------8<---------- thread bash-2794 ------------8<---------- CPU[000] finish_task_switch() { - CPU[000] _spin_unlock_irq(); + 0.722 us CPU[000] } + 2.369 us CPU[000] ! } + 501972.605 us CPU[000] ! } + 501973.763 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.670 us CPU[000] _spin_unlock_irqrestore(); + 0.699 us CPU[000] copy_to_user() { - CPU[000] might_fault() { - CPU[000] __might_sleep(); + 0.503 us CPU[000] } + 1.632 us CPU[000] __copy_to_user_ll(); + 0.542 us CPU[000] } + 3.858 us CPU[000] tty_audit_add_data() { - CPU[000] _spin_lock_irq(); + 0.609 us CPU[000] _spin_unlock_irq(); + 0.624 us CPU[000] } + 3.196 us CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.625 us CPU[000] + } + 13.611 us CPU[000] copy_from_read_buf() { - CPU[000] _spin_lock_irqsave(); + 0.624 us CPU[000] _spin_unlock_irqrestore(); + 0.616 us CPU[000] } + 2.820 us CPU[000] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 00:46:33 +00:00
/* Overrun */
if (flags & TRACE_GRAPH_PRINT_OVERRUN)
trace_seq_printf(s, " (Overruns: %u)\n",
trace->overrun);
print_graph_irq(iter, trace->func, TRACE_GRAPH_RET,
cpu, pid, flags);
return trace_handle_return(s);
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
}
tracing/function-graph-tracer: handle ftrace_printk entries Handle the TRACE_PRINT entries from the function grapg tracer and output them as a C comment just below the function that called it, as if it was a comment inside this function. Example with an ftrace_printk inside might_sleep() function: void __might_sleep(char *file, int line) { static unsigned long prev_jiffy; /* ratelimiting */ ftrace_printk("Hi I'm a comment in might_sleep() :-)"); A chunk of a resulting trace: 0) | _reiserfs_free_block() { 0) | reiserfs_read_bitmap_block() { 0) | __bread() { 0) | __getblk() { 0) | __find_get_block() { 0) 0.698 us | mark_page_accessed(); 0) 2.267 us | } 0) | __might_sleep() { 0) | /* Hi I'm a comment in might_sleep() :-) */ 0) 1.321 us | } 0) 5.872 us | } 0) 7.313 us | } 0) 8.718 us | } And this patch brings two minor fixes: - The newline after a switch-out task has disappeared - The "|" sign just before the cpu number on task-switch has been deleted. 0) 0.616 us | pick_next_task_rt(); 0) 1.457 us | _spin_trylock(); 0) 0.653 us | _spin_unlock(); 0) 0.728 us | _spin_trylock(); 0) 0.631 us | _spin_unlock(); 0) 0.729 us | native_load_sp0(); 0) 0.593 us | native_load_tls(); ------------------------------------------ 0) cat-2834 => migrati-3 ------------------------------------------ 0) | finish_task_switch() { 0) 0.841 us | _spin_unlock_irq(); 0) 0.616 us | post_schedule_rt(); 0) 3.882 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 22:45:11 +00:00
static enum print_line_t
print_graph_comment(struct trace_seq *s, struct trace_entry *ent,
struct trace_iterator *iter, u32 flags)
tracing/function-graph-tracer: handle ftrace_printk entries Handle the TRACE_PRINT entries from the function grapg tracer and output them as a C comment just below the function that called it, as if it was a comment inside this function. Example with an ftrace_printk inside might_sleep() function: void __might_sleep(char *file, int line) { static unsigned long prev_jiffy; /* ratelimiting */ ftrace_printk("Hi I'm a comment in might_sleep() :-)"); A chunk of a resulting trace: 0) | _reiserfs_free_block() { 0) | reiserfs_read_bitmap_block() { 0) | __bread() { 0) | __getblk() { 0) | __find_get_block() { 0) 0.698 us | mark_page_accessed(); 0) 2.267 us | } 0) | __might_sleep() { 0) | /* Hi I'm a comment in might_sleep() :-) */ 0) 1.321 us | } 0) 5.872 us | } 0) 7.313 us | } 0) 8.718 us | } And this patch brings two minor fixes: - The newline after a switch-out task has disappeared - The "|" sign just before the cpu number on task-switch has been deleted. 0) 0.616 us | pick_next_task_rt(); 0) 1.457 us | _spin_trylock(); 0) 0.653 us | _spin_unlock(); 0) 0.728 us | _spin_trylock(); 0) 0.631 us | _spin_unlock(); 0) 0.729 us | native_load_sp0(); 0) 0.593 us | native_load_tls(); ------------------------------------------ 0) cat-2834 => migrati-3 ------------------------------------------ 0) | finish_task_switch() { 0) 0.841 us | _spin_unlock_irq(); 0) 0.616 us | post_schedule_rt(); 0) 3.882 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 22:45:11 +00:00
{
struct trace_array *tr = iter->tr;
unsigned long sym_flags = (tr->trace_flags & TRACE_ITER_SYM_MASK);
struct fgraph_data *data = iter->private;
struct trace_event *event;
int depth = 0;
tracing/function-graph-tracer: handle ftrace_printk entries Handle the TRACE_PRINT entries from the function grapg tracer and output them as a C comment just below the function that called it, as if it was a comment inside this function. Example with an ftrace_printk inside might_sleep() function: void __might_sleep(char *file, int line) { static unsigned long prev_jiffy; /* ratelimiting */ ftrace_printk("Hi I'm a comment in might_sleep() :-)"); A chunk of a resulting trace: 0) | _reiserfs_free_block() { 0) | reiserfs_read_bitmap_block() { 0) | __bread() { 0) | __getblk() { 0) | __find_get_block() { 0) 0.698 us | mark_page_accessed(); 0) 2.267 us | } 0) | __might_sleep() { 0) | /* Hi I'm a comment in might_sleep() :-) */ 0) 1.321 us | } 0) 5.872 us | } 0) 7.313 us | } 0) 8.718 us | } And this patch brings two minor fixes: - The newline after a switch-out task has disappeared - The "|" sign just before the cpu number on task-switch has been deleted. 0) 0.616 us | pick_next_task_rt(); 0) 1.457 us | _spin_trylock(); 0) 0.653 us | _spin_unlock(); 0) 0.728 us | _spin_trylock(); 0) 0.631 us | _spin_unlock(); 0) 0.729 us | native_load_sp0(); 0) 0.593 us | native_load_tls(); ------------------------------------------ 0) cat-2834 => migrati-3 ------------------------------------------ 0) | finish_task_switch() { 0) 0.841 us | _spin_unlock_irq(); 0) 0.616 us | post_schedule_rt(); 0) 3.882 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 22:45:11 +00:00
int ret;
int i;
if (data)
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
depth = per_cpu_ptr(data->cpu_data, iter->cpu)->depth;
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
print_graph_prologue(iter, s, 0, 0, flags);
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
/* No time */
print_graph_duration(tr, 0, s, flags | FLAGS_FILL_FULL);
tracing/function-graph-tracer: handle ftrace_printk entries Handle the TRACE_PRINT entries from the function grapg tracer and output them as a C comment just below the function that called it, as if it was a comment inside this function. Example with an ftrace_printk inside might_sleep() function: void __might_sleep(char *file, int line) { static unsigned long prev_jiffy; /* ratelimiting */ ftrace_printk("Hi I'm a comment in might_sleep() :-)"); A chunk of a resulting trace: 0) | _reiserfs_free_block() { 0) | reiserfs_read_bitmap_block() { 0) | __bread() { 0) | __getblk() { 0) | __find_get_block() { 0) 0.698 us | mark_page_accessed(); 0) 2.267 us | } 0) | __might_sleep() { 0) | /* Hi I'm a comment in might_sleep() :-) */ 0) 1.321 us | } 0) 5.872 us | } 0) 7.313 us | } 0) 8.718 us | } And this patch brings two minor fixes: - The newline after a switch-out task has disappeared - The "|" sign just before the cpu number on task-switch has been deleted. 0) 0.616 us | pick_next_task_rt(); 0) 1.457 us | _spin_trylock(); 0) 0.653 us | _spin_unlock(); 0) 0.728 us | _spin_trylock(); 0) 0.631 us | _spin_unlock(); 0) 0.729 us | native_load_sp0(); 0) 0.593 us | native_load_tls(); ------------------------------------------ 0) cat-2834 => migrati-3 ------------------------------------------ 0) | finish_task_switch() { 0) 0.841 us | _spin_unlock_irq(); 0) 0.616 us | post_schedule_rt(); 0) 3.882 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 22:45:11 +00:00
/* Indentation */
if (depth > 0)
for (i = 0; i < (depth + 1) * TRACE_GRAPH_INDENT; i++)
trace_seq_putc(s, ' ');
tracing/function-graph-tracer: handle ftrace_printk entries Handle the TRACE_PRINT entries from the function grapg tracer and output them as a C comment just below the function that called it, as if it was a comment inside this function. Example with an ftrace_printk inside might_sleep() function: void __might_sleep(char *file, int line) { static unsigned long prev_jiffy; /* ratelimiting */ ftrace_printk("Hi I'm a comment in might_sleep() :-)"); A chunk of a resulting trace: 0) | _reiserfs_free_block() { 0) | reiserfs_read_bitmap_block() { 0) | __bread() { 0) | __getblk() { 0) | __find_get_block() { 0) 0.698 us | mark_page_accessed(); 0) 2.267 us | } 0) | __might_sleep() { 0) | /* Hi I'm a comment in might_sleep() :-) */ 0) 1.321 us | } 0) 5.872 us | } 0) 7.313 us | } 0) 8.718 us | } And this patch brings two minor fixes: - The newline after a switch-out task has disappeared - The "|" sign just before the cpu number on task-switch has been deleted. 0) 0.616 us | pick_next_task_rt(); 0) 1.457 us | _spin_trylock(); 0) 0.653 us | _spin_unlock(); 0) 0.728 us | _spin_trylock(); 0) 0.631 us | _spin_unlock(); 0) 0.729 us | native_load_sp0(); 0) 0.593 us | native_load_tls(); ------------------------------------------ 0) cat-2834 => migrati-3 ------------------------------------------ 0) | finish_task_switch() { 0) 0.841 us | _spin_unlock_irq(); 0) 0.616 us | post_schedule_rt(); 0) 3.882 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 22:45:11 +00:00
/* The comment */
trace_seq_puts(s, "/* ");
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06 16:21:49 +00:00
switch (iter->ent->type) {
function_graph: Handle TRACE_BPUTS in print_graph_comment It missed to handle TRACE_BPUTS so messages recorded by trace_bputs() will be shown with symbol info unnecessarily. You can see it with the trace_printk sample code: # cd /sys/kernel/tracing/ # echo sys_sync > set_graph_function # echo 1 > options/sym-offset # echo function_graph > current_tracer Note that the sys_sync filter was there to prevent recording other functions and the sym-offset option was needed since the first message was called from a module init function so kallsyms doesn't have the symbol and omitted in the output. # cd ~/build/kernel # insmod samples/trace_printk/trace-printk.ko # cd - # head trace Before: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) | /* 0xffffffffa0002000: This is a static string that will use trace_bputs */ 1) | /* This is a dynamic string that will use trace_puts */ 1) | /* trace_printk_irq_work+0x5/0x7b [trace_printk]: (irq) This is a static string that will use trace_bputs */ 1) | /* (irq) This is a dynamic string that will use trace_puts */ 1) | /* (irq) This is a static string that will use trace_bprintk() */ 1) | /* (irq) This is a dynamic string that will use trace_printk */ After: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) | /* This is a static string that will use trace_bputs */ 1) | /* This is a dynamic string that will use trace_puts */ 1) | /* (irq) This is a static string that will use trace_bputs */ 1) | /* (irq) This is a dynamic string that will use trace_puts */ 1) | /* (irq) This is a static string that will use trace_bprintk() */ 1) | /* (irq) This is a dynamic string that will use trace_printk */ Link: http://lkml.kernel.org/r/20160901024354.13720-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-01 02:43:54 +00:00
case TRACE_BPUTS:
ret = trace_print_bputs_msg_only(iter);
if (ret != TRACE_TYPE_HANDLED)
return ret;
break;
case TRACE_BPRINT:
ret = trace_print_bprintk_msg_only(iter);
if (ret != TRACE_TYPE_HANDLED)
return ret;
break;
case TRACE_PRINT:
ret = trace_print_printk_msg_only(iter);
if (ret != TRACE_TYPE_HANDLED)
return ret;
break;
default:
event = ftrace_find_event(ent->type);
if (!event)
return TRACE_TYPE_UNHANDLED;
ret = event->funcs->trace(iter, sym_flags, event);
if (ret != TRACE_TYPE_HANDLED)
return ret;
}
tracing/function-graph-tracer: handle ftrace_printk entries Handle the TRACE_PRINT entries from the function grapg tracer and output them as a C comment just below the function that called it, as if it was a comment inside this function. Example with an ftrace_printk inside might_sleep() function: void __might_sleep(char *file, int line) { static unsigned long prev_jiffy; /* ratelimiting */ ftrace_printk("Hi I'm a comment in might_sleep() :-)"); A chunk of a resulting trace: 0) | _reiserfs_free_block() { 0) | reiserfs_read_bitmap_block() { 0) | __bread() { 0) | __getblk() { 0) | __find_get_block() { 0) 0.698 us | mark_page_accessed(); 0) 2.267 us | } 0) | __might_sleep() { 0) | /* Hi I'm a comment in might_sleep() :-) */ 0) 1.321 us | } 0) 5.872 us | } 0) 7.313 us | } 0) 8.718 us | } And this patch brings two minor fixes: - The newline after a switch-out task has disappeared - The "|" sign just before the cpu number on task-switch has been deleted. 0) 0.616 us | pick_next_task_rt(); 0) 1.457 us | _spin_trylock(); 0) 0.653 us | _spin_unlock(); 0) 0.728 us | _spin_trylock(); 0) 0.631 us | _spin_unlock(); 0) 0.729 us | native_load_sp0(); 0) 0.593 us | native_load_tls(); ------------------------------------------ 0) cat-2834 => migrati-3 ------------------------------------------ 0) | finish_task_switch() { 0) 0.841 us | _spin_unlock_irq(); 0) 0.616 us | post_schedule_rt(); 0) 3.882 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 22:45:11 +00:00
if (trace_seq_has_overflowed(s))
goto out;
/* Strip ending newline */
if (s->buffer[s->seq.len - 1] == '\n') {
s->buffer[s->seq.len - 1] = '\0';
s->seq.len--;
}
trace_seq_puts(s, " */\n");
out:
return trace_handle_return(s);
tracing/function-graph-tracer: handle ftrace_printk entries Handle the TRACE_PRINT entries from the function grapg tracer and output them as a C comment just below the function that called it, as if it was a comment inside this function. Example with an ftrace_printk inside might_sleep() function: void __might_sleep(char *file, int line) { static unsigned long prev_jiffy; /* ratelimiting */ ftrace_printk("Hi I'm a comment in might_sleep() :-)"); A chunk of a resulting trace: 0) | _reiserfs_free_block() { 0) | reiserfs_read_bitmap_block() { 0) | __bread() { 0) | __getblk() { 0) | __find_get_block() { 0) 0.698 us | mark_page_accessed(); 0) 2.267 us | } 0) | __might_sleep() { 0) | /* Hi I'm a comment in might_sleep() :-) */ 0) 1.321 us | } 0) 5.872 us | } 0) 7.313 us | } 0) 8.718 us | } And this patch brings two minor fixes: - The newline after a switch-out task has disappeared - The "|" sign just before the cpu number on task-switch has been deleted. 0) 0.616 us | pick_next_task_rt(); 0) 1.457 us | _spin_trylock(); 0) 0.653 us | _spin_unlock(); 0) 0.728 us | _spin_trylock(); 0) 0.631 us | _spin_unlock(); 0) 0.729 us | native_load_sp0(); 0) 0.593 us | native_load_tls(); ------------------------------------------ 0) cat-2834 => migrati-3 ------------------------------------------ 0) | finish_task_switch() { 0) 0.841 us | _spin_unlock_irq(); 0) 0.616 us | post_schedule_rt(); 0) 3.882 us | } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 22:45:11 +00:00
}
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
enum print_line_t
tracing, function_graph: Remove dependency of abstime and duration fields on latency The display of absolute time and duration fields is based on the latency field. This was added during the irqsoff/wakeup tracers graph support changes. It's causing confusion in what fields will be displayed for the function_graph tracer itself. So I'm removing this depency, and adding absolute time and duration fields to the preemptirqsoff preemptoff irqsoff wakeup tracers. With following commands: # echo function_graph > ./current_tracer # cat trace This is what it looked like before: # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 0) 0.068 us | } /* page_add_file_rmap */ 0) | _raw_spin_unlock() { ... This is what it looks like now: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) 0.068 us | } /* add_preempt_count */ 0) 0.993 us | } /* vfsmount_lock_local_lock */ ... For preemptirqsoff preemptoff irqsoff wakeup tracers, this is what it looked like before: SNIP # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> lock-depth # |||| / # CPU TASK/PID ||||| DURATION FUNCTION CALLS # | | | ||||| | | | | | | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple(); ... This is what it looks like now: SNIP # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 19.847735 | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple(); ... Signed-off-by: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-03 14:58:47 +00:00
print_graph_function_flags(struct trace_iterator *iter, u32 flags)
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
{
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
struct ftrace_graph_ent_entry *field;
struct fgraph_data *data = iter->private;
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
struct trace_entry *entry = iter->ent;
struct trace_seq *s = &iter->seq;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
int cpu = iter->cpu;
int ret;
if (data && per_cpu_ptr(data->cpu_data, cpu)->ignore) {
per_cpu_ptr(data->cpu_data, cpu)->ignore = 0;
return TRACE_TYPE_HANDLED;
}
/*
* If the last output failed, there's a possibility we need
* to print out the missing entry which would never go out.
*/
if (data && data->failed) {
field = &data->ent;
iter->cpu = data->cpu;
ret = print_graph_entry(field, s, iter, flags);
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
if (ret == TRACE_TYPE_HANDLED && iter->cpu != cpu) {
per_cpu_ptr(data->cpu_data, iter->cpu)->ignore = 1;
ret = TRACE_TYPE_NO_CONSUME;
}
iter->cpu = cpu;
return ret;
}
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
switch (entry->type) {
case TRACE_GRAPH_ENT: {
/*
* print_graph_entry() may consume the current event,
* thus @field may become invalid, so we need to save it.
* sizeof(struct ftrace_graph_ent_entry) is very small,
* it can be safely saved at the stack.
*/
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
struct ftrace_graph_ent_entry saved;
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
trace_assign_type(field, entry);
saved = *field;
return print_graph_entry(&saved, s, iter, flags);
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
}
case TRACE_GRAPH_RET: {
struct ftrace_graph_ret_entry *field;
trace_assign_type(field, entry);
return print_graph_return(&field->ret, s, entry, iter, flags);
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
}
case TRACE_STACK:
case TRACE_FN:
/* dont trace stack and functions as comments */
return TRACE_TYPE_UNHANDLED;
tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 23:57:25 +00:00
default:
return print_graph_comment(s, entry, iter, flags);
}
return TRACE_TYPE_HANDLED;
}
static enum print_line_t
print_graph_function(struct trace_iterator *iter)
{
tracing, function_graph: Remove dependency of abstime and duration fields on latency The display of absolute time and duration fields is based on the latency field. This was added during the irqsoff/wakeup tracers graph support changes. It's causing confusion in what fields will be displayed for the function_graph tracer itself. So I'm removing this depency, and adding absolute time and duration fields to the preemptirqsoff preemptoff irqsoff wakeup tracers. With following commands: # echo function_graph > ./current_tracer # cat trace This is what it looked like before: # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 0) 0.068 us | } /* page_add_file_rmap */ 0) | _raw_spin_unlock() { ... This is what it looks like now: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) 0.068 us | } /* add_preempt_count */ 0) 0.993 us | } /* vfsmount_lock_local_lock */ ... For preemptirqsoff preemptoff irqsoff wakeup tracers, this is what it looked like before: SNIP # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> lock-depth # |||| / # CPU TASK/PID ||||| DURATION FUNCTION CALLS # | | | ||||| | | | | | | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple(); ... This is what it looks like now: SNIP # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 19.847735 | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple(); ... Signed-off-by: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-03 14:58:47 +00:00
return print_graph_function_flags(iter, tracer_flags.val);
}
static enum print_line_t
print_graph_function_event(struct trace_iterator *iter, int flags,
struct trace_event *event)
{
return print_graph_function(iter);
}
static void print_lat_header(struct seq_file *s, u32 flags)
{
static const char spaces[] = " " /* 16 spaces */
" " /* 4 spaces */
" "; /* 17 spaces */
int size = 0;
if (flags & TRACE_GRAPH_PRINT_ABS_TIME)
size += 16;
function_graph: Support displaying relative timestamp When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:10 +00:00
if (flags & TRACE_GRAPH_PRINT_REL_TIME)
size += 16;
if (flags & TRACE_GRAPH_PRINT_CPU)
size += 4;
if (flags & TRACE_GRAPH_PRINT_PROC)
size += 17;
seq_printf(s, "#%.*s _-----=> irqs-off \n", size, spaces);
seq_printf(s, "#%.*s / _----=> need-resched \n", size, spaces);
seq_printf(s, "#%.*s| / _---=> hardirq/softirq \n", size, spaces);
seq_printf(s, "#%.*s|| / _--=> preempt-depth \n", size, spaces);
tracing, function_graph: Remove lock-depth from latency trace The lock_depth was removed in commit e6e1e25 tracing: Remove lock_depth from event entry Removing the lock_depth info from function_graph latency header. With following commands: # echo function_graph > ./current_tracer # echo 1 > options/latency-format # cat trace This is what it looked like before: # tracer: function_graph # # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+ # -------------------------------------------------------------------- # latency: 0 us, #59756/311298, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> lock-depth # |||| / # CPU||||| DURATION FUNCTION CALLS # | ||||| | | | | | | 0) .... 0.068 us | } /* __rcu_read_unlock */ ... This is what it looks like now: # tracer: function_graph # # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+ # -------------------------------------------------------------------- # latency: 0 us, #59747/1744610, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # CPU|||| DURATION FUNCTION CALLS # | |||| | | | | | | 0) ..s. 1.641 us | } /* __rcu_process_callbacks */ ... Signed-off-by: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1307113131-10045-5-git-send-email-jolsa@redhat.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-03 14:58:50 +00:00
seq_printf(s, "#%.*s||| / \n", size, spaces);
}
static void __print_graph_headers_flags(struct trace_array *tr,
struct seq_file *s, u32 flags)
{
int lat = tr->trace_flags & TRACE_ITER_LATENCY_FMT;
if (lat)
print_lat_header(s, flags);
/* 1st line */
seq_putc(s, '#');
if (flags & TRACE_GRAPH_PRINT_ABS_TIME)
seq_puts(s, " TIME ");
function_graph: Support displaying relative timestamp When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:10 +00:00
if (flags & TRACE_GRAPH_PRINT_REL_TIME)
seq_puts(s, " REL TIME ");
if (flags & TRACE_GRAPH_PRINT_CPU)
seq_puts(s, " CPU");
if (flags & TRACE_GRAPH_PRINT_PROC)
seq_puts(s, " TASK/PID ");
if (lat)
tracing: Put a margin between flags and duration for wakeup tracers Don't mix context flags with function duration info. Instead of this: # tracer: wakeup_rt # # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNh5 | /* 0:120:R + [000] 11: 0:R migration/0 */ 2 us | 0) <idle>-0 | dNh5 0.000 us | (null)(); 4 us | 0) <idle>-0 | dNh4 | _raw_spin_unlock() { 4 us | 0) <idle>-0 | dNh4 0.304 us | preempt_count_sub(); 5 us | 0) <idle>-0 | dNh3 1.063 us | } 5 us | 0) <idle>-0 | dNh3 0.266 us | ttwu_stat(); 6 us | 0) <idle>-0 | dNh3 | _raw_spin_unlock_irqrestore() { 6 us | 0) <idle>-0 | dNh3 0.273 us | preempt_count_sub(); 6 us | 0) <idle>-0 | dNh2 0.818 us | } Show this: # tracer: wakeup # # wakeup latency trace v1.1.5 on 4.20.0+ # -------------------------------------------------------------------- # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNs. | | /* 0:120:R + [000] 339:100:R kworker/0:1H */ 3 us | 0) <idle>-0 | dNs. | 0.000 us | (null)(); 67 us | 0) <idle>-0 | dNs. | 0.721 us | ttwu_stat(); 69 us | 0) <idle>-0 | dNs. | 0.607 us | _raw_spin_unlock_irqrestore(); 71 us | 0) <idle>-0 | .Ns. | 0.598 us | _raw_spin_lock_irq(); 72 us | 0) <idle>-0 | .Ns. | 0.584 us | _raw_spin_lock_irq(); 73 us | 0) <idle>-0 | dNs. | + 11.118 us | __next_timer_interrupt(); 75 us | 0) <idle>-0 | dNs. | | call_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | delayed_work_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | __queue_work() { ... Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:12 +00:00
seq_puts(s, "|||| ");
if (flags & TRACE_GRAPH_PRINT_DURATION)
seq_puts(s, " DURATION ");
seq_puts(s, " FUNCTION CALLS\n");
/* 2nd line */
seq_putc(s, '#');
if (flags & TRACE_GRAPH_PRINT_ABS_TIME)
seq_puts(s, " | ");
function_graph: Support displaying relative timestamp When function_graph is used for latency tracers, relative timestamp is more straightforward than absolute timestamp as function trace does. This change adds relative timestamp support to function_graph and applies to latency tracers (wakeup and irqsoff). Instead of: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test # -------------------------------------------------------------------- # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule(); 124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() { 124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs(); 124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs(); 124.974310 | 2) systemd-693 | d..1 2.301 us | } [..] 124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() { 124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 2d..2 552us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Show: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: __schedule # => ended at: _raw_spin_unlock_irq # # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 7) sshd-1704 | d..1 0.000 us | __schedule(); 1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() { 1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs(); 2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs(); 3 us | 7) sshd-1704 | d..1 2.599 us | } [..] 509 us | 7) <idle>-0 | d..2 | finish_task_switch() { 510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() { 510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq(); 512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on(); <idle>-0 7d..2 543us : <stack trace> => __schedule => schedule_idle => do_idle => cpu_startup_entry => start_secondary => secondary_startup_64 Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:10 +00:00
if (flags & TRACE_GRAPH_PRINT_REL_TIME)
seq_puts(s, " | ");
if (flags & TRACE_GRAPH_PRINT_CPU)
seq_puts(s, " | ");
if (flags & TRACE_GRAPH_PRINT_PROC)
seq_puts(s, " | | ");
if (lat)
tracing: Put a margin between flags and duration for wakeup tracers Don't mix context flags with function duration info. Instead of this: # tracer: wakeup_rt # # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+ # -------------------------------------------------------------------- # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8) # ----------------- # | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNh5 | /* 0:120:R + [000] 11: 0:R migration/0 */ 2 us | 0) <idle>-0 | dNh5 0.000 us | (null)(); 4 us | 0) <idle>-0 | dNh4 | _raw_spin_unlock() { 4 us | 0) <idle>-0 | dNh4 0.304 us | preempt_count_sub(); 5 us | 0) <idle>-0 | dNh3 1.063 us | } 5 us | 0) <idle>-0 | dNh3 0.266 us | ttwu_stat(); 6 us | 0) <idle>-0 | dNh3 | _raw_spin_unlock_irqrestore() { 6 us | 0) <idle>-0 | dNh3 0.273 us | preempt_count_sub(); 6 us | 0) <idle>-0 | dNh2 0.818 us | } Show this: # tracer: wakeup # # wakeup latency trace v1.1.5 on 4.20.0+ # -------------------------------------------------------------------- # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 0 us | 0) <idle>-0 | dNs. | | /* 0:120:R + [000] 339:100:R kworker/0:1H */ 3 us | 0) <idle>-0 | dNs. | 0.000 us | (null)(); 67 us | 0) <idle>-0 | dNs. | 0.721 us | ttwu_stat(); 69 us | 0) <idle>-0 | dNs. | 0.607 us | _raw_spin_unlock_irqrestore(); 71 us | 0) <idle>-0 | .Ns. | 0.598 us | _raw_spin_lock_irq(); 72 us | 0) <idle>-0 | .Ns. | 0.584 us | _raw_spin_lock_irq(); 73 us | 0) <idle>-0 | dNs. | + 11.118 us | __next_timer_interrupt(); 75 us | 0) <idle>-0 | dNs. | | call_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | delayed_work_timer_fn() { 76 us | 0) <idle>-0 | dNs. | | __queue_work() { ... Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-01-01 15:46:12 +00:00
seq_puts(s, "|||| ");
if (flags & TRACE_GRAPH_PRINT_DURATION)
seq_puts(s, " | | ");
seq_puts(s, " | | | |\n");
}
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
static void print_graph_headers(struct seq_file *s)
{
print_graph_headers_flags(s, tracer_flags.val);
}
void print_graph_headers_flags(struct seq_file *s, u32 flags)
{
struct trace_iterator *iter = s->private;
struct trace_array *tr = iter->tr;
if (!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO))
return;
if (tr->trace_flags & TRACE_ITER_LATENCY_FMT) {
/* print nothing if the buffers are empty */
if (trace_empty(iter))
return;
print_trace_header(s, iter);
tracing, function_graph: Remove dependency of abstime and duration fields on latency The display of absolute time and duration fields is based on the latency field. This was added during the irqsoff/wakeup tracers graph support changes. It's causing confusion in what fields will be displayed for the function_graph tracer itself. So I'm removing this depency, and adding absolute time and duration fields to the preemptirqsoff preemptoff irqsoff wakeup tracers. With following commands: # echo function_graph > ./current_tracer # cat trace This is what it looked like before: # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 0) 0.068 us | } /* page_add_file_rmap */ 0) | _raw_spin_unlock() { ... This is what it looks like now: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) 0.068 us | } /* add_preempt_count */ 0) 0.993 us | } /* vfsmount_lock_local_lock */ ... For preemptirqsoff preemptoff irqsoff wakeup tracers, this is what it looked like before: SNIP # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> lock-depth # |||| / # CPU TASK/PID ||||| DURATION FUNCTION CALLS # | | | ||||| | | | | | | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple(); ... This is what it looks like now: SNIP # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 19.847735 | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple(); ... Signed-off-by: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-03 14:58:47 +00:00
}
__print_graph_headers_flags(tr, s, flags);
}
void graph_trace_open(struct trace_iterator *iter)
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
{
/* pid and depth on the last trace processed */
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
struct fgraph_data *data;
tracing: Handle ftrace_dump() atomic context in graph_trace_open() graph_trace_open() can be called in atomic context from ftrace_dump(). Use GFP_ATOMIC for the memory allocations when that's the case, in order to avoid the following splat. BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0 Backtrace: .. [<8004dc94>] (__might_sleep) from [<801371f4>] (kmem_cache_alloc_trace+0x160/0x238) r7:87800040 r6:000080d0 r5:810d16e8 r4:000080d0 [<80137094>] (kmem_cache_alloc_trace) from [<800cbd60>] (graph_trace_open+0x30/0xd0) r10:00000100 r9:809171a8 r8:00008e28 r7:810d16f0 r6:00000001 r5:810d16e8 r4:810d16f0 [<800cbd30>] (graph_trace_open) from [<800c79c4>] (trace_init_global_iter+0x50/0x9c) r8:00008e28 r7:808c853c r6:00000001 r5:810d16e8 r4:810d16f0 r3:800cbd30 [<800c7974>] (trace_init_global_iter) from [<800c7aa0>] (ftrace_dump+0x90/0x2ec) r4:810d2580 r3:00000000 [<800c7a10>] (ftrace_dump) from [<80414b2c>] (sysrq_ftrace_dump+0x1c/0x20) r10:00000100 r9:809171a8 r8:808f6e7c r7:00000001 r6:00000007 r5:0000007a r4:808d5394 [<80414b10>] (sysrq_ftrace_dump) from [<800169b8>] (return_to_handler+0x0/0x18) [<80415498>] (__handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) r8:808c8100 r7:808c8444 r6:00000101 r5:00000010 r4:84eb3210 [<80415668>] (handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) [<8042a760>] (pl011_int) from [<800169b8>] (return_to_handler+0x0/0x18) r10:809171bc r9:809171a8 r8:00000001 r7:00000026 r6:808c6000 r5:84f01e60 r4:8454fe00 [<8007782c>] (handle_irq_event_percpu) from [<80077b44>] (handle_irq_event+0x4c/0x6c) r10:808c7ef0 r9:87283e00 r8:00000001 r7:00000000 r6:8454fe00 r5:84f01e60 r4:84f01e00 [<80077af8>] (handle_irq_event) from [<8007aa28>] (handle_fasteoi_irq+0xf0/0x1ac) r6:808f52a4 r5:84f01e60 r4:84f01e00 r3:00000000 [<8007a938>] (handle_fasteoi_irq) from [<80076dc0>] (generic_handle_irq+0x3c/0x4c) r6:00000026 r5:00000000 r4:00000026 r3:8007a938 [<80076d84>] (generic_handle_irq) from [<80077128>] (__handle_domain_irq+0x8c/0xfc) r4:808c1e38 r3:0000002e [<8007709c>] (__handle_domain_irq) from [<800087b8>] (gic_handle_irq+0x34/0x6c) r10:80917748 r9:00000001 r8:88802100 r7:808c7ef0 r6:808c8fb0 r5:00000015 r4:8880210c r3:808c7ef0 [<80008784>] (gic_handle_irq) from [<80014044>] (__irq_svc+0x44/0x7c) Link: http://lkml.kernel.org/r/1428953721-31349-1-git-send-email-rabin@rab.in Link: http://lkml.kernel.org/r/1428957012-2319-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-13 20:30:12 +00:00
gfp_t gfpflags;
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
int cpu;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
iter->private = NULL;
tracing: Handle ftrace_dump() atomic context in graph_trace_open() graph_trace_open() can be called in atomic context from ftrace_dump(). Use GFP_ATOMIC for the memory allocations when that's the case, in order to avoid the following splat. BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0 Backtrace: .. [<8004dc94>] (__might_sleep) from [<801371f4>] (kmem_cache_alloc_trace+0x160/0x238) r7:87800040 r6:000080d0 r5:810d16e8 r4:000080d0 [<80137094>] (kmem_cache_alloc_trace) from [<800cbd60>] (graph_trace_open+0x30/0xd0) r10:00000100 r9:809171a8 r8:00008e28 r7:810d16f0 r6:00000001 r5:810d16e8 r4:810d16f0 [<800cbd30>] (graph_trace_open) from [<800c79c4>] (trace_init_global_iter+0x50/0x9c) r8:00008e28 r7:808c853c r6:00000001 r5:810d16e8 r4:810d16f0 r3:800cbd30 [<800c7974>] (trace_init_global_iter) from [<800c7aa0>] (ftrace_dump+0x90/0x2ec) r4:810d2580 r3:00000000 [<800c7a10>] (ftrace_dump) from [<80414b2c>] (sysrq_ftrace_dump+0x1c/0x20) r10:00000100 r9:809171a8 r8:808f6e7c r7:00000001 r6:00000007 r5:0000007a r4:808d5394 [<80414b10>] (sysrq_ftrace_dump) from [<800169b8>] (return_to_handler+0x0/0x18) [<80415498>] (__handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) r8:808c8100 r7:808c8444 r6:00000101 r5:00000010 r4:84eb3210 [<80415668>] (handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) [<8042a760>] (pl011_int) from [<800169b8>] (return_to_handler+0x0/0x18) r10:809171bc r9:809171a8 r8:00000001 r7:00000026 r6:808c6000 r5:84f01e60 r4:8454fe00 [<8007782c>] (handle_irq_event_percpu) from [<80077b44>] (handle_irq_event+0x4c/0x6c) r10:808c7ef0 r9:87283e00 r8:00000001 r7:00000000 r6:8454fe00 r5:84f01e60 r4:84f01e00 [<80077af8>] (handle_irq_event) from [<8007aa28>] (handle_fasteoi_irq+0xf0/0x1ac) r6:808f52a4 r5:84f01e60 r4:84f01e00 r3:00000000 [<8007a938>] (handle_fasteoi_irq) from [<80076dc0>] (generic_handle_irq+0x3c/0x4c) r6:00000026 r5:00000000 r4:00000026 r3:8007a938 [<80076d84>] (generic_handle_irq) from [<80077128>] (__handle_domain_irq+0x8c/0xfc) r4:808c1e38 r3:0000002e [<8007709c>] (__handle_domain_irq) from [<800087b8>] (gic_handle_irq+0x34/0x6c) r10:80917748 r9:00000001 r8:88802100 r7:808c7ef0 r6:808c8fb0 r5:00000015 r4:8880210c r3:808c7ef0 [<80008784>] (gic_handle_irq) from [<80014044>] (__irq_svc+0x44/0x7c) Link: http://lkml.kernel.org/r/1428953721-31349-1-git-send-email-rabin@rab.in Link: http://lkml.kernel.org/r/1428957012-2319-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-13 20:30:12 +00:00
/* We can be called in atomic context via ftrace_dump() */
gfpflags = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL;
data = kzalloc(sizeof(*data), gfpflags);
if (!data)
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
goto out_err;
tracing: Handle ftrace_dump() atomic context in graph_trace_open() graph_trace_open() can be called in atomic context from ftrace_dump(). Use GFP_ATOMIC for the memory allocations when that's the case, in order to avoid the following splat. BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0 Backtrace: .. [<8004dc94>] (__might_sleep) from [<801371f4>] (kmem_cache_alloc_trace+0x160/0x238) r7:87800040 r6:000080d0 r5:810d16e8 r4:000080d0 [<80137094>] (kmem_cache_alloc_trace) from [<800cbd60>] (graph_trace_open+0x30/0xd0) r10:00000100 r9:809171a8 r8:00008e28 r7:810d16f0 r6:00000001 r5:810d16e8 r4:810d16f0 [<800cbd30>] (graph_trace_open) from [<800c79c4>] (trace_init_global_iter+0x50/0x9c) r8:00008e28 r7:808c853c r6:00000001 r5:810d16e8 r4:810d16f0 r3:800cbd30 [<800c7974>] (trace_init_global_iter) from [<800c7aa0>] (ftrace_dump+0x90/0x2ec) r4:810d2580 r3:00000000 [<800c7a10>] (ftrace_dump) from [<80414b2c>] (sysrq_ftrace_dump+0x1c/0x20) r10:00000100 r9:809171a8 r8:808f6e7c r7:00000001 r6:00000007 r5:0000007a r4:808d5394 [<80414b10>] (sysrq_ftrace_dump) from [<800169b8>] (return_to_handler+0x0/0x18) [<80415498>] (__handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) r8:808c8100 r7:808c8444 r6:00000101 r5:00000010 r4:84eb3210 [<80415668>] (handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) [<8042a760>] (pl011_int) from [<800169b8>] (return_to_handler+0x0/0x18) r10:809171bc r9:809171a8 r8:00000001 r7:00000026 r6:808c6000 r5:84f01e60 r4:8454fe00 [<8007782c>] (handle_irq_event_percpu) from [<80077b44>] (handle_irq_event+0x4c/0x6c) r10:808c7ef0 r9:87283e00 r8:00000001 r7:00000000 r6:8454fe00 r5:84f01e60 r4:84f01e00 [<80077af8>] (handle_irq_event) from [<8007aa28>] (handle_fasteoi_irq+0xf0/0x1ac) r6:808f52a4 r5:84f01e60 r4:84f01e00 r3:00000000 [<8007a938>] (handle_fasteoi_irq) from [<80076dc0>] (generic_handle_irq+0x3c/0x4c) r6:00000026 r5:00000000 r4:00000026 r3:8007a938 [<80076d84>] (generic_handle_irq) from [<80077128>] (__handle_domain_irq+0x8c/0xfc) r4:808c1e38 r3:0000002e [<8007709c>] (__handle_domain_irq) from [<800087b8>] (gic_handle_irq+0x34/0x6c) r10:80917748 r9:00000001 r8:88802100 r7:808c7ef0 r6:808c8fb0 r5:00000015 r4:8880210c r3:808c7ef0 [<80008784>] (gic_handle_irq) from [<80014044>] (__irq_svc+0x44/0x7c) Link: http://lkml.kernel.org/r/1428953721-31349-1-git-send-email-rabin@rab.in Link: http://lkml.kernel.org/r/1428957012-2319-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-13 20:30:12 +00:00
data->cpu_data = alloc_percpu_gfp(struct fgraph_cpu_data, gfpflags);
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
if (!data->cpu_data)
goto out_err_free;
for_each_possible_cpu(cpu) {
pid_t *pid = &(per_cpu_ptr(data->cpu_data, cpu)->last_pid);
int *depth = &(per_cpu_ptr(data->cpu_data, cpu)->depth);
int *ignore = &(per_cpu_ptr(data->cpu_data, cpu)->ignore);
int *depth_irq = &(per_cpu_ptr(data->cpu_data, cpu)->depth_irq);
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
*pid = -1;
*depth = 0;
*ignore = 0;
*depth_irq = -1;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
}
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
iter->private = data;
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
return;
out_err_free:
kfree(data);
out_err:
pr_warn("function graph tracer: not enough memory\n");
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
}
void graph_trace_close(struct trace_iterator *iter)
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
{
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
struct fgraph_data *data = iter->private;
if (data) {
free_percpu(data->cpu_data);
kfree(data);
}
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
}
static int
func_graph_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set)
{
if (bit == TRACE_GRAPH_PRINT_IRQS)
ftrace_graph_skip_irqs = !set;
if (bit == TRACE_GRAPH_SLEEP_TIME)
ftrace_graph_sleep_time_control(set);
if (bit == TRACE_GRAPH_GRAPH_TIME)
ftrace_graph_graph_time_control(set);
return 0;
}
static struct trace_event_functions graph_functions = {
.trace = print_graph_function_event,
};
static struct trace_event graph_trace_entry_event = {
.type = TRACE_GRAPH_ENT,
.funcs = &graph_functions,
};
static struct trace_event graph_trace_ret_event = {
.type = TRACE_GRAPH_RET,
.funcs = &graph_functions
};
static struct tracer graph_trace __tracer_data = {
.name = "function_graph",
.update_thresh = graph_trace_update_thresh,
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
.open = graph_trace_open,
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
.pipe_open = graph_trace_open,
tracing/function-graph-tracer: various fixes and features This patch brings various bugfixes: - Drop the first irrelevant task switch on the very beginning of a trace. - Drop the OVERHEAD word from the headers, the DURATION word is sufficient and will not overlap other columns. - Make the headers fit well their respective columns whatever the selected options. Ie, default options: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 1) 0.646 us | } 1) | mem_cgroup_del_lru_list() { 1) 0.624 us | lookup_page_cgroup(); 1) 1.970 us | } echo funcgraph-proc > trace_options # tracer: function_graph # # CPU TASK/PID DURATION FUNCTION CALLS # | | | | | | | | | 0) bash-2937 | 0.895 us | } 0) bash-2937 | 0.888 us | __rcu_read_unlock(); 0) bash-2937 | 0.864 us | conv_uni_to_pc(); 0) bash-2937 | 1.015 us | __rcu_read_lock(); echo nofuncgraph-cpu > trace_options echo nofuncgraph-proc > trace_options # tracer: function_graph # # DURATION FUNCTION CALLS # | | | | | | 3.752 us | native_pud_val(); 0.616 us | native_pud_val(); 0.624 us | native_pmd_val(); About features, one can now disable the duration (this will hide the overhead too for convenient reasons and because on doesn't need overhead if it hasn't the duration): echo nofuncgraph-duration > trace_options # tracer: function_graph # # FUNCTION CALLS # | | | | cap_vm_enough_memory() { __vm_enough_memory() { vm_acct_memory(); } } } And at last, an option to print the absolute time: //Restart from default options echo funcgraph-abstime > trace_options # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 261.339774 | 1) + 42.823 us | } 261.339775 | 1) 1.045 us | _spin_lock_irq(); 261.339777 | 1) 0.940 us | _spin_lock_irqsave(); 261.339778 | 1) 0.752 us | _spin_unlock_irqrestore(); 261.339780 | 1) 0.857 us | _spin_unlock_irq(); 261.339782 | 1) | flush_to_ldisc() { 261.339783 | 1) | tty_ldisc_ref() { 261.339783 | 1) | tty_ldisc_try() { 261.339784 | 1) 1.075 us | _spin_lock_irqsave(); 261.339786 | 1) 0.842 us | _spin_unlock_irqrestore(); 261.339788 | 1) 4.211 us | } 261.339788 | 1) 5.662 us | } The format is seconds.usecs. I guess no one needs the nanosec precision here, the main goal is to have an overview about the general timings of events, and to see the place when the trace switches from one cpu to another. ie: 274.874760 | 1) 0.676 us | _spin_unlock(); 274.874762 | 1) 0.609 us | native_load_sp0(); 274.874763 | 1) 0.602 us | native_load_tls(); 274.878739 | 0) 0.722 us | } 274.878740 | 0) 0.714 us | native_pmd_val(); 274.878741 | 0) 0.730 us | native_pmd_val(); Here there is a 4000 usecs difference when we switch the cpu. Changes in V2: - Completely fix the first pointless task switch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 01:04:53 +00:00
.close = graph_trace_close,
tracing: Fix function graph trace_pipe to properly display failed entries There is a case where the graph tracer might get confused and omits displaying of a single record. This applies mostly with the trace_pipe since it is unlikely that the trace_seq buffer will overflow with the trace file. As the function_graph tracer goes through the trace entries keeping a pointer to the current record: current -> func1 ENTRY func2 ENTRY func2 RETURN func1 RETURN When an function ENTRY is encountered, it moves the pointer to the next entry to check if the function is a nested or leaf function. func1 ENTRY current -> func2 ENTRY func2 RETURN func1 RETURN If the rest of the writing of the function fills the trace_seq buffer, then the trace_pipe read will ignore this entry. The next read will Now start at the current location, but the first entry (func1) will be discarded. This patch keeps a copy of the current entry in the iterator private storage and will keep track of when the trace_seq buffer fills. When the trace_seq buffer fills, it will reuse the copy of the entry in the next iteration. [ This patch has been largely modified by Steven Rostedt in order to clean it up and simplify it. The original idea and concept was from Jirka and for that, this patch will go under his name to give him the credit he deserves. But because this was modify by Steven Rostedt anything wrong with the patch should be blamed on Steven. ] Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-11-24 12:57:38 +00:00
.pipe_close = graph_trace_close,
.init = graph_trace_init,
.reset = graph_trace_reset,
.print_line = print_graph_function,
.print_header = print_graph_headers,
.flags = &tracer_flags,
.set_flag = func_graph_set_flag,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_function_graph,
#endif
};
static ssize_t
graph_depth_write(struct file *filp, const char __user *ubuf, size_t cnt,
loff_t *ppos)
{
unsigned long val;
int ret;
ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
if (ret)
return ret;
fgraph_max_depth = val;
*ppos += cnt;
return cnt;
}
static ssize_t
graph_depth_read(struct file *filp, char __user *ubuf, size_t cnt,
loff_t *ppos)
{
char buf[15]; /* More than enough to hold UINT_MAX + "\n"*/
int n;
n = sprintf(buf, "%d\n", fgraph_max_depth);
return simple_read_from_buffer(ubuf, cnt, ppos, buf, n);
}
static const struct file_operations graph_depth_fops = {
.open = tracing_open_generic,
.write = graph_depth_write,
.read = graph_depth_read,
.llseek = generic_file_llseek,
};
static __init int init_graph_tracefs(void)
{
int ret;
ret = tracing_init_dentry();
if (ret)
return 0;
trace_create_file("max_graph_depth", TRACE_MODE_WRITE, NULL,
NULL, &graph_depth_fops);
return 0;
}
fs_initcall(init_graph_tracefs);
static __init int init_graph_trace(void)
{
max_bytes_for_cpu = snprintf(NULL, 0, "%u", nr_cpu_ids - 1);
if (!register_trace_event(&graph_trace_entry_event)) {
pr_warn("Warning: could not register graph trace events\n");
return 1;
}
if (!register_trace_event(&graph_trace_ret_event)) {
pr_warn("Warning: could not register graph trace events\n");
return 1;
}
return register_tracer(&graph_trace);
}
core_initcall(init_graph_trace);